Data compression



Message boards : Wish list : Data compression

Author Message
Profile suguruhirahara
Forum moderator
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 282
ID: 15
Credit: 56,614
RAC: 0
Message 635 - Posted 27 Sep 2006 11:36:27 UTC

Because of the size of each file (around 1MB), hosts on dial-up have to wait a long time when workunits are downloaded. I estimate that the number of such hosts is still large, so I wish the distributed workunits could be compressed if possible, so that every host can crunch as many as it can.
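For what it's worth, these workunit files should compress very well, since they are mostly repetitive text. A quick sketch with Python's standard gzip module (the file contents here are a made-up stand-in for a real ~1MB workunit):

```python
import gzip

# Hypothetical stand-in for a ~1 MB workunit: parameter lines and
# atom records are highly repetitive, which gzip handles well.
workunit = ("set param=value\natom  C  1.234  5.678  9.012\n" * 20000).encode()

compressed = gzip.compress(workunit)
ratio = len(compressed) / len(workunit)
print(f"{len(workunit)} -> {len(compressed)} bytes ({ratio:.1%})")
```

On real workunit data the ratio would differ, but text-based inputs of this kind typically shrink by a large factor.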
____________

I'm a volunteer participant; my views are not necessarily those of Docking@Home or its participating institutions.

Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 670 - Posted 29 Sep 2006 3:13:10 UTC - in response to Message ID 635.

This is actually on our todo list, but not with a very high priority right now.... Thanks for the input though!!
Andre

Because of the size of each file (around 1MB), hosts on dial-up have to wait a long time when workunits are downloaded. I estimate that the number of such hosts is still large, so I wish the distributed workunits could be compressed if possible, so that every host can crunch as many as it can.


____________
D@H the greatest project in the world... a while from now!
Profile David Ball
Forum moderator
Volunteer tester
Avatar

Joined: Sep 18 06
Posts: 274
ID: 115
Credit: 1,634,401
RAC: 0
Message 2682 - Posted 8 Mar 2007 5:02:42 UTC


I just thought I'd add a note on this since I'm stuck on dial-up. Dial-up users get the benefit of all that pre-broadband work that manufacturers put into moving as much as possible through dial-up modems. The modems themselves do a fairly good job of compressing D@H workunits as they travel from the local ISP to the user's modem. It's not as good as if they were compressed by something like gzip, but it's noticeable. I have to set my general preferences to about 8KBps to keep the receive light on during a D@H download.

AFAIK, broadband modems don't do compression because it takes a lot of CPU, and they are fast enough not to need it. Also, most of what broadband users transfer is already-compressed video/audio. I think the 64kb/128kb ISDN modems were the last to do compression on the fly.

Web servers also compress text and HTML if they're configured to do it and the client browser tells them it supports it. I think IIS has it built in, and there's mod_deflate for the Apache web server. Web server admins use this to avoid exceeding their bandwidth limits and to get quicker page loads for dial-up users. It also helps web servers deal with the huge load increase from the Digg or Slashdot effect, where it's usually a page they can compress once and cache in compressed form to serve to many users.
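For the Apache case, enabling mod_deflate for the workunit downloads is only a few lines of configuration; a sketch (the .inp extension and the idea of matching workunits by extension are my assumptions, not the project's actual setup):

```apache
# Requires mod_deflate to be loaded.
# Compress common text responses on the fly:
AddOutputFilterByType DEFLATE text/html text/plain text/xml

# Workunit files can be matched by extension if they are served
# with a non-text MIME type:
<FilesMatch "\.inp$">
    SetOutputFilter DEFLATE
</FilesMatch>
```

This only helps when the client (the BOINC downloader, in this case) sends an Accept-Encoding header saying it can handle compressed responses.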

OTOH, whoever pays the bandwidth bill for D@H would probably prefer compression to be a high priority :-)

-- David
____________
The views expressed are my own.
Facts are subject to memory error :-)
Have you read a good science fiction novel lately?

baracutio

Joined: Sep 3 08
Posts: 1
ID: 588
Credit: 1,770
RAC: 0
Message 4342 - Posted 6 Sep 2008 11:47:26 UTC

smaller input/output files are always welcome...

so is someone working on this? when will it be done?

Profile Trilce Estrada
Forum moderator
Project administrator
Project developer
Project tester

Joined: Sep 19 06
Posts: 189
ID: 119
Credit: 1,217,236
RAC: 0
Message 4352 - Posted 8 Sep 2008 22:36:17 UTC

Due to the scientific requirements of the application, we had to extend the outputs. The inputs, I think, are not too big. But we will discuss this item in the next D@H meeting.

I'll keep you posted

BobCat13
Volunteer tester

Joined: Nov 14 06
Posts: 22
ID: 239
Credit: 285,322
RAC: 0
Message 4359 - Posted 11 Sep 2008 3:50:11 UTC - in response to Message ID 4352.

Due to the scientific requirements of the application, we had to extend the outputs. The inputs, I think, are not too big. But we will discuss this item in the next D@H meeting.

I'll keep you posted

When a user has an ISP that limits their usage to 5GB a month (like my provider does), the input/output file sizes can add up after a couple of weeks.

One way to help with this is to use file compression on both input and output files: mod_deflate for the inp file type on the download side, and gzip_when_done for the output files.
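If I remember the BOINC server docs correctly, the output side is just a tag in the result (output-file) template, which tells the client to gzip the file before uploading it; a sketch from memory (tag placement and the size limit are illustrative, not copied from Docking's actual templates):

```xml
<file_info>
    <name><OUTFILE_0/></name>
    <generated_locally/>
    <upload_when_present/>
    <gzip_when_done/>
    <max_nbytes>5000000</max_nbytes>
    <url><UPLOAD_URL/></url>
</file_info>
```

The server-side validator then has to expect gzipped uploads, so it is not quite free, but it is close.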

After checking the inp files, even better savings could be achieved, though it may require changing the code, so I wouldn't anticipate the project going this route. Still, here are a couple of ideas.

1) Each inp file for the same complex contains identical information except for the first line, where the set seed=xxxxxx value is different for each task. To save bandwidth, split the current inp file into two files: a dat file and an inp file.

The inp file would contain only the set seed=xxxxxx parameter and be about 20 bytes long. The dat file would contain everything that now starts at the 2nd line of the current inp files, since that is always the same information. This way, the dat file for each complex would only need to be downloaded once, at the start of a new complex run. The inp file would change with each task and need to be downloaded with each task assigned, but each download would be only 20 bytes instead of over 1MB.

An example using the current complex would be something like this:
1hvj_mod0011sc.dat - 1MB, downloaded a single time
46068_412698.inp - 20 bytes, 1st task assigned
46956_226898.inp - 20 bytes, 2nd task assigned
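To make the split concrete, here is a rough sketch of what the server-side split and the client-side reassembly could look like (the function names and the exact seed-line format are my assumptions based on the description above):

```python
from pathlib import Path

def split_inp(inp_path: str, dat_path: str, seed_path: str) -> None:
    """Server side: split a current inp file into a shared dat file
    and a tiny per-task seed file."""
    lines = Path(inp_path).read_text().splitlines(keepends=True)
    Path(seed_path).write_text(lines[0])           # e.g. "set seed=412698\n"
    Path(dat_path).write_text("".join(lines[1:]))  # identical for every task of a complex

def rebuild_inp(dat_path: str, seed_path: str, out_path: str) -> None:
    """Client side: prepend the per-task seed line to the shared dat file
    to reconstruct the original inp file."""
    Path(out_path).write_text(
        Path(seed_path).read_text() + Path(dat_path).read_text()
    )
```

The reconstruction is byte-for-byte identical to the original inp file, so the science application itself would not need to change, only the download/staging step.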


2) Or, if possible, start the task from a command line and pass the seed in as a command-line argument. This is similar to the way Rosetta handles its tasks. I downloaded a couple of zip files containing the data needed to crunch a task, approximately 4MB combined. But the next 8 tasks assigned to me used the same input files, and no downloads were necessary. If they worked the way Docking does, the same 4MB would have been downloaded each time with just the random seed changing.

This method would require removing the seed line from the inp file, but like the first approach it would mean the input file only needs to be downloaded once at the start of each new complex run.
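The command-line variant could look something like this on the application side (the argument names and defaults are made up for illustration; the real application is not Python, but the idea carries over):

```python
import argparse
import random

def main(argv=None) -> None:
    # Hypothetical entry point: the per-task seed arrives on the command
    # line instead of as the first line of the input file.
    parser = argparse.ArgumentParser()
    parser.add_argument("--seed", type=int, required=True,
                        help="per-task random seed supplied with the workunit")
    parser.add_argument("--input", default="complex.inp",
                        help="shared input file, downloaded once per complex")
    args = parser.parse_args(argv)
    random.seed(args.seed)  # seed the RNG exactly as the inp line would have
    print(f"crunching {args.input} with seed {args.seed}")
```

BOINC workunits can pass per-task arguments in the command line, so only the shared input file would ever need to be transferred.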

