Charmm 5.02


Rebirther
Volunteer tester

Joined: Sep 13 06
Posts: 63
ID: 52
Credit: 69,033
RAC: 0
Message 140 - Posted 14 Sep 2006 8:17:36 UTC

Can you please post the changes with each new version? Thanks

Honza
Volunteer tester

Joined: Sep 13 06
Posts: 25
ID: 72
Credit: 5,064
RAC: 0
Message 158 - Posted 14 Sep 2006 12:03:32 UTC

No luck with 5.02 and the size of the upload (:-
Result is here

Andre Kerstens
Forum moderator
Project tester
Volunteer tester

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 165 - Posted 14 Sep 2006 13:06:53 UTC - in response to Message ID 158 .

Yes, we know. Richard posted yesterday that 5.02 will only fix the excessive debugging info in stderr.txt. The fix for the -131 error will be deployed today and is only a change in the input file of the app. All existing WUs will have to be aborted for this, though. Keep an eye on the news over the next few hours.

Andre

Bointu
Volunteer tester

Joined: Sep 13 06
Posts: 10
ID: 36
Credit: 219,034
RAC: 0
Message 173 - Posted 14 Sep 2006 14:35:25 UTC

Tried downloading WUs but got the following:

14/09/2006 14:18:44|Docking@Home|Successfully attached to Docking@Home
14/09/2006 14:18:46|Docking@Home|Started download of file charmm_5.2_windows_intelx86
14/09/2006 14:18:46|Docking@Home|Started download of file 1tng_mod0001_1576_93911.inp
14/09/2006 14:18:57||Rescheduling CPU: result suspended, resumed or aborted by user
14/09/2006 14:18:58|Docking@Home|Finished download of file 1tng_mod0001_1576_93911.inp
14/09/2006 14:18:58|Docking@Home|Throughput 106435 bytes/sec
14/09/2006 14:18:58|Docking@Home|Started download of file grid_probes.rtf
14/09/2006 14:18:59|Docking@Home|Incomplete read of less than 5KB for grid_probes.rtf - truncating
14/09/2006 14:18:59|Docking@Home|Temporarily failed download of grid_probes.rtf: HTTP file not found
14/09/2006 14:18:59|Docking@Home|Giving up on download of grid_probes.rtf: file was not found on server
14/09/2006 14:18:59|Docking@Home|Started download of file lpdb_amino.rtf
14/09/2006 14:18:59|Docking@Home|Checksum or signature error for grid_probes.rtf
14/09/2006 14:19:00|Docking@Home|Unrecoverable error for result 1tng_mod0001_1576_93911_3 (WU download error: couldn't get input files:<file_xfer_error> <file_name>grid_probes.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error>)
14/09/2006 14:19:00|Docking@Home|Deferring scheduler requests for 1 minutes and 0 seconds
14/09/2006 14:19:00|Docking@Home|Unrecoverable error for result 1tng_mod0001_435_384075_4 (WU download error: couldn't get input files:<file_xfer_error> <file_name>grid_probes.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error>)
14/09/2006 14:19:00|Docking@Home|Unrecoverable error for result 1tng_mod0001_436_462952_4 (WU download error: couldn't get input files:<file_xfer_error> <file_name>grid_probes.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error>)
14/09/2006 14:19:00|Docking@Home|Unrecoverable error for result 1tng_mod0001_372_420443_4 (WU download error: couldn't get input files:<file_xfer_error> <file_name>grid_probes.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error>)
14/09/2006 14:19:00|Docking@Home|Unrecoverable error for result 1tng_mod0001_373_78184_4 (WU download error: couldn't get input files:<file_xfer_error> <file_name>grid_probes.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error>)
14/09/2006 14:19:00|Docking@Home|Deferring scheduler requests for 2 minutes and 7 seconds
14/09/2006 14:19:00|Docking@Home|Unrecoverable error for result 1tng_mod0001_374_272917_4 (WU download error: couldn't get input files:<file_xfer_error> <file_name>grid_probes.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error>)
14/09/2006 14:19:00|Docking@Home|Deferring scheduler requests for 3 minutes and 33 seconds
14/09/2006 14:19:00|Docking@Home|Unrecoverable error for result 1tng_mod0001_2138_231828_2 (WU download error: couldn't get input files:<file_xfer_error> <file_name>grid_probes.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error>)
14/09/2006 14:19:00|Docking@Home|Unrecoverable error for result 1tng_mod0001_1596_29389_2 (WU download error: couldn't get input files:<file_xfer_error> <file_name>grid_probes.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error>)
14/09/2006 14:19:00|Docking@Home|Deferring scheduler requests for 12 minutes and 26 seconds
14/09/2006 14:19:00|Docking@Home|Unrecoverable error for result 1tng_mod0001_1597_308895_2 (WU download error: couldn't get input files:<file_xfer_error> <file_name>grid_probes.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error>)
14/09/2006 14:19:00|Docking@Home|Deferring scheduler requests for 20 minutes and 21 seconds
14/09/2006 14:19:00|Docking@Home|Unrecoverable error for result 1tng_mod0001_1598_472366_2 (WU download error: couldn't get input files:<file_xfer_error> <file_name>grid_probes.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error>)
14/09/2006 14:19:00|Docking@Home|Deferring scheduler requests for 41 minutes and 31 seconds
14/09/2006 14:19:00|Docking@Home|Unrecoverable error for result 1tng_mod0001_2139_344323_2 (WU download error: couldn't get input files:<file_xfer_error> <file_name>grid_probes.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error>)
14/09/2006 14:19:00|Docking@Home|Deferring scheduler requests for 1 minutes and 0 seconds
14/09/2006 14:19:00|Docking@Home|Incomplete read of less than 5KB for lpdb_amino.rtf - truncating
14/09/2006 14:19:00|Docking@Home|Temporarily failed download of lpdb_amino.rtf: HTTP file not found
14/09/2006 14:19:00|Docking@Home|Giving up on download of lpdb_amino.rtf: file was not found on server
14/09/2006 14:19:00|Docking@Home|Started download of file lpdb.prm
14/09/2006 14:19:00|Docking@Home|Checksum or signature error for lpdb_amino.rtf
14/09/2006 14:19:01|Docking@Home|Incomplete read of less than 5KB for lpdb.prm - truncating
14/09/2006 14:19:01|Docking@Home|Temporarily failed download of lpdb.prm: HTTP file not found
14/09/2006 14:19:01|Docking@Home|Giving up on download of lpdb.prm: file was not found on server
14/09/2006 14:19:01|Docking@Home|Started download of file lpdb_probes.prm
14/09/2006 14:19:01|Docking@Home|Checksum or signature error for lpdb.prm
14/09/2006 14:19:02|Docking@Home|Incomplete read of less than 5KB for lpdb_probes.prm - truncating
14/09/2006 14:19:02|Docking@Home|Temporarily failed download of lpdb_probes.prm: HTTP file not found
14/09/2006 14:19:02|Docking@Home|Giving up on download of lpdb_probes.prm: file was not found on server
14/09/2006 14:19:02|Docking@Home|Started download of file 1tng_mod0001_435_384075.inp
14/09/2006 14:19:02|Docking@Home|Checksum or signature error for lpdb_probes.prm
14/09/2006 14:19:14|Docking@Home|Finished download of file 1tng_mod0001_435_384075.inp
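A log like the one above is easier to read once the failures are grouped by file. The following is a minimal sketch, assuming the line layout seen in this excerpt; the regular expression and the function name `summarize_download_errors` are illustrative and not part of BOINC. It only picks up the first `<file_xfer_error>` on each line.

```python
import re
from collections import Counter

# Pattern derived from the "Unrecoverable error" lines in this thread.
# This is an assumption about the layout, not an official BOINC log grammar.
ERROR_RE = re.compile(
    r"Unrecoverable error for result (?P<result>\S+) .*?"
    r"<file_name>(?P<file>[^<]+)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)</error_code>"
)

def summarize_download_errors(log_text):
    """Count failed results per (file name, error code) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ERROR_RE.search(line)
        if match:
            counts[(match.group("file"), int(match.group("code")))] += 1
    return counts

# Three lines copied from the log above.
sample = """\
14/09/2006 14:19:00|Docking@Home|Unrecoverable error for result 1tng_mod0001_1576_93911_3 (WU download error: couldn't get input files:<file_xfer_error> <file_name>grid_probes.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error>)
14/09/2006 14:19:00|Docking@Home|Deferring scheduler requests for 1 minutes and 0 seconds
14/09/2006 14:19:00|Docking@Home|Unrecoverable error for result 1tng_mod0001_435_384075_4 (WU download error: couldn't get input files:<file_xfer_error> <file_name>grid_probes.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error>)
"""

summary = summarize_download_errors(sample)
for (file_name, code), n in summary.items():
    print(f"{file_name}: error {code} in {n} result(s)")
```

Run over the whole log, a summary like this would show at a glance that every failed result traces back to the same missing input file.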

Andre Kerstens
Forum moderator
Project tester
Volunteer tester

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 174 - Posted 14 Sep 2006 15:02:42 UTC - in response to Message ID 173 .

I don't know where that comes from yet. I see it on our test system as well. I'm looking into it. Thanks.

Andre

Bointu
Volunteer tester

Joined: Sep 13 06
Posts: 10
ID: 36
Credit: 219,034
RAC: 0
Message 175 - Posted 14 Sep 2006 15:04:55 UTC

Tried on another box and got the same

Nasicus
Volunteer tester

Joined: Sep 13 06
Posts: 13
ID: 35
Credit: 666,725
RAC: 0
Message 179 - Posted 14 Sep 2006 16:31:21 UTC

Just attached one new computer to the project and downloaded some WUs without any problem.
Another PC got some WUs also without any problem.

Maybe that issue is resolved?

Bointu
Volunteer tester

Joined: Sep 13 06
Posts: 10
ID: 36
Credit: 219,034
RAC: 0
Message 180 - Posted 14 Sep 2006 16:35:34 UTC

Just started downloading, problem seems to have been solved.

Thanks, Andre

Bointu
Volunteer tester

Joined: Sep 13 06
Posts: 10
ID: 36
Credit: 219,034
RAC: 0
Message 187 - Posted 14 Sep 2006 19:11:27 UTC

First wu still ok after 2hrs

Andre Kerstens
Forum moderator
Project tester
Volunteer tester

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 203 - Posted 15 Sep 2006 4:04:09 UTC - in response to Message ID 187 .

All existing workunits have been cancelled and about 500 new workunits have been created. These take about 1.5 hours on a P4 3.2 GHz and 2.5 hours on a Celeron 2 GHz. Please reset your project or detach and re-attach to start crunching the new wu's. Thanks for all the help!

Andre


Bointu
Volunteer tester

Joined: Sep 13 06
Posts: 10
ID: 36
Credit: 219,034
RAC: 0
Message 205 - Posted 15 Sep 2006 4:49:06 UTC

Have tried downloading some of the new wu's and got the following again:

15/09/2006 05:17:51|Docking@Home|Started download of file grid_probes.rtf
15/09/2006 05:17:52|Docking@Home|Incomplete read of less than 5KB for grid_probes.rtf - truncating
15/09/2006 05:17:52|Docking@Home|Temporarily failed download of grid_probes.rtf: HTTP file not found
15/09/2006 05:17:52|Docking@Home|Giving up on download of grid_probes.rtf: file was not found on server
15/09/2006 05:17:52|Docking@Home|Started download of file lpdb_amino.rtf
15/09/2006 05:17:52|Docking@Home|Checksum or signature error for grid_probes.rtf
15/09/2006 05:17:53||Rescheduling CPU: project suspended by user
15/09/2006 05:17:54|Docking@Home|Unrecoverable error for result 1tng_mod0001_63_378078_0 (WU download error: couldn't get input files:<file_xfer_error> <file_name>grid_probes.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error>)
15/09/2006 05:17:54|Docking@Home|Deferring scheduler requests for 1 minutes and 0 seconds
15/09/2006 05:17:54|Docking@Home|Unrecoverable error for result 1tng_mod0001_64_448585_1 (WU download error: couldn't get input files:<file_xfer_error> <file_name>grid_probes.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error>)
15/09/2006 05:17:54|Docking@Home|Unrecoverable error for result 1tng_mod0001_65_284584_1 (WU download error: couldn't get input files:<file_xfer_error> <file_name>grid_probes.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error>)
15/09/2006 05:17:54|Docking@Home|Unrecoverable error for result 1tng_mod0001_66_241073_1 (WU download error: couldn't get input files:<file_xfer_error> <file_name>grid_probes.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error>)
15/09/2006 05:17:54|Docking@Home|Unrecoverable error for result 1tng_mod0001_67_149037_1 (WU download error: couldn't get input files:<file_xfer_error> <file_name>grid_probes.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error>)
15/09/2006 05:17:54|Docking@Home|Unrecoverable error for result 1tng_mod0001_68_373871_0 (WU download error: couldn't get input files:<file_xfer_error> <file_name>grid_probes.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error>)
15/09/2006 05:17:54|Docking@Home|Unrecoverable error for result 1tng_mod0001_69_430468_1 (WU download error: couldn't get input files:<file_xfer_error> <file_name>grid_probes.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error>)
15/09/2006 05:17:54|Docking@Home|Incomplete read of less than 5KB for lpdb_amino.rtf - truncating
15/09/2006 05:17:54|Docking@Home|Temporarily failed download of lpdb_amino.rtf: HTTP file not found
15/09/2006 05:17:54|Docking@Home|Giving up on download of lpdb_amino.rtf: file was not found on server
15/09/2006 05:17:54|Docking@Home|Started download of file lpdb.prm
15/09/2006 05:17:54|Docking@Home|Checksum or signature error for lpdb_amino.rtf
15/09/2006 05:17:55|Docking@Home|Incomplete read of less than 5KB for lpdb.prm - truncating
15/09/2006 05:17:55|Docking@Home|Temporarily failed download of lpdb.prm: HTTP file not found
15/09/2006 05:17:55|Docking@Home|Giving up on download of lpdb.prm: file was not found on server
15/09/2006 05:17:55|Docking@Home|Started download of file lpdb_probes.prm
15/09/2006 05:17:55|Docking@Home|Checksum or signature error for lpdb.prm
15/09/2006 05:17:56|Docking@Home|Incomplete read of less than 5KB for lpdb_probes.prm - truncating
15/09/2006 05:17:56|Docking@Home|Temporarily failed download of lpdb_probes.prm: HTTP file not found
15/09/2006 05:17:56|Docking@Home|Giving up on download of lpdb_probes.prm: file was not found on server
15/09/2006 05:17:56|Docking@Home|Started download of file 1tng_mod0001_64_448585.inp
15/09/2006 05:17:56|Docking@Home|Checksum or signature error for lpdb_probes.prm
15/09/2006 05:18:42|Docking@Home|Started download of file grid_probes.rtf
15/09/2006 05:18:42|Docking@Home|Started download of file lpdb_amino.rtf
15/09/2006 05:18:43|Docking@Home|Incomplete read of less than 5KB for grid_probes.rtf - truncating
15/09/2006 05:18:43|Docking@Home|Incomplete read of less than 5KB for lpdb_amino.rtf - truncating
15/09/2006 05:18:43|Docking@Home|Temporarily failed download of grid_probes.rtf: HTTP file not found
15/09/2006 05:18:43|Docking@Home|Giving up on download of grid_probes.rtf: file was not found on server
15/09/2006 05:18:43|Docking@Home|Temporarily failed download of lpdb_amino.rtf: HTTP file not found
15/09/2006 05:18:43|Docking@Home|Giving up on download of lpdb_amino.rtf: file was not found on server
15/09/2006 05:18:43|Docking@Home|Started download of file lpdb.prm
15/09/2006 05:18:43|Docking@Home|Started download of file lpdb_probes.prm
15/09/2006 05:18:43|Docking@Home|Checksum or signature error for grid_probes.rtf
15/09/2006 05:18:43|Docking@Home|Checksum or signature error for lpdb_amino.rtf
15/09/2006 05:18:44|Docking@Home|Unrecoverable error for result 1tng_mod0001_70_156337_1 (WU download error: couldn't get input files:<file_xfer_error> <file_name>grid_probes.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error><file_xfer_error> <file_name>lpdb_amino.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error>)
15/09/2006 05:18:44|Docking@Home|Deferring scheduler requests for 1 minutes and 0 seconds
15/09/2006 05:18:44|Docking@Home|Unrecoverable error for result 1tng_mod0001_71_34287_1 (WU download error: couldn't get input files:<file_xfer_error> <file_name>grid_probes.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error><file_xfer_error> <file_name>lpdb_amino.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error>)
15/09/2006 05:18:44|Docking@Home|Unrecoverable error for result 1tng_mod0001_72_402547_1 (WU download error: couldn't get input files:<file_xfer_error> <file_name>grid_probes.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error><file_xfer_error> <file_name>lpdb_amino.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error>)
15/09/2006 05:18:44|Docking@Home|Unrecoverable error for result 1tng_mod0001_73_337348_1 (WU download error: couldn't get input files:<file_xfer_error> <file_name>grid_probes.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error><file_xfer_error> <file_name>lpdb_amino.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error>)
15/09/2006 05:18:44|Docking@Home|Unrecoverable error for result 1tng_mod0001_74_407127_1 (WU download error: couldn't get input files:<file_xfer_error> <file_name>grid_probes.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error><file_xfer_error> <file_name>lpdb_amino.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error>)
15/09/2006 05:18:44|Docking@Home|Deferring scheduler requests for 1 minutes and 41 seconds
15/09/2006 05:18:44|Docking@Home|Unrecoverable error for result 1tng_mod0001_75_247213_1 (WU download error: couldn't get input files:<file_xfer_error> <file_name>grid_probes.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error><file_xfer_error> <file_name>lpdb_amino.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error>)
15/09/2006 05:18:44|Docking@Home|Deferring scheduler requests for 2 minutes and 7 seconds
15/09/2006 05:18:44|Docking@Home|Unrecoverable error for result 1tng_mod0001_76_306711_1 (WU download error: couldn't get input files:<file_xfer_error> <file_name>grid_probes.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error><file_xfer_error> <file_name>lpdb_amino.rtf</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error>)
15/09/2006 05:18:44|Docking@Home|Deferring scheduler requests for 15 minutes and 27 seconds
15/09/2006 05:18:44|Docking@Home|Incomplete read of less than 5KB for lpdb.prm - truncating
15/09/2006 05:18:44|Docking@Home|Incomplete read of less than 5KB for lpdb_probes.prm - truncating
15/09/2006 05:18:44|Docking@Home|Temporarily failed download of lpdb.prm: HTTP file not found
15/09/2006 05:18:44|Docking@Home|Giving up on download of lpdb.prm: file was not found on server
15/09/2006 05:18:44|Docking@Home|Temporarily failed download of lpdb_probes.prm: HTTP file not found
15/09/2006 05:18:44|Docking@Home|Giving up on download of lpdb_probes.prm: file was not found on server
15/09/2006 05:18:44|Docking@Home|Checksum or signature error for lpdb.prm
15/09/2006 05:18:44|Docking@Home|Checksum or signature error for lpdb_probes.prm

gamer007
Volunteer tester

Joined: Sep 13 06
Posts: 8
ID: 61
Credit: 13,988
RAC: 0
Message 206 - Posted 15 Sep 2006 4:54:46 UTC

Strange. I got several WUs about 30 minutes ago without any problem.

Bointu
Volunteer tester

Joined: Sep 13 06
Posts: 10
ID: 36
Credit: 219,034
RAC: 0
Message 207 - Posted 15 Sep 2006 4:57:54 UTC

Had the same problem yesterday, then it seemed to be ok later on

Guy Pauwels
Volunteer tester

Joined: Sep 13 06
Posts: 21
ID: 71
Credit: 801
RAC: 0
Message 210 - Posted 15 Sep 2006 7:55:14 UTC
Last modified: 15 Sep 2006 8:10:15 UTC

I still have the same download problem on my Linux box. See http://docking.utep.edu/result.php?resultid=9227

<core_client_version>5.4.9</core_client_version>
<message>
WU download error: couldn't get input files:
<file_xfer_error>
<file_name>grid_probes.rtf</file_name>
<error_code>-163</error_code>
<error_message>file was not found on server</error_message>
</file_xfer_error>
<file_xfer_error>
<file_name>lpdb_amino.rtf</file_name>
<error_code>-163</error_code>
<error_message>file was not found on server</error_message>
</file_xfer_error>

</message>


I have reset the project, but BOINC still seems to download the 5.01 app and work.

EDIT: Out of curiosity I attached a Windows machine as well. There it downloads app 5.02, but I get the same type of download errors.

Honza
Volunteer tester

Joined: Sep 13 06
Posts: 25
ID: 72
Credit: 5,064
RAC: 0
Message 213 - Posted 15 Sep 2006 9:29:18 UTC
Last modified: 15 Sep 2006 9:29:39 UTC

Just reset the project and downloaded 12 brand-new results with 5.02 to crunch; no download errors.

Time to complete was initially ~28 minutes... this is an underestimate.

(I've completed 5 WUs from the previous session: 2 successful on the server side, 3 with errors on upload.)

Rebirther
Volunteer tester

Joined: Sep 13 06
Posts: 63
ID: 52
Credit: 69,033
RAC: 0
Message 215 - Posted 15 Sep 2006 9:56:36 UTC

1.5 h on a P4 3.2 HT? That must be a joke. I am now at ~20% after 1 h, so ~5 h to complete :/

[B^S] sTrey
Volunteer tester

Joined: Sep 13 06
Posts: 26
ID: 43
Credit: 23,318
RAC: 0
Message 216 - Posted 15 Sep 2006 10:20:31 UTC

Same here, I have a P4 3.2 (on Windows) and it's showing 56% with 1:45 cpu time. Had to switch back to another project so this won't complete any time soon.

MacDitch
Volunteer tester

Joined: Sep 13 06
Posts: 27
ID: 24
Credit: 377,838
RAC: 0
Message 218 - Posted 15 Sep 2006 13:06:59 UTC

I've got a P4 1.6GHz currently at 48% after 3h50 so I'm guessing about 8.5-9 hours for completion.
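The completion guesses in the last few posts all extrapolate from elapsed time and percent done. A minimal sketch of that arithmetic, assuming progress advances linearly (which the application may not guarantee); the function name `estimate_total_hours` is illustrative, and the figures are the ones reported in this thread.

```python
def estimate_total_hours(elapsed_hours, fraction_done):
    """Linear extrapolation: total time = elapsed time / fraction done."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done

# (elapsed hours, fraction complete) as reported by each poster.
reports = {
    "P4 3.2 HT (Rebirther)": (1.0, 0.20),
    "P4 3.2, Windows (sTrey)": (1.75, 0.56),
    "P4 1.6 GHz (MacDitch)": (3 + 50 / 60, 0.48),
}

for machine, (elapsed, fraction) in reports.items():
    total = estimate_total_hours(elapsed, fraction)
    print(f"{machine}: about {total:.1f} h in total")
```

On these numbers the linear estimate for the P4 1.6 GHz comes out close to 8 hours, slightly below the 8.5-9 hours guessed in the post, so the progress figure may not be strictly linear.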

Honza
Volunteer tester

Joined: Sep 13 06
Posts: 25
ID: 72
Credit: 5,064
RAC: 0
Message 219 - Posted 15 Sep 2006 13:08:11 UTC

I've successfully completed and uploaded 2 results in 2 hours.
Those WUs have other results with download errors; the other 2 are unsent.

Rebirther
Volunteer tester

Joined: Sep 13 06
Posts: 63
ID: 52
Credit: 69,033
RAC: 0
Message 221 - Posted 15 Sep 2006 13:43:28 UTC

I am seeing a lot of disk write activity with app 5.02. Can you check this? I think it also ignores the preferences. The worst problems are gone :)

Honza
Volunteer tester

Joined: Sep 13 06
Posts: 25
ID: 72
Credit: 5,064
RAC: 0
Message 222 - Posted 15 Sep 2006 14:16:46 UTC

Yes, I'm getting ~2 GB of disk reads every hour on each charmm_5.2.
That said, and with 2 results successfully uploaded: the excessive debug info in stderr.txt has been eliminated, but the excessive disk reads remain.

Also, memory usage went up from 13 to ~35 MB. That is still low memory usage, but I thought the extra 20 MB would help to eliminate the excessive disk reads...

Angus
Volunteer tester

Joined: Sep 13 06
Posts: 17
ID: 32
Credit: 15,111
RAC: 0
Message 224 - Posted 15 Sep 2006 15:04:42 UTC
Last modified: 15 Sep 2006 15:21:19 UTC

Just returned 2 successful WUs. No crunching errors, no file errors.

However, I did notice that the reported time (approx. 12,550 sec) does not agree with the message log run duration of approx. 4:10 for both WUs. There are about 40 minutes missing. The tasks ran without switching, and nothing else was running on the PC.

The 4:10 times I'm getting are on an XP2600 running W2K, quite a bit more than the estimate of 2 hours on a Celeron 2GHz.

The estimated run time that is embedded in the WU when downloaded is still way out of sync with real run time, but the DCF seems to be working and adjusting the times of the remaining WUs in my queue. The low initial runtime estimate (28 minutes, if I recall?) still causes queues to be overfilled.
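The figures in this post can be checked with a few lines of arithmetic. This is only a sketch: the 28-minute initial estimate is the value the poster recalls with some doubt, and the ratio printed at the end is a plain quotient, not BOINC's actual duration correction factor calculation.

```python
def hms_to_seconds(hours, minutes, seconds=0):
    """Convert an h:mm(:ss) duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds

reported_s = 12_550                         # reported time from the post (approx.)
wall_clock_s = hms_to_seconds(4, 10)        # message log run duration, 4:10
initial_estimate_s = hms_to_seconds(0, 28)  # initial estimate as recalled in the post

missing_min = (wall_clock_s - reported_s) / 60
ratio = reported_s / initial_estimate_s

print(f"unaccounted time: {missing_min:.1f} min")
print(f"reported time / initial estimate: {ratio:.1f}x")
```

The gap works out to roughly 41 minutes, matching the "about 40 minutes missing", and the reported time is about 7.5 times the initial estimate, which is why a queue filled on the basis of that estimate ends up overfilled.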

[B^S] Dr. Bill Skiba
Volunteer tester

Joined: Sep 13 06
Posts: 6
ID: 76
Credit: 111,934
RAC: 0
Message 225 - Posted 15 Sep 2006 15:21:19 UTC

I detached/reattached my windows box this morning. Downloaded about 15 wu's. All errored out on download. See http://docking.utep.edu/result.php?resultid=9467

Andre Kerstens
Forum moderator
Project tester
Volunteer tester

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 227 - Posted 15 Sep 2006 15:47:30 UTC - in response to Message ID 210 .

I have found the problem that is causing this error. It seems that our resultCollector (a fancy name for the file_deleter that does a little bit more) is consistently removing a couple of files from the download directory, which is why you see the error below in your logs. I have fixed this for any new WUs that will be created (I've put the no_delete flag in the workunit template for these files), but for the current ones I am still looking for a good solution, because I think that BOINC doesn't allow what we are currently doing with our files.

Thanks for pointing us to this problem.
Andre
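To illustrate the idea behind the fix described in this post, here is a toy model of a deleter that cleans up a finished workunit's input files but skips any file flagged as not deletable. It is a sketch under stated assumptions, not the project's resultCollector or BOINC's file_deleter: the function `delete_input_files` and the `no_delete` set are hypothetical stand-ins for the per-file flag, and treating the four failing files as shared between workunits is an inference from the logs in this thread.

```python
from pathlib import Path
import tempfile

def delete_input_files(download_dir, input_files, no_delete):
    """Remove a finished workunit's input files, except protected ones.

    Toy model only: `no_delete` is a plain set of file names standing in
    for the per-file flag mentioned in the post.
    """
    removed = []
    for name in input_files:
        if name in no_delete:
            continue  # shared file: other workunits may still need it
        path = Path(download_dir) / name
        if path.exists():
            path.unlink()
            removed.append(name)
    return removed

with tempfile.TemporaryDirectory() as tmp:
    # One per-workunit input file plus two of the shared files from the logs.
    files = ["1tng_mod0001_64_448585.inp", "grid_probes.rtf", "lpdb.prm"]
    for name in files:
        (Path(tmp) / name).write_text("placeholder")

    shared = {"grid_probes.rtf", "lpdb_amino.rtf", "lpdb.prm", "lpdb_probes.prm"}
    removed = delete_input_files(tmp, files, no_delete=shared)
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print("removed:", removed)
print("remaining:", remaining)
```

Without the protection, the first workunit to finish would take the shared files with it, and every later download of them would fail with "file was not found on server", which is the pattern seen in the logs.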

I still have the same download problem on my Linux box. See http://docking.utep.edu/result.php?resultid=9227

<core_client_version>5.4.9</core_client_version>
<message>
WU download error: couldn't get input files:
<file_xfer_error>
<file_name>grid_probes.rtf</file_name>
<error_code>-163</error_code>
<error_message>file was not found on server</error_message>
</file_xfer_error>
<file_xfer_error>
<file_name>lpdb_amino.rtf</file_name>
<error_code>-163</error_code>
<error_message>file was not found on server</error_message>
</file_xfer_error>

</message>


I have reset the project, but BOINC still seems to download the 5.01 app and work.

EDIT: Out of curiosity I attached a Windows machine as well. There it downloads app 5.02, but I get the same type of download errors.

Andre Kerstens
Forum moderator
Project tester
Volunteer tester

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 228 - Posted 15 Sep 2006 15:51:21 UTC - in response to Message ID 224 .

That's interesting. I checked my machines again and every result that I crunch takes 2.5 hours on a 2 GHz Celeron and 1.5 hours on a 3.2 GHz P4. The difference is that I am running Linux, and we all know that Linux is a bit more performant than Windows... but 2 hours is quite a difference. We should try to get more data on this. Can any of the other Linux guys/girls comment on this?

Andre


Just returned 2 successful WUs. No crunching errors, no file errors.

However, I did notice that the reported time (approx. 12,550 sec) does not agree with the message log run duration of approx. 4:10 for both WUs. There's about 40 minutes missing. The tasks ran without switching, and nothing else was running on the PC.
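As a quick sanity check of the 40-minute figure, and assuming the 4:10 in the message log means 4 hours 10 minutes, the gap can be worked out with shell arithmetic. The function name here is made up for the illustration.

```shell
# Hypothetical helper: whole minutes missing between the wall-clock run
# duration (hours and minutes from the message log) and the reported
# time in seconds.
missing_minutes() {
    reported_sec=$1
    hours=$2
    minutes=$3
    wall_sec=$(( hours * 3600 + minutes * 60 ))
    echo $(( (wall_sec - reported_sec) / 60 ))
}

# 4:10 is 15000 s; 15000 s - 12550 s = 2450 s, i.e. 40 whole minutes.
missing_minutes 12550 4 10
```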

The 4:10 times I'm getting are on an XP2600 running W2K, quite a bit more than the estimate of 2 hours on a Celeron 2GHz.

The estimated run time that is embedded in the WU when downloaded is still way out of sync with real run time, but the DCF seems to be working and adjusting the times of the remaining WUs in my queue. The low initial runtime estimate (28 minutes, if I recall?) still causes queues to be overfilled.

Honza
Volunteer tester

Joined: Sep 13 06
Posts: 25
ID: 72
Credit: 5,064
RAC: 0
Message 229 - Posted 15 Sep 2006 15:51:35 UTC

Another one finished fine
Angus - it would be nice if you can unhide your machines so others can see the results.

Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 230 - Posted 15 Sep 2006 15:52:17 UTC - in response to Message ID 221 .

We'll check into this.

Andre

I am seeing a lot of disk write activity with app 5.02. Can you check this? I think it also ignores the preferences. The worst problems are gone :)

Profile Guy Pauwels
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 21
ID: 71
Credit: 801
RAC: 0
Message 231 - Posted 15 Sep 2006 15:54:49 UTC

Since I used up my daily quota on both machines, I d/l some new work in a Vista virtual machine. The d/l now works fine, and the WUs start off full of enthusiasm :) It looks like the problem is solved. Unfortunately my Vista installation is so sluggish and eats so much of my machine's resources, even when idle, that I won't let the WUs run till the end. I hope you don't mind <blush>

Profile Rebirther
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 63
ID: 52
Credit: 69,033
RAC: 0
Message 232 - Posted 15 Sep 2006 15:58:12 UTC

My P4 3.2 HT on Win XP Pro with 2 GB RAM took 4:49 h. I think the disk writing ate up some of that time. Oh, Linux ^^

Frank Encruncher
Volunteer tester

Joined: Sep 13 06
Posts: 2
ID: 44
Credit: 24,596
RAC: 0
Message 243 - Posted 16 Sep 2006 1:17:59 UTC
Last modified: 16 Sep 2006 1:18:33 UTC

One successfully returned, finally. WOOT!
No transfer problems.
You're on the right track now.
How's the app working for ya, running O.K.?

Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 244 - Posted 16 Sep 2006 2:47:39 UTC - in response to Message ID 243 .

Should be better now after my temporary fix. Next problem we'll look at is the excessive disk writing. Richard has started on this already this afternoon.

Thanks for all of your patience :-)
Andre

One successfully returned, finally. WOOT!
No transfer problems.
You're on the right track now.
How's the app working for ya, running O.K.?

Profile Krunchin-Keith [USA]
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 41
ID: 4
Credit: 1,539,093
RAC: 0
Message 299 - Posted 17 Sep 2006 15:35:26 UTC

I guess I'm the first to try this; I checked boincstats and saw no Windows 98 hosts.

I attached my Windows 98 host and went off for breakfast. When I came back I thought it was still downloading, but no: it was downloading another workunit after chewing through 16 others at 5 seconds each.

The error in the result is:

<core_client_version>5.4.11</core_client_version>
<stderr_txt>
Starting charmm run...
CHARMM.OUT OPEN ERROR - Charmm exited with code 2.
Calling BOINC finish.

</stderr_txt>
<message>
<file_xfer_error>
<file_name>1tng_mod0001_171_434235_4_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>1tng_mod0001_171_434235_4_1</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>1tng_mod0001_171_434235_4_2</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>

Profile JShadic
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 10
ID: 7
Credit: 2,007
RAC: 0
Message 327 - Posted 18 Sep 2006 9:47:08 UTC

I am wondering, has anyone seen a result like this one?

<core_client_version>5.4.11</core_client_version>
<stderr_txt>
Starting charmm run...
Starting charmm run...
Starting charmm run...
Starting charmm run...
No heartbeat from core client for 31 sec - exiting
Starting charmm run...
No heartbeat from core client for 31 sec - exiting
Starting charmm run...
No heartbeat from core client for 31 sec - exiting
Starting charmm run...
Starting charmm run...
ERROR - Charmm exited with code 1.
Calling BOINC finish.

</stderr_txt>

Just wanting to know what was going on.

Profile Conan
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 219
ID: 100
Credit: 4,256,493
RAC: 0
Message 328 - Posted 18 Sep 2006 12:12:54 UTC

Is the 5.02 version of Docking@home just for Windows? My Linux machine still only downloads 5.01 workunits and they all take about 3 1/2 minutes. All have the same error code even though they say they are successful.
Error message is

Starting charmm run...
ERROR - Charmm exited with code 1.

That's about 320 total units with about 112 aborted and the rest 5.01 WU's that have all taken 3 1/2 minutes. I can't get 5.02 WU's and no 5.01 WU will run for 5 minutes let alone 2 to 4 hours.

Profile Conan
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 219
ID: 100
Credit: 4,256,493
RAC: 0
Message 329 - Posted 18 Sep 2006 12:42:28 UTC

The part about the version numbers has been answered by Andre in another thread, so don't worry about the 5.02 and 5.01 thing: 5.02 is for Windows and 5.01 is for Linux and Macs.
The rest of my last post should be in the "problems with 5.01" thread, so sorry about that.

Honza
Volunteer tester

Joined: Sep 13 06
Posts: 25
ID: 72
Credit: 5,064
RAC: 0
Message 332 - Posted 18 Sep 2006 13:14:07 UTC

@ JShadic - No heartbeat means that BOINC core is having trouble finding the science application alive. The application should send an "alive" message periodically.

It can happen when another task takes too many CPU cycles, so the BOINC project application doesn't get any, since it runs at low priority.
Or it can happen on Windows machines when the clock on XP is updated to the correct time (done automatically in Windows) and BOINC core gets out of sync with the application.

More on wiki
http://boinc-wiki.ath.cx/index.php?title=No_heartbeat_from_core_client_-_exiting

Nicolas
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 66
ID: 17
Credit: 29,510
RAC: 0
Message 347 - Posted 18 Sep 2006 14:36:25 UTC - in response to Message ID 332 .

@ JShadic - No heartbeat means that BOINC core is having trouble finding the science application alive. The application should send an "alive" message periodically.

Note that this is done automatically by the BOINC library; it's not something the application should do 'by hand'.
Or, it can happen on windows machines when the clock on XP is updated to the correct time (done automatically in Windows), and BOINC core gets out of sync with application.

Wonder what will happen if clock goes *backwards* when it updates! "No heartbeat for -10 seconds"? :D
Nicolas
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 66
ID: 17
Credit: 29,510
RAC: 0
Message 348 - Posted 18 Sep 2006 14:40:26 UTC - in response to Message ID 332 .
Last modified: 18 Sep 2006 14:43:27 UTC

@ JShadic - No heartbeat means that BOINC core is having trouble finding the science application alive. The application should send an "alive" message periodically.

Note that this is done automatically by the BOINC library; it's not something the application should do 'by hand'.
Or, it can happen on windows machines when the clock on XP is updated to the correct time (done automatically in Windows), and BOINC core gets out of sync with application.

Wonder what will happen if clock goes *backwards* when it updates! "No heartbeat for -10 seconds"? :D
EDIT: any mod around to delete my doublepost? :(
Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 349 - Posted 18 Sep 2006 16:04:12 UTC - in response to Message ID 328 .

Yes, 5.2 is only for Windows. I am currently working on a 5.2 for Linux, because some of you report app crashes on Linux (not everybody, and I cannot reproduce any of these crashes on the system in my test lab). We will also release a fix for the validation problem soon. So many new versions to come.

Thanks
Andre

Is the 5.02 version of Docking@home just for Windows? My Linux machine still only downloads 5.01 workunits and they all take about 3 1/2 minutes. All have the same error code even though they say they are successful.
Error message is

Starting charmm run...
ERROR - Charmm exited with code 1.

That's about 320 total units with about 112 aborted and the rest 5.01 WU's that have all taken 3 1/2 minutes. I can't get 5.02 WU's and no 5.01 WU will run for 5 minutes let alone 2 to 4 hours.

Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 350 - Posted 18 Sep 2006 16:09:55 UTC - in response to Message ID 299 .

We haven't tested on win98 for the simple reason we don't have a system like that and have a hard time finding cd's to set one up. We could definitely use some help in that corner. Seems that our app cannot open its logfile charmm.out on your box. That will be a hard problem to solve since we don't even have a logfile.. Are the permissions set right on the boinc directory? (projects and or slots)

Thanks
Andre

I guess I'm the first to try this; I checked boincstats and saw no Windows 98 hosts.

I attached my Windows 98 host and went off for breakfast. When I came back I thought it was still downloading, but no: it was downloading another workunit after chewing through 16 others at 5 seconds each.

The error in the result is:

<core_client_version>5.4.11</core_client_version>
<stderr_txt>
Starting charmm run...
CHARMM.OUT OPEN ERROR - Charmm exited with code 2.
Calling BOINC finish.

</stderr_txt>
<message>
<file_xfer_error>
<file_name>1tng_mod0001_171_434235_4_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>1tng_mod0001_171_434235_4_1</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>1tng_mod0001_171_434235_4_2</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>

Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 351 - Posted 18 Sep 2006 16:11:05 UTC - in response to Message ID 347 .

Correct. This should be taken care of by the boinc client. We don't touch that functionality at all (wouldn't know how to :-)

Andre


Note that this is done automatically by the BOINC library; it's not something the application should do 'by hand'.

Nicolas
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 66
ID: 17
Credit: 29,510
RAC: 0
Message 354 - Posted 18 Sep 2006 16:13:32 UTC - in response to Message ID 350 .

We haven't tested on win98 for the simple reason we don't have a system like that and have a hard time finding cd's to set one up. We could definitely use some help in that corner. Seems that our app cannot open its logfile charmm.out on your box. That will be a hard problem to solve since we don't even have a logfile.. Are the permissions set right on the boinc directory? (projects and or slots)

Thanks
Andre

I can give you full access via VNC to a Win98 host (virtual machine). Although it's Spanish version of Windows...
Profile JShadic
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 10
ID: 7
Credit: 2,007
RAC: 0
Message 359 - Posted 18 Sep 2006 17:17:49 UTC

Thank you Andre, Nicolas, and Honza for the explanation. Happy to let Docking use my spare cycles on this old clunker of mine.

Profile Krunchin-Keith [USA]
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 41
ID: 4
Credit: 1,539,093
RAC: 0
Message 392 - Posted 18 Sep 2006 20:49:59 UTC - in response to Message ID 350 .

We haven't tested on win98 for the simple reason we don't have a system like that and have a hard time finding cd's to set one up. We could definitely use some help in that corner. Seems that our app cannot open its logfile charmm.out on your box. That will be a hard problem to solve since we don't even have a logfile.. Are the permissions set right on the boinc directory? (projects and or slots)

Thanks
Andre


I've never had a problem like this on any of my 6 or 7 windows 98 hosts running any other BOINC projects/applications. They all run other BOINC projects without ever having to set any permissions so I don't know about that.
Nicolas
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 66
ID: 17
Credit: 29,510
RAC: 0
Message 394 - Posted 18 Sep 2006 20:53:49 UTC - in response to Message ID 392 .

We haven't tested on win98 for the simple reason we don't have a system like that...(snip) Are the permissions set right on the boinc directory? (projects and or slots)

Thanks
Andre


I've never had a problem like this on any of my 6 or 7 windows 98 hosts running any other BOINC projects/applications. They all run other BOINC projects without ever having to set any permissions so I don't know about that.

Windows 9x doesn't even have a permissions/ownership system. That's only on NT-based Windows versions.
Jim Baize
Volunteer tester
Avatar

Joined: Sep 18 06
Posts: 14
ID: 113
Credit: 18,202
RAC: 0
Message 452 - Posted 19 Sep 2006 13:35:56 UTC

My linux box just updated the application from 5.01 to 5.02. 5.01 was running just fine. all WU's that it crunched completed successfully.

5.02, on the other hand, is giving me the following error.

9/19/2006 6:31:22 AM Unrecoverable error for result 1tng_mod0001_1123_110866_3 (process exited with code 1 (0x1))


Any other info needed?

Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 455 - Posted 19 Sep 2006 14:18:32 UTC - in response to Message ID 452 .

I suspect that your other WUs weren't successful, but only validated successful because of a bug in 5.1. Do you have any result numbers for us to check?

Thanks
Andre

My linux box just updated the application from 5.01 to 5.02. 5.01 was running just fine. all WU's that it crunched completed successfully.

5.02, on the other hand, is giving me the following error.

9/19/2006 6:31:22 AM Unrecoverable error for result 1tng_mod0001_1123_110866_3 (process exited with code 1 (0x1))


Any other info needed?


____________
D@H the greatest project in the world... a while from now!
Jim Baize
Volunteer tester
Avatar

Joined: Sep 18 06
Posts: 14
ID: 113
Credit: 18,202
RAC: 0
Message 458 - Posted 19 Sep 2006 14:55:15 UTC - in response to Message ID 455 .

Here are my computers
http://docking.utep.edu/show_host_detail.php?hostid=207
and
http://docking.utep.edu/show_host_detail.php?hostid=208
and all of my results
http://docking.utep.edu/results.php?userid=113

I had picked out several wu's and listed them individually, but I inadvertently hit the "back" button, which erased my post before it got sent. :( oh well. Here is some of the info.

Anything else needed?






I suspect that your other WUs weren't successful, but only validated successful because of a bug in 5.1. Do you have any result numbers for us to check?

Thanks
Andre

My linux box just updated the application from 5.01 to 5.02. 5.01 was running just fine. all WU's that it crunched completed successfully.

5.02, on the other hand, is giving me the following error.

9/19/2006 6:31:22 AM Unrecoverable error for result 1tng_mod0001_1123_110866_3 (process exited with code 1 (0x1))


Any other info needed?


Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 459 - Posted 19 Sep 2006 15:38:13 UTC - in response to Message ID 458 .

We have finally found the cause of the problem that some users were experiencing on their Linux systems. It has to do with the stacksize setting on your machine which is for some distros (SuSE 9.3 and 10 for example) set to unlimited and for others (FCx, Ubuntu, etc) set to a limited value like 10240. Your setting can be seen by typing 'ulimit -s' in a terminal. To make the Charmm 'exit 1' errors go away, please set the stacksize to unlimited using the command 'ulimit -s unlimited'. This is not saying that Charmm will use all of your memory (it won't), but it gives us a little bit more space to do our simulations correctly and without errors. Please let us know if this does not work for you. If it does work, please add this command to your shell initialization file (.bashrc, .tcshrc, .kshrc, etc) in your home directory. Of course don't forget to resume the D@H project on your boincmgr in case you suspended it before.
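A minimal sketch of the check described above, for a Bourne-style shell such as bash. The function name is an assumption made for this illustration; only `ulimit -s` itself comes from the post.

```shell
#!/bin/sh
# Hypothetical helper: classify a soft stack limit as printed by `ulimit -s`
# (a number of kilobytes, or the word "unlimited").
stack_limit_status() {
    case "$1" in
        unlimited)    echo "ok" ;;       # the setting the post asks for
        ''|*[!0-9]*)  echo "unknown" ;;  # empty or not a plain number
        *)            echo "limited" ;;  # e.g. 8192 or 10240
    esac
}

current=$(ulimit -s)
echo "current soft stack limit: $current ($(stack_limit_status "$current"))"
```

If the status is "limited", running `ulimit -s unlimited` in the same shell before starting BOINC is the workaround described in the post.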

Thanks, Andre

Here are my computers
http://docking.utep.edu/show_host_detail.php?hostid=207
and
http://docking.utep.edu/show_host_detail.php?hostid=208
and all of my results
http://docking.utep.edu/results.php?userid=113

I had picked out several wu's and listed them individually, but I inadvertently hit the "back" button, which erased my post before it got sent. :( oh well. Here is some of the info.

Anything else needed?






I suspect that your other WUs weren't successful, but only validated successful because of a bug in 5.1. Do you have any result numbers for us to check?

Thanks
Andre

My linux box just updated the application from 5.01 to 5.02. 5.01 was running just fine. all WU's that it crunched completed successfully.

5.02, on the other hand, is giving me the following error.

9/19/2006 6:31:22 AM Unrecoverable error for result 1tng_mod0001_1123_110866_3 (process exited with code 1 (0x1))


Any other info needed?




____________
D@H the greatest project in the world... a while from now!
Profile Guy Pauwels
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 21
ID: 71
Credit: 801
RAC: 0
Message 464 - Posted 19 Sep 2006 16:03:28 UTC

I am running a test now. The default ulimit was set to 8192 for Ubuntu 5.10. The test is now beyond the point where it used to exit (now close to 10%, where it used to exit at 4%), so it's looking good. Unfortunately I have to leave now, I will only see in the morning if it really ran to the end.

The question remains why the program needs that much stack space. 8192 KB is a lot! How does the program come to such high usage? Is there very deep recursion in the code? Or are large chunks of memory put on the stack instead of being allocated from the heap?
____________

BOINC.BE : For Belgians who love the smell of glowing red cpu's in the morning
Tutta55's Lair

Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 465 - Posted 19 Sep 2006 16:31:43 UTC - in response to Message ID 464 .

I suspect that Charmm is using stack space to allocate memory instead of the heap. Or maybe it uses both; I'm not sure. Also, it's a piece of Fortran code that has been under development for more than 30 years now, which doesn't make it easier to analyze ;-) We will get back to the Charmm developers (a whole different community) to ask why it uses the stack. For now there's not too much we can do except ask people to increase their stacksize.

Thanks
Andre

The question remains why the program needs that much stack space. 8192 KB is a lot! How does the program come to such high usage? Is there very deep recursion in the code? Or are large chunks of memory put on the stack instead of being allocated from the heap?


____________
D@H the greatest project in the world... a while from now!
zombie67 [MM]
Volunteer tester
Avatar

Joined: Sep 18 06
Posts: 207
ID: 114
Credit: 2,817,648
RAC: 0
Message 467 - Posted 19 Sep 2006 16:34:03 UTC - in response to Message ID 459 .

We have finally found the cause of the problem that some users were experiencing on their Linux systems. It has to do with the stacksize setting on your machine which is for some distros (SuSE 9.3 and 10 for example) set to unlimited and for others (FCx, Ubuntu, etc) set to a limited value like 10240. Your setting can be seen by typing 'ulimit -s' in a terminal. To make the Charmm 'exit 1' errors go away, please set the stacksize to unlimited using the command 'ulimit -s unlimited'. This is not saying that Charmm will use all of your memory (it won't), but it gives us a little bit more space to do our simulations correctly and without errors. Please let us know if this does not work for you. If it does work, please add this command to your shell initialization file (.bashrc, .tcshrc, .kshrc, etc) in your home directory. Of course don't forget to resume the D@H project on your boincmgr in case you suspended it before.


Where is ulimit located? I am getting "command not found", and I cannot locate it anywhere.


____________
Dublin, CA
Team SETI.USA
Jim Baize
Volunteer tester
Avatar

Joined: Sep 18 06
Posts: 14
ID: 113
Credit: 18,202
RAC: 0
Message 468 - Posted 19 Sep 2006 16:46:12 UTC

My stack size was also 8192. I have just increased it to unlimited as you asked. I will keep an eye on things from here on out.

Jim Baize
Volunteer tester
Avatar

Joined: Sep 18 06
Posts: 14
ID: 113
Credit: 18,202
RAC: 0
Message 469 - Posted 19 Sep 2006 16:50:54 UTC

Just a thought. You said to put that command in the shell initialization file. But that won't work for me. I have BOINC running as a daemon. I am actually rarely logged into either one of my Linux boxes.

Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 470 - Posted 19 Sep 2006 16:52:53 UTC - in response to Message ID 467 .

What distro are you running and which shell do you use?


Where is ulimit located? I am getting "command not found", and I cannot locate it anywhere.



____________
D@H the greatest project in the world... a while from now!
zombie67 [MM]
Volunteer tester
Avatar

Joined: Sep 18 06
Posts: 207
ID: 114
Credit: 2,817,648
RAC: 0
Message 471 - Posted 19 Sep 2006 16:54:10 UTC - in response to Message ID 470 .

What distro are you running and which shell do you use?


Ubuntu 6.06, tcsh

Thanks.

____________
Dublin, CA
Team SETI.USA
Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 472 - Posted 19 Sep 2006 16:57:40 UTC - in response to Message ID 469 .

It will work even as a daemon: in the boinc start script or init script (or whatever means you use to start), put this command before you start the actual boinc process. Make sure to set the stack limit for the user that boinc runs under.
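A rough sketch of that idea for a Bourne-style start script. The wrapper function is hypothetical, and the demo command stands in for the real BOINC client, whose path depends on your installation. It relies on the stack limit being a per-process setting that child processes inherit.

```shell
#!/bin/sh
# Hypothetical wrapper: raise the soft stack limit, then run the given
# command so that it inherits the new limit.
run_with_big_stack() {
    (
        # Raising the limit can fail if the hard limit is lower; warn and
        # carry on rather than abort.
        ulimit -s unlimited 2>/dev/null ||
            echo "warning: could not raise stack limit" >&2
        exec "$@"
    )
}

# Demo with a harmless command; a real init script would pass the BOINC
# client binary here instead.
run_with_big_stack sh -c 'echo "stack limit seen by child: $(ulimit -s)"'
```

Because the limit is changed inside a subshell, the script that calls the wrapper keeps its own limit; only the started command sees the raised one.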

Andre

PS We are looking for a better solution (one where this hack on the user side is not necessary), but for now this is the workaround.

Just a thought. You said to put that command in the shell initialization file. But that won't work for me. I have BOINC running as a daemon. I am actually rarely logged into either one of my Linux boxes.


____________
D@H the greatest project in the world... a while from now!
Jim Baize
Volunteer tester
Avatar

Joined: Sep 18 06
Posts: 14
ID: 113
Credit: 18,202
RAC: 0
Message 476 - Posted 19 Sep 2006 17:15:49 UTC - in response to Message ID 472 .

Sorry for all the NOOB questions. The reason I am running these Linux machines is to get a better handle on Linux, and it is working, slowly but surely.

OK, now on to my question.

In my init script, how do I set the stack limit for a particular user?

It will work even as a daemon: in the boinc start script or init script (or whatever means you use to start), put this command before you start the actual boinc process. Make sure to set the stack limit for the user that boinc runs under.

Andre

PS We are looking for a better solution (one where this hack on the user side is not necessary), but for now this is the workaround.

Just a thought. You said to put that command in the shell initialization file. But that won't work for me. I have BOINC running as a daemon. I am actually rarely logged into either one of my Linux boxes.


zombie67 [MM]
Volunteer tester
Avatar

Joined: Sep 18 06
Posts: 207
ID: 114
Credit: 2,817,648
RAC: 0
Message 477 - Posted 19 Sep 2006 17:18:36 UTC - in response to Message ID 471 .

What distro are you running and which shell do you use?


Ubuntu 6.06, tcsh

Thanks.

Okay, you asking about the shell got me thinking that this is a bash-only command. So I changed my shell to bash, added the line to my .bashrc, and then rebooted for good measure.

I have boincmgr set to run at login via the sessions manager, so it started automatically. After 3 minutes or so, the WUs failed in the usual way.

I fired up a terminal and checked, yep, "unlimited". Everything looks right there.

So I quit boincmgr, went to the GUI filemanager, and double clicked boincmgr to start it again. After 3 minutes or so, the WUs still failed in the usual way.

So I quit boinc manager again, went back to the terminal, and launched boincmgr from the command line. This time, it appears to have worked. It's up to 9 minutes now.

Issues with this solution:

1) I don't like bash

2) This won't work when there is a power failure, as I have my machines set to automatically boot, log in, and run boincmgr. And if I have to launch it manually from the command line, it won't get fixed until whenever I notice and get back to the machine.
____________
Dublin, CA
Team SETI.USA
Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 478 - Posted 19 Sep 2006 17:45:12 UTC - in response to Message ID 477 .
Last modified: 19 Sep 2006 17:58:24 UTC

1) On tcsh the command is called 'limit' and you set stacksize to unlimited with 'limit stacksize unlimited'. For ksh it is 'ulimit'.

Edit - I've updated the front page news as well with this info.

2) I never automatically boot, log in as a certain user, and run an app, so I don't know how this works. But somehow it must be possible to set your stack to unlimited. Could you run the ./run_manager script that comes standard with boinc? You could add the 'limit' command to that script and use that to fire up boincmgr either from the commandline or by clicking on it.
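The shell-specific commands mentioned in this thread can be summarised in a small lookup, sketched here as a Bourne-style function. The function name is made up; the commands are the ones given above for bash, ksh and tcsh, and any other shell is reported as unknown rather than guessed.

```shell
#!/bin/sh
# Hypothetical helper: print the command that sets the stack size to
# unlimited for the named shell.
stack_command_for() {
    case "$1" in
        bash|ksh) echo "ulimit -s unlimited" ;;
        tcsh)     echo "limit stacksize unlimited" ;;
        *)        echo "unknown shell: $1" >&2; return 1 ;;
    esac
}

# Look up the command for the current login shell, if it is one of the above.
stack_command_for "$(basename "${SHELL:-sh}")" || true
```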

Let me know if that works.
Andre

What distro are you running and which shell do you use?


Ubuntu 6.06, tcsh

Thanks.

Okay, you asking about the shell got me thinking that this is a bash-only command. So I changed my shell to bash, added the line to my .bashrc, and then rebooted for good measure.

I have boincmgr set to run at login via the sessions manager, so it started automatically. After 3 minutes or so, the WUs failed in the usual way.

I fired up a terminal and checked, yep, "unlimited". Everything looks right there.

So I quit boincmgr, went to the GUI filemanager, and double clicked boincmgr to start it again. After 3 minutes or so, the WUs still failed in the usual way.

So I quit boinc manager again, went back to the terminal, and launched boincmgr from the command line. This time, it appears to have worked. It's up to 9 minutes now.

Issues with this solution:

1) I don't like bash

2) This won't work when there is a power failure, as I have my machines set to automatically boot, log in, and run boincmgr. And if I have to launch it manually from the command line, it won't get fixed until whenever I notice and get back to the machine.


____________
D@H the greatest project in the world... a while from now!
zombie67 [MM]
Volunteer tester
Avatar

Joined: Sep 18 06
Posts: 207
ID: 114
Credit: 2,817,648
RAC: 0
Message 479 - Posted 19 Sep 2006 17:48:51 UTC - in response to Message ID 478 .
Last modified: 19 Sep 2006 17:49:09 UTC

1) On tcsh the command is called 'limit' and you set stacksize to unlimited with 'limit stacksize unlimited'. For ksh it is 'ulimit'.


Thanks!

2) I never automatically boot, log in as a certain user, and run an app, so I don't know how this works. But somehow it must be possible to set your stack to unlimited. Could you run the ./run_manager script that comes standard with boinc? You could add the 'limit' command to that script and use that to fire up boincmgr either from the commandline or by clicking on it.


Yikes. I'm afraid that is beyond my skills. But a thought occurred to me: perhaps I can start boincmgr from one of the .cshrc/.bashrc files. Let me try that.
____________
Dublin, CA
Team SETI.USA
Profile [AF>ALPES] Jump400
Volunteer tester

Joined: Sep 13 06
Posts: 18
ID: 64
Credit: 1,395,685
RAC: 0
Message 480 - Posted 19 Sep 2006 18:25:58 UTC
Last modified: 19 Sep 2006 18:29:22 UTC

Sorry, it's beyond my Linux skills too :(
I opened a Linux "session" (terminal) as root.
Typed: ulimit -s, result is 8192
I changed ulimit to unlimited, then restarted BOINC.
I typed again to make sure: ulimit -s, result is "unlimited"
I closed (exit) the session

Same problem...

So, I opened a session
I typed : ulimit -s
Bloody hell ! the value is 8192 again

After many years, Windows has made me stupid, I'm afraid :(
My knowledge of Linux is very poor, and I'm not sure I want to learn those crazy commands. It's Chinese to me...
____________

Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 481 - Posted 19 Sep 2006 19:30:14 UTC - in response to Message ID 480 .

In the terminal where you type 'ulimit -s unlimited' also start the boincmgr process. Every terminal that you open will have the setting 8192 again unless you put that command in a file called .bashrc in your home directory. That file can be edited with any GUI editor (doesn't have to be vi ;-)

The other method is editing the file run_manager in your BOINC directory and add the line there:

ulimit -s unlimited
cd "/data/BOINC" && exec ./boincmgr $@

Then use the command run_manager to start boinc.
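The reason a new terminal shows 8192 again is that the limit belongs to each process and is only handed down to the programs that process starts. A small sketch for a Bourne-style shell, using an arbitrary value of 1024 KB purely for the demonstration:

```shell
#!/bin/sh
# Record the limit of this shell before anything is changed.
before=$(ulimit -s)

# The parentheses start a child shell; the change below applies only inside it.
# If the change is not permitted, the child simply reports its inherited value.
( ulimit -s 1024 2>/dev/null; echo "inside child shell: $(ulimit -s)" )

# Back in the parent, the limit is whatever it was before.
after=$(ulimit -s)
echo "parent before: $before, parent after: $after"
```

This is why the command has to run in the same shell or script that then launches boincmgr, as run_manager does.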

Hope that makes it a little clearer...

Andre

Sorry, it's beyond my Linux skills too :(
I opened a Linux "session" (terminal) as root.
Typed: ulimit -s, result is 8192
I changed ulimit to unlimited, then restarted BOINC.
I typed again to make sure: ulimit -s, result is "unlimited"
I closed (exit) the session

Same problem...

So, I opened a session
I typed : ulimit -s
Bloody hell ! the value is 8192 again

After many years, Windows has made me stupid, I'm afraid :(
My knowledge of Linux is very poor, and I'm not sure I want to learn those crazy commands. It's Chinese to me...


____________
D@H the greatest project in the world... a while from now!
Profile [AF>ALPES] Jump400
Volunteer tester

Joined: Sep 13 06
Posts: 18
ID: 64
Credit: 1,395,685
RAC: 0
Message 485 - Posted 19 Sep 2006 20:19:19 UTC - in response to Message ID 481 .

Many thanks for your time, Andre!
For that, I'll give Linux a chance ...
Let's try




____________

zombie67 [MM]
Volunteer tester
Avatar

Joined: Sep 18 06
Posts: 207
ID: 114
Credit: 2,817,648
RAC: 0
Message 494 - Posted 19 Sep 2006 21:42:28 UTC - in response to Message ID 479 .

2) I never automatically boot, log in as a certain user, and run an app, so I don't know how this works. But somehow it must be possible to set your stack to unlimited. Could you run the ./run_manager script that comes standard with boinc? You could add the 'limit' command to that script and use that to fire up boincmgr either from the commandline or by clicking on it.


Yikes. I'm afraid that is beyond my skills. But a thought occurred to me: perhaps I can start boincmgr from one of the .cshrc/.bashrc files. Let me try that.


I figured out how to do it after all. I added "limit stacksize unlimited" as the first line in run_manager (my shell is tcsh). Then I went into Sessions -> Startup Items.

Deleted boincmgr
added run_manager

Works like a charm!

____________
Dublin, CA
Team SETI.USA
Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 498 - Posted 19 Sep 2006 23:27:35 UTC - in response to Message ID 494 .

Great! Happy to hear that :-)
Linux is not so hard after all, but watch out you might get addicted to it ;-)

Andre

2) I never automatically boot, log in as a certain user, and run an app, so I don't know how this works. But somehow it must be possible to set your stack to unlimited. Could you run the ./run_manager script that comes standard with boinc? You could add the 'limit' command to that script and use that to fire up boincmgr either from the commandline or by clicking on it.


Yikes. I'm afraid that is beyond my skills. But a thought occurred to me: perhaps I can start boincmgr from one of the .cshrc/.bashrc files. Let me try that.


I figured out how to do it after all. I added "limit stacksize unlimited" as the first line in run_manager (my shell is tcsh). Then I went into Sessions -> Startup Items.

Deleted boincmgr
added run_manager

Works like a charm!


____________
D@H the greatest project in the world... a while from now!
Profile [AF>ALPES] Jump400
Volunteer tester

Joined: Sep 13 06
Posts: 18
ID: 64
Credit: 1,395,685
RAC: 0
Message 519 - Posted 20 Sep 2006 12:03:51 UTC - in response to Message ID 481 .

In the same terminal where you type 'ulimit -s unlimited', also start the boincmgr process. Every new terminal that you open will have the setting 8192 again unless you put that command in a file called .bashrc in your home directory. That file can be edited with any GUI editor (it doesn't have to be vi) ;-)

The other method is to edit the file run_manager in your BOINC directory and add the line there:

ulimit -s unlimited
cd "/data/BOINC" && exec ./boincmgr "$@"

Then use the command run_manager to start boinc.

Hope that makes it a little clearer...

Andre



It works for me ! Thanks again Andre

____________
Nicolas
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 66
ID: 17
Credit: 29,510
RAC: 0
Message 523 - Posted 20 Sep 2006 14:22:59 UTC

I got this message as I started BOINC:

20/09/2006 10:05:41|Docking@Home|Restarting task 1tng_mod0001_3701_212189_2 using charmm version 503
20/09/2006 10:05:49|Docking@Home|Unrecoverable error for result 1tng_mod0001_3701_212189_2 (Función incorrecta. (0x1) - exit code 1 (0x1))
20/09/2006 10:05:49|Docking@Home|Deferring scheduler requests for 1 minutes and 0 seconds
20/09/2006 10:05:49|Docking@Home|Computation for task 1tng_mod0001_3701_212189_2 finished

Last night I had a power failure, probably while that WU was running. I hadn't started the computer again after that.

And btw, Función incorrecta = Incorrect function

Profile [B^S] Paul@home
Volunteer tester

Joined: Sep 13 06
Posts: 8
ID: 34
Credit: 5,877
RAC: 0
Message 524 - Posted 20 Sep 2006 15:15:19 UTC

Doesn't appear to be working for me... see latest results from this host...

Also getting the exit code 1

Paul


____________

Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 525 - Posted 20 Sep 2006 17:08:16 UTC - in response to Message ID 524 .

Paul,
Could you please detail the exact steps you used to start the boincmgr?
Thanks
Andre

Doesn't appear to be working for me... see latest results from this host...

Also getting the exit code 1

Paul



____________
D@H the greatest project in the world... a while from now!
Profile [B^S] Paul@home
Volunteer tester

Joined: Sep 13 06
Posts: 8
ID: 34
Credit: 5,877
RAC: 0
Message 528 - Posted 20 Sep 2006 21:23:42 UTC

Hi Andre,

I am running Gentoo Linux and have BOINC starting as a daemon at boot under a user called boinc. I installed BOINC from portage (Gentoo ebuild) and am running version 5.5.6.


To implement this workaround, I su'd to boinc and edited /home/boinc/.bashrc

I added the line "ulimit -s unlimited" to the start of this script before it checks whether it is running an interactive shell (so that it takes this setting either way). I confirmed that the setting was holding by closing all terminal windows, opening a new window, su'ing to boinc and running ulimit


paul@GentooPC ~ $ su - boinc
Password:
boinc@GentooPC ~ $ ulimit
unlimited
boinc@GentooPC ~ $
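
One caveat about the check above, offered as a hedged note: in bash and other POSIX-style shells, ulimit with no option reports the file-size limit (the same as 'ulimit -f'), not the stack size. Seeing "unlimited" there does not confirm the stack setting; 'ulimit -s' is the form that does. A small sketch of the difference:

```shell
# With no option, ulimit reports the file-size limit (same as -f).
default_report=$(ulimit)
file_size=$(ulimit -f)

# The stack size has to be asked for explicitly with -s.
stack_size=$(ulimit -s)

echo "ulimit    -> $default_report"
echo "ulimit -f -> $file_size"
echo "ulimit -s -> $stack_size"
```

It is also likely that a daemon launched from an init script never reads the boinc user's .bashrc at all, since no interactive shell is started for it.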


I then stopped and restarted boinc

paul@GentooPC ~ $ su - root
Password:
GentooPC ~ # cd /etc/init.d/
GentooPC init.d # ./boinc stop
* Caching service dependencies ... [ ok ]
* Stopping BOINC ... [ ok ]
GentooPC init.d # ./boinc start
* Starting BOINC ... [ ok ]
GentooPC init.d #


The actual command to start the client as a daemon is below. The variables are all standard stuff and populated earlier in the start script.


setsid start-stop-daemon --quiet --start --chdir ${RUNTIMEDIR} \
--exec ${BOINCBIN} --chuid ${USER}:${GROUP} \
--nicelevel ${NICELEVEL} -- ${ARGS} > ${LOGFILE} 2>&1 &



To launch BOINC Manager, I log into KDE and run the command below via a desktop shortcut.

/usr/bin/boinc_gui



I think that should have implemented the workaround properly!

cheers,

Paul.


____________
Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 529 - Posted 20 Sep 2006 22:18:34 UTC - in response to Message ID 528 .

Maybe you could put the ulimit command in the startup script before the daemon is started?

Any other gentoo boxes out there who still have the problem? If not, how did you solve it?

Thanks
Andre

Hi Andre,

I am running Gentoo Linux and have BOINC starting as a daemon at boot under a user called boinc. I installed BOINC from portage (Gentoo ebuild) and am running version 5.5.6.


To implement this workaround, I su'd to boinc and edited /home/boinc/.bashrc

I added the line "ulimit -s unlimited" to the start of this script before it checks whether it is running an interactive shell (so that it takes this setting either way). I confirmed that the setting was holding by closing all terminal windows, opening a new window, su'ing to boinc and running ulimit


paul@GentooPC ~ $ su - boinc
Password:
boinc@GentooPC ~ $ ulimit
unlimited
boinc@GentooPC ~ $


I then stopped and restarted boinc

paul@GentooPC ~ $ su - root
Password:
GentooPC ~ # cd /etc/init.d/
GentooPC init.d # ./boinc stop
* Caching service dependencies ... [ ok ]
* Stopping BOINC ... [ ok ]
GentooPC init.d # ./boinc start
* Starting BOINC ... [ ok ]
GentooPC init.d #


The actual command to start the client as a daemon is below. The variables are all standard stuff and populated earlier in the start script.


setsid start-stop-daemon --quiet --start --chdir ${RUNTIMEDIR} \
--exec ${BOINCBIN} --chuid ${USER}:${GROUP} \
--nicelevel ${NICELEVEL} -- ${ARGS} > ${LOGFILE} 2>&1 &



To launch BOINC Manager, I log into KDE and run the command below via a desktop shortcut.

/usr/bin/boinc_gui



I think that should have implemented the workaround properly!

cheers,

Paul.



____________
D@H the greatest project in the world... a while from now!
Profile [B^S] Paul@home
Volunteer tester

Joined: Sep 13 06
Posts: 8
ID: 34
Credit: 5,877
RAC: 0
Message 537 - Posted 21 Sep 2006 15:27:08 UTC

OK, the start script now reads (only the interesting bit below!):


ulimit -s unlimited
setsid start-stop-daemon --quiet --start --chdir ${RUNTIMEDIR} \
--exec ${BOINCBIN} --chuid ${USER}:${GROUP} \
--nicelevel ${NICELEVEL} -- ${ARGS} > ${LOGFILE} 2>&1 &



I have restarted BOINC and will see what happens.
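
To confirm that a running process actually picked up the new limit, rather than waiting hours for a result, one option on Linux is to read the process's entry under /proc. A minimal sketch; the helper name stack_limit_of is made up for this example:

```shell
# Print the "Max stack size" row of a running process (Linux only).
# $1 is the process ID to inspect.
stack_limit_of() {
    grep '^Max stack size' "/proc/$1/limits"
}

# Demonstrate on the current shell itself.
stack_limit_of "$$"
```

For the client this would be something like stack_limit_of "$(pidof boinc_client)", assuming the binary is called boinc_client on your install and only one instance is running; the name may differ.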

Paul.
Tino Ruiz
Volunteer tester

Joined: Sep 19 06
Posts: 9
ID: 117
Credit: 423,548
RAC: 0
Message 540 - Posted 21 Sep 2006 16:35:12 UTC - in response to Message ID 467 .

[quote]...

Where is ulimit located? I am getting "command not found", and I cannot locate it anywhere.


I'm having the same problem, running Xubuntu (XFCE uses Terminal). Ulimit, limit, none of them work! :-(

Additionally, this is what happened when I've crunched my first WU:

Wed 20 Sep 2006 05:36:36 PM AST|Docking@Home|Starting task 1tng_mod0001_1530_1466_4 using charmm version 502
Wed 20 Sep 2006 05:40:44 PM AST|Docking@Home|Unrecoverable error for result 1tng_mod0001_1530_1466_4 (process exited with code 1 (0x1))
Wed 20 Sep 2006 05:40:44 PM AST|Docking@Home|Deferring scheduler requests for 1 minutes and 0 seconds
Wed 20 Sep 2006 05:40:44 PM AST||Rescheduling CPU: application exited
Wed 20 Sep 2006 05:40:44 PM AST|Docking@Home|Computation for task 1tng_mod0001_1530_1466_4 finished
____________
Brucifer
Volunteer tester

Joined: Sep 18 06
Posts: 10
ID: 111
Credit: 2,367
RAC: 0
Message 547 - Posted 21 Sep 2006 20:42:51 UTC


Running Linux. Getting a "work not sent - but was committed to other platforms" message ???

Profile Richard Zamudio
Volunteer tester

Joined: Sep 13 06
Posts: 9
ID: 3
Credit: 296
RAC: 0
Message 552 - Posted 21 Sep 2006 21:36:32 UTC - in response to Message ID 547 .


Running Linux. Getting a "work not sent - but was committed to other platforms" message ???


I generated more work units. That should solve the problem. Please let us know if it doesn't. Thanks.
Profile Richard Zamudio
Volunteer tester

Joined: Sep 13 06
Posts: 9
ID: 3
Credit: 296
RAC: 0
Message 553 - Posted 21 Sep 2006 21:54:28 UTC - in response to Message ID 540 .

[quote]...

Where is ulimit located? I am getting "command not found", and I cannot locate it anywhere.


I'm having the same problem, running Xubuntu (XFCE uses Terminal). Ulimit, limit, none of them work! :-(

Additionally, this is what happened when I've crunched my first WU:

Wed 20 Sep 2006 05:36:36 PM AST|Docking@Home|Starting task 1tng_mod0001_1530_1466_4 using charmm version 502
Wed 20 Sep 2006 05:40:44 PM AST|Docking@Home|Unrecoverable error for result 1tng_mod0001_1530_1466_4 (process exited with code 1 (0x1))
Wed 20 Sep 2006 05:40:44 PM AST|Docking@Home|Deferring scheduler requests for 1 minutes and 0 seconds
Wed 20 Sep 2006 05:40:44 PM AST||Rescheduling CPU: application exited
Wed 20 Sep 2006 05:40:44 PM AST|Docking@Home|Computation for task 1tng_mod0001_1530_1466_4 finished


ulimit is a shell builtin in bash (and other shells), not a standalone program, which is why you cannot locate it as a file. I am not familiar with Xubuntu/XFCE, but its shell should have a similar command. You can try looking in the shell's man page for references to the stack size.
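
A quick way to see why searching the disk for ulimit turns up nothing is to ask the shell what kind of command it is. A small sketch, assuming bash or another POSIX-style shell (csh/tcsh use 'limit' instead):

```shell
# Ask the shell how it would resolve the name "ulimit".
kind=$(type ulimit)
echo "$kind"

# Show the current stack size limit in kilobytes (or "unlimited").
ulimit -s
```

If the first command reports a builtin, the command has to be typed inside that shell; a "command not found" error would suggest the terminal is running a different shell.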

I'm not sure if the error you are getting is related, but I will look into this too. Thanks
Profile [B^S] Paul@home
Volunteer tester

Joined: Sep 13 06
Posts: 8
ID: 34
Credit: 5,877
RAC: 0
Message 554 - Posted 21 Sep 2006 21:58:59 UTC - in response to Message ID 529 .

Maybe you could put the ulimit command in the startup script before the daemon is started?


I don't want to jinx it, but since making this change I am 4 hours and 80% into a result - the longest one yet by some margin. Here's hoping!

Paul.
Profile [B^S] Paul@home
Volunteer tester

Joined: Sep 13 06
Posts: 8
ID: 34
Credit: 5,877
RAC: 0
Message 557 - Posted 22 Sep 2006 6:47:26 UTC

Lost power overnight (damn storms!) and my UPS shut down my computers... hopefully will work to report soon enough.

Paul.

Nicolas
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 66
ID: 17
Credit: 29,510
RAC: 0
Message 559 - Posted 22 Sep 2006 13:38:24 UTC - in response to Message ID 557 .

Lost power overnight (damn storms!) and my UPS shut down my computers... hopefully will work to report soon enough.

Paul.

[off-topic]At least you have a UPS! Yesterday, AND the day before, I had 30-second-long power outages at night. Both times VMware was running, so that's two operating systems being shut down uncleanly. And only one of the times could I be bothered to turn the computer back on and start everything up again, so I lost CPU time (BOINC) and download time.[/off-topic]
Profile [B^S] Paul@home
Volunteer tester

Joined: Sep 13 06
Posts: 8
ID: 34
Credit: 5,877
RAC: 0
Message 561 - Posted 22 Sep 2006 15:40:08 UTC

[off topic]
dang that sucks!
[/off topic]


Back on topic... Looks like adding the ulimit command to the startup script did the job. [url=http://docking.utep.edu/result.php?resultid=20675]This result[/url] looks to have completed successfully!

Thanks for the help!


Paul.

Tino Ruiz
Volunteer tester

Joined: Sep 19 06
Posts: 9
ID: 117
Credit: 423,548
RAC: 0
Message 568 - Posted 23 Sep 2006 3:09:48 UTC - in response to Message ID 553 .

[quote]...

Where is ulimit located? I am getting "command not found", and I cannot locate it anywhere.


I'm having the same problem, running Xubuntu (XFCE uses Terminal). Ulimit, limit, none of them work! :-(

Additionally, this is what happened when I've crunched my first WU:

Wed 20 Sep 2006 05:36:36 PM AST|Docking@Home|Starting task 1tng_mod0001_1530_1466_4 using charmm version 502
Wed 20 Sep 2006 05:40:44 PM AST|Docking@Home|Unrecoverable error for result 1tng_mod0001_1530_1466_4 (process exited with code 1 (0x1))
Wed 20 Sep 2006 05:40:44 PM AST|Docking@Home|Deferring scheduler requests for 1 minutes and 0 seconds
Wed 20 Sep 2006 05:40:44 PM AST||Rescheduling CPU: application exited
Wed 20 Sep 2006 05:40:44 PM AST|Docking@Home|Computation for task 1tng_mod0001_1530_1466_4 finished


ulimit is a shell builtin in bash (and other shells), not a standalone program, which is why you cannot locate it as a file. I am not familiar with Xubuntu/XFCE, but its shell should have a similar command. You can try looking in the shell's man page for references to the stack size.

I'm not sure if the error you are getting is related, but I will look into this too. Thanks

Sorry to bother again, but I can't for the life of me find anything even remotely related to that command. I've looked *everywhere*... :-(

Oh, and the same thing happened with a second WU now:
Fri 22 Sep 2006 12:21:01 PM AST|Docking@Home|Starting task 1tng_mod0001_1127_24451_6 using charmm version 502
Fri 22 Sep 2006 12:25:26 PM AST|Docking@Home|Unrecoverable error for result 1tng_mod0001_1127_24451_6 (process exited with code 1 (0x1))
Fri 22 Sep 2006 12:25:26 PM AST|Docking@Home|Deferring scheduler requests for 1 minutes and 0 seconds
Fri 22 Sep 2006 12:25:26 PM AST||Rescheduling CPU: application exited
Fri 22 Sep 2006 12:25:26 PM AST|Docking@Home|Computation for task 1tng_mod0001_1127_24451_6 finished
Dagorath

Joined: Sep 18 06
Posts: 38
ID: 116
Credit: 4,866
RAC: 0
Message 572 - Posted 23 Sep 2006 5:13:48 UTC - in response to Message ID 568 .


Monster Truck,

It sounds like you need advice from someone who has more intimate knowledge of the distro you run. You might wait a long time before someone with that knowledge shows up in this small forum and stumbles upon your post. Have you tried explaining your problem in a forum dedicated to the distro you're running? I think you would get fairly quick results that way.

Profile Saenger
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 125
ID: 79
Credit: 411,959
RAC: 0
Message 576 - Posted 23 Sep 2006 10:59:17 UTC
Last modified: 23 Sep 2006 11:01:45 UTC

I got these messages in my BOINC when I looked there:

Sam 23 Sep 2006 11:07:43 CEST|Docking@Home|Resuming task 1tng_mod0001_1254_335479_3 using charmm version 502
Sam 23 Sep 2006 11:47:17 CEST|Docking@Home|Computation for task 1tng_mod0001_1254_335479_3 finished
Sam 23 Sep 2006 11:47:20 CEST|Docking@Home|Started upload of file 1tng_mod0001_1254_335479_3_0
Sam 23 Sep 2006 11:47:20 CEST|Docking@Home|Started upload of file 1tng_mod0001_1254_335479_3_1
Sam 23 Sep 2006 11:47:22 CEST|Docking@Home|Error on file upload: invalid signature
Sam 23 Sep 2006 11:47:22 CEST|Docking@Home|Error on file upload: invalid signature

Sam 23 Sep 2006 11:47:22 CEST|Docking@Home|Permanently failed upload of 1tng_mod0001_1254_335479_3_0
Sam 23 Sep 2006 11:47:22 CEST|Docking@Home|Giving up on upload of 1tng_mod0001_1254_335479_3_0: server rejected file
Sam 23 Sep 2006 11:47:22 CEST|Docking@Home|Permanently failed upload of 1tng_mod0001_1254_335479_3_1
Sam 23 Sep 2006 11:47:22 CEST|Docking@Home|Giving up on upload of 1tng_mod0001_1254_335479_3_1: server rejected file
Sam 23 Sep 2006 11:47:22 CEST|Docking@Home|Started upload of file 1tng_mod0001_1254_335479_3_2
Sam 23 Sep 2006 11:47:22 CEST|Docking@Home|Started upload of file 1tng_mod0001_1254_335479_3_3
Sam 23 Sep 2006 11:47:24 CEST|Docking@Home|Error on file upload: invalid signature
Sam 23 Sep 2006 11:47:24 CEST|Docking@Home|Permanently failed upload of 1tng_mod0001_1254_335479_3_3
Sam 23 Sep 2006 11:47:24 CEST|Docking@Home|Giving up on upload of 1tng_mod0001_1254_335479_3_3: server rejected file
Sam 23 Sep 2006 11:47:28 CEST|Docking@Home|Error on file upload: invalid signature

Sam 23 Sep 2006 11:47:28 CEST|Docking@Home|Permanently failed upload of 1tng_mod0001_1254_335479_3_2
Sam 23 Sep 2006 11:47:28 CEST|Docking@Home|Giving up on upload of 1tng_mod0001_1254_335479_3_2: server rejected file

I don't know what happened, as I see the corresponding result in my account as "Checked, but no consensus yet", but at least successfully uploaded.

What went wrong where? And did anything go wrong at all besides the worrisome messages popping up?
Tino Ruiz
Volunteer tester

Joined: Sep 19 06
Posts: 9
ID: 117
Credit: 423,548
RAC: 0
Message 588 - Posted 24 Sep 2006 2:37:43 UTC - in response to Message ID 572 .


Monster Truck,

It sounds like you need advice from someone who has more intimate knowledge of the distro you run. You might wait a long time before someone with that knowledge shows up in this small forum and stumbles upon your post. Have you tried explaining your problem in a forum dedicated to the distro you're running? I think you would get fairly quick results that way.


Never mind, I've found the command with the help of someone, and my stack size is already set to unlimited. So, the errors I get must be coming from something else...
Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 589 - Posted 24 Sep 2006 3:57:25 UTC - in response to Message ID 576 .
Last modified: 24 Sep 2006 3:58:17 UTC

Saenger,
I wouldn't worry too much about this one. The validator is confused about this wu, because there is one valid result which actually ended with an error. This one was crunched with the 5.1 app, which had an error and was replaced with 5.2 for that reason. So basically the validator gets 4 valid results, of which 1 is different from the other 3, and this makes it set the validate_state of this wu to 4 (no consensus yet). The one that is still pending will determine the final result, I suspect (and hope).

Hope that helps explain it...
Thanks, Andre

PS I'm on vacation for 4 days starting tomorrow. Going to check out the Grand Canyon :-)

I got these messages in my BOINC when I looked there:
Sam 23 Sep 2006 11:07:43 CEST|Docking@Home|Resuming task 1tng_mod0001_1254_335479_3 using charmm version 502
Sam 23 Sep 2006 11:47:17 CEST|Docking@Home|Computation for task 1tng_mod0001_1254_335479_3 finished
Sam 23 Sep 2006 11:47:20 CEST|Docking@Home|Started upload of file 1tng_mod0001_1254_335479_3_0
Sam 23 Sep 2006 11:47:20 CEST|Docking@Home|Started upload of file 1tng_mod0001_1254_335479_3_1
Sam 23 Sep 2006 11:47:22 CEST|Docking@Home|Error on file upload: invalid signature
Sam 23 Sep 2006 11:47:22 CEST|Docking@Home|Error on file upload: invalid signature

Sam 23 Sep 2006 11:47:22 CEST|Docking@Home|Permanently failed upload of 1tng_mod0001_1254_335479_3_0
Sam 23 Sep 2006 11:47:22 CEST|Docking@Home|Giving up on upload of 1tng_mod0001_1254_335479_3_0: server rejected file
Sam 23 Sep 2006 11:47:22 CEST|Docking@Home|Permanently failed upload of 1tng_mod0001_1254_335479_3_1
Sam 23 Sep 2006 11:47:22 CEST|Docking@Home|Giving up on upload of 1tng_mod0001_1254_335479_3_1: server rejected file
Sam 23 Sep 2006 11:47:22 CEST|Docking@Home|Started upload of file 1tng_mod0001_1254_335479_3_2
Sam 23 Sep 2006 11:47:22 CEST|Docking@Home|Started upload of file 1tng_mod0001_1254_335479_3_3
Sam 23 Sep 2006 11:47:24 CEST|Docking@Home|Error on file upload: invalid signature
Sam 23 Sep 2006 11:47:24 CEST|Docking@Home|Permanently failed upload of 1tng_mod0001_1254_335479_3_3
Sam 23 Sep 2006 11:47:24 CEST|Docking@Home|Giving up on upload of 1tng_mod0001_1254_335479_3_3: server rejected file
Sam 23 Sep 2006 11:47:28 CEST|Docking@Home|Error on file upload: invalid signature

Sam 23 Sep 2006 11:47:28 CEST|Docking@Home|Permanently failed upload of 1tng_mod0001_1254_335479_3_2
Sam 23 Sep 2006 11:47:28 CEST|Docking@Home|Giving up on upload of 1tng_mod0001_1254_335479_3_2: server rejected file

I don't know what happened, as I see the corresponding result in my account as "Checked, but no consensus yet", but at least successfully uploaded.

What went wrong where? And did anything go wrong at all besides the worrisome messages popping up?


____________
D@H the greatest project in the world... a while from now!
Profile Saenger
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 125
ID: 79
Credit: 411,959
RAC: 0
Message 597 - Posted 24 Sep 2006 11:09:01 UTC

I don't worry about the credits, I worry about the messages.
My BOINC says it's not uploaded, it went wrong, sorry, but it failed.
My account here says everything's fine, no probs at all.

Both can't be right, my question is why they don't agree.

Profile [B^S] Doug Worrall
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 127
ID: 74
Credit: 11,046
RAC: 0
Message 599 - Posted 24 Sep 2006 11:52:15 UTC

If I go to the PCLINUX forums and talk about BOINC, I get no reply for my distro.
Maybe next week's distro du jour will take the ulimit -s unlimited.
Until that time, I have tried 14 times with no luck; I will try each day
to get some results that help "Docking".
Thanks Andre

Doug

Tino Ruiz
Volunteer tester

Joined: Sep 19 06
Posts: 9
ID: 117
Credit: 423,548
RAC: 0
Message 637 - Posted 27 Sep 2006 14:18:24 UTC - in response to Message ID 588 .


Monster Truck,

It sounds like you need advice from someone who has more intimate knowledge of the distro you run. You might wait a long time before someone with that knowledge shows up in this small forum and stumbles upon your post. Have you tried explaining your problem in a forum dedicated to the distro you're running? I think you would get fairly quick results that way.


Never mind, I've found the command with the help of someone, and my stack size is already set to unlimited. So, the errors I get must be coming from something else...

:-(

Wed 27 Sep 2006 10:02:25 AM AST|Docking@Home|Starting task 1tng_mod0001_4039_71682_2 using charmm version 502
Wed 27 Sep 2006 10:07:23 AM AST|Docking@Home|Unrecoverable error for result 1tng_mod0001_4039_71682_2 (process exited with code 1 (0x1))
Wed 27 Sep 2006 10:07:23 AM AST|Docking@Home|Deferring scheduler requests for 1 minutes and 0 seconds
Wed 27 Sep 2006 10:07:23 AM AST||Rescheduling CPU: application exited
Wed 27 Sep 2006 10:07:23 AM AST|Docking@Home|Computation for task 1tng_mod0001_4039_71682_2 finished


Anyone? My stacksize is already set to unlimited.
Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 669 - Posted 29 Sep 2006 3:03:57 UTC - in response to Message ID 637 .

Can you show your ulimit output using the 'ulimit -a' command on the same terminal you started your boinc client on? (start a terminal, cd into your BOINC directory, enter 'ulimit -s unlimited', enter 'run_manager.sh &', enter 'ulimit -a')

Thanks
Andre


:-(

Wed 27 Sep 2006 10:02:25 AM AST|Docking@Home|Starting task 1tng_mod0001_4039_71682_2 using charmm version 502
Wed 27 Sep 2006 10:07:23 AM AST|Docking@Home|Unrecoverable error for result 1tng_mod0001_4039_71682_2 (process exited with code 1 (0x1))
Wed 27 Sep 2006 10:07:23 AM AST|Docking@Home|Deferring scheduler requests for 1 minutes and 0 seconds
Wed 27 Sep 2006 10:07:23 AM AST||Rescheduling CPU: application exited
Wed 27 Sep 2006 10:07:23 AM AST|Docking@Home|Computation for task 1tng_mod0001_4039_71682_2 finished


Anyone? My stacksize is already set to unlimited.


____________
D@H the greatest project in the world... a while from now!
Tino Ruiz
Volunteer tester

Joined: Sep 19 06
Posts: 9
ID: 117
Credit: 423,548
RAC: 0
Message 677 - Posted 29 Sep 2006 13:34:56 UTC

The problem is, I can't find my BOINC directory. That may sound stupid, but all I can see is the executables in /usr/bin (boinc_client, boinc_cmd, boincmgr); I have no idea where all the data is stored. There's nothing in ~/ either except a small text file with basic settings. Anyhow, executing those commands in /usr/bin gave me the following:

core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
max nice (-e) 20
file size (blocks, -f) unlimited
pending signals (-i) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) unlimited
max rt priority (-r) unlimited
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
[1]+ Exit 127 run_manager.sh

I don't know what to make of it. :-/
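
For what it's worth, the last line of that output is a hint: exit status 127 is what a shell reports when it cannot find the command at all, which suggests run_manager.sh is not present in that directory or on the PATH. A minimal sketch using a deliberately nonexistent command name (no_such_command_for_demo is made up for this example):

```shell
# Run a command that does not exist and capture its exit status.
status=0
sh -c 'no_such_command_for_demo' 2>/dev/null || status=$?

# 127 is the shell's "command not found" status.
echo "exit status: $status"
```

The rest of the output looks healthy; the 'stack size' row is the one that matters for this problem.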
____________

Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 678 - Posted 29 Sep 2006 13:45:42 UTC - in response to Message ID 677 .
Last modified: 29 Sep 2006 13:46:25 UTC

Ah, seems you haven't downloaded a boinc client from boinc.berkeley.edu but installed one from an rpm or deb package. That may mean that you don't have a run_manager.sh script... Can you do the same, but instead of run_manager.sh run boincmgr? Let me know what that does.

Andre

The problem is, I can't find my BOINC directory. That may sound stupid, but all I can see is the executables in /usr/bin (boinc_client, boinc_cmd, boincmgr); I have no idea where all the data is stored. There's nothing in ~/ either except a small text file with basic settings. Anyhow, executing those commands in /usr/bin gave me the following:

core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
max nice (-e) 20
file size (blocks, -f) unlimited
pending signals (-i) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) unlimited
max rt priority (-r) unlimited
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
[1]+ Exit 127 run_manager.sh

I don't know what to make of it. :-/


____________
D@H the greatest project in the world... a while from now!
Tino Ruiz
Volunteer tester

Joined: Sep 19 06
Posts: 9
ID: 117
Credit: 423,548
RAC: 0
Message 696 - Posted 29 Sep 2006 20:52:25 UTC

Indeed, this is the boinc client that comes packaged with Xubuntu, which I've downloaded through Synaptic. But when I try that command, it gives me a "command not found" error. I've tried:

run_boincmgr
run_boincmgr &
run boincmgr
run boincmgr &

None of them work. As you can see I'm still a n00b with GNU/Linux, so go easy on me. ;-)

Jim Baize
Volunteer tester
Avatar

Joined: Sep 18 06
Posts: 14
ID: 113
Credit: 18,202
RAC: 0
Message 741 - Posted 2 Oct 2006 3:40:09 UTC - in response to Message ID 696 .

Indeed, this is the boinc client that comes packaged with Xubuntu, which I've downloaded through Synaptic. But when I try that command, it gives me a "command not found" error. I've tried:

run_boincmgr
run_boincmgr &
run boincmgr
run boincmgr &

None of them work. As you can see I'm still a n00b with GNU/Linux, so go easy on me. ;-)



Ubuntu is based on Debian. I am running Debian and I am using the version of BOINC released through them. My solution to the ulimit problem was to go to /etc/init.d and find the startup script for BOINC. I edited the startup script and put the ulimit command close to the beginning of the script, before any other commands were executed.

Jim
Rene
Volunteer tester
Avatar

Joined: Oct 2 06
Posts: 121
ID: 160
Credit: 109,415
RAC: 0
Message 757 - Posted 2 Oct 2006 20:01:16 UTC
Last modified: 2 Oct 2006 20:04:53 UTC

Hi there..

I've just added a Linux host and first done some wu's without the 'ulimit -s unlimited' command.

All ended after reaching 3.475% and exited with status 1 (x-1)

After 'ulimit -s' I found out that stacks were at 8192.. so then I've used 'ulimit -s unlimited' but it did not seem to help.

See host for details:
358

Greetings
Rene

edit: Linux distro Ubuntu 6.06 LTS

Memo
Forum moderator
Project developer
Project tester

Joined: Sep 13 06
Posts: 88
ID: 14
Credit: 1,666,392
RAC: 0
Message 761 - Posted 2 Oct 2006 21:49:43 UTC

Couple of questions:

Did you stop boinc after you changed the setting?

Where did you add the command ulimit -s unlimited: to a file, or did you just type it in your shell?

Rene
Volunteer tester
Avatar

Joined: Oct 2 06
Posts: 121
ID: 160
Credit: 109,415
RAC: 0
Message 766 - Posted 3 Oct 2006 5:07:29 UTC - in response to Message ID 761 .

Couple of questions:

Did you stop boinc after you changed the setting?

Where did you add the command ulimit -s unlimited: to a file, or did you just type it in your shell?


Good morning Memo,

After the first 12 were crunched (and reported) I opened up a Terminal window and entered 'ulimit -s' at the prompt. The reply was 8192.
At that moment BOINC was still running, doing Seti and Rosetta.
I then entered 'ulimit -s unlimited' and left the Terminal window open.
The newly downloaded WU seemed to go to 4.something% this time but then stopped anyway (so did the remaining 11).
Then I reran the 'ulimit -s' command to check the stack size, but the reply was still 8192 and did not seem to have changed.

Did I do something wrong?

Greetings
Rene

[B^S] Morgan the Gold
Volunteer tester
Avatar

Joined: Oct 2 06
Posts: 41
ID: 170
Credit: 138,735
RAC: 0
Message 767 - Posted 3 Oct 2006 5:40:27 UTC

No joy here, only hard resets for computation errors to endure. Tried the ulimit setting after a couple, still hanging. Ah well, this old Athlon with its new Red Hat has failed every WU of nearly every kind for a few weeks, after running all kinds successfully (even sap) for months. I've run memtest and all that other sort of stuff; perhaps it's time to re-emerge, or recycle that PC, lol.
Thanks for letting me play eh.

____________

Tino Ruiz
Volunteer tester

Joined: Sep 19 06
Posts: 9
ID: 117
Credit: 423,548
RAC: 0
Message 769 - Posted 3 Oct 2006 11:26:18 UTC - in response to Message ID 741 .

Indeed, this is the boinc client that comes packaged with Xubuntu, which I've downloaded through Synaptic. But when I try that command, it gives me a "command not found" error. I've tried:

run_boincmgr
run_boincmgr &
run boincmgr
run boincmgr &

None of them work. As you can see I'm still a n00b with GNU/Linux, so go easy on me. ;-)



Ubuntu is based on Debian. I am running Debian and I am using the version of BOINC released through them. My solution to the ulimit problem was to go to /etc/init.d and find the startup script for BOINC. I edited the startup script and put the ulimit command close to the beginning of the script, before any other commands were executed.

Jim

Thank you, Jim. I've done that to /etc/init.d/boinc-client. Is there anything else I should do, or will the problem be fixed now?
Memo
Forum moderator
Project developer
Project tester

Joined: Sep 13 06
Posts: 88
ID: 14
Credit: 1,666,392
RAC: 0
Message 786 - Posted 3 Oct 2006 18:28:12 UTC - in response to Message ID 766 .

Couple of questions:

Did you stop BOINC after you changed the setting?

Where did you add the command ulimit -s unlimited: to a file, or did you just type it in your shell?


Good morning Memo,

After the first 12 were crunched (and reported) I opened a terminal window and entered 'ulimit -s' at the prompt. The reply was 8192.
At that moment BOINC was still running, doing Seti and Rosetta.
I then entered 'ulimit -s unlimited' and left the terminal window open.
The newly downloaded WU seemed to go to 4.something% this time but then stopped anyway (so did the remaining 11).
Then I reran the 'ulimit -s' command to check the stack limit, but the reply was still 8192 and did not seem to have changed.

Did I do something wrong?

Greetings
Rene



Rene

The thing is that the command must be run by a script before BOINC starts.
If you run in text mode, adding the command to .bashrc will do. If it's running in graphical mode, I believe (I run BOINC in text mode) it has to be in the run_client script in the BOINC directory.

Don't forget to restart BOINC so this setting is picked up by the client.

Let me know if you have more problems.
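
To illustrate why the command has to run in the script that starts BOINC: a resource limit belongs to the shell that sets it and is inherited only by programs started from that shell afterwards. The following is a minimal sketch, not taken from the project. It lowers the soft stack limit to 4096 KB (an arbitrary value chosen so the demo needs no special privileges) inside a subshell and compares what a child sees with what the parent shell still has.

```shell
# Stack limit of the current shell (a number in KB, or "unlimited").
parent_before=$(ulimit -s)

# The command substitution runs in a subshell. The limit is changed there,
# and the child "sh" started from that subshell inherits the new value.
child=$(ulimit -S -s 4096; sh -c 'ulimit -s')

# The parent shell never sees the change made in the subshell.
parent_after=$(ulimit -s)

echo "parent before: $parent_before"
echo "child:         $child"
echo "parent after:  $parent_after"
```

In the same way, typing 'ulimit -s unlimited' in a terminal changes nothing for a BOINC client that is already running or that gets started from somewhere else.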
Rene
Volunteer tester
Avatar

Joined: Oct 2 06
Posts: 121
ID: 160
Credit: 109,415
RAC: 0
Message 788 - Posted 3 Oct 2006 19:40:48 UTC - in response to Message ID 786 .

Couple of questions:

Did you stop BOINC after you changed the setting?

Where did you add the command ulimit -s unlimited: to a file, or did you just type it in your shell?


Good morning Memo,

After the first 12 were crunched (and reported) I opened a terminal window and entered 'ulimit -s' at the prompt. The reply was 8192.
At that moment BOINC was still running, doing Seti and Rosetta.
I then entered 'ulimit -s unlimited' and left the terminal window open.
The newly downloaded WU seemed to go to 4.something% this time but then stopped anyway (so did the remaining 11).
Then I reran the 'ulimit -s' command to check the stack limit, but the reply was still 8192 and did not seem to have changed.

Did I do something wrong?

Greetings
Rene



Rene

The thing is that the command must be run by a script before BOINC starts.
If you run in text mode, adding the command to .bashrc will do. If it's running in graphical mode, I believe (I run BOINC in text mode) it has to be in the run_client script in the BOINC directory.

Don't forget to restart BOINC so this setting is picked up by the client.

Let me know if you have more problems.


That's where I put it the first time (run_client), thinking the run_manager script would trigger run_client.

Now it seems that I've fixed it.
I've edited the run_manager script (the one that I use to start up the manager) and added the "ulimit -s unlimited" at the beginning.

The WU has now been running for over an hour and has reached approx. 70%.

Thanks, I will report back once the first ones have been crunched.

;-)

Jim Baize
Volunteer tester
Avatar

Joined: Sep 18 06
Posts: 14
ID: 113
Credit: 18,202
RAC: 0
Message 795 - Posted 3 Oct 2006 22:43:39 UTC - in response to Message ID 769 .

Indeed, this is the boinc client that comes packaged with Xubuntu, which I've downloaded through Synaptic. But when I try that command, it gives me a "command not found" error. I've tried:

run_boincmgr
run_boincmgr &
run boincmgr
run boincmgr &

None of them work. As you can see I'm still a n00b with GNU/Linux, so go easy on me. ;-)



Ubuntu is based on Debian. I am running Debian and I am using the version of BOINC released through them. My solution to the ulimit problem was to go to /etc/init.d and find the startup script for BOINC. I edited the startup script and put the ulimit command close to the beginning of the script, before any other commands were executed.

Jim

Thank you, Jim. I've done that to /etc/init.d/boinc-client. Is there anything else I should do, or will the problem be fixed now?



Just stop / start or restart your boinc client. Other than that, it should work.
Rene
Volunteer tester
Avatar

Joined: Oct 2 06
Posts: 121
ID: 160
Credit: 109,415
RAC: 0
Message 801 - Posted 4 Oct 2006 5:11:04 UTC

I've got some finished WUs now (pending) and all seems well.
So here's what did the trick for me:

Close (if needed) the BOINC manager.

Open Gedit ---> Open run_manager in the BOINC directory --->
Add "ulimit -s unlimited" at the beginning of the file --->
Save the file and use this to start up the BOINC manager.

You can also save the file under another name (run_manager_docking or something) and use that script for as long as it is needed here at Docking.
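
The same edit can be sketched in the shell instead of Gedit. This is only an illustration under assumptions: the helper name add_ulimit is made up, the first line of the start script is assumed to be its "#!" line, and the demo runs on a throwaway stand-in file because the real contents of run_manager are not shown in this thread.

```shell
# Hypothetical helper: write a copy of a start script with the ulimit
# command inserted directly after the first line (assumed to be "#!...").
add_ulimit() {
    src=$1
    dst=$2
    {
        head -n 1 "$src"
        echo 'ulimit -s unlimited'
        tail -n +2 "$src"
    } > "$dst"
    chmod +x "$dst"
}

# Demo on a stand-in for run_manager in a temporary directory.
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho "manager would start here"\n' > "$demo_dir/run_manager"
add_ulimit "$demo_dir/run_manager" "$demo_dir/run_manager_docking"
cat "$demo_dir/run_manager_docking"
```

Writing to a second file leaves the original script untouched, so it is easy to go back once the workaround is no longer needed.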

Greetings
Rene

Memo
Forum moderator
Project developer
Project tester

Joined: Sep 13 06
Posts: 88
ID: 14
Credit: 1,666,392
RAC: 0
Message 802 - Posted 4 Oct 2006 5:45:56 UTC

Just as a side note, this setting will not affect any other project. It just gives charmm a little more space to work, that's all.

Profile Conan
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 219
ID: 100
Credit: 4,256,493
RAC: 0
Message 803 - Posted 4 Oct 2006 8:07:17 UTC

Have added a new computer using Linux and added the "ulimit -s unlimited" command to run_manager.
My first lot of work units (40 of them, 10 per CPU) have all errored out with either 'error code 2' or an error message saying it can't get input files.
Computer is this one:- http://docking.utep.edu/show_host_detail.php?hostid=410
Am I missing something?
____________

Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 806 - Posted 4 Oct 2006 14:59:59 UTC - in response to Message ID 803 .

Error 2 means that the app cannot open its own logfile called charmm.out. Could you check permissions on the slots and projects directories, etc?

Andre
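
A quick way to do that permission check, as a minimal sketch: the helper name check_writable is made up, and the demo uses temporary stand-in directories. On a real host you would pass your own slots and projects directories and run it as the user the BOINC client runs under.

```shell
# Hypothetical helper: report for each path whether it is an existing
# directory that the current user can write to.
check_writable() {
    for d in "$@"; do
        if [ -d "$d" ] && [ -w "$d" ]; then
            echo "ok: $d"
        else
            echo "problem: $d"
        fi
    done
}

# Demo with stand-in directories; "missing" is deliberately not created.
demo=$(mktemp -d)
mkdir "$demo/slots" "$demo/projects"
report=$(check_writable "$demo/slots" "$demo/projects" "$demo/missing")
echo "$report"
```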

Have added a new computer using Linux and added the "ulimit -s unlimited" command to run_manager.
My first lot of work units (40 of them, 10 per CPU) have all errored out with either 'error code 2' or an error message saying it can't get input files.
Computer is this one:- http://docking.utep.edu/show_host_detail.php?hostid=410
Am I missing something?


____________
D@H the greatest project in the world... a while from now!
Profile Conan
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 219
ID: 100
Credit: 4,256,493
RAC: 0
Message 807 - Posted 4 Oct 2006 16:50:10 UTC

Hello Andre,
No restrictions that I can see. Set up the same as my other Linux machine.
Both are AMD Opteron computers, the working one is an 848 (2 cpus) and the one having trouble is an 275 (2 dual cpus). The 275 machine has no trouble running CP, Einstein, Rosetta, Ralph, QMC and Predictor.
The 848 computer also runs QMC, Rosetta, Einstein, Ralph, LHC.
Where is the 'charmm.out' file kept? I am unable to find it on either machine when I look in the project folder or the Boinc folder, I have but 5 files in the project folder (both machines).

Rene
Volunteer tester
Avatar

Joined: Oct 2 06
Posts: 121
ID: 160
Credit: 109,415
RAC: 0
Message 808 - Posted 4 Oct 2006 18:18:27 UTC - in response to Message ID 807 .

Hello Andre,
No restrictions that I can see. Set up the same as my other Linux machine.
Both are AMD Opteron computers, the working one is an 848 (2 cpus) and the one having trouble is an 275 (2 dual cpus). The 275 machine has no trouble running CP, Einstein, Rosetta, Ralph, QMC and Predictor.
The 848 computer also runs QMC, Rosetta, Einstein, Ralph, LHC.
Where is the 'charmm.out' file kept? I am unable to find it on either machine when I look in the project folder or the Boinc folder, I have but 5 files in the project folder (both machines).


You can find them in one of the folders in ../BOINC/SLOTS
The folders are called "0", "1", "2", etc., depending on how many projects are running.
Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 813 - Posted 4 Oct 2006 21:50:50 UTC - in response to Message ID 808 .

Hello Andre,
No restrictions that I can see. Set up the same as my other Linux machine.
Both are AMD Opteron computers, the working one is an 848 (2 cpus) and the one having trouble is an 275 (2 dual cpus). The 275 machine has no trouble running CP, Einstein, Rosetta, Ralph, QMC and Predictor.
The 848 computer also runs QMC, Rosetta, Einstein, Ralph, LHC.
Where is the 'charmm.out' file kept? I am unable to find it on either machine when I look in the project folder or the Boinc folder, I have but 5 files in the project folder (both machines).


You can find them in one of the folders in ../BOINC/SLOTS
The folders are called "0", "1", "2", etc., depending on how many projects are running.


The files we send back to the server are actually 'symlinks' in the slots directory. These files point to files called 1tng_xxxx_xxxxxx_x_x in the projects directory that actually contain the real content. The file name resolution is done by the BOINC client.

Andre
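
As a rough illustration of following such a link by hand: a minimal sketch, assuming the slot file holds a single <soft_link>...</soft_link> element. The helper name soft_link_target and the demo file are made up, and the file name used is the placeholder pattern from above, not a real result.

```shell
# Hypothetical helper: print the path stored in a slot link file.
# Line breaks are removed first in case the link text is wrapped.
soft_link_target() {
    tr -d '\n' < "$1" | sed -n 's:.*<soft_link>\(.*\)</soft_link>.*:\1:p'
}

# Demo on a stand-in charmm.out link file in a temporary directory.
demo=$(mktemp -d)
printf '<soft_link>../../projects/docking.utep.edu/1tng_xxxx_xxxxxx_x_3</soft_link>\n' > "$demo/charmm.out"
target=$(soft_link_target "$demo/charmm.out")
echo "$target"
```

The printed path is relative to the slot directory, so on a real host it has to be resolved from there.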

____________
D@H the greatest project in the world... a while from now!
Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 814 - Posted 4 Oct 2006 21:53:15 UTC - in response to Message ID 807 .

One thing to try is suspending the job that is going to crash right after you have downloaded it, and checking which files are present in the projects and slots directories. The charmm.out file will have a name like 1tng_xxxx_xxxxxx_x_3 in the projects directory (the charmm.out file in the slots directory will contain the real file name). Let me know what you find.

Andre

Hello Andre,
No restrictions that I can see. Set up the same as my other Linux machine.
Both are AMD Opteron computers, the working one is an 848 (2 cpus) and the one having trouble is an 275 (2 dual cpus). The 275 machine has no trouble running CP, Einstein, Rosetta, Ralph, QMC and Predictor.
The 848 computer also runs QMC, Rosetta, Einstein, Ralph, LHC.
Where is the 'charmm.out' file kept? I am unable to find it on either machine when I look in the project folder or the Boinc folder, I have but 5 files in the project folder (both machines).


____________
D@H the greatest project in the world... a while from now!
Profile Conan
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 219
ID: 100
Credit: 4,256,493
RAC: 0
Message 855 - Posted 5 Oct 2006 14:51:35 UTC

I will try, Andre, but with that last lot all terminating in seconds due to the errors, even if I had been home (I was at work), I would have had trouble trapping a WU to see what was in the SLOT directory. I notice that the SLOT folders only hold information while a project is being processed.
I am also trying to work out why the 848 machine only has 7 SLOT folders (with 6 projects) but the 275 machine has 17 SLOT folders for 7 projects, seems a bit weird but as the extras hold no data no real problem.
As I type this the 275 machine has Docking work downloading so we shall see what I find.


____________

Profile Conan
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 219
ID: 100
Credit: 4,256,493
RAC: 0
Message 865 - Posted 5 Oct 2006 15:33:36 UTC - in response to Message ID 855 .

I will try, Andre, but with that last lot all terminating in seconds due to the errors, even if I had been home (I was at work), I would have had trouble trapping a WU to see what was in the SLOT directory. I notice that the SLOT folders only hold information while a project is being processed.
I am also trying to work out why the 848 machine only has 7 SLOT folders (with 6 projects) but the 275 machine has 17 SLOT folders for 7 projects, seems a bit weird but as the extras hold no data no real problem.
As I type this the 275 machine has Docking work downloading so we shall see what I find.




Well, it did not get better. After 3 minutes 20 seconds all the WUs started to error out, so I suspended the project.

The "charmm.out" SLOT file held this information :-
<soft_link>../../projects/docking.utep.edu/1tng_mod0001_5104_384550_0_3</soft_link>

The SLOT folder held these files :-
1tng_0.bin 1tng.bin 1tng.crt 1tng_grid.bin 1tng_min.pdb 1tng.streamfile boinc_lockfile charmm_5.2_i686-pc-linux-gnu charmm.inp charmm.out grid_probes.rtf init_data.xml ligandmingrid.bin ligand.pdb ligand.psf 1pdb_amino.rtf 1pdb.prm 1pdb_probes.prm minenergy.pdb minrmsd.pdb percentdone.str receptor.pdb receptor.psf stderr.txt summary.txt

The Project folder held these files (plus all WU files):-
grid_probes.rtf 1pdb_amino.rtf 1pdb.prm 1pdb_probes.prm charmm_5.2_i686-pc-linux-gnu

I am now getting "Unrecoverable error for result xxxxx (process exited with code 1 (0x1))".
This is the same as the original error with Linux machines, but I have added the 'ulimit -s unlimited' command in the 'run_manager' BOINC file.
____________
grummel
Volunteer tester

Joined: Oct 2 06
Posts: 6
ID: 127
Credit: 4,957
RAC: 0
Message 872 - Posted 5 Oct 2006 17:53:15 UTC - in response to Message ID 865 .
Last modified: 5 Oct 2006 17:57:35 UTC

Please add it to "run_client" as well!
On my system it works (Kubuntu 6.0.6).

Tino Ruiz
Volunteer tester

Joined: Sep 19 06
Posts: 9
ID: 117
Credit: 423,548
RAC: 0
Message 878 - Posted 5 Oct 2006 20:36:03 UTC

Yay! Thank you all who helped me. Now Docking@Home is working properly for me. Anyone running Xubuntu who doesn't know what to do, here are the summarized steps:

sudo nano -w /etc/init.d/boinc-client

- Add the text "ulimit -s unlimited" to the beginning of the file and save that file (Ctrl+O). Then:

sudo /etc/init.d/boinc-client stop
sudo /etc/init.d/boinc-client start

That's it!
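
To confirm that a restarted process really runs with the new limit, one option is to read the limits the kernel reports for it. This is a minimal sketch under assumptions: it relies on the Linux-specific file /proc/<pid>/limits being available on your kernel, the helper name stack_limit_of is made up, and the demo inspects the current shell rather than the BOINC client, whose PID you would look up yourself (for example with ps).

```shell
# Hypothetical helper: print the soft stack limit of a running process,
# in bytes or as "unlimited", as reported in /proc/<pid>/limits.
stack_limit_of() {
    awk '/^Max stack size/ { print $4 }' "/proc/$1/limits"
}

# Demo: inspect the current shell ($$ is its process ID).
soft=$(stack_limit_of $$)
echo "soft stack limit of this shell: $soft"
```

If the value shown for the client is still the old one after a stop and start, the client was most likely started by something other than the script you edited.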

Profile Conan
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 219
ID: 100
Credit: 4,256,493
RAC: 0
Message 885 - Posted 6 Oct 2006 2:28:19 UTC

>>> I did not have to add the 'ulimit' command to 'run_client' on my other working machine (AMD Opteron 848 (2 CPUs), same OS, Fedora Core 3), but I will try.

Ok stopped Boinc and added the 'ulimit -s unlimited' command as the first line of 'run_client'. Started Boinc but still the same, error code 1 after 3 minutes 21 seconds.

Right, nothing for it, I will reboot the computer.
You little ripper, she's a goer now and has gone past 9 minutes for the first time and still crunching, says WU will take 35 minutes 13 seconds, I will wait.

It would appear that I may not have needed the 'ulimit -s unlimited' command in 'run_client' if I had rebooted the machine in the first place. I assumed that because I did not have to reboot the first computer, I did not need to reboot this one. How wrong I was. Crunching now for 19 minutes.
As it looks like it is going to take longer than 35 minutes, I will give an update later, as I have to go to work.

____________

Profile Conan
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 219
ID: 100
Credit: 4,256,493
RAC: 0
Message 896 - Posted 6 Oct 2006 13:31:43 UTC

All now working OK, have now processed a successful WU after 1 hour 25 minutes. Also have 6 pending.
I should have done the reboot at the start.
Thanks for everyone's help, all systems go.
____________

Rene
Volunteer tester
Avatar

Joined: Oct 2 06
Posts: 121
ID: 160
Credit: 109,415
RAC: 0
Message 897 - Posted 6 Oct 2006 14:19:16 UTC - in response to Message ID 896 .

All now working OK, have now processed a successful WU after 1 hour 25 minutes. Also have 6 pending.
I should have done the reboot at the start.
Thanks for everyone's help, all systems go.


Well done... ;-)

Let's hope that an app update will remove the need for this "hack".
