CHARMM ERRORS AND QUIRKS
Author | Message | |
---|---|---|
I have been noticing that the new CHARMM work units often get to 100.00% but still have some more processing to do before they finish. They will sit at 100.00% "waiting to run" when the task switches to another project.
|
||
ID: 4210 | Rating: 0 | rate: / | ||
Conan, it seems to me that these work-units are not running to completion. Please abort those that have exceeded 3 hours. No work-unit should take more than 1.5 to 2 hours.
|
||
ID: 4211 | Rating: 0 | rate: / | ||
I received a mail from you about a misconfigured BOINC on my machine:
Docking@Home notification:
I can't confirm this, as Docking is the only project with problems. I'm running CPDN, Einstein, yoyo, POEM, malaria, Cosmology, UTC-malaria, WCG, Lattice, orbit, Rosetta, QMC, Milkyway, Leiden, Ralph, Magnetism, Ibercivis and soon again Simap, plus the occasional Pirates and LHC on it, and all crunch well. I don't think it's a problem with the configuration. My puter is described here, and of course in my account as well, but there's just rudimentary information. The errors I get are the following:
<core_client_version>6.2.14</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
Calling BOINC init.
Starting charmm run (initial or from checkpoint)...
ERROR - Charmm exited with code 1.
Calling BOINC finish.
called boinc_finish
</stderr_txt>
]]> |
||
ID: 4213 | Rating: 0 | rate: / | ||
Hi Conan, yes, I think it is better if you cancel those very long work units (taking more than 2 or 3 hrs). It is a very strange behavior that we are trying to understand. I will run those work units outside BOINC to see if I can reproduce the same behavior and find the cause. I don't know if I will succeed, because different architectures often give different results, but I'll try and I'll keep you posted. |
||
ID: 4219 | Rating: 0 | rate: / | ||
"Hi Conan, yes, I think it is better if you cancel those very long work units (taking more than 2 or 3 hrs). It is a very strange behavior that we are trying to understand. I will run those work units outside BOINC to see if I can reproduce the same behavior and find the cause. I don't know if I will succeed, because different architectures often give different results, but I'll try and I'll keep you posted."
G'Day Trilce, I think I have worked out that the work units I was having trouble with (there were about 5 of them) had problems because they were all resends: they had already been sent twice as 6.04 work units and were then sent to me as 6.07 work units. I have had no more trouble since. (I sent an e-mail to Michela to say I could not supply the files she requested, but offered her what I believe to be the solution to the problem, and I have now passed this on to you.) Thanks again. |
||
ID: 4221 | Rating: 0 | rate: / | ||
I still get the same errors for every single WU on my puter as described below (or above, depending on your sorting ;)
|
||
ID: 4223 | Rating: 0 | rate: / | ||
Hi Saenger, I think it could be a corrupted input file. Can you please reset the project, or detach-attach the project. Maybe this way if there is a corrupted file you will get a good one. I'm still investigating what could be the cause for those errors.
|
||
ID: 4224 | Rating: 0 | rate: / | ||
OK, I found one cause in my own client: the input files were empty. So, if you go to the BOINC client directory, then /projects/docking.cis.udel.edu/, and the sizes of some of the input files are 0, you are likely to keep having errors. I just detached and reattached the project, but now I'm having problems downloading the files.
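The zero-size check described above can be automated. The following is only a minimal sketch: the function name `find_empty_files` is made up for illustration, and the location of the BOINC data directory differs between platforms and installations, so the project path has to be supplied by whoever runs it.

```python
from pathlib import Path


def find_empty_files(project_dir):
    """Return the zero-byte regular files found under project_dir.

    project_dir is assumed to be the project folder inside the BOINC
    data directory, e.g. <BOINC dir>/projects/docking.cis.udel.edu.
    An empty list is returned if the directory does not exist.
    """
    root = Path(project_dir)
    if not root.is_dir():
        return []
    # Walk the whole tree and keep only regular files whose size is 0.
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.stat().st_size == 0
    )
```

Calling `find_empty_files("projects/docking.cis.udel.edu")` from inside the BOINC data directory would list any empty files. An empty result does not prove that the input files are intact, only that none of them has size 0.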
|
||
ID: 4225 | Rating: 0 | rate: / | ||
Hi All, I'm sure about one kind of error:
|
||
ID: 4227 | Rating: 0 | rate: / | ||
"Hi Saenger, I think it could be a corrupted input file. Can you please reset the project, or detach-attach the project. Maybe this way if there is a corrupted file you will get a good one. I'm still investigating what could be the cause for those errors."
I first tried a reset, but it didn't help. Then I detached and reattached, and now I've done my first 4. I hope my max-per-core of 1 will go up again soon, but patience is a virtue, especially in beta stages ;)
____________
Greetings from Saenger. For questions about BOINC, look in the BOINC-Wiki |
||
ID: 4232 | Rating: 0 | rate: / | ||
I just had 2 with
Maximum CPU time exceeded
as well:
|
||
ID: 4235 | Rating: 0 | rate: / | ||
"I just had 2 with Maximum CPU time exceeded as well:"
This is the issue that keeps us busy right now. We cannot reproduce the problem in standalone mode or on our testing server. On docking we do not get back any file that could tell us what went on. The bar is not broken, and the fact that it stays at 0.00% probably means that the charmm simulation does not start. It would help us if you and the other volunteers with this problem could send us the list of files in the slot/# directory associated with the work-unit and the list of files in project/docking.cis.udel.edu. Thanks, Michela
____________
If you are interested in working on Docking@Home in a great group at UDel, contact me at 'taufer at acm dot org'! |
||
ID: 4236 | Rating: 0 | rate: / | ||
"It helps us to get from you and the other volunteers with this problem the list of files in the slot/# associated to the work-unit and the list of files in project/docking.cis.udel.edu."
The slots are obviously gone along with the WU. If I notice another one in the future, I will save it before aborting the WU. I'll send you the project/docking folder. I've archived it as a .tar.gz archive; is that fine? If so, I'll send it to the mail address on this page. |
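Saving a slot directory before aborting a work unit, or packing the project folder into a .tar.gz as mentioned above, could look roughly like this. It is a sketch under assumptions: the helper name `archive_directory` and the example paths are hypothetical, and the real slot number has to be looked up for the affected work unit.

```python
import tarfile
from pathlib import Path


def archive_directory(src_dir, archive_path):
    """Pack src_dir (recursively) into a gzip-compressed tar archive.

    archive_path should lie outside src_dir so that the archive does
    not end up containing itself.
    """
    src = Path(src_dir)
    if not src.is_dir():
        raise NotADirectoryError(f"{src} is not a directory")
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # Store entries under the directory's own name rather than
        # under its full absolute path.
        tar.add(str(src), arcname=src.name)
    return Path(archive_path)
```

For example, `archive_directory("slots/0", "slot0_backup.tar.gz")` run from the BOINC data directory would save slot 0; the slot number here is only an example.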
||
ID: 4237 | Rating: 0 | rate: / | ||
"It helps us to get from you and the other volunteers with this problem the list of files in the slot/# associated to the work-unit and the list of files in project/docking.cis.udel.edu."
Yes, please. I was looking at the database and ordered those WUs with error-status = -177 (which is what we get). Well, it seems that all of them were created before 2008-07-28 23:11:18, and the last WU with the problem (so far) had id=6149. Unfortunately, our settings resent WUs with errors up to 3 times; we have just changed this. These were old WUs, and we have made some changes since we created them that could be the cause of error -177. Let me keep monitoring the situation to see if this is indeed the case. Michela |
||
ID: 4238 | Rating: 0 | rate: / | ||
"Yes, please."
Been there, done that. 4 MB are on their way to you. |
||
ID: 4239 | Rating: 0 | rate: / | ||
"Yes, please."
If my observation is correct and error -177 is indeed related to old work-units, then canceling those old work-units should fix the problem. I have just canceled old work-units up to the work-unit with id 6149. This should not affect credits already awarded. I am monitoring docking and will keep you all posted. Michela |
||
ID: 4240 | Rating: 0 | rate: / | ||