Incorrect Function 1

Message boards : Number crunching : Incorrect Function 1

Author	Message
Brian Priebe Joined: Oct 3 10 Posts: 3 ID: 33519 Credit: 4,529,494 RAC: 0	Message 6850 - Posted 1 Oct 2012 22:30:21 UTC
	It seems this problem has reared its ugly head again. I am seeing dozens of WUs on a number of PCs all fail with:
	Calling BOINC init.
	Starting charmm run (initial or from checkpoint)...
	ERROR - Charmm exited with code 1.
	Calling BOINC finish.
	called boinc_finish

Ananas Joined: Aug 29 09 Posts: 56 ID: 17736 Credit: 2,500,425 RAC: 0	Message 6853 - Posted 5 Oct 2012 3:10:34 UTC - in response to Message ID 6850 . Last modified: 5 Oct 2012 3:22:46 UTC
	Same here :-( I aborted all unstarted ones and set the project to NNW (No New Work).
	P.S.: @Project: Exit 0 if you recognize this situation, and handle it on the server side. This is a situation that the program recognizes and in which it aborts the result on purpose. It is not a crash or an invalid result, so don't treat it like one.

Reeferman Joined: Sep 25 12 Posts: 2 ID: 67677 Credit: 1,489 RAC: 0	Message 6854 - Posted 5 Oct 2012 20:06:56 UTC
	I am also getting the same failure messages.

Lois Petrolito Joined: Apr 1 12 Posts: 2 ID: 53545 Credit: 58,779 RAC: 0	Message 6855 - Posted 5 Oct 2012 22:26:58 UTC
	I don't get "incorrect function". What I've been getting lately is "computation error"?

Reeferman Joined: Sep 25 12 Posts: 2 ID: 67677 Credit: 1,489 RAC: 0	Message 6856 - Posted 5 Oct 2012 23:50:06 UTC - in response to Message ID 6855 .
	> I don't get "incorrect function". What I've been getting lately is "computation error"?
	Yeah, I get "computation error" on my results also, but when I click on the reported WU results, it lists the "incorrect function 1" error.

Lois Petrolito Joined: Apr 1 12 Posts: 2 ID: 53545 Credit: 58,779 RAC: 0	Message 6857 - Posted 6 Oct 2012 4:06:47 UTC - in response to Message ID 6856 .
	>> I don't get "incorrect function". What I've been getting lately is "computation error"?
	> Yeah, I get "computation error" on my results also, but when I click on the reported WU results, it lists the "incorrect function 1" error.
	Now that I check that page, I DO get the same error. What's happening?

Ananas Joined: Aug 29 09 Posts: 56 ID: 17736 Credit: 2,500,425 RAC: 0	Message 6858 - Posted 6 Oct 2012 8:54:26 UTC - in response to Message ID 6857 . Last modified: 6 Oct 2012 9:10:34 UTC
	> ... Now that I check that page, I DO get the same error. What's happening?
	If I remember right, somewhere in the middle of a calculation it finds out that one value is out of bounds and exits with a status value of 1. AFAIK, the translation of "1" into "incorrect function" is a BOINC thing and doesn't really reflect the reason for this error.
	When they start the calculation, they cannot tell whether it is a candidate for this type of error, but I still think they should treat it as a valid result on the BOINC side, as no technical error led to it. It could very well be handled when the result is transferred (or not transferred, in this case) from BOINC into the scientific database, instead of filling our task lists with error results.
	P.S.: Here's an older thread about the same problem. It is a known problem; that's why Brian wrote "again".
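The translation of exit status 1 into "Incorrect function" can be sketched as follows. On Windows, system error code 1 is ERROR_INVALID_FUNCTION, whose message text is "Incorrect function."; the assumption here is that the BOINC client on Windows renders a task's exit status through that system message table. The lookup table and the `describe_exit` helper are illustrative names only, not BOINC code.

```python
# A few Win32 system error messages (codes 1, 2 and 5).
# Illustrative subset only; the real table is provided by Windows.
WIN32_MESSAGES = {
    1: "Incorrect function.",
    2: "The system cannot find the file specified.",
    5: "Access is denied.",
}


def describe_exit(code: int) -> str:
    """Render an exit status in the style of the task pages quoted in this thread.

    Hypothetical helper: the text comes from the lookup table above,
    so it says nothing about why the application actually exited.
    """
    text = WIN32_MESSAGES.get(code, "Unknown error")
    hex_code = f"0x{code & 0xFFFFFFFF:x}"  # show negative codes as unsigned 32-bit
    return f"{text} ({hex_code}) - exit code {code} ({hex_code})"


print(describe_exit(1))
```

The point of the sketch is that the message is a generic lookup on the number 1, so it carries no information about the out-of-bounds condition that made the application stop.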

der_Day Joined: Jan 16 10 Posts: 10 ID: 24434 Credit: 1,922,000 RAC: 0	Message 6859 - Posted 7 Oct 2012 11:08:14 UTC
	I also have some of these buggy WUs... Since today I have been getting some WUs with an estimated time of 10 hours (normally 3 h)! And it seems that they have no checkpoints. Is this normal? (example WU)
	Is it helpful to reset the project?

Andreas38871 Joined: Jan 8 09 Posts: 2 ID: 5693 Credit: 8,459 RAC: 0	Message 6860 - Posted 7 Oct 2012 11:55:43 UTC Last modified: 7 Oct 2012 11:56:15 UTC
	I have the same problem: no checkpoints. After 3 hours and 30 minutes, no remaining time is displayed anymore. Is relatively poor when the computer is to be issued once.
	Andreas

der_Day Joined: Jan 16 10 Posts: 10 ID: 24434 Credit: 1,922,000 RAC: 0	Message 6861 - Posted 7 Oct 2012 14:59:45 UTC - in response to Message ID 6860 .
	> I have the same problem: no checkpoints. After 3 hours and 30 minutes, no remaining time is displayed anymore. Is relatively poor when the computer is to be issued once. Andreas
	I looked in the slot folder (for example d:\Boinc\Project_Data\slots\) of the broken WUs and saw that several files are missing. I cancelled these jobs.

Boyu Zhang Forum moderator Project administrator Project developer Project tester Joined: May 5 10 Posts: 88 ID: 28821 Credit: 2,013,795 RAC: 0	Message 6875 - Posted 9 Oct 2012 2:24:00 UTC
	Over the past weekend, the space on the D@H server filled up, and as a result the server sent out some incomplete workunits. Please abort workunits with the name "1iiq1hih" or "1ohr1hih". The server is now back to normal. Thanks for letting us know, and for bearing with us during the difficulty!
	Boyu

speechless Joined: Nov 24 11 Posts: 5 ID: 46818 Credit: 1,075,787 RAC: 0	Message 6879 - Posted 9 Oct 2012 13:36:25 UTC - in response to Message ID 6875 .
	> Over the past weekend, the space on the D@H server filled up, and as a result the server sent out some incomplete workunits. Please abort workunits with the name "1iiq1hih" or "1ohr1hih". The server is now back to normal. Thanks for letting us know, and for bearing with us during the difficulty! Boyu
	I have the 0% problem on 3 WUs that start with 1m0b1htf; some work just fine, though. What should I do?

Aaron Finney Volunteer tester Joined: Mar 23 07 Posts: 74 ID: 367 Credit: 2,409,831 RAC: 0	Message 6880 - Posted 9 Oct 2012 14:38:09 UTC - in response to Message ID 6879 .
	>> Over the past weekend, the space on the D@H server filled up, and as a result the server sent out some incomplete workunits. Please abort workunits with the name "1iiq1hih" or "1ohr1hih". The server is now back to normal. Thanks for letting us know, and for bearing with us during the difficulty! Boyu
	> I have the 0% problem on 3 WUs that start with 1m0b1htf; some work just fine, though. What should I do?
	I also have the 0% problem with 3 WUs that start with "1m0b1htf", and also 3 that start with "1ohr1htf". They are at over 20 hours and counting; I noticed it this morning.

Boyu Zhang Forum moderator Project administrator Project developer Project tester Joined: May 5 10 Posts: 88 ID: 28821 Credit: 2,013,795 RAC: 0	Message 6881 - Posted 9 Oct 2012 15:11:42 UTC - in response to Message ID 6879 .
	Please abort the ones with 0% progress. Thanks!
	>> Over the past weekend, the space on the D@H server filled up, and as a result the server sent out some incomplete workunits. Please abort workunits with the name "1iiq1hih" or "1ohr1hih". The server is now back to normal. Thanks for letting us know, and for bearing with us during the difficulty! Boyu
	> I have the 0% problem on 3 WUs that start with 1m0b1htf; some work just fine, though. What should I do?

TheFiend Joined: Apr 7 09 Posts: 70 ID: 9482 Credit: 20,705,527 RAC: 0	Message 6887 - Posted 9 Oct 2012 19:08:29 UTC
	Just downloaded some 1ebz1hih_mod0014crossdockinghiv1 and they are coming up with the 0% progress problem... Aborting them.

Ananas Joined: Aug 29 09 Posts: 56 ID: 17736 Credit: 2,500,425 RAC: 0	Message 6891 - Posted 10 Oct 2012 6:58:30 UTC Last modified: 10 Oct 2012 7:13:46 UTC
	Two separate problems, I think ...
	The tasks that return with exit code 1 do show progress. They seem to run at quite a normal speed (judged by the % display) and then they exit before having reached 100%.
	Examples:
	1hvi1htf_mod0014crossdockinghiv1_26830_396694_0
	1hvi1htf_mod0014crossdockinghiv1_26823_446111_0
	1hvi1htf_mod0014crossdockinghiv1_26819_164747_0
	1hvi1htf_mod0014crossdockinghiv1_26714_324070_0
	So the problem started with 1hvi1htf_... (for me).
	P.S.: I checked the ones of Brian, the thread starter: 1dif1htf_ is what I found there, so it's not restricted to the series that caused the problem for me.
	P.P.S.: Maybe an x64 Windows issue? Brian's fleet runs this OS type and my box does too.
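Sorting failures by series, as done by hand above, can be sketched in a few lines. This assumes the series is the part of the task name before the first underscore (as in "1hvi1htf_..." and "1dif1htf_"); `series_of`, `should_abort` and the `bad_series` set are hypothetical names for illustration, not part of BOINC or the project.

```python
from collections import Counter


def series_of(task_name: str) -> str:
    """Return the series prefix: everything before the first underscore (assumed)."""
    return task_name.split("_", 1)[0]


def should_abort(task_name: str, bad: set) -> bool:
    """True if the task belongs to a series on the given bad list."""
    return series_of(task_name) in bad


# Failed task names exactly as listed in the post above.
failed = [
    "1hvi1htf_mod0014crossdockinghiv1_26830_396694_0",
    "1hvi1htf_mod0014crossdockinghiv1_26823_446111_0",
    "1hvi1htf_mod0014crossdockinghiv1_26819_164747_0",
    "1hvi1htf_mod0014crossdockinghiv1_26714_324070_0",
]

# Hypothetical bad list built from the two series named in the post.
bad_series = {"1hvi1htf", "1dif1htf"}

# Count failures per series to see which series dominate.
failures_per_series = Counter(series_of(name) for name in failed)
print(failures_per_series.most_common())
```

Counting per series rather than per host is what separates "certain series are flawed" from "certain machines are flawed", which is the distinction this thread keeps coming back to.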

Simba123 Joined: Dec 7 11 Posts: 23 ID: 47237 Credit: 2,607,800 RAC: 0	Message 6916 - Posted 14 Oct 2012 5:55:58 UTC
	Hello, I am also seeing a lot of these 'computational error' workunits. The latest failures all seem to be coming from the 1ohr1htf series of workunits.

TheFiend Joined: Apr 7 09 Posts: 70 ID: 9482 Credit: 20,705,527 RAC: 0	Message 6917 - Posted 14 Oct 2012 10:08:46 UTC Last modified: 14 Oct 2012 10:21:40 UTC
	Out of the current batch of 1ohr1htf units I've had 7 errors out of 121 crunched so far, and those were limited to 1 PC, a 1090T hex core.
	EDIT: As I have recently been tweaking core voltages on my 1090T, I have upped the voltage a notch to see if the errors stop occurring on the 1090T.

Ananas Joined: Aug 29 09 Posts: 56 ID: 17736 Credit: 2,500,425 RAC: 0	Message 6927 - Posted 19 Oct 2012 21:18:09 UTC
	After a little timeout ... currently the results seem to run much better, no "exit 1" error; my last five went through flawlessly.

Ananas Joined: Aug 29 09 Posts: 56 ID: 17736 Credit: 2,500,425 RAC: 0	Message 6928 - Posted 21 Oct 2012 0:09:52 UTC - in response to Message ID 6927 . Last modified: 21 Oct 2012 0:12:34 UTC
	> After a little timeout ... currently the results seem to run much better, no "exit 1" error; my last five went through flawlessly.
	That was too early :-(
	After 25 valid 1t7k1htf, 1dif1htf_mod0014crossdockinghiv1_23875_162815 failed with -1.
	There always seem to be certain series that have this flaw; other series are completely unaffected.

Ananas Joined: Aug 29 09 Posts: 56 ID: 17736 Credit: 2,500,425 RAC: 0	Message 6929 - Posted 21 Oct 2012 19:29:31 UTC - in response to Message ID 6917 . Last modified: 21 Oct 2012 19:46:12 UTC
	> Out of the current batch of 1ohr1htf units I've had 7 errors out of 121 crunched so far, and those were limited to 1 PC, a 1090T hex core. ...
	Your box started working fine when you ran out of 1ohr1htf and received 1t7k1htf instead. I doubt that it is a voltage issue; it would have done that with the lower voltage too, I bet. (You had way more than 7 errors, by the way, and all only in certain series.)
	1dif1htf (those that caused problems for me) fail on your box too, and on all other hosts I checked. Mine is a Xeon L5520, standard voltages and frequencies.
	@project: Again ... as this "ERROR - Charmm exited with code 1." is a program-controlled exit, you should "exit 0" and set a flag that the result ran into a condition where further processing doesn't make sense for scientific reasons. On the BOINC side it should be successful. Compare it to a prime project: there, the numbers that turn out not to be prime do not exit with an error either.
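The "exit 0 and set a flag" suggestion could look roughly like the wrapper below. This is a sketch under stated assumptions, not how the project's application works: it assumes exit status 1 always means the deliberate, program-controlled stop (in practice the application would have to signal that unambiguously), and `run_and_report`, the JSON result file and its `status` values are all hypothetical.

```python
import json
import subprocess
import sys
from pathlib import Path
from typing import Sequence

# Exit status the thread reports for the deliberate stop (assumed to be unambiguous).
CONTROLLED_ABORT = 1


def run_and_report(cmd: Sequence[str], result_path: Path) -> int:
    """Run the science application and turn a controlled abort into a flagged success.

    Hypothetical wrapper: returns the status the wrapper itself should exit with.
    """
    proc = subprocess.run(list(cmd))
    if proc.returncode == CONTROLLED_ABORT:
        # Not a crash: record the condition in the result and report success.
        result_path.write_text(json.dumps({"status": "aborted_by_science_check"}))
        return 0
    if proc.returncode == 0:
        result_path.write_text(json.dumps({"status": "completed"}))
    # Any other status is passed through as a genuine error.
    return proc.returncode
```

With this shape the server side can read the flag and decide whether to pass the result on to the scientific database, while the volunteer's task list shows a success instead of an error. For example, `sys.exit(run_and_report(["some_app"], Path("result.json")))` would be the wrapper's last line, where `some_app` stands in for the real executable.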

TheFiend Joined: Apr 7 09 Posts: 70 ID: 9482 Credit: 20,705,527 RAC: 0	Message 6931 - Posted 22 Oct 2012 10:54:09 UTC - in response to Message ID 6929 .
	>> Out of the current batch of 1ohr1htf units I've had 7 errors out of 121 crunched so far, and those were limited to 1 PC, a 1090T hex core. ...
	> Your box started working fine when you ran out of 1ohr1htf and received 1t7k1htf instead. I doubt that it is a voltage issue; it would have done that with the lower voltage too, I bet. (You had way more than 7 errors, by the way, and all only in certain series.) 1dif1htf (those that caused problems for me) fail on your box too, and on all other hosts I checked.
	I came to the conclusion it was dodgy WUs and dropped the voltage again a couple of days ago.
	My 1055T has just had a few 1dif units error out, so I'm aborting all 1dif on both crunchers.

Ananas Joined: Aug 29 09 Posts: 56 ID: 17736 Credit: 2,500,425 RAC: 0	Message 6937 - Posted 25 Oct 2012 17:03:35 UTC
	1hvi1htf is another buggy series, I'll abort all I get from this type.

robertmiles Joined: Apr 16 09 Posts: 96 ID: 9967 Credit: 1,290,747 RAC: 0	Message 6938 - Posted 26 Oct 2012 0:29:00 UTC
	A large fraction, but not all, of my 1hvi1htf workunits are now giving this compute error:
	Incorrect function. (0x1) - exit code 1 (0x1)
	Could you investigate why, and what should be done about this?

Ananas Joined: Aug 29 09 Posts: 56 ID: 17736 Credit: 2,500,425 RAC: 0	Message 6939 - Posted 26 Oct 2012 1:54:44 UTC Last modified: 26 Oct 2012 2:35:33 UTC
	Add 1hvj1htf and 1hvk1htf to the badlist

Ananas Joined: Aug 29 09 Posts: 56 ID: 17736 Credit: 2,500,425 RAC: 0	Message 6940 - Posted 26 Oct 2012 19:57:42 UTC
	Hmmmm ... the non-1hv*** results are getting rare; maybe it's time for another timeout. The last one was caused by series having tons of "ERROR - Charmm exited with code 1.", but no one has fixed it since then, and no one seems to care about tons of crashing results at all.

ZapSSD Joined: Jul 17 12 Posts: 1 ID: 63092 Credit: 206,746 RAC: 0	Message 6941 - Posted 26 Oct 2012 22:14:31 UTC
	Well, the problem has lasted almost a month now, and indeed the project leaders don't seem to be bothered at all. I think I will detach from this project permanently.
