what caused this error?


Message boards : Number crunching : what caused this error?

zombie67 [MM]
Volunteer tester

Joined: Sep 18 06
Posts: 207
ID: 114
Credit: 2,817,648
RAC: 0
Message 3373 - Posted 29 May 2007 8:42:35 UTC

http://docking.utep.edu/result.php?resultid=192582

This machine can't seem to make this project work.
____________
Dublin, CA
Team SETI.USA

Memo
Forum moderator
Project developer
Project tester

Joined: Sep 13 06
Posts: 88
ID: 14
Credit: 1,666,392
RAC: 0
Message 3377 - Posted 30 May 2007 16:26:01 UTC

Do you have a lot of projects attached to this client? Or is there anything else that might help us reproduce this error at the lab?

We had a problem with Linux boxes failing with exit code 1, but it was related to the stack problem, which I think we don't have with the Macs.

David Ball
Forum moderator
Volunteer tester

Joined: Sep 18 06
Posts: 274
ID: 115
Credit: 1,634,401
RAC: 0
Message 3378 - Posted 30 May 2007 17:53:52 UTC

@Memo

He has exactly 4 GB of RAM. Could a 32-bit calculation somewhere be interpreting that as 0 RAM?
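
As a quick illustration of that guess (a sketch of the arithmetic only; nothing here is taken from the BOINC or Docking source): 4 GB expressed in bytes is exactly 2^32, so a value truncated to 32 unsigned bits would read as 0. The variable names are made up for the example, and it assumes a shell with 64-bit arithmetic.

```shell
#!/bin/sh
# 4 GB (2^32 bytes) needs 33 bits to represent.
ram_bytes=$((4 * 1024 * 1024 * 1024))
# Keep only the low 32 bits, as an unsigned 32-bit integer would.
wrapped=$((ram_bytes & 0xFFFFFFFF))
echo "actual: $ram_bytes bytes; through a 32-bit value: $wrapped bytes"
```

Whether any code path actually does this on the machine in question is not established in this thread.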

@Zombie67

Are you letting the core client run at normal priority or are you running it niced?

-- David

____________
The views expressed are my own.
Facts are subject to memory error :-)
Have you read a good science fiction novel lately?

zombie67 [MM]
Volunteer tester

Joined: Sep 18 06
Posts: 207
ID: 114
Credit: 2,817,648
RAC: 0
Message 3379 - Posted 30 May 2007 18:32:16 UTC

I had several projects attached at the time, including:

RCN
Rosetta
SETI
Docking
____________
Dublin, CA
Team SETI.USA

David Ball
Forum moderator
Volunteer tester

Joined: Sep 18 06
Posts: 274
ID: 115
Credit: 1,634,401
RAC: 0
Message 3386 - Posted 4 Jun 2007 17:49:22 UTC

I notice the machine has 8 cores. In your general preferences, what do you have for the percentage of memory to use while running and the percentage of memory to use while idle?

How much swap space does the machine have and what percentage of swap does your general preferences say BOINC can use? In the general preferences, do you have "Leave applications in memory while suspended?" set to YES or NO?

I'm not sure if the ulimit command exists on Macs (it does on linux) but can you go to a command prompt for the same user that you have BOINC running as on your machine and type the command
ulimit -a
and post the output here. It should show your resource limits.

Is there any chance that your machine uses a really slow server for DNS lookups?

Could you look at some results from other projects (Besides Docking) and see if they got the "No heartbeat from core client for 31 sec" error. The other projects might have gotten that error and recovered from it and still validated and gotten credit. It would be helpful to know if they got the error though.

-- David

____________
The views expressed are my own.
Facts are subject to memory error :-)
Have you read a good science fiction novel lately?

zombie67 [MM]
Volunteer tester

Joined: Sep 18 06
Posts: 207
ID: 114
Credit: 2,817,648
RAC: 0
Message 3388 - Posted 4 Jun 2007 19:46:20 UTC - in response to Message ID 3386.
Last modified: 4 Jun 2007 19:46:41 UTC

I notice the machine has 8 cores. In your general preferences, what do you have for the percentage of memory to use while running and the percentage of memory to use while idle?

99% (of 4gb)

How much swap space does the machine have and what percentage of swap does your general preferences say BOINC can use? In the general preferences, do you have "Leave applications in memory while suspended?" set to YES or NO?


VM = 16gb. Yes, leave in memory.

I'm not sure if the ulimit command exists on Macs (it does on linux) but can you go to a command prompt for the same user that you have BOINC running as on your machine and type the command
ulimit -a
and post the output here. It should show your resource limits.


% ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) 6144
file size (blocks, -f) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 256
pipe size (512 bytes, -p) 1
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 266
virtual memory (kbytes, -v) unlimited

Is there any chance that your machine uses a really slow server for DNS lookups?


I use the normal comcast DNS servers.

Could you look at some results from other projects (Besides Docking) and see if they got the "No heartbeat from core client for 31 sec" error. The other projects might have gotten that error and recovered from it and still validated and gotten credit. It would be helpful to know if they got the error though.

I'm not seeing anything with SETI. That is the only other project that has any results to look at.


____________
Dublin, CA
Team SETI.USA

David Ball
Forum moderator
Volunteer tester

Joined: Sep 18 06
Posts: 274
ID: 115
Credit: 1,634,401
RAC: 0
Message 3389 - Posted 5 Jun 2007 4:33:45 UTC

stack size (kbytes, -s) 8192


That would cause the error. I'm not sure how the BOINC core client is being started on a Mac, but you need to issue the command

ulimit -s unlimited

either in the script just before it starts the boinc core client or manually before you start the boinc core client. If you're letting the BOINC manager start the BOINC core client, then stop the run manager AND the core client, issue that command, and start the core client from the same command line or start the run manager from the same command line and let it start the core client.
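
A minimal sketch of the "in the script just before it starts the core client" option described above. The `BOINC_BIN` path is a placeholder, not the real install location on a Mac, and the function name is made up for the example; `ulimit -s` is the standard shell builtin.

```shell
#!/bin/sh
# Placeholder path (assumption): point this at the actual core client binary.
BOINC_BIN="${BOINC_BIN:-/path/to/boinc}"

# Try to lift the stack limit for this shell and everything it launches.
raise_stack_limit() {
    if ulimit -s unlimited 2>/dev/null; then
        echo "stack limit is now: $(ulimit -s)"
    else
        echo "could not raise the stack limit; it is still: $(ulimit -s)" >&2
        return 1
    fi
}

raise_stack_limit || echo "continuing with the existing limit" >&2

# Start the client from this same shell so it inherits the limit.
if [ -x "$BOINC_BIN" ]; then
    exec "$BOINC_BIN"
else
    echo "core client not found at $BOINC_BIN; nothing started" >&2
fi
```

The limit only applies to processes started from the shell that ran `ulimit`, which is why the client is launched from the same script.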

I'm sorry I can't be more specific, but I don't have a Mac.

Happy Crunching,

-- David
____________
The views expressed are my own.
Facts are subject to memory error :-)
Have you read a good science fiction novel lately?

zombie67 [MM]
Volunteer tester

Joined: Sep 18 06
Posts: 207
ID: 114
Credit: 2,817,648
RAC: 0
Message 3390 - Posted 5 Jun 2007 15:24:15 UTC - in response to Message ID 3389.

stack size (kbytes, -s) 8192


That would cause the error. I'm not sure how the BOINC core client is being started on a Mac, but you need to issue the command

ulimit -s unlimited

either in the script just before it starts the boinc core client or manually before you start the boinc core client. If you're letting the BOINC manager start the BOINC core client, then stop the run manager AND the core client, issue that command, and start the core client from the same command line or start the run manager from the same command line and let it start the core client.


Something doesn't sound right. None of my other macs have this problem, and they have the same value for the stack size. However, they are all PPCs. Perhaps it is a PPC vs. Intel thing. I just attached another Intel machine to see if it also has the same problem.

Also, I thought the "ulimit" issue had been resolved with the newer versions of BOINC. Am I mis-remembering?


____________
Dublin, CA
Team SETI.USA

zombie67 [MM]
Volunteer tester

Joined: Sep 18 06
Posts: 207
ID: 114
Credit: 2,817,648
RAC: 0
Message 3391 - Posted 6 Jun 2007 2:48:28 UTC - in response to Message ID 3390.
Last modified: 6 Jun 2007 2:49:43 UTC

Something doesn't sound right. None of my other macs have this problem, and they have the same value for the stack size. However, they are all PPCs. Perhaps it is a PPC vs. Intel thing. I just attached another Intel machine to see if it also has the same problem.

Duh, no need to wait. That other Intel machine has over 400 results returned without ever having the problem. Additionally, it just successfully returned another. It is a Core Duo, with 2 GB RAM, running RCN and Climate Prediction at the same time. It has the following settings:

% ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) 6144
file size (blocks, -f) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 256
pipe size (512 bytes, -p) 1
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 266
virtual memory (kbytes, -v) unlimited

I really don't think the stack size is causing the problem... unless it has something to do with the number of cores.
____________
Dublin, CA
Team SETI.USA

David Ball
Forum moderator
Volunteer tester

Joined: Sep 18 06
Posts: 274
ID: 115
Credit: 1,634,401
RAC: 0
Message 3392 - Posted 6 Jun 2007 5:34:59 UTC

*scratches head*

Well, I may have to admit to being somewhat baffled by this one.

One thing I can think of to try is to set your general preferences to only use 2 cores and see what happens. To be honest, I don't expect this to make a difference but it would eliminate one possible source of the problem.

ISTR that they solved the problem on Linux by having a script issue the "ulimit -s unlimited" command and then start the docking client. I don't know if the Mac version of Docking does that or not.

You could also investigate whatever limiting mechanism OS X has and whether it prevents the user you're running the program as from successfully doing the "ulimit -s unlimited" command. On linux, depending on several things in the security setup, some users can issue the "ulimit -s unlimited" command and it won't actually do anything.
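
One way to check whether the command "won't actually do anything" is to compare the soft limit before and after asking for more. This is only a sketch, assuming a shell whose `ulimit` builtin accepts `-S` and `-H` (bash does); the function name is made up for the example.

```shell
#!/bin/sh
# Print the soft (currently enforced) and hard (ceiling) stack limits.
show_stack_limits() {
    echo "soft: $(ulimit -S -s)  hard: $(ulimit -H -s)"
}

before=$(ulimit -S -s)
show_stack_limits

# Command substitution runs in a subshell, so this shell's limit is untouched.
after=$(ulimit -s unlimited 2>/dev/null; ulimit -S -s)

if [ "$after" = "$before" ]; then
    echo "soft limit unchanged: $before (already there, or the request was refused)"
else
    echo "soft limit changed from $before to $after"
fi
```

An unprivileged user cannot raise the soft limit above the hard limit, so a hard value below "unlimited" would explain a request that silently fails.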

Also, you might check that both the docking program and the script have execute permission.

HTH,

-- David
____________
The views expressed are my own.
Facts are subject to memory error :-)
Have you read a good science fiction novel lately?

Andre Kerstens
Forum moderator
Project tester
Volunteer tester

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 3401 - Posted 11 Jun 2007 20:15:16 UTC

I think zombie has another problem, but I also don't know what causes it. The Mac app doesn't have the ulimit problem; that one only caused problems on the Linux side and has been fixed. I'm currently looking into why charmm uses so much system time (as opposed to user time) on Linux and Macs. I hope to figure out soon what's going on there.

Andre
____________
D@H the greatest project in the world... a while from now!

zombie67 [MM]
Volunteer tester

Joined: Sep 18 06
Posts: 207
ID: 114
Credit: 2,817,648
RAC: 0
Message 3403 - Posted 17 Jun 2007 7:41:45 UTC

Okay, I finally had a chance to try the machine again on D@H. A thought occurred to me. Macs often have a shared memory error running BOINC, which is easily solved by following the directions here:

http://www.spy-hill.net/help/apple/SharedMemory.html

I had implemented this fix when I first started crunching with this machine. I went back to this page and reread it. The author says "These settings increase the amount of shared memory to four (4) times the usual default." That got me thinking. I have *8* cores on this machine. Maybe I need to double the 4x settings he suggests. So I did that and now have a result "pending", which is a step up. Fingers crossed.
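
For anyone repeating this, here is a sketch that shows what doubling the current values would look like. It only reads and prints; it changes nothing. The key names are the macOS System V shared memory sysctls and are an assumption about what the linked page adjusts, since the page's contents are not reproduced in this thread; the `doubled` helper is made up for the example.

```shell
#!/bin/sh
# Print a value doubled; pure arithmetic, no system changes.
doubled() {
    echo $(( $1 * 2 ))
}

# Assumed keys: the macOS System V shared memory settings.
for key in kern.sysv.shmmax kern.sysv.shmmni kern.sysv.shmseg kern.sysv.shmall; do
    # sysctl -n prints only the value; skip keys this system does not have.
    current=$(sysctl -n "$key" 2>/dev/null) || continue
    [ -n "$current" ] || continue
    echo "$key: current=$current doubled=$(doubled "$current")"
done
```

Applying new values is a separate step; the linked page is the place to check for how and where to set them.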

http://docking.utep.edu/result.php?resultid=262841
____________
Dublin, CA
Team SETI.USA

zombie67 [MM]
Volunteer tester

Joined: Sep 18 06
Posts: 207
ID: 114
Credit: 2,817,648
RAC: 0
Message 3404 - Posted 17 Jun 2007 16:18:45 UTC

w00t!

I am not sure if that was the fix, or if something else changed that fixed it. But in any case, I am now getting results to validate.
____________
Dublin, CA
Team SETI.USA

David Ball
Forum moderator
Volunteer tester

Joined: Sep 18 06
Posts: 274
ID: 115
Credit: 1,634,401
RAC: 0
Message 3405 - Posted 17 Jun 2007 23:12:54 UTC

That's Great.

Congratulations!

-- David
____________
The views expressed are my own.
Facts are subject to memory error :-)
Have you read a good science fiction novel lately?

Andre Kerstens
Forum moderator
Project tester
Volunteer tester

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 3407 - Posted 19 Jun 2007 3:30:02 UTC - in response to Message ID 3404.

That's great news!

I will make an entry in our FAQ about this fix. It might be able to help others too.

Thanks,
Andre

w00t!

I am not sure if that was the fix, or if something else changed that fixed it. But in any case, I am now getting results to validate.


____________
D@H the greatest project in the world... a while from now!
