compute error



Message boards : Unix/Linux : compute error

Profile KWSN - Sir Grawlfang
Volunteer tester

Joined: Oct 2 06
Posts: 1
ID: 163
Credit: 68,976
RAC: 0
Message 949 - Posted 10 Oct 2006 22:45:59 UTC

Hi everyone,

I've recently upgraded from Slackware 10.2 to Zenwalk 3.0 (so from a 2.4.31 kernel to "2.6.17.11 #1 SMP PREEMPT Tue Sep 5 16:25:38 CEST 2006 i686 athlon-4 i386 GNU/Linux"), and now every work unit I crunch seems to fail after a few minutes of processing (5 minutes into a job that is estimated to take 2 hours 50 minutes) with the following output:

<core_client_version>5.4.9</core_client_version>
<message>
process exited with code 1 (0x1)
</message>
<stderr_txt>
Calling BOINC init.
Starting charmm run...
ERROR - Charmm exited with code 1.
Calling BOINC finish.

</stderr_txt>

any more information you need, just let me know.

Ni!
Fang

Profile Conan
Volunteer tester

Joined: Sep 13 06
Posts: 219
ID: 100
Credit: 4,256,493
RAC: 0
Message 963 - Posted 12 Oct 2006 5:56:03 UTC

I believe this is the same fault that plagues nearly all Linux distributions.
You need to add "ulimit -s unlimited" (without the quotation marks) at the beginning of the run_manager file in the Boinc folder, and you may also have to place it at the beginning of the run_client file, also in the Boinc folder.
So stop the Boinc programme, make the changes, then restart Boinc Manager.
This worked for my Fedora Core 3 Linux, but I had to reboot one machine because the settings did not take effect from just stopping Boinc Manager.
Hope this helps.
See the threads in the general number crunching section for more information.
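
One way to make the change described above without opening an editor is sketched here. The helper name add_ulimit_line and the example path in the comment are hypothetical, and the sketch assumes run_manager and run_client are plain shell scripts; it is not an official procedure from the project.

```shell
# Hypothetical helper: put "ulimit -s unlimited" at the top of a launcher
# script (right after the #! line if there is one), unless it is already there.
add_ulimit_line() {
    script="$1"
    # Nothing to do if the exact line is already present
    grep -qx 'ulimit -s unlimited' "$script" && return 0
    tmp="$script.tmp.$$"
    awk 'NR == 1 && /^#!/ { print; print "ulimit -s unlimited"; next }
         NR == 1          { print "ulimit -s unlimited" }
                          { print }' "$script" > "$tmp" &&
        cat "$tmp" > "$script" &&   # overwrite in place so permissions are kept
        rm -f "$tmp"
}

# Example call (the path is an assumption, adjust it to your Boinc folder):
#   add_ulimit_line "$HOME/BOINC/run_client"
```

Running it a second time changes nothing, because a file that already contains the line is left alone. Stop Boinc before editing and restart it afterwards, as described above.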
____________

Profile Conan
Volunteer tester

Joined: Sep 13 06
Posts: 219
ID: 100
Credit: 4,256,493
RAC: 0
Message 1262 - Posted 6 Nov 2006 4:59:41 UTC

> @ Andre, Memo and the Docking team.

I have found a little something in the problem with "ulimit".
The solution that you found by adding 'ulimit -s unlimited' at the start of 'run_manager' works fine for single core processors, even dual processor/single core, but does not work for dual core or dual processor/dual core combinations.
I had a dual processor AMD Opteron 848 computer that had no trouble when 'ulimit -s unlimited' was added to 'run_manager' and Boinc restarted, with the machine not needing a reboot.
When the same line was added on my AMD Opteron 275 (dual core/dual CPU) it would not work. I had to add the same line to 'run_client' and then had to reboot the computer to make it work.
I have just upgraded the 848 machine to a 285 dual core and found my Docking workunits failing after 5 minutes 32 seconds, just like they used to before the 'fix'.
The cure was the same as on the other dual core computer (my 275): I had to add the 'ulimit' line to 'run_client' and then reboot the machine, as starting Boinc without a reboot had no effect.

So I'm not sure if this helps in any way, but it is what I have found, and as I am supposed to be testing things for you I thought I would let you know.
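
To see whether a restart really took effect, it can help to read the limit of the running client instead of guessing. On later kernels (2.6.24 and newer, so not the ones discussed in this thread) each process lists its limits in /proc/<pid>/limits. The sketch below parses a file in that format; the function name stack_soft_limit and the process name boinc in the commented example are assumptions.

```shell
# Print the soft stack limit from a file in /proc/<pid>/limits format.
# The line of interest looks like:
#   Max stack size            8388608              unlimited            bytes
stack_soft_limit() {
    awk '/^Max stack size/ { print $4 }' "$1"
}

# Example (assumes the client process is called "boinc"):
#   stack_soft_limit "/proc/$(pgrep -o -x boinc)/limits"
```

If this still prints a number rather than "unlimited" after the restart, the client was started by something that never ran the ulimit line.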
____________

Memo
Forum moderator
Project developer
Project tester

Joined: Sep 13 06
Posts: 88
ID: 14
Credit: 1,666,392
RAC: 0
Message 1263 - Posted 6 Nov 2006 5:11:44 UTC - in response to Message ID 1262 .

> @ Andre, Memo and the Docking team.

I have found a little something in the problem with "ulimit".
The solution that you found by adding 'ulimit -s unlimited' at the start of 'run_manager' works fine for single core processors, even dual processor/single core, but does not work for dual core or dual processor/dual core combinations.
I had a dual processor AMD Opteron 848 computer that had no trouble when 'ulimit -s unlimited' was added to 'run_manager' and Boinc restarted, with the machine not needing a reboot.
When the same line was added on my AMD Opteron 275 (dual core/dual CPU) it would not work. I had to add the same line to 'run_client' and then had to reboot the computer to make it work.
I have just upgraded the 848 machine to a 285 dual core and found my Docking workunits failing after 5 minutes 32 seconds, just like they used to before the 'fix'.
The cure was the same as on the other dual core computer (my 275): I had to add the 'ulimit' line to 'run_client' and then reboot the machine, as starting Boinc without a reboot had no effect.

So I'm not sure if this helps in any way, but it is what I have found, and as I am supposed to be testing things for you I thought I would let you know.


It sure helps. What distro are you running? I helped a friend apply the fix on a quad processor Xeon with hyper-threading; we just did the usual without a reboot and it worked. I wonder if your situation is caused by the distro or by the dual core Opterons? Anyway, thanks for the info.
Profile Conan
Volunteer tester

Joined: Sep 13 06
Posts: 219
ID: 100
Credit: 4,256,493
RAC: 0
Message 1299 - Posted 7 Nov 2006 1:22:17 UTC

Thanks Memo, I am using Fedora Core 3.
____________

daniele
Volunteer tester

Joined: Oct 23 06
Posts: 86
ID: 190
Credit: 6,702
RAC: 0
Message 1314 - Posted 8 Nov 2006 2:47:29 UTC

Oh, for me it's strange to add something to the manager parameters, since the manager does nothing with the WUs. I added the ulimit command just before starting the client in the init script. Does your distribution also require the ulimit stack setting for the manager?

The problem could really be the distribution; it wouldn't be the first problem with FC. Maybe your distro is not so good on these monster processors.
Or maybe it's because FC creates new issues daily just to have fun. :P

Profile Conan
Volunteer tester

Joined: Sep 13 06
Posts: 219
ID: 100
Credit: 4,256,493
RAC: 0
Message 1320 - Posted 8 Nov 2006 15:29:33 UTC - in response to Message ID 1314 .

Oh, for me it's strange to add something to the manager parameters, since the manager does nothing with the WUs. I added the ulimit command just before starting the client in the init script. Does your distribution also require the ulimit stack setting for the manager?

The problem could really be the distribution; it wouldn't be the first problem with FC. Maybe your distro is not so good on these monster processors.
Or maybe it's because FC creates new issues daily just to have fun. :P


>> Hello daniele,
When I started this project the 'fix' for getting Linux to work on most distros was to add the 'ulimit -s unlimited' parameter to the 'run_manager' file.
This worked on my single core dual processor AMD 848 when I restarted Boinc and processed workunits fine.
When I added the dual core dual processor AMD 275 machine this fix did not work; I can't recall if I rebooted the machine or not. Anyway, it was suggested to me by another tester, and I think Andre agreed, that I should also add the line to the 'run_client' file. I did this and it made no difference, so I rebooted the machine and then everything started working.
When I then upgraded the 848 machine (I only changed the CPUs in the computer, so everything else was the same), I struck the same problem I had with the 275 machine, and adding 'ulimit' to 'run_client' plus a reboot got it working again.
As for Fedora Core being a problem distro, I have had no problems with it. The only thing I run on it is Boinc and nothing else, so my Linux knowledge is not great, as I just installed the distro and with a few tweaks everything works for what I need. If I knew how to upgrade the distro I would like to try FC5, but if it isn't broke why change it?
Your knowledge of Linux of any type far surpasses what I know.

____________
daniele
Volunteer tester

Joined: Oct 23 06
Posts: 86
ID: 190
Credit: 6,702
RAC: 0
Message 1327 - Posted 9 Nov 2006 3:27:52 UTC - in response to Message ID 1320 .
Last modified: 9 Nov 2006 3:28:24 UTC

When I started this project the 'fix' for getting Linux to work on most distros was to add the 'ulimit -s unlimited' parameter to the 'run_manager' file.
This worked on my single core dual processor AMD 848 when I restarted Boinc and processed workunits fine.


Hi Conan :)
I don't know why; it's probably a difference between the Debian package and Fedora's. It's not rare for Debian maintainers to add or remove something from packages. For example, I don't have a run_manager script, and I run the client as the boinc user. Anyway, if you are happy with FC and I'm happy with a Debian-based distro, we can be friends all the same. :)

I'm not saying FC is a bad distro, it's good indeed, but I prefer more solid and coherent "system administration structures". It's not for nothing that tools like apt are spreading to many different, non-Debian-based distros.
Profile suguruhirahara
Forum moderator
Volunteer tester

Joined: Sep 13 06
Posts: 282
ID: 15
Credit: 56,614
RAC: 0
Message 1363 - Posted 10 Nov 2006 17:17:20 UTC

I crunched one result on debian via VMWare, and the result experienced 0x1 error.
http://docking.utep.edu/result.php?resultid=45124

Is there any way to avoid this error? I'm not good with the Linux command line, so please explain it simply. Also, could someone tell me what "ulimit -s unlimited" means and how it works when crunching d@h?

thanks,
suguruhirahara
____________

I'm a volunteer participant; my views are not necessarily those of Docking@Home or its participating institutions.

daniele
Volunteer tester

Joined: Oct 23 06
Posts: 86
ID: 190
Credit: 6,702
RAC: 0
Message 1366 - Posted 10 Nov 2006 21:00:22 UTC - in response to Message ID 1363 .
Last modified: 10 Nov 2006 21:04:55 UTC

I crunched one result on debian via VMWare, and the result experienced 0x1 error.
http://docking.utep.edu/result.php?resultid=45124

Is there any way to avoid this error? I'm not good with the Linux command line, so please explain it simply. Also, could someone tell me what "ulimit -s unlimited" means and how it works when crunching d@h?

thanks,
suguruhirahara


Sure, I have never used VMWare, but if it sets up a virtual debian system in which you can move as if you were in the real one, then you have installed the .deb packages for the client and the manager and now the client starts automatically when you "boot" linux. Is it right? Or do you start the client manually?

If it starts "at boot", then look for the file

boinc-client

in the directory /etc/init.d

More concisely, the file is

/etc/init.d/boinc-client

Tell me if you want me to write out the commands to use in the shell directly, or if you prefer the graphical way.

This file is a script executed at boot time with the additional parameter "start", this way (or similar)

/etc/init.d/boinc-client start

and in it there's a function named start()

Here is mine; the line you have to add is the "highlighted" one

start()
{
    log_begin_msg "Starting $DESC: $NAME"
    if is_running; then
        log_progress_msg "already running"
    else
        ###################################
        # Lift the stack limit for the client and everything it launches
        ulimit -s unlimited
        ###################################
        start-stop-daemon --start --quiet --background --pidfile "$PIDFILE" \
            --make-pidfile --user "$BOINC_USER" --chuid "$BOINC_USER" --chdir "$BOINC_DIR" \
            --exec "$BOINC_CLIENT" -- $BOINC_OPTS
    fi
    log_end_msg 0
}



This way you'll make the minimum change to the behaviour of the system, since it will apply only to the client process and not to all the processes of the user.

If you start the client in other ways, then you have to add the line in the script you use. If you are not using any scripts, tell me what you are doing :)
But since it's debian, there should be no problems.



What is `ulimit -s unlimited`?
This command gives the shell instance in which you type it the ability to use an unlimited stack when executing code. This "permission" applies only to that shell and to the processes started by that shell. Generally speaking, there are "different kinds" of shell, or better, there are different ways to start one.

When you execute a script, it runs in a separate shell, which is closed when the script stops. If you give the ulimit -s command in that script, then that shell will have an unlimited stack. When the script starts a new client, this new process has an unlimited stack too. But in a short time the script will finish and the shell will be closed, so the client would die with its parent.
To avoid this there are many methods; one is to start the client as a daemon (a service running in the background), which is fair 'cause it's exactly what we want to do with the client, with or without the unlimited stack :)
So, as you can see in the start() function, the command start-stop-daemon is used. When the script stops, the client remains alive, with the unlimited stack permission. Its children, i.e. the apps launched for the projects, will have an unlimited stack as well, and the client is a session manager (...).
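
The inheritance described here can be tried out safely in a console. The sketch below lowers the limit instead of raising it, because lowering is normally always allowed; the value 1024 is an arbitrary choice for the demonstration and the variable names are made up.

```shell
# Limit of the current shell before the experiment
before=$(ulimit -S -s)

# In a subshell: change the soft stack limit, then start a child process
# and let the child report the limit it inherited
child=$( ulimit -S -s 1024; sh -c 'ulimit -S -s' )

# Limit of the current shell afterwards
after=$(ulimit -S -s)

echo "this shell, before:    $before"
echo "child of the subshell: $child"
echo "this shell, after:     $after"
```

The child should report the changed value, while the shell you typed the commands in keeps its old one. That is the same mechanism that lets the apps started by the client inherit the limit set in the init script.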


If you want to stop the client to make some experiments, you can type in a console

/etc/init.d/boinc-client stop

or if you prefer

/etc/init.d/boinc-client restart


This is why debian rules

:)
Profile suguruhirahara
Forum moderator
Volunteer tester

Joined: Sep 13 06
Posts: 282
ID: 15
Credit: 56,614
RAC: 0
Message 1368 - Posted 11 Nov 2006 4:44:46 UTC - in response to Message ID 1366 .
Last modified: 11 Nov 2006 5:13:48 UTC

Thanks for informative reply, daniele:) I'd like to answer/ask by lines.

now the client starts automatically when you "boot" linux. Is it right? Or do you start the client manually?

I start client manually by clicking executable file "run_client" on X Window.

Tell me if you want me to write out the commands to use in the shell directly, or if you prefer the graphical way.

I prefer the graphical way. I've almost forgotten how to use the shell, though I learned it during the previous semester at university :(

In this file, which is a script executed at boot time with the additional parameter "start", this way (or similar)

/etc/init.d/boinc-client start

At first, I cannot find boinc-client in that directory :( hmm...

If you start the client in other ways, then you have to add the line in the script you use. If you are not using any scripts, tell me what you are doing :)
But since it's debian, there should be no problems.

As I mentioned above, I started BOINC Manager on X Window by clicking run_manager. So I suppose there doesn't appear any script. (Edit: Is this file itself script to start boinc?)

What is `ulimit -s unlimited`?
This command gives the shell instance in which you type it the ability to use an unlimited stack when executing code. This "permission" applies only to that shell and to the processes started by that shell. Generally speaking, there are "different kinds" of shell, or better, there are different ways to start one.

When you execute a script, it runs in a separate shell, which is closed when the script stops. If you give the ulimit -s command in that script, then that shell will have an unlimited stack. When the script starts a new client, this new process has an unlimited stack too. But in a short time the script will finish and the shell will be closed, so the client would die with its parent.
To avoid this there are many methods; one is to start the client as a daemon (a service running in the background), which is fair 'cause it's exactly what we want to do with the client, with or without the unlimited stack :)
So, as you can see in the start() function, the command start-stop-daemon is used. When the script stops, the client remains alive, with the unlimited stack permission. Its children, i.e. the apps launched for the projects, will have an unlimited stack as well, and the client is a session manager (...).

Then, does the ulimit -s command not come into play if no script is used to start Boinc (i.e. does starting Boinc graphically not require dealing with the ulimit -s command)?

Or should I put the line into the run_client (script?) file?

[Edit2: I'm now learning how to install... I'll be back after that]
____________

I'm a volunteer participant; my views are not necessarily those of Docking@Home or its participating institutions.
Profile suguruhirahara
Forum moderator
Volunteer tester

Joined: Sep 13 06
Posts: 282
ID: 15
Credit: 56,614
RAC: 0
Message 1370 - Posted 11 Nov 2006 5:48:10 UTC
Last modified: 11 Nov 2006 6:02:53 UTC

Finally I finished setup and started crunching of one result.

I went to the debian third party website which was introduced at Boinc official website, and installed boinc-client.sh and boinc-manager.sh. Through root account I added ulimit into the start function of boinc-client.sh. Then I started Boinc Manager on X Window and one result is being computed. So far so good.

I haven't seen incorrect function error yet. By adding ulimit is this error avoided?

Thanks,
suguruhirahara

[edit: After several minutes the result encountered 0x1 error again!]

____________

I'm a volunteer participant; my views are not necessarily those of Docking@Home or its participating institutions.

daniele
Volunteer tester

Joined: Oct 23 06
Posts: 86
ID: 190
Credit: 6,702
RAC: 0
Message 1373 - Posted 11 Nov 2006 13:11:46 UTC - in response to Message ID 1368 .
Last modified: 11 Nov 2006 13:16:43 UTC

I start client manually by clicking executable file "run_client" on X Window.


So you are not using the Debian package, but the standard one from the Boinc team.
I suggest using the Debian one if you are running Debian.
Anyway, if you prefer to use the one you are running now, the line goes in the run_client script, you are right :)

As I said, you can't find those files because they were created by the Debian maintainers to give the Boinc client a structure coherent with other system services.
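
One thing to keep in mind when putting the line in run_client: a shell can only raise its soft limit as far as the hard limit, so on a system where the hard stack limit is not unlimited, a plain "ulimit -s unlimited" fails and the client starts with the old value. A minimal sketch of a more forgiving variant follows; the helper name raise_stack_limit is hypothetical, and the commented exec line is only an assumption about what the rest of run_client looks like.

```shell
# Hypothetical helper for the top of run_client: raise the soft stack limit
# as far as the hard limit allows, and say so if "unlimited" is refused.
raise_stack_limit() {
    if ulimit -S -s unlimited 2>/dev/null; then
        return 0
    fi
    # "unlimited" was refused, so settle for the hard limit
    hard=$(ulimit -H -s)
    echo "warning: stack limit capped at $hard (hard limit)" >&2
    ulimit -S -s "$hard"
}

raise_stack_limit
# The script would then go on to start the client, for example (assumed):
#   exec ./boinc "$@"
```

If the warning appears, the hard limit itself has to be raised by the administrator before the client can get an unlimited stack.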
daniele
Volunteer tester

Joined: Oct 23 06
Posts: 86
ID: 190
Credit: 6,702
RAC: 0
Message 1374 - Posted 11 Nov 2006 13:15:40 UTC - in response to Message ID 1370 .
Last modified: 11 Nov 2006 13:18:03 UTC

Finally I finished setup and started crunching of one result.

I went to the debian third party website which was introduced at Boinc official website, and installed boinc-client.sh and boinc-manager.sh. Through root account I added ulimit into the start function of boinc-client.sh. Then I started Boinc Manager on X Window and one result is being computed. So far so good.

I haven't seen incorrect function error yet. By adding ulimit is this error avoided?


I'm sorry I can't examine those scripts right now, but if the ulimit command doesn't work it could be related to how the virtual machine runs. To be clearer, I don't know what the VM does when a process asks for an unlimited stack.

Would you paste the boinc-client.sh script here?
Profile suguruhirahara
Forum moderator
Volunteer tester

Joined: Sep 13 06
Posts: 282
ID: 15
Credit: 56,614
RAC: 0
Message 1375 - Posted 11 Nov 2006 16:45:17 UTC

I just added the highlighted line "ulimit -s unlimited" to boinc-client :( Perhaps I should have configured it more carefully!

Well, since there seem to be several testers of Debian, including Daniele, I don't have to test that one anymore. Instead I tested another distribution, "Vine Linux", which derives from Red Hat. On that, one result was crunched without the 0x1 error.
____________

I'm a volunteer participant; my views are not necessarily those of Docking@Home or its participating institutions.
