Shorter work units or BOINC gone silly?


Profile Webmaster Yoda
Volunteer tester
Avatar

Joined: Nov 14 06
Posts: 43
ID: 271
Credit: 6,498
RAC: 0
Message 1538 - Posted 21 Nov 2006 11:29:16 UTC
Last modified: 21 Nov 2006 11:29:46 UTC

My 2.4GHz Pentium 4 has suddenly decided to download over 30 work units. This is odd because:

1. My "connect to network every" is set to 0.1 days
2. I am using an unmodified, standard BOINC client (5.4.11) - no inflated benchmarks (if anything, they're very low for this PC)
3. The average turnaround on this PC is 2.27 days
4. Deadline for these WUs is 26 November.

It seems to have happened right after BOINC ran its benchmarks, so perhaps it's a problem with BOINC. I've never seen it go this silly though:


--- 21/11/2006 18:47:07 Running CPU benchmarks
--- 21/11/2006 18:48:06 Benchmark results:
--- 21/11/2006 18:48:06 Number of CPUs: 1
--- 21/11/2006 18:48:06 1078 floating point MIPS (Whetstone) per CPU
--- 21/11/2006 18:48:06 2177 integer MIPS (Dhrystone) per CPU
--- 21/11/2006 18:48:06 Finished CPU benchmarks
--- 21/11/2006 18:48:07 Resuming computation
--- 21/11/2006 18:48:07 Rescheduling CPU: Resuming computation
--- 21/11/2006 18:48:07 Resuming network activity
Docking@Home 21/11/2006 18:48:07 Resuming task 1tng_mod0001_11944_308664_2 using charmm version 503
Docking@Home 21/11/2006 19:03:35 Sending scheduler request to http://docking.utep.edu/docking_cgi/cgi
Docking@Home 21/11/2006 19:03:35 Reason: To fetch work
Docking@Home 21/11/2006 19:03:35 Requesting 635712704 seconds of new work


I set it to "no new work" as soon as I saw it and will abort all but two of these WUs so they can be re-issued, then re-install BOINC. Something's obviously amiss.

Glad I happened to be looking at that PC as it happened. Imagine 20 years' worth of work with a 5-day deadline - I don't think this CPU will last that long ;D
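For what it's worth, the 635712704-second request in the log really does work out to about 20 years (a back-of-the-envelope check, nothing more):

```python
request_s = 635_712_704   # "Requesting ... seconds of new work" from the log above
days = request_s / 86_400  # seconds per day
years = days / 365
print(f"{days:.0f} days, about {years:.1f} years")
```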

Has anyone seen this strange behaviour before?
____________


Join the #1 Aussie Alliance on Docking@Home
STE\/E [BlackOpsTeam]
Volunteer tester

Joined: Nov 14 06
Posts: 47
ID: 292
Credit: 10,082,802
RAC: 0
Message 1539 - Posted 21 Nov 2006 12:06:47 UTC

Yes, the BOINC Client can do weird things @ times ... IMO

Heck, I've had Projects set to 0.0 Resource Share, No new work selected in the Client for that Project & still have had the project send me work ... 0_o

Profile Webmaster Yoda
Volunteer tester
Avatar

Joined: Nov 14 06
Posts: 43
ID: 271
Credit: 6,498
RAC: 0
Message 1540 - Posted 21 Nov 2006 12:12:45 UTC - in response to Message ID 1539 .
Last modified: 21 Nov 2006 12:13:03 UTC

Yes, the BOINC Client can do weird things @ times ... IMO

Heck, I've had Projects set to 0.0 Resource Share, No new work selected in the Client for that Project & still have had the project send me work ... 0_o


Yeah, it has to be BOINC. A clean install is needed when these WUs are finished. Your supercharged X6800 might have some hope of completing 33 WUs in 5 days, but my poor old P4 hasn't got a hope ;D

____________


Join the #1 Aussie Alliance on Docking@Home
Profile suguruhirahara
Forum moderator
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 282
ID: 15
Credit: 56,614
RAC: 0
Message 1541 - Posted 21 Nov 2006 12:25:50 UTC

It seems that many hosts have downloaded more results than they can compute in time. So did my hosts. I suspect the deadline is too short. On another thread, named "No windows work?", I asked Andre to work on this issue...

thanks for reading,
suguruhirahara
____________

I'm a volunteer participant; my views are not necessarily those of Docking@Home or its participating institutions.

STE\/E [BlackOpsTeam]
Volunteer tester

Joined: Nov 14 06
Posts: 47
ID: 292
Credit: 10,082,802
RAC: 0
Message 1542 - Posted 21 Nov 2006 12:35:44 UTC - in response to Message ID 1540 .
Last modified: 21 Nov 2006 12:39:49 UTC

Your supercharged X6800 might have some hope to complete 33 WU in 5 days, but my poor old P4 hasn't got a hope ;D


By my calculations my X6800EE should be able to do 75 in a 5-day time span, and the E6600s I have attached aren't too far behind the X6800 - they should be able to get over 65 done. They all downloaded about 50 WUs yesterday so I should be able to get them done with room to spare ...

I'll probably throw 2 or 3 more E6600's at the Project in the next week or so after I see how the WU's I have already work out ... :)
Profile suguruhirahara
Forum moderator
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 282
ID: 15
Credit: 56,614
RAC: 0
Message 1543 - Posted 21 Nov 2006 12:44:41 UTC - in response to Message ID 1542 .

Your supercharged X6800 might have some hope to complete 33 WU in 5 days, but my poor old P4 hasn't got a hope ;D


By my calculations for my X6800EE it should be able to do 75 in a 5 day time span, the E6600's I have attached aren't to far behind the X6800 though, they should be able to get over 65 done. They all downloaded about 50 WU's yesterday so I should be able to get them done with room to spare ...

I'll probably throw 2 or 3 more E6600's at the Project in the next week or so after I see how the WU's I have already work out ... :)

I'm a bit afraid that these super-fast machines will get all the workunits and prevent PIII machines from reproducing the errors we need in order to find a solution...lol
STE\/E [BlackOpsTeam]
Volunteer tester

Joined: Nov 14 06
Posts: 47
ID: 292
Credit: 10,082,802
RAC: 0
Message 1544 - Posted 21 Nov 2006 12:52:30 UTC - in response to Message ID 1543 .
Last modified: 21 Nov 2006 12:53:03 UTC

I'm bit afraid that these super good machines would get all workunits and prevent PIII machines from reproducing errors in order to find out a solution of the error...lol


Naaaaa, right now I don't even have a call in for more work & I've lowered my connection time too. I really don't need that many WU's @ 1 time, can't process for any other projects when I get that many all @ once ...
Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 1564 - Posted 21 Nov 2006 16:32:13 UTC - in response to Message ID 1541 .

Maybe this has to do with the fact that for some time you all got the 'committed to other platforms' message and the BOINC client is trying to catch up whenever it gets a chance. It still doesn't make sense to download more work than can ever be completed before the deadline, though... Can an experienced boincer explain a bit more about the work fetch policy, maybe?

Andre

PS Deadline is 5 days currently.

It seems that many hosts have downloaded results more than needed / computed in time. So did my hosts. It's supposed that deadline is too short. On another thread named "No windows work?" I asked Andre to work on this issue...

thanks for reading,
suguruhirahara


____________
D@H the greatest project in the world... a while from now!
STE\/E [BlackOpsTeam]
Volunteer tester

Joined: Nov 14 06
Posts: 47
ID: 292
Credit: 10,082,802
RAC: 0
Message 1575 - Posted 21 Nov 2006 17:17:26 UTC - in response to Message ID 1564 .

Can an experienced boincer explain a bit more about the work fetch policy maybe?

Andre

PS Deadline is 5 days currently.



Well, I'm not a real authority on it, but having run BOINC projects since the onset of the Seti beta, I've had a lot of experience learning how to adjust my preferences to get more or less work from a project.

A lot of variables come into play: your resource share, your connection time, your debt to the other projects you are attached to, and your benchmarks. And if you're over-inflating your benchmarks you can throw all that out, because you're going to get more work than the PC is capable of processing if the other settings are too high.

John McCloud II can explain it in more detail, he's the Authority on most of this BOINC stuff, don't know if he is Attached to the Project yet ...
John McLeod VII
Volunteer tester
Avatar

Joined: Oct 3 06
Posts: 9
ID: 179
Credit: 240,291
RAC: 0
Message 1577 - Posted 22 Nov 2006 3:09:41 UTC

There is a bug in 5.4.11 and earlier where a task that is downloading does not count as present on the system, so more work is requested until something actually finishes downloading.

This is fixed in the alpha builds. When a task is committed to the system, it is counted toward the work on the system, blocking further downloads up to the estimate of its processing time.
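A toy sketch of the accounting JM VII describes (not the actual client code; the names and numbers here are made up for illustration):

```python
def seconds_to_request(cache_target_s, tasks, count_downloading):
    """How much new work to ask for, given the tasks already on the host."""
    on_hand_s = sum(t["est_s"] for t in tasks
                    if count_downloading or t["state"] != "downloading")
    return max(0, cache_target_s - on_hand_s)

tasks = [{"est_s": 10_800, "state": "downloading"}] * 30  # 30 tasks mid-download

# 5.4.11 and earlier: downloading tasks are invisible, so the client keeps asking
print(seconds_to_request(8_640, tasks, count_downloading=False))  # 8640

# Alpha builds: committed tasks count, so nothing more is requested
print(seconds_to_request(8_640, tasks, count_downloading=True))   # 0
```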

STE\/E [BlackOpsTeam]
Volunteer tester

Joined: Nov 14 06
Posts: 47
ID: 292
Credit: 10,082,802
RAC: 0
Message 1579 - Posted 22 Nov 2006 4:12:07 UTC

Sorry about the name butcher there John, at least I got the First name right anyway ... ;)

[B^S] Morgan the Gold
Volunteer tester
Avatar

Joined: Oct 2 06
Posts: 41
ID: 170
Credit: 138,735
RAC: 0
Message 1582 - Posted 22 Nov 2006 10:16:07 UTC
Last modified: 22 Nov 2006 10:21:21 UTC

Hundreds of WUs? Been there, done that (uFluids and SZTAKI, not here yet).

It's called a 'work bomb' and usually happens because either:

    * the last 5 or more WUs took under a minute
    * the only WUs available were short

Nah, JM VII is probably right.

____________

Profile Atomic Booty
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 59
ID: 23
Credit: 30,370
RAC: 0
Message 1591 - Posted 23 Nov 2006 0:22:27 UTC
Last modified: 23 Nov 2006 0:48:18 UTC

This was a problem for many of us when Docking first opened its alpha stage. My computer downloaded 300 WUs in the first few minutes after attaching to the project. BOINC (or my system, or some code in the app, whatever is responsible for making this judgement) initially estimated the WUs at 5 minutes of crunching, as opposed to their more accurate time of 3 hours. Even though my "connect every _" was set to 0.1 days, the WUs just poured in to the point that my system was lagging (which is when I noticed it and shut it down). Most of us had no choice but to abort a lot of work.

The quota was set to 50/day after that, to prevent downloads on such a massive scale, but I am unaware of any other steps the devs may have taken to address this problem. Again, I'm not altogether sure where it stems from (server vs. client side, BOINC app, project app, etc.). Since this has only been an issue for first-time users (after crunching a WU, the DCF normalizes and makes a more reliable estimate of the time it will take to complete the next one), and this project has been more or less closed to new users since the beginning, this problem has not cropped up in a while.
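To put rough numbers on that (illustrative values only, matching the 5-minute vs. 3-hour figures above): once the first WU completes, the DCF jumps to actual/estimated and subsequent fetches shrink accordingly:

```python
est_min = 5                # server's initial per-WU estimate (~5 minutes)
actual_min = 180           # observed runtime (~3 hours)
cache_min = 0.1 * 24 * 60  # "connect every 0.1 days" = 144 minutes of work

print(cache_min / est_min)          # ~29 WUs look like one cache-fill at first
dcf = actual_min / est_min          # DCF after the first completed result: 36
print(cache_min / (est_min * dcf))  # now less than one WU fills the cache
```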

You can read more about it in the Too Many WUs! thread.

Atomic
____________
KWSN - Asylum for the Cynically Insane

John McLeod VII
Volunteer tester
Avatar

Joined: Oct 3 06
Posts: 9
ID: 179
Credit: 240,291
RAC: 0
Message 1595 - Posted 23 Nov 2006 0:55:59 UTC - in response to Message ID 1591 .

This was a problem for many of us when Docking first opened alpha stage. My computer downloaded 300 WU's in the first few minutes after attatching to the project. BOINC (or my system, or some code in the app, whatever is responsible for making this judgement) initially interpreted the WUs to be 5 minutes long to crunch, as opposed to their more accurate time of 3 hours. Even though my "connect every _" was set to 0.1 days, the WUs just poured in to the point that my system was lagging (which is when I noticed it and shut it down). Most of us had no choice but to abort a lot of work.

The quota was set to 50/day after that, to prevent such a massive scale of downloads, but I am unaware of any other steps that devs may have taken to address this problem. Again, I'm not altogether sure where it stems from (server vs. client side, BOINC app, project app, etc.). Since this has only been an issue for first time users (after crunching a WU, the DCF normalizes and makes a more reliable estimation of the time it will take to complete the next one), and this project has been more or less closed to new users since the beginning, this problem has not crept up in awhile.

You can read more about it in the Too Many WUs! thread.

Atomic

The developers set an estimate of the floating point operations required to complete a typical result. Therefore it is the server side where the cause of the problem lies.
Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 1600 - Posted 23 Nov 2006 4:45:19 UTC - in response to Message ID 1595 .

We calculate the estimate based on the runtime of a result on a 3 GHz linux box. I assume that when the estimated runtime on my boinc manager is equal to the actual runtime, the FP estimate must be correct and will correctly scale to all other platforms. If that is not the case, could there be a problem in boinc? If this is not the correct way of getting a good FP estimate, please somebody step forward and tell us how to do this :-)
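One way to read Andre's method in numbers (everything here is an assumption for illustration, including the reference box's benchmark figure):

```python
# Hypothetical figures: a ~3 h run on the 3 GHz Linux reference box,
# whose Whetstone benchmark we assume to be 1.5 GFLOPS.
ref_runtime_s = 3 * 3600
ref_whetstone = 1.5e9
rsc_fpops_est = ref_runtime_s * ref_whetstone  # server-side FP-ops estimate

# A client predicts its runtime by dividing by its own benchmark,
# e.g. the 1078 MIPS Whetstone P4 from the log at the top of the thread.
client_whetstone = 1.078e9
predicted_s = rsc_fpops_est / client_whetstone
print(predicted_s / 3600)  # ~4.2 h predicted; actual runtimes were far longer
```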

Thanks,
Andre

The developers set an estimate of the floating point operations required to complete a typical result. Therefore it is the server side where the cause of the problem lies.


____________
D@H the greatest project in the world... a while from now!
John McLeod VII
Volunteer tester
Avatar

Joined: Oct 3 06
Posts: 9
ID: 179
Credit: 240,291
RAC: 0
Message 1602 - Posted 23 Nov 2006 16:28:44 UTC - in response to Message ID 1600 .

We calculate the estimate based on the runtime of a result on a 3 GHz linux box. I assume that when the estimated runtime on my boinc manager is equal to the actual runtime, the FP estimate must be correct and will correctly scale to all other platforms. If that is not the case, could there be a problem in boinc? If this is not the correct way of getting a good FP estimate, please somebody step forward and tell us how to do this :-)

Thanks,
Andre

The developers set an estimate of the floating point operations required to complete a typical result. Therefore it is the server side where the cause of the problem lies.


It seems that you picked a system that is a bit more efficient than average at these computations. Of the machines that I have attached, most are in the range of 5 to 6, which means the calculations are taking 5 to 6 times as long as expected.

Not all computers are exactly as efficient as expected on a given calculation. Therefore, the Duration Correction Factor was developed to deal with these discrepancies. Some of the things that matter but are not measured in the benchmarks are: L1 and L2 cache sizes and speeds, memory bandwidth, and memory size (too small can cause thrashing and very slow results - see BURP and Render@Home for worst-case examples). If all of the DCFs for the project average about 1, then this is the best that can be expected.

The real problems arise when a project is off by more than an order of magnitude in its estimates. If the actual average is around 5, it is not too bad, but it could be improved.
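The DCF's behaviour can be caricatured like this (a deliberate simplification of the real BOINC rule, with made-up numbers; the client raises the factor quickly when it underestimates and lowers it only slowly):

```python
base_est_s = 7200   # uncorrected per-task estimate (2 h)
actual_s = 36_000   # what tasks really take (10 h)
dcf = 1.0
for _ in range(3):
    predicted_s = base_est_s * dcf
    if actual_s > predicted_s:
        dcf = actual_s / base_est_s                 # underestimate: raise at once
    else:
        dcf += 0.1 * (actual_s / base_est_s - dcf)  # overestimate: drift down slowly
print(dcf)  # settles at 5.0, i.e. tasks take 5x the base estimate
```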
Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 1603 - Posted 23 Nov 2006 21:56:40 UTC - in response to Message ID 1602 .

Hi John,

Do you mean that if we would increase the FP estimate for our workunit, the estimate made by the boinc client would be better? (that would be easy to do)

Thanks

Andre

It seems that you picked a system that is a bit more efficient than average at these computations. Of the machines that I have attached, most are in the range of 5 to 6 which means that the calculations are taking 5 to 6 times as long as expected.

Not all computers are exactly as efficient as expected on a given calculation. Therefore, the Duration Correction Factor was developed to deal with these discrepencies. Some of the things that matter that are not measured in the benchmarks are: L1 and L2 cache sizes and speeds, memory bandwidth, memory size (too small can cause thrashing and very slow results - see BURP and Render@Home for worst case examples). If all of the DCFs for the project average about 1, then this is the best that can be expected.

The real problems arise when a project is off by more than an order of magnitude in its estimates. If the actual averate is around 5, it is not too bad, but it could be improved.


____________
D@H the greatest project in the world... a while from now!
John McLeod VII
Volunteer tester
Avatar

Joined: Oct 3 06
Posts: 9
ID: 179
Credit: 240,291
RAC: 0
Message 1604 - Posted 24 Nov 2006 0:38:38 UTC - in response to Message ID 1603 .

Hi John,

Do you mean that if we would increase the FP estimate for our workunit, the estimate made by the boinc client would be better? (that would be easy to do)

Thanks

Andre

It seems that you picked a system that is a bit more efficient than average at these computations. Of the machines that I have attached, most are in the range of 5 to 6 which means that the calculations are taking 5 to 6 times as long as expected.

Not all computers are exactly as efficient as expected on a given calculation. Therefore, the Duration Correction Factor was developed to deal with these discrepencies. Some of the things that matter that are not measured in the benchmarks are: L1 and L2 cache sizes and speeds, memory bandwidth, memory size (too small can cause thrashing and very slow results - see BURP and Render@Home for worst case examples). If all of the DCFs for the project average about 1, then this is the best that can be expected.

The real problems arise when a project is off by more than an order of magnitude in its estimates. If the actual averate is around 5, it is not too bad, but it could be improved.


Yes, I believe that is probable. The factor to change the fpops estimate by should be about the average of the current duration_correction_factors you have in your database/clients. After you do this, it will take some time for the DCF values to reach equilibrium again, as they will be headed down.

BTW, the worst estimates I have seen from any project were off by a factor of about 1000 on the low side, and by about E+70 on the high side. So yours is not too bad for an initial estimate.
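In other words, the rescaling JM VII suggests is just a multiplication (hypothetical numbers):

```python
old_fpops_est = 1.0e13
avg_dcf = 5.5  # say, the average DCF across hosts in the database
new_fpops_est = old_fpops_est * avg_dcf

# With the bigger estimate, a host that previously needed DCF = 5.5 now
# needs ~1.0, so its stored DCF drifts back down over the next few results.
print(new_fpops_est)
```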
Profile [B^S] Acmefrog
Volunteer tester
Avatar

Joined: Nov 14 06
Posts: 45
ID: 252
Credit: 1,604,407
RAC: 0
Message 1611 - Posted 25 Nov 2006 15:36:44 UTC

Will completed WUs that finish past the deadline still count or should I just abort those? I also had received too many for me to complete even with my machine running 24/7. I don't want to waste time if it doesn't matter. Thanks.
____________

Profile suguruhirahara
Forum moderator
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 282
ID: 15
Credit: 56,614
RAC: 0
Message 1612 - Posted 25 Nov 2006 15:47:17 UTC - in response to Message ID 1611 .
Last modified: 25 Nov 2006 15:49:51 UTC

Will completed WUs that finish past the deadline still count or should I just abort those? I also had received too many for me to complete even with my machine running 24/7. I don't want to waste time if it doesn't matter. Thanks.

I fear not, unfortunately. If the deadline passes, the server produces and distributes a new result, regarding the late one as "no reply".
____________

I'm a volunteer participant; my views are not necessarily those of Docking@Home or its participating institutions.
Profile [B^S] Acmefrog
Volunteer tester
Avatar

Joined: Nov 14 06
Posts: 45
ID: 252
Credit: 1,604,407
RAC: 0
Message 1613 - Posted 25 Nov 2006 16:47:59 UTC

10-4 I just didn't want to waste the time if the result didn't matter.

Thanks.
____________

John McLeod VII
Volunteer tester
Avatar

Joined: Oct 3 06
Posts: 9
ID: 179
Credit: 240,291
RAC: 0
Message 1616 - Posted 25 Nov 2006 18:56:41 UTC - in response to Message ID 1612 .

Will completed WUs that finish past the deadline still count or should I just abort those? I also had received too many for me to complete even with my machine running 24/7. I don't want to waste time if it doesn't matter. Thanks.

I fear no unfortunately. If the deadline past, the server produce and distribute a new result, regarding the late result as "no reply".

Actually, it depends on how late the return is, how fast the server is at generating new tasks when a result is late, how fast they are handed out, and how fast they are returned.

If the late work is returned before the replacement is actually sent, the replacement will not be sent. If the late work is returned before the replacement is verified, then the late work will at least contribute towards the verification.

If the late work is returned after a quorum is met and verified, it counts for nothing.
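JM VII's three cases can be summarised as a simple decision (a plain restatement of the post above, not actual server code):

```python
def late_result_outcome(before_resend: bool, before_validation: bool) -> str:
    """What a past-deadline result is worth, per the cases above."""
    if before_resend:
        return "replacement never sent; result counts normally"
    if before_validation:
        return "contributes toward verification of the quorum"
    return "quorum already validated; counts for nothing"

print(late_result_outcome(False, True))
```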
Profile suguruhirahara
Forum moderator
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 282
ID: 15
Credit: 56,614
RAC: 0
Message 1618 - Posted 25 Nov 2006 19:32:38 UTC - in response to Message ID 1616 .

Will completed WUs that finish past the deadline still count or should I just abort those? I also had received too many for me to complete even with my machine running 24/7. I don't want to waste time if it doesn't matter. Thanks.

I fear no unfortunately. If the deadline past, the server produce and distribute a new result, regarding the late result as "no reply".

Actually, it depends on how late the return is, how fast the server is at generating new tasks when a result is late, how fast they are handed out, and how fast they are returned.

If the late work is returned before the replacement is actually sent, the replacement will not be sent. If the late work is returned before the replacement is verified, then the late work will at least contribute towards the verification.

If the late work is returned after a quorum is met and verified, it counts for nothing.

Thanks for the information.

BTW, everyone can use the rating function... I tested it just now :)
____________

I'm a volunteer participant; my views are not necessarily those of Docking@Home or its participating institutions.
Profile Webmaster Yoda
Volunteer tester
Avatar

Joined: Nov 14 06
Posts: 43
ID: 271
Credit: 6,498
RAC: 0
Message 1717 - Posted 6 Dec 2006 3:27:12 UTC - in response to Message ID 1600 .

We calculate the estimate based on the runtime of a result on a 3 GHz linux box. I assume that when the estimated runtime on my boinc manager is equal to the actual runtime, the FP estimate must be correct and will correctly scale to all other platforms. If that is not the case, could there be a problem in boinc?


This is still a problem. I just re-attached my 3.4GHz Pentium 4 with HT, running Windows.

It typically takes 10 hours per work unit and BOINC asked for 1 day's worth. Your server sent 19 work units, which will take this computer 4 full days even if I suspend everything else on it and don't switch it off at night.

Docking has only a 20% share of resources and the computer only runs 18 hours a day, so I only needed 1 work unit. BOINC is certainly partly to blame - it asked for two days' worth (1 day per logical CPU) when it should have asked for about 7 hours of work (20% of 18 hours a day available on each logical CPU).

But the WU time estimates are still way off the mark. Had BOINC requested only 7 hours worth, your scheduler (which seems to think they will take 1:45 each when in reality they take 10 hours) might still have sent 4 work units (about 40 hours of crunching, rather than the 7 needed).
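Yoda's arithmetic, spelled out (the 1:45 scheduler estimate and 10 h actual runtime are from the post; the rest follows from the 20% share and 18 h/day uptime):

```python
hours_on_per_day = 18
docking_share = 0.20
logical_cpus = 2

needed_h = hours_on_per_day * docking_share * logical_cpus  # ~7.2 h of work/day
sched_est_h = 1.75  # scheduler thinks 1:45 per WU
actual_h = 10.0     # what a WU really takes on this P4

wus_sent = round(needed_h / sched_est_h)
print(needed_h, wus_sent, wus_sent * actual_h)  # ~7.2 h needed -> 4 WUs -> 40 h real
```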

The fastest I've completed a WU that was valid on my Athlon XP 3000+ was 4 hours (running Linux) and 6.5 hours (running Windows).

Even if we take into account that these WU take longer under Windows than Linux, it's still too big a discrepancy. My fastest PC (Athlon 64 at 2.6GHz with 500MHz dual channel DDR) still takes over 5 hours per work unit under Windows...

Just for the record, I'm running with the stock BOINC client and low benchmarks, so the discrepancy in estimated time to complete is not a matter of inflated benchmarks.

Might be worth comparing the benchmarks on your 3GHz Linux system and my 3.4Ghz Windows system?
____________


Join the #1 Aussie Alliance on Docking@Home
John McLeod VII
Volunteer tester
Avatar

Joined: Oct 3 06
Posts: 9
ID: 179
Credit: 240,291
RAC: 0
Message 1718 - Posted 6 Dec 2006 4:13:53 UTC - in response to Message ID 1717 .

BOINC is certainly partly to blame - it asked for two days worth (1 day per logical CPU) when it should have asked for about 7 hours of work (20% of 18 hours a day available on each logical CPU).

This will be fixed in 5.8.x (the next released version). The code is already in the client, but other problems are holding up the release.
Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 1719 - Posted 6 Dec 2006 4:39:47 UTC - in response to Message ID 1717 .

Thanks for the info Yoda.

I've increased the flops estimate to almost double what it was before. Please let us know if that makes the estimate on your client better. This will always be a problematic point, though, since the performance of Charmm on Linux, Windows and Macs is not the same (and won't be any time soon), but boinc only allows one estimate to be provided per workunit (I think). We will have to see how we can tackle that problem.

Thanks
Andre

PS It might take a little while until the workunits with the new estimated flops are distributed.


This is still a problem. I just re-attached my 3.4GHz Pentium 4 with HT, running Windows.

It typically takes 10 hours per work unit and BOINC asked for 1 day worth. Your server sent 19 work units, which will take this computer 4 full days if I suspend everything else on it and don't switch it off at night.

Docking has only a 20% share of resources and the computer is only running 18 hours a day so I only needed 1 work unit. BOINC is certainly partly to blame - it asked for two days worth (1 day per logical CPU) when it should have asked for about 7 hours of work (20% of 18 hours a day available on each logical CPU).

But the WU time estimates are still way off the mark. Had BOINC requested only 7 hours worth, your scheduler (which seems to think they will take 1:45 each when in reality they take 10 hours) might still have sent 4 work units (about 40 hours of crunching, rather than the 7 needed).

The fastest I've completed a WU that was valid on my Athlon XP 3000+ was 4 hours (running Linux) and 6.5 hours (running Windows).

Even if we take into account that these WU take longer under Windows than Linux, it's still too big a discrepancy. My fastest PC (Athlon 64 at 2.6GHz with 500MHz dual channel DDR) still takes over 5 hours per work unit under Windows...

Just for the record, I'm running with the stock BOINC client and low benchmarks, so the discrepancy in estimated time to complete is not a matter of inflated benchmarks.

Might be worth comparing the benchmarks on your 3GHz Linux system and my 3.4Ghz Windows system?


____________
D@H the greatest project in the world... a while from now!
John McLeod VII
Volunteer tester
Avatar

Joined: Oct 3 06
Posts: 9
ID: 179
Credit: 240,291
RAC: 0
Message 1720 - Posted 6 Dec 2006 4:47:09 UTC

The worst problem only exhibits itself until the client has completed its first result. After that point the Duration Correction Factor will be large enough to prevent download of vast quantities of work.

Profile Webmaster Yoda
Volunteer tester
Avatar

Joined: Nov 14 06
Posts: 43
ID: 271
Credit: 6,498
RAC: 0
Message 1721 - Posted 6 Dec 2006 5:19:20 UTC
Last modified: 6 Dec 2006 5:28:24 UTC

I think over-estimating the time taken to complete a work unit is less likely to cause problems than under-estimating. Doubling the estimate is certainly a step in the right direction.

When it's under-estimated, many (Windows) hosts will download more work than they can handle. When it is over-estimated, Linux hosts will not download enough, but there is less chance of work units going past their deadline.

The DCF will adjust, sure. But it's too late if the computer has too much to handle in the first place due to the under-estimated time.

On the other hand, with Linux hosts computing them quicker, their DCF will also adjust quicker, so the impact of the inaccurate time estimate is not as great.

I hope I'm making sense?

With regard to the BOINC scheduler being fixed in new versions (as well as the Linux benchmarks), that's great, especially for people who run more than one project. I look forward to it :D

FWIW, I have just loaded BOINC on the P4/3.4 (with Ubuntu 6.10 Live CD) and am running a Docking WU to compare times. Hope it has enough RAM as I haven't got a partition to put Linux on with this box (using a RAMdisk). Fingers crossed.

EDIT: Well, that didn't work either. Even though ulimit is unlimited on this distro, it still crashed both WU with an 0x1 error. Back to Windows for me - it may be slow but it works.
____________


Join the #1 Aussie Alliance on Docking@Home

Profile Conan
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 219
ID: 100
Credit: 4,256,493
RAC: 0
Message 1722 - Posted 6 Dec 2006 7:09:19 UTC

Hello Webmaster Yoda, I found this information by Cold Shot in another thread, that may help your Linux problem with Ubuntu 6.10,

Ubuntu 6.10
Linux 2.6.17-10
Boinc client and manager 5.4.11-1

This distro has no run_manager or run_client.
Added ulimit -s unlimited to boinc-client, a script file located in /etc/init.d.

This seems to have fixed the problem. I'm running Linux on a slow box (Celeron 1200 MHz, 384 mem). It took me a little over 8.5 hours to complete the first work unit.

Hope this helps.
____________

Profile Webmaster Yoda
Volunteer tester
Avatar

Joined: Nov 14 06
Posts: 43
ID: 271
Credit: 6,498
RAC: 0
Message 1723 - Posted 6 Dec 2006 7:35:06 UTC - in response to Message ID 1722 .

Hello Webmaster Yoda, I found this information by Cold Shot in another thread, that may help your Linux problem with Ubuntu 6.10,

Ubuntu 6.10
Linux 2.6.17-10
Boinc client and manager 5.4.11-1

This distro has no run_manager or run_client
Added ulimit -s unlimited to boinc-client a script file located in /etc/init.d

This seems to have fixed the problem. I'm running Linux on a slow box (Celeron 1200 MHz, 384 mem). It took me a little over 8.5 hours to complete the first work unit.

Hope this helps.


I used the official BOINC 5.4.9 (I didn't see a later one other than development versions). ulimit was already set to unlimited (checked that) but I guess I could specify it just to be sure.

Am trying to resurrect an old hard-drive to see if the lack of swapfile (running just in RAMdisk) has anything to do with getting this same error.
____________


Join the #1 Aussie Alliance on Docking@Home
Profile David Ball
Forum moderator
Volunteer tester
Avatar

Joined: Sep 18 06
Posts: 274
ID: 115
Credit: 1,634,401
RAC: 0
Message 1725 - Posted 6 Dec 2006 9:42:22 UTC

Hello Webmaster Yoda.

I use Redhat distributions, but I've heard it mentioned in the forums that the Debian maintainers have a package that installs the BOINC client as a service. I believe it's also available for Ubuntu.

I think someone got Linux/BOINC to work on the free VMware player with windows as the host.

I suspect that part of the problem you're experiencing is the big debug file that Docking@Home writes. If the work unit succeeds, I think this file is replaced by one that's only 8 or 9 bytes before it's uploaded. Until then, it grows rapidly and is probably using up your RAM disk.

I'm on dial-up or I'd have tried the VMware approach myself. I suspect that the delayed writes that Linux uses are part of the reason it's faster. If that's the case, BOINC on Linux on VMware on Windows might even be faster than BOINC on Windows. There might also be different compilers used for the Linux version. I'm not sure on that. If you have a fast connection and some space free on your windows filesystem, this might be worth trying. I think it just creates files in the windows filesystem to hold the virtual Linux disk partitions, so you don't have to have unpartitioned space on your disk drive to create real Linux partitions.

I hope this helps,

-- David

Profile Webmaster Yoda
Volunteer tester
Avatar

Joined: Nov 14 06
Posts: 43
ID: 271
Credit: 6,498
RAC: 0
Message 1729 - Posted 7 Dec 2006 4:05:22 UTC - in response to Message ID 1725 .

Thanks David, I tried it under VMware as well, but it (and Linux) doesn't like my wireless network, so I gave up on that idea.

It's summer here and with the hard disk constantly on the go (particularly when running two Docking work units side by side), it gets too hot and the computer crashes (and the work unit errors out in the process). I'll probably have to abandon Docking for the time being, as the constant disk writes are wreaking havoc with my systems.

I don't get these lock-ups when running other BOINC projects.


____________


Join the #1 Aussie Alliance on Docking@Home

Rene
Volunteer tester
Avatar

Joined: Oct 2 06
Posts: 121
ID: 160
Credit: 109,415
RAC: 0
Message 1732 - Posted 7 Dec 2006 6:18:25 UTC - in response to Message ID 1729 .
Last modified: 7 Dec 2006 6:18:58 UTC

Thanks David, I tried it under VMware as well, but it (and Linux) doesn't like my wireless network, so I gave up on that idea.


Did you have an active firewall on the VMware network adapter(s)?

I've turned the firewall off on both my VMware adapters, because WinXP didn't seem to work well while it was enabled: Norton just did not come up, and VMware complained about the network not being active.

Turning the firewall off on those adapters cleared my problem.

;-)

____________
Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 1736 - Posted 7 Dec 2006 17:07:34 UTC - in response to Message ID 1729 .


It's summer here and with he hard-disk constantly on the go (particularly when running two docking work units side by side), it gets too hot and the computer crashes (and the work unit errors out in the process). I'll probably have to abandon Docking for the time being as the constant disk writes are wreaking havoc with my systems.


The checkpointing method and period will be the first thing we work on when Charmm c33b1 is released by the Charmm developers. Hopefully that won't take much longer (although I'm slowly getting desperate...), and it will bring the disk activity down to an acceptable level.

Also, we currently write a lot of debug information to the charmm logfile, which is another cause of heavy disk activity. Since we are in alpha we have to do this to track down problems more easily; as soon as we have most of our pressing problems solved, we can cut back on this too.
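The checkpoint-period idea can be sketched in a few lines of generic Python (an illustration, not Charmm or BOINC code): state is written to disk only when the configured period has elapsed, so lengthening the period directly reduces disk traffic at the cost of redoing more work after a crash. The period value and state format below are hypothetical.

```python
import json
import os
import time

CHECKPOINT_PERIOD = 60.0  # seconds between state writes (hypothetical value)

def run(steps, state_file, period=CHECKPOINT_PERIOD, clock=time.monotonic):
    """Run `steps` units of work, checkpointing at most once per `period`."""
    last_checkpoint = clock()
    writes = 0
    for step in range(steps):
        # ... one unit of simulation work would happen here ...
        if clock() - last_checkpoint >= period:
            # Write atomically: dump to a temp file, then rename over the
            # old checkpoint, so a crash mid-write never corrupts the file.
            tmp = state_file + ".tmp"
            with open(tmp, "w") as f:
                json.dump({"step": step}, f)
            os.replace(tmp, state_file)
            writes += 1
            last_checkpoint = clock()
    return writes
```

With a longer `period`, `writes` shrinks proportionally, which is exactly the disk-activity reduction being discussed.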

Thanks
Andre
____________
D@H the greatest project in the world... a while from now!
Profile Webmaster Yoda
Volunteer tester
Avatar

Joined: Nov 14 06
Posts: 43
ID: 271
Credit: 6,498
RAC: 0
Message 1741 - Posted 8 Dec 2006 0:10:51 UTC - in response to Message ID 1736 .


The checkpointing method and period will be the first thing we work on when Charmm c33b1 is released by the Charmm developers. Hopefully that won't take much longer (although I'm slowly getting desperate...), and it will bring the disk activity down to an acceptable level.

Also, we currently write a lot of debug information to the charmm logfile, which is another cause of heavy disk activity. Since we are in alpha we have to do this to track down problems more easily; as soon as we have most of our pressing problems solved, we can cut back on this too.


Thanks Andre. I'll keep an eye on the news and message boards.
____________


Join the #1 Aussie Alliance on Docking@Home
Memo
Forum moderator
Project developer
Project tester

Joined: Sep 13 06
Posts: 88
ID: 14
Credit: 1,666,392
RAC: 0
Message 1758 - Posted 12 Dec 2006 16:41:59 UTC - in response to Message ID 1725 .

Hello Webmaster Yoda.

I use Redhat distributions, but I've heard it mentioned in the forums that the Debian maintainers have a package that installs the BOINC client as a service. I believe it's also available for Ubuntu.

I think someone got Linux/BOINC to work on the free VMware player with windows as the host.

I suspect that part of the problem you're experiencing is the big debug file that the Docking@Home writes. If the work unit succeeds, I think this file is replaced by a file that's only 8 or 9 bytes before it's uploaded. Until then, it grows rapidly and is probably using up your ram disk.

I'm on dial-up or I'd have tried the VMware approach myself. I suspect that the delayed writes that Linux uses are part of the reason it's faster. If that's the case, BOINC on Linux on VMware on Windows might even be faster than BOINC on Windows. There might also be different compilers used for the Linux version. I'm not sure on that. If you have a fast connection and some space free on your windows filesystem, this might be worth trying. I think it just creates files in the windows filesystem to hold the virtual Linux disk partitions, so you don't have to have unpartitioned space on your disk drive to create real Linux partitions.

I hope this helps,

-- David


I got Linux running on VMware under Windows and indeed it runs faster (D@H).

