There is work but it is committed to other platforms
Message boards : Number crunching : There is work but it is committed to other platforms
Author | Message | |
---|---|---|
Docking Team, going on the above message I am getting on my Linux machine, you have run out of work.
|
||
ID: 2368 | Rating: 0 | rate: / | ||
I have the same problem here with my AMD K6-III currently. I guess the workunits available are committed to other platforms due to Homogeneous Redundancy. However, it is a bit odd that your Opteron/Linux is not getting any work, because there are quite a number of compatible AMD K7, AMD K8, Intel PII and PIII hosts. At least they were available... I hope they are not all switched off / detached now... So the only thing we can do, I suppose, is keep them polling and crunch for other projects in the meantime. My K6-III is doing a little bit of ABC and SIMAP currently, as slow as it is. Regards Alex ____________ |
||
ID: 2370 | Rating: 0 | rate: / | ||
Docking Team, going on the above message I am getting on my Linux machine, you have run out of work. I have a total of six WUs on eight Linux CPUs.............kinda dry. |
||
ID: 2371 | Rating: 0 | rate: / | ||
Not much we can do about this at the moment. I suspect that Alex is right and a lot of machines might not be available anymore. I guess this is a disadvantage of using HR: we need enough machines per class. We will try to come up with some way of showing what is in the shared memory and which HR class each workunit is assigned to, so that it will be easier for you guys to see what is going on.
|
||
ID: 2372 | Rating: 0 | rate: / | ||
|
||
ID: 2380 | Rating: 0 | rate: / | ||
This is now hopefully fixed. It just dawned on me that I forgot to apply the patch we got from the World Community Grid people that solves this 'work committed to other platforms' problem. Silly me....
____________ D@H the greatest project in the world... a while from now! |
||
ID: 2381 | Rating: 0 | rate: / | ||
See the front page news item on this. AK ____________ D@H the greatest project in the world... a while from now! |
||
ID: 2382 | Rating: 0 | rate: / | ||
This is now hopefully fixed. It just dawned on me that I forgot to apply the patch we got from the World Community Grid people that solves this 'work committed to other platforms' problem. Thanks for the fix. My K6-III has now got another workunit. @all: Btw, there are at least four results unsent for K6/Windows ( here and here ) and the last one gave the two other members of the quorum and me 124 credits. This was the first Docking credit at all for my K6. So if you have a K6 online, you might want to grab one of these results... perhaps. Regards Alex |
||
ID: 2383 | Rating: 0 | rate: / | ||
Thanks Andre, unfortunately I don't think the patch is working, as I am still getting the error message that there is no work available for my platform (Linux). My Windows machines are not having this problem.
|
||
ID: 2384 | Rating: 0 | rate: / | ||
We'll keep on checking what can be the cause of this. I've also requested one of the students to build a tool that will show us which HR classes are currently in the shared memory. This will give us a better feel for which classes are running out of work. This might take a while though...
Thanks Andre, unfortunately I don't think the patch is working, as I am still getting the error message that there is no work available for my platform (Linux). My Windows machines are not having this problem. ____________ D@H the greatest project in the world... a while from now! |
||
ID: 2385 | Rating: 0 | rate: / | ||
Hi,
|
||
ID: 2389 | Rating: 0 | rate: / | ||
Great! That's good news.
Hi, ____________ D@H the greatest project in the world... a while from now! |
||
ID: 2391 | Rating: 0 | rate: / | ||
Great! That's good news. Well, it would be good news, but it's back today. It's not as bad though. Will having more client machines running D@H make this more of a problem or less of a problem? ____________ The views expressed are my own. Facts are subject to memory error :-) Have you read a good science fiction novel lately? |
||
ID: 2393 | Rating: 0 | rate: / | ||
More client machines will be better as the throughput in the shared memory will increase, thus more new workunits that are unassigned to an HR class will enter the game.
Great! That's good news. ____________ D@H the greatest project in the world... a while from now! |
||
ID: 2394 | Rating: 0 | rate: / | ||
More client machines will be better as the throughput in the shared memory will increase, thus more new workunits that are unassigned to an HR class will enter the game. I guess I had it just ass-backwards..........I started starving some machines thinking that would leave the "few" available WUs for others. |
||
ID: 2395 | Rating: 0 | rate: / | ||
Hello j2satx, is your K6 still online? Maybe you want to grab one or two of these unsent results here : Workunit 24800 Workunit 25108 Of course we would still need a third one, then. But we have already shared the following quorum, which gave some nice credits: Workunit 22173 Think about it... This is the first project where I have started to sell results... should I be worried? Could be the next, more serious stage of BOINC addiction. Regards Alex My results during the HR tests ____________ |
||
ID: 2396 | Rating: 0 | rate: / | ||
Alex, just for you I'll put it online for a couple of WUs........can't get tooo excited about 30+ hour WUs. |
||
ID: 2397 | Rating: 0 | rate: / | ||
@Alex, OK, converted to 5.8.8 and caught a WU. We'll see how long it takes..........BoincManager shows about 38 hours. |
||
ID: 2398 | Rating: 0 | rate: / | ||
Guys, this is REAL team work :-)
|
||
ID: 2399 | Rating: 0 | rate: / | ||
Guys, this is REAL team work :-) @Andre, If you could give us a matrix of the HR classes and number of machines in each class, maybe some other machines could be resurrected to fill out the classes. |
||
ID: 2400 | Rating: 0 | rate: / | ||
@Andre, If you could give us a matrix of the HR classes and number of machines in each class, maybe some other machines could be resurrected to fill out the classes. We'll see what we can come up with today/tomorrow. AK ____________ D@H the greatest project in the world... a while from now! |
||
ID: 2401 | Rating: 0 | rate: / | ||
|
||
ID: 2404 | Rating: 0 | rate: / | ||
I don't see where I caught any on my Linux boxes.........I'd already set my Windows boxes to no new work. |
||
ID: 2406 | Rating: 0 | rate: / | ||
Something has changed. As soon as I posted that message, one of my machines (the one I posted from) was able to finally get 2 work units. It's a Northwood P4 based Celeron. My 2 AMD machines (1 Linux, 1 WinXP) are still out of luck though. -- David |
||
ID: 2407 | Rating: 0 | rate: / | ||
Two other users also caught each WU that you did. |
||
ID: 2408 | Rating: 0 | rate: / | ||
Thanks. Hope all goes well. All my PCs are self-built (except my notebook) and so I usually keep them for a long time, even when they start to become obsolete. This includes my K6-III which has served me well since 1999. And so it is fun to crunch a few workunits with these oldtimers from time to time. I kept my 286 till 2002 (running Minix) and my last 486 (running WinNT) was disposed of last year. :-) Btw, I have found that ABC, RieselSieve with the sieve app. (not llr) and SIMAP with the simap app. (not hmmer) (the last two only with manually installed non-standard applications) are going well with the K6. All those applications seem to use only or mostly integer calculation. Regards Alex |
||
ID: 2409 | Rating: 0 | rate: / | ||
Memo is going to build a tool that will show us what the contents of the shared memory are, HR classes and all. This will give us a better chance to explain what is happening with the workunits.
@Andre, If you could give us a matrix of the HR classes and number of machines in each class, maybe some other machines could be resurrected to fill out the classes. ____________ D@H the greatest project in the world... a while from now! |
||
ID: 2411 | Rating: 0 | rate: / | ||
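For the curious, a tool like that could in principle get most of the way there with a single database query. Below is a hedged sketch, not the tool Memo is actually writing, and it assumes the standard BOINC schema fields workunit.hr_class and result.server_state (where 2 means unsent, i.e. still waiting in the pool):

```python
# Hypothetical helper: count unsent results per HR class straight from the
# project database. This is a guess at what such a monitoring tool might do,
# not the actual tool being built for Docking@Home.
import MySQLdb  # assumes the MySQL bindings available on the project server

def unsent_results_per_hr_class(db_name="docking"):
    conn = MySQLdb.connect(db=db_name)
    cur = conn.cursor()
    cur.execute(
        "SELECT wu.hr_class, COUNT(*) "
        "FROM result r JOIN workunit wu ON r.workunitid = wu.id "
        "WHERE r.server_state = 2 "   # 2 = unsent in the BOINC schema
        "GROUP BY wu.hr_class"
    )
    # returns a mapping of HR class id -> number of queued results
    return dict(cur.fetchall())
```

The shared memory segment itself is just the feeder's window onto these unsent results, so a per-class count like this would already show which classes are piling up and which are running dry.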
I think you will find most of B@A is over at ABC taking part in AA5, but many should return around the 15th when AA5 is over. Also, like many a Linux user, I won't bring part of my farm back until the credits are fixed. I don't like my C2D earning fewer credits an hour than a Windows P2 or P3. I throw it at projects where the crunching power is appreciated.
|
||
ID: 2412 | Rating: 0 | rate: / | ||
I've been out for a day or two. Working on Einstein@Home til Docking comes back up. I guess I've still got over 3,000 pending, so no big deal! ____________ John MacPro 2 x 2.66GHz Dual-Core Xeon | 2GB RAM | ATI x1900 | BOINC 5.9.5 |
||
ID: 2414 | Rating: 0 | rate: / | ||
> Only getting the occasional WU through on both Windows and Linux, one machine has less than a dozen and the other 3 less than 6 (only 4 left on my Opteron 275, one WU per core).
|
||
ID: 2415 | Rating: 0 | rate: / | ||
WooHoo! Got a couple more workunits today.
|
||
ID: 2418 | Rating: 0 | rate: / | ||
Does running out of WUs mean that Docking has all the help it needs with Alpha testing? Don't think so... just keep that Mac attached... ;-) ____________ |
||
ID: 2453 | Rating: 0 | rate: / | ||
WooHoo! Got a couple more workunits today.

There are multiple snags/problems :-) BOINC 5.8.8 (now 5.8.11) did cause a problem because it changed the system information strings returned by the BOINC client, and I believe that messed up Homogeneous Redundancy (referred to as HR). Basically, different processors return slightly different results due to differences in floating point processing and rounding. HR attempts to group the processors that have the same characteristics so that their results can be compared to verify that they got the right result.

There's also a problem with the shared memory segment on the server side, and they're writing a tool to gather more info to try to find out what is going on. IIRC, Andre or Memo said that more users would actually help alleviate the problem.

If I understand correctly (definitely not guaranteed), the shared memory segment is a pool of available work units. A work unit is assigned to 3 client computers to run (3 is the current quorum). Here's what happens in the shared memory segment:

1) Work units go into the shared memory segment on the server.
2) A client requests work. Its HR class is matched against the HR class of work units in the shared memory.
2A) If an HR match is found AND the machines which have already been assigned to work on the work unit don't belong to the user who owns this client machine, then the work unit is assigned to this client.
2B) If the conditions in 2A aren't met, it looks for a work unit which hasn't been assigned to ANY HR group and, if it finds one, it marks the work unit to only be assigned to this client's HR group and assigns the work unit to this client.
2C) If 2A and 2B aren't met, then you get the "work was committed to other platforms" message and your client will try again in a few minutes. It will try a few times with one minute between tries and then will go to waiting a gradually increasing random time between tries. At some point, more work is added to the pool and the client succeeds in getting work.
3) I'm even more vague on what happens after the work unit has been assigned to enough clients to meet the minimum quorum. I suspect it leaves the shared memory work pool unless one of the machines it is assigned to gets an error or doesn't return the result within the time limit. That's just a guess, though.

I think that some HR groups are filling up the shared memory area because they don't have another machine in the same HR group to issue the work unit to. This is probably caused in part by some HR groups having a couple of very fast machines and some really slow machines (like mine). To feed the fast machines, a bunch of work has to be assigned to their HR group, but it has to wait for one of the slower machines to get around to asking for more work for the quorum to be met and a space to be freed up in the shared memory pool. There might also be limits on how much of the pool one HR group can reserve.

Please remember that much of this is guessing from what I've heard in the various forums and developer lists. I fully expect that Andre will come by in the morning and tell you how much of it I got wrong :-) Anyway, the problems are being worked on and the project needs as wide a variety of clients on it as possible, so please keep crunching. Hopefully, this is at least close to how the process works and will help explain things :-) I'd have to actually get the source code from the CVS repository and spend hours or days looking at it to be sure.
____________ The views expressed are my own. Facts are subject to memory error :-) Have you read a good science fiction novel lately? |
||
ID: 2459 | Rating: 1 | rate: / | ||
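Taking the steps 2A-2C above at face value, the matching could be sketched roughly like this. This is a minimal illustration with made-up names (PoolEntry, Host, match_entry), not the actual BOINC scheduler code:

```python
# Minimal sketch of the 2A-2C matching described above; illustrative only.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PoolEntry:
    workunit_id: int
    hr_class: Optional[int] = None        # None = workunit not yet committed to an HR class
    users_already_assigned: set = field(default_factory=set)

@dataclass
class Host:
    user_id: int
    hr_class: int                         # derived from CPU family + OS

def match_entry(pool: list, host: Host) -> Optional[PoolEntry]:
    # 2A) an entry already committed to this host's HR class, as long as this
    #     user doesn't already hold a sibling of the same workunit
    for e in pool:
        if e.hr_class == host.hr_class and host.user_id not in e.users_already_assigned:
            return e
    # 2B) otherwise take an uncommitted entry and commit its workunit
    #     to this host's HR class
    for e in pool:
        if e.hr_class is None:
            e.hr_class = host.hr_class
            return e
    # 2C) nothing fits: the client gets "work committed to other platforms"
    #     and backs off, retrying with gradually increasing delays
    return None
```

In this picture, step 2B is what gradually commits the pool to whichever HR classes happen to ask first, which may be part of why the rarer classes keep hitting 2C.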
WooHoo! Got a couple more workunits today. Why should it matter if work goes to a client machine owned by the same user which has the WU? I can see why you wouldn't want it to go to the same computer, but why not a computer in the same HR class, no matter who owns it? |
||
ID: 2460 | Rating: 0 | rate: / | ||
Because in principle (and this is coming from the SETI days a long time ago) a user can cheat and inject the same result files on all his/her computers and then trigger BOINC to send these back. I don't think this is that easy to do with the newest BOINC client software, but the mechanism still exists. Andre ____________ D@H the greatest project in the world... a while from now! |
||
ID: 2461 | Rating: 0 | rate: / | ||
Hi David,
Correct. The new BOINC client is (hopefully) going to make this HR filtering easier, but since we have a mix of old and new clients, for the moment it is harder. If you can, everybody please upgrade to the latest version of the BOINC client. In a while we will probably set this as the required minimum client version.
There's not really a problem with the shared memory segment, but at the moment BOINC doesn't supply the tools to see what HR classes are in there, so we've decided to write a tool ourselves.
The shared memory is a pool of available results (replicas). A workunit generates 3 replicas that are stored in the shared memory by the feeder program.
Replicas go into the shared memory segment.
Correct. (except that the workunit is a replica)
Correct. (except that the workunit is a replica)
Correct. (except that the workunit is a replica)
Correct.
Every replica that has been assigned to a host is removed from the SM. If replicas time out or return an error, the system generates a new replica for that workunit and sticks it into the SM for distribution to a new host.
Correct. The fast/slow machine issue is a problem (we will probably give them their own HR class later) and the fact that we don't have many machines attached for certain classes (e.g. K6's or Macs) is a problem.
No, we don't have limits setup. The Leiden Classical guys have actually implemented limits and it seems to work well. We might go that way later on.
That is 100% correct :-) Very good 'guesswork' David!
Thanks Andre ____________ D@H the greatest project in the world... a while from now! |
||
ID: 2463 | Rating: 0 | rate: / | ||
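Putting Andre's corrections together: the shared memory holds replicas rather than workunits, three replicas per workunit (the quorum), a replica leaves the pool as soon as it is handed to a host, and an errored or expired result causes a fresh replica to be generated for the same workunit. A rough sketch of that lifecycle, again with made-up names rather than the real feeder/transitioner code:

```python
# Illustrative replica lifecycle, not actual BOINC server code.
QUORUM = 3          # replicas generated per workunit

shared_memory = []  # the feeder's pool of unsent replicas

def feed_workunit(wu_id, hr_class=None):
    # the feeder cuts each workunit into QUORUM replicas and puts them
    # into shared memory for distribution
    for n in range(QUORUM):
        shared_memory.append({"wu": wu_id, "n": n, "hr_class": hr_class})

def hand_out(replica, host_id):
    # a replica handed to a host leaves shared memory immediately
    shared_memory.remove(replica)
    return {"wu": replica["wu"], "host": host_id, "outcome": None}

def on_error_or_timeout(result, wu_hr_class):
    # an errored or expired result makes the server generate a fresh replica
    # for the same workunit; it re-enters the pool, still committed to the
    # workunit's HR class, and waits for a compatible host
    shared_memory.append({"wu": result["wu"], "n": "resend", "hr_class": wu_hr_class})
```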