Wow, I attached and began crunching with no problems. A few minutes later I noticed some lag while surfing the net, and checked the message window to see if one of my currently problematic projects was causing it. It turned out I was still downloading WUs from Docking@home: 300 in all before I managed to get it to stop. I don't know if it's because the expected completion was ~5 min (it's looking like they will actually take more like a few hours) and the DCF hasn't had a chance to stabilize yet, since I am just working on my first one, but 300 still seems excessive. I hope I can make the deadlines for them all! Anyone else have this happen?
Thanks,
Atomic
Memo
Forum moderator
Project developer
Project tester
Joined: Sep 13 06
Posts: 88
ID: 14
Credit: 1,666,392
RAC: 0
I just got 20 on one of my machines and 4 on another. 300 is a lot; you'd better leave your machine running 24x7!
Yes, I just got 50 WUs on one PC and over 300 WUs on another one. That seems much too much to crunch in time. Maybe you could change the "maximum daily WU quota per CPU" from 500 to 50 or 100 for the moment?
Good idea, setting the quota lower would be very helpful. On my system (Windows, P4 3.2 GHz) it's looking like they will take 4 hours or more apiece. I shut off the tap at 20 WUs, but even so, I can't afford to dedicate this system 100% to this project, so several of mine will be past deadline. Good luck with 300!
That is actually a good idea. I've set it to 50 for now. Let's see what that does. No problem if you need to abort WUs or don't make the deadline: this is an alpha test to iron out problems with the system, we're not too interested in great results yet (although it is okay if we get some great results :-)
Thanks
Andre
Like others here, I too attached and found afterwards that BOINC had, "behind my back", downloaded 144 WUs, all supposedly 9 min 31 s "to completion".
The shame is that after 2 hours and 20 minutes of continuous crunching (on a 3 GHz P4 w/HT), I've only completed 30% of the first 2 WUs...!
So it'll take me the next 432 hours (on a 24/7 basis) to complete all of these WUs, if I let them go to 100% and assuming they all take about 6 hours each.
So your Sept 12th "message" on your home page that says:
"Also our workunits currently take around 1.5 hours on an Intel P4 3GHz PC"
should be amended, if my example is typical.....!
In the meantime, good luck with your project - you'll certainly have our Team's support...!
Thankfully I looked at this board and went back and set my computer to 'no new work'. I seem to have got away with a mere 50 extra after the initial download of twelve. Estimated time: 14 min 20 s; current execution: 20.2% after 2 h 03 min.
So that's ~10 h per WU, 62 WUs, ~120 hours to deadline...
Well, I got "only" 11 results on my first Conroe.
I guess it's because I have only a 0.2-day work cache. Or is it because the machine is already attached to ~30 projects?
How long have you been attached? If you noticed, I 'only' got 12 wu initially...
Perhaps it changed; I also got too many, and I am attached to 35 now ;)
I can top that: 39 for my good old AthlonXP 2200+ with a 0.1-day cache.
Yup, same problem; I got about 95 before I noticed and shut it down. So BOINC is estimating 15 minutes per work unit, which obviously isn't accurate: I'm 3.5 hours into one at about 30%... and it's only a P4 1.6.
Can anyone tell us more about how BOINC actually gets workunits? Meaning, the client requests xx seconds of work from the server... and sends up a benchmark number? And uptime / resource-sharing info? Then the server determines how long each unit should take based on the benchmark, and sends the right number to fill the requested time, adjusted for uptime and sharing? Or is it completely different? Does each project fine-tune the projected time to completion based on results reported from a host (learning as it goes along, not trusting the client when it says "I get 300 Megaflops")?
I noticed particularly with chess960, you'll request say 8000 seconds, and get 3 workunits back... Then make another request for less time, get 3 back, usually until the server says "ok no more". I take it that means there's also a limit per request as well as a limit per day... But that probably wouldn't have helped here, because we'd just make another request anyway...
And even though my notebook should only be queuing up 8640 seconds of work (yes, short, because it's off a lot when traveling and I don't want to take on too much work and miss short deadlines), it seems to request 8640 FROM EACH PROJECT, regardless of how much work it already has queued from a different one... Any ideas about that?
-D
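For what it's worth, the rough model behind the questions above is: the client reports its benchmark and requests N seconds of work; the server estimates each WU's runtime as the workunit's fpops estimate divided by the host's benchmarked flops, and keeps attaching results until the request is covered, capped by the per-request and daily limits. A minimal Python sketch of that fill loop (function and parameter names are illustrative, not actual BOINC code):

```python
def fill_request(requested_seconds, wu_fpops_est, host_flops,
                 max_wus_to_send=50, daily_quota_left=500):
    """Sketch of the scheduler's fill loop: keep adding results until
    the requested seconds are covered, capped by the per-request and
    per-day limits. Illustrative only, not actual BOINC code."""
    est_seconds = wu_fpops_est / host_flops  # server-side runtime estimate
    n = 0
    covered = 0.0
    while (covered < requested_seconds
           and n < max_wus_to_send
           and n < daily_quota_left):
        covered += est_seconds
        n += 1
    return n

# If the server thinks a WU takes ~5 min (3e11 fpops on a 1 GFLOPS host)
# but it really takes hours, a 3-day (259200 s) request gets filled with
# an enormous number of WUs, stopped only by the quota:
print(fill_request(259200, 3e11, 1e9, max_wus_to_send=1000))  # -> 500
```

This is why a wrong fpops estimate explodes the download count: the loop believes the request is "covered" only after hundreds of 5-minute WUs, even though each one actually runs for hours.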
On my Sempron 3100+ my first WU took 5 h to crunch, and I still have 19 WUs to go... everything would be alright if there was a longer deadline. 5 days is not enough to crunch it all (I suspect I'll send results for 5-6 of them...).
95 as soon as I attached. The first one took just over 5.5 hours to finish, and of course it errored out. The correction factor doesn't seem to have kicked in at all and is still showing just over 6 minutes each.
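The "correction factor" here is BOINC's duration correction factor (DCF): a per-project multiplier, starting at 1.0, that scales the server's raw estimate toward observed runtimes. A simplified sketch of the idea (the real client's update rule is similar in spirit but not identical):

```python
def estimated_runtime(rsc_fpops_est, host_flops, dcf):
    """Client-side runtime estimate for one workunit.
    rsc_fpops_est: server's floating-point-op estimate for the WU
    host_flops:    host benchmark (flops/sec)
    dcf:           duration correction factor, starts at 1.0
    """
    return rsc_fpops_est / host_flops * dcf

def update_dcf(dcf, estimated, actual):
    """Simplified DCF update: jump up immediately when WUs run long,
    drift down slowly when they run short."""
    ratio = actual / estimated
    if ratio > 1.0:
        return dcf * ratio                      # raise quickly
    return dcf + 0.1 * (dcf * ratio - dcf)      # lower slowly

# A WU advertised as ~5 minutes that actually takes 4 hours:
dcf = update_dcf(1.0, 5 * 60, 4 * 3600)
print(dcf)  # -> 48.0; later estimates are scaled up by this factor
```

Until at least one result finishes, the DCF is still 1.0, which is why the very first scheduler request trusts the raw ~5-minute estimate and fills the whole cache.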
It would appear that I joined after you all suggested a drop to 50 WUs a day. I received 100 (50 per CPU). I have not checked if they are all working, as I had to go to work.
Good question. We have discussed this in the team, and the best thing is probably to abort these units. I have already cancelled a lot of WUs and stopped the WU generator until we have fixed the problem with the -131 error (we coded a fix yesterday and will test and deploy it today if everything goes OK). So people might run out of work for a short while.
Thanks!
Andre
Has there been any word from the project about what to do with the excessive units?
Ignoring all other issues, I have ~50 WUs that will not finish this side of the deadline. Do I crunch them anyway or bin them?
This is why, when I ALPHA test, I set my preference to 0.08 days, which is 1.92 hours or 115.2 minutes. It's not uncommon for a new project to have the fpops value not exactly correct.
Even at 5 minutes each I could get 23 WUs,
so welcome to ALPHA testing.
Yeah, I normally do that too. I had reset it to 2 days, though, to get some extra for last weekend, since I was going out of town away from a connection, and forgot to set it back :(
-D
There is something seriously wrong with the work fetch for this project.
I run with a 3 day queue, and I'm attached to 10 projects.
I had Docking suspended so it wouldn't fill up my queue. The queue was filled (probably overfilled) with work from 8 other projects. So I felt it was safe to "resume" the Docking project, since there wasn't room in the queue for it to download a bundle.
The instant I clicked "resume" in BOINC Manager, it requested 259,200 seconds of work - 72 hours - 3 full days worth - IGNORING that there was already other work in the queue.
09/17/06 20:29:08||Rescheduling CPU: project resumed by user
09/17/06 20:29:11|Docking@Home|Sending scheduler request to http://docking.utep.edu/docking_cgi/cgi
09/17/06 20:29:11|Docking@Home|Reason: To fetch work
09/17/06 20:29:11|Docking@Home|Requesting 259200 seconds of new work
09/17/06 20:29:22|Docking@Home|Scheduler request succeeded
09/17/06 20:29:24|Docking@Home|Started download of file charmm_5.2_windows_intelx86
09/17/06 20:29:24|Docking@Home|Started download of file 1tng_mod0001_1088_66396.inp
09/17/06 20:29:56|Docking@Home|Finished download of file 1tng_mod0001_1088_66396.inp
Now I have 3 days of work, with short deadlines putting BOINC into emergency EDF mode.
I'm going to reset the project and "suspend" again until this is fixed.
I'm going to post this to the BOINC mailing lists, since BOINC itself is supposed to be handling the work-fetch policy correctly. I'm not aware that we as a project can do anything about it.
Thanks for letting us know. Anybody else who is seeing this behavior?
Andre
Yep, I see it all the time too. I can understand it happening when there's no other work queued for those projects, but it seems to do it all the time. Almost as if whatever time period you've requested is being applied PER PROJECT instead of to the client as a whole... As in, I've seen it request 86400 seconds from BURP, and then immediately thereafter request only 24000 seconds from Chess960 - and it had units queued to chess, but not burp.
Now, I've also noticed messages come back from time to time (from the project during a work request) saying you requested x amount of work, boinc on for yy percent of the time, this project gets zz percent of that, you won't finish in time. And it sends no work - so it seems the server *might* be able to make that determination somehow, but I don't know. It might also be a client generated message that just appears to come from the server - I haven't ethereal'd it so don't know conclusively.
5.4.11, btw, and only on Windows platforms (various xp/2000/2k3)
-D
I've never seen work requests behave any other way, at least in the last several revs of BOINC. It's always annoyed me because it tends to put BOINC in EDF when, imo, it shouldn't. Specifically, if BOINC requests work from a project (i.e. no EDF, uptime or LTD obstacles) -- and if that project has no work already queued on the host, I've never seen BOINC ask for less than a full cache of work for that project. No matter what else is queued for other projects. Assumed it was a "feature" as it's been true for so long.
Maybe Docking should send a max of 10 WUs. The last time I contacted the server I got 29. That is work for 3 days, and I don't WANT to stop all other projects because of Docking.
Sounds like a good idea.
I'm quite sure SETI uses this method to limit fetching a large portion of results at once (to prevent overloading the servers).
(It's a pain if you want to fetch a full cache on a BOINC installation intended for non-internet machines... it takes hours due to deferred communication.)
Maybe the estimated FLOPS are too low, making BOINC think there is actually little work in the queue.
By the way, in config.xml on server you can set maximum workunits to send at a time, and how much time to delay the next request. Setting to 1 wu and 60 seconds will make the server send a single workunit and wait one minute before sending anything else (ever seen "last RPC too recent" message on client?)
EDIT: it's <max_wus_to_send> and <min_sendwork_interval>. Tweaking daily quota may help as well.
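For reference, the two options named above go in the project's server-side config.xml. A sketch of the fragment (values illustrative; check the BOINC server documentation for the exact enclosing structure and defaults):

```xml
<!-- Illustrative fragment of a BOINC project's server config.xml -->
<config>
  <!-- send at most 10 results per scheduler request -->
  <max_wus_to_send>10</max_wus_to_send>
  <!-- make clients wait 60 s between requests
       (the "last RPC too recent" message on the client) -->
  <min_sendwork_interval>60</min_sendwork_interval>
</config>
```

Together with the daily quota, these cap how fast a single host can drain the queue, regardless of how many seconds of work it asks for.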
This seems to be a problem with the boinc client that is going to be fixed in the next release according to John Keck on the boinc_projects mailing list:
"There is one problem mentioned that is a BOINC client problem and will be fixed in the next release, getting a full queue for a low resource share project."
Estimates have already been changed slightly on our side to match the calculation better. We like to keep our deadlines short for now, because they result in quick validation.
Thanks
Andre
____________
D@H the greatest project in the world... a while from now!
I had noticed D@H wasn't recognizing my other cached work, so I have been leaving it set to "No new work" and lowering my connection setting to 0.1 days before I allow more again. Even so, it still gives me 3 or 4 WUs (at 3 hours apiece)... but better than 300! :-O
Make sure you mention this on the boinc_dev mailing list too, because that sounds like a bug in boinc itself. D@H has no clue about any of the other projects you are running; the boinc client handles that for us...
Thanks,
Andre
Client asks for workunits, server sends 50. If client asks again 10 seconds later, server would send another 50 (in this case it won't because of daily quota). If it asks before 10 seconds pass from last request, it will answer 'last RPC too recent'.
Maybe you would want to lower max_wus_to_send so that it sends less workunits at a time.
But isn't that dependent on the amount of work the client requests from the server as well? If it wants 8400 seconds of work, it shouldn't send 50; it should send 4 (for example) if that would fit the request. I think this works okay, because that's what I see on my machine: I ask for 24 hours of work and I get 9 WUs or so (each one takes about 2.5 hours).
Andre
Yeah, sure. That setting is the MAXIMUM number of workunits to send at a time.
A bit off-topic, but I got something that proves this project is being tested energetically.
05/10/2006 21:02:05|Docking@Home|Sending scheduler request to http://docking.utep.edu/docking_cgi/cgi
05/10/2006 21:02:05|Docking@Home|Reason: To fetch work
05/10/2006 21:02:05|Docking@Home|Requesting 259200 seconds of new work
05/10/2006 21:02:11|Docking@Home|Scheduler request succeeded
05/10/2006 21:02:11|Docking@Home|Message from server: No work sent
05/10/2006 21:02:11|Docking@Home|Message from server: (there was work but it was committed to other platforms)
05/10/2006 21:02:11|Docking@Home|No work from project
I've never seen such a message... It shows that everyone is crunching hard, but please have WUs ready for this platform (Windows x86) so that we can test the project more :)
Thanks for reading,
suguruhirahara
____________
I'm a volunteer participant; my views are not necessarily those of Docking@Home or its participating institutions.
And that is great!
Our work generator doesn't monitor the database by itself yet, so we have to generate new workunits manually to make sure we don't run out of work, and sometimes the level gets a bit low (although never to 0!). We are working on a wuGenerator that is a bit more intelligent and can do things by itself.
Thanks
Andre
There is currently no setting for the maximum number of results per project that have not yet been processed on a particular host; however, that has been mentioned and will probably be implemented.
We are working on a wuGenerator that is a bit more intelligent and can do things by itself.
Could you tell me when the generator will be introduced: in this alpha phase, or in the next beta phase? I'd love my computers to crunch many more workunits, but they cannot, as workunits for their platform are no longer available...
I believe that this is something that needs to be fixed before we proceed to the next phase. We will post it on the news section.