HELP - Consistent 0% Progress - Client Problem?
I have a laptop which is refusing to budge past zero percent unit progress despite eating cpu.
ID: 5333
Hi Gandelf,
ID: 5351
Does that machine really only have 512 MB of RAM? Maybe it's swapping to disk.
ID: 5375
I'm seeing the same issue. Three of four tasks show at least 71 hours elapsed, with about 45 minutes to completion, but progress is still 0%. The fourth task hasn't started yet.
ID: 5389
Hi Suzan
ID: 5390
I aborted the workunits. I'll let you know if I see the same problem again.
ID: 5399
> I have a laptop which is refusing to budge past zero percent unit progress despite eating cpu.

I have the same issue on one Intel CPU, a Q9550. My other Intel CPUs (Q6600, Q6700, Q9300, Q9450) on the same motherboard run Docking fine. All run W7RC 64-bit. I did not find a solution to the issue; I just stopped running Docking on the Q9550.
ID: 5401
Same problem here.
ID: 5444
Also no progress here.
ID: 5478
Yes, sadly it is still a mystery. It could be a particular initial random configuration of the ligand or something about BOINC, but so far we have not been able to reproduce this behavior on our machines.
ID: 5487
Just thought I'd add that I've been having the same problem. I currently have a work unit beginning 1c5q that has been running for 12 hours without any progress being made on it. This has happened with several work units but I think I remember them all beginning with 1c5q. Interestingly, when I suspend the project BOINC assigns two cores to other projects and charmm keeps eating up an entire core by itself even though I have BOINC set to use a maximum of two cores right now. It also uses a constant 704K of memory, which seems low but I haven't been paying much attention to what it usually uses.
ID: 5499
Well, it isn't a particular type of work unit as I thought it might be. The same thing happened on a new one I tried, which began with 1hvi. I tried aborting the task, and again the CPU remained in use. I guess Docking@Home will have to be suspended for a little while.
ID: 5500
This is happening to me too. No matter what I try, it won't budge past zero. I detach and reattach, I suspend and resume, I abort and retry. Nothing, and it still eats my processor. Docking@Home is suspended for now; let me know when this problem is fixed. Oh, my setup, if that matters:
ID: 5509
Same thing here.
ID: 5512
So far everyone who has posted their OS has reported using Windows 7 64-bit. Perhaps that is where the problem lies.
ID: 5526
Just thought I'd add that I also have this problem. Yes, I am using Win 7 64-bit, but it's only a recent problem. It started about 3 to 4 days ago. I stopped the tasks at 64 hours of running with 0% complete.
ID: 5529
I have a desktop PC with a P5Q motherboard, an Intel Core 2 Duo E8400, 4 GB DDR2 RAM, and a Sapphire Radeon HD 4670. When I was using Windows Vista Ultimate Service Pack 1 32-bit, Docking@Home was working perfectly, but when I switched to Windows 7 Ultimate 64-bit the work units stopped working correctly: the CPU time (in the "Properties" section of the WU) was always "---" and the graphics window showed "No Model Formed Yet.", even after many hours of continuous processing. The work units used 100% of the CPU core they were assigned to and only a few MB of RAM (more than 2 GB of RAM was free). I tried resetting the project many times, with no success. The other BOINC projects were running fine.

Then I tried running Docking@Home on Ubuntu 9.10 64-bit: this time the work units were processed correctly (the complexes of the work units processed with Linux are so far 1pph, 1qb6 and 1ce5), while on Windows 7 they still aren't working (even the complexes 1qb6 and 1ce5).
ID: 5532
I just joined Docking@Home. I have 2 active WUs. Both are stuck at 43.07% and 1% progress respectively, while their elapsed-time counters keep running for a few hours without any special activity on my PC. Is this normal? Is there any special hardware requirement? Should I abort them?
ID: 5545
@[AF>France>Aquitaine>Cote-Adour-et-Gaves] Bernard du 40
ID: 5550
Confirming the same problem with a Q9550, 4 gigs of RAM on Win 7 64 bit. 0% after a number of hours.
ID: 5563
Another confirmation of WUs with 0% progress.
ID: 5564
31 hours and still at 0 percent. Estimated run time is 3 hours. WTF.
ID: 5569
Just to let you know, I'm having the same problem:
ID: 5573
I see this is an ongoing problem. I just aborted all Docking WUs.
ID: 5587
I'm having the same problem.
ID: 5604
Downloaded and started new work. Three units ran for 13+ hours while showing no progress. After I aborted the Docking work units, the charmm processes continued to run for several hours until I killed them in Task Manager. They had all four cores pegged at 100%. Since Docking@Home shows no interest in fixing this bug, I'll be detaching all computers. I'm sure my CPU cycles can be put to use on another project.
ID: 5622
From what I've seen, the problem may be specific to the combination of Windows 7 and a sufficiently recent Intel CPU. Anyone ready to agree or disagree?
ID: 5645
> From what I've seen, the problem may be specific to the combination of Windows 7 and a sufficiently recent Intel CPU. Anyone ready to agree or disagree?

I've searched the log, and all I could find was:

07:05:05 [Docking@Home] Restarting task 1k1l_93_mod0014trypsin_18710_128757_0 using charmm34 version 623
07:08:11 [Docking@Home] Restarting task 1k1l_93_mod0014trypsin_18710_128757_0 using charmm34 version 623
07:11:16 [Docking@Home] Restarting task 1k1l_93_mod0014trypsin_18710_128757_0 using charmm34 version 623
07:14:21 [Docking@Home] Restarting task 1k1l_93_mod0014trypsin_18710_128757_0 using charmm34 version 623
07:17:26 [Docking@Home] Restarting task 1k1l_93_mod0014trypsin_18710_128757_0 using charmm34 version 623
07:20:31 [Docking@Home] Restarting task 1k1l_93_mod0014trypsin_18710_128757_0 using charmm34 version 623
07:23:36 [Docking@Home] Restarting task 1k1l_93_mod0014trypsin_18710_128757_0 using charmm34 version 623
07:26:41 [Docking@Home] Restarting task 1k1l_93_mod0014trypsin_18710_128757_0 using charmm34

and it just goes on and on and on...
ID: 5647
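The log above shows the same task being restarted roughly every three minutes. A minimal sketch for measuring that from a pasted log, assuming the `HH:MM:SS [Docking@Home] Restarting task <name>` layout shown in this post (other BOINC versions print timestamps differently, so the pattern is an assumption): it groups restart times by task and reports the gap between consecutive restarts.

```python
import re
from datetime import datetime

# Assumed layout, copied from the log excerpt in this post:
# "07:05:05 [Docking@Home] Restarting task <task name> using charmm34 ..."
RESTART_RE = re.compile(r"(\d{2}:\d{2}:\d{2}) \[Docking@Home\] Restarting task (\S+)")


def restart_intervals(log_text):
    """Map each task name to the seconds elapsed between consecutive restarts."""
    stamps = {}
    for clock, task in RESTART_RE.findall(log_text):
        stamps.setdefault(task, []).append(datetime.strptime(clock, "%H:%M:%S"))
    # Only the time of day is logged, so this assumes the excerpt does not cross midnight.
    return {
        task: [int((later - earlier).total_seconds()) for earlier, later in zip(ts, ts[1:])]
        for task, ts in stamps.items()
    }


if __name__ == "__main__":
    excerpt = (
        "07:05:05 [Docking@Home] Restarting task 1k1l_93_mod0014trypsin_18710_128757_0 using charmm34 version 623 "
        "07:08:11 [Docking@Home] Restarting task 1k1l_93_mod0014trypsin_18710_128757_0 using charmm34 version 623 "
        "07:11:16 [Docking@Home] Restarting task 1k1l_93_mod0014trypsin_18710_128757_0 using charmm34 version 623"
    )
    for task, gaps in restart_intervals(excerpt).items():
        print(task, len(gaps) + 1, "restarts, gaps (s):", gaps)
```

On the three entries in the demo it reports gaps of 186 and 185 seconds. The regex works whether the pasted entries are on separate lines or run together on one line, as they were in the original post.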
Looks like the cause of the problem is different from what I've seen before.
ID: 5648
I've suggested that the users of this thread post here to keep it all together. Having more people with the same problem raises the profile somewhat, but doesn't seem to help much yet.
ID: 5654
Same here, no Docking WUs show any progress.
ID: 5655
Hm, I added 6 machines for the charity race; 5 are running okay, and 1 has the described problem with no progress.
ID: 5656
For those who have the continual "non-run": was it running okay before the problem started? What I'm wondering is whether something happens (is set, written to a file, etc.) in a WU, and thereafter later WUs see some flag or setting and "fail to run" as a result.
ID: 5659
> For those who have the continual "non-run": was it running okay before the problem started? What I'm wondering is whether something happens (is set, written to a file, etc.) in a WU, and thereafter later WUs see some flag or setting and "fail to run" as a result.

No, for me the problem started right with the first Docking WU on the machine.
ID: 5660
Same here. I had never run Docking on that machine before.
ID: 5661
Any chance of getting the problem with non-running/permanently restarting WUs on some machines solved?
ID: 5665
Anyone want to mention if they've seen this problem recently on any machine NOT running Windows 7 on an Intel CPU?
ID: 5669
This is what was said in the Planet 3DNow! forum yesterday:
The condition for the problem is Win7 + Intel Yorkfield/Wolfdale (dual or quad core; cache size does not matter). The problem may or may not occur.
ID: 5675
"may or may not occur"
ID: 5690
Have the same problem,
ID: 5719
Hi, I must be getting bored, but anyway it looks like I have the same problems.
ID: 5720
It looked like a similar issue at Rosetta here, but it has appreciable differences. I don't think they have anything in common.
ID: 5721
Same problem for me: many hours of computation, progress still 0.000% and in the screensaver a message says something like "no protein created yet"
ID: 5728
Same problem. My PC:
ID: 5736
As people have said before: same here.
ID: 5737
Yep same thing here...
ID: 5740
I currently have two Charmm34a2 6.23 workunits showing around 2 hours elapsed time, 0.000% progress, 00:27:37 to completion, no checkpoints written yet, and a CPU core in use.
ID: 5742
I have some of those too now. It seems that this is caused by empty input files (file size = 0 bytes), so if you have any of those, abort them.
ID: 5744
I have several that are of the Charmm 34a2 6.23 type. They were still at 0% after nearly 4 hours. They show an estimated total run time of 1 hour 9 minutes.
ID: 5745
It was closer to 6 hours of elapsed time before I saw the last two messages in this thread. I will abort them now.
ID: 5747
> I have some of those too now. It seems that this is caused by empty input files (file size = 0 bytes), so if you have any of those, abort them.

Yep, I found a few hanging on a couple of my systems. All had their .inp files at 0 bytes and seem to have been downloaded in the past 18 hours. I may try once again to run this project on my laptop. Maybe I was just getting a few bad WUs to start with and had others that would have run.
ID: 5748
You can "abort" the damaged WUs before they start to crunch by deleting all 0-byte .inp files in advance, but give them time to download ;-)
ID: 5749
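Checking for empty inputs by hand gets tedious across several machines. A minimal sketch that only lists (does not delete) the zero-byte `.inp` files in a project directory. The directory is passed in because the BOINC data directory location differs between installations; a later post in this thread names the project folder as `projects/docking.cis.udel.edu` inside it. The script name in the usage comment is hypothetical. As the post advises, run it only after downloads have finished.

```python
import sys
from pathlib import Path


def empty_inp_files(project_dir):
    """Return the .inp files in project_dir whose size is exactly 0 bytes, sorted by path."""
    return sorted(
        path
        for path in Path(project_dir).glob("*.inp")
        if path.is_file() and path.stat().st_size == 0
    )


if __name__ == "__main__":
    # Usage (hypothetical script name):
    #   python find_empty_inp.py <BOINC data dir>/projects/docking.cis.udel.edu
    if len(sys.argv) == 2:
        for path in empty_inp_files(sys.argv[1]):
            print(path.name)
```

Listing rather than deleting keeps the decision with the user; the matching tasks can then be aborted from the BOINC Manager.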
UNBELIEVABLE!!!
ID: 5752
Upon waking this morning I discovered I had 3 of these too. Mine were issued around 10 UTC yesterday. As Ananas stated, all had 0-byte .inp files. That'll teach me to think a thread didn't apply to me and not read it again! I could have saved myself 6 hours.
ID: 5753
Dang, there are still some of those damaged ones on the way :-(
ID: 5754
> I have some of those too now. It seems that this is caused by empty input files (file size = 0 bytes), so if you have any of those, abort them.

Thanks. This problem started for me about 24 hours ago. I aborted tasks until I came to some where the progress bar moved off zero after a minute or so. Thanks for the heads up. I am a relatively new BOINC user: how do you see the specific files that are downloaded? I am running the BOINC client, 6.10.18, and not using a client manager, but attaching to projects on each PC through the BOINC client. Thanks
ID: 5755
In your BOINC data directory, there should be "projects/docking.cis.udel.edu"
625 SEG1 41 ARG HN 1 0.250000 1.00800 0 0.00000 -0.301140E-02
or
1866 SEG2 18 GLN HE21 1 0.300000 1.00800 0 0.00000 -0.301140E-02
or even
set params lpdb
Not sure what to do with those, I doubt that they produce valid results :-/
ID: 5756
I contacted the project leader by mail now.
ID: 5757
We are looking at the problem. We may need to stop the distribution of work temporarily.
ID: 5758
We temporarily suspended the generation of new jobs while investigating some issues with the charmm script. Stay tuned...
ID: 5759
Workunit 1iiq_43_mod0013b_1581_18995
ID: 5760
Hi, apart from the problems with BOINC version 6.10.18*, I noticed the same problems as stated above, and not only on my Vista host: 0% progress after 6 hours, and a restart of BOINC just started the same WUs again with zero time and progress. When I suspend a task, BOINC just starts another one, with the same result.
ID: 5761
A sanity check that scans the file for a logical EOF mark would sure help: something like // in Stockholm format, or just a comment line with !EOF, which would not disturb the current syntax.
ID: 5763
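The end-of-file marker suggested here could work roughly as sketched below. The `!EOF` line and both helper names are hypothetical; nothing in the thread says the project adopted them. The only assumption about the input format is the post's own, that a line starting with `!` is a comment the script syntax ignores. The writer appends the marker as its very last step, so a file cut short anywhere will lack it.

```python
EOF_MARKER = "!EOF"  # hypothetical comment line proposed in the post


def add_eof_marker(script: str) -> str:
    """Writer side: append the marker as the final line of a complete input file."""
    if script and not script.endswith("\n"):
        script += "\n"
    return script + EOF_MARKER + "\n"


def has_eof_marker(script: str) -> bool:
    """Reader side: True only if the last non-blank line is exactly the marker."""
    lines = [line.strip() for line in script.splitlines() if line.strip()]
    return bool(lines) and lines[-1] == EOF_MARKER


if __name__ == "__main__":
    complete = add_eof_marker("set params lpdb\nEND")
    truncated = complete[: len(complete) // 2]  # simulate a partial download
    print(has_eof_marker(complete), has_eof_marker(truncated), has_eof_marker(""))
```

The demo prints `True False False`: only the complete file passes, while both the half-written and the empty file are rejected.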
Workunit 1ohr_47_mod0013b_238_18734
ID: 5765
Hi, it appears the Never Ending WUs are not over yet!
ID: 5766
Verster,
ID: 5767
Not sure if we're meant to report or not, but just in case...
ID: 5768
--[snip]--

Thanks Trigggl. Yesterday I noticed Docking WUs being retrieved/deleted/??? But this morning I had a new load, with the same problems. Hope they get it fixed without too much hassle :)
____________
Knight who says N! Ni Ni
ID: 5769
--[snip]--

I'm doing some RNA work until these problems are cleaned up.
ID: 5770
I picked up a few of these never-ending WUs and have aborted them. My cache is drying up, so I will see what the remaining few are doing. Has anyone seen any kind of pattern in which ones hang? All the ones I aborted seem to be different types.
ID: 5771
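One way to look for a pattern is to tally the hung tasks by their leading complex code. This sketch assumes, based only on names quoted in this thread (1c5q..., 1hvi..., 1iiq_43_..., 1yqj_117_...), that the first four characters identify the PDB complex. Cross-docking names such as `1hvj1hbv_...` pack two codes together, and only the first is counted.

```python
from collections import Counter


def count_by_complex(task_names):
    """Tally task names by their first four characters, assumed to be the PDB complex code."""
    return Counter(name[:4].lower() for name in task_names if len(name) >= 4)


if __name__ == "__main__":
    hung = [
        "1iiq_43_mod0013b_1581_18995",
        "1ohr_47_mod0013b_238_18734",
        "1yqj_117_mod0013bp38alpha_2963_55009_0",
    ]
    for complex_id, count in count_by_complex(hung).most_common():
        print(complex_id, count)
```

A tally like this only means something next to the same count for tasks that finished normally: a complex that simply appears more often in the work queue will dominate both lists.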
> You can "abort" the damaged WUs before they start to crunch by deleting all 0-byte .inp files in advance, but give them time to download ;-)

Everyone keeps asking how to figure out which ones will hang. Check the .inp files. Most of the ones that are going to hang are empty files. The ones that will succeed should be roughly 1.2 MB and end with something like this:

END

In Linux you can check it with:

tail <crossdocking-file>.inp
ID: 5772
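For anyone not on Linux, the `tail` check above can be approximated in Python. This sketch reads only the end of each file, which matters little for a single 1.2 MB file but adds up over a full cache, and returns the last non-blank line. Treating a last line that starts with `END` as "complete" follows the post's description and is a heuristic, not a format guarantee.

```python
import os
import sys


def last_line(path, chunk=4096):
    """Return the last non-blank line of a file, reading backwards from the end like `tail`."""
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        pos = fh.tell()
        data = b""
        lines = []
        while pos > 0:
            step = min(chunk, pos)
            pos -= step
            fh.seek(pos)
            data = fh.read(step) + data
            lines = [line for line in data.splitlines() if line.strip()]
            # With two non-blank lines in hand, the last one is known to be complete.
            if len(lines) >= 2:
                break
    return lines[-1].decode("latin-1").strip() if lines else ""


def looks_complete(path):
    """Heuristic from the thread: a usable input ends with an END line."""
    return last_line(path).upper().startswith("END")


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(name, "OK" if looks_complete(name) else "empty or truncated")
```

An empty file yields an empty last line, so it is reported the same way as a truncated one.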
I've noticed that one of my machines has not been affected by the bug. It is running Win7 64bit with BOINC 6.10.29. My other machines that have been bogged down with errors are all running XP 32bit and BOINC 6.10.18. Is there a relationship?
ID: 5773
> I've noticed that one of my machines has not been affected by the bug. It is running Win7 64bit with BOINC 6.10.29. My other machines that have been bogged down with errors are all running XP 32bit and BOINC 6.10.18. Is there a relationship?

Empty is empty; an x64 binary cannot change that fact :-) It's possible that it handles the empty input differently and aborts those tasks immediately, or that the box has just been lucky.
ID: 5778
I'm finally getting work again and the input files are all complete. It *may* be safe to download work again.
ID: 5779
> I'm finally getting work again and the input files are all complete. ...

Same here :-) The last failures reported have probably been cached files from before the bugfix.
ID: 5780
Hi, I still got some tasks which, presumably, never end: 0% progress.
ID: 5784
I must have been lucky........ I have only come across 2 of these units on my 4 crunchers, and they were aborted before reaching the top of the cache.
ID: 5787
Most of the units that start with the name: "1iiq_43_" have failed to start on my Vista and Windows 7 machines. Some ran as long as 85 hours before I noticed them not completing.
ID: 5800
I have several systems (19) with a variety of OSes that were stuck at 0%.
ID: 5802
> As people have said before: same here.

Here's what I've pulled from my message logs:

100315 044535 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623

146 times, every 3 minutes give or take 20 seconds, with things in between like:

100315 052003 Project communication failed: attempting access to reference site
100315 052005 Internet access OK - project servers may be temporarily down.
100315 134934 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623
100315 135546 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623
100315 135851 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623
100315 140157 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623
100315 140809 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623
100315 141114 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623
100315 141420 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623
100315 141726 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623
100315 142031 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623
100315 142337 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623
100315 142643 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623
100315 143254 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623
100315 143600 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623
100315 143629 Suspending computation - user is active
100315 143931 Resuming computation
100315 144234 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623
100315 144539 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623
100315 144845 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623
100315 144912 Suspending computation - user is active

It has spent about 7 hours at this point calculating (task switch every 10 minutes, so it's really been running more like 3 days) and has 0% to show for it,
ID: 5805
Hi, for the third time I'm seeing tasks with >100K seconds of runtime and no progress?!?
ID: 5806
I had run into a few of the 0% completion sessions a few days ago and just encountered 4 more just now. They were all just past their (same) deadline. One had been running for 12 hours and the other 3 for 5 hours. I can usually knock one out in half an hour. I aborted the 4 sessions, and 2 other Docking sessions started and ran to completion normally. All were running at high priority.
ID: 5818
They are still delivering a lot of those damaged results, but those are old ones with status "no reply" on the previous host.
ID: 5819
I just had to delete a boatload of WUs, some running for over 16 hours and at 0%.
ID: 5820
Darn, after pausing a while due to the problem and getting back into it after the supposed fix, I just found I had lost some ~500 hours of CPU time, again.
ID: 5821
Hi, well, the empty, no-progress WUs are gone, at least it looks that way.
ID: 5822
I'm back to processing WU here, but am still getting some of those nasty 0% complete jobs. I'll just abort them when needed.
ID: 5823
> Hi, well, the empty, no-progress WUs are gone, at least it looks that way.

... not the redelivered ones. I have killed about 10 just now, several of which had already eaten quite a bit of CPU time. I run only a tiny cache, so those were quite fresh. Example sent 21 Mar 2010 14:27:43 UTC
ID: 5825
Ahh, I have 32 Docking WUs paused. It looks like a task is paused when it is switched out after 60 min and another one is started, but it looks like they are finishing normally. At least, I hope so.
ID: 5826
I'm seeing similar. Get the WU, shows some reasonable estimate - I had some that said 48 minutes last week. I just killed one that showed 66 hours elapsed, 0% progress, nothing under time-to-complete.
ID: 5827
> I'm seeing similar. Get the WU, shows some reasonable estimate - I had some that said 48 minutes last week. I just killed one that showed 66 hours elapsed, 0% progress, nothing under time-to-complete.

Last night I had one ready to start with an estimated 4:15 run time. This morning it has run for an hour and still shows 0% complete.
ID: 5828
Yes I just aborted 3 of these tasks, one at 16 hours 0.00% another at 14 hours 0.00% and one at nearly 4 hours 0.00%.
ID: 5829
Please can we have some sort of response to this problem, even if it is just an acknowledgement and "we are looking into it"? It's ongoing, and we are dealing with it as best we can, but the silence from the project team is deafening.
ID: 5830
Hear, hear. I haven't had a good WU for at least a month and probably longer. I'm sick of aborting WUs with dozens and dozens of wasted hours invested in them. I am understanding of problems but this is getting ridiculous and I'm on the cusp of disabling this project. If you don't sort it you're going to find the level of interest tumbling like a house of cards out here.
ID: 5831
Why is this problem not being addressed? I have three computers all showing 100% CPU usage but 0% PROGRESS. This has been happening on my computers for some time now so I will have to attach to another project until someone on your end finds a remedy.
ID: 5833
> Why is this problem not being addressed? I have three computers all showing 100% CPU usage but 0% PROGRESS. This has been happening on my computers for some time now so I will have to attach to another project until someone on your end finds a remedy.

I just aborted this WU because it was at 0% after 9 hours of computation. I put Docking on NNW (no new work).
ID: 5841
ID: 5842
> This issue of zero progress is bad enough, yet the lack of respect to acknowledge us is truly far greater to me at this point. It almost seems you don't understand the value of the participants in this project. Simply put, without us... there is no Docking at home.

I second that. There needs to be at least a response, if not a solution, to this severe problem! I'm really thinking about setting Docking to "no new work" if nothing happens. I have better things to do than constantly check all my systems for those broken WUs. And I'm sure I'm not the only one considering the simplest "solution". It's just a click in my account manager to get rid of this annoyance.
ID: 5843
I totally agree with ScientificFrontline on this one, I'm having to put in masses of effort to keep my machines computing work units. Effort that I should not have to, imo.
ID: 5844
There's another thread calling for a boycott, which I'm not yet inclined to do. But where is "the man behind the curtain"? There are other, worthwhile projects out there, and we get to choose which ones to run, so some acknowledgment or comment would seem to be appropriate here.
ID: 5851
22k RAC = ninth overall = bye bye Docking, at least until they have learned some basic communication skills. I'm not burning electricity and putting my time and effort in for a group who don't even think they need to talk to their volunteers. Plenty of units coming soon: I'm about to ditch about 1300.
ID: 5853
During March 7-11 we had a big problem with the server (http://docking.cis.udel.edu/about/project/news.php): it was running out of space, and therefore several workunits were sent out empty. On March 11 we stopped production, increased the server partition and resumed distribution. I would like to think that the workunits with 0% progress are still some of those created during March 7-11; what worries me most is that it has now been 15+ days and you are still having the problems.
ID: 5854
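The root cause described here, a full server partition producing empty workunit files, can also be guarded against at the point where files are written. A minimal sketch of that idea, not the project's actual generator: it refuses empty content, refuses to write when free space (from `shutil.disk_usage`) is below a margin, writes to a temporary file and renames it into place with `os.replace` so a half-written file never appears under the final name, and checks the written size. The function name and the 50 MB margin are assumptions.

```python
import os
import shutil
import tempfile

MIN_FREE_BYTES = 50 * 1024 * 1024  # hypothetical safety margin


def write_input_safely(path, text, min_free=MIN_FREE_BYTES):
    """Write a workunit input file only if it is non-empty and there is room for it."""
    data = text.encode("utf-8")
    if not data:
        raise ValueError(f"refusing to publish an empty input file: {path}")
    directory = os.path.dirname(os.path.abspath(path))
    if shutil.disk_usage(directory).free < len(data) + min_free:
        raise OSError(f"not enough free space in {directory} for {path}")
    # Write to a temporary file in the same directory, then rename it into place.
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)  # a disk that fills up mid-write raises OSError here
            fh.flush()
            os.fsync(fh.fileno())
        if os.path.getsize(tmp) != len(data):
            raise OSError(f"short write while creating {path}")
        os.replace(tmp, path)
    except BaseException:
        # Never leave a partial temporary file behind.
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
```

Failing loudly on the server means a workunit is simply not created, which is cheaper for everyone than a volunteer's machine spinning on it for days.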
Unfortunately, Trilce, I think a large proportion of users have now aborted everything or detached, and you'll have a hard time getting those logs.
ID: 5856
> During March 7-11 we had a big problem with the server (http://docking.cis.udel.edu/about/project/news.php): it was running out of space, and therefore several workunits were sent out empty. On March 11 we stopped production, increased the server partition and resumed distribution. I would like to think that the workunits with 0% progress are still some of those created during March 7-11; what worries me most is that it has now been 15+ days and you are still having the problems.

Trilce,

This is the only project I have ever truly been passionate about. I understand issues do arise, yet a response of any kind is always imperative. A scientist such as yourself knows the importance of communication; without it... there is failure. I'm also willing to help with communicating with members, but your team has to establish the connection.
____________
Recognized by the Carnegie Institute of Science. Washington D.C.
ID: 5858
http://docking.cis.udel.edu/community/workunit.php?wuid=11126773
ID: 5859
I think you are right, DoubleTop; we are facing, or will face, a different problem now. There is a bug in the transitioner that keeps generating workunits even though we said we want just one. We had changed the workunit generator and other daemons around and haven't been able to keep it straight. I need to modify the validator to accept these kinds of workunits; let's see if I can fix it soon.
ID: 5864
> I think you are right, DoubleTop; we are facing, or will face, a different problem now. There is a bug in the transitioner that keeps generating workunits even though we said we want just one. We had changed the workunit generator and other daemons around and haven't been able to keep it straight. I need to modify the validator to accept these kinds of workunits; let's see if I can fix it soon.

As I am also. Now let's all move forward with a better understanding of both sides.
Heidi-Ann Kennedy
ID: 5869
When the problem started, I stopped requesting new WUs. When I started getting WUs again, I started monitoring for 0% and killed only those. I figured there were enough good WUs to make it worth pressing on. I haven't received any bad WUs for a while, but if I do, I will document them before killing them.
ID: 5870
Good luck. BTW, Facebook would be a good forum for status problems/updates. With the screensaver, a quick update "If you see this and your project is at 0%, abort it!" would have sufficed! :)
ID: 5874
Personally I would just like to see the notices on the main page where they belong in my opinion.
ID: 5879
> Good luck. BTW, Facebook would be a good forum for status problems/updates.

Except the problem is an input file with no data. If they can code it to recognize that and switch the screensaver, it would be easier to code it to recognize the file and abort, or better yet, recognize it and re-download. It would be easy to scan an input file for the word "END".
ID: 5880
The abort was supposed to be coded in the charmm wrapper, but it is obviously not working. We will have to revisit the code to add a reliable way to detect empty or truncated input files.
ID: 5883
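A reliable check of the kind described here could run in the wrapper before charmm is started. The sketch below is hypothetical, since the project's wrapper code is not shown in this thread: it rejects a missing, empty, or truncated input, using the "last line starts with END" rule described earlier in the thread, and exits with a non-zero status so the task ends with an error instead of running at 0% indefinitely. How the real wrapper would report that error to BOINC is not covered here.

```python
import sys
from pathlib import Path


def check_input(path):
    """Return None if the input looks complete, otherwise a short reason."""
    p = Path(path)
    if not p.is_file():
        return "missing"
    if p.stat().st_size == 0:
        return "empty"
    lines = [ln.strip() for ln in p.read_text(errors="replace").splitlines() if ln.strip()]
    if not lines or not lines[-1].upper().startswith("END"):
        return "truncated (last line is not END)"
    return None


def main(argv):
    if len(argv) != 2:
        print("usage: preflight <input.inp>", file=sys.stderr)
        return 2
    reason = check_input(argv[1])
    if reason is not None:
        print(f"input check failed: {reason}", file=sys.stderr)
        return 1
    # The real wrapper would start charmm here (not shown).
    return 0


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv))
```

Returning a reason string rather than a bare boolean makes the failure visible in the task's stderr output, which is what volunteers in this thread kept asking for.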
> The abort was supposed to be coded in the charmm wrapper, but it is obviously not working. We will have to revisit the code to add a reliable way to detect empty or truncated input files.

I'm not a programmer, but I play one on web forums. :-D
ID: 5886
> When the problem started, I stopped requesting new WUs. When I started getting WUs again, I started monitoring for 0% and killed only those. I figured there were enough good WUs to make it worth pressing on. I haven't received any bad WUs for a while, but if I do, I will document them before killing them.

Yeees, unfortunately there aren't any "good" WUs, or at least I haven't seen one for months, so the cusp is now behind me and I've stopped receiving work until there is a clear indication that this has stopped. Shame.
ID: 5888
Another one bites the dust..
ID: 5889
> Another one bites the dust..

Yes, that was one of them. The good news is that this crisis helped us take care of several pending issues; one was the generation and validation of retrials, like this one ending in _1.
ID: 5892
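The `_1` suffix mentioned here follows BOINC's usual naming, where each copy (result) of a workunit is named after the workunit plus `_<n>`, numbered from `_0`. A small sketch for splitting a task name such as the ones quoted in this thread. It assumes the argument is a task name: a bare workunit name that itself ends in digits (like some quoted in this thread) would be mis-split.

```python
def split_task_name(task_name):
    """Split a BOINC task name into (workunit name, copy number), e.g. '..._1' -> copy 1."""
    workunit, sep, suffix = task_name.rpartition("_")
    if not sep or not suffix.isdigit():
        raise ValueError(f"not a '<workunit>_<n>' task name: {task_name!r}")
    return workunit, int(suffix)


if __name__ == "__main__":
    wu, copy = split_task_name("1k1l_93_mod0014trypsin_18710_128757_0")
    print(wu, copy)
```

Grouping tasks by the workunit half makes it easy to see whether a hung task is an original copy or a reissue of one that already failed elsewhere.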
> Another one bites the dust..

Yes, that was one of them. The good news is that this crisis helped us take care of several pending issues; one was the generation and validation of retrials, like this one ending in _1.
ID: 5893
Unfortunately we have found a new bug in the message board software where under certain circumstances replies are posted twice.......... ;)
ID: 5894
True =)
ID: 5897
<sigh>
ID: 5900
Still no solution for me, either. 0% on all workunits I get after aborting.
ID: 5916
Is there any solution by now?
ID: 5987
*bump*
ID: 6037
I just noticed the issue too, but unfortunately I aborted the tasks prior to looking for a thread on it.
ID: 6040
Same problem here.
ID: 6148
> Same problem here.

On other BOINC projects, that often happens if you don't let a workunit run long enough to reach its first checkpoint, for workunits that only update their progress at checkpoints. Therefore, it would be useful to know how much CPU time and how much elapsed time it used while still showing 0.00% progress, to see whether it should have reached a checkpoint by then. Also, some BOINC projects will wait about 24 hours after you report a failed or aborted workunit before sending you any more workunits at all.
ID: 6151
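To put numbers on the checkpoint question, the CPU time and elapsed time that BOINC shows as `HH:MM:SS` can be converted and compared. A minimal sketch; the demo figures are the ones reported for one aborted task in this thread (09:13:21 CPU time after 48:27:37 elapsed).

```python
def hms_to_seconds(text):
    """Convert BOINC-style 'HH:MM:SS' (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_share(cpu_time, elapsed):
    """Fraction of the elapsed (wall-clock) time that the task spent on the CPU."""
    return hms_to_seconds(cpu_time) / hms_to_seconds(elapsed)


if __name__ == "__main__":
    cpu, wall = "09:13:21", "48:27:37"
    print(hms_to_seconds(cpu), hms_to_seconds(wall), round(cpu_share(cpu, wall), 2))
```

The demo prints `33201 174457 0.19`: only about a fifth of that task's elapsed time was CPU time. Since a checkpoint can only be reached by doing computation, comparing CPU time (rather than elapsed time) against the expected run time is arguably the fairer test of whether a checkpoint should have appeared.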
No such problem for months (actually I had not had even one with no progress until now), but now 5 in a row, e.g.:
ID: 6158
No such problem for months (in fact I had not had a single one with no progress until now), but now 5 in a row, e.g.: I've been having the same problem for the last few days and have aborted a number of workunits at various points (up to an hour) into their run. The progress always remained at 0%. Finally I decided to let one run for the entire estimated completion time of about 3 hours. The time to completion went down to zero, the progress stayed at 0%, and the elapsed time continued to count up. Are these workunits defective? I'm also running Seti and not having any problems. I would like to continue with this project, but don't want to waste my CPU time if the workunits are defective. Anybody got any answers? |
||
ID: 6161 | Rating: 0 | rate: / | ||
Hi.
|
||
ID: 6162 | Rating: 0 | rate: / | ||
1hvi1hbv_mod0014crossdockinghiv1_13789_408524
showing 09:13:21 CPU time and 48:27:37 elapsed. Fraction done 0.000%. Aborted.
|
||
ID: 6163 | Rating: 0 | rate: / | ||
Got one as well,
1hvj1hbv_mod0014crossdockinghiv1_18736_423697
.
|
||
ID: 6164 | Rating: 0 | rate: / | ||
1hvj1hbv_mod0014crossdocking_15775_294374
01:34:53 CPU time, 06:18:04 elapsed time, 0.000% done. Aborted.
|
||
ID: 6165 | Rating: 0 | rate: / | ||
2011/01/30: I've been hit with a bunch of the "run forever - no progress" WUs on my Windows 7 Pro 64-bit, 8-core machine. Being firmly committed to the "set it and forget it" philosophy for volunteering my cycles, I've simply aborted them after the remaining time went to zero. I'll leave the debugging to those with greater time and expertise. Having read the thread about the problem, I'll now delete any apparent problem WUs after one hour with no progress, rather than the 20 hours I've been giving WUs with an estimated 15-hour completion time. Good luck to those attempting to track down and resolve this issue. As others have said, it's annoying. |
||
ID: 6166 | Rating: 0 | rate: / | ||
Another one, aborted after 12 hours, 0% performed and 0% remaining
|
||
ID: 6167 | Rating: 0 | rate: / | ||
50 more hours wasted, setting to no new work :-(
|
||
ID: 6168 | Rating: 0 | rate: / | ||
Having the same problem, only on my Win7 machine. However, it does NOT have an Intel CPU.
|
||
ID: 6169 | Rating: 0 | rate: / | ||
Hi.
|
||
ID: 6170 | Rating: 0 | rate: / | ||
OK... only my Linux boxes were showing this, not any of my Windows machines, so I reported it on the Linux board. I'll just keep an eye on it and abort them when it happens, since it appears to have been going on for over 17 months without a fix. |
||
ID: 6172 | Rating: 0 | rate: / | ||
I just aborted seven of these with 0% progress after various run lengths. The four processing now are showing progress after only a few minutes of run time. Seems to be an issue with some WUs, but not all. |
||
ID: 6174 | Rating: 0 | rate: / | ||
new one
|
||
ID: 6175 | Rating: 0 | rate: / | ||
Also having this (recurring) problem on two machines.
|
||
ID: 6176 | Rating: 0 | rate: / | ||
Yet another,
1hvk1hbv_mod0014crossdockinghiv1_24501_456879
.
|
||
ID: 6178 | Rating: 0 | rate: / | ||
Hi.
|
||
ID: 6179 | Rating: 0 | rate: / | ||
1hvk1hbv_mod0014crossdockinghiv1_33167_465761
and...
|
||
ID: 6180 | Rating: 0 | rate: / | ||
Same here, I aborted a bunch of WUs today. All stuck at 0.0% |
||
ID: 6181 | Rating: 0 | rate: / | ||
These dodgy WU's that are coming through have .INP files with a size of 0KB.
|
||
ID: 6183 | Rating: 0 | rate: / | ||
I also found a number of these WUs on different systems, all with an .inp file of zero bytes. Because I don't have the time to babysit my systems, this may force me to stop running Docking temporarily. |
||
ID: 6186 | Rating: 0 | rate: / | ||
Is this project still maintained or someone locked the server room in November and forgot about it? And it's miraculously running by itself?
|
||
ID: 6187 | Rating: 0 | rate: / | ||
... All with an .inp file of zero bytes. Good catch... I just aborted a half dozen with 0-byte .inp files here, before they started running. I've wasted over 200 hours of crunching because of those over the last few days. Now how do we abort work units with a script? Then we could just have it check /var/lib/boinc/projects/docking.cis.udel.edu/ every 15 minutes, say, and abort work units with 0-byte *.inp files. Or would it be enough to just delete the 0-byte .inp files, so BOINC doesn't even try to run those WUs and aborts them itself instead? |
||
ID: 6188 | Rating: 0 | rate: / | ||
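One way to automate the check asked about above is a small script that lists zero-byte .inp files in the project directory. A minimal Python sketch, assuming the directory path from that post (a typical Linux install) and assuming, as reported in this thread, that a zero-byte .inp file marks a bad workunit. It only reports the files and does not abort anything, because how a given .inp file maps to a task name is not stated here.

```python
from pathlib import Path

# Path quoted in the post above; Windows installs keep project files elsewhere.
PROJECT_DIR = Path("/var/lib/boinc/projects/docking.cis.udel.edu")


def find_empty_inp_files(project_dir: Path) -> list[Path]:
    """Return the .inp files (any letter case) in project_dir that are 0 bytes long."""
    if not project_dir.is_dir():
        return []
    return sorted(
        path
        for path in project_dir.iterdir()
        if path.is_file() and path.suffix.lower() == ".inp" and path.stat().st_size == 0
    )


if __name__ == "__main__":
    empty = find_empty_inp_files(PROJECT_DIR)
    for path in empty:
        print(f"zero-byte input file: {path.name}")
    if not empty:
        print("no zero-byte .inp files found")
```

The script does a single pass, so the "every 15 minutes" part would come from scheduling it with cron on Linux or Task Scheduler on Windows rather than from a loop inside the script. The suffix is compared case-insensitively because posts in this thread mention both .inp and .INP.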
Well, for me >all< the good ones are already at 1% after 2 minutes, so if one is still at 0 after 4-5 minutes, I abort it. |
||
ID: 6189 | Rating: 0 | rate: / | ||
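The rule of thumb in the post above can be written as a small check. A minimal Python sketch; the function name is hypothetical, and the 300-second default grace period simply mirrors the 4-5 minutes mentioned there, which is based on one poster's observation that good tasks reach 1% within about 2 minutes, not on any project setting.

```python
def looks_stuck(fraction_done: float, elapsed_s: float, grace_s: float = 300.0) -> bool:
    """Flag a task that still shows no progress after the grace period.

    fraction_done is in [0, 1]; grace_s defaults to 5 minutes, an assumption
    based on good tasks reaching 1% in about 2 minutes.
    """
    return fraction_done == 0.0 and elapsed_s > grace_s


if __name__ == "__main__":
    print(looks_stuck(0.0, 120))   # False: still inside the grace period
    print(looks_stuck(0.01, 600))  # False: it has made progress
    print(looks_stuck(0.0, 600))   # True: 10 minutes at 0%
```

Other posters in this thread used much longer limits (one hour, or 12 hours), and workunits that only report progress at checkpoints can legitimately sit at 0% for a while, so grace_s is worth tuning per machine.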
During March 7-11 we had a big problem with the server ( http://docking.cis.udel.edu/about/project/news.php ), which was running out of space, and therefore several workunits were sent out empty. On March 11 we stopped production, increased the server partition, and resumed distribution. I would like to think that the workunits with 0% progress are still some of those created during March 7-11; what worries me most is that it has now been 15+ days and you are still having the problems. Looks like it could be the same problem as happened in March 2010! |
||
ID: 6190 | Rating: 0 | rate: / | ||
Both of my machines are experiencing this problem. One is Linux 64-bit with 8GB of RAM and the other is Windows Vista 32-bit with 2GB of RAM. Work units ran for over 12 hours with 0% progress before I aborted them. |
||
ID: 6191 | Rating: 0 | rate: / | ||
Makes sense. But I'm a bit worried whether there's anyone out there to fix the problem this time ^.= |
||
ID: 6192 | Rating: 0 | rate: / | ||
Just fired a PM off to Trilce Estrada (see above) to see if she is aware of the current problems. Hopefully she will have a look into it. |
||
ID: 6193 | Rating: 0 | rate: / | ||
Hi,
|
||
ID: 6194 | Rating: 0 | rate: / | ||
I just did a check, and all the WUs with this problem were created recently. This morning I also found some new ones. This problem is the only reason I have WUs aborted by the client, so you can see the workunits in my task list |
||
ID: 6195 | Rating: 0 | rate: / | ||
Arghh, it's getting worse. Got almost only bad tasks today. Could delete them straight away because they do indeed have 0-size INP files.
|
||
ID: 6196 | Rating: 0 | rate: / | ||
Simplest solution / work-around.
|
||
ID: 6198 | Rating: 0 | rate: / | ||
No response yet from the PM sent to Trilce Estrada :( |
||
ID: 6199 | Rating: 0 | rate: / | ||
No response yet from the PM sent to Trilce Estrada :( Doesn't surprise me any. One of the biggest flaws of this project has been the lack of communication. ____________ Recognized by the Carnegie Institute of Science . Washington D.C. |
||
ID: 6200 | Rating: 0 | rate: / | ||
Have a WU that has been running for 17+ hours showing 0% progress and time to completion as --- with a report deadline of 2/11/11.
|
||
ID: 6201 | Rating: 0 | rate: / | ||
The problem is definitely caused by empty (0-size INP file) or corrupt (INP file smaller than 1.16 MB; it even crashed the app twice, lol) work units. Unfortunately it's a server-side problem, and it would be nice to see that someone in there cares about it...
|
||
ID: 6205 | Rating: 0 | rate: / | ||
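Based on the sizes reported in that post, an input file could be sorted into empty, suspiciously small, or plausible. A minimal Python sketch; the cutoff is the poster's "1,16MB" figure read as 1.16 MB, which is a forum observation and not a documented size for Docking@Home input files, and treating it as 1.16 × 1024 × 1024 bytes is a further assumption.

```python
from pathlib import Path

# Cutoff from the forum observation, assumed to mean binary megabytes.
MIN_PLAUSIBLE_BYTES = int(1.16 * 1024 * 1024)


def classify_inp(size_bytes: int, min_bytes: int = MIN_PLAUSIBLE_BYTES) -> str:
    """Return 'empty', 'suspect', or 'ok' for an input file of the given size."""
    if size_bytes == 0:
        return "empty"
    if size_bytes < min_bytes:
        return "suspect"
    return "ok"


def classify_file(path: Path) -> str:
    """Classify an input file on disk by its size alone."""
    return classify_inp(path.stat().st_size)


if __name__ == "__main__":
    for size in (0, 500_000, 2_000_000):
        print(size, classify_inp(size))
```

Size is only a rough signal: a file above the cutoff can still be corrupt, so "ok" here means "not obviously broken" rather than "valid".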
I have just started and got two more of the zero-byte *.inp file bad work units. I aborted them following the advice on this thread. They are work units 18530883 and 18531863 . |
||
ID: 6208 | Rating: 0 | rate: / | ||
I have another bad one that I aborted because of the empty input file problem. It is work unit 18537922 . |
||
ID: 6209 | Rating: 0 | rate: / | ||
Drop me a line when y'all get it fixed. I'll be crunching elsewhere, maybe POEM. Got no time for handholding. |
||
ID: 6212 | Rating: 0 | rate: / | ||
I have Docking@home set to no new tasks until this is fixed. |
||
ID: 6214 | Rating: 0 | rate: / | ||
I've e-mailed Michela concerning the issue here. Maybe she is around.
|
||
ID: 6215 | Rating: 0 | rate: / | ||
Hi All,
|
||
ID: 6216 | Rating: 0 | rate: / | ||
Hi All, You have more than a disk issue; the project admin has a total lack of respect for its members. ____________ Recognized by the Carnegie Institute of Science . Washington D.C. |
||
ID: 6220 | Rating: 0 | rate: / | ||
Hi All, Nah, at least we know they're alive. I was getting worried. |
||
ID: 6222 | Rating: 0 | rate: / | ||
Hi All, Been there and done that with them too many times. Never was worried, just annoyed. ____________ Recognized by the Carnegie Institute of Science . Washington D.C. |
||
ID: 6223 | Rating: 0 | rate: / | ||
I discovered that I have the "Long Run times with no progress" tasks after two of them had run for over 96 hours.
|
||
ID: 6224 | Rating: 0 | rate: / | ||
Since this problem has been occurring intermittently for over a year and a half, Ronald Tilby's suggestions all sound reasonable to me.
2011-02-04 23:58:25|Docking|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 3 completed tasks
2011-02-04 23:58:30|Docking|Scheduler request succeeded: got 0 new tasks
2011-02-04 23:58:30|Docking|Message from server: Server error: can't attach shared memory
Fri 04 Feb 2011 11:56:57 PM EST Docking Reporting 2 completed tasks, not requesting new tasks
Fri 04 Feb 2011 11:57:08 PM EST Docking Scheduler request completed
Fri 04 Feb 2011 11:57:08 PM EST Docking Message from server: Server error: can't attach shared memory
Both of those clips were grabbed on the same machine. The first group is from BOINCTasks (from the BOINC manager on another machine on my LAN), and the second group is from BOINC itself... I just synced its clock, which was about 1:06 slow... so if you're synced to a source like otc2.psu.edu:123, too, your log clips for those 2 should show up between 23:58:03 on the 4th and a little after midnight on the 5th. I found a couple of links to http://www.spy-hill.net/~myers/help/boinc/Create_Project.html#feeder on berkeley.edu about that message. None of them have to do with the client/manager, though. |
||
ID: 6225 | Rating: 0 | rate: / | ||
Dear All, a new update from D@H:
|
||
ID: 6228 | Rating: 0 | rate: / | ||
Dear All, a new update from D@H: I'll accept that as a reasonable answer. Academics first by all means, ____________ Recognized by the Carnegie Institute of Science . Washington D.C. |
||
ID: 6233 | Rating: 0 | rate: / | ||
Dear All, a new update from D@H: I doubt that your systems are capable of collecting results. Result uploads work fine, but they do nothing but waste space on your disk until they have been reported. That is when your server becomes aware of the results and prepares them for the postprocessing they need (checking to see if they can be validated, getting them validated, assimilated into the science database, and then deleted along with their associated work unit). However, when I try to do an update to report them, or BOINC tries to do so automatically, I get these messages:
2/5/2011 2:18:10 PM Docking update requested by user
2/5/2011 2:18:14 PM Docking Sending scheduler request: Requested by user.
2/5/2011 2:18:14 PM Docking Reporting 4 completed tasks, requesting new tasks for CPU and GPU
2/5/2011 2:18:15 PM Docking Scheduler request completed: got 0 new tasks
2/5/2011 2:18:15 PM Docking Message from server: Server error: can't attach shared memory
The results then stay in my lists of unfinished tasks. Something is keeping your server from accepting reports of these tasks. When I searched for the message, one possible cause was the feeder not running. Could you please fix the issue preventing us from reporting our results so that they can finally be processed and accepted? I think the space freed up by deleting those work units could help with your disk issues, if the problem turns out to be a full disk, which has caused other projects to generate empty work unit files that must be aborted, or other errors. |
||
ID: 6234 | Rating: 0 | rate: / | ||
Dear All, a new update from D@H: Hi, all the daemons are up and running. I am monitoring a couple of clients to see if I can reproduce you error message. ____________ If you are interested in working on Docking@Home in a great group at UDel, contact me at 'taufer at acm dot org'! |
||
ID: 6235 | Rating: 0 | rate: / | ||
Having the same problem as mentioned above several times.
|
||
ID: 6236 | Rating: 0 | rate: / | ||
Having the same problem as mentioned above several times. We removed all the jobs with potential 0% progress that were in our database. Unfortunately some jobs had already been distributed by the time we worked on the database. Can you abort the jobs with 0% progress and get new jobs? Thanks, Michela ____________ If you are interested in working on Docking@Home in a great group at UDel, contact me at 'taufer at acm dot org'! |
||
ID: 6237 | Rating: 0 | rate: / | ||
All jobs aborted
|
||
ID: 6243 | Rating: 0 | rate: / | ||
All jobs aborted Thanks! Let us know if the new jobs have any similar issues. Michela ____________ If you are interested in working on Docking@Home in a great group at UDel, contact me at 'taufer at acm dot org'! |
||
ID: 6244 | Rating: 0 | rate: / | ||
I'd ramped up the quota for Docking on a couple of machines to check for problems, and none were found. Looks okay.
|
||
ID: 6245 | Rating: 0 | rate: / | ||
I'd ramped up the quota for Docking on a couple of machines to check for problems, and none were found. Looks okay. The solution we were considering would require us to recompile charmm with BOINC, and this task can be very challenging considering the complexity of charmm. At this point we have a nagios system in place alerting us about the disk quotas and the status of the daemons. This should help a lot. Michela ____________ If you are interested in working on Docking@Home in a great group at UDel, contact me at 'taufer at acm dot org'! |
||
ID: 6246 | Rating: 0 | rate: / | ||
If you've just installed nagios, then perhaps it will help. The problems we see here have been going on (on and off) for a LONG time, though. I can see that there is only one other member of our team that is still with the project now.
|
||
ID: 6247 | Rating: 0 | rate: / | ||
Continue to get problems with 0% progress for some tasks. What file do we need to check for zero length so we can abort the dud tasks early? |
||
ID: 6248 | Rating: 0 | rate: / | ||
Continue to get problems with 0% progress for some tasks. What file do we need to check for zero length so we can abort the dud tasks early? Can you please tell us the names of the jobs with 0% progress? We deleted the old jobs still on the server, and there is now plenty of disk space. We were not able to delete the jobs already distributed. I want to check whether your jobs with 0% progress are old jobs. Thanks, Michela ____________ If you are interested in working on Docking@Home in a great group at UDel, contact me at 'taufer at acm dot org'! |
||
ID: 6249 | Rating: 0 | rate: / | ||
I have aborted all of them. |
||
ID: 6250 | Rating: 0 | rate: / | ||
I have aborted all of them. OK, this is a good decision. Please send me an e-mail or submit an entry to this forum if there are any other jobs with 0% progress, together with the names of the jobs. Michela ____________ If you are interested in working on Docking@Home in a great group at UDel, contact me at 'taufer at acm dot org'! |
||
ID: 6251 | Rating: 0 | rate: / | ||
Thank you so much, Docking people, for posting a message to me via the screen saver graphics. Not sure how many months I have been getting the 0% progress error WU, but I aborted it as ordered. (^:= Here are the messages I got today regarding it.
|
||
ID: 6252 | Rating: 0 | rate: / | ||
This is strange, your client is continuously asking for GPU jobs and we do not support GPUs yet. I would expect that eventually the client starts asking for CPU jobs. What version of the client do you have? ____________ If you are interested in working on Docking@Home in a great group at UDel, contact me at 'taufer at acm dot org'! |
||
ID: 6253 | Rating: 0 | rate: / | ||
This is another task with 0% progress
|
||
ID: 6256 | Rating: 0 | rate: / | ||
The daily project throughput has fallen below a third of what it used to be, so I doubt that only one specific client version or a few specific boxes have a problem.
|
||
ID: 6257 | Rating: 0 | rate: / | ||
Most of the workunits that were affected during the disk problem were 1hvl1hbv and 1hvk1hbv. We have not been distributing empty jobs since last week.
|
||
ID: 6260 | Rating: 0 | rate: / | ||
The current versions of the BOINC client software will, if your computer has a BOINC-usable GPU, send ALL the connected projects requests for both CPU workunits and GPU workunits. However, the current versions of the BOINC server software allow you to reduce, but not totally eliminate, this - it allows the server to send a response telling the client not to ask for any more workunits of the type requested for up to about a week. This should allow you, for as long as you're not even planning any GPU workunits, to tell any client that sends a request for GPU workunits only that it should not send another such request for about a week. The 6.10.* series of BOINC client programs has this feature. It will eventually start asking for CPU workunits, but usually only after it gets at least one GPU workunit from SOME project. If you're looking for a project that sends only GPU workunits, I've found two: GPUGRID sends protein-folding workunits, but only if you have a sufficiently high-end Nvidia GPU (a GT 220 is currently about the lowest it will use). Collatz Conjecture sends workunits related to some math problem, but to almost any GPU that BOINC can use. http://www.gpugrid.net/ http://boinc.thesonntags.com/collatz/ An idea on how to handle the input file size checking: Add a wrapper program that checks the size of the input file, then passes control to the main application program ONLY if the input file passes this test. |
||
ID: 6261 | Rating: 0 | rate: / | ||
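The wrapper idea at the end of that post might look roughly like the following. A minimal Python sketch, assuming the input file and the real application command are passed on the command line; the wrapper.py name, the argument layout, and the exit codes 1 and 2 are all hypothetical, and this is not how Docking@Home or BOINC actually launch charmm.

```python
import os
import subprocess
import sys


def input_is_usable(path: str, min_bytes: int = 1) -> bool:
    """Reject missing files and files smaller than min_bytes (by default, empty ones)."""
    return os.path.isfile(path) and os.path.getsize(path) >= min_bytes


def main(argv: list[str]) -> int:
    # Hypothetical layout: wrapper.py INPUT_FILE APP [APP_ARGS...]
    if len(argv) < 3:
        print("usage: wrapper.py INPUT_FILE APP [APP_ARGS...]", file=sys.stderr)
        return 2
    input_file, app_cmd = argv[1], argv[2:]
    if not input_is_usable(input_file):
        print(f"input file {input_file!r} is missing or empty; not starting app",
              file=sys.stderr)
        return 1  # fail fast instead of running with no input
    # Hand control to the real application and pass its exit status through.
    return subprocess.run(app_cmd).returncode


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

subprocess.run keeps the wrapper alive so it can pass the application's exit status back; on Unix-like systems os.execv could instead replace the wrapper process entirely. In BOINC a non-zero exit status would be expected to show up as a computation error, so a bad workunit would fail within seconds instead of sitting at 0% for hours.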
16-2-2011 11:59:41 Docking Restarting task 1hvk1hbv_mod0014crossdockinghiv1_19536_75277_0 using charmm34 version 623
|
||
ID: 6263 | Rating: 0 | rate: / | ||
Good number of "Client Error" wu's today. Most seem to fail after some multiple of ~300 seconds, (300, 600, 900, 1200 you get the picture).
|
||
ID: 6265 | Rating: 0 | rate: / | ||
Good number of "Client Error" wu's today. Most seem to fail after some multiple of ~300 seconds, (300, 600, 900, 1200 you get the picture). We are working on this problem right now! We will keep you posted. Thanks! ____________ If you are interested in working on Docking@Home in a great group at UDel, contact me at 'taufer at acm dot org'! |
||
ID: 6266 | Rating: 0 | rate: / | ||
One of the ligands, ligand 1hih, really did not want to dock into any protein conformation other than the one in which it was observed experimentally. So in the cross-docking simulation, no matter what protein conformation we were using, the simulation was very short and inconclusive, besides creating D@H problems. We removed the whole batch of simulations with this ligand and will work with our scientists to understand the scientific reason for this problem. We are distributing a new batch of jobs with another ligand, and this time it seems to work OK.
|
||
ID: 6267 | Rating: 0 | rate: / | ||
All jobs aborted Above your message from the 8th Feb. Received a new job that day, after aborting old jobs. The jobs started today (charmm 34a2 6.23) and quit after one second of working time with "error while computing" and progress of 100%. No problem with jobs of any other project today or the last few days. ____________ meine Kiste |
||
ID: 6268 | Rating: 0 | rate: / | ||
For some reason all my workunits on my new computer i7 2600k are stalling at 0.000% progress while those on my old computer Intel core duo are still running fine. Any idea on how to fix this? My new computer is burning through workunits on other projects, and I want them to do the same here. |
||
ID: 6271 | Rating: 0 | rate: / | ||
"does not want to dock" should not be treated as an error status, it is a result just like "docks easily" or "might dock if there is no better interface available".
|
||
ID: 6277 | Rating: 0 | rate: / | ||
"does not want to dock" should not be treated as an error status, it is a result just like "docks easily" or "might dock if there is no better interface available". The docking simulation can evolve toward a state in which the energy of the complex does not make any sense and the traditional charmm executable aborts. We wrapped charmm to catch these errors, terminate gently, and send us proper information. We also changed the application to give partial credits for partial simulations, once the initial phase of the simulation (when the ligand is located in the docking pocket) is successful. Right now we run a set of short simulations on a testing server for each complex to make sure that the simulations can complete. Unfortunately, this does not necessarily mean that we are always able to capture all the possible problems of a complex simulation. Error and energy violations are hard to predict a priori, especially with the type of simulations we are doing right now in which we cross-dock proteins and ligands that were not observed experimentally. Our next step toward preventing this problem is as follows: we will extend the testing phase. Michela ____________ If you are interested in working on Docking@Home in a great group at UDel, contact me at 'taufer at acm dot org'! |
||
ID: 6279 | Rating: 0 | rate: / | ||
I have seen the same problem on 2 different systems (Linux and Windows). While monitoring with BOINCTasks, I notice the % complete is way past 100%. Stopping and starting BOINC causes the task to start over at 0.0%, even though it might have run for 8-12 hours at 200% or more. Viewing results in BOINC Manager, one never sees past 0 percent, as the percent complete is not calculated the same way in BM as in BT.
|
||
ID: 6350 | Rating: 0 | rate: / | ||
I have seen the same problem on 2 different systems (Linux and Windows). While monitoring with BOINCTasks, I notice the % complete is way past 100%. Stopping and starting BOINC causes the task to start over at 0.0%, even though it might have run for 8-12 hours at 200% or more. Viewing results in BOINC Manager, one never sees past 0 percent, as the percent complete is not calculated the same way in BM as in BT. You may want to try observing the timing values for the last checkpoint before shutting down BOINC, since that's the critical factor in when workunits can be restarted. In my version of BOINC Manager, advanced view, Tasks, just click on the workunit, then Properties. Do not expect it to be able to resume any more recently than the last checkpoint after a system restart or a BOINC restart; BOINC simply does not have that capability. However, if your operating system supports sleep mode, and you suspend all workunits within BOINC but do not shut BOINC down entirely, the operating system should be able to go into sleep mode while still preserving the memory contents needed to resume the workunits where they were suspended, IF you have enabled the option to keep workunits in memory while they are suspended. Also, you may want to check whether the operating system agrees that the workunit is still using any CPU time. If not, do not expect any time limits built into the application program to work; the code checking for exceeding that time limit cannot run with no CPU time at all. |
||
ID: 6351 | Rating: 0 | rate: / | ||
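The point about resuming only from the last checkpoint can be illustrated with a generic checkpoint-and-resume loop. A minimal Python sketch; it does not use the BOINC API and does not reflect how charmm checkpoints, and the state file name, step counts, and checkpoint interval are hypothetical. State is written to a temporary file and then swapped in with os.replace, so an interrupted write leaves the previous checkpoint intact on most file systems.

```python
import json
import os
from typing import Optional


def load_step(state_file: str) -> int:
    """Return the last checkpointed step, or 0 if no checkpoint exists yet."""
    try:
        with open(state_file) as f:
            return json.load(f)["step"]
    except FileNotFoundError:
        return 0


def save_step(state_file: str, step: int) -> None:
    """Write the checkpoint to a temp file, then replace the old one in one step."""
    tmp = state_file + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step}, f)
    os.replace(tmp, state_file)


def run(state_file: str, total_steps: int, checkpoint_every: int,
        stop_after: Optional[int] = None) -> int:
    """Resume from the last checkpoint; stop_after simulates a shutdown at that step."""
    step = load_step(state_file)
    while step < total_steps:
        step += 1  # stand-in for one unit of real work
        if step % checkpoint_every == 0:
            save_step(state_file, step)
        if stop_after is not None and step >= stop_after:
            break
    return step


if __name__ == "__main__":
    state = "checkpoint_demo.json"  # hypothetical file name
    if os.path.exists(state):
        os.remove(state)
    print(run(state, total_steps=100, checkpoint_every=10, stop_after=37))  # stops at 37
    print(load_step(state))  # 30: steps 31-37 were never saved
    print(run(state, total_steps=100, checkpoint_every=10))  # resumes at 30, redoes 31-37
    os.remove(state)
```

The work done between the last checkpoint and the shutdown (steps 31 to 37 here) is simply repeated after the restart, which is the behaviour described above for BOINC restarts.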
For some reason all my workunits on my new computer i7 2600k are stalling at 0.000% progress while those on my old computer Intel core duo are still running fine. Any idea on how to fix this? My new computer is burning through workunits on other projects, and I want them to do the same here. I'm having the same issue. I have 8 Docking work units running at once and they all stay at zero percent. Moreover, the elapsed time seems to count up to about a minute forty-five or so, then resets to zero.
------------------ System Information ------------------
Time of this report: 6/11/2011, 16:32:00
Machine name: VESPID
Operating System: Windows 7 Professional 64-bit (6.1, Build 7600) (7600.win7_rtm.090713-1255)
Language: English (Regional Setting: English)
System Manufacturer: MSI
System Model: MS-7681
BIOS: BIOS Date: 03/02/11 10:58:35 Ver: 04.06.04
Processor: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz (8 CPUs), ~3.4GHz
Memory: 16384MB RAM
Available OS Memory: 16364MB RAM
Page File: 5003MB used, 27723MB available |
||
ID: 6358 | Rating: 0 | rate: / | ||
For some reason all my workunits on my new computer i7 2600k are stalling at 0.000% progress while those on my old computer Intel core duo are still running fine. Any idea on how to fix this? My new computer is burning through workunits on other projects, and I want them to do the same here. We are looking at this. Thanks for the note! Michela ____________ If you are interested in working on Docking@Home in a great group at UDel, contact me at 'taufer at acm dot org'! |
||
ID: 6360 | Rating: 0 | rate: / | ||
This is a report of the jobs associated with 1hvi1hpv.
|
||
ID: 6363 | Rating: 0 | rate: / | ||
I just attached to D@H yesterday and I'm getting this problem also; WUs stay at 0% with no progress, and the screen saver says to abort any WUs displaying 0%.
|
||
ID: 6365 | Rating: 0 | rate: / | ||
I have a 1hvi1hpv workunit on one of my computers, but since it's already at 18% progress I plan to let it run for now.
|
||
ID: 6366 | Rating: 0 | rate: / | ||
Does Docking still have the annoying 0.000% progress bug? |
||
ID: 6369 | Rating: 0 | rate: / | ||
Does Docking still have the annoying 0.000% progress bug? G'Day Vaughan, I have not noticed it in the past two weeks of processing work units. I am running both Windows and Linux, on 5 AMD Phenom processors and so far there has been no problems at all. Conan ____________ |
||
ID: 6370 | Rating: 0 | rate: / | ||
Does Docking still have the annoying 0.000% progress bug? Thanks Conan. Yes it seems to be behaving now. |
||
ID: 6371 | Rating: 0 | rate: / | ||
I just joined and I seem to have this 0% issue. |
||
ID: 6426 | Rating: 0 | rate: / | ||
I just joined and I seem to have this 0% issue. Hi, can I please have the names of the jobs with 0% progress? I just checked the server and we have space on the disk (in the past that was one of the reasons for the problem). The testing machines in the lab seem to crunch well. We will look at this in detail tomorrow morning. Michela ____________ If you are interested in working on Docking@Home in a great group at UDel, contact me at 'taufer at acm dot org'! |
||
ID: 6433 | Rating: 0 | rate: / | ||
I just rejoined the project and ran into the same problem that made me leave it before: zero progress bar, WUs not terminating the process, etc.
|
||
ID: 6543 | Rating: 0 | rate: / | ||
Today I joined this project and ran into the same 0% bug. I aborted the two WUs I was given because they showed 0% progress while eating CPU time.
|
||
ID: 6560 | Rating: 0 | rate: / | ||
The problem appears to be with the program's ability to output its results. It appears to be with CPU only, but the graphics version is not being used. I have an Intel CPU as well. Perhaps the output code is having difficulty working properly? |
||
ID: 6587 | Rating: 0 | rate: / | ||
This thread has been running since August 2009, and the problem has still not been solved, despite promises that the admins are working on it.
|
||
ID: 6614 | Rating: 0 | rate: / | ||
OK, I have a number of Intel computers. The ones that will run Docking on 64-bit Win7 have VT-x; the ones that will not run Docking do not have VT-x. Go figure. |
||
ID: 6620 | Rating: 0 | rate: / | ||
same problem here. no changes using the newest version for gpu usage. docking only 0.00 had to abort all dockings. |
||
ID: 6837 | Rating: 0 | rate: / | ||
This message board has died and is now a terraforming/action art project involving large amounts of organic waste (hence the smell), run by the Knights who say Ni!
|
||
ID: 6838 | Rating: 0 | rate: / | ||
This message board has died and is now a terraforming/action art project involving large amounts of organic waste (hence the smell), run by the Knights who say Ni! free courses online |
||
ID: 6843 | Rating: 0 | rate: / | ||
same problem here. no changes using the newest version for gpu usage. docking only 0.00 had to abort all dockings. myfreecoursesonline |
||
ID: 6844 | Rating: 0 | rate: / | ||
Damn! Instead of terraforming we've created a spam magent!
|
||
ID: 6845 | Rating: 0 | rate: / | ||
Damn! Instead of terraforming we've created a spam magent! That should be magnet of course ... (slaps forhead) ____________ The Knights who say Ni! |
||
ID: 6846 | Rating: 0 | rate: / | ||
After over an hour of crunching, Progress remains at 0.000%. Can anyone help or explain to me what is happening? Thanks. It is happening on one of three of my computers. First began getting computational errors and then it changes to zero progress. There must be a problem on the far end not the user side as my other 2 machines appear to be working okay for now. |
||
ID: 6863 | Rating: 0 | rate: / | ||
After over an hour of crunching, Progress remains at 0.000%. Can anyone help or explain to me what is happening? Thanks. It is happening on one of three of my computers. First began getting computational errors and then it changes to zero progress. There must be a problem on the far end not the user side as my other 2 machines appear to be working okay for now. My guess is that the progress is only updated when a checkpoint is made, and many of the current workunits have some problem that makes them run at least 3 times the initially estimated time, and perhaps 10 times, before writing any checkpoints at all (if they ever get around to doing anything useful). I've had to abort at least my last 5 workunits on this computer for this reason. I haven't seen any such workunits in the last few days on my other two computers, perhaps because all three computers participate in many BOINC projects and the others just haven't reached a good time for their next batch of Docking@Home workunits. |
||
ID: 6865 | Rating: 0 | rate: / | ||
same problem with this one in another thread |
||
ID: 6867 | Rating: 0 | rate: / | ||
Same problem!
|
||
ID: 6868 | Rating: 0 | rate: / | ||
Those mentioning this problem might mention whether they see it only on computers running Windows 7, and whether that happens to be the 64-bit version of Windows 7.
|
||
ID: 6869 | Rating: 0 | rate: / | ||
Those mentioning this problem might mention whether they see it only on computers running Windows 7, and whether that happens to be the 64-bit version of Windows 7. Win Vista 64 / Win 7 64. Been downloading files with zero content, spread over 4 machines. Been downloading files with incorrect CRC checksums, all 4 machines. The workunits that do start on my Vista machine run up to 100% complete and are still the same after 16 hours. Aborted. On the same machine, same batch: zero% after two hours. On the other 3 machines I cannot even start to crunch. I'll wait for better days. Toppie. |
||
ID: 6870 | Rating: 0 | rate: / | ||
I had this issue on one 2008x64 server. 3 tasks running on a quad-core Opteron. No progress on any task after 18.5h, 14.4h and 14.3h. CPU usage at 75% (the tasks), and memory being used as expected. I aborted the said tasks. The next tasks started running but didn't progress either, so I restarted the system.
|
||
ID: 6871 | Rating: 0 | rate: / | ||
I also have a Win7 x64 machine.
after the reboot one task had reached 1% progress by the time I had logged on (running as a daemon). The time was about 3min. to reach this 1% and the checkpoint was at 23sec. About 8min into the run and the same task went to 3.475%. Neither of the other tasks had progressed (0%), so I suspended them. I don't wait that long. As you said, the first progress is visible after almost 45 sec. I checked the slot folders (for example d:\Boinc\Project_Data\slots) of the broken WUs and saw that several files were missing. |
||
ID: 6872 | Rating: 0 | rate: / | ||
A little more to report: One more workunit on my 64-bit Windows 7 computer failed the same way last night.
|
||
ID: 6873 | Rating: 0 | rate: / | ||
During the past weekend, the disk space on the D@H server filled up, and as a result the server sent out some incomplete workunits. Please abort workunits whose names begin with "1iiq1hih" or "1ohr1hih". The server is now back to normal.
|
||
ID: 6876 | Rating: 0 | rate: / | ||
Thanks!!! :) I think that I have the same problem with a "1ohr1htf". Should I abort this one too? |
||
ID: 6877 | Rating: 0 | rate: / | ||
It looks like the workunits need some test at the beginning that will quickly shut down any incomplete workunit. My current group of troublesome workunits all have names beginning with 1m0b1htf; should I abort all of them too? |
||
ID: 6878 | Rating: 0 | rate: / | ||
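The startup test suggested in the post above could look roughly like the following. This is a minimal Python sketch, not Docking@Home's actual code: the file names in `EXPECTED_INPUTS` are hypothetical placeholders (the thread never lists the project's real input files), and the check only catches the two symptoms volunteers reported, missing files and zero-length files. It would not catch a file that is truncated but non-empty.

```python
import tempfile
from pathlib import Path

# Hypothetical input names: the real Docking@Home file names are not given in this thread.
EXPECTED_INPUTS = ["ligand.pdb", "protein.psf", "params.inp"]


def find_incomplete_inputs(slot_dir, expected=EXPECTED_INPUTS):
    """Return (missing, empty): expected files absent from slot_dir, and those with zero length."""
    slot = Path(slot_dir)
    missing, empty = [], []
    for name in expected:
        path = slot / name
        if not path.is_file():
            missing.append(name)
        elif path.stat().st_size == 0:
            empty.append(name)
    return missing, empty


def inputs_look_complete(slot_dir, expected=EXPECTED_INPUTS):
    """True only if every expected file exists and is non-empty."""
    missing, empty = find_incomplete_inputs(slot_dir, expected)
    return not missing and not empty


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        Path(d, "ligand.pdb").write_text("ATOM ...\n")
        Path(d, "protein.psf").touch()  # zero-length, like the downloads reported above
        # params.inp is deliberately left out to simulate a missing file
        print(find_incomplete_inputs(d))  # (['params.inp'], ['protein.psf'])
```

The idea is that the application would run a check like this before doing any real work and exit with an error if it fails, so a broken task is reported within seconds instead of sitting at 0% for hours.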
Yes, please abort them too. Thanks!
|
||
ID: 6882 | Rating: 0 | rate: / | ||
Yes, please abort them too, thanks!
|
||
ID: 6883 | Rating: 0 | rate: / | ||
Why isn't this problem posted as a news item in the server status section yet? |
||
ID: 6885 | Rating: 0 | rate: / | ||
I have set Docking@Home to no new tasks and have aborted all current tasks. I had two more tasks run for hours this morning with 0 progress. Please let us know when you have fixed this problem. |
||
ID: 6886 | Rating: 0 | rate: / | ||
Yes, please let us know when the problem has been fixed. Presently, I think the server isn't sending new tasks, which is good, seeing as they don't work. Can you send server aborts to expedite the resolution? GL |
||
ID: 6888 | Rating: 0 | rate: / | ||
Aborted. Could you let us know when you have a new batch of workunits that have been adequately tested under 64-bit Windows 7, and the other versions of Windows mentioned recently in this thread? |
||
ID: 6890 | Rating: 0 | rate: / | ||
I have new workunits today with the 1hbv1hih string at the beginning. All 8 of them are 2 hours in and 0% complete. |
||
ID: 6894 | Rating: 0 | rate: / | ||
The current 0% problem is not just restricted to Win 7 x64, all my Docking is done on XP x86 crunchers. |
||
ID: 6895 | Rating: 0 | rate: / | ||
Not restricted for me either. One of my 64-bit Windows Vista computers has now had two such failures and is now set to No New Tasks for Docking@Home. All three of my computers run BOINC 7.0.28. |
||
ID: 6897 | Rating: 0 | rate: / | ||
This problem is in Linux too (Arch Linux x86_64). |
||
ID: 6898 | Rating: 0 | rate: / | ||
I'm in it for the science (and the resulting speedup of medical research, i.e., my life expectancy) and will keep testing WUs with new tasks even if my RAC takes a dive. Had loads of 1m0b1htf ones that are now aborted. |
||
ID: 6899 | Rating: 0 | rate: / | ||
Hi all,
|
||
ID: 6904 | Rating: 0 | rate: / | ||
Hi all,
|
||
ID: 6905 | Rating: 0 | rate: / | ||
Can't you abort them remotely from the project's side (other projects like RNA do this regularly)? It would be much more convenient for people who don't have the time to check all their machines for WUs blocking the computation. |
||
ID: 6907 | Rating: 0 | rate: / | ||
I aborted the ones that are "unsent" from server side, but for the workunits that are already sent to the volunteers, we do not have control from the server side.
|
||
ID: 6908 | Rating: 0 | rate: / | ||
As I am sure you realize, this situation makes for a rather large pain in the tush. I will have to check several machines to make certain that the defective Docking WUs are not blocking Docking and the other projects I run as well. I pay attention to BOINC, so while it's an issue for me, it's not insurmountable. I expect that many other crunchers do not pay attention, and for them the defective Docking WUs might be a major slowdown, not only for Docking but for other projects. As it happens, other projects (MalariaControl, for instance) have the ability to delete WUs after they are downloaded. I urge Docking, in the strongest possible terms, to spend some resources developing this capability. Mark |
||
ID: 6909 | Rating: 0 | rate: / | ||
I second that. And I can only reiterate that other projects can do it. Mark mentioned MalariaControl and I mentioned RNA World before. Both projects have the ability to cancel tasks remotely (for instance, when the results are not needed anymore). The BOINC platform surely offers this somewhere. |
||
ID: 6912 | Rating: 0 | rate: / | ||
SOMETHING has allowed my Windows 7 computer to resume Docking@Home workunits. I can't tell if it was an improvement in the workunits, or the fact that I drained that computer of Docking@Home workunits and then told BOINC Manager to reset that project. |
||
ID: 6918 | Rating: 0 | rate: / | ||
-- Deleted -- |
||
ID: 6925 | Rating: 0 | rate: / | ||
Hi, this morning I noticed several tasks running High Priority, but
|
||
ID: 6934 | Rating: 0 | rate: / | ||
Dear Fred,
|
||
ID: 6936 | Rating: 0 | rate: / | ||
Still getting these workunits. Had 6 today with 42 hours elapsed time.
|
||
ID: 6989 | Rating: 0 | rate: / | ||