HELP - Consistent 0% Progress - Client Problem?




Gandelf

Joined: Apr 11 09
Posts: 1
ID: 9674
Credit: 3,684,116
RAC: 0
Message 5333 - Posted 19 Aug 2009 8:01:44 UTC

I have a laptop which is refusing to budge past zero percent unit progress despite eating cpu.

Symptoms are:-
2 processes running at approx 50% cpu, each using 653kb ram, exe's are charmm34_6.23_windows_x86_64, processes run continuously >6 hours with zero progress, time remaining does not change from approx 4 hours.

Setup:-
CPU is T9800 dual core 2.93Ghz, Laptop is genuine licensed 64bit Windows 7 ultimate (from MSDN).

Tried:-
I have reset the project, detached and reattached, uninstalled and reinstalled, and turned off DEP, and still no luck. If I close the client and stop the processes, the units restart from 0 time again.

Please help...

Profile Trilce Estrada
Forum moderator
Project administrator
Project developer
Project tester

Joined: Sep 19 06
Posts: 189
ID: 119
Credit: 1,217,236
RAC: 0
Message 5351 - Posted 25 Aug 2009 16:37:20 UTC
Last modified: 25 Aug 2009 16:37:29 UTC

Hi Gandelf,

Is this the machine with the problem? I don't know exactly what could be causing it. We know for sure that there are occasionally workunits that take much longer than average, but it would be really improbable for more than one to be assigned to the same machine in such a short interval of time. We will look at your tasks to see if we can find a possible cause. If you find something, please let us know.

Thank you

Profile David Ball
Forum moderator
Volunteer tester

Joined: Sep 18 06
Posts: 274
ID: 115
Credit: 1,634,401
RAC: 0
Message 5375 - Posted 6 Sep 2009 19:12:35 UTC

Does that machine really only have 512MB ram? Maybe it's swapping to disk.
____________
The views expressed are my own.
Facts are subject to memory error :-)
Have you read a good science fiction novel lately?

SazanEyes

Joined: Feb 17 09
Posts: 2
ID: 7266
Credit: 830,596
RAC: 0
Message 5389 - Posted 17 Sep 2009 4:24:41 UTC

I'm seeing the same issue. Three of four tasks show at least 71 hours elapsed, with about 45 minutes to completion, but progress is still 0%. The fourth task hasn't started yet.

The tasks are here:
http://docking.cis.udel.edu/community/results.php?userid=7266
One specific task that has been running:
http://docking.cis.udel.edu/community/result.php?resultid=7802724
My PC:
http://docking.cis.udel.edu/community/show_host_detail.php?hostid=44672

My situation is not a RAM issue. The Docking processes use very little RAM. This box is also running yoyo@home and The Lattice Project with no issues.

Profile Trilce Estrada
Forum moderator
Project administrator
Project developer
Project tester

Joined: Sep 19 06
Posts: 189
ID: 119
Credit: 1,217,236
RAC: 0
Message 5390 - Posted 17 Sep 2009 16:41:45 UTC - in response to Message ID 5389 .

Hi Suzan

We have not been able to reproduce such odd behavior, and we don't have an answer about what could be wrong. Please abort those workunits or detach and reattach the project. Let us know whether doing this changes the situation.

Thanks a lot

SazanEyes

Joined: Feb 17 09
Posts: 2
ID: 7266
Credit: 830,596
RAC: 0
Message 5399 - Posted 17 Sep 2009 23:43:18 UTC

I aborted the workunits. I'll let you know if I see the same problem again.

j2satx
Volunteer tester

Joined: Dec 22 06
Posts: 183
ID: 339
Credit: 16,191,581
RAC: 0
Message 5401 - Posted 18 Sep 2009 14:23:49 UTC - in response to Message ID 5333 .

I have a laptop which is refusing to budge past zero percent unit progress despite eating cpu.

Symptoms are:-
2 processes running at approx 50% cpu, each using 653kb ram, exe's are charmm34_6.23_windows_x86_64, processes run continuously >6 hours with zero progress, time remaining does not change from approx 4 hours.

Setup:-
CPU is T9800 dual core 2.93Ghz, Laptop is genuine licensed 64bit Windows 7 ultimate (from MSDN).

Tried:-
I have reset the project, detached and reattached, uninstalled and reinstalled, and turned off DEP, and still no luck. If I close the client and stop the processes, the units restart from 0 time again.

Please help...


I have the same issue on one Intel CPU Q9550. My other Intel CPUs (Q6600, Q6700, Q9300, Q9450) with same mobo run Docking fine. All W7RC 64-bit.

I did not find a solution to the issue, I just stopped running Docking on the Q9550.

Twodee

Joined: Jul 2 09
Posts: 2
ID: 14800
Credit: 4,938,070
RAC: 0
Message 5444 - Posted 13 Oct 2009 20:41:34 UTC

Same problem here.

Q9650 - 4 gigs of ram - all at stock
running on windows7-64bit.

several tests and always the same result, => 0% after hours

But similar systems (Q6600, Q9300, etc.) with the same operating system and software base work fine.

Profile Erkan Yilmaz

Joined: Mar 29 09
Posts: 3
ID: 9000
Credit: 13,217
RAC: 0
Message 5478 - Posted 25 Oct 2009 4:20:44 UTC
Last modified: 25 Oct 2009 5:04:35 UTC

also no progress here

about the 2 processes:
1. one belongs to the current calculation
2. one did not stop from a previous BOINC session the day before

So I killed 2., but that did not help.
When I pause 1. in BOINC, I still see CPU being consumed by the Docking app.

So I will also pause Docking until the problem is solved.
no other info gathered about this anomaly so far
on further request I could provide more info (e.g. remote connection)



system:
Q9400 with win7 64 RC, activated, admin user
BOINC 6.10.11

happened with these WUs:
http://docking.cis.udel.edu/community/result.php?resultid=8635180
http://docking.cis.udel.edu/community/result.php?resultid=8621734
http://docking.cis.udel.edu/community/result.php?resultid=8246167


Erkan YILMAZ
iaskquestions.com

Profile Trilce Estrada
Forum moderator
Project administrator
Project developer
Project tester

Joined: Sep 19 06
Posts: 189
ID: 119
Credit: 1,217,236
RAC: 0
Message 5487 - Posted 28 Oct 2009 0:07:41 UTC - in response to Message ID 5478 .

Yes, sadly it is still a mystery. It could be a particular initial random configuration of the ligand or something about BOINC, but we have not been able to reproduce this behavior on our machines so far.

yuuwaku

Joined: Sep 19 09
Posts: 3
ID: 18776
Credit: 4,810
RAC: 0
Message 5499 - Posted 1 Nov 2009 19:12:39 UTC

Just thought I'd add that I've been having the same problem. I currently have a work unit beginning 1c5q that has been running for 12 hours without any progress being made on it. This has happened with several work units but I think I remember them all beginning with 1c5q. Interestingly, when I suspend the project BOINC assigns two cores to other projects and charmm keeps eating up an entire core by itself even though I have BOINC set to use a maximum of two cores right now. It also uses a constant 704K of memory, which seems low but I haven't been paying much attention to what it usually uses.

System specs:

Windows 7 Ultimate 64 bit
Intel Core 2 Quad Q9550
8 Gigs of ram

yuuwaku

Joined: Sep 19 09
Posts: 3
ID: 18776
Credit: 4,810
RAC: 0
Message 5500 - Posted 2 Nov 2009 1:02:02 UTC

Well it isn't a particular type of work unit as I thought it might be. The same thing happened on a new one I tried and that began 1hvi. I tried aborting the task and the cpu remained in use again. I guess docking@home will have to be suspended for a little while.

Jakester

Joined: Nov 7 09
Posts: 1
ID: 21258
Credit: 0
RAC: 0
Message 5509 - Posted 8 Nov 2009 14:38:53 UTC

This is happening to me too. No matter what I try, it won't budge past zero. I detach and reattach, I suspend and resume, I abort and retry. Nothing, and it still eats my processor. Docking@home is suspended for now; let me know when this problem is fixed. Oh, my setup, if that matters:

Pentium Dual-Core E6300 @ 2.83 GHz
4 GB Dual-Channel 800 MHz RAM (brand is OCZ)
Radeon HD 4650 512MB (brand is HIS)

Never heard of these brands? That's because they're generic: I built this machine myself. Runs great, and it should; it's a borderline entry-level gamer machine. Anyway, try to sort this out so I can help. Thanks!

ampakal

Joined: Nov 2 09
Posts: 1
ID: 21040
Credit: 92,531
RAC: 0
Message 5512 - Posted 10 Nov 2009 2:42:02 UTC
Last modified: 10 Nov 2009 2:43:27 UTC

Same thing here.

Here are the details on my laptop

Intel core 2 duo 2.4 ghz
6 gb ram
nvidia 9600M gt
Windows 7 64

Machine id is 49006

Every single wu has had the same problem. It just runs and uses process time but never progresses beyond 0.00%. They even appear to start over when the laptop is restarted.

Would love to contribute. I hope this can be fixed soon.

yuuwaku

Joined: Sep 19 09
Posts: 3
ID: 18776
Credit: 4,810
RAC: 0
Message 5526 - Posted 21 Nov 2009 9:02:22 UTC

So far everyone who has posted their OS has reported using windows 7 64 bit. Perhaps that is where the problem lies.

The Dirts

Joined: Apr 13 09
Posts: 1
ID: 9813
Credit: 692,497
RAC: 0
Message 5529 - Posted 21 Nov 2009 20:20:34 UTC

Just thought I'd add that I also have this problem. Yes, I am using Win 7 64 bit, but it's only a recent problem. It started about 3 to 4 days ago. I stopped the tasks at 64 hrs running with 0% complete.

Also getting this message

21/11/2009 19:29:11 Docking@Home Sending scheduler request: Requested by user.
21/11/2009 19:29:11 Docking@Home Requesting new tasks for CPU
21/11/2009 19:29:16 Docking@Home Scheduler request completed: got 0 new tasks
21/11/2009 19:29:16 Docking@Home Message from server: No work sent
21/11/2009 19:29:16 Docking@Home Message from server: (reached daily quota of 2 results)

Cheers
TheDirts - TPR

Win 7 - 64bit
8Gb Ram
E8400

Marco Vuano

Joined: Mar 4 09
Posts: 2
ID: 7913
Credit: 16,969
RAC: 0
Message 5532 - Posted 21 Nov 2009 23:57:30 UTC

I have a desktop PC with a P5Q motherboard, an Intel Core2Duo E8400, 4 GB DDR2 RAM, a Sapphire Radeon HD 4670. When I was using Windows Vista Ultimate Service Pack 1 32 bit, Docking@Home was working perfectly, but when I switched to Windows 7 Ultimate 64 bit the Work Units stopped working correctly: the cpu time (in the "Properties" section of the WU) was always "---" and the graphics window showed "No Model Formed Yet.", even after many hours of continued processing. The work Units used 100% of the CPU core they were assigned to and used only some MBs of RAM (more than 2 GB of RAM were free). I tried to reset the project many times, with no success. The other BOINC projects were running fine. Then I tried running Docking@Home on Ubuntu 9.10 64 bit: this time the Work Units were correctly processed (the complexes of the work Units processed with Linux are so far 1pph, 1qb6 and 1ce5) while on Windows 7 they still aren't working (even the complexes 1qb6 and 1ce5).
I think this problem is related to Windows 7 64 bit.

[AF>France>Aquitaine>Cote-Adour-et-Gaves]Bernard 64250

Joined: Nov 22 09
Posts: 3
ID: 21818
Credit: 500,076
RAC: 0
Message 5545 - Posted 23 Nov 2009 15:06:28 UTC
Last modified: 23 Nov 2009 15:31:19 UTC

I just joined docking@home. I have 2 active WUs. Both are stuck, at 43.07% and 1% progress respectively, while the elapsed time counters have kept running for a few hours without any special activity on my PC. Is this normal? Is there any special hardware requirement? May I abort?

Marco Vuano

Joined: Mar 4 09
Posts: 2
ID: 7913
Credit: 16,969
RAC: 0
Message 5550 - Posted 23 Nov 2009 22:16:19 UTC

@[AF>France>Aquitaine>Cote-Adour-et-Gaves] Bernard du 40
I don't think it would be wise to abort. Your progress is far from 0%. Check whether the CPU time counter (in the properties section of the "Work Units" tab in the Advanced View of BOINC Manager) is stuck, and check whether other processes with higher priority are "stealing" CPU time from Docking@Home.

arcturus

Joined: Sep 22 08
Posts: 4
ID: 1145
Credit: 767,313
RAC: 0
Message 5563 - Posted 1 Dec 2009 2:00:06 UTC

Confirming the same problem with a Q9550, 4 gigs of RAM on Win 7 64 bit. 0% after a number of hours.

However - no problem on a Phenom II 940 on Win 7 64 bit.

Looks to be a lot of processing power going to waste without a solution.

Profile Jaxis

Joined: Oct 20 09
Posts: 1
ID: 20049
Credit: 559,626
RAC: 0
Message 5564 - Posted 1 Dec 2009 20:23:17 UTC

Another confirmation of wu's with 0% progress.

Windows7 64-bit
Intel Core 2 Quad CPU Q8400
ATI Radeon HD 4650
8 GB RAM
Boinc Ver. 6.10.18


"Properties" of task 1k1m_52_mod0014trypsin_6028_100105_0

Application: Charmm 34a2 6.23
Workunit Name: 1k1m_52_mod0014trypsin_6028_100105
State: Running
Received: 12/1/2009 12:58:37PM
Report deadline: 12/15/2009 11:38:56AM
CPU time at last checkpoint ---
CPU time ---
Elapsed time: 00:12:40
Estimated time remaining: 3:59:11
Fraction done: 0.000%
Virtual memory size: 153.32 MB
Working set size: 2.63 MB
Directory: slots/10


When prompted to "Show Graphics" it states "No Model Formed Yet."


Continuously restarts approx. every 3 minutes:

12/1/2009 1:00:24 PM Docking@Home Starting 1k1m_52_mod0014trypsin_6028_100105_0
12/1/2009 1:00:25 PM Docking@Home Starting task 1k1m_52_mod0014trypsin_6028_100105_0 using charmm34 version 623
12/1/2009 1:06:33 PM Docking@Home Restarting task 1k1m_52_mod0014trypsin_6028_100105_0 using charmm34 version 623
12/1/2009 1:09:37 PM Docking@Home Restarting task 1k1m_52_mod0014trypsin_6028_100105_0 using charmm34 version 623
12/1/2009 1:12:41 PM Docking@Home Restarting task 1k1m_52_mod0014trypsin_6028_100105_0 using charmm34 version 623
12/1/2009 1:15:45 PM Docking@Home Restarting task 1k1m_52_mod0014trypsin_6028_100105_0 using charmm34 version 623





Profile vaughan
Volunteer tester

Joined: Oct 3 06
Posts: 9
ID: 177
Credit: 3,108,281
RAC: 0
Message 5569 - Posted 6 Dec 2009 11:13:18 UTC
Last modified: 6 Dec 2009 11:14:47 UTC

31 hours and still at 0 percent. Estimated run-time is 3 hours WTF.

I stopped BOINC and killed BOINCtray from Windows task manager (why doesn't this process close when you shutdown BOINC?) Win7 64 Ultimate, Intel C2D Wolfdale E8600 @ 4GHz

Restart BOINC, tasks resume at 0 percent done and time is 0 again. Did I just waste 31 hours of crunching?

Developers please address this issue ASAP.
____________

Yoran

Joined: Dec 4 09
Posts: 3
ID: 22405
Credit: 1,117
RAC: 0
Message 5573 - Posted 10 Dec 2009 10:33:19 UTC

just to let you know, I'm having the same problem:

specs:
Intel E8600
Nvidia gtx295
4gb of ram
Windows 7 ultimate x86

info:charmm is using 50%cpu time and only 500kb ram...


I will check this thread once in a while; until then I'll disable docking@home :(
please solve this problem quickly :(

steve

Joined: Jun 22 09
Posts: 4
ID: 13910
Credit: 62,355
RAC: 0
Message 5587 - Posted 19 Dec 2009 19:53:12 UTC

I see this is an ongoing problem. I just aborted all Docking WUs
because a 3 hr WU showed no progress after 13+ hours of CPU time.

I'll check back after the first of the year to see if any solutions
are offered.

Specs: Win 7 64bit
Intel Q9550 cpu
8 Gig of ram
ATI Radeon HD 5850 graphics card
BOINC 6.10.19

Bill

Luciano

Joined: Dec 29 09
Posts: 2
ID: 23483
Credit: 224,791
RAC: 0
Message 5604 - Posted 31 Dec 2009 10:31:17 UTC - in response to Message ID 5573 .

I'm having the same problem.
Spec:
Processor: Intel Core2 Duo CPU T9550
Cache: 6.00mb
OS: Windows 7 Ultimate x64 Edition
Memory: 4gb
Kaspersky 2010 (no scan on folder boinc/*)
The Docking process restarts continuously, about every 3 min.
Charmm is using only 500kb ram.

I have disabled docking@home.

steve

Joined: Jun 22 09
Posts: 4
ID: 13910
Credit: 62,355
RAC: 0
Message 5622 - Posted 8 Jan 2010 3:37:31 UTC

Downloaded and started new work. Three units ran for 13+ hours while showing no progress. After I aborted the Docking work units, the charmm processes continued to run for several hours until I killed them in Task Manager. They had all four cores pegged at 100%. Since there is no interest from Docking@Home in fixing this bug, I'll be detaching all computers. I'm sure my CPU cycles can be put to use on another project.



Bill

Profile robertmiles

Joined: Apr 16 09
Posts: 96
ID: 9967
Credit: 1,290,747
RAC: 0
Message 5645 - Posted 15 Jan 2010 15:18:49 UTC

From what I've seen, the problem may be specific to the combination of Windows 7 and a sufficiently recent Intel CPU. Anyone ready to agree or disagree?

Also, anyone with this problem may want to search their log files for anything mentioning the boinc_lockfile; this probably indicates a problem I've seen on some other BOINC projects, where the problem can cascade from any workunit that originates the problem to any other workunit using the same slot before the next boinc.exe restart.
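
For anyone who would rather script that log search than scroll through the files by hand, here is a minimal sketch. It assumes the client log is a plain-text file that you pass in yourself (on many installs the BOINC data directory contains `stdoutdae.txt` and `stderrdae.txt`, but that location is an assumption, not something confirmed in this thread). `find_mentions` is a hypothetical helper name.

```python
import sys


def find_mentions(log_path, needle="boinc_lockfile"):
    """Return (line_number, text) for every line of log_path containing needle.

    The match is case-insensitive; undecodable bytes are replaced rather
    than raising, since client logs may contain odd characters.
    """
    hits = []
    wanted = needle.lower()
    with open(log_path, "r", encoding="utf-8", errors="replace") as fh:
        for number, line in enumerate(fh, start=1):
            if wanted in line.lower():
                hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Usage: python find_mentions.py <log file> [<log file> ...]
    # The file paths are supplied by you; nothing is guessed here.
    for name in sys.argv[1:]:
        for number, text in find_mentions(name):
            print(f"{name}:{number}: {text}")
```

An empty result only means the string does not appear in the files you passed in; it does not rule out other causes.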

Yoran

Joined: Dec 4 09
Posts: 3
ID: 22405
Credit: 1,117
RAC: 0
Message 5647 - Posted 15 Jan 2010 18:56:28 UTC - in response to Message ID 5645 .

From what I've seen, the problem may be specific to the combination of Windows 7 and a sufficiently recent Intel CPU. Anyone ready to agree or disagree?

Also, anyone with this problem may want to search their log files for anything mentioning the boinc_lockfile; this probably indicates a problem I've seen on some other BOINC projects, where the problem can cascade from any workunit that originates the problem to any other workunit using the same slot before the next boinc.exe restart.



I've searched the log, and all I could find was:

07:05:05 [Docking@Home] Restarting task 1k1l_93_mod0014trypsin_18710_128757_0 using charmm34 version 623
07:08:11 [Docking@Home] Restarting task 1k1l_93_mod0014trypsin_18710_128757_0 using charmm34 version 623
07:11:16 [Docking@Home] Restarting task 1k1l_93_mod0014trypsin_18710_128757_0 using charmm34 version 623
07:14:21 [Docking@Home] Restarting task 1k1l_93_mod0014trypsin_18710_128757_0 using charmm34 version 623
07:17:26 [Docking@Home] Restarting task 1k1l_93_mod0014trypsin_18710_128757_0 using charmm34 version 623
07:20:31 [Docking@Home] Restarting task 1k1l_93_mod0014trypsin_18710_128757_0 using charmm34 version 623
07:23:36 [Docking@Home] Restarting task 1k1l_93_mod0014trypsin_18710_128757_0 using charmm34 version 623
07:26:41 [Docking@Home] Restarting task 1k1l_93_mod0014trypsin_18710_128757_0 using charmm34

and it just goes on and on and on...
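
The restart pattern in a log like the one above can be measured rather than eyeballed. This is a minimal sketch, assuming each message line contains an `HH:MM:SS` timestamp followed somewhere by `Restarting task <name>`, as in the excerpts posted in this thread. It ignores dates and AM/PM markers, so it is only meaningful for lines from the same half-day in ascending order. `restart_intervals` is a hypothetical helper name.

```python
import re
from datetime import datetime

# First HH:MM:SS on the line, then the task name after "Restarting task".
RESTART = re.compile(r"(\d{1,2}:\d{2}:\d{2}).*?Restarting task (\S+)")


def restart_intervals(lines):
    """Map each task name to the seconds between its consecutive restarts."""
    last_seen = {}
    gaps = {}
    for line in lines:
        match = RESTART.search(line)
        if not match:
            continue  # not a restart message
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        task = match.group(2)
        if task in last_seen:
            delta = int((stamp - last_seen[task]).total_seconds())
            gaps.setdefault(task, []).append(delta)
        last_seen[task] = stamp
    return gaps


if __name__ == "__main__":
    sample = [
        "07:05:05 [Docking@Home] Restarting task 1k1l_93_mod0014trypsin_18710_128757_0 using charmm34 version 623",
        "07:08:11 [Docking@Home] Restarting task 1k1l_93_mod0014trypsin_18710_128757_0 using charmm34 version 623",
        "07:11:16 [Docking@Home] Restarting task 1k1l_93_mod0014trypsin_18710_128757_0 using charmm34 version 623",
    ]
    print(restart_intervals(sample))
    # prints {'1k1l_93_mod0014trypsin_18710_128757_0': [186, 185]}
```

Gaps that cluster tightly around three minutes, as they do here, suggest the client is restarting the task on a fixed timer rather than the application failing at random points.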
Profile robertmiles

Joined: Apr 16 09
Posts: 96
ID: 9967
Credit: 1,290,747
RAC: 0
Message 5648 - Posted 16 Jan 2010 12:10:05 UTC
Last modified: 16 Jan 2010 12:10:55 UTC

Looks like the cause of the problem is different than what I've seen before.

Profile adrianxw
Volunteer tester

Joined: Dec 30 06
Posts: 164
ID: 343
Credit: 1,669,741
RAC: 0
Message 5654 - Posted 17 Jan 2010 10:31:49 UTC

I've suggested that the users of this thread post here to keep it all together. More people with the same problem raises the profile somewhat, but doesn't seem to offer much help yet.
____________
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

Inya

Joined: Jan 16 10
Posts: 4
ID: 24388
Credit: 62,023
RAC: 0
Message 5655 - Posted 17 Jan 2010 13:48:33 UTC
Last modified: 17 Jan 2010 13:50:25 UTC

Same here, no Docking WUs show any progress.
Tested 10 to 15 different ones with different number/letters combination at start of their name.

My info:
Processor: Intel Core2 Quad CPU Q8300 @ 2.50GHz
OS: Windows 7 Home Premium 64Bit

process docking restarts continuously every 3 to 4 min.
Charmm is using only 500KB to 600KB RAM

BOINC Version 6.10.18

Other projects are running smoothly without problems (like ABC, Rosetta, RCN, NFS, SETI, PG, WCG).

Docking runs smoothly at my older PCs/laptops with Intel processors (no quads) and all with WIN XP.

Yeti

Joined: Sep 3 08
Posts: 2
ID: 606
Credit: 243,169
RAC: 0
Message 5656 - Posted 17 Jan 2010 15:49:28 UTC

HM, added 6 machines for the charity-race; 5 running okay, 1 is having the described problem with no progress

1x Win7 32Bit on Intel QuadCore having problems: http://docking.cis.udel.edu/community/show_host_detail.php?hostid=56940

2x Win7 32Bit on Intel QuadCore without problems: http://docking.cis.udel.edu/community/show_host_detail.php?hostid=45880 and http://docking.cis.udel.edu/community/show_host_detail.php?hostid=56941

3x Server2K3 x64 without problems:
http://docking.cis.udel.edu/community/show_host_detail.php?hostid=56942
http://docking.cis.udel.edu/community/show_host_detail.php?hostid=56943
http://docking.cis.udel.edu/community/show_host_detail.php?hostid=56944

____________

Profile adrianxw
Volunteer tester

Joined: Dec 30 06
Posts: 164
ID: 343
Credit: 1,669,741
RAC: 0
Message 5659 - Posted 18 Jan 2010 10:47:36 UTC
Last modified: 18 Jan 2010 10:48:42 UTC

For those who have the continual "non-run": was it running okay before the problem started? What I'm wondering is whether something happens/ is set/ written to a file/ etc. by a wu, and thereafter the later wu's see some flag/ setting/ something or other and "fail to run" as a result of that?
____________
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

Yeti

Joined: Sep 3 08
Posts: 2
ID: 606
Credit: 243,169
RAC: 0
Message 5660 - Posted 18 Jan 2010 11:08:35 UTC - in response to Message ID 5659 .

Those that have the continual "non-run", was it running okay then start the problem? What I'm wondering is that something happens/ is set/ written to a file/ etc. by a wu and thereafter, the later wu's are seeing some flag/ setting/ something or other and "failing to run" as a result of that?

No, for me, the problem started direct with the first Docking-WU on the machine
Inya

Joined: Jan 16 10
Posts: 4
ID: 24388
Credit: 62,023
RAC: 0
Message 5661 - Posted 18 Jan 2010 14:18:09 UTC

Same here. Never run Docking before on that machine.

Inya

Joined: Jan 16 10
Posts: 4
ID: 24388
Credit: 62,023
RAC: 0
Message 5665 - Posted 19 Jan 2010 22:12:29 UTC

Any chance of getting the problem with not-running/permanently restarting WUs on some machines solved?

Profile robertmiles

Joined: Apr 16 09
Posts: 96
ID: 9967
Credit: 1,290,747
RAC: 0
Message 5669 - Posted 20 Jan 2010 16:35:52 UTC

Anyone want to mention if they've seen this problem recently on any machine NOT running Windows 7 on an Intel CPU?

Anyone else want to mention which Windows 7 version on which Intel CPU type, if you haven't already, to help pin down which machines to use in tracking down this problem?

Inya

Joined: Jan 16 10
Posts: 4
ID: 24388
Credit: 62,023
RAC: 0
Message 5675 - Posted 22 Jan 2010 20:21:17 UTC

This is what was said in the Planet 3DNow! forum yesterday:

http://www.planet3dnow.de/vbulletin/showpost.php?p=4126023&postcount=215

In English:

The condition for the problem is Win7 + Intel-Yorkfield/Wolfdale (whether dual or quad, cache size does not matter). The problem may or may not occur.
Yoran

Joined: Dec 4 09
Posts: 3
ID: 22405
Credit: 1,117
RAC: 0
Message 5690 - Posted 25 Jan 2010 16:28:36 UTC

"may or may not occur"

sounds to me as if they don't have a clue...

Matthias Lehmkuhl

Joined: Sep 9 08
Posts: 9
ID: 801
Credit: 151,820
RAC: 0
Message 5719 - Posted 9 Feb 2010 13:07:44 UTC

I have the same problem.
Docking worked fine with XP SP3; since the change to Win 7 the program runs with no progress and no checkpointing or other changes in the slot dir.
Also, no resultname..._0 to resultname..._3 files were created in the project dir.
resultid=10723677
resultid=10724590

Resetting the project and deleting all remaining files did not help.
The first result was
resultid=10599879
which was started under XP SP3 and should have finished under Win 7.
Both OSes were 32-bit, on the same hardware.
The other projects' results finished without problems.
I have set Docking on this machine to NNW and will wait until 22.02. before aborting the result.
So if you need some information, feel free to contact me.

____________
Matthias

Fred Verster

Joined: May 8 09
Posts: 26
ID: 11034
Credit: 2,647,353
RAC: 0
Message 5720 - Posted 9 Feb 2010 17:29:06 UTC
Last modified: 9 Feb 2010 17:58:35 UTC

Hi, you must be getting bored of this, but it looks like I have the same problems.
From the messages tab :
9-2-2010 5:59:47 Docking [error] 1ebw1ajv_mod0014crossdockinghiv1_166_409078_0: negative FLOPs left -32006514441217.789000
9-2-2010 6:00:49 Docking [error] 1ebw1ajv_mod0014crossdockinghiv1_166_409078_0: negative FLOPs left -32112238476814.980000
9-2-2010 6:01:50 Docking [error] 1ebw1ajv_mod0014crossdockinghiv1_166_409078_0: negative FLOPs left -32215566952693.199000
9-2-2010 6:02:50 Docking [error] 1ebw1ajv_mod0014crossdockinghiv1_166_409078_0: negative FLOPs left -32319001897892.266000
9-2-2010 6:03:51 Docking [error] 1ebw1ajv_mod0014crossdockinghiv1_166_409078_0: negative FLOPs left -32422436843091.328000
9-2-2010 6:04:52 Docking [error] 1ebw1ajv_mod0014crossdockinghiv1_166_409078_0: negative FLOPs left -32525845170960.180000
9-2-2010 6:05:52 Docking [error] 1ebw1ajv_mod0014crossdockinghiv1_166_409078_0: negative FLOPs left -32629280116159.242000
9-2-2010 6:06:53 Docking [error] 1ebw1ajv_mod0014crossdockinghiv1_166_409078_0: negative FLOPs left -32732661826697.883000
9-2-2010 6:08:07 Docking Computation for task 1ebw1ajv_mod0014crossdockinghiv1_166_409078_0 finished
9-2-2010 6:08:07 Docking Starting 1dif1ajv_mod0014crossdockinghiv1_1318_257996_0
9-2-2010 6:08:07 Docking Starting task 1dif1ajv_mod0014crossdockinghiv1_1318_257996_0 using charmm34 version 623
9-2-2010 6:08:07 Docking Starting 1dif1ajv_mod0014crossdockinghiv1_1317_300510_0
9-2-2010 6:08:07 Docking Starting task 1dif1ajv_mod0014crossdockinghiv1_1317_300510_0 using charmm34 version 623
9-2-2010 6:08:09 Docking Started upload of 1ebw1ajv_mod0014crossdockinghiv1_166_409078_0_0
9-2-2010 6:08:09 Docking Started upload of 1ebw1ajv_mod0014crossdockinghiv1_166_409078_0_1
9-2-2010 6:08:16 Docking Finished upload of 1ebw1ajv_mod0014crossdockinghiv1_166_409078_0_0
9-2-2010 6:08:16 Docking Finished upload of 1ebw1ajv_mod0014crossdockinghiv1_166_409078_0_1
9-2-2010 6:08:16 Docking Started upload of 1ebw1ajv_mod0014crossdockinghiv1_166_409078_0_2
9-2-2010 6:08:16 Docking Started upload of 1ebw1ajv_mod0014crossdockinghiv1_166_409078_0_3
9-2-2010 6:08:20 Docking Finished upload of 1ebw1ajv_mod0014crossdockinghiv1_166_409078_0_2
9-2-2010 6:08:20 Docking Finished upload of 1ebw1ajv_mod0014crossdockinghiv1_166_409078_0_3

Can't find one WU, that was not validated, though.
Does anyone have a clue as to why this is happening? Some WUs just stop on my laptop, almost all at about 90% (HP Pavilion, T2400 CPU; Win XP x86).

The other hosts (Q6600s) don't have this problem!
They have a minimum of 2 GB (DDR2), except for the XP64 host, which has 4 GB DDR2.

We have just reached and passed 10 million task and WU IDs.
____________

Knight who says N! Ni Ni

Profile adrianxw
Volunteer tester

Joined: Dec 30 06
Posts: 164
ID: 343
Credit: 1,669,741
RAC: 0
Message 5721 - Posted 10 Feb 2010 15:31:10 UTC
Last modified: 10 Feb 2010 15:32:09 UTC

Looked like a similar issue at Rosetta here but it has appreciable differences. I don't think there is commonality.
____________
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

mickey

Joined: Jan 11 10
Posts: 1
ID: 24216
Credit: 0
RAC: 0
Message 5728 - Posted 26 Feb 2010 18:23:54 UTC

Same problem for me: many hours of computation, progress still 0.000% and in the screensaver a message says something like "no protein created yet"

If it helps, this is my PC:
http://docking.cis.udel.edu/community/show_host_detail.php?hostid=56027
____________

Hacker

Joined: Mar 20 09
Posts: 2
ID: 8510
Credit: 297,087
RAC: 0
Message 5736 - Posted 2 Mar 2010 14:12:05 UTC

Same problem. My PC:
http://docking.cis.udel.edu/community/show_host_detail.php?hostid=60297

AU518987077

Joined: May 21 09
Posts: 2
ID: 11728
Credit: 594,363
RAC: 0
Message 5737 - Posted 3 Mar 2010 15:59:41 UTC

As people have said before: same here.
http://docking.cis.udel.edu/community/show_host_detail.php?hostid=60472
I will watch this thread as it develops, but for now I'm suspending that task: it has already used 50% of the work time (6 hours 23 minutes 13 seconds against a completion time of 11 hours 55 minutes 59 seconds) and shows 0% completion,

yet all my other tasks happily chug along.

MechWarrior

Joined: Jan 25 10
Posts: 2
ID: 24925
Credit: 134,391
RAC: 0
Message 5740 - Posted 5 Mar 2010 18:32:54 UTC

Yep same thing here...

Intel Mobile Core 2 Duo P8700 Penryn
6 Gb RAM

http://docking.cis.udel.edu/community/results.php?hostid=59103

Running fine on an Intel and an AMD system with Win 7 RC2 64-bit. Tried several reinstalls with no luck. I have not had an issue with any other OS or computer (it is also running on an Intel P4 3 GHz and an AMD 3500+, both with XP). All other projects run without issues, including Collatz, SETI, GPU Grid, Rosetta, and Yoyo (the first 3 running the GPU app).

Profile robertmiles

Joined: Apr 16 09
Posts: 96
ID: 9967
Credit: 1,290,747
RAC: 0
Message 5742 - Posted 6 Mar 2010 12:53:56 UTC

I currently have two Charmm34a2 6.23 workunits showing around 2 hours elapsed time, 0.000% progress, 00:27:37 to completion, no checkpoints written yet, and a CPU core in use.

Is this combination normal for this version?

64-bit Vista SP2
BOINC 6.10.18
1hvi_37_mod0013b type workunits

Ananas

Joined: Aug 29 09
Posts: 56
ID: 17736
Credit: 2,500,425
RAC: 0
Message 5744 - Posted 6 Mar 2010 13:13:56 UTC
Last modified: 6 Mar 2010 13:20:36 UTC

I have some of those too now, it seems that this is caused by empty input files (file size = 0 bytes), so if you have any of those, abort them .

Jim

Joined: Jan 3 10
Posts: 2
ID: 23788
Credit: 342,233
RAC: 0
Message 5745 - Posted 6 Mar 2010 15:53:57 UTC
Last modified: 6 Mar 2010 15:59:35 UTC

I have several that are of the Charmm 34a2 6.23 type. They were still at 0% after nearly 4 hours. They show a total completion run of 1hr 9 minutes.
Aborted the things.
Running these on an 8-core Intel machine.
Also the graphics display shows that there is no model. On the Intel 4 core machine some had run for nearly 8 hours.
Bad batch.
____________

Profile robertmiles

Joined: Apr 16 09
Posts: 96
ID: 9967
Credit: 1,290,747
RAC: 0
Message 5747 - Posted 6 Mar 2010 16:41:25 UTC

Closer to 6 hours elapsed time before I saw the last two messages in this thread. Will abort them now.

MechWarrior

Joined: Jan 25 10
Posts: 2
ID: 24925
Credit: 134,391
RAC: 0
Message 5748 - Posted 6 Mar 2010 17:38:04 UTC - in response to Message ID 5744 .

I have some of those too now, it seems that this is caused by empty input files (file size = 0 bytes), so if you have any of those, abort them .



Yep, I found a few on a couple of my systems that were hanging. All had the .inp files at 0 bytes and seem to have been downloaded in the past 18 hours.

I may try once again to run this project on my laptop. Maybe I was just getting a few bad WUs to start with and had others that would have run.
Ananas

Joined: Aug 29 09
Posts: 56
ID: 17736
Credit: 2,500,425
RAC: 0
Message 5749 - Posted 6 Mar 2010 17:57:34 UTC
Last modified: 6 Mar 2010 17:59:07 UTC

You can "abort" the damaged WUs before they start to crunch by deleting all 0-bytes .inp files in advance - but give them time to download ;-)

p.s.: from what I can see, the bad batch is through now, the input files I received lately all had contents.

outlnder

Joined: Sep 18 08
Posts: 1
ID: 1026
Credit: 4,215,011
RAC: 0
Message 5752 - Posted 7 Mar 2010 0:32:07 UTC

UNBELIEVABLE!!!

I lost more than 864 hours of computing time because of this. 18 boxen, 4 cores per box, 12 hours per core. A half day of electricity is $14.

I will complete my 5 mil cobbles as I promised my teammates, then permanently "Detach" from this project.

Profile Neil Polson

Joined: Jan 18 10
Posts: 2
ID: 24578
Credit: 347,368
RAC: 0
Message 5753 - Posted 7 Mar 2010 7:15:47 UTC
Last modified: 7 Mar 2010 7:19:16 UTC

Upon waking this morning I discovered I had 3 of these too. Mine were issued around 10 UTC yesterday. As Ananas stated, all had 0-byte .inp files. That'll teach me to think a thread didn't apply to me and not read it again! Could have saved myself 6 hours.

Ananas

Joined: Aug 29 09
Posts: 56
ID: 17736
Credit: 2,500,425
RAC: 0
Message 5754 - Posted 7 Mar 2010 12:59:52 UTC

Dang, there are still those damaged things under way :-(

Minardi

Joined: Oct 21 09
Posts: 4
ID: 20057
Credit: 3,888,211
RAC: 0
Message 5755 - Posted 7 Mar 2010 13:16:37 UTC - in response to Message ID 5744 .

I have some of those too now, it seems that this is caused by empty input files (file size = 0 bytes), so if you have any of those, abort them.


Thanks. I had this problem start about 24 hours ago. I aborted tasks until I came to some where the progress bar came off zero after a minute or so. Thanks for the heads up.

I am a relatively new BOINC user - how do you see the specific files that are downloaded? I am running the BOINC client, 6.10.18, and not using a client manager, but attaching to projects on each PC through the BOINC client.

Thanks
Ananas

Joined: Aug 29 09
Posts: 56
ID: 17736
Credit: 2,500,425
RAC: 0
Message 5756 - Posted 7 Mar 2010 13:31:16 UTC
Last modified: 7 Mar 2010 13:41:55 UTC

In your BOINC data directory, there should be "projects/docking.cis.udel.edu"

Look for files there that have the file extension ".inp" (Windoze default is to hide the extensions, you might have to enable that in your windows explorer) and have a file size of 0 bytes.

Be careful, there are probably result files with 0 bytes file size, make sure to delete only those with .inp at the end.
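
For anyone who would rather not click through Explorer, here is a minimal sketch of the same check in Python. It only lists the suspect files and deletes nothing. The function name `find_empty_inp_files` is made up for this example, and you have to pass in the path to your own "projects/docking.cis.udel.edu" folder, since the BOINC data directory differs per install.

```python
from pathlib import Path


def find_empty_inp_files(project_dir):
    """Return the zero-byte *.inp files directly inside project_dir, sorted by name.

    Only files ending in .inp are considered, so zero-byte result files
    in the same folder are left out of the list.
    """
    project = Path(project_dir)
    return sorted(
        p for p in project.glob("*.inp")
        if p.is_file() and p.stat().st_size == 0
    )
```

Calling `find_empty_inp_files(r"<your BOINC data dir>\projects\docking.cis.udel.edu")` and printing the result gives the list of inputs worth aborting or deleting by hand.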

______________________________________________

It seems that there are more incomplete WUs around, I found some that end in the middle of the code like :

    625 SEG1 41   ARG  HN      1   0.250000       1.00800           0   0.00000     -0.301140E-02
    626 SEG1 41 ARG CA 10 0.500000E-01 12.0110 0 0.00000 -0.301

or

    1866 SEG2 18   GLN  HE21    1   0.300000       1.00800           0   0.00000     -0.301140E-02
    1867 SEG2 18 GLN HE22 1 0.300000 1.00800 0 0.00000 -0.301140E-02
    1868 SEG2

or even

    set params lpdb
    set paramfile @params
    set rtffile @params_amino.rtf
    set prmfile @p

Not sure what to do with those, I doubt that they produce valid results :-/
Ananas

Joined: Aug 29 09
Posts: 56
ID: 17736
Credit: 2,500,425
RAC: 0
Message 5757 - Posted 7 Mar 2010 14:00:52 UTC
Last modified: 7 Mar 2010 14:24:39 UTC

I contacted the project leader by mail now.

If it had been only the 0 bytes files, it wouldn't have been that bad - aborted WUs don't go into the science database.

The incomplete ones might produce invalid results without being caught by the validator, so those might mess up the scientific contents of the project.


(I hope that not everyone had the same idea, she might be mad then)

edit :

Got a response already ... Quote : " We will look at this immediately. " ... so it will sure be fixed soon :-)

Profile Michela
Forum moderator
Project administrator
Project developer
Project tester
Project scientist
Avatar

Joined: Sep 13 06
Posts: 163
ID: 10
Credit: 97,083
RAC: 0
Message 5758 - Posted 7 Mar 2010 14:26:57 UTC - in response to Message ID 5757 .

We are looking at the problem. We may need to stop the distribution of work temporarily.

We will keep you posted.

Michela
____________
If you are interested in working on Docking@Home in a great group at UDel, contact me at 'taufer at acm dot org'!

Profile Michela
Forum moderator
Project administrator
Project developer
Project tester
Project scientist
Avatar

Joined: Sep 13 06
Posts: 163
ID: 10
Credit: 97,083
RAC: 0
Message 5759 - Posted 7 Mar 2010 15:21:46 UTC - in response to Message ID 5758 .

We temporarily suspended the generation of new jobs while investigating some issues with the charmm script. Stay tuned ..

Michela
____________
If you are interested in working on Docking@Home in a great group at UDel, contact me at 'taufer at acm dot org'!

Aegis Maelstrom

Joined: Feb 19 09
Posts: 2
ID: 7346
Credit: 90,121
RAC: 0
Message 5760 - Posted 7 Mar 2010 19:27:12 UTC

Workunit 1iiq_43_mod0013b_1581_18995

31+ hrs of continuous work and 0.00% progress. No problem with the machine. My fault for not noticing this error earlier.

Halting Docking@Home until the problem is resolved.

Additionally I would suggest adding some process termination (abort WU) to the watchdog: abort the WU when it is being crunched without any progress for some given period of time (2 hrs?).

Similar solutions have been tried in Rosetta@Home as urgent workarounds, and I think it would help to catch the problem far earlier. So far we have lost a lot of computing power.
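
To make the watchdog idea concrete, here is a minimal sketch of the decision rule in Python. It is not how the Docking or Rosetta watchdogs are actually written. The function name `should_abort`, the sample format, and the 2-hour default are assumptions for illustration, and the elapsed time is assumed to be measured by the watchdog itself so that it keeps counting even if the task restarts from zero.

```python
def should_abort(samples, max_stall_seconds=2 * 3600):
    """Decide whether a task has stalled.

    samples: chronological list of (elapsed_seconds, fraction_done) pairs,
             where fraction_done runs from 0.0 to 1.0 (hypothetical format).
    Returns True when at least max_stall_seconds have passed since
    fraction_done last increased (or since the start, if it never did).
    """
    best_fraction = 0.0      # highest progress seen so far
    last_change_at = 0.0     # elapsed time when progress last increased
    latest = 0.0             # elapsed time of the newest sample
    for elapsed, fraction in samples:
        latest = elapsed
        if fraction > best_fraction:
            best_fraction = fraction
            last_change_at = elapsed
    return latest - last_change_at >= max_stall_seconds
```

A task sampled at 1 hour and 2 hours with 0.00% both times would be flagged, while one that moved from 0% to 10% in the first hour would not be flagged an hour later.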

Best regards,
a.m., BOINC@Poland

Fred Verster
Avatar

Joined: May 8 09
Posts: 26
ID: 11034
Credit: 2,647,353
RAC: 0
Message 5761 - Posted 7 Mar 2010 20:24:49 UTC
Last modified: 7 Mar 2010 20:37:11 UTC

Hi, apart from problems with BOINC version 6.10.18* (not only on my Vista host), I noticed the same problems as stated above: 0% progress after 6 hours, and a restart of BOINC just started the same WUs again with zero time and zero progress. When I suspend a task, it just starts another one, with the same result.

* BOINC also runs 5 tasks?!? After a restart:
7-3-2010 21:10:12 Starting BOINC client version 6.10.18 for windows_intelx86
7-3-2010 21:10:12 log flags: file_xfer, sched_ops, task
7-3-2010 21:10:12 Libraries: libcurl/7.19.4 OpenSSL/0.9.8l zlib/1.2.3
7-3-2010 21:10:12 Data directory: C:\ProgramData\BOINC
7-3-2010 21:10:12 Running under account Fred
7-3-2010 21:10:13 Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU @ 2.40GHz [x86 Family 6 Model 15 Stepping 7]
7-3-2010 21:10:13 Processor: 4.00 MB cache
7-3-2010 21:10:13 Processor features: fpu tsc pae nx sse sse2 pni mmx
7-3-2010 21:10:13 OS: Microsoft Windows Vista: Home Premium x86 Edition, Service Pack 2, (06.00.6002.00)
7-3-2010 21:10:13 Memory: 2.00 GB physical, 4.28 GB virtual
7-3-2010 21:10:13 Disk: 290.58 GB total, 225.62 GB free
7-3-2010 21:10:13 Local time is UTC +1 hours
7-3-2010 21:10:13 NVIDIA GPU 0: GeForce 8500 GT (driver version 19038, CUDA version 2030, compute capability 1.1, 512MB, 30 GFLOPS peak)
7-3-2010 21:10:13 Not using a proxy
7-3-2010 21:10:13 Docking URL http://docking.cis.udel.edu/; Computer ID 51168; resource share 300
7-3-2010 21:10:13 Docking Restarting task 1t7k_50_mod0013b_9228_321667_0 using charmm34 version 623
7-3-2010 21:10:13 Docking Restarting task 1t7k_50_mod0013b_9227_285203_0 using charmm34 version 623
7-3-2010 21:10:38 Docking task 1t7k_50_mod0013b_9228_321667_0 suspended by user
7-3-2010 21:10:39 Docking Starting 1t7k_50_mod0013b_9226_208436_0
7-3-2010 21:10:39 Docking Starting task 1t7k_50_mod0013b_9226_208436_0 using charmm34 version 623
7-3-2010 21:10:42 Docking task 1t7k_50_mod0013b_9227_285203_0 suspended by user
7-3-2010 21:10:43 Docking Starting 1t7k_50_mod0013b_9144_184886_0
7-3-2010 21:10:43 Docking Starting task 1t7k_50_mod0013b_9144_184886_0 using charmm34 version 623


On my Vista quad, it started just after 15:15; previous (same) WUs did fine?!
No changes, during this time, wasn't even at home . . .
____________

Knight who says N! Ni Ni

Ananas

Joined: Aug 29 09
Posts: 56
ID: 17736
Credit: 2,500,425
RAC: 0
Message 5763 - Posted 7 Mar 2010 21:26:23 UTC

A sanity check that scans the file for a logical EOF mark would sure help - something like // in Stockholm or just a comment line with !EOF, which would not disturb the current syntax.
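
A rough sketch of what such a check could look like, in Python. The `!EOF` marker is only the suggestion from this post, not something the current input files contain, and the function name `has_eof_marker` is made up for the example.

```python
def has_eof_marker(path, marker="!EOF"):
    """Return True if the last non-blank line of the file is exactly marker.

    An empty file, or one cut off in the middle of the data, returns False.
    """
    last_line = ""
    with open(path, "r", errors="replace") as handle:
        for line in handle:
            stripped = line.strip()
            if stripped:
                last_line = stripped
    return last_line == marker
```

The application (or a wrapper around it) could run this on every input before starting and report an error right away instead of spinning at 0%.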

Profile MacDitch
Volunteer tester

Joined: Sep 13 06
Posts: 27
ID: 24
Credit: 377,838
RAC: 0
Message 5765 - Posted 7 Mar 2010 22:16:40 UTC

Workunit 1ohr_47_mod0013b_238_18734

Currently at 03:19:57 and 0.000%. Now suspended pending further instructions from the project.

Fred Verster
Avatar

Joined: May 8 09
Posts: 26
ID: 11034
Credit: 2,647,353
RAC: 0
Message 5766 - Posted 8 Mar 2010 12:39:59 UTC
Last modified: 8 Mar 2010 12:46:10 UTC

Hi, it appears the never-ending WUs are not over yet!

7-3-2010 21:46:07 Docking URL http://docking.cis.udel.edu/; Computer ID 51168; resource share 300
7-3-2010 21:46:07 Reading preferences override file
7-3-2010 21:46:07 Preferences limit memory usage when active to 1023.29MB
7-3-2010 21:46:07 Preferences limit memory usage when idle to 1534.93MB
7-3-2010 21:46:10 Preferences limit disk usage to 10.00GB
7-3-2010 21:46:12 Docking Restarting task 1t7k_50_mod0013b_9226_208436_0 using charmm34 version 623
7-3-2010 21:46:13 Docking Restarting task 1t7k_50_mod0013b_9144_184886_0 using charmm34 version 623
7-3-2010 21:48:02 Docking suspended by user
7-3-2010 21:48:40 Docking resumed by user
7-3-2010 21:48:51 Docking task 1t7k_50_mod0013b_9228_321667_0 resumed by user
7-3-2010 21:48:54 Docking task 1t7k_50_mod0013b_9227_285203_0 resumed by user
7-3-2010 22:48:47 Docking Resuming task 1t7k_50_mod0013b_9226_208436_0 using charmm34 version 623

7-3-2010 23:04:08 Docking Sending scheduler request: To fetch work.
7-3-2010 23:04:08 Docking Reporting 6 completed tasks, requesting new tasks for GPU
7-3-2010 23:04:13 Docking Scheduler request completed: got 0 new tasks
7-3-2010 23:16:11 Docking Resuming task 1t7k_50_mod0013b_9144_184886_0 using charmm34 version 623
7-3-2010 23:18:04 Docking Restarting task 1t7k_50_mod0013b_9228_321667_0 using charmm34 version 623
7-3-2010 23:18:04 Docking Restarting task 1t7k_50_mod0013b_9227_285203_0 using charmm34 version 623

Four tasks have been running for 16 hours with 0% progress. Skipping these seems the logical thing to do, at least, but since I don't know where or what is going wrong, I'll have to babysit the WUs.
Which is a bit absurd and has nothing to do with donating spare CPU and GPU cycles . . . .
Has this event already reached the project devs and staff?

[ADDED] I stopped the 4 tasks that ran 16 hours with no progress; no new tasks are started?!
____________

Knight who says N! Ni Ni

Profile 7ri9991 [MM]
Avatar

Joined: Apr 20 09
Posts: 14
ID: 10169
Credit: 304,285
RAC: 0
Message 5767 - Posted 8 Mar 2010 13:54:10 UTC

Verster,

Everything you're bringing up has already been addressed in this thread. Abort the tasks that are running forever with no progress because they have incomplete input files that will not and cannot finish.

Also stated above, the project admins are looking into the problem.

Profile MacDitch
Volunteer tester

Joined: Sep 13 06
Posts: 27
ID: 24
Credit: 377,838
RAC: 0
Message 5768 - Posted 8 Mar 2010 17:11:01 UTC

Not sure if we're meant to report or not, but just in case...

I've now aborted the following due to no progress:

Workunit 1ohr_47_mod0013b_238_18734 at 03:19:57
Workunit 1ohr_47_mod0013b_3921_337894 at 07:28:59
Workunit 1ohr_48_mod0013b_5768_90544 at 07:28:41

Fred Verster
Avatar

Joined: May 8 09
Posts: 26
ID: 11034
Credit: 2,647,353
RAC: 0
Message 5769 - Posted 8 Mar 2010 20:19:32 UTC

--[snip]--
Also stated above, the project admins are looking into the problem.


Thanks Trigggl, yesterday I just noticed Docking WU's being retrieved/deleted/???
But this morning, I had a new load, with the same problems .
Hope they get it fixed, without too much hassle :)



____________

Knight who says N! Ni Ni
Profile 7ri9991 [MM]
Avatar

Joined: Apr 20 09
Posts: 14
ID: 10169
Credit: 304,285
RAC: 0
Message 5770 - Posted 8 Mar 2010 21:45:20 UTC - in response to Message ID 5769 .

--[snip]--
Also stated above, the project admins are looking into the problem.


Thanks Trigggl, yesterday I just noticed Docking WU's being retrieved/deleted/???
But this morning, I had a new load, with the same problems .
Hope they get it fixed, without too much hassle :)

I'm doing some RNA work until these problems are cleaned up.
Profile [B^S] Acmefrog
Volunteer tester
Avatar

Joined: Nov 14 06
Posts: 45
ID: 252
Credit: 1,604,407
RAC: 0
Message 5771 - Posted 9 Mar 2010 3:10:26 UTC
Last modified: 9 Mar 2010 3:10:58 UTC

I picked up a few of these never-ending WUs. I have aborted them. My cache is drying up, so I will see what the remaining few are doing. Has anyone seen any kind of pattern to which ones hang? All the ones I aborted seem to be different types.
____________

Profile 7ri9991 [MM]
Avatar

Joined: Apr 20 09
Posts: 14
ID: 10169
Credit: 304,285
RAC: 0
Message 5772 - Posted 9 Mar 2010 4:14:50 UTC - in response to Message ID 5749 .

You can "abort" the damaged WUs before they start to crunch by deleting all 0-bytes .inp files in advance - but give them time to download ;-)

p.s.: from what I can see, the bad batch is through now, the input files I received lately all had contents.

Everyone keeps asking how to figure out which ones hang. Check the .inp files. Most of the ones that are going to hang are empty files. The ones that will succeed should be roughly 1.2M and end with something like this:
END
goto donereadpdbfile5

In Linux you can check it with
tail <crossdocking-file>.inp
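
Windows has no tail by default, so here is a small Python stand-in as a sketch. The function name `tail_lines` is made up for this example. It just returns the last few lines so you can eyeball whether the file ends with something like the lines shown above.

```python
from collections import deque


def tail_lines(path, n=10):
    """Return the last n lines of a text file, without trailing newlines."""
    with open(path, "r", errors="replace") as handle:
        # A deque with maxlen keeps only the newest n lines while reading.
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]
```

An empty input file gives back an empty list, and a truncated one shows the half-written last line.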
Calphor

Joined: Sep 19 08
Posts: 1
ID: 1042
Credit: 2,653,930
RAC: 0
Message 5773 - Posted 9 Mar 2010 4:50:59 UTC

I've noticed that one of my machines has not been affected by the bug. It is running Win7 64bit with BOINC 6.10.29. My other machines that have been bogged down with errors are all running XP 32bit and BOINC 6.10.18. Is there a relationship?
____________

Ananas

Joined: Aug 29 09
Posts: 56
ID: 17736
Credit: 2,500,425
RAC: 0
Message 5778 - Posted 9 Mar 2010 19:42:26 UTC - in response to Message ID 5773 .

I've noticed that one of my machines has not been affected by the bug. It is running Win7 64bit with BOINC 6.10.29. My other machines that have been bogged down with errors are all running XP 32bit and BOINC 6.10.18. Is there a relationship?


Empty is empty, an x64 binary cannot change that fact :-) It's possible that it handles the empty input differently and aborts those immediately, or that the box has just been lucky.
Profile 7ri9991 [MM]
Avatar

Joined: Apr 20 09
Posts: 14
ID: 10169
Credit: 304,285
RAC: 0
Message 5779 - Posted 9 Mar 2010 20:31:18 UTC

I'm finally getting work again and the input files are all complete. It *may* be safe to download work again.

Ananas

Joined: Aug 29 09
Posts: 56
ID: 17736
Credit: 2,500,425
RAC: 0
Message 5780 - Posted 9 Mar 2010 21:50:25 UTC - in response to Message ID 5779 .

I'm finally getting work again and the input files are all complete. ...


Same here :-)

The last failures reported have probably been cached files from before the bugfix.
Fred Verster
Avatar

Joined: May 8 09
Posts: 26
ID: 11034
Credit: 2,647,353
RAC: 0
Message 5784 - Posted 10 Mar 2010 18:21:56 UTC

Hi, I still have some tasks which presumably never end: 0% progress.
Best to abort them, I think?



____________

Knight who says N! Ni Ni

Profile TheFiend

Joined: Apr 7 09
Posts: 70
ID: 9482
Credit: 20,705,527
RAC: 0
Message 5787 - Posted 11 Mar 2010 7:38:37 UTC

I must have been lucky........ have only come across 2 of these units on my 4 crunchers, which have been aborted before reaching the top of the cache.

Jim

Joined: Jan 3 10
Posts: 2
ID: 23788
Credit: 342,233
RAC: 0
Message 5800 - Posted 14 Mar 2010 17:04:38 UTC

Most of the units that start with the name: "1iiq_43_" have failed to start on my Vista and Windows 7 machines. Some ran as long as 85 hours before I noticed them not completing.
I aborted all the 1iiq_43_ units.

NICE PROJECT

Profile Mark Brown

Joined: Dec 31 09
Posts: 6
ID: 23636
Credit: 4,678,904
RAC: 0
Message 5802 - Posted 15 Mar 2010 2:27:18 UTC

I have several systems (19) with a variety of OSes that were stuck at 0%.
I had a 2.5 day cache on all of them.
I finally decided to abort all WUs that either haven't started yet or were stuck at 0%.

I won't get new tasks for a few days.

My average credit has dropped like a rock this week.

Hope everything gets fixed by next week. I think the project is well worth my cpu(s) time.


____________

AU518987077

Joined: May 21 09
Posts: 2
ID: 11728
Credit: 594,363
RAC: 0
Message 5805 - Posted 15 Mar 2010 21:49:07 UTC - in response to Message ID 5737 .

As people have said before: same here.
http://docking.cis.udel.edu/community/show_host_detail.php?hostid=60472
I will watch this thread as it develops, but for now I'm suspending that task: it has already used about half of the estimated completion time (6 hours 23 minutes 13 seconds of 11 hours 55 minutes 59 seconds) and shows 0% completion

yet all my other tasks happily chug along.


Here's what I've pulled from my message logs

100315 044535 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623
(repeated 146 times, every 3 minutes give or take 20 seconds, with things in between like)
100315 052003 Project communication failed: attempting access to reference site
100315 052005 Internet access OK - project servers may be temporarily down.

100315 134934 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623


100315 135546 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623
100315 135851 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623
100315 140157 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623
100315 140809 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623
100315 141114 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623
100315 141420 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623
100315 141726 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623
100315 142031 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623
100315 142337 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623
100315 142643 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623
100315 143254 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623
100315 143600 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623
100315 143629 Suspending computation - user is active
100315 143931 Resuming computation
100315 144234 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623
100315 144539 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623
100315 144845 Docking Restarting task 1yqj_117_mod0013bp38alpha_2963_55009_0 using charmm34 version 623
100315 144912 Suspending computation - user is active

It has spent about 7 hours calculating at this point (with a task switch every 10 minutes, so it has really been running more like 3 days) and has 0% to show for it.

Fred Verster
Avatar

Joined: May 8 09
Posts: 26
ID: 11034
Credit: 2,647,353
RAC: 0
Message 5806 - Posted 16 Mar 2010 11:38:54 UTC
Last modified: 16 Mar 2010 12:01:14 UTC

Hi, for the 3rd time I'm seeing tasks with >100 K seconds runtime and no progress?!?
It is not only counterproductive but very annoying too.
Have a look ???
This WU !?
A waste of resources, IMO.
What can be done, except setting some debug flags, if it's a client error?
I still have some on 1 host and it seems best to me to abort all of them.
It's only a waste of time letting them run.
* BOINC 6.10.18 started the trouble on one of my (3) hosts (quads).


Has anyone looked into this, or has some sort of explanation?
Btw, I've changed BOINC versions 3 times, because BOINC 6.10.xx* doesn't handle large numbers of WUs (of course from different projects). If you have too many WUs in the cache, boinc.exe can rise to >20% when there are >4000 tasks in the cache.
Or it loses contact with the local host.
I only have a 3-day cache, which appears to work better.

This problem has existed for about half a year! I've read 'somewhere' that it should be set and forget, maybe set and forgive, but babysitting BOINC can be pretty time consuming and I have something you could call a life, too :)
____________

Knight who says N! Ni Ni

Jim Strait

Joined: Jul 27 09
Posts: 1
ID: 16229
Credit: 816,722
RAC: 0
Message 5818 - Posted 20 Mar 2010 17:17:47 UTC

I had run into a few of the 0% completion sessions a few days ago and encountered 4 more just now. They were all just past their (same) deadline. One had been running for 12 hours and the other 3 for 5 hours. I can usually knock one out in half an hour. I aborted the 4 sessions, and 2 other Docking sessions started and ran to completion normally. All were running high priority.

I am running Windows XP service pack 3 with an Intel quad core i7 CPU 920 @ 2.67GHz with 2.66, 3.25 GB RAM.

-Jim

Ananas

Joined: Aug 29 09
Posts: 56
ID: 17736
Credit: 2,500,425
RAC: 0
Message 5819 - Posted 20 Mar 2010 18:21:47 UTC

They do still deliver a lot of those damaged results, but those are still old ones with status "no reply" on the previous host.

From what I can see, no new damaged results have been created lately.


It would have been smart to delete all .inp files with 0 bytes from the download directory, so a redelivery would have given us a download error instead of getting stuck.

Profile Beyond

Joined: Feb 9 09
Posts: 8
ID: 6984
Credit: 3,132,056
RAC: 0
Message 5820 - Posted 21 Mar 2010 4:57:47 UTC

I just had to delete a boatload of WUs, some running for over 16 hours and at 0%.
This has gotten ridiculous.

Profile FalconFly
Avatar

Joined: Jan 17 10
Posts: 1
ID: 24493
Credit: 946,295
RAC: 0
Message 5821 - Posted 21 Mar 2010 12:22:08 UTC - in response to Message ID 5820 .
Last modified: 21 Mar 2010 12:22:29 UTC

Darn, after pausing a while due to the problem and getting back into it after the supposed fix, I just found I lost some ~500 hours of CPU time - again.

That's really annoying, but at least some workunits seem to run normally.
____________
Scientific Network : 44800 MHz - 77824 MB - 1970 GB

Fred Verster
Avatar

Joined: May 8 09
Posts: 26
ID: 11034
Credit: 2,647,353
RAC: 0
Message 5822 - Posted 21 Mar 2010 12:49:47 UTC
Last modified: 21 Mar 2010 12:56:31 UTC

Hi, well, the empty, no-progress WUs are gone, at least it looks that way.

Now 1 host, running Vista x86, has almost only Docking WUs, but they have all
been started (~100), and some are in High Priority, not consistent with their deadline, though. When switching projects, BOINC again starts a new WU instead of finishing the one it was working on, IMO.
Maybe it has something to do with BOINC 6.10.37?

Has anyone seen this odd behavior, before or now?

At least work is done, but 'this' seems to be blocking BOINC from getting other tasks!
____________

Knight who says N! Ni Ni

Profile Mark Brown

Joined: Dec 31 09
Posts: 6
ID: 23636
Credit: 4,678,904
RAC: 0
Message 5823 - Posted 21 Mar 2010 14:05:18 UTC

I'm back to processing WU here, but am still getting some of those nasty 0% complete jobs. I'll just abort them when needed.

____________

Ananas

Joined: Aug 29 09
Posts: 56
ID: 17736
Credit: 2,500,425
RAC: 0
Message 5825 - Posted 21 Mar 2010 18:54:23 UTC - in response to Message ID 5822 .
Last modified: 21 Mar 2010 18:56:47 UTC

Hi, well, the empty, no-progress WUs are gone, at least it looks that way. ...


not the redelivered ones. I have killed about 10 just now, several of which already had eaten quite some CPU time. I run only a tiny cache, so those have been quite fresh.

Example sent 21 Mar 2010 14:27:43 UTC
Fred Verster
Avatar

Joined: May 8 09
Posts: 26
ID: 11034
Credit: 2,647,353
RAC: 0
Message 5826 - Posted 22 Mar 2010 1:31:18 UTC
Last modified: 22 Mar 2010 1:37:43 UTC

Ahh, I have 32 Docking WUs paused. It looks like a task that was switched out after 60 min. gets paused and another is started, but it looks like they are finishing normally. At least, I hope so.
Now 3 are running High Priority (1 0.04CPU+GPU SETI).

How could this happen? I did uncheck keep in memory when paused!
I think, this shouldn't happen, in the first place.
Network trouble @ SETI, empty WU's, WU's randomly starting . . .
Knock on wood :)
____________

Knight who says N! Ni Ni

crystalsys

Joined: May 28 09
Posts: 4
ID: 12210
Credit: 738,141
RAC: 0
Message 5827 - Posted 22 Mar 2010 16:31:49 UTC
Last modified: 22 Mar 2010 16:33:10 UTC

I'm seeing similar. Get the WU, shows some reasonable estimate - I had some that said 48 minutes last week. I just killed one that showed 66 hours elapsed, 0% progress, nothing under time-to-complete.

No other projects are behaving this way. The only other bad actor is PrimeGrid which keeps shoving WUs that immediately run high priority. I've got that one on no-new-tasks.
____________

crystalsys

Joined: May 28 09
Posts: 4
ID: 12210
Credit: 738,141
RAC: 0
Message 5828 - Posted 23 Mar 2010 11:01:30 UTC - in response to Message ID 5827 .

I'm seeing similar. Get the WU, shows some reasonable estimate - I had some that said 48 minutes last week. I just killed one that showed 66 hours elapsed, 0% progress, nothing under time-to-complete.

No other projects are behaving this way. The only other bad actor is PrimeGrid which keeps shoving WUs that immediately run high priority. I've got that one on no-new-tasks.



Last night I had one ready to start with an estimated 4:15 run time. This morning it has run for an hour, still shows 0% complete.
____________
Profile Conan
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 219
ID: 100
Credit: 4,256,493
RAC: 0
Message 5829 - Posted 23 Mar 2010 13:00:16 UTC

Yes I just aborted 3 of these tasks, one at 16 hours 0.00% another at 14 hours 0.00% and one at nearly 4 hours 0.00%.
So a couple still floating around.
____________

TPR_Mojo

Joined: Mar 26 09
Posts: 6
ID: 8777
Credit: 7,205,188
RAC: 0
Message 5830 - Posted 23 Mar 2010 16:02:36 UTC

Please can we have some sort of response to this problem, even if it is just an acknowledgement and "we are looking into it"? It's ongoing, we are dealing with it as best we can, but the silence from the project team is deafening.

Profile rebel9

Joined: Sep 3 08
Posts: 2
ID: 421
Credit: 67,272
RAC: 0
Message 5831 - Posted 23 Mar 2010 17:18:29 UTC

Hear, hear. I haven't had a good WU for at least a month and probably longer. I'm sick of aborting WUs with dozens and dozens of wasted hours invested in them. I am understanding of problems but this is getting ridiculous and I'm on the cusp of disabling this project. If you don't sort it you're going to find the level of interest tumbling like a house of cards out here.

Thanks.

Profile King Leo

Joined: Jun 16 09
Posts: 10
ID: 13433
Credit: 4,464,450
RAC: 0
Message 5833 - Posted 23 Mar 2010 18:30:42 UTC

Why is this problem not being addressed? I have three computers all showing 100% CPU usage but 0% PROGRESS. This has been happening on my computers for some time now so I will have to attach to another project until someone on your end finds a remedy.

Profile cenit

Joined: Sep 25 09
Posts: 1
ID: 18997
Credit: 22,829
RAC: 0
Message 5841 - Posted 24 Mar 2010 18:20:55 UTC - in response to Message ID 5833 .

Why is this problem not being addressed? I have three computers all showing 100% CPU usage but 0% PROGRESS. This has been happening on my computers for some time now so I will have to attach to another project until someone on your end finds a remedy.



I just aborted this wu because it was at 0% after 9 hours of computation.

I put Docking on NNW
Profile Scientific Frontline
Avatar

Joined: Mar 25 09
Posts: 42
ID: 8725
Credit: 788,015
RAC: 0
Message 5842 - Posted 24 Mar 2010 20:17:40 UTC



This issue of zero progress is bad enough, yet the lack of respect in not acknowledging us is truly far worse to me at this point. It almost seems you don't understand the value of the participants in this project. Simply put, without us... there is no Docking at home.
Sincerely,
Heidi-Ann Kennedy
____________

Recognized by the Carnegie Institute of Science . Washington D.C.

Cluster Physik

Joined: Jul 2 09
Posts: 35
ID: 14795
Credit: 16,067,012
RAC: 0
Message 5843 - Posted 24 Mar 2010 23:24:09 UTC - in response to Message ID 5842 .

This issue of zero progress is bad enough, yet the lack of respect in not acknowledging us is truly far worse to me at this point. It almost seems you don't understand the value of the participants in this project. Simply put, without us... there is no Docking at home.
Sincerely,
Heidi-Ann Kennedy

I second that.

There needs to be at least a response, if not a solution, to this severe problem! I'm really thinking about setting Docking to "no new work" if nothing happens. I have better things to do than to constantly check all systems for those broken WUs. And I'm sure I'm not the only one considering the simplest "solution". It's just a click in my account manager to get rid of this annoyance.
DoubleTop

Joined: Mar 29 09
Posts: 11
ID: 9044
Credit: 26,873,788
RAC: 0
Message 5844 - Posted 25 Mar 2010 8:37:14 UTC
Last modified: 25 Mar 2010 8:37:45 UTC

I totally agree with ScientificFrontline on this one, I'm having to put in masses of effort to keep my machines computing work units. Effort that I should not have to, imo.

I've lost 100's of hours worth of computing time on these poor units, and there are some machines I won't be on site to check for another month.

Some form of technical response, or even an apology of sorts from the project team wouldn't go amiss.

DT.

crystalsys

Joined: May 28 09
Posts: 4
ID: 12210
Credit: 738,141
RAC: 0
Message 5851 - Posted 26 Mar 2010 13:56:20 UTC

There's another thread calling for a boycott, which I'm not yet inclined to do. But where is "the man behind the curtain"? There are other, worthwhile projects out there, and we get to choose which ones to run, so some acknowledgment or comment would seem to be appropriate here.
____________

TPR_Mojo

Joined: Mar 26 09
Posts: 6
ID: 8777
Credit: 7,205,188
RAC: 0
Message 5853 - Posted 26 Mar 2010 16:09:31 UTC

22k RAC = ninth overall = bye bye Docking, at least until they have learned some basic communication skills. I'm not burning electricity and putting my time and effort in for a group who don't even think they need to talk to their volunteers. Plenty of units coming soon, I'm about to ditch about 1300

Profile Trilce Estrada
Forum moderator
Project administrator
Project developer
Project tester

Joined: Sep 19 06
Posts: 189
ID: 119
Credit: 1,217,236
RAC: 0
Message 5854 - Posted 26 Mar 2010 18:58:45 UTC
Last modified: 26 Mar 2010 19:06:44 UTC

During March 7-11 we had a big problem with the server ( http://docking.cis.udel.edu/about/project/news.php ): it was running out of space, and therefore several workunits were sent empty. On March 11 we stopped production, increased the server partition, and resumed distribution. I want to think that the workunits with 0% progress are still some of those created during March 7-11; what worries me the most is that it has now been 15+ days and you are still having the problems.

According to the example posted by Ananas, the problem was indeed an old workunit (created on March 7th). If any of you could post a link to one of those eternal workunits, it would be great, especially if it was created after March 11/12, so that we can take a look at the input files. In the meantime we are trying to identify what can be wrong, and whether the problem is confined to those old WUs or is more widespread.

DoubleTop

Joined: Mar 29 09
Posts: 11
ID: 9044
Credit: 26,873,788
RAC: 0
Message 5856 - Posted 26 Mar 2010 19:20:02 UTC - in response to Message ID 5854 .

Unfortunately, I think a large proportion of users have aborted everything or detached by now, Trilce, and you'll have a hard time getting those logs.

I for one can vouch that I simply went through and found all units with a 0KB size and simply aborted them, so perhaps there is a way to analyse the "User aborted" units in the system?

If you need help, then please do ask. I understand you have PhD deadlines to hit and that is fine, but I'm raising my hand for looking through an export of "Client Aborted", the difficulty will be putting them next to logs of files sent, as I think the results of such work may be now skewed from the number of people who have simply aborted all units.

I wonder if the actual .exe should perform a check on the file sizes?

hth,

DT.

Profile Scientific Frontline
Avatar

Joined: Mar 25 09
Posts: 42
ID: 8725
Credit: 788,015
RAC: 0
Message 5858 - Posted 26 Mar 2010 19:59:39 UTC - in response to Message ID 5854 .

During March 7-11 we had a big problem with the server ( http://docking.cis.udel.edu/about/project/news.php ): it was running out of space, and as a result several workunits were sent out empty. On March 11 we stopped production, increased the server partition, and resumed distribution. I want to think that the workunits stuck at 0% progress are still some of those created during March 7-11. What worries me the most is that it has now been 15+ days and you are still having the problem.

According to the example posted by Ananas, the problem was indeed an old workunit (created on March 7th). If any of you could post a link to one of those eternal workunits it would be great, especially if it was created after March 11/12, so that we can take a look at the input files. In the meantime we are trying to identify what could be wrong, and whether the problem is confined to those old WUs or is more widespread.


Trilce,

This is the only project I have ever truly been passionate about.
I understand issues do arise, yet a response of any kind is always imperative.
A person of science such as yourself knows the importance of communication; without it... there is failure.

I'm also willing to help with communicating with members; your team has to establish the connection, though.
____________

Recognized by the Carnegie Institute of Science . Washington D.C.
DoubleTop

Joined: Mar 29 09
Posts: 11
ID: 9044
Credit: 26,873,788
RAC: 0
Message 5859 - Posted 26 Mar 2010 20:09:08 UTC

http://docking.cis.udel.edu/community/workunit.php?wuid=11126773

I've just been sent this unit by the server; judging by its first send, it was generated within the problem timeframe. The problem may be aborted work units being re-sent, or even non-returned ones being re-sent. The user Miguel in this instance could be stuck on 0% with that unit, which would be why I've received it now.

Pure fluke that I spotted this one come into my queue, knowing that I had only just set the project to allow new work.

hth,

DT.

Profile Trilce Estrada
Forum moderator
Project administrator
Project developer
Project tester

Joined: Sep 19 06
Posts: 189
ID: 119
Credit: 1,217,236
RAC: 0
Message 5864 - Posted 26 Mar 2010 20:43:09 UTC - in response to Message ID 5859 .

I think you are right, DoubleTop: we are facing (or will face) a different problem now. There is a bug in the transitioner that keeps generating workunits even when we say we want just one. We had changed the workunit generator and other daemons around and haven't been able to keep it straight. I need to modify the validator to accept this kind of workunit; let's see if I can fix it soon.

@Scientific Frontline, I'm sorry about this, and you are absolutely right. I'm seriously thinking about making a Facebook group, or a Twitter account, or something more effective than the forums, because this format is not helping much.

Profile Scientific Frontline
Avatar

Joined: Mar 25 09
Posts: 42
ID: 8725
Credit: 788,015
RAC: 0
Message 5869 - Posted 27 Mar 2010 0:25:56 UTC - in response to Message ID 5864 .

I think you are right, DoubleTop: we are facing (or will face) a different problem now. There is a bug in the transitioner that keeps generating workunits even when we say we want just one. We had changed the workunit generator and other daemons around and haven't been able to keep it straight. I need to modify the validator to accept this kind of workunit; let's see if I can fix it soon.

@Scientific Frontline, I'm sorry about this, and you are absolutely right. I'm seriously thinking about making a Facebook group, or a Twitter account, or something more effective than the forums, because this format is not helping much.


As am I. Now let's all move forward with a better understanding of both sides.
Heidi-Ann Kennedy
____________

Recognized by the Carnegie Institute of Science . Washington D.C.
Profile Mark Brown

Joined: Dec 31 09
Posts: 6
ID: 23636
Credit: 4,678,904
RAC: 0
Message 5870 - Posted 27 Mar 2010 1:09:43 UTC

When the problem started, I stopped requesting new WUs. When I started getting WUs again, I started monitoring for 0% and killed only those. I figured there were enough good WUs to make it valid to keep pressing on. I haven't received any bad WUs for a while, but if I do, I will document them before killing them.

I hope others proceed in this manner instead of moving to other projects.

I do run other projects concurrently with D@H set to a much higher priority.

Good luck. BTW, facebook would be a good forum for status problems/updates.

____________

Profile Bryan Price

Joined: Jun 22 09
Posts: 2
ID: 13990
Credit: 526,416
RAC: 0
Message 5874 - Posted 28 Mar 2010 15:58:34 UTC - in response to Message ID 5870 .

Good luck. BTW, facebook would be a good forum for status problems/updates.


With the screensaver, a quick update "If you see this and your project is at 0%, abort it!" would have sufficed! :)
Profile Scientific Frontline
Avatar

Joined: Mar 25 09
Posts: 42
ID: 8725
Credit: 788,015
RAC: 0
Message 5879 - Posted 29 Mar 2010 0:40:39 UTC

Personally, I would just like to see the notices on the main page, where in my opinion they belong.
I don't use screensavers, and I'm just about as negative towards social sites as one can get. Project news/updates belong on the project site, nowhere else.
____________

Recognized by the Carnegie Institute of Science . Washington D.C.

Profile 7ri9991 [MM]
Avatar

Joined: Apr 20 09
Posts: 14
ID: 10169
Credit: 304,285
RAC: 0
Message 5880 - Posted 29 Mar 2010 10:50:46 UTC - in response to Message ID 5874 .

Good luck. BTW, facebook would be a good forum for status problems/updates.


With the screensaver, a quick update "If you see this and your project is at 0%, abort it!" would have sufficed! :)

Except the problem is a zero-byte input file. If they can code it to recognize that and switch the screensaver, it would be easier to code it to recognize and abort, or better yet, recognize and re-download. It would be easy to scan an input file for the word "END".
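
For illustration only, the check suggested above (a non-empty file that contains the word "END") could be sketched as follows. This is not the project's code: the function name `input_looks_complete` is hypothetical, and treating "END" as a case-insensitive whole word is an assumption, since the thread does not show what a valid input file looks like.

```python
import os
import re


def input_looks_complete(path, marker="END"):
    """Return True if the file is non-empty and contains `marker` as a whole word.

    The marker comes from the post above; matching it case-insensitively
    is an assumption, not something the project has confirmed.
    """
    if os.path.getsize(path) == 0:
        return False  # the zero-byte case reported in this thread
    with open(path, "r", errors="replace") as handle:
        text = handle.read()
    pattern = r"\b" + re.escape(marker) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None
```

A file that was cut off before its last line would fail this check even if it is not empty, which is the "truncated" case a size-only test can miss.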
Profile Trilce Estrada
Forum moderator
Project administrator
Project developer
Project tester

Joined: Sep 19 06
Posts: 189
ID: 119
Credit: 1,217,236
RAC: 0
Message 5883 - Posted 29 Mar 2010 17:13:55 UTC

The abort was supposed to be coded in the charmm wrapper, but it is obvious that it is not working. We will have to revisit the code to add a reliable way to detect empty or truncated input files.

Profile 7ri9991 [MM]
Avatar

Joined: Apr 20 09
Posts: 14
ID: 10169
Credit: 304,285
RAC: 0
Message 5886 - Posted 29 Mar 2010 18:06:23 UTC - in response to Message ID 5883 .

The abort was supposed to be coded in the charmm wrapper, but it is obvious that it is not working. We will have to revisit the code to add a reliable way to detect empty or truncated input files.

I'm not a programmer, but I play one on web forums. :-D
Profile rebel9

Joined: Sep 3 08
Posts: 2
ID: 421
Credit: 67,272
RAC: 0
Message 5888 - Posted 30 Mar 2010 14:22:47 UTC - in response to Message ID 5870 .

When the problem started, I stopped requesting new WUs. When I started getting WUs again, I started monitoring for 0% and killed only those. I figured there were enough good WUs to make it valid to keep pressing on. I haven't received any bad WUs for a while, but if I do, I will document them before killing them.

I hope others proceed in this manner instead of moving to other projects.


Yeees, unfortunately, there aren't any "good" WUs, or at least I haven't seen one for months, so the cusp is now behind me and I've stopped receiving work until such time as there is a clear indication that this has stopped. Shame.
Profile Mark Brown

Joined: Dec 31 09
Posts: 6
ID: 23636
Credit: 4,678,904
RAC: 0
Message 5889 - Posted 30 Mar 2010 23:48:55 UTC

Another one bites the dust..

1t7k_50_mod0013b_9971_238982_1
Received: 3/21/2010
Deadline: 4/4/2010
CPU Time: 83:16:00
Elapsed Time: 95:00:30
Estimated Time Remaining: 04:08:41
Fraction Done: 0.000%


Task ID 12021095
Name 1t7k_50_mod0013b_9971_238982_1
Workunit 11113581
Created 21 Mar 2010 7:03:10 UTC
Sent 21 Mar 2010 7:03:35 UTC
Received 30 Mar 2010 23:41:53 UTC
Server state Over
Outcome Client error
Client state Aborted by user
Exit status -197 (0xffffffffffffff3b)
Computer ID 61649
Report deadline 4 Apr 2010 5:43:35 UTC
CPU time 299902.6
stderr out <core_client_version>6.10.18</core_client_version>
<![CDATA[
<message>
aborted by user
</message>
<stderr_txt>
Calling BOINC init.
Starting charmm run (initial or from checkpoint)...
Calling BOINC init.
Starting charmm run (initial or from checkpoint)...
Calling BOINC init.
Starting charmm run (initial or from checkpoint)...
Calling BOINC init.
Starting charmm run (initial or from checkpoint)...
Calling BOINC init.
Starting charmm run (initial or from checkpoint)...


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x7C81A3E1

Engaging BOINC Windows Runtime Debugger...



********************


____________

Profile Trilce Estrada
Forum moderator
Project administrator
Project developer
Project tester

Joined: Sep 19 06
Posts: 189
ID: 119
Credit: 1,217,236
RAC: 0
Message 5892 - Posted 1 Apr 2010 22:08:20 UTC - in response to Message ID 5889 .

Another one bites the dust..

1t7k_50_mod0013b_9971_238982_1



Yes, that was one of them. The good news is that this crisis helped us to take care of several pending issues, one was the generation and validation of retrials, like this one ending in _1
Profile Trilce Estrada
Forum moderator
Project administrator
Project developer
Project tester

Joined: Sep 19 06
Posts: 189
ID: 119
Credit: 1,217,236
RAC: 0
Message 5893 - Posted 1 Apr 2010 22:10:21 UTC - in response to Message ID 5889 .

Another one bites the dust..

1t7k_50_mod0013b_9971_238982_1



Yes, that was one of them. The good news is that this crisis helped us to take care of several pending issues, one was the generation and validation of retrials, like this one ending in _1
TPR_Mojo

Joined: Mar 26 09
Posts: 6
ID: 8777
Credit: 7,205,188
RAC: 0
Message 5894 - Posted 2 Apr 2010 19:26:55 UTC - in response to Message ID 5893 .
Last modified: 2 Apr 2010 19:28:59 UTC


Yes, that was one of them. The good news is that this crisis helped us to take care of several pending issues, one was the generation and validation of retrials, like this one ending in _1


Yes, that was one of them. The good news is that this crisis helped us to take care of several pending issues, one was the generation and validation of retrials, like this one ending in _1


Unfortunately we have found a new bug in the message board software where under certain circumstances replies are posted twice.......... ;)
Profile Trilce Estrada
Forum moderator
Project administrator
Project developer
Project tester

Joined: Sep 19 06
Posts: 189
ID: 119
Credit: 1,217,236
RAC: 0
Message 5897 - Posted 5 Apr 2010 16:43:02 UTC - in response to Message ID 5894 .
Last modified: 5 Apr 2010 16:43:54 UTC

True =)

arcturus

Joined: Sep 22 08
Posts: 4
ID: 1145
Credit: 767,313
RAC: 0
Message 5900 - Posted 5 Apr 2010 20:54:43 UTC

<sigh>

4 months later, hoping things have improved? Nope.

Downloaded 4 of the Charmm 34a2 6.23's which have (you guessed it!) the all too familiar 0% issue. Aborted them.

Q9550 Yorkfield on Win 7 64 bit. Who has time to babysit?

Hopeless.

Hacker

Joined: Mar 20 09
Posts: 2
ID: 8510
Credit: 297,087
RAC: 0
Message 5916 - Posted 29 Apr 2010 14:00:05 UTC

Still no solution for me, either. 0% on all workunits I get after aborting.

sam_spade

Joined: May 30 09
Posts: 1
ID: 12330
Credit: 701,781
RAC: 0
Message 5987 - Posted 20 Aug 2010 11:05:46 UTC

Is there any solution by now?

I've got 2 computers with the above-mentioned problem:
* GenuineIntel Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz [Family 0 Model 0 Stepping 0]
Microsoft Windows 7 Enterprise x64 Edition, (06.01.7600.00)
Boinc 6.11.4

* GenuineIntel Pentium(R) Dual-Core CPU T4300 @ 2.10GHz [Family 6 Model 23 Stepping 10]
Microsoft Windows 7 Enterprise x86 Edition, (06.01.7600.00)
Boinc 6.10.58

My other computers work fine with Windows 7:
* AuthenticAMD AMD Phenom(tm) II X6 1055T Processor [Family 0 Model 0 Stepping 0]
* GenuineIntel Intel(R) Atom(TM) CPU N270 @ 1.60GHz [Family 6 Model 28 Stepping 2]

Profile BF
Volunteer tester

Joined: Nov 14 06
Posts: 3
ID: 299
Credit: 147,913
RAC: 0
Message 6037 - Posted 1 Oct 2010 5:39:21 UTC

*bump*

sharky

Joined: Sep 16 10
Posts: 2
ID: 33046
Credit: 0
RAC: 0
Message 6040 - Posted 7 Oct 2010 5:36:31 UTC

I just noticed the issue too, but unfortunately I aborted the tasks prior to looking for a thread on it.
I had 4 tasks hung this morning, aborted them and 4 more started, got home from work and they were still at 0%
Intel Q9550, windows 7 64bit

Felix

Joined: Jan 19 11
Posts: 4
ID: 36973
Credit: 320
RAC: 0
Message 6148 - Posted 21 Jan 2011 18:24:48 UTC - in response to Message ID 6040 .

Same problem here.

Windows 7 64-bit, Intel SU9400 CPU. Docking is always at 0.00%, while Rosetta@Home runs flawlessly. I stopped Docking, tried to restart it and to request a new job. Fail.

Any idea?

I'm gonna try with an AMD machine with the same profile... I'll let u know.

Profile robertmiles

Joined: Apr 16 09
Posts: 96
ID: 9967
Credit: 1,290,747
RAC: 0
Message 6151 - Posted 23 Jan 2011 3:08:52 UTC - in response to Message ID 6148 .
Last modified: 23 Jan 2011 3:16:56 UTC

Same problem here.

Windows 7 64-bit, Intel SU9400 CPU. Docking is always at 0.00%, while Rosetta@Home runs flawlessly. I stopped Docking, tried to restart it and to request a new job. Fail.

Any idea?

I'm gonna try with an AMD machine with the same profile... I'll let u know.


On other BOINC projects, that often happens if you don't let it run long enough to reach the first checkpoint, for any workunits that only update their progress at checkpoints. Therefore, it would be useful to know how much CPU time and how much elapsed time it used while still showing 0.00% progress, to see if it should have reached a checkpoint by then.

Also, some BOINC projects will wait about 24 hours after you report a failed or aborted workunit before sending you any more workunits at all.
Ananas

Joined: Aug 29 09
Posts: 56
ID: 17736
Credit: 2,500,425
RAC: 0
Message 6158 - Posted 29 Jan 2011 0:08:43 UTC
Last modified: 29 Jan 2011 0:10:11 UTC

No such problem for months (actually I had not had even one with no progress until now), but now 5 in a row, e.g.:

http://docking.cis.udel.edu/community/result.php?resultid=19014897

2 ran for several hours (this one for 8h) and as the usual behavior is 1% (or more) after 1 or 2 minutes, I aborted 3 more after ~5 minutes at 0%.

The ones that are running now are just normal, ~20% after an hour, progress constantly increasing.

edit : If it was a checkpoint problem, it would not record the elapsed CPU time, it would reset to 0 after each restart - but it did record the time so it is not a checkpoint related error.

Tom

Joined: Oct 31 10
Posts: 1
ID: 34448
Credit: 816,266
RAC: 0
Message 6161 - Posted 29 Jan 2011 16:59:25 UTC - in response to Message ID 6158 .

No such problem for months (actually I had not had even one with no progress until now), but now 5 in a row, e.g.:

http://docking.cis.udel.edu/community/result.php?resultid=19014897

2 ran for several hours (this one for 8h) and as the usual behavior is 1% (or more) after 1 or 2 minutes, I aborted 3 more after ~5 minutes at 0%.

The ones that are running now are just normal, ~20% after an hour, progress constantly increasing.

edit : If it was a checkpoint problem, it would not record the elapsed CPU time, it would reset to 0 after each restart - but it did record the time so it is not a checkpoint related error.


I've been having the same problem for the last few days and have aborted a number of workunits at various times (up to an hour) in their progress. The progress always remained at 0%. I finally decided to let one run for the entire estimated completion time of about 3 hours. The time to completion went down to zero, the progress stayed at 0%, and the elapsed time continued to count up. Are these workunits defective? I'm also running Seti and not having any problems. I would like to continue with this project, but don't want to waste my CPU time if the workunits are defective. Anybody got any answers?
P . P . L .
Avatar

Joined: Oct 20 08
Posts: 69
ID: 2725
Credit: 1,000,979
RAC: 0
Message 6162 - Posted 30 Jan 2011 4:33:30 UTC
Last modified: 30 Jan 2011 5:04:00 UTC

Hi.

Looks like the problem has struck Linux now too. I noticed one stuck on my quad; it had been running for 42 min and was showing 0%.

They are usually moving after 5 min, so I aborted it. I'll have to keep an eye on them from now on!

1hvk1hbv_mod0014crossdockinghiv1_1648_108194

http://docking.cis.udel.edu/community/workunit.php?wuid=18475870

edit / two more, one from hex core ran 2hrs 23min 0% other 33min 0% not good!

1hvk1hbv_mod0014crossdockinghiv1_1649_151280

http://docking.cis.udel.edu/community/workunit.php?wuid=18475871

1hvj1hbv_mod0014crossdockinghiv1_20102_333689

http://docking.cis.udel.edu/community/workunit.php?wuid=18472967

edit / I think I know: at least with my two bad tasks, they are missing an input file. (I have another one running now; I'll let it go for 30 min and abort it if it hasn't moved by then.) Back to the story: these two downloaded too fast, faster than they normally do, so not the full file size?
____________


Profile adrianxw
Volunteer tester
Avatar

Joined: Dec 30 06
Posts: 164
ID: 343
Credit: 1,669,741
RAC: 0
Message 6163 - Posted 30 Jan 2011 13:17:45 UTC

1hvi1hbv_mod0014crossdockinghiv1_13789_408524 showing 09:13:21 CPU time and 48:27:37 elapsed. Fraction done 0.000%. Aborted.
____________
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

Profile Saenger
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 125
ID: 79
Credit: 411,959
RAC: 0
Message 6164 - Posted 30 Jan 2011 13:22:26 UTC

Got one as well, 1hvj1hbv_mod0014crossdockinghiv1_18736_423697 .
CPU-time was 15:45h, process bar showed 0%, I just aborted it.
It says CPU-time 0 sec, so nothing was recorded. My system was always at 100%, so CPU-time was definitely used, just not for something useful.
____________
Gruesse vom Saenger

For questions about Boinc look in the BOINC-Wiki

Profile adrianxw
Volunteer tester
Avatar

Joined: Dec 30 06
Posts: 164
ID: 343
Credit: 1,669,741
RAC: 0
Message 6165 - Posted 30 Jan 2011 14:52:42 UTC

1hvj1hbv_mod0014crossdocking_15775_294374 01:34:53 CPU time, 06:18:04 elapsed time, 0.000% done. Aborted.
____________
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

MAPSIT

Joined: Sep 2 10
Posts: 1
ID: 32608
Credit: 999,488
RAC: 0
Message 6166 - Posted 30 Jan 2011 16:05:48 UTC

2011/01/30: I've gotten hit with a bunch of the "run forever - no progress" WUs on my Windows 7 Pro-64 bit, 8 cores machine. Being firmly committed to the "set it and forget it" philosophy for volunteering my cycles, I've simply aborted them after the remaining time went to zero. I'll leave the debugging to those with greater time and expertise. Having read the thread about the problem, I'll now delete any future apparent problems after one hour with no progress rather than the 20 hours I've been giving WUs with an estimated 15 hour completion. Good luck to those attempting to track down and resolve this issue. As others have said, it's annoying.

Trotador

Joined: Sep 5 09
Posts: 5
ID: 18182
Credit: 6,445,766
RAC: 0
Message 6167 - Posted 30 Jan 2011 17:45:26 UTC
Last modified: 30 Jan 2011 17:46:06 UTC

Another one, aborted after 12 hours, 0% performed and 0% remaining

1hvk1hbv_mod0014crossdockinghiv1_1992_194045_0


ubuntu 9.1 64 bits

Ananas

Joined: Aug 29 09
Posts: 56
ID: 17736
Credit: 2,500,425
RAC: 0
Message 6168 - Posted 30 Jan 2011 19:25:02 UTC
Last modified: 30 Jan 2011 19:28:01 UTC

50 more hours wasted, setting to no new work :-(

edit : 4 out of 5 WUs had this problem this time.

Paratima

Joined: May 31 10
Posts: 4
ID: 29644
Credit: 1,021,214
RAC: 0
Message 6169 - Posted 31 Jan 2011 3:14:39 UTC

Having the same problem, only on my Win7 machine. However, NOT Intel CPU.
Details:
AuthenticAMD AMD Phenom(tm) II X4 945 Processor [AMD64 Family 16 Model 4 Stepping 2]
Microsoft Windows 7 Home Premium x64 Edition, (06.01.7600.00)

Am aborting the bad units & hoping for the best.

P . P . L .
Avatar

Joined: Oct 20 08
Posts: 69
ID: 2725
Credit: 1,000,979
RAC: 0
Message 6170 - Posted 31 Jan 2011 4:59:16 UTC

Hi.

I've had a few good ones and this one not so good, ran for 18min 0% aborted it.

Some bad tasks are still around.

http://docking.cis.udel.edu/community/workunit.php?wuid=18487328

1hvk1hbv_mod0014crossdockinghiv1_12743_296212


____________


ZoSo

Joined: Oct 14 10
Posts: 16
ID: 33872
Credit: 3,742,738
RAC: 0
Message 6172 - Posted 31 Jan 2011 18:00:43 UTC
Last modified: 31 Jan 2011 18:01:55 UTC

OK... only my linux boxes were showing this, not any of my windows machines, so I reported it on the linux board. I'll just keep an eye on it and abort them when it happens, since it appears to have been going on for over 17 months without a fix.

Strom

Joined: Dec 29 09
Posts: 2
ID: 23444
Credit: 1,316,847
RAC: 0
Message 6174 - Posted 31 Jan 2011 18:32:46 UTC

I just aborted seven of these with 0% progress after various run lengths. The four processing now are showing progress after only a few minutes of run time. Seems to be an issue with some WUs, but not all.

Trotador

Joined: Sep 5 09
Posts: 5
ID: 18182
Credit: 6,445,766
RAC: 0
Message 6175 - Posted 31 Jan 2011 19:58:31 UTC

new one

1hvk1hbv_mod0014crossdockinghiv1_17825_206714_0

aborted after 14 hours 0% progress, 3 hours remaining

mctonale

Joined: Jan 21 10
Posts: 1
ID: 24730
Credit: 98,138
RAC: 0
Message 6176 - Posted 31 Jan 2011 21:20:20 UTC

Also having this (recurring) problem on two machines.

Core i7, 3x2GB DDR3, Win7, BOINC 6.12.12 (64).
Phenom 9650, Ubuntu 10.10, BOINC 6.10.56. Over an hour and still at 0%. It appears to be Docking only; Collatz, SETI, Rosetta and AQUA are all running fine.

I don't think it is a memory issue, as I have swapped the memory on the Phenom: 2x1GB DDR2 1066 (at 800 due to a motherboard restriction) and 2x2GB DDR2 800, with the same results.

cpu is still 100%??

Project suspended.. for now.

Profile adrianxw
Volunteer tester
Avatar

Joined: Dec 30 06
Posts: 164
ID: 343
Credit: 1,669,741
RAC: 0
Message 6178 - Posted 1 Feb 2011 2:58:14 UTC

Yet another, 1hvk1hbv_mod0014crossdockinghiv1_24501_456879 .

There is a problem here.
____________
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

P . P . L .
Avatar

Joined: Oct 20 08
Posts: 69
ID: 2725
Credit: 1,000,979
RAC: 0
Message 6179 - Posted 1 Feb 2011 4:45:52 UTC

Hi.

I'm now getting nothing but 0 file size every time my rigs try get new tasks.

Suspending DOCKING until someone from the project says it's fixed.

____________


Profile adrianxw
Volunteer tester
Avatar

Joined: Dec 30 06
Posts: 164
ID: 343
Credit: 1,669,741
RAC: 0
Message 6180 - Posted 1 Feb 2011 8:30:12 UTC - in response to Message ID 6179 .
Last modified: 1 Feb 2011 8:33:44 UTC

1hvk1hbv_mod0014crossdockinghiv1_33167_465761 and...
1hvk1hbv_mod0014crossdockinghiv1_32186_207448 and...
1hvk1hbv_mod0014crossdockinghiv1_27165_411078

All 0% and aborted. No response from the project, sadly, as usual. Michela, you are turning people off the project with this lack of attention to this and other issues on the boards here.

No New Tasks set.
____________
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

sandro

Joined: Sep 3 08
Posts: 4
ID: 512
Credit: 4,076,636
RAC: 0
Message 6181 - Posted 1 Feb 2011 8:37:53 UTC

Same here, I aborted a bunch of WUs today, all stuck at 0.0%.

Profile TheFiend

Joined: Apr 7 09
Posts: 70
ID: 9482
Credit: 20,705,527
RAC: 0
Message 6183 - Posted 1 Feb 2011 9:57:15 UTC

These dodgy WU's that are coming through have .INP files with a size of 0KB.

Have a look through your cache in your project directory and you will find them.

I have just aborted about 70 of them so far, but it seems as though they are still being sent out.

I run a large cache so I haven't crunched any of them... luckily!!!

etrecords

Joined: Nov 18 09
Posts: 2
ID: 21628
Credit: 1,943,170
RAC: 0
Message 6186 - Posted 1 Feb 2011 18:27:11 UTC

I also found a number of these WUs on different systems, all with an inp file of zero bytes. Since I don't have the time to babysit my systems, this may force me to stop Docking temporarily.

MaW

Joined: Jan 26 11
Posts: 17
ID: 37208
Credit: 114,943
RAC: 0
Message 6187 - Posted 1 Feb 2011 19:01:03 UTC
Last modified: 1 Feb 2011 19:05:17 UTC

Is this project still maintained, or did someone lock the server room in November and forget about it? And it's miraculously running by itself?

Need to abort about half of work units because of this problem...

Edit: just now got another series of dead WUs..

Joined this project recently as it seems more focused on a specific task than Rosetta, but this is disappointing..

ZoSo

Joined: Oct 14 10
Posts: 16
ID: 33872
Credit: 3,742,738
RAC: 0
Message 6188 - Posted 1 Feb 2011 19:12:52 UTC - in response to Message ID 6186 .

... all with an inp file of zero bytes.


Good catch... I just aborted a half dozen with 0-byte .inp files here, before they started running. I've wasted over 200 hours of crunching because of those over the last few days.

Now how do we abort work units with a script?

Then we can just have it check
/var/lib/boinc/projects/docking.cis.udel.edu/
say, every 15 minutes, and abort work units with the 0-byte *.inp file names.

Or would it be enough to just delete the 0-byte .inp files so BOINC doesn't even try to run those WU's and instead Aborts them itself?
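
As a rough sketch of the detection half of that idea, the following lists the zero-byte input files in a project directory. It does not abort or delete anything, and it does not answer whether deleting the file is enough. The function name `find_empty_inputs` is hypothetical, and it assumes the input files sit directly in the project directory with an .inp (or .INP) extension, as described in this thread.

```python
from pathlib import Path


def find_empty_inputs(project_dir, suffix=".inp"):
    """Return the zero-byte files in `project_dir` whose extension matches `suffix`.

    The comparison ignores case, since the thread mentions both .inp and .INP.
    Only the top level of the directory is scanned (an assumption).
    """
    wanted = suffix.lower()
    return sorted(
        entry
        for entry in Path(project_dir).iterdir()
        if entry.is_file()
        and entry.suffix.lower() == wanted
        and entry.stat().st_size == 0
    )
```

Calling `find_empty_inputs("/var/lib/boinc/projects/docking.cis.udel.edu/")` from a job that runs every 15 minutes would give the list of suspect files; matching each file to its task and aborting that task is left out here because it depends on how the client names things.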
MaW

Joined: Jan 26 11
Posts: 17
ID: 37208
Credit: 114,943
RAC: 0
Message 6189 - Posted 1 Feb 2011 19:16:37 UTC

Well, for me >all< the good ones are already at 1% after 2 minutes, so if one is still at 0% after 4-5 minutes I abort it.
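
Written down as a rule, that heuristic might look like the sketch below. The name `should_abort` and the 300-second default (taken from the "4-5 minutes" above) are illustrative assumptions; how the elapsed time and fraction done are read from the client is not shown.

```python
def should_abort(elapsed_seconds, fraction_done, grace_seconds=300):
    """Return True if a task still reports zero progress after the grace period.

    `fraction_done` is expected in the range 0.0 to 1.0.
    """
    return elapsed_seconds >= grace_seconds and fraction_done <= 0.0
```

A task at 0% after two minutes is left alone, while one still at 0% after five minutes is flagged; any task that has made progress is never flagged.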

Profile TheFiend

Joined: Apr 7 09
Posts: 70
ID: 9482
Credit: 20,705,527
RAC: 0
Message 6190 - Posted 1 Feb 2011 19:52:13 UTC - in response to Message ID 5854 .

During March 7-11 we had a big problem with the server ( http://docking.cis.udel.edu/about/project/news.php ), which was running out of space and therefore several workunits were sent empty. On March 11 we stopped the production, increase the server partition and resume distribution. I want to think that the problems of workunits with 0% progress are still some of those created during March 7-11, what worries me the most is that now it's been 15+ days and you are still having the problems.

According to the example posted by Ananas, the problem was indeed an old workinit (created on March 7th), but if any of you could post a link to one of those eternal workunits it would be great, specially if it was created after March 11/12 so that we can take a look into the input files, in the meantime we are trying to identify what can be wrong, and if the problem is contained to those old wu's or if it is spread



Looks like it could be the same problem as happened March 2010!
Chris Granger

Joined: Sep 17 10
Posts: 2
ID: 33087
Credit: 294,493
RAC: 0
Message 6191 - Posted 1 Feb 2011 20:26:55 UTC

Both of my machines are experiencing this problem. One is Linux 64-bit with 8GB of RAM and the other is Windows Vista 32-bit with 2GB of RAM. Work units ran for over 12 hours with 0% progress before I aborted them.

MaW

Joined: Jan 26 11
Posts: 17
ID: 37208
Credit: 114,943
RAC: 0
Message 6192 - Posted 1 Feb 2011 20:56:41 UTC - in response to Message ID 6190 .


Looks like it could be the same problem as happened March 2010!


Makes sense. But I'm a bit worried if there's anyone out there to fix the problem this time ^.=


Profile TheFiend

Joined: Apr 7 09
Posts: 70
ID: 9482
Credit: 20,705,527
RAC: 0
Message 6193 - Posted 1 Feb 2011 21:56:43 UTC

Just fired a PM off to Trilce Estrada (see above) to see if she is aware of the current problems. Hopefully she will have a look into it.

_heinz

Joined: Jun 16 09
Posts: 12
ID: 13437
Credit: 1,471,103
RAC: 0
Message 6194 - Posted 2 Feb 2011 1:36:06 UTC
Last modified: 2 Feb 2011 1:50:23 UTC

Hi,
I have the same issue: no progress for 2 hours, WU cancelled now
http://docking.cis.udel.edu/community/workunit.php?wuid=18510268

Pity, lost time

I'm stopping work till the problems of the project are solved

Number 140 in the world statistic of Docking

heinz
____________
V8-Xeon-Docking

etrecords

Joined: Nov 18 09
Posts: 2
ID: 21628
Credit: 1,943,170
RAC: 0
Message 6195 - Posted 2 Feb 2011 8:08:36 UTC

I just checked, and all the WUs with this problem were created recently. This morning I also found some new ones. This is the only reason I have WUs aborted by the client, so you can see the workunits in my task list.

MaW

Joined: Jan 26 11
Posts: 17
ID: 37208
Credit: 114,943
RAC: 0
Message 6196 - Posted 2 Feb 2011 10:22:56 UTC

Arghh it's getting worse. Got almost only bad tasks today. Could delete them straight away because they indeed have 0-size INP file.

http://docking.cis.udel.edu/community/results.php?hostid=85823&offset=0

...

Profile Scientific Frontline
Avatar

Joined: Mar 25 09
Posts: 42
ID: 8725
Credit: 788,015
RAC: 0
Message 6198 - Posted 2 Feb 2011 13:51:11 UTC

Simplest solution / work-around.
Increase cache, get new tasks, then set no new tasks, delete the 0-byte .inp files, and keep on running until you need more. Repeat the process until they get this fixed.
____________

Recognized by the Carnegie Institute of Science . Washington D.C.

Profile TheFiend

Joined: Apr 7 09
Posts: 70
ID: 9482
Credit: 20,705,527
RAC: 0
Message 6199 - Posted 2 Feb 2011 15:44:54 UTC

No response yet from the PM sent to Trilce Estrada :(

Profile Scientific Frontline
Avatar

Joined: Mar 25 09
Posts: 42
ID: 8725
Credit: 788,015
RAC: 0
Message 6200 - Posted 2 Feb 2011 18:05:39 UTC - in response to Message ID 6199 .

No response yet from the PM sent to Trilce Estrada :(


Does not surprise me any. One of the biggest flaws of this project has been the lack of communication.


____________

Recognized by the Carnegie Institute of Science . Washington D.C.
Profile keyboards

Joined: Jan 2 09
Posts: 3
ID: 5426
Credit: 1,061,087
RAC: 0
Message 6201 - Posted 2 Feb 2011 23:31:11 UTC
Last modified: 2 Feb 2011 23:33:04 UTC

Have a WU that has been running for 17+ hours showing 0% progress and time to completion as --- with a report deadline of 2/11/11.

http://docking.cis.udel.edu/community/workunit.php?wuid=18446304

Running on this computer:
http://docking.cis.udel.edu/community/show_host_detail.php?hostid=16437

Seriously considering aborting all WUs and suspending indefinitely!
____________
!!REMEMBER - Stupidity should be PAINFUL!!

MaW

Joined: Jan 26 11
Posts: 17
ID: 37208
Credit: 114,943
RAC: 0
Message 6205 - Posted 3 Feb 2011 9:17:44 UTC
Last modified: 3 Feb 2011 9:22:59 UTC

The problem is definitely caused by empty (0-size INP file) or corrupt (INP file smaller than 1.16 MB; it even crashed the app twice, lol) work units. Unfortunately it's a server-side problem, and it would be nice to see that someone there cares about it...

Now I'm doing 'hunting' for good work units :D

Edit: ahh guess there's nothing to hunt.. seems that the generator is down and it's re-sending bad WUs?
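
The size observation in this post (0 bytes means empty, anything under the usual size means corrupt) can be sketched as a simple classifier. The name `classify_input` is hypothetical, and the expected size is passed in by the caller: the roughly 1.16 MB figure is one user's observation, and whether every work unit shares it is an assumption.

```python
import os


def classify_input(path, expected_bytes):
    """Classify an input file as 'empty', 'truncated' or 'ok' by size alone."""
    size = os.path.getsize(path)
    if size == 0:
        return "empty"
    if size < expected_bytes:
        return "truncated"
    return "ok"
```

For example, `classify_input(path, 1_160_000)` would use the figure from this post as the threshold, reading MB as 10^6 bytes.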

Jesse Viviano

Joined: Jan 14 10
Posts: 7
ID: 24317
Credit: 349,250
RAC: 0
Message 6208 - Posted 3 Feb 2011 14:08:58 UTC
Last modified: 3 Feb 2011 14:09:35 UTC

I have just started and got two more of the zero-byte *.inp file bad work units. I aborted them following the advice on this thread. They are work units 18530883 and 18531863 .

Jesse Viviano

Joined: Jan 14 10
Posts: 7
ID: 24317
Credit: 349,250
RAC: 0
Message 6209 - Posted 3 Feb 2011 17:26:33 UTC

I have another bad one that I aborted because of the empty input file problem. It is work unit 18537922.

Paratima

Joined: May 31 10
Posts: 4
ID: 29644
Credit: 1,021,214
RAC: 0
Message 6212 - Posted 4 Feb 2011 3:00:03 UTC

Drop me a line when y'all get it fixed. I'll be crunching elsewhere, maybe POEM. Got no time for handholding.

googloo

Joined: Nov 30 09
Posts: 6
ID: 22204
Credit: 1,182,026
RAC: 0
Message 6214 - Posted 4 Feb 2011 15:50:06 UTC

I have Docking@home set to no new tasks until this is fixed.

Profile adrianxw
Volunteer tester
Avatar

Joined: Dec 30 06
Posts: 164
ID: 343
Credit: 1,669,741
RAC: 0
Message 6215 - Posted 4 Feb 2011 16:33:59 UTC
Last modified: 4 Feb 2011 16:34:59 UTC

I've e-mailed Michela concerning the issue here. Maybe she is around.
____________
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

Profile Michela
Forum moderator
Project administrator
Project developer
Project tester
Project scientist
Avatar

Joined: Sep 13 06
Posts: 163
ID: 10
Credit: 97,083
RAC: 0
Message 6216 - Posted 4 Feb 2011 16:34:23 UTC - in response to Message ID 6214 .

Hi All,

we have a disk issue. We stopped the generation of new jobs and are looking at the issue.

Sorry for the problem and thank you for the notes!

Michela

____________
If you are interested in working on Docking@Home in a great group at UDel, contact me at 'taufer at acm dot org'!

Profile Scientific Frontline
Avatar

Joined: Mar 25 09
Posts: 42
ID: 8725
Credit: 788,015
RAC: 0
Message 6220 - Posted 4 Feb 2011 16:47:35 UTC - in response to Message ID 6216 .

Hi All,

we have a disk issue. We stopped the generation of new jobs and are looking at the issue.

Sorry for the problem and thank you for the notes!

Michela


You have more than a disk issue; the project admin has a total lack of respect for its members.


____________

Recognized by the Carnegie Institute of Science . Washington D.C.
MaW

Joined: Jan 26 11
Posts: 17
ID: 37208
Credit: 114,943
RAC: 0
Message 6222 - Posted 4 Feb 2011 18:16:17 UTC - in response to Message ID 6220 .

Hi All,

we have a disk issue. We stopped the generation of new jobs and are looking at the issue.

Sorry for the problem and thank you for the notes!

Michela


You have more than a disk issue; the project admin has a total lack of respect for its members.



Nah, at least we know they're alive. I was getting worried.
Profile Scientific Frontline
Avatar

Joined: Mar 25 09
Posts: 42
ID: 8725
Credit: 788,015
RAC: 0
Message 6223 - Posted 4 Feb 2011 21:23:30 UTC - in response to Message ID 6222 .

Hi All,

we have a disk issue. We stopped the generation of new jobs and are looking at the issue.

Sorry for the problem and thank you for the notes!

Michela


You have more than a disk issue; the project admin has a total lack of respect for its members.



Nah, at least we know they're alive. I was getting worried.


Been there and done that with them too many times. Never was worried, just annoyed.
____________

Recognized by the Carnegie Institute of Science . Washington D.C.
Ronald Tilby

Joined: Oct 22 10
Posts: 1
ID: 34143
Credit: 158,428
RAC: 0
Message 6224 - Posted 5 Feb 2011 0:14:32 UTC - in response to Message ID 6223 .

I discovered that I have the "Long Run times with no progress" tasks after two of them had run for over 96 hours.

I have three suggestions:
1. Fix the process that creates the task files to not create zero sized task files.

2. Fix the process that serves the task files to the client so that it won't send zero sized task files.

3. Fix the client docking program to detect empty/invalid task files and appropriately set their status.
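
For suggestion 3, a minimal sketch of the kind of check meant here, in plain Python rather than anything from the actual docking application. The function name, the size threshold and the end-of-input marker are all assumptions for illustration; the real values would have to come from the project.

```python
import os

def input_looks_complete(path, min_bytes, end_marker):
    """Return True if the input file exists, is at least min_bytes long,
    and its last non-blank line starts with end_marker (case-insensitive).

    min_bytes and end_marker are hypothetical parameters; the thread does
    not say what a valid Docking input file must contain.
    """
    if not os.path.isfile(path):
        return False                      # missing file
    if os.path.getsize(path) < min_bytes:
        return False                      # empty or truncated file
    last_line = ""
    with open(path, "r", errors="replace") as handle:
        for line in handle:
            if line.strip():
                last_line = line.strip()  # remember the last non-blank line
    return last_line.lower().startswith(end_marker.lower())
```

A task that fails this check could be reported as an error right away instead of running for days at 0%.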

ZoSo

Joined: Oct 14 10
Posts: 16
ID: 33872
Credit: 3,742,738
RAC: 0
Message 6225 - Posted 5 Feb 2011 5:51:35 UTC

Since this problem has been occurring intermittently for over a year and a half, Ronald Tilby's suggestions all sound reasonable to me.

Another issue with the work units that have been aborted is:


2011-02-04 23:58:25|Docking|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 3 completed tasks
2011-02-04 23:58:30|Docking|Scheduler request succeeded: got 0 new tasks
2011-02-04 23:58:30|Docking|Message from server: Server error: can't attach shared memory


Fri 04 Feb 2011 11:56:57 PM EST Docking Reporting 2 completed tasks, not requesting new tasks
Fri 04 Feb 2011 11:57:08 PM EST Docking Scheduler request completed
Fri 04 Feb 2011 11:57:08 PM EST Docking Message from server: Server error: can't attach shared memory


Both of those clips were grabbed on the same machine. The first group is from BOINCTasks (from the BOINC manager on another machine on my LAN), and the second group is from BOINC itself... I just synced its clock, which was about 1:06 slow... so if you're synced to a source like otc2.psu.edu:123, too, your log clips for those 2 should show up between 23:58:03 on the 4th and a little after midnight on the 5th.

I found a couple of links on berkeley.edu to http://www.spy-hill.net/~myers/help/boinc/Create_Project.html#feeder about that message. None of them have to do with the client/manager, though.
Profile Michela
Forum moderator
Project administrator
Project developer
Project tester
Project scientist
Avatar

Joined: Sep 13 06
Posts: 163
ID: 10
Credit: 97,083
RAC: 0
Message 6228 - Posted 5 Feb 2011 14:01:26 UTC - in response to Message ID 6225 .

Dear All, a new update from D@H:

1) We are in a recovery mode. In other words, we are collecting and validating results but we are not generating and distributing new jobs for the moment, while we are investigating what caused the problem yesterday.

2) Please bear with us. We do not have a full-time system administrator taking care of D@H; the work is done by students. They are doing their very best, but they also have classes and homework. We are dedicating the weekend to understanding the problem and fixing it.

Thanks for your many notes and your support.

Michela

____________
If you are interested in working on Docking@Home in a great group at UDel, contact me at 'taufer at acm dot org'!

Profile Scientific Frontline
Avatar

Joined: Mar 25 09
Posts: 42
ID: 8725
Credit: 788,015
RAC: 0
Message 6233 - Posted 5 Feb 2011 15:09:27 UTC - in response to Message ID 6228 .

Dear All, a new update from D@H:

1) We are in a recovery mode. In other words, we are collecting and validating results but we are not generating and distributing new jobs for the moment, while we are investigating what caused the problem yesterday.

2) Please bear with us. We do not have a full-time system administrator taking care of D@H; the work is done by students. They are doing their very best, but they also have classes and homework. We are dedicating the weekend to understanding the problem and fixing it.

Thanks for your many notes and your support.

Michela

I'll accept that as a reasonable answer.
Academics first, by all means.
____________

Recognized by the Carnegie Institute of Science . Washington D.C.
Jesse Viviano

Joined: Jan 14 10
Posts: 7
ID: 24317
Credit: 349,250
RAC: 0
Message 6234 - Posted 5 Feb 2011 19:32:24 UTC - in response to Message ID 6228 .

Dear All, a new update from D@H:

1) We are in a recovery mode. In other words, we are collecting and validating results but we are not generating and distributing new jobs for the moment, while we are investigating what caused the problem yesterday.

2) Please bear with us. We do not have a full-time system administrator taking care of D@H; the work is done by students. They are doing their very best, but they also have classes and homework. We are dedicating the weekend to understanding the problem and fixing it.

Thanks for your many notes and your support.

Michela

I doubt that your systems are capable of collecting results. Result uploads work fine, but they do nothing but waste space on your disk until they have been reported. That is when your server becomes aware of the results and prepares them for the postprocessing they need (checking to see if they can be validated, getting them validated, assimilated into the science database, and then deleted along with their associated work unit). However, when I try to do an update to report them or BOINC tries to do so automatically, I get these messages:

2/5/2011 2:18:10 PM Docking update requested by user
2/5/2011 2:18:14 PM Docking Sending scheduler request: Requested by user.
2/5/2011 2:18:14 PM Docking Reporting 4 completed tasks, requesting new tasks for CPU and GPU
2/5/2011 2:18:15 PM Docking Scheduler request completed: got 0 new tasks
2/5/2011 2:18:15 PM Docking Message from server: Server error: can't attach shared memory

The results then stay in my lists of unfinished tasks.

Something is keeping your server from accepting the reports for these tasks. When I searched for the error message, one possible cause I found was the feeder not running. Could you please fix whatever is preventing us from reporting our results so that they can finally be processed and accepted? I think the space freed by deleting the finished work units could help with your disk issues if the problem turns out to be a full disk; a full disk has caused other projects to generate empty work unit files that must be aborted, or to hit other errors.
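
For anyone who wants to check whether their own log shows the same server message, here is a small sketch in Python. It only looks for the "Message from server:" text that appears in the log clips pasted in this thread; the function name is made up, and reading the lines out of your own log file is left to you.

```python
import re
from collections import Counter

# Matches the "Message from server: ..." part of a log line, in either of
# the two log layouts pasted earlier in this thread.
SERVER_MESSAGE = re.compile(r"Message from server:\s*(.+?)\s*$")

def count_server_messages(log_lines):
    """Count how often each distinct server message appears in log_lines."""
    counts = Counter()
    for line in log_lines:
        match = SERVER_MESSAGE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it the lines of a saved log gives a quick tally of which server errors are coming back and how often.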
Profile Michela
Forum moderator
Project administrator
Project developer
Project tester
Project scientist
Avatar

Joined: Sep 13 06
Posts: 163
ID: 10
Credit: 97,083
RAC: 0
Message 6235 - Posted 6 Feb 2011 15:54:01 UTC - in response to Message ID 6234 .

Dear All, a new update from D@H:

1) We are in a recovery mode. In other words, we are collecting and validating results but we are not generating and distributing new jobs for the moment, while we are investigating what caused the problem yesterday.

2) Please bear with us. We do not have a full-time system administrator taking care of D@H; the work is done by students. They are doing their very best, but they also have classes and homework. We are dedicating the weekend to understanding the problem and fixing it.

Thanks for your many notes and your support.

Michela

I doubt that your systems are capable of collecting results. Result uploads work fine, but they do nothing but waste space on your disk until they have been reported. That is when your server becomes aware of the results and prepares them for the postprocessing they need (checking to see if they can be validated, getting them validated, assimilated into the science database, and then deleted along with their associated work unit). However, when I try to do an update to report them or BOINC tries to do so automatically, I get these messages:

2/5/2011 2:18:10 PM Docking update requested by user
2/5/2011 2:18:14 PM Docking Sending scheduler request: Requested by user.
2/5/2011 2:18:14 PM Docking Reporting 4 completed tasks, requesting new tasks for CPU and GPU
2/5/2011 2:18:15 PM Docking Scheduler request completed: got 0 new tasks
2/5/2011 2:18:15 PM Docking Message from server: Server error: can't attach shared memory

The results then stay in my lists of unfinished tasks.

Something is keeping your server from accepting the reports for these tasks. When I searched for the error message, one possible cause I found was the feeder not running. Could you please fix whatever is preventing us from reporting our results so that they can finally be processed and accepted? I think the space freed by deleting the finished work units could help with your disk issues if the problem turns out to be a full disk; a full disk has caused other projects to generate empty work unit files that must be aborted, or to hit other errors.


Hi, all the daemons are up and running. I am monitoring a couple of clients to see if I can reproduce your error message.


____________
If you are interested in working on Docking@Home in a great group at UDel, contact me at 'taufer at acm dot org'!
Hephaiston

Joined: Feb 3 11
Posts: 3
ID: 37733
Credit: 159,759
RAC: 0
Message 6236 - Posted 7 Feb 2011 15:55:46 UTC

Having the same problem as mentioned above several times.
0% progress after 76 hours. Still 5 more jobs to do.
____________
meine Kiste

Profile Michela
Forum moderator
Project administrator
Project developer
Project tester
Project scientist
Avatar

Joined: Sep 13 06
Posts: 163
ID: 10
Credit: 97,083
RAC: 0
Message 6237 - Posted 7 Feb 2011 17:33:50 UTC - in response to Message ID 6236 .

Having the same problem as mentioned above several times.
0% progress after 76 hours. Still 5 more jobs to do.


We removed all the jobs with potential 0% progress that were in our database. Unfortunately, some jobs had already been distributed by the time we worked on the database. Can you abort the jobs with 0% progress and get new jobs?

Thanks,

Michela
____________
If you are interested in working on Docking@Home in a great group at UDel, contact me at 'taufer at acm dot org'!
Hephaiston

Joined: Feb 3 11
Posts: 3
ID: 37733
Credit: 159,759
RAC: 0
Message 6243 - Posted 7 Feb 2011 21:18:55 UTC

All jobs aborted
____________
meine Kiste

Profile Michela
Forum moderator
Project administrator
Project developer
Project tester
Project scientist
Avatar

Joined: Sep 13 06
Posts: 163
ID: 10
Credit: 97,083
RAC: 0
Message 6244 - Posted 8 Feb 2011 3:40:51 UTC - in response to Message ID 6243 .

All jobs aborted


Thanks! Let us know if the new jobs have any similar issues.

Michela
____________
If you are interested in working on Docking@Home in a great group at UDel, contact me at 'taufer at acm dot org'!
Profile adrianxw
Volunteer tester
Avatar

Joined: Dec 30 06
Posts: 164
ID: 343
Credit: 1,669,741
RAC: 0
Message 6245 - Posted 8 Feb 2011 16:35:02 UTC
Last modified: 8 Feb 2011 16:36:13 UTC

I'd ramped up the quota for Docking on a couple of machines to check for problems, and none were found. Looks okay.

"This problem", assuming it is the same one that has been reported in this thread for a while, could easily be caught in the jobs. When the client starts, look for the file (or files if necessary); if found, look at its size; if it's "big enough", maybe open it and read it through to some end-of-input marker to see whether it is complete/okay. It might not catch everything, but it is a trivial change to make.
____________
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

Profile Michela
Forum moderator
Project administrator
Project developer
Project tester
Project scientist
Avatar

Joined: Sep 13 06
Posts: 163
ID: 10
Credit: 97,083
RAC: 0
Message 6246 - Posted 9 Feb 2011 17:01:57 UTC - in response to Message ID 6245 .

I'd ramped up the quota for Docking on a couple of machines to check for problems, and none were found. Looks okay.

"This problem", assuming it is the same one that has been reported in this thread for a while, could easily be caught in the jobs. When the client starts, look for the file (or files if necessary); if found, look at its size; if it's "big enough", maybe open it and read it through to some end-of-input marker to see whether it is complete/okay. It might not catch everything, but it is a trivial change to make.


The solution we were considering would require us to recompile charmm with BOINC, and this task can be very challenging considering the complexity of charmm. At this point we have a nagios system in place that alerts us on the disk quota and the status of the daemons.

This should help a lot.

Michela
____________
If you are interested in working on Docking@Home in a great group at UDel, contact me at 'taufer at acm dot org'!
Profile adrianxw
Volunteer tester
Avatar

Joined: Dec 30 06
Posts: 164
ID: 343
Credit: 1,669,741
RAC: 0
Message 6247 - Posted 10 Feb 2011 15:46:21 UTC
Last modified: 10 Feb 2011 15:47:14 UTC

If you've just installed nagios, then perhaps it will help. The problems we see here have been going on, (on and off), for a LONG time though. I can see that there is only one other member of our team that is still with the project now.

Good luck.
____________
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

Profile vaughan
Volunteer tester

Joined: Oct 3 06
Posts: 9
ID: 177
Credit: 3,108,281
RAC: 0
Message 6248 - Posted 11 Feb 2011 10:16:32 UTC

Continue to get problems with 0% progress for some tasks. What file do we need to check for zero length so we can abort the dud tasks early?
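
Going by the reports earlier in this thread, the bad tasks have a zero-byte *.inp file in the project folder (projects/docking.cis.udel.edu under the BOINC data directory). Here is a rough Python sketch for listing them. The function name is made up, and where the BOINC data directory lives on your machine is something you have to fill in yourself.

```python
from pathlib import Path

def find_empty_inputs(project_dir):
    """Return the sorted names (without ".inp") of all *.inp files in
    project_dir whose size is 0 bytes.

    project_dir is whatever path your install uses for
    projects/docking.cis.udel.edu; that location is an assumption here.
    """
    return sorted(
        path.stem
        for path in Path(project_dir).glob("*.inp")
        if path.is_file() and path.stat().st_size == 0
    )
```

In the log posted further down, the task name is the .inp file name with a suffix such as _0 added, so the names returned here should be enough to spot the matching tasks in the manager and abort them.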

Profile Michela
Forum moderator
Project administrator
Project developer
Project tester
Project scientist
Avatar

Joined: Sep 13 06
Posts: 163
ID: 10
Credit: 97,083
RAC: 0
Message 6249 - Posted 11 Feb 2011 13:51:44 UTC - in response to Message ID 6248 .

Continue to get problems with 0% progress for some tasks. What file do we need to check for zero length so we can abort the dud tasks early?


Can you please tell us the names of the jobs with 0% progress? We deleted the old jobs still on the server, and there is now plenty of disk space. We were not able to delete the jobs already distributed. I want to check whether your jobs with 0% progress are old jobs.

Thanks,

Michela
____________
If you are interested in working on Docking@Home in a great group at UDel, contact me at 'taufer at acm dot org'!
Profile vaughan
Volunteer tester

Joined: Oct 3 06
Posts: 9
ID: 177
Credit: 3,108,281
RAC: 0
Message 6250 - Posted 11 Feb 2011 14:11:30 UTC

I have aborted all of them.

Profile Michela
Forum moderator
Project administrator
Project developer
Project tester
Project scientist
Avatar

Joined: Sep 13 06
Posts: 163
ID: 10
Credit: 97,083
RAC: 0
Message 6251 - Posted 11 Feb 2011 15:39:25 UTC - in response to Message ID 6250 .

I have aborted all of them.


OK, this is a good decision. Please send me an e-mail or post in this forum if there are any other jobs with 0% progress, together with the names of the jobs.

Michela
____________
If you are interested in working on Docking@Home in a great group at UDel, contact me at 'taufer at acm dot org'!
Profile JonJen

Joined: Aug 27 10
Posts: 2
ID: 32294
Credit: 209,242
RAC: 0
Message 6252 - Posted 11 Feb 2011 18:28:43 UTC

Thank you so much Docking ppl 4 posting a message to me via the screen saver graphics. Not sure how many months I have been getting the 0% progress error WU, but I aborted it as ordered. (^:= Here are the messages I got today regarding it.

2/11/2011 10:14:27 AM Docking task 1hvk1hbv_mod0014crossdockinghiv1_28102_18972_0 aborted by user
2/11/2011 10:14:37 AM Docking update requested by user
2/11/2011 10:14:39 AM Docking Sending scheduler request: Requested by user.
2/11/2011 10:14:39 AM Docking Reporting 1 completed tasks, requesting new tasks for GPU
2/11/2011 10:14:40 AM Docking Scheduler request completed: got 0 new tasks
2/11/2011 10:14:40 AM Docking [error] garbage_collect(); still have active task for acked result 1hvk1hbv_mod0014crossdockinghiv1_28102_18972_0; state 5
2/11/2011 10:14:41 AM Docking Computation for task 1hvk1hbv_mod0014crossdockinghiv1_28102_18972_0 finished
2/11/2011 10:14:41 AM Docking Output file 1hvk1hbv_mod0014crossdockinghiv1_28102_18972_0_0 for task 1hvk1hbv_mod0014crossdockinghiv1_28102_18972_0 absent
2/11/2011 10:14:41 AM Docking Output file 1hvk1hbv_mod0014crossdockinghiv1_28102_18972_0_1 for task 1hvk1hbv_mod0014crossdockinghiv1_28102_18972_0 absent
2/11/2011 10:14:41 AM Docking Output file 1hvk1hbv_mod0014crossdockinghiv1_28102_18972_0_2 for task 1hvk1hbv_mod0014crossdockinghiv1_28102_18972_0 absent
2/11/2011 10:14:48 AM Docking [error] Couldn't delete file projects/docking.cis.udel.edu/1hvk1hbv_mod0014crossdockinghiv1_28102_18972.inp
2/11/2011 10:14:54 AM Docking [error] Couldn't delete file projects/docking.cis.udel.edu/1hvk1hbv_mod0014crossdockinghiv1_28102_18972_0_3
2/11/2011 10:14:54 AM Resuming computation
2/11/2011 10:15:54 AM Docking Sending scheduler request: To fetch work.
2/11/2011 10:15:54 AM Docking Requesting new tasks for GPU
2/11/2011 10:15:55 AM Docking Scheduler request completed: got 0 new tasks
2/11/2011 10:17:00 AM Docking Sending scheduler request: To fetch work.
2/11/2011 10:17:00 AM Docking Requesting new tasks for GPU
2/11/2011 10:17:02 AM Docking Scheduler request completed: got 0 new tasks
2/11/2011 10:18:07 AM Docking Sending scheduler request: To fetch work.
2/11/2011 10:18:07 AM Docking Requesting new tasks for GPU
2/11/2011 10:18:08 AM Docking Scheduler request completed: got 0 new tasks
2/11/2011 10:23:13 AM Docking Sending scheduler request: To fetch work.
2/11/2011 10:23:13 AM Docking Requesting new tasks for GPU
2/11/2011 10:23:14 AM Docking Scheduler request completed: got 0 new tasks

____________

Profile Michela
Forum moderator
Project administrator
Project developer
Project tester
Project scientist
Avatar

Joined: Sep 13 06
Posts: 163
ID: 10
Credit: 97,083
RAC: 0
Message 6253 - Posted 11 Feb 2011 21:20:46 UTC - in response to Message ID 6252 .


2/11/2011 10:14:54 AM Resuming computation
2/11/2011 10:15:54 AM Docking Sending scheduler request: To fetch work.
2/11/2011 10:15:54 AM Docking Requesting new tasks for GPU
2/11/2011 10:15:55 AM Docking Scheduler request completed: got 0 new tasks
2/11/2011 10:17:00 AM Docking Sending scheduler request: To fetch work.
2/11/2011 10:17:00 AM Docking Requesting new tasks for GPU
2/11/2011 10:17:02 AM Docking Scheduler request completed: got 0 new tasks
2/11/2011 10:18:07 AM Docking Sending scheduler request: To fetch work.
2/11/2011 10:18:07 AM Docking Requesting new tasks for GPU
2/11/2011 10:18:08 AM Docking Scheduler request completed: got 0 new tasks
2/11/2011 10:23:13 AM Docking Sending scheduler request: To fetch work.
2/11/2011 10:23:13 AM Docking Requesting new tasks for GPU
2/11/2011 10:23:14 AM Docking Scheduler request completed: got 0 new tasks


This is strange: your client keeps asking for GPU jobs, and we do not support GPUs yet. I would expect the client to eventually start asking for CPU jobs. What version of the client do you have?


____________
If you are interested in working on Docking@Home in a great group at UDel, contact me at 'taufer at acm dot org'!
zioriga

Joined: Sep 3 08
Posts: 1
ID: 409
Credit: 225,585
RAC: 0
Message 6256 - Posted 14 Feb 2011 6:46:17 UTC

This is another task with 0% progress
2/14/2011 7:32:56 AM | Docking | Restarting task 1hvk1hbv_mod0014crossdockinghiv1_4627_410868_0 using charmm34 version 623

A few days ago I had some other WUs with 0% progress and a never-ending crunching time. I aborted them, as I did with the one above.

I use Boinc Manager 6.12.14 (x64), with XP 64b

Ananas

Joined: Aug 29 09
Posts: 56
ID: 17736
Credit: 2,500,425
RAC: 0
Message 6257 - Posted 14 Feb 2011 22:15:44 UTC
Last modified: 14 Feb 2011 22:17:04 UTC

The daily project throughput has dropped below a third of what it used to be, so I doubt that only one specific client version or a few specific boxes have a problem.

For the tests, try one of these:

1hvi1hbv_mod0014crossdockinghiv1_11505_235903_0
1hvi1hbv_mod0014crossdockinghiv1_11107_146009_0

1hvi1hbv_mod0014crossdockinghiv1_11617_141160_0 (this is one with errorcode 1, no infinite runtime)

Profile Michela
Forum moderator
Project administrator
Project developer
Project tester
Project scientist
Avatar

Joined: Sep 13 06
Posts: 163
ID: 10
Credit: 97,083
RAC: 0
Message 6260 - Posted 15 Feb 2011 1:20:06 UTC - in response to Message ID 6257 .

Most of the workunits that were affected during the disk problem were 1hvl1hbv and 1hvk1hbv. We have not been distributing empty jobs since last week.

We are still in a recovery mode and thus we reduced by half the generation of jobs while we are making sure everything is fine.

Michela



____________
If you are interested in working on Docking@Home in a great group at UDel, contact me at 'taufer at acm dot org'!

Profile robertmiles

Joined: Apr 16 09
Posts: 96
ID: 9967
Credit: 1,290,747
RAC: 0
Message 6261 - Posted 15 Feb 2011 4:44:37 UTC - in response to Message ID 6253 .
Last modified: 15 Feb 2011 4:52:11 UTC


2/11/2011 10:14:54 AM Resuming computation
2/11/2011 10:15:54 AM Docking Sending scheduler request: To fetch work.
2/11/2011 10:15:54 AM Docking Requesting new tasks for GPU
2/11/2011 10:15:55 AM Docking Scheduler request completed: got 0 new tasks
2/11/2011 10:17:00 AM Docking Sending scheduler request: To fetch work.
2/11/2011 10:17:00 AM Docking Requesting new tasks for GPU
2/11/2011 10:17:02 AM Docking Scheduler request completed: got 0 new tasks
2/11/2011 10:18:07 AM Docking Sending scheduler request: To fetch work.
2/11/2011 10:18:07 AM Docking Requesting new tasks for GPU
2/11/2011 10:18:08 AM Docking Scheduler request completed: got 0 new tasks
2/11/2011 10:23:13 AM Docking Sending scheduler request: To fetch work.
2/11/2011 10:23:13 AM Docking Requesting new tasks for GPU
2/11/2011 10:23:14 AM Docking Scheduler request completed: got 0 new tasks


This is strange: your client keeps asking for GPU jobs, and we do not support GPUs yet. I would expect the client to eventually start asking for CPU jobs. What version of the client do you have?



The current versions of the BOINC client software will, if your computer has a BOINC-usable GPU, send ALL the connected projects requests for both CPU workunits and GPU workunits. However, the current versions of the BOINC server software allow you to reduce, but not totally eliminate, this - it allows the server to send a response telling the client not to ask for any more workunits of the type requested for up to about a week. This should allow you, for as long as you're not even planning any GPU workunits, to tell any client that sends a request for GPU workunits only that it should not send another such request for about a week.

The 6.10.* series of BOINC client programs has this feature.

It will eventually start asking for CPU workunits, but usually only after it gets at least one GPU workunit from SOME project.

If you're looking for a project that sends only GPU workunits, I've found two:

GPUGRID sends protein-folding workunits, but only if you have a sufficiently high-end Nvidia GPU (a GT 220 is currently about the lowest it will use).

Collatz Conjecture sends workunits related to some math problem, but to almost any GPU that BOINC can use.

http://www.gpugrid.net/

http://boinc.thesonntags.com/collatz/


An idea on how to handle the input file size checking:

Add a wrapper program that checks the size of the input file, then passes control to the main application program ONLY if the input file passes this test.
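
A minimal sketch of that wrapper idea in Python, just to show the shape of it. This is not BOINC's own wrapper and not the project's code; the function name, the exit code and the size threshold are assumptions for illustration.

```python
import os
import subprocess
import sys

# Hypothetical exit code used when the input file fails the size test.
BAD_INPUT_EXIT_CODE = 2

def run_if_input_ok(input_path, command, min_bytes=1):
    """Run command only if input_path exists and is at least min_bytes long.

    Returns the exit code of the command, or BAD_INPUT_EXIT_CODE if the
    input file is missing or too small and the command was never started.
    """
    if not os.path.isfile(input_path) or os.path.getsize(input_path) < min_bytes:
        print(f"input file {input_path} is missing or too small", file=sys.stderr)
        return BAD_INPUT_EXIT_CODE
    # The input passed the test, so hand control to the main application.
    return subprocess.call(command)
```

The point of doing it this way is that the main application does not need to be recompiled; only the small program in front of it changes.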
Fred Verster
Avatar

Joined: May 8 09
Posts: 26
ID: 11034
Credit: 2,647,353
RAC: 0
Message 6263 - Posted 16 Feb 2011 13:10:45 UTC
Last modified: 16 Feb 2011 13:49:22 UTC

16-2-2011 11:59:41 Docking Restarting task 1hvk1hbv_mod0014crossdockinghiv1_19536_75277_0 using charmm34 version 623
16-2-2011 11:59:42 Docking Restarting task 1hvk1hbv_mod0014crossdockinghiv1_19530_195401_0 using charmm34 version 623
16-2-2011 11:59:42 Docking Restarting task 1hvk1hbv_mod0014crossdockinghiv1_19843_57465_0 using charmm34 version 623
16-2-2011 11:59:42 Docking Restarting task 1hvk1hbv_mod0014crossdockinghiv1_23011_479024_0 using charmm34 version 623

Hi, long time since I posted here, but now I again noticed some tasks, see above,
showing no progress at all.

Are they empty? The CPU, a Q6600, is showing 100% usage (4 x 25%), so that isn't likely!
Is anyone else experiencing this abnormal behavior?

Some other WUs are paused, for whatever reason, and all 4 of them are at exactly 2 hours and 40 minutes and xx seconds.

Should I delete these WU's???

I answer this myself: YES, delete them all! OK ......Done .........
They were due 14, 15 and 16 Feb 2011, so a little late and probably also empty?!
____________

Knight who says N! Ni Ni

Profile adrianxw
Volunteer tester
Avatar

Joined: Dec 30 06
Posts: 164
ID: 343
Credit: 1,669,741
RAC: 0
Message 6265 - Posted 17 Feb 2011 14:43:50 UTC
Last modified: 17 Feb 2011 14:48:40 UTC

Good number of "Client Error" wu's today. Most seem to fail after some multiple of ~300 seconds, (300, 600, 900, 1200 you get the picture).

Example
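
For anyone who wants to check the "multiple of ~300 seconds" pattern on their own failed tasks, a quick Python sketch. The 15-second tolerance and the function names are my own assumptions; the run times have to be copied by hand from the task pages.

```python
def near_multiple(seconds, period=300.0, tolerance=15.0):
    """True if seconds lies within tolerance of k * period for some k >= 1."""
    k = round(seconds / period)          # nearest whole number of periods
    return k >= 1 and abs(seconds - k * period) <= tolerance

def fraction_near_multiple(run_times, period=300.0, tolerance=15.0):
    """Fraction of run_times that sit close to a multiple of period."""
    if not run_times:
        return 0.0
    hits = sum(1 for t in run_times if near_multiple(t, period, tolerance))
    return hits / len(run_times)
```

If most failed tasks come out near a multiple, that would point to something happening on a fixed schedule (a checkpoint or a phase of the simulation, perhaps) rather than to random crashes.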
____________
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

Profile Michela
Forum moderator
Project administrator
Project developer
Project tester
Project scientist
Avatar

Joined: Sep 13 06
Posts: 163
ID: 10
Credit: 97,083
RAC: 0
Message 6266 - Posted 17 Feb 2011 16:14:54 UTC - in response to Message ID 6265 .

Good number of "Client Error" wu's today. Most seem to fail after some multiple of ~300 seconds, (300, 600, 900, 1200 you get the picture).

Example


We are working on this problem right now! We will keep you posted.

Thanks!
____________
If you are interested in working on Docking@Home in a great group at UDel, contact me at 'taufer at acm dot org'!
Profile Michela
Forum moderator
Project administrator
Project developer
Project tester
Project scientist
Avatar

Joined: Sep 13 06
Posts: 163
ID: 10
Credit: 97,083
RAC: 0
Message 6267 - Posted 17 Feb 2011 18:09:43 UTC - in response to Message ID 6266 .

One of the ligands, ligand 1hih, really did not want to dock into protein conformations other than the one in which it was observed experimentally. So in the cross-docking simulation, no matter what protein conformation we were using, the simulation was very short and inconclusive, besides creating D@H problems. We removed the whole batch of simulations with this ligand and will work with our scientists to understand the scientific reason for this problem. We are distributing a new batch of jobs with another ligand, and this time it seems to work OK.

Protein-ligand docking is definitely not a deterministic thing!

Thanks for the alert!

Michela
____________
If you are interested in working on Docking@Home in a great group at UDel, contact me at 'taufer at acm dot org'!

Hephaiston

Joined: Feb 3 11
Posts: 3
ID: 37733
Credit: 159,759
RAC: 0
Message 6268 - Posted 18 Feb 2011 1:35:21 UTC - in response to Message ID 6244 .

All jobs aborted


Thanks! Let us know if the new jobs have any similar issues.

Michela


Above is your message from 8 Feb.

Received a new job that day, after aborting the old jobs.
The jobs started today (charmm 34a2 6.23) and quit after one second of working time with "error while computing" and progress of 100%.

No problem with jobs of any other project today or the last few days.
____________
meine Kiste
johnsone79

Joined: Apr 1 09
Posts: 1
ID: 9235
Credit: 995,312
RAC: 0
Message 6271 - Posted 23 Feb 2011 1:36:52 UTC

For some reason all the workunits on my new computer (i7 2600k) are stalling at 0.000% progress, while those on my old computer (Intel Core Duo) are still running fine. Any idea how to fix this? My new computer is burning through workunits on other projects, and I want it to do the same here.

Ananas

Joined: Aug 29 09
Posts: 56
ID: 17736
Credit: 2,500,425
RAC: 0
Message 6277 - Posted 27 Feb 2011 21:30:27 UTC

"does not want to dock" should not be treated as an error status, it is a result just like "docks easily" or "might dock if there is no better interface available".

If the program could handle this situation, there would probably be fewer results that error out or run forever.

Profile Michela
Forum moderator
Project administrator
Project developer
Project tester
Project scientist
Avatar

Joined: Sep 13 06
Posts: 163
ID: 10
Credit: 97,083
RAC: 0
Message 6279 - Posted 28 Feb 2011 14:18:41 UTC - in response to Message ID 6277 .

"does not want to dock" should not be treated as an error status, it is a result just like "docks easily" or "might dock if there is no better interface available".

If the program could handle this situation, there would probably be fewer results that error out or run forever.


The docking simulation can evolve toward a state in which the energy of the complex does not make any sense, and the traditional charmm executable aborts. We wrapped charmm to catch these errors, terminate gently, and send us proper information. We also changed the application to give partial credit for partial simulations once the initial phase of the simulation (when the ligand is placed into the docking pocket) is successful. Right now we run a set of short simulations on a testing server for each complex to make sure that the simulations can complete. Unfortunately, this does not necessarily mean that we are always able to capture all the possible problems of a complex simulation. Errors and energy violations are hard to predict a priori, especially with the type of simulations we are doing right now, in which we cross-dock proteins and ligands that were not observed experimentally.

Our next step toward preventing this problem is as follows: we will extend the testing phase.

Michela
____________
If you are interested in working on Docking@Home in a great group at UDel, contact me at 'taufer at acm dot org'!
BeemerBiker
Avatar

Joined: Aug 8 09
Posts: 3
ID: 16805
Credit: 692,240
RAC: 0
Message 6350 - Posted 28 May 2011 19:16:42 UTC

I have seen the same problem on 2 different systems (Linux and Windows). While monitoring using BOINCTASKS, I notice the % complete is way past 100%. Stopping and starting BOINC causes the task to start over at 0.0% even though it might have run for 8-12 hours at 200% or more. Viewing results using BOINCMANAGER, one never sees past 0 percent, as the percent complete is not calculated the same way in BM as in BT.

Isn't the project supposed to terminate the app if it goes beyond some magic number such as 10x the expected computation time?

Neither should it start over at 0 % when the system reboots or boinc restarts.

Profile robertmiles

Joined: Apr 16 09
Posts: 96
ID: 9967
Credit: 1,290,747
RAC: 0
Message 6351 - Posted 29 May 2011 5:04:48 UTC - in response to Message ID 6350 .

I have seen the same problem on 2 different systems (Linux and Windows). While monitoring using BOINCTASKS, I notice the % complete is way past 100%. Stopping and starting BOINC causes the task to start over at 0.0% even though it might have run for 8-12 hours at 200% or more. Viewing results using BOINCMANAGER, one never sees past 0 percent, as the percent complete is not calculated the same way in BM as in BT.

Isn't the project supposed to terminate the app if it goes beyond some magic number such as 10x the expected computation time?

Neither should it start over at 0 % when the system reboots or boinc restarts.


You may want to try observing the timing values for the last checkpoint before shutting down BOINC, since that's the critical factor in when workunits can be restarted. In my version of BOINC Manager, advanced view, Tasks, just click on the workunit, then Properties. Do not expect it to be able to resume any more recently than the last checkpoint after a system restart or a BOINC restart - BOINC simply does not have that capability. However, if your operating system supports sleep mode, and you suspend all workunits within BOINC but do not shut BOINC down entirely, the operating system should be able to go into sleep mode while still preserving the memory contents needed to resume the workunits where they were suspended, IF you have enabled the option to keep workunits in memory while they are suspended.

Also, you may want to check if the operating system agrees that the workunit is still using any CPU time. If not, do not expect any time limits built into the application program to work - the code checking for exceeding that time limit cannot run with no CPU time at all.
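
To illustrate why a restart can only resume from the last checkpoint, here is a minimal Python sketch of a checkpointing loop. It is not the Docking@Home or BOINC code: the file name, the function names, and the step counts are all hypothetical, and `stop_at` merely simulates a sudden shutdown.

```python
import json
import os
import tempfile

def save_checkpoint(path, state):
    """Write the state atomically so a crash never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # atomic rename over the old checkpoint

def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0}

def run(path, total_steps, checkpoint_every, stop_at=None):
    """Run from the last checkpoint; stop_at simulates a sudden shutdown."""
    state = load_checkpoint(path)
    while state["step"] < total_steps:
        if stop_at is not None and state["step"] == stop_at:
            return state["step"]  # work since the last checkpoint is lost
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state["step"]
```

If the run is interrupted at step 37 with a checkpoint every 10 steps, the next start resumes at step 30. A task that never writes a checkpoint at all has nothing to resume from, so it starts over at zero.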
Palamedes

Joined: Dec 7 09
Posts: 2
ID: 22549
Credit: 16,855
RAC: 0
Message 6358 - Posted 11 Jun 2011 21:35:17 UTC - in response to Message ID 6271 .

For some reason all my workunits on my new computer i7 2600k are stalling at 0.000% progress while those on my old computer Intel core duo are still running fine. Any idea on how to fix this? My new computer is burning through workunits on other projects, and I want them to do the same here.


I'm having the same issue. I have 8 docking work units running at once and they all stay at zero percent. Moreover, the elapsed time seems to count up to about a minute forty-five or so and then resets to zero.


------------------
System Information
------------------
Time of this report: 6/11/2011, 16:32:00
Machine name: VESPID
Operating System: Windows 7 Professional 64-bit (6.1, Build 7600) (7600.win7_rtm.090713-1255)
Language: English (Regional Setting: English)
System Manufacturer: MSI
System Model: MS-7681
BIOS: BIOS Date: 03/02/11 10:58:35 Ver: 04.06.04
Processor: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz (8 CPUs), ~3.4GHz
Memory: 16384MB RAM
Available OS Memory: 16364MB RAM
Page File: 5003MB used, 27723MB available
Profile Michela
Forum moderator
Project administrator
Project developer
Project tester
Project scientist
Avatar

Joined: Sep 13 06
Posts: 163
ID: 10
Credit: 97,083
RAC: 0
Message 6360 - Posted 12 Jun 2011 22:19:01 UTC - in response to Message ID 6358 .

For some reason all my workunits on my new computer i7 2600k are stalling at 0.000% progress while those on my old computer Intel core duo are still running fine. Any idea on how to fix this? My new computer is burning through workunits on other projects, and I want them to do the same here.


I'm having the same issue. I have 8 docking work units running at once and they all stay at zero percent. Moreover, the elapsed time seems to count up to about a minute forty-five or so and then resets to zero.


------------------
System Information
------------------
Time of this report: 6/11/2011, 16:32:00
Machine name: VESPID
Operating System: Windows 7 Professional 64-bit (6.1, Build 7600) (7600.win7_rtm.090713-1255)
Language: English (Regional Setting: English)
System Manufacturer: MSI
System Model: MS-7681
BIOS: BIOS Date: 03/02/11 10:58:35 Ver: 04.06.04
Processor: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz (8 CPUs), ~3.4GHz
Memory: 16384MB RAM
Available OS Memory: 16364MB RAM
Page File: 5003MB used, 27723MB available


We are looking at this.

Thanks for the note!

Michela


____________
If you are interested in working on Docking@Home in a great group at UDel, contact me at 'taufer at acm dot org'!
Profile Michela
Forum moderator
Project administrator
Project developer
Project tester
Project scientist
Avatar

Joined: Sep 13 06
Posts: 163
ID: 10
Credit: 97,083
RAC: 0
Message 6363 - Posted 12 Jun 2011 23:48:39 UTC - in response to Message ID 6360 .

This is a report of the jobs associated with 1hvi1hpv.

The testing process that is performed before distributing a new complex to volunteers did not capture any major problem.

There are 11,684 jobs with server state = over, 692 of them with outcome = client error. That is around 6% of them (including aborted and failing jobs), and the failures come from the same hosts (around 30-40 of them).

The jobs distributed but not returned number 17,958; the jobs generated but not distributed number 252; the jobs still to be generated number 0, since we moved on to the next complex.
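
As a quick check of the figures above, a few lines of Python reproduce the "around 6%" estimate (about 5.9%). The total assumes the three categories do not overlap and together cover every job generated for this complex, which the report does not state explicitly.

```python
# Job counts quoted in the report above.
over = 11_684            # jobs with server state = over
client_error = 692       # of those, outcome = client error
not_returned = 17_958    # distributed but not returned
not_distributed = 252    # generated but not distributed

error_rate = client_error / over
# Assumption: the three categories are disjoint and exhaustive.
total_generated = over + not_returned + not_distributed

print(f"client-error rate: {error_rate:.1%}")
print(f"jobs generated so far: {total_generated:,}")
```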

Please abort jobs with 0% progress. Docking is not a deterministic simulation and some docking attempts can fail. Our testing can successfully capture most of the cases but not all.

Thanks!

MT

____________
If you are interested in working on Docking@Home in a great group at UDel, contact me at 'taufer at acm dot org'!

Vepide

Joined: Jun 13 11
Posts: 4
ID: 41381
Credit: 0
RAC: 0
Message 6365 - Posted 13 Jun 2011 19:43:35 UTC

I just attached to D@H yesterday and I'm getting this problem too: WUs stay at 0% with no progress, and the screen saver says to abort any WUs displaying 0%.
I just suspended D@H until this problem is resolved.

Running Windows 7 Ultimate 64-bit on a Q9650 OC'd to 3.6GHz, ASUS P5E3 Premium, 8GB RAM and a 4870 X2.
____________

Profile robertmiles

Joined: Apr 16 09
Posts: 96
ID: 9967
Credit: 1,290,747
RAC: 0
Message 6366 - Posted 14 Jun 2011 2:43:54 UTC
Last modified: 14 Jun 2011 2:45:58 UTC

I have a 1hvi1hpv workunit on one of my computers, but since it's already at 18% progress I plan to let it run for now.

Some ideas for Docking@Home to consider:

Add another thread to their application program, and move most of the checking of whether the rest of the application program is still doing anything useful there. Depending on what the cause of the 0% progress is, this may allow checking for it to continue.

Add a section to their application program which, if the workunit asks for it, will gather more information on the details of what kind of computer it is running on, and write this information to a separate output file BEFORE going on to the rest of the application program. If you later no longer need it, it should be easy to turn off by changing the workunits instead of the application program. If there is any need to write more to this file later, it should be reopened first so that it will be preserved past most workunit failures.
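
The first idea (a separate monitoring thread) could look roughly like the Python sketch below. This is only an illustration of the pattern, not anything from the Docking@Home application: the class name, the thresholds, and the way the worker reports progress are all hypothetical.

```python
import threading
import time

class ProgressWatchdog:
    """Runs in its own thread and flags the job if progress stops advancing."""

    def __init__(self, stall_seconds, poll_seconds=1.0):
        self.stall_seconds = stall_seconds
        self.poll_seconds = poll_seconds
        self._lock = threading.Lock()
        self._progress = 0.0
        self._last_change = time.monotonic()
        self._done = threading.Event()
        self.stalled = threading.Event()
        self._thread = threading.Thread(target=self._watch, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._done.set()
        self._thread.join()

    def report(self, fraction_done):
        """Called by the worker whenever it finishes a unit of work."""
        with self._lock:
            if fraction_done > self._progress:
                self._progress = fraction_done
                self._last_change = time.monotonic()

    def _watch(self):
        # wait() returns True once stop() is called, which ends the loop.
        while not self._done.wait(self.poll_seconds):
            with self._lock:
                idle = time.monotonic() - self._last_change
            if idle > self.stall_seconds:
                self.stalled.set()  # the worker should exit with an error
                return
```

The worker would call `report()` as it advances and check `stalled` now and then. As noted earlier in the thread, this only helps if the process still gets some CPU time.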

Profile vaughan
Volunteer tester

Joined: Oct 3 06
Posts: 9
ID: 177
Credit: 3,108,281
RAC: 0
Message 6369 - Posted 25 Jun 2011 13:52:26 UTC

Does Docking still have the annoying 0.000% progress bug?

Profile Conan
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 219
ID: 100
Credit: 4,256,493
RAC: 0
Message 6370 - Posted 27 Jun 2011 11:46:47 UTC - in response to Message ID 6369 .

Does Docking still have the annoying 0.000% progress bug?


G'Day Vaughan,

I have not noticed it in the past two weeks of processing work units.

I am running both Windows and Linux on 5 AMD Phenom processors, and so far there have been no problems at all.

Conan
____________
Profile vaughan
Volunteer tester

Joined: Oct 3 06
Posts: 9
ID: 177
Credit: 3,108,281
RAC: 0
Message 6371 - Posted 2 Jul 2011 7:05:36 UTC - in response to Message ID 6370 .
Last modified: 2 Jul 2011 7:06:21 UTC

Does Docking still have the annoying 0.000% progress bug?


G'Day Vaughan,

I have not noticed it in the past two weeks of processing work units.

I am running both Windows and Linux on 5 AMD Phenom processors, and so far there have been no problems at all.

Conan

Thanks Conan.

Yes it seems to be behaving now.
Ed

Joined: Jul 30 11
Posts: 11
ID: 42642
Credit: 0
RAC: 0
Message 6426 - Posted 31 Jul 2011 15:51:10 UTC

I just joined and I seem to have this 0% issue.

Profile Michela
Forum moderator
Project administrator
Project developer
Project tester
Project scientist
Avatar

Joined: Sep 13 06
Posts: 163
ID: 10
Credit: 97,083
RAC: 0
Message 6433 - Posted 3 Aug 2011 2:52:20 UTC - in response to Message ID 6426 .

I just joined and I seem to have this 0% issue.


Hi, can I please have the names of the jobs with 0% progress? I just checked the server and we have space on the disk (in the past, a full disk was one of the reasons for the problem). The testing machines in the lab seem to crunch well. We will look at this in detail tomorrow morning.

Michela
____________
If you are interested in working on Docking@Home in a great group at UDel, contact me at 'taufer at acm dot org'!
Vepide

Joined: Jun 13 11
Posts: 4
ID: 41381
Credit: 0
RAC: 0
Message 6543 - Posted 2 Jan 2012 17:30:36 UTC

I just rejoined the project and found the same problem that made me leave it previously: zero progress bar, WUs not terminating the process, etc.

When I get around to it I will try to install Windows 8 onto a new hard drive and see if the problem extends into Windows 8 64-bit. I have another post here going back to June of 2011, and the problem clearly has not been solved. So I am suspending the project for now.

It is unknown whether it is a processor-specific problem or a 64-bit Windows 7 problem; in the latter case it is just bad project coding on the 64-bit Win7 platform.

The only few INP files I found contained this XML:

<soft_link>../../projects/docking.cis.udel.edu/1t7k1htf_mod0014crossdockinghiv1_66702_386423.inp</soft_link>

GlowClam

Joined: Feb 3 12
Posts: 1
ID: 49581
Credit: 0
RAC: 0
Message 6560 - Posted 3 Feb 2012 20:18:02 UTC
Last modified: 3 Feb 2012 20:39:33 UTC

Today I joined this project and ran into the same 0% bug. I aborted the two WUs I was given because they showed 0% progress while eating CPU time.

This problem seems to have existed for a while...

I think that computing for docking@home is a good thing to do. But now I am not happy, because I have to tell my PC not to get WUs from this project anymore until this error is fixed by the team. Or is something wrong with my setup?

The BOINC-Manager log is:

03.02.2012 15:38:43 | | Starting BOINC client version 6.12.34 for windows_x86_64
03.02.2012 15:38:43 | | log flags: file_xfer, sched_ops, task
03.02.2012 15:38:43 | | Libraries: libcurl/7.21.6 OpenSSL/1.0.0d zlib/1.2.5
03.02.2012 15:38:43 | | Data directory: C:\ProgramData\BOINC
03.02.2012 15:38:43 | | Running under account ...
03.02.2012 15:38:43 | | Processor: 2 GenuineIntel Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz [Family 6 Model 23 Stepping 10]
03.02.2012 15:38:43 | | Processor: 6.00 MB cache
03.02.2012 15:38:43 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 syscall nx lm vmx smx tm2 pbe
03.02.2012 15:38:43 | | OS: Microsoft Windows 7: Home Premium x64 Edition, Service Pack 1, (06.01.7601.00)
03.02.2012 15:38:43 | | Memory: 4.00 GB physical, 8.00 GB virtual
03.02.2012 15:38:43 | | Disk: 97.56 GB total, 60.01 GB free
03.02.2012 15:38:43 | | Local time is UTC +1 hours
03.02.2012 15:38:43 | | NVIDIA GPU 0: GeForce 9500 GT (driver version 28562, CUDA version 4010, compute capability 1.1, 1024MB, 72 GFLOPS peak)
...
03.02.2012 20:07:15 | | Attaching to http://docking.cis.udel.edu/
03.02.2012 20:07:19 | http://docking.cis.udel.edu/ | Master file download succeeded
03.02.2012 20:07:24 | http://docking.cis.udel.edu/ | Sending scheduler request: Project initialization.
03.02.2012 20:07:24 | http://docking.cis.udel.edu/ | Requesting new tasks for CPU and NVIDIA GPU
03.02.2012 20:07:27 | Docking | Scheduler request completed: got 1 new tasks
03.02.2012 20:07:27 | | Couldn't parse preferences file - using BOINC defaults
03.02.2012 20:07:27 | | Reading preferences override file
03.02.2012 20:07:27 | | Preferences:
03.02.2012 20:07:27 | | max memory usage when active: 2047.56MB
03.02.2012 20:07:27 | | max memory usage when idle: 3685.61MB
03.02.2012 20:07:27 | | max disk usage: 10.00GB
03.02.2012 20:07:27 | | don't use GPU while active
03.02.2012 20:07:27 | | suspend work if non-BOINC CPU load exceeds 25 %
03.02.2012 20:07:27 | | (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
03.02.2012 20:07:29 | Docking | Started download of charmm34_6.23_windows_x86_64
03.02.2012 20:07:29 | Docking | Started download of charmm34_6.23_graphics_windows_x86_64
03.02.2012 20:07:32 | Docking | Sending scheduler request: To fetch work.
03.02.2012 20:07:32 | Docking | Requesting new tasks for NVIDIA GPU
03.02.2012 20:07:33 | Docking | Finished download of charmm34_6.23_graphics_windows_x86_64
03.02.2012 20:07:33 | Docking | Started download of 1iiq1hpv_mod0014crossdockinghiv1_9720_16010.inp
03.02.2012 20:07:34 | Docking | Scheduler request completed: got 0 new tasks
03.02.2012 20:07:34 | | Couldn't parse preferences file - using BOINC defaults
03.02.2012 20:07:34 | | Reading preferences override file
03.02.2012 20:07:34 | | Preferences:
03.02.2012 20:07:34 | | max memory usage when active: 2047.56MB
03.02.2012 20:07:34 | | max memory usage when idle: 3685.61MB
03.02.2012 20:07:34 | | max disk usage: 10.00GB
03.02.2012 20:07:34 | | don't use GPU while active
03.02.2012 20:07:34 | | suspend work if non-BOINC CPU load exceeds 25 %
03.02.2012 20:07:34 | | (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
03.02.2012 20:07:34 | Docking | Giving up on download of 1iiq1hpv_mod0014crossdockinghiv1_9720_16010.inp: file not found
03.02.2012 20:07:34 | Docking | Started download of grid_probes.rtf
03.02.2012 20:07:35 | Docking | Finished download of grid_probes.rtf
03.02.2012 20:07:35 | Docking | Started download of lpdb_amino.rtf
03.02.2012 20:07:38 | Docking | Finished download of lpdb_amino.rtf
03.02.2012 20:07:38 | Docking | Started download of lpdb.prm
03.02.2012 20:07:40 | Docking | Finished download of charmm34_6.23_windows_x86_64
03.02.2012 20:07:40 | Docking | Finished download of lpdb.prm
03.02.2012 20:07:40 | Docking | Started download of lpdb_probes.prm
03.02.2012 20:07:40 | Docking | Started download of logo.jpg
03.02.2012 20:07:41 | Docking | Finished download of lpdb_probes.prm
03.02.2012 20:07:41 | Docking | Started download of minus.jpg
03.02.2012 20:07:42 | Docking | Finished download of logo.jpg
03.02.2012 20:07:42 | Docking | Finished download of minus.jpg
03.02.2012 20:07:42 | Docking | Started download of plus.jpg
03.02.2012 20:07:42 | Docking | Started download of rotate_left.jpg
03.02.2012 20:07:44 | Docking | Finished download of plus.jpg
03.02.2012 20:07:44 | Docking | Finished download of rotate_left.jpg
03.02.2012 20:07:44 | Docking | Started download of rotate_right.jpg
03.02.2012 20:07:44 | Docking | Started download of helvetica.txf
03.02.2012 20:07:45 | Docking | Finished download of rotate_right.jpg
03.02.2012 20:07:46 | Docking | Finished download of helvetica.txf
03.02.2012 20:08:55 | Docking | Sending scheduler request: To fetch work.
03.02.2012 20:08:55 | Docking | Reporting 1 completed tasks, requesting new tasks for CPU
03.02.2012 20:08:59 | Docking | Scheduler request completed: got 2 new tasks
03.02.2012 20:08:59 | | Couldn't parse preferences file - using BOINC defaults
03.02.2012 20:08:59 | | Reading preferences override file
03.02.2012 20:08:59 | | Preferences:
03.02.2012 20:08:59 | | max memory usage when active: 2047.56MB
03.02.2012 20:08:59 | | max memory usage when idle: 3685.61MB
03.02.2012 20:08:59 | | max disk usage: 10.00GB
03.02.2012 20:08:59 | | don't use GPU while active
03.02.2012 20:08:59 | | suspend work if non-BOINC CPU load exceeds 25 %
03.02.2012 20:08:59 | | (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
03.02.2012 20:09:01 | Docking | Started download of 1t7k1m0b_mod0014crossdockinghiv1_541890_198100.inp
03.02.2012 20:09:01 | Docking | Started download of 1t7k1m0b_mod0014crossdockinghiv1_541891_65051.inp
03.02.2012 20:09:06 | Docking | Finished download of 1t7k1m0b_mod0014crossdockinghiv1_541890_198100.inp
03.02.2012 20:09:06 | Docking | Starting task 1t7k1m0b_mod0014crossdockinghiv1_541890_198100_0 using charmm34 version 623
03.02.2012 20:09:11 | Docking | Finished download of 1t7k1m0b_mod0014crossdockinghiv1_541891_65051.inp
03.02.2012 20:09:11 | Docking | Starting task 1t7k1m0b_mod0014crossdockinghiv1_541891_65051_0 using charmm34 version 623
03.02.2012 20:09:21 | | Suspending computation - CPU is busy
03.02.2012 20:09:31 | | Resuming computation
03.02.2012 20:12:31 | Docking | Restarting task 1t7k1m0b_mod0014crossdockinghiv1_541890_198100_0 using charmm34 version 623
03.02.2012 20:12:31 | Docking | Restarting task 1t7k1m0b_mod0014crossdockinghiv1_541891_65051_0 using charmm34 version 623
03.02.2012 20:16:06 | Docking | Sending scheduler request: To fetch work.
03.02.2012 20:16:06 | Docking | Requesting new tasks for NVIDIA GPU
03.02.2012 20:16:10 | Docking | Scheduler request completed: got 0 new tasks
03.02.2012 20:16:10 | | Couldn't parse preferences file - using BOINC defaults
03.02.2012 20:16:10 | | Reading preferences override file
03.02.2012 20:16:10 | | Preferences:
03.02.2012 20:16:10 | | max memory usage when active: 2047.56MB
03.02.2012 20:16:10 | | max memory usage when idle: 3685.61MB
03.02.2012 20:16:10 | | max disk usage: 10.00GB
03.02.2012 20:16:10 | | don't use GPU while active
03.02.2012 20:16:10 | | suspend work if non-BOINC CPU load exceeds 25 %
03.02.2012 20:16:10 | | (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
03.02.2012 20:25:31 | | Suspending computation - CPU is busy
03.02.2012 20:25:41 | | Resuming computation
03.02.2012 20:28:41 | Docking | Restarting task 1t7k1m0b_mod0014crossdockinghiv1_541890_198100_0 using charmm34 version 623
03.02.2012 20:28:41 | Docking | Restarting task 1t7k1m0b_mod0014crossdockinghiv1_541891_65051_0 using charmm34 version 623
03.02.2012 20:32:15 | Docking | Sending scheduler request: To fetch work.
03.02.2012 20:32:15 | Docking | Requesting new tasks for NVIDIA GPU
03.02.2012 20:32:19 | Docking | Scheduler request completed: got 0 new tasks
03.02.2012 20:32:19 | | Couldn't parse preferences file - using BOINC defaults
03.02.2012 20:32:19 | | Reading preferences override file
03.02.2012 20:32:19 | | Preferences:
03.02.2012 20:32:19 | | max memory usage when active: 2047.56MB
03.02.2012 20:32:19 | | max memory usage when idle: 3685.61MB
03.02.2012 20:32:19 | | max disk usage: 10.00GB
03.02.2012 20:32:19 | | don't use GPU while active
03.02.2012 20:32:19 | | suspend work if non-BOINC CPU load exceeds 25 %
03.02.2012 20:32:19 | | (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
03.02.2012 20:35:37 | | Fetching configuration file from http://bam.boincstats.com/get_project_config.php
03.02.2012 20:35:41 | | Contacting account manager at http://bam.boincstats.com/
...
03.02.2012 20:35:43 | | Account manager contact succeeded
03.02.2012 20:36:50 | Docking | task 1t7k1m0b_mod0014crossdockinghiv1_541890_198100_0 aborted by user
03.02.2012 20:36:58 | Docking | task 1t7k1m0b_mod0014crossdockinghiv1_541891_65051_0 aborted by user
03.02.2012 20:37:50 | Docking | Computation for task 1t7k1m0b_mod0014crossdockinghiv1_541890_198100_0 finished
03.02.2012 20:37:58 | Docking | Computation for task 1t7k1m0b_mod0014crossdockinghiv1_541891_65051_0 finished
03.02.2012 20:40:50 | Docking | Sending scheduler request: To report completed tasks.
03.02.2012 20:40:50 | Docking | Reporting 2 completed tasks, not requesting new tasks
03.02.2012 20:40:53 | Docking | Scheduler request completed
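
A log like the one above can be scanned for tasks that keep getting restarted. The Python sketch below is only a helper written for this thread, not part of BOINC; the regular expression is based purely on the "Restarting task ... using ..." lines shown in the log, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
# 03.02.2012 20:12:31 | Docking | Restarting task <name> using charmm34 version 623
RESTART = re.compile(r"\|\s*Restarting task (\S+) using ")

def count_restarts(log_text):
    """Return a Counter mapping task name to the number of restarts logged."""
    counts = Counter()
    for line in log_text.splitlines():
        match = RESTART.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

In the log above, each of the two tasks is restarted twice (at 20:12:31 and 20:28:41), each time a few minutes after a "Suspending computation - CPU is busy" entry.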

D337z

Joined: Mar 8 12
Posts: 1
ID: 51519
Credit: 0
RAC: 0
Message 6587 - Posted 9 Mar 2012 19:09:33 UTC - in response to Message ID 6560 .

The problem appears to be with the program's ability to output its results. It appears to be with CPU only, but the graphics version is not being used. I have an Intel CPU as well. Perhaps the output code is having difficulty working properly?

Profile UBT - Rick Horn

Joined: Jan 10 11
Posts: 4
ID: 36735
Credit: 230,465
RAC: 0
Message 6614 - Posted 23 Mar 2012 9:49:13 UTC

This thread has been running since August 2009, and the problem has still not been solved, despite promises that the admins are working on it.
I can only say that they are working very slowly.

My Win7 64-bit quad is available for Docking, and would more than double my output if only it could be used.

Come on guys, pull your fingers out!


____________

NATE1

Joined: May 17 11
Posts: 4
ID: 40573
Credit: 109,598
RAC: 0
Message 6620 - Posted 25 Mar 2012 16:58:43 UTC

OK, I have a number of Intel computers. The ones that will run Docking on 64-bit Win7 have VT-x; the ones that will not run Docking do not have VT-x. Go figure.

Michael Tillman

Joined: Jan 30 10
Posts: 3
ID: 25162
Credit: 6,818
RAC: 0
Message 6837 - Posted 24 Sep 2012 16:50:55 UTC

same problem here. no changes using the newest version for gpu usage. docking only 0.00 had to abort all dockings.

Profile spuddly buddly
Avatar

Joined: Aug 16 12
Posts: 24
ID: 66176
Credit: 14,124
RAC: 0
Message 6838 - Posted 25 Sep 2012 9:24:07 UTC

This messageboard has died and is now a terraforming/action art project involving large amounts of organic waste (therefore the smell), run by the Knights who say Ni!
If you want help running docking@home, it's pretty much go figure it out for yourselves, as no one at the project bothers to answer e-mails or messages posted here.
Sorry! Now enjoy the terraforming/action art ... :)
____________
The Knights who say Ni!

mfarley

Joined: Sep 25 12
Posts: 2
ID: 67712
Credit: 0
RAC: 0
Message 6843 - Posted 25 Sep 2012 22:01:20 UTC - in response to Message ID 6838 .

This messageboard has died and is now a terraforming/action art project involving large amounts of organic waste (therefore the smell), run by the Knights who say Ni!
If you want help running docking@home, it's pretty much go figure it out for yourselves, as no one at the project bothers to answer e-mails or messages posted here.
Sorry! Now enjoy the terraforming/action art ... :)


free courses online
mfarley

Joined: Sep 25 12
Posts: 2
ID: 67712
Credit: 0
RAC: 0
Message 6844 - Posted 25 Sep 2012 22:03:12 UTC - in response to Message ID 6837 .

same problem here. no changes using the newest version for gpu usage. docking only 0.00 had to abort all dockings.


myfreecoursesonline
Profile spuddly buddly
Avatar

Joined: Aug 16 12
Posts: 24
ID: 66176
Credit: 14,124
RAC: 0
Message 6845 - Posted 26 Sep 2012 5:31:03 UTC

Damn! Instead of terraforming we've created a spam magent!
____________
The Knights who say Ni!

Profile spuddly buddly
Avatar

Joined: Aug 16 12
Posts: 24
ID: 66176
Credit: 14,124
RAC: 0
Message 6846 - Posted 26 Sep 2012 19:12:46 UTC - in response to Message ID 6845 .

Damn! Instead of terraforming we've created a spam magent!

That should be magnet of course ... (slaps forehead)
____________
The Knights who say Ni!
King Leo

Joined: Apr 26 12
Posts: 3
ID: 54980
Credit: 2,337,218
RAC: 0
Message 6863 - Posted 7 Oct 2012 16:43:53 UTC

After over an hour of crunching, Progress remains at 0.000%. Can anyone help or explain to me what is happening? Thanks. It is happening on one of three of my computers. First began getting computational errors and then it changes to zero progress. There must be a problem on the far end not the user side as my other 2 machines appear to be working okay for now.

Profile robertmiles

Joined: Apr 16 09
Posts: 96
ID: 9967
Credit: 1,290,747
RAC: 0
Message 6865 - Posted 8 Oct 2012 5:26:33 UTC - in response to Message ID 6863 .

After over an hour of crunching, Progress remains at 0.000%. Can anyone help or explain to me what is happening? Thanks. It is happening on one of three of my computers. First began getting computational errors and then it changes to zero progress. There must be a problem on the far end not the user side as my other 2 machines appear to be working okay for now.


My guess is that the progress is only updated when a checkpoint is made, and many of the current workunits have some problem that makes them run at least 3 times the initially estimated time and perhaps 10 times the initially estimated time before writing any checkpoints at all (if they ever get around to doing anything useful). I've had to abort at least my last 5 workunits on this computer for this reason. I haven't seen any such workunits in the last few days on my other two computers, perhaps because all three computers participate in many BOINC projects and the others just haven't reached a good time for their next batch of Docking@Home workunits.

der_Day

Joined: Jan 16 10
Posts: 10
ID: 24434
Credit: 1,922,000
RAC: 0
Message 6867 - Posted 8 Oct 2012 11:53:23 UTC

same problem with this one in another thread

Andreas38871

Joined: Jan 8 09
Posts: 2
ID: 5693
Credit: 8,459
RAC: 0
Message 6868 - Posted 8 Oct 2012 14:10:28 UTC

Same problem!
Andreas

Profile robertmiles

Joined: Apr 16 09
Posts: 96
ID: 9967
Credit: 1,290,747
RAC: 0
Message 6869 - Posted 8 Oct 2012 16:16:56 UTC

Those mentioning this problem might mention whether they see it only on computers running Windows 7, and whether that happens to be the 64-bit version of Windows 7.

For me, only one of my computers shows the problem, and that one is running 64-bit Windows 7. The other two, running 64-bit Windows Vista, do not show the problem but have had only one Docking@Home workunit each lately.

I now have my Windows 7 computer on No New Tasks for Docking@Home while I check if its last batch of Docking@Home workunits takes much more than the initial estimated time in addition to having no checkpoints and no visible progress.

Toppie*

Joined: Mar 21 12
Posts: 1
ID: 52537
Credit: 187,392
RAC: 0
Message 6870 - Posted 8 Oct 2012 16:56:34 UTC - in response to Message ID 6869 .

Those mentioning this problem might mention whether they see it only on computers running Windows 7, and whether that happens to be the 64-bit version of Windows 7.

For me, only one of my computers shows the problem, and that one is running 64-bit Windows 7. The other two, running 64-bit Windows Vista, do not show the problem but have had only one Docking@Home workunit each lately.

I now have my Windows 7 computer on No New Tasks for Docking@Home while I check if its last batch of Docking@Home workunits takes much more than the initial estimated time in addition to having no checkpoints and no visible progress.


Win Vista 64/ Win 7 64.

Been downloading files with zero content. Spread over 4 machines.
Been downloading files with incorrect crc checksums.All 4 machines.
The workunits that do start, on my Vista machine: Run up to 100% complete
and after 16 hours still the same.
Aborted.
On same machine, same batch, zero% after two hours.On the other 3 machines
I cannot even start to crunch.
I'll wait for better days.
Toppie.
skgiven

Joined: Oct 10 08
Posts: 10
ID: 2331
Credit: 3,721,673
RAC: 0
Message 6871 - Posted 8 Oct 2012 19:00:24 UTC - in response to Message ID 6870 .

I had this issue on one 2008x64 server. 3 tasks running on a quad core opteron. No progress on any task after 18.5h, 14.4h and 14h.3h. CPU usage at 75% (the tasks), and memory being used as expected. I aborted the said tasks. The next tasks started running but didn't progress either so I restarted the system.
After the reboot one task had reached 1% progress by the time I had logged on (running as a daemon). The time was about 3min. to reach this 1% and the checkpoint was at 23sec. About 8min into the run and the same task went to 3.475%. Neither of the other tasks had progressed (0%), so I suspended them.
When I suspended the tasks, two new tasks immediately failed, but another 2 started, reached 1% and then 3.475%. A while later Boinc decided to run new docking tasks, these started but didn't progress after 10min, so I aborted them.

On a W7x64 system (i7-2600K) the tasks are running normally so far.

I prefer tasks that fail immediately to tasks that don't progress for hours on end.

Anyway, try a restart and if tasks don't progress after say 10 or 15 min. just abort them - others should run, but babysitting seems to be the order of the day.
Of note is that the tasks that don't progress don't checkpoint, so we might be able to abort them earlier?
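
The rule of thumb above (no progress and no checkpoint after 10 or 15 minutes) can be written down as a small Python check. This is just a sketch of the heuristic from this post, not something BOINC or the project provides; the function name and the 15-minute default are guesses.

```python
def looks_stalled(elapsed_s, fraction_done, last_checkpoint_s, grace_s=15 * 60):
    """Heuristic from the post above: past the grace period, a task with
    no progress and no checkpoint is a candidate for aborting.

    elapsed_s          seconds the task has been running
    fraction_done      progress between 0.0 and 1.0
    last_checkpoint_s  CPU time at the last checkpoint, or None if none yet
    grace_s            how long to wait before judging (15 min is a guess)
    """
    if elapsed_s < grace_s:
        return False
    no_progress = fraction_done <= 0.0
    no_checkpoint = last_checkpoint_s is None or last_checkpoint_s <= 0
    return no_progress and no_checkpoint
```

With the numbers from this post, a task at 0% with no checkpoint after 18.5 hours is flagged, while one that checkpointed at 23 seconds and reached 3.475% is not.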

My uninformed guess is that these perpetual tasks were built incorrectly; from a dataset that contains a non-standard a-a or Charmm can't handle an atom type/range/angle... Perhaps their names would be useful in tracking the issue down?
____________

der_Day

Joined: Jan 16 10
Posts: 10
ID: 24434
Credit: 1,922,000
RAC: 0
Message 6872 - Posted 8 Oct 2012 19:10:48 UTC - in response to Message ID 6871 .
Last modified: 8 Oct 2012 19:11:52 UTC

I've also a Win7 x64 machine

after the reboot one task had reached 1% progress by the time I had logged on (running as a daemon). The time was about 3min. to reach this 1% and the checkpoint was at 23sec. About 8min into the run and the same task went to 3.475%. Neither of the other tasks had progressed (0%), so I suspended them.
When I suspended the tasks, two new tasks immediately failed, but another 2 started, reached 1% and then 3.475%. A while later Boinc decided to run new docking tasks, these started but didn't progress after 10min, so I aborted them.

On a W7x64 system (i7-2600K) the tasks are running normally so far.

I prefer tasks that fail immediately to tasks that don't progress for hours on end.

Anyway, try a restart and if tasks don't progress after say 10 or 15 min. just abort them - others should run, but babysitting seems to be the order of the day.
Of note is that the tasks that don't progress don't checkpoint, so we might be able to abort them earlier?

I don't wait so long. As you said, the first progress is visible after almost 45 sec. I checked the slot folders (for example d:\Boinc\Project_Data\slots) of the broken WUs and saw that several files are missing.
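
For anyone who wants to do the same check without browsing the folders by hand, a small Python sketch along these lines would list missing or empty input files. The function name is hypothetical and the list of expected file names has to be supplied by the user; nothing here comes from the project itself.

```python
from pathlib import Path

def find_bad_inputs(slot_dir, expected_names):
    """Return the expected files that are missing or empty in slot_dir."""
    slot = Path(slot_dir)
    bad = []
    for name in expected_names:
        path = slot / name
        if not path.is_file() or path.stat().st_size == 0:
            bad.append(name)
    return bad
```

An empty result means every expected file is present and has some content; it says nothing about whether the content is correct.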
Profile robertmiles

Joined: Apr 16 09
Posts: 96
ID: 9967
Credit: 1,290,747
RAC: 0
Message 6873 - Posted 8 Oct 2012 19:54:37 UTC

A little more to report: One more workunit on my 64-bit Windows 7 computer failed the same way last night.

One workunit finished on each of my 64-bit Windows Vista computers last night. One failed, but in a different way. The other was validated.

Boyu Zhang
Forum moderator
Project administrator
Project developer
Project tester

Joined: May 5 10
Posts: 88
ID: 28821
Credit: 2,013,795
RAC: 0
Message 6876 - Posted 9 Oct 2012 2:24:32 UTC - in response to Message ID 6873 .

During the past weekend, the space on the D@H server filled up and, as a result, the server sent out some incomplete workunits. Please abort workunits with name "1iiq1hih" or "1ohr1hih". The server is now back to normal.

Thanks for letting us know, and please bear with us during this difficulty!

Boyu
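
For anyone with many tasks queued, the abort request above amounts to matching task names against the two strings given. Here is a minimal sketch of that filter, assuming the strings appear at the start of the task name; the example task names are made up for illustration, and the function is not part of any BOINC tool.

```python
# Strings named in the message above as identifying incomplete workunits.
BAD_PREFIXES = ("1iiq1hih", "1ohr1hih")


def tasks_to_abort(task_names, bad_prefixes=BAD_PREFIXES):
    """Return the task names that start with any of the known-bad prefixes."""
    prefixes = tuple(bad_prefixes)
    return [name for name in task_names if name.startswith(prefixes)]


# Hypothetical task names, for illustration only.
example = ["1iiq1hih_example_0", "1abc1xyz_example_1", "1ohr1hih_example_2"]
print(tasks_to_abort(example))  # ['1iiq1hih_example_0', '1ohr1hih_example_2']
```

Further prefixes can be passed in through `bad_prefixes` if more bad batches are confirmed.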

Profile Andrea [E.R.]

Joined: Jul 4 11
Posts: 1
ID: 41944
Credit: 148,083
RAC: 0
Message 6877 - Posted 9 Oct 2012 10:06:15 UTC - in response to Message ID 6876 .

During the past weekend, the space on the D@H server filled up and, as a result, the server sent out some incomplete workunits. Please abort workunits with name "1iiq1hih" or "1ohr1hih". The server is now back to normal.

Thanks for letting us know, and please bear with us during this difficulty!

Boyu


Thanks!!! :)

I think that I have the same problem with a "1ohr1htf". Should I abort this one too?
Profile robertmiles

Joined: Apr 16 09
Posts: 96
ID: 9967
Credit: 1,290,747
RAC: 0
Message 6878 - Posted 9 Oct 2012 13:12:07 UTC - in response to Message ID 6876 .

During the past weekend, the space on the D@H server filled up and, as a result, the server sent out some incomplete workunits. Please abort workunits with name "1iiq1hih" or "1ohr1hih". The server is now back to normal.

Thanks for letting us know, and please bear with us during this difficulty!

Boyu


Looks like the workunits need some test at the beginning that will quickly shut down any incomplete workunits.

My current group of troublesome workunits all have names beginning with 1m0b1htf; should I abort all of them too?
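
The start-up test suggested above could be as simple as checking that every required input file is present and non-empty before the computation begins. This is a rough sketch only: the function name and the list of required file names are hypothetical, and nothing here reflects how the Docking@Home application actually starts up.

```python
from pathlib import Path


def missing_inputs(slot_dir, required_names):
    """Return the required file names that are absent or empty in slot_dir.

    slot_dir:       directory the task runs in (assumed to hold its inputs).
    required_names: file names the task needs (hypothetical list).
    """
    slot = Path(slot_dir)
    missing = []
    for name in required_names:
        path = slot / name
        # A file counts as missing if it does not exist or has zero length.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

A task could then exit with an error right away when the returned list is not empty, instead of running for hours at 0%.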
Boyu Zhang
Forum moderator
Project administrator
Project developer
Project tester

Joined: May 5 10
Posts: 88
ID: 28821
Credit: 2,013,795
RAC: 0
Message 6882 - Posted 9 Oct 2012 15:13:01 UTC - in response to Message ID 6878 .

Yes, please abort them too. Thanks!

During the past weekend, the space on the D@H server filled up and, as a result, the server sent out some incomplete workunits. Please abort workunits with name "1iiq1hih" or "1ohr1hih". The server is now back to normal.

Thanks for letting us know, and please bear with us during this difficulty!

Boyu


Looks like the workunits need some test at the beginning that will quickly shut down any incomplete workunits.

My current group of troublesome workunits all have names beginning with 1m0b1htf; should I abort all of them too?

Boyu Zhang
Forum moderator
Project administrator
Project developer
Project tester

Joined: May 5 10
Posts: 88
ID: 28821
Credit: 2,013,795
RAC: 0
Message 6883 - Posted 9 Oct 2012 15:13:27 UTC - in response to Message ID 6877 .

Yes, please abort them too, thanks!

During the past weekend, the space on the D@H server filled up and, as a result, the server sent out some incomplete workunits. Please abort workunits with name "1iiq1hih" or "1ohr1hih". The server is now back to normal.

Thanks for letting us know, and please bear with us during this difficulty!

Boyu


Thanks!!! :)

I think that I have the same problem with a "1ohr1htf". Should I abort this one too?

lohphat

Joined: Jan 1 10
Posts: 3
ID: 23732
Credit: 3,321,943
RAC: 0
Message 6885 - Posted 9 Oct 2012 18:19:09 UTC

Why isn't this problem posted as a news item in the server status section yet?

googloo

Joined: Nov 30 09
Posts: 6
ID: 22204
Credit: 1,182,026
RAC: 0
Message 6886 - Posted 9 Oct 2012 19:06:52 UTC

I have set Docking@Home to no new tasks and have aborted all current tasks. I had two more tasks run for hours this morning with 0 progress. Please let us know when you have fixed this problem.

skgiven

Joined: Oct 10 08
Posts: 10
ID: 2331
Credit: 3,721,673
RAC: 0
Message 6888 - Posted 9 Oct 2012 19:20:48 UTC - in response to Message ID 6886 .

I have set Docking@Home to no new tasks and have aborted all current tasks. I had two more tasks run for hours this morning with 0 progress. Please let us know when you have fixed this problem.

Yes, please let us know when the problem has been fixed.
Presently, I think the server isn't sending new tasks, which is good, seeing as they don't work.
Can you send server aborts, to expedite the resolution?

GL
____________
Profile robertmiles

Joined: Apr 16 09
Posts: 96
ID: 9967
Credit: 1,290,747
RAC: 0
Message 6890 - Posted 10 Oct 2012 0:40:13 UTC - in response to Message ID 6882 .
Last modified: 10 Oct 2012 0:40:58 UTC

Yes, please abort them too. Thanks!

My current group of troublesome workunits all have names beginning with 1m0b1htf; should I abort all of them too?



Aborted.

Could you let us know when you have a new batch of workunits that have been adequately tested under 64-bit Windows 7, and the other versions of Windows mentioned recently in this thread?
Aaron Finney
Volunteer tester

Joined: Mar 23 07
Posts: 74
ID: 367
Credit: 2,409,831
RAC: 0
Message 6894 - Posted 10 Oct 2012 14:13:59 UTC - in response to Message ID 6890 .

Yes, please abort them too. Thanks!

My current group of troublesome workunits all have names beginning with 1m0b1htf; should I abort all of them too?



Aborted.

Could you let us know when you have a new batch of workunits that have been adequately tested under 64-bit Windows 7, and the other versions of Windows mentioned recently in this thread?


I have new workunits today with the string 1hbv1hih at the beginning. All 8 of them are 2 hours in and 0% complete.
Profile TheFiend

Joined: Apr 7 09
Posts: 70
ID: 9482
Credit: 20,705,527
RAC: 0
Message 6895 - Posted 10 Oct 2012 14:37:21 UTC

The current 0% problem is not restricted to Win 7 x64; all my Docking is done on XP x86 crunchers.

Profile robertmiles

Joined: Apr 16 09
Posts: 96
ID: 9967
Credit: 1,290,747
RAC: 0
Message 6897 - Posted 10 Oct 2012 21:01:06 UTC - in response to Message ID 6895 .

The current 0% problem is not restricted to Win 7 x64; all my Docking is done on XP x86 crunchers.


Not restricted for me either. One of my 64-bit Windows Vista computers has now had two such failures, and is now on No New Tasks for Docking@Home.

All three of my computers run BOINC 7.0.28.
rixx

Joined: Mar 29 10
Posts: 4
ID: 27550
Credit: 1,112,714
RAC: 0
Message 6898 - Posted 10 Oct 2012 21:07:01 UTC

This problem is in Linux too (Arch Linux x86_64).

hugos

Joined: Jul 23 12
Posts: 1
ID: 64191
Credit: 1,716,384
RAC: 0
Message 6899 - Posted 10 Oct 2012 23:25:27 UTC

I'm in it for the science (and subsequent speedup of medical research, i.e., my life expectancy) and will keep testing WUs with new tasks even if my RAC takes a dive. I had loads of 1m0b1htf ones that are now aborted.

Profile UBT - Timbo
Volunteer tester

Joined: Sep 13 06
Posts: 9
ID: 46
Credit: 159,440
RAC: 0
Message 6904 - Posted 11 Oct 2012 11:37:09 UTC

Hi all,

I just posted in another thread on this forum (http://docking.cis.udel.edu/community/forum/thread.php?id=499) that I've got the same issue:

Docking WU's are just spinning their wheels and Progress stays at 0.000%.

I've aborted the WU's and hope that someone, somewhere on this project can fix this issue, as it seems to have been problematic for about 3 years now (earliest post was in 2009 !!).

I can't see that it's a client issue, as there seems to be no "constant" throughout the reports made on here: Win and Linux are affected, various versions of BOINC Manager are noted, and different types of PCs, with different CPUs.

It seems to me to be a WU related issue ?

regards
Tim
Founder, UK BOINC Team

Boyu Zhang
Forum moderator
Project administrator
Project developer
Project tester

Joined: May 5 10
Posts: 88
ID: 28821
Credit: 2,013,795
RAC: 0
Message 6905 - Posted 11 Oct 2012 17:39:24 UTC

Hi all,

Please abort all the 0% progress workunits. I posted an entry regarding this on the News: http://docking.cis.udel.edu/

Sorry for the inconvenience and thanks for bearing with us!!

Boyu

Cluster Physik

Joined: Jul 2 09
Posts: 35
ID: 14795
Credit: 16,067,012
RAC: 0
Message 6907 - Posted 11 Oct 2012 18:30:26 UTC - in response to Message ID 6905 .

Hi all,

Please abort all the 0% progress workunits. I posted an entry regarding this on the News: http://docking.cis.udel.edu/

Sorry for the inconvenience and thanks for bearing with us!!

Boyu

Can't you abort them remotely from the project's side (other projects like RNA do this regularly)? Would be much more convenient for people who don't have the time to check all machines for WUs blocking the computation.
Boyu Zhang
Forum moderator
Project administrator
Project developer
Project tester

Joined: May 5 10
Posts: 88
ID: 28821
Credit: 2,013,795
RAC: 0
Message 6908 - Posted 11 Oct 2012 19:53:23 UTC - in response to Message ID 6907 .

I aborted the ones that are "unsent" from server side, but for the workunits that are already sent to the volunteers, we do not have control from the server side.

Sorry for the inconvenience!

Boyu

Hi all,

Please abort all the 0% progress workunits. I posted an entry regarding this on the News: http://docking.cis.udel.edu/

Sorry for the inconvenience and thanks for bearing with us!!

Boyu

Can't you abort them remotely from the project's side (other projects like RNA do this regularly)? Would be much more convenient for people who don't have the time to check all machines for WUs blocking the computation.

Mark Rush

Joined: Feb 15 09
Posts: 4
ID: 7162
Credit: 5,779,850
RAC: 0
Message 6909 - Posted 12 Oct 2012 1:44:11 UTC - in response to Message ID 6908 .

I aborted the ones that are "unsent" from server side, but for the workunits that are already sent to the volunteers, we do not have control from the server side.

Sorry for the inconvenience!

Boyu


As I am sure you realize, this situation makes for a rather large pain in the tush. I will have to check several machines to make certain that the defective Docking WUs are not blocking Docking and the other projects I run as well. I pay attention to BOINC, so while it's an issue, for me, it's not insurmountable. I expect that many other crunchers do not pay attention and for them the defective Docking WUs might be a major slowdown, not only for Docking but for other projects. As it happens, other projects (Malariacontrol for instance) have the ability to delete WUs after they are downloaded. I urge in the strongest possible terms for Docking to spend some resources developing this capability.

Mark
Cluster Physik

Joined: Jul 2 09
Posts: 35
ID: 14795
Credit: 16,067,012
RAC: 0
Message 6912 - Posted 12 Oct 2012 18:15:53 UTC - in response to Message ID 6909 .

I aborted the ones that are "unsent" from server side, but for the workunits that are already sent to the volunteers, we do not have control from the server side.

Sorry for the inconvenience!

Boyu

[..]
As it happens, other projects (Malariacontrol for instance) have the ability to delete WUs after they are downloaded. I urge in the strongest possible terms for Docking to spend some resources developing this capability.

I second that. And I can only reiterate that other projects can do it. Mark mentioned MalariaControl and I mentioned RNA World before. Both projects have the ability to cancel tasks remotely (for instance when the results are not needed anymore). The BOINC platform surely offers this somewhere.
Profile robertmiles

Joined: Apr 16 09
Posts: 96
ID: 9967
Credit: 1,290,747
RAC: 0
Message 6918 - Posted 15 Oct 2012 2:14:12 UTC
Last modified: 15 Oct 2012 2:15:10 UTC

SOMETHING has allowed my Windows 7 computer to resume Docking@Home workunits. I can't tell if it was an improvement in the workunits, or the fact that I drained that computer of Docking@Home workunits and then told BOINC Manager to reset that project.

dgnuff

Joined: Jan 7 11
Posts: 2
ID: 36644
Credit: 8,253,291
RAC: 0
Message 6925 - Posted 17 Oct 2012 9:27:58 UTC - in response to Message ID 6918 .
Last modified: 17 Oct 2012 9:45:07 UTC

-- Deleted --

Fred Verster
Avatar

Joined: May 8 09
Posts: 26
ID: 11034
Credit: 2,647,353
RAC: 0
Message 6934 - Posted 24 Oct 2012 11:21:30 UTC

Hi, this morning I noticed several tasks running High Priority, but
they haven't made any progress after 105 hours! Still at 0%.

It seems useless to let them run, so is deleting them the only(?) option?
At least for the 4 tasks that are running now, all with 0% progress after >50 hours.


____________

Knight who says N! Ni Ni

Boyu Zhang
Forum moderator
Project administrator
Project developer
Project tester

Joined: May 5 10
Posts: 88
ID: 28821
Credit: 2,013,795
RAC: 0
Message 6936 - Posted 25 Oct 2012 13:46:40 UTC - in response to Message ID 6934 .

Dear Fred,

Please abort all the 0% progress workunits, they are part of the incomplete workunits from the previous batch.

Sorry for the inconvenience!

Thanks!
Boyu

Hi, this morning I noticed several tasks running High Priority, but
they haven't made any progress after 105 hours! Still at 0%.

It seems useless to let them run, so is deleting them the only(?) option?
At least for the 4 tasks that are running now, all with 0% progress after >50 hours.


Aaron Finney
Volunteer tester

Joined: Mar 23 07
Posts: 74
ID: 367
Credit: 2,409,831
RAC: 0
Message 6989 - Posted 16 Nov 2012 16:21:55 UTC - in response to Message ID 6936 .
Last modified: 16 Nov 2012 16:28:57 UTC

Still getting these workunits. Had 6 today with 42 hours elapsed time..

They shouldn't be sent out if they are going to do this.

1d4h1hih_ <--- Workunits start with this prefix.
