Excessive disk activity


Message boards : Number crunching : Excessive disk activity

Profile Evil-Dragon
Volunteer tester

Joined: Oct 26 06
Posts: 2
ID: 192
Credit: 11,462
RAC: 0
Message 2503 - Posted 15 Feb 2007 23:04:43 UTC
Last modified: 15 Feb 2007 23:05:08 UTC

I have been noticing that the current application seems to be doing a heck of a lot of disk reading and writing all the time while a WU is running.

According to Task Manager, charmm_5.4_windows_intelx86.exe is doing 200 I/O reads (1,000,000+ read bytes) per second and 4-5 I/O writes (1,000-2,000 write bytes) per second.

Is there any way to update the application so it does not use so much disk activity?

Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 2504 - Posted 16 Feb 2007 3:47:00 UTC - in response to Message ID 2503 .

I have been noticing that the current application seems to be doing a heck of a lot of disk reading and writing all the time while a WU is running.


That's correct. There are already a couple of threads on this subject.

According to Task Manager, charmm_5.4_windows_intelx86.exe is doing 200 I/O reads (1,000,000+ read bytes) per second and 4-5 I/O writes (1,000-2,000 write bytes) per second.


Reading: the docking grid is read from disk every time we do a docking trial (which is currently about 80x80=6400 times per run). This is the way charmm does it currently, but we are going to try to change this, as it shouldn't be necessary to read the same data from disk over and over again.
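A minimal sketch of the "read it once" idea, in Python rather than in charmm's own code. The `GridCache` class, the `run_trials` loop and the 1 KB stand-in file are all hypothetical; the real grid format and docking loop are not described in this thread.

```python
import os
import tempfile


class GridCache:
    """Load a grid file from disk once and reuse the in-memory copy."""

    def __init__(self, path):
        self.path = path
        self._data = None      # cached file contents, filled on first use
        self.disk_reads = 0    # how many times the file was actually opened

    def get(self):
        if self._data is None:
            with open(self.path, "rb") as fh:
                self._data = fh.read()
            self.disk_reads += 1
        return self._data


def run_trials(cache, n_trials):
    """Hypothetical docking loop: every trial needs the grid bytes."""
    total = 0
    for _ in range(n_trials):
        total += len(cache.get())
    return total


def demo():
    # Stand-in grid file (assumption: any binary blob will do for the demo).
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\x00" * 1024)
        path = tmp.name
    try:
        cache = GridCache(path)
        total = run_trials(cache, 80 * 80)  # 6400 trials, as in the post
        return cache.disk_reads, total
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

With a cache like this the file is opened once per run instead of once per trial, at the cost of holding the grid in memory for the whole run.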

Writing: the charmm log file (which currently contains lots of info to help us debug) and the app checkpointing (also after every docking trial). The checkpointing is going to be more efficient later on (or at least that is the plan), and we are going to check the volunteers' 'write to disk' preference in the future (we don't do this yet).

Is there any way to update the application so it does not use so much disk activity?


It's on our to-do list :-)

Thanks
Andre
____________
D@H the greatest project in the world... a while from now!
Profile clownius
Volunteer tester
Avatar

Joined: Nov 14 06
Posts: 61
ID: 280
Credit: 2,677
RAC: 0
Message 2507 - Posted 17 Feb 2007 9:34:36 UTC

Within a few days of running docking again I blew up another hard disk. So it may be stressing them a little more than I would like. Hope it gets reduced soon.

I'm adding a disclaimer here though. The disk that was killed was an old PATA33 3.2 GB hard disk that's rather ancient, to put it mildly. It doesn't seem to worry the more modern disks I have, so I wouldn't expect this to cause a problem for more modern computers, just older (read: ancient) machines that crunch 24/7 like mine do. Obviously older machines running that hard are going to suffer some failures.
____________

Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 2512 - Posted 17 Feb 2007 21:06:54 UTC - in response to Message ID 2507 .

The 'ancientness' of the disk would explain the failure. Modern SATA disks have a very low failure rate of 1 in 100,000 hours or so. An example is the SATA disks in the RAID system of our 128-CPU cluster here at UTEP. None of the disks has failed in more than a year now, and these are hammered continuously (much more than a desktop disk will ever experience).

Thanks!
Andre

____________
D@H the greatest project in the world... a while from now!
Profile clownius
Volunteer tester
Avatar

Joined: Nov 14 06
Posts: 61
ID: 280
Credit: 2,677
RAC: 0
Message 2513 - Posted 17 Feb 2007 23:40:02 UTC

Yeah, I have some hardware I should be ashamed of lol. That same machine has a NIC with both 10 Mb Ethernet and a BNC connector, but it works so I use it lol. This is the third small HDD I've lost in this machine in the last few months, and my spares are slowly getting newer as the old stuff finally bites the dust. I wouldn't mind a reduction in disk writes anyway, as it will protect some of my slightly more modern 4-40 GB PATA disks. The SATA disk I have hasn't missed a beat yet, nor do I expect it to any time soon.
I've rambled enough anyway, but it's a sad day, as another P3-class machine (Coppermine Celeron) will be removed from crunching as soon as a replacement arrives (next week). I spend more time repairing the thing than using it now, so it's time for it to go to the great farm in the sky. Hope the others with similar machines on Linux still have enough partners to make quorum.
____________

Profile Rebirther
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 63
ID: 52
Credit: 69,033
RAC: 0
Message 2621 - Posted 28 Feb 2007 21:17:19 UTC

Any news about a fix for Windows?

Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 2622 - Posted 28 Feb 2007 22:10:31 UTC - in response to Message ID 2621 .

Windows, Mac, Linux: all the same problem: our checkpointing (which we are working on) and the amount of log entries we write to charmm.out (which we need because we are in alpha).

Thanks for asking!
Andre

____________
D@H the greatest project in the world... a while from now!
Profile Rebirther
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 63
ID: 52
Credit: 69,033
RAC: 0
Message 2992 - Posted 11 Apr 2007 18:58:57 UTC

Nothing changed with v5.05, my harddisk is a spaceshuttle :o

Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 2994 - Posted 12 Apr 2007 4:02:20 UTC - in response to Message ID 2992 .

That doesn't make sense, since we reduced log writing by a factor of at least 50. I will be out of town the rest of the week, but will ask Memo to check it tomorrow.

Thanks for the report.
Andre

____________
D@H the greatest project in the world... a while from now!
Profile Rebirther
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 63
ID: 52
Credit: 69,033
RAC: 0
Message 3003 - Posted 12 Apr 2007 7:46:11 UTC
Last modified: 12 Apr 2007 7:46:41 UTC

Here is a short debug log from the Windows app v5.05 on BOINC 5.8.15, for better understanding:

12/04/2007 09:43:31|Docking@Home|[task_debug] result 1tng_mod0001_44877_277298_0 checkpointed
12/04/2007 09:43:34|Docking@Home|[task_debug] result 1tng_mod0001_44877_277298_0 checkpointed
12/04/2007 09:43:36|Docking@Home|[task_debug] result 1tng_mod0001_44877_277298_0 checkpointed
12/04/2007 09:43:39|Docking@Home|[task_debug] result 1tng_mod0001_44877_277298_0 checkpointed
12/04/2007 09:43:41|Docking@Home|[task_debug] result 1tng_mod0001_44877_277298_0 checkpointed
12/04/2007 09:43:43|Docking@Home|[task_debug] result 1tng_mod0001_44877_277298_0 checkpointed
12/04/2007 09:43:45|Docking@Home|[task_debug] result 1tng_mod0001_44877_277298_0 checkpointed
12/04/2007 09:43:47|Docking@Home|[task_debug] result 1tng_mod0001_44877_277298_0 checkpointed

Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 3007 - Posted 12 Apr 2007 10:39:58 UTC - in response to Message ID 3003 .

Thanks. We'll check it out!!

Andre

____________
D@H the greatest project in the world... a while from now!
Aaron Finney
Volunteer tester

Joined: Mar 23 07
Posts: 74
ID: 367
Credit: 2,409,831
RAC: 0
Message 3032 - Posted 13 Apr 2007 23:10:30 UTC - in response to Message ID 2992 .

Nothing changed with v5.05, my harddisk is a spaceshuttle :o


I concur.

36 minutes and over 400,000 I/O reads. The I/O WRITES however, are only 9,281

The I/O reads need to be looked at.
Aaron Finney
Volunteer tester

Joined: Mar 23 07
Posts: 74
ID: 367
Credit: 2,409,831
RAC: 0
Message 3033 - Posted 13 Apr 2007 23:11:14 UTC - in response to Message ID 2994 .
Last modified: 13 Apr 2007 23:11:31 UTC

That doesn't make sense, since we reduced log writing by a factor of at least 50. I will be out of town the rest of the week, but will ask Memo to check it tomorrow.

Thanks for the report.
Andre


It's not the I/O WRITES, it's the READS :-D (at least on my rig)

Hope that clarifies.
Profile David Ball
Forum moderator
Volunteer tester
Avatar

Joined: Sep 18 06
Posts: 274
ID: 115
Credit: 1,634,401
RAC: 0
Message 3035 - Posted 14 Apr 2007 4:51:23 UTC

Same here. I looked at task manager on XP and a WU about 75% complete shows charmm_5.5_windows_intelx86 at nearly 1 million reads totaling over 7 GB from disk and it's only written a bit over 11 MB. Memory size is 12744KB and peak memory is the same. Memory delta is staying at zero and page faults totals 3357 and is staying constant. VM size is constant at 64212. It shows 4 threads running and handles is usually 65 with an occasional 66. Something called I/O Other shows about half a million and I/O Other bytes is over 7 million.

Boinc.exe memory usage is constantly bouncing between 9452K and 9636K. Memory Delta is constantly bouncing between + 184K and - 184K. It has over 12 million page faults and the page fault delta is different every display, ranging from 0 (rarely) to 351 page faults. It shows 165 handles, 3 threads, 4169 I/O Reads totaling almost 16 MB. I/O other is over 4 million and I/O Other bytes is about 67 MB. I/O write bytes is huge at 11 and a half GB. VM size bounces around between 5384K and 5692K.

This is on XP Pro with a non hyperthreaded Northwood P4. MS Access, Firefox, Thunderbird, OpenOffice Word, Notepad, and task manager are open but only task manager and firefox are actively being used (to write this). Summary shows 50 processes, 100% CPU usage, and memory Commit Charge: 593M/3429M to 595M/3429M. The system has 1.5 GB ram and a couple GB of swap.

The about box says boinc is version 5.8.15

HTH,

-- David
____________
The views expressed are my own.
Facts are subject to memory error :-)
Have you read a good science fiction novel lately?

Profile Conan
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 219
ID: 100
Credit: 4,256,493
RAC: 0
Message 3044 - Posted 15 Apr 2007 1:25:16 UTC

On checking my AMD 4800+ Windows machine I found that I was in a similar boat to David and Aaron.
With a Charmm job only 24 minutes in, at 18% done, I found this in Task Manager:
I/O Reads = 217,800
I/O Reads Byte usage = 1.66 GB (going up at 1 MB per second)
I/O Writes = 5,000
I/O Writes Byte Usage = 4.07 MB

As a comparison I have a QMC job that has been running for over 8 Hours at 40% complete and it has done
I/O Reads = 1,558
I/O Reads Byte usage = 9.6 MB
I/O Writes = 3.3 MB
I/O Writes Byte usage = 111.47 MB

So Charmm is putting a much bigger load on the computer than QMC in a much shorter time frame.
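To put the two sets of figures on the same scale, they can be turned into rates. This is only back-of-the-envelope arithmetic on the numbers quoted above, assuming decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes) and treating "over 8 hours" as exactly 8 hours, which makes the QMC rate an upper bound.

```python
GB = 10**9  # assumption: Task Manager figures read as decimal units
MB = 10**6


def rate_per_second(total, seconds):
    """Average amount per second over the elapsed run time."""
    return total / seconds


# Charmm job: 24 minutes of run time
charmm_seconds = 24 * 60
charmm_read_bytes = 1.66 * GB
charmm_reads = 217_800

# QMC job: at least 8 hours of run time
qmc_seconds = 8 * 3600
qmc_read_bytes = 9.6 * MB

charmm_bps = rate_per_second(charmm_read_bytes, charmm_seconds)
qmc_bps = rate_per_second(qmc_read_bytes, qmc_seconds)
avg_read_size = charmm_read_bytes / charmm_reads

if __name__ == "__main__":
    print(f"Charmm read rate: {charmm_bps / MB:.2f} MB/s")
    print(f"QMC read rate:    {qmc_bps:.0f} bytes/s")
    print(f"Charmm average read size: {avg_read_size:.0f} bytes")
```

The Charmm average works out to a little over 1 MB per second, which agrees with the "going up at 1 MB per second" observation, and each read is only a few kilobytes on average.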
____________

Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 3082 - Posted 16 Apr 2007 22:43:45 UTC - in response to Message ID 3044 .

You will see many reads because Charmm is an interpreter for a script file that implements the docking algorithm we use. This script (ending in .inp in your projects/docking.utep.edu directory) is about a megabyte in size. This does not mean that we read it from your physical disk continuously, though: every operating system has something called a buffer cache, where data read from disk is kept after the first read for as long as there is space and the data doesn't change. If you use the iostat utility on Linux, you will see that the number of physical reads/sec is basically 0 after a couple of seconds of running time. The number of reads from the cache won't be zero, but they are very fast because this cache resides in memory. I am sure that even Windows has some sort of buffer cache, although I don't know what it is called (I'm a Unix guy :-)
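On Linux the difference between requested reads and physical reads can also be seen per process in /proc/[pid]/io, where `rchar` counts bytes passed through read calls (whether or not the disk was touched) and `read_bytes` counts bytes actually fetched from storage. A rough sketch of reading those counters follows; the helper names and the sample numbers are made up for illustration, and the subtraction is only approximate because `rchar` also includes things like terminal I/O.

```python
# Made-up sample in the /proc/[pid]/io layout (not measured from charmm).
SAMPLE = """rchar: 7000000000
wchar: 11000000
syscr: 1000000
syscw: 9000
read_bytes: 1048576
write_bytes: 11000000
cancelled_write_bytes: 0
"""


def parse_proc_io(text):
    """Turn 'name: value' lines into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value)
    return counters


def cache_served_bytes(counters):
    """Approximate bytes that were requested but not fetched from storage."""
    return max(counters["rchar"] - counters["read_bytes"], 0)


def read_own_counters(path="/proc/self/io"):
    """Read the counters of the current process (Linux only)."""
    with open(path) as fh:
        return parse_proc_io(fh.read())


if __name__ == "__main__":
    sample = parse_proc_io(SAMPLE)
    print(cache_served_bytes(sample))
```

A process that keeps re-reading the same small file would show `rchar` growing steadily while `read_bytes` stays almost flat.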

Cheers
Andre

____________
D@H the greatest project in the world... a while from now!
Aaron Finney
Volunteer tester

Joined: Mar 23 07
Posts: 74
ID: 367
Credit: 2,409,831
RAC: 0
Message 3093 - Posted 19 Apr 2007 17:59:53 UTC - in response to Message ID 3082 .


May want to sticky this or put it into the FAQ. Could see people wondering why this app is 'seemingly' making 8 gigabytes of disk reading activity per w/u.
Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 3094 - Posted 19 Apr 2007 19:40:32 UTC - in response to Message ID 3093 .

I just stuck it in the FAQ 1 hour ago :-)
I guess great minds think alike :-)

AK


____________
D@H the greatest project in the world... a while from now!
Profile Rebirther
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 63
ID: 52
Credit: 69,033
RAC: 0
Message 3131 - Posted 27 Apr 2007 9:21:10 UTC

Charmm 5.07 for windows:

27/04/2007 11:11:21|Docking@Home|[task_debug] result 1tng_mod0011_117_389293_1 checkpointed
27/04/2007 11:11:23|Docking@Home|[task_debug] result 1tng_mod0011_117_389293_1 checkpointed
27/04/2007 11:11:24|Docking@Home|[task_debug] result 1tng_mod0011_117_389293_1 checkpointed
27/04/2007 11:11:26|Docking@Home|[task_debug] result 1tng_mod0011_117_389293_1 checkpointed
27/04/2007 11:11:29|Docking@Home|[task_debug] result 1tng_mod0011_117_389293_1 checkpointed
27/04/2007 11:11:31|Docking@Home|[task_debug] result 1tng_mod0011_117_389293_1 checkpointed
27/04/2007 11:11:32|Docking@Home|[task_debug] result 1tng_mod0011_117_389293_1 checkpointed
27/04/2007 11:11:34|Docking@Home|[task_debug] result 1tng_mod0011_117_389293_1 checkpointed
27/04/2007 11:11:36|Docking@Home|[task_debug] result 1tng_mod0011_117_389293_1 checkpointed
27/04/2007 11:11:37|Docking@Home|[task_debug] result 1tng_mod0011_117_389293_1 checkpointed
27/04/2007 11:11:40|Docking@Home|[task_debug] result 1tng_mod0011_117_389293_1 checkpointed
27/04/2007 11:11:42|Docking@Home|[task_debug] result 1tng_mod0011_117_389293_1 checkpointed

I see no changes. It would be better if the app checkpointed (wrote to disk) only at every 1% of progress instead of every 2 seconds.

Profile David Ball
Forum moderator
Volunteer tester
Avatar

Joined: Sep 18 06
Posts: 274
ID: 115
Credit: 1,634,401
RAC: 0
Message 3134 - Posted 27 Apr 2007 10:13:33 UTC

Where are you seeing this? I just looked at the _0 through _3 , std???dae.txt and std???gui.txt files for a 90% complete WU (the real files, not the Windows soft links) and I'm not seeing any excessive output like that in them. This is on Vista.

Is there a chance that there's some kind of debug override flag set and that it's also causing the checkpoints as well as reporting them? I've never worked with any of the override flags. OTOH, I'm a programmer and debug code is well known for changing the behavior of a program, although usually it makes the problem you're trying to find go away until you remove the debug code and then the problem comes back :-)

-- David

____________
The views expressed are my own.
Facts are subject to memory error :-)
Have you read a good science fiction novel lately?

Profile Rebirther
Volunteer tester
Avatar

Joined: Sep 13 06
Posts: 63
ID: 52
Credit: 69,033
RAC: 0
Message 3135 - Posted 27 Apr 2007 11:12:19 UTC - in response to Message ID 3134 .
Last modified: 27 Apr 2007 11:12:59 UTC

The debug code is only for testing; without it, it's the same problem. You can use cc_config.xml with <task_debug>1</task_debug> to turn debugging on or off while BOINC is running.
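For anyone who has not created that file before, here is a small sketch that builds its contents. It assumes, as I understand the BOINC client configuration format, that the flag sits inside a `log_flags` element under the `cc_config` root; the `build_cc_config` helper is hypothetical and only generates the text, it does not locate your BOINC directory or write the file.

```python
import xml.etree.ElementTree as ET


def build_cc_config(task_debug=True):
    """Return cc_config.xml contents with the task_debug log flag set."""
    root = ET.Element("cc_config")
    flags = ET.SubElement(root, "log_flags")
    ET.SubElement(flags, "task_debug").text = "1" if task_debug else "0"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_cc_config(True))
```

Setting the flag back to 0 (or removing the file) should stop the [task_debug] lines from appearing in the message log.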
Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 3175 - Posted 28 Apr 2007 22:07:40 UTC - in response to Message ID 3135 .

I just wrote this reply to Reb in an email, in answer to his question and concerns:

Due to the way the application works, we have to checkpoint every rotation or else we wouldn't be able to restart correctly. The fact that the app uses randomness makes this problem even more challenging. The cycles will become longer with more complex molecules (1tng is a very simple one), which means that the checkpointing frequency will automatically go down. For your information: we only write a couple of hundred bytes per checkpoint, which will not damage your hard disk at all; on the contrary, that's what these devices are made for. For example, in our computer cluster we continuously write megabytes per second to the disks and none of them have failed for up to a year now :-) So don't worry too much about your disks, these are very sturdy devices!
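To illustrate why randomness complicates restarts: a checkpoint has to capture the random number generator's state along with the position in the run, otherwise a resumed task would draw a different sequence than an uninterrupted one. The sketch below uses Python's `random` module as a stand-in; the function names and the "rotation" loop are hypothetical and do not reflect how charmm stores its checkpoints.

```python
import pickle
import random


def run_rotations(rng, start, stop):
    """Stand-in for docking rotations: one random draw per rotation."""
    return [rng.random() for _ in range(start, stop)]


def make_checkpoint(rng, next_rotation):
    """Serialize the loop position together with the generator state."""
    return pickle.dumps(
        {"next_rotation": next_rotation, "rng_state": rng.getstate()}
    )


def restore_checkpoint(blob):
    """Rebuild a generator that continues exactly where the old one stopped."""
    saved = pickle.loads(blob)
    rng = random.Random()
    rng.setstate(saved["rng_state"])
    return rng, saved["next_rotation"]


def demo(seed=1234, total=10, stop_at=4):
    # Reference: the run without any interruption.
    reference = run_rotations(random.Random(seed), 0, total)

    # Interrupted run: stop after a few rotations, checkpoint, then resume.
    rng = random.Random(seed)
    first_part = run_rotations(rng, 0, stop_at)
    blob = make_checkpoint(rng, stop_at)
    resumed_rng, next_rotation = restore_checkpoint(blob)
    second_part = run_rotations(resumed_rng, next_rotation, total)
    return reference, first_part + second_part


if __name__ == "__main__":
    ref, resumed = demo()
    print(ref == resumed)
```

If the generator state were left out of the checkpoint, the resumed half would diverge from the reference, which is a problem when results from several hosts have to agree for quorum.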

Thanks
Andre

____________
D@H the greatest project in the world... a while from now!
Profile [B^S] BOINC-SG
Volunteer tester
Avatar

Joined: Oct 2 06
Posts: 17
ID: 136
Credit: 52,985
RAC: 0
Message 3291 - Posted 13 May 2007 10:32:41 UTC

Hi,

I just crunched my first Docking WUs and I was a bit concerned by the checkpointing every second...


13.05.2007 02:44:38|Docking@Home|[task_debug] result 1tng_mod0011_10119_481170_1 checkpointed
13.05.2007 02:44:39|Docking@Home|[task_debug] result 1tng_mod0011_10119_481170_1 checkpointed
13.05.2007 02:44:40|Docking@Home|[task_debug] result 1tng_mod0011_10119_481170_1 checkpointed
13.05.2007 02:44:41|Docking@Home|[task_debug] result 1tng_mod0011_10119_481170_1 checkpointed
13.05.2007 02:44:43|Docking@Home|[task_debug] result 1tng_mod0011_10119_481170_1 checkpointed
13.05.2007 02:44:44|Docking@Home|[task_debug] result 1tng_mod0011_10119_481170_1 checkpointed
13.05.2007 02:44:45|Docking@Home|[task_debug] result 1tng_mod0011_10119_481170_1 checkpointed
13.05.2007 02:44:46|Docking@Home|[task_debug] result 1tng_mod0011_10119_481170_1 checkpointed
13.05.2007 02:44:47|Docking@Home|[task_debug] result 1tng_mod0011_10119_481170_1 checkpointed
13.05.2007 02:44:48|Docking@Home|[task_debug] result 1tng_mod0011_10119_481170_1 checkpointed
13.05.2007 02:44:49|Docking@Home|[task_debug] result 1tng_mod0011_10119_481170_1 checkpointed
13.05.2007 02:44:50|Docking@Home|[task_debug] result 1tng_mod0011_10119_481170_1 checkpointed
13.05.2007 02:44:52|Docking@Home|[task_debug] result 1tng_mod0011_10119_481170_1 checkpointed
13.05.2007 02:44:53|Docking@Home|[task_debug] result 1tng_mod0011_10119_481170_1 checkpointed
13.05.2007 02:44:54|Docking@Home|[task_debug] result 1tng_mod0011_10119_481170_1 checkpointed
13.05.2007 02:44:55|Docking@Home|[task_debug] result 1tng_mod0011_10119_481170_1 checkpointed
13.05.2007 02:44:56|Docking@Home|[task_debug] result 1tng_mod0011_10119_481170_1 checkpointed
13.05.2007 02:44:57|Docking@Home|[task_debug] result 1tng_mod0011_10119_481170_1 checkpointed
13.05.2007 02:44:58|Docking@Home|[task_debug] result 1tng_mod0011_10119_481170_1 checkpointed
13.05.2007 02:44:59|Docking@Home|[task_debug] result 1tng_mod0011_10119_481170_1 checkpointed
13.05.2007 02:45:00|Docking@Home|[task_debug] result 1tng_mod0011_10119_481170_1 checkpointed
13.05.2007 02:45:02|Docking@Home|[task_debug] result 1tng_mod0011_10119_481170_1 checkpointed
13.05.2007 02:45:03|Docking@Home|[task_debug] result 1tng_mod0011_10119_481170_1 checkpointed


But according to SpeedFan, the continuous checkpointing doesn't do much harm to the hard disk (meaning the temperature didn't increase, which it does when I convert a video file, for example).

But still: isn't it possible to somehow collect the data in RAM first and then write it to the hard disk every minute or every 10 minutes? I don't know what's more stressful, though: writing 100 bytes every second or 6 KB every minute (60 KB every 10 minutes)?
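A rough sketch of what "collect in RAM, write occasionally" could look like. This is not charmm code: the `ThrottledCheckpointer` class, the file name and the 60-second interval are assumptions for illustration, and the clock is injectable only so the demo can simulate ten minutes instantly. It keeps just the latest state rather than accumulating every update, and whether skipping intermediate checkpoints is safe depends on what the application needs in order to restart correctly.

```python
import os
import tempfile
import time


class ThrottledCheckpointer:
    """Keep the latest state in memory; write it at most once per interval."""

    def __init__(self, path, min_interval_s=60.0, clock=time.monotonic):
        self.path = path
        self.min_interval_s = min_interval_s
        self.clock = clock
        self._last_write = None   # time of the last disk write, if any
        self._pending = None      # newest state not yet on disk
        self.disk_writes = 0

    def update(self, state):
        """Record new state; flush only if the interval has elapsed."""
        self._pending = state
        now = self.clock()
        if self._last_write is None or now - self._last_write >= self.min_interval_s:
            self.flush()

    def flush(self):
        """Write pending state via a temp file so a crash never leaves half a file."""
        if self._pending is None:
            return
        tmp = self.path + ".tmp"
        with open(tmp, "wb") as fh:
            fh.write(self._pending)
        os.replace(tmp, self.path)
        self._last_write = self.clock()
        self._pending = None
        self.disk_writes += 1


def demo():
    fake_time = [0.0]  # simulated clock, advanced one second per update
    with tempfile.TemporaryDirectory() as workdir:
        cp = ThrottledCheckpointer(
            os.path.join(workdir, "checkpoint.bin"),
            min_interval_s=60.0,
            clock=lambda: fake_time[0],
        )
        for second in range(600):       # ten simulated minutes
            fake_time[0] = float(second)
            cp.update(b"x" * 100)       # 100 bytes of state per update
        return cp.disk_writes


if __name__ == "__main__":
    print(demo())
```

The trade-off is that a crash or shutdown can lose up to one interval of work, so the interval is a balance between disk activity and recomputation. As far as I know, the BOINC API offers boinc_time_to_checkpoint() so that an application can honour the volunteer's disk-write interval preference in a similar way.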

Cheers, Shai
____________


My NEW BOINC-Site

Why people joined BOINC Synergy...
STE\/E [BlackOpsTeam]
Volunteer tester

Joined: Nov 14 06
Posts: 47
ID: 292
Credit: 10,082,802
RAC: 0
Message 3294 - Posted 13 May 2007 13:00:59 UTC
Last modified: 13 May 2007 13:15:46 UTC

But all this checkpointing also has an adverse effect on BOINC.exe: it constantly shows anywhere from 4% up to 17% CPU usage in Task Manager. In fact, I can't even get one PC to run right because BOINC.exe is using 23% to 25% of the CPU (Q6600 quad core). The BOINC Manager will just lock up for seconds at a time if I try to use any of the tabs.

I've uninstalled the BOINC client several times and tried different versions of the client, all with the same results. The only way I got the manager to not use up to 25% of the CPU on that PC was to create a new directory and install BOINC there. It still uses the normal amount of CPU, but at least no more than on the rest of my PCs.

Profile [B^S] BOINC-SG
Volunteer tester
Avatar

Joined: Oct 2 06
Posts: 17
ID: 136
Credit: 52,985
RAC: 0
Message 3295 - Posted 13 May 2007 13:26:10 UTC - in response to Message ID 3294 .

Just checked that and it's true. While crunching Docking, BOINC.exe uses 3-4% CPU constantly, which it doesn't do while crunching WCG, Rosetta or Einstein, for example.
____________


My NEW BOINC-Site

Why people joined BOINC Synergy...
Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 3301 - Posted 13 May 2007 19:56:02 UTC - in response to Message ID 3291 .

That is one of the solutions we are looking at at the moment. It's not an easy thing to do in Charmm, though, because reads and writes are done everywhere in the 300,000 lines of code, and unfortunately we didn't write it: approx. 100 devs work on this code and everybody has their own coding style :-(

But good point anyway!

Andre

PS: We don't write a whole lot of data per checkpoint, and as soon as our problems get bigger the checkpoint interval will grow from seconds to minutes. Remember that our protein-ligand docking problem is currently very small, and because of this charmm spends more time on the disk than on the CPU; when our problems get bigger, you will see the time spent on the CPU take over. Michela is currently working on some new problems that we can distribute soon.


____________
D@H the greatest project in the world... a while from now!
