Issues with Charmm 5.07


Krunchin-Keith [USA]
Volunteer tester

Joined: Sep 13 06
Posts: 41
ID: 4
Credit: 1,539,093
RAC: 0
Message 3127 - Posted 27 Apr 2007 3:52:15 UTC

OK, I'll kick this off.

Here's the good and the bad...

Listed as
Host and O/S
Average time of last dozen or so Charmm 5.05 results
Estimated completion of first running Charmm 5.07 all about 50-60% complete

3.80GHz P4-HT WindowsXP - 4:20 hrs vs 2:56 hrs **Faster
3.20GHz P4-HT WindowsXP - 4:45 hrs vs 3:29 hrs **Faster
3.06GHz P4-HT WindowsXP - 5:16 hrs vs 4:09 hrs **Faster

733MHz P3 Linux - 12:51hrs vs 16:25 hrs **Slower, Much Slower.

3.20GHz P4-HT Linux - no data vs 4:20 hrs
This host is identical hardware-wise to the 3.20GHz WindowsXP host above; no other software is currently running except BOINC
5.07 WindowsXP run time 3:29 hours
5.07 Linux run time 4:20 hours

I'll report again tomorrow after some actual tasks complete.
____________
Alpha Tester ~~~ BOINCin since 10-Apr-2004 (2.28) ~~~ Join team USA

John B. Kalla
Volunteer tester

Joined: Oct 18 06
Posts: 54
ID: 188
Credit: 104,643
RAC: 0
Message 3130 - Posted 27 Apr 2007 8:12:25 UTC
Last modified: 27 Apr 2007 8:16:46 UTC

MacOS X 10.4.9, BOINC 5.8.17:

Before: One workunit ~1.6hrs
Last week: One workunit ~11hrs
Today: One workunit ~12.4hrs (estimated, as I'm at 41% and 04:57) (Charmm 5.07)

Slower than last week. Yikes!

____________
John

MacPro
2 x 2.66GHz Dual-Core Xeon | 2GB RAM | ATI x1900 | BOINC 5.9.5

David Ball
Forum moderator
Volunteer tester

Joined: Sep 18 06
Posts: 274
ID: 115
Credit: 1,634,401
RAC: 0
Message 3133 - Posted 27 Apr 2007 9:41:09 UTC

Don't believe the initial estimates of runtime.

I think the estimates are based on a combination of an estimated FLOPS count for the WU, the BOINC client benchmark, and the runtime of the previous WUs. It might take a few completed work units to correct itself.

I just finished my first WU on a Pentium D 925 and it came in at just over 2 hours. That's about an hour shorter than it was and this is on a system with an SATA drive so it wasn't affected much by the extra disk I/O. It's a Windows Vista system, BTW.

-- David

____________
The views expressed are my own.
Facts are subject to memory error :-)
Have you read a good science fiction novel lately?

Aaron Finney
Volunteer tester

Joined: Mar 23 07
Posts: 74
ID: 367
Credit: 2,409,831
RAC: 0
Message 3137 - Posted 27 Apr 2007 11:34:06 UTC - in response to Message ID 3133 .
Last modified: 27 Apr 2007 11:34:28 UTC


I'm blowing through 2 workunits every hour and 20 mins >:P I'm stoked. Before it was 2 hours or more for the same result!
Rebirther
Volunteer tester

Joined: Sep 13 06
Posts: 63
ID: 52
Credit: 69,033
RAC: 0
Message 3140 - Posted 27 Apr 2007 12:46:58 UTC

Charmm 5.06/5.07=5h/2:48h
great improvement :)

Andre Kerstens
Forum moderator
Project tester
Volunteer tester

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 3145 - Posted 27 Apr 2007 14:38:22 UTC - in response to Message ID 3130 .

This is weird. Unfortunately we only have an Intel Mac mini in the lab, but our tests showed a consistent 5 hours running time for this workunit. Your computer is much more powerful than a Mac mini, so you should be lots faster. Weird... Can you send us your input and output files (the ones ending with .inp, x_0, x_1, x_2 and x_3 in the projects/docking.utep.edu directory)? An email to dockingadmin@utep.edu would suffice.

Thanks
Andre



____________
D@H the greatest project in the world... a while from now!
Conan
Volunteer tester

Joined: Sep 13 06
Posts: 219
ID: 100
Credit: 4,256,493
RAC: 0
Message 3146 - Posted 27 Apr 2007 15:10:22 UTC

>> Despite all your hard work Docking team, the new application only improved a bit for Linux but heaps for Windows, so old Linux is getting slugged again.

Windows Intel 2.53@2.66 was ave 12,400 sec (3h 26m) now 8,892 sec (2h 28m).
Windows AMD X2 4800+ was ave 8,580 sec (2h 23m) now 5,148 sec (1h 26m).
Linux AMD Opteron 275 was ave 10,250 sec (2h 51m) now 17,568 sec (4h 53m).
Linux AMD Opteron 285 was ave 8,680 sec (2h 25m) now 17,136 sec (4h 46m).

This is after running a few of the new work units through.

At least the Linux machines are down from the 7 hours plus they were doing last week when things went haywire.
But something is still wrong as my Linux machines are more than 2 hours slower per work unit than before the slow processing problem started.
____________

imaum
Volunteer tester

Joined: Nov 14 06
Posts: 8
ID: 238
Credit: 244,384
RAC: 0
Message 3148 - Posted 27 Apr 2007 15:53:42 UTC

mac os x (10.4.9) and macbook (2GHz Core Duo)
Charmm 5.02, 2h50m/wu
Charmm 5.05, 5h30m/wu
Charmm 5.07, 5h30m/wu

hmm...

John B. Kalla
Volunteer tester

Joined: Oct 18 06
Posts: 54
ID: 188
Credit: 104,643
RAC: 0
Message 3149 - Posted 27 Apr 2007 16:05:25 UTC
Last modified: 27 Apr 2007 16:51:37 UTC

Wow. Maybe it's something to do with my setup, then. I'm pretty sure my prefs are set correctly, now that I enabled more than 2 procs (last month)...

Andre, I emailed you a copy of the files (currently I'm at 11.5hrs @ 95%).
____________
John

MacPro
2 x 2.66GHz Dual-Core Xeon | 2GB RAM | ATI x1900 | BOINC 5.9.5

Rene
Volunteer tester

Joined: Oct 2 06
Posts: 121
ID: 160
Credit: 109,415
RAC: 0
Message 3151 - Posted 27 Apr 2007 17:00:59 UTC

First Linux 5.07 wu finished...

5.07 - 9,563.07 sec
5.05 - between approx. 16,100.00 and 13,400.00
5.04 - approx. 13,300.00
5.02 - same as 5.04

Athlon 2600+

Looks fine to me... ;-)
____________

fubared
Volunteer tester

Joined: Nov 14 06
Posts: 11
ID: 293
Credit: 57,379
RAC: 0
Message 3158 - Posted 28 Apr 2007 2:09:25 UTC

Linux 5.07 is about 5700s, 2000s faster than 5.05/5.06. That's good, since I had about 800 worth of pending credits that were granted a big fat 0. Yes, I got the mail, but it doesn't help if the units were already crunched.

David Ball
Forum moderator
Volunteer tester

Joined: Sep 18 06
Posts: 274
ID: 115
Credit: 1,634,401
RAC: 0
Message 3159 - Posted 28 Apr 2007 2:46:31 UTC
Last modified: 28 Apr 2007 2:58:10 UTC

Computer Current Config
Linux Distro: Ubuntu 6.10
Linux Kernel Version: Linux 2.6.17-11-generic
Linux File System: ext3 rw,errors=remount-ro (8% used)
Linux Swap Space: 1 GB on same drive (one drive system)
Linux GUI: Gnome
CPU: Socket A Sempron 2500+

AuthenticAMD
AMD Sempron(tm) 2500+ [Family 6 Model 8 Stepping 1][fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mp mmxext 3dnowext 3dnow up ts]

Memory: 1 GB PC-3200
Video: on-board S-3 Unichrome (I don't do gaming)
Disk Interface: IDE - old 40 GB drive
NOTE: This computer previously ran Fedora Core 3 with default gui. Also, it doesn't do anything that uses much memory or CPU other than BOINC. The screensaver is turned off. It's set to power down the display after about 10 minutes of no keyboard or mouse activity.

BOINC
Current Version: 5.8.17
Current General Preferences:
...Leave applications in memory while suspended: YES
...Switch between applications every: 60 Minutes
...On multiprocessors, use at most: 16 processors
...Use at most: 100 percent of CPU time
...Use at most: 30 GB disk space
...Leave at least: 5 GB disk space free
...Use at most: 50% of total disk space
...Write to disk at most every: 60 seconds
...Use at most: 75% of page file (swap space)
...Use at most: 40% of memory when computer is in use
...Use at most: 40% of memory when computer is idle

Stats for WU on different versions
Boinc = 5.8.17, D@H = 5.07, Approx seconds per WU = 11,250
Boinc = 5.8.17, D@H = 5.05, Approx seconds per WU = 16,250
Boinc = 5.8.17, D@H = 5.04, Approx seconds per WU = 14,900
Boinc = 5.4.11, D@H = 5.02, Approx seconds per WU = 14,800
Boinc = 5.4.9 , D@H = 5.02, Approx seconds per WU = 14,900
Boinc = 5.4.9 , D@H = 5.02, Approx seconds per WU = 7,500 - Mid October 2006 and earlier

I think that jump from about 7,500 to 14,900 in October 2006 was when the project decided to double the work done per WU.

Hope This Helps,

-- David Ball

EDIT: There's almost no disk activity except BOINC. I see the disk activity light blink every 2 - 3 seconds.
____________
The views expressed are my own.
Facts are subject to memory error :-)
Have you read a good science fiction novel lately?
Frank Boerner
Volunteer tester

Joined: Sep 13 06
Posts: 18
ID: 101
Credit: 744,548
RAC: 0
Message 3162 - Posted 28 Apr 2007 6:50:56 UTC

Hi,

On the Intel Macs the crunching time is about 11.15 hours, the same as with 5.06.

http://docking.utep.edu/result.php?resultid=163561

BOINC 5.8.17 http://docking.utep.edu/show_host_detail.php?hostid=1334.

On this old PPC Mac http://docking.utep.edu/show_host_detail.php?hostid=1868 the time is about 6.98 hours.

On this PPC Mac http://docking.utep.edu/show_host_detail.php?hostid=1875 I think it would be around 8-9 hours.

On one of my Windows PCs http://docking.utep.edu/show_host_detail.php?hostid=1660 the time http://docking.utep.edu/result.php?resultid=163260 is 1.97 hours.

Krunchin-Keith [USA]
Volunteer tester

Joined: Sep 13 06
Posts: 41
ID: 4
Credit: 1,539,093
RAC: 0
Message 3164 - Posted 28 Apr 2007 13:17:31 UTC - in response to Message ID 3127 .


Back to my original report:

Hosts

ID 5
Completed 7 results for Charmm 5.07 Windows
at 2:44:48 to 2:47:52 (184s difference min-max)
which is faster than 5.05 was at 4:20:00
**GOOD**

---

ID 1829 - Twin1
GenuineIntel Intel(R) Pentium(R) 4 CPU 3.20GHz
[x86 Family 15 Model 4 Stepping 1]
[fpu tsc pae nx sse sse2 mmx]
Microsoft Windows XP Professional Edition, Service Pack 2, (05.01.2600.00)

ID 1932 - Twin2
GenuineIntel Intel(R) Pentium(R) 4 CPU 3.20GHz
[Family 15 Model 4 Stepping 1]
[fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc pni monitor ds_cpl cid xtpr]
Linux 2.6.17-10-generic (Ubuntu 6)

Twin2 is dual boot and can run either Windows (ID:1828) or Linux (ID:1932).

Both use the same BOINC venue settings.

You can look at results for host 1828 (Twin2 on windows) vs host 1829 (Twin1 on windows) and see that for Charmm 5.05 the run times all were within the same run time range, average 17000s.

Twin1
Completed 6 Charmm 5.07 windows tasks
at 11036s to 11376s (340s difference min-max)
which beats previous times of ~17000s
**GOOD**
I compared these times against one other host I found with same processor on docking and they were about the same.

Twin2
Completed 5 Charmm 5.07 linux tasks
at 16892 to 17323 (431s difference min-max)
I do not have any 5.05 times for this host, sorry; I just installed Linux on it a few days ago.
**NEUTRAL**

BUT .
The Windows 5.07 app on the same-speed processor and identical hardware is outperforming the Linux 5.07 app.
approx 11206s vs 17107s difference of 5901s or 98min or 1.63 hours

I could live with a small difference, say a few minutes.

**Note**
At this time there are no other applications installed or running on these hosts.
They were both set up with the O/S, BOINC installed and run, that's it.
No modifications to hardware (except memory upgrade, they both have same amount, brand and type.).

Bottom line:
The WindowsXP hosts can crunch more work per 24 hours for a bigger benefit to science.
Andre Kerstens
Forum moderator
Project tester
Volunteer tester

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 3166 - Posted 28 Apr 2007 15:22:37 UTC - in response to Message ID 3164 .

Thanks for all the useful posts, people. From all of these results, it looks like we have to go back to some sort of variable credit scheme: there is just too much variability between results to continue with the fixed credit. In the lab we have begun doing experiments with bigger molecules and we've observed that the variability becomes even worse. Anybody that has a view on what the state of the boinc client benchmarks is right now? Are linux/windows/macs in the same league now? If they are we could probably go back to variable credits immediately; if they are not, that might not be a great solution either.

Regarding the windows execution times: the reduction in time is just as big a surprise for us as it is for you: the executable has been compiled with exactly the same compiler flags, which means that the new input file is just so much more optimized. The question is why is the same not happening for Macs and Linux boxes? That is exactly what we are researching right now and we hope to have an answer soon.

Thanks
Andre


____________
D@H the greatest project in the world... a while from now!
Krunchin-Keith [USA]
Volunteer tester

Joined: Sep 13 06
Posts: 41
ID: 4
Credit: 1,539,093
RAC: 0
Message 3168 - Posted 28 Apr 2007 16:35:04 UTC - in response to Message ID 3166 .

Anybody that has a view on what the state of the boinc client benchmarks is right now? Are linux/windows/macs in the same league now?


I just ran a set

Twin1 Windows cc5.8.16
1398 floating point MIPS (Whetstone) per CPU
1657 integer MIPS (Dhrystone) per CPU

Twin2 Linux cc5.8.17
1427 floating point MIPS (Whetstone) per CPU
1794 integer MIPS (Dhrystone) per CPU

I would say these are fairly close


Aaron Finney
Volunteer tester

Joined: Mar 23 07
Posts: 74
ID: 367
Credit: 2,409,831
RAC: 0
Message 3172 - Posted 28 Apr 2007 17:48:47 UTC - in response to Message ID 3166 .
Last modified: 28 Apr 2007 17:50:31 UTC

Anybody that has a view on what the state of the boinc client benchmarks is right now? Are linux/windows/macs in the same league now? If they are we could probably go back to variable credits immediately; if they are not, that might not be a great solution either.


According to the alpha test mailing list, there are some funky things going on with benchmarking in the entire 5.9 series of BOINC. It hasn't quite settled yet. The official releases seem to be unaffected, and have no benchmarking issues that I know of..
John B. Kalla
Volunteer tester

Joined: Oct 18 06
Posts: 54
ID: 188
Credit: 104,643
RAC: 0
Message 3178 - Posted 29 Apr 2007 17:57:56 UTC
Last modified: 29 Apr 2007 18:01:29 UTC

MacPro 2.66GHz Xeon(four cores), MacOS X 10.4.9, BOINC 5.8.17:

2162 floating point MIPS/CPU
6673 integer MIPS/CPU

Came up with the same result whether I enabled one proc or four. Times to complete charmm 5.07 units:

1 proc enabled: 3.5hrs
3 procs enabled: 5.3hrs
4 procs enabled: 11-12hrs

____________
John

MacPro
2 x 2.66GHz Dual-Core Xeon | 2GB RAM | ATI x1900 | BOINC 5.9.5

Andre Kerstens
Forum moderator
Project tester
Volunteer tester

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 3179 - Posted 29 Apr 2007 22:02:20 UTC - in response to Message ID 3178 .

John,

I assume that when you say 1 proc enabled, there is one charmm process running on your machine (you can check using the top command in a terminal); when you enable 3 processors, you see 3 charmm processes running, using 99% of 3 of your CPUs; and when you enable all of them, you see 4 charmm processes running. Correct?

If this is the case, I don't understand why your execution times are going up so much; if I do the same experiment on an SMP Linux box, both processes finish in about the same time as one would when running alone on the machine.

Thanks
Andre


____________
D@H the greatest project in the world... a while from now!
zombie67 [MM]
Volunteer tester

Joined: Sep 18 06
Posts: 207
ID: 114
Credit: 2,817,648
RAC: 0
Message 3180 - Posted 29 Apr 2007 23:33:47 UTC

Yeah. I don't understand those numbers either. They should all be roughly the same.
____________
Dublin, CA
Team SETI.USA

David Ball
Forum moderator
Volunteer tester

Joined: Sep 18 06
Posts: 274
ID: 115
Credit: 1,634,401
RAC: 0
Message 3181 - Posted 30 Apr 2007 0:27:37 UTC

One thing I have noticed on my linux box (single core) is that other projects like Rosetta spend about 0.3 % in system time and Docking spends about 18 % to 25 % in system time.

Is it the Fortran code that reads the input file over and over or does the wrapper feed it to the program? Is that file being closed and opened each time?

I'd love to see the wrapper code. There's something in the app that is making a *nix OS call that really ties up the system. Of course, it could also be in the core client. I guess I'm going to have to learn subversion since BOINC switched to it from CVS.

In the wrapper function that checks to see if it's time to checkpoint, could you put some code like the following at the top of the function and see what it does to the % of time spent in system.

static unsigned int CKCounter = 0;
if( (++CKCounter % 25) != 0 ) return; // do not checkpoint

Maybe that routine is being called a lot and some OS call in it causes the problem.

Speaking of the BOINC core client: I'm a couple of hundred messages behind at the moment, but I noticed that the idea of adding calls to setrlimit() has resurfaced. If they reduce the stack limit, they may not have the authority to increase it again, so that could be a problem for Docking.

Happy Crunching,

-- David

____________
The views expressed are my own.
Facts are subject to memory error :-)
Have you read a good science fiction novel lately?

Conan
Volunteer tester

Joined: Sep 13 06
Posts: 219
ID: 100
Credit: 4,256,493
RAC: 0
Message 3182 - Posted 30 Apr 2007 1:29:45 UTC

> I currently run Docking on 4 computers, 2 Windows and 2 Linux.
The Windows machines are flying along; I have increased my % share of resources so they will do more Docking (Intel P4 2.53@2.66 takes 8600 sec and AMD X2 4800+ takes 5200 sec).

My 2 Linux machines are both dual processor dual core AMD Opterons (a 275 and 285). These machines now take up to 18,000 sec (285) and 19,500 sec (275), which seems to be increasing, returning just 9 to 10 cobblestones per hour.

It does not matter whether one Docking WU is running or four; they all take the same amount of time.

At this stage I will complete my current WUs on Linux and stop any further work there, leaving only the Windows machines going, as Windows performs very well on this project, just not Linux/Mac. That was not the original desire of the Project Team, for it was Windows that originally did not run well while Linux zoomed along. It has all reversed.

I have not left; I am still here, waiting.
____________

John B. Kalla
Volunteer tester

Joined: Oct 18 06
Posts: 54
ID: 188
Credit: 104,643
RAC: 0
Message 3194 - Posted 30 Apr 2007 15:04:32 UTC - in response to Message ID 3179 .


Yes, that's correct. All charmm processes finish at about the same time. It's just that the time increases with the more processors I enable. Very weird...

____________
John

MacPro
2 x 2.66GHz Dual-Core Xeon | 2GB RAM | ATI x1900 | BOINC 5.9.5
mikus
Volunteer tester

Joined: Oct 28 06
Posts: 18
ID: 193
Credit: 2,915,329
RAC: 0
Message 3203 - Posted 30 Apr 2007 23:45:32 UTC - in response to Message ID 3181 .

One thing I have noticed on my linux box (single core) is that other projects like Rosetta spend about 0.3 % in system time and Docking spends about 18 % to 25 % in system time.

Linux (64-bit Ubuntu 7.04, 32-bit boinc 5.9.4, 32-bit Charmm 5.07).

Noticed that the application starts four processes. Using strace on the second (next-to-lowest numbered) of these processes, it is __continuously__ issuing 'gettimeofday()' calls. Seems to me it is *this* activity that is the cause of the huge demand for Linux system time by recent Docking application versions.
.

Andre Kerstens
Forum moderator
Project tester
Volunteer tester

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 3205 - Posted 1 May 2007 2:24:23 UTC - in response to Message ID 3203 .
Last modified: 1 May 2007 2:24:57 UTC

Thanks Mikus,

I agree with you. We have found this as well (using the same strace method) and are trying to find out which part of charmm (or maybe compiler flag) is causing all of these gettimeofday calls. Have come up with nothing yet, but good chance that this is causing the increased running times on linux and possibly mac.

Andre



____________
D@H the greatest project in the world... a while from now!
Rene
Volunteer tester

Joined: Oct 2 06
Posts: 121
ID: 160
Credit: 109,415
RAC: 0
Message 3207 - Posted 1 May 2007 5:19:43 UTC

Are these 'gettimeofday()' calls used for the following...

Use network only between the hours of
Enforced by versions 4.46 and greater


or are those calls monitoring the actual time of the unit?
Just a question..

;-)
____________
j2satx
Volunteer tester

Joined: Dec 22 06
Posts: 183
ID: 339
Credit: 16,191,581
RAC: 0
Message 3211 - Posted 1 May 2007 12:00:27 UTC - in response to Message ID 3203 .


@Mikus, Does your 32-bit BOINC Manager work with your 64-bit Ubuntu?
mikus
Volunteer tester

Joined: Oct 28 06
Posts: 18
ID: 193
Credit: 2,915,329
RAC: 0
Message 3212 - Posted 1 May 2007 12:55:02 UTC - in response to Message ID 3211 .
Last modified: 1 May 2007 12:59:25 UTC

Mikus, Does your 32-bit BOINC Manager work with your 64-bit Ubuntu?

Yes.

I'm using only the Berkeley BOINC packages (which include boinc and boincmgr). So far, they have been available for Linux only in 32-bit. [On 64-bit Ubuntu, some versions of boincmgr (but 5.9.4 works) would "hang" (100% CPU) if the terminal from which I started boincmgr had 'ulimit -s unlimited'. On 64-bit SuSE, the ulimit setting made no difference to boincmgr - it always worked.]

What is more interesting to me is that even with the 32-bit boinc client, I am able to download 64-bit applications (if available) and use app_info.xml to run them. For example, ABC@home workunits run much faster under the 64-bit application than under the 32-bit application. [Docking aims to use "homogeneous redundancy", so does not support the use of app_info.xml.]
.
suguruhirahara
Forum moderator
Volunteer tester

Joined: Sep 13 06
Posts: 282
ID: 15
Credit: 56,614
RAC: 0
Message 3261 - Posted 7 May 2007 14:38:25 UTC
Last modified: 7 May 2007 14:39:36 UTC

http://docking.utep.edu/result.php?resultid=181260

Here's the log.

<core_client_version>5.8.8</core_client_version>
<![CDATA[
<message>
ファイルの終わりです。 (0x26) - exit code 38 (0x26) [English: Reached the end of the file.]
</message>
<stderr_txt>
Starting charmm run...
Starting charmm run...
forrtl: 要求された操作はユーザー マップ セクションで開いたファイルでは実行できません。 [English: The requested operation cannot be performed on a file with a user-mapped section open.]

</stderr_txt>
]]>

I'm not quite sure, but surely it's related to 'forrtl'. I searched the forum for the same exit code, and I found this: http://docking.utep.edu/forum_thread.php?id=77&nowrap=true#1232
It is what I reported as an issue with charmm 5.03, a very old one.

Also I noticed finally this thread should be made sticky.

suguruhirahara
____________

I'm a volunteer participant; my views are not necessarily those of Docking@Home or its participating institutions.
Rene
Volunteer tester
Avatar

Joined: Oct 2 06
Posts: 121
ID: 160
Credit: 109,415
RAC: 0
Message 3265 - Posted 9 May 2007 5:28:44 UTC

No issues up till now for me... only thing I had to alter was my Win Avast scanner.
A while ago I did add a "skip scanning" rule to prevent Avast from constantly scanning the 1tng.crt file.
Yesterday I saw a constant scanning of the 1tng.bin file... both cores were running Docking... "bin"-file has been added now to those rules.

;-)
____________

Andre Kerstens
Forum moderator
Project tester
Volunteer tester
Avatar

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 3281 - Posted 12 May 2007 21:42:38 UTC - in response to Message ID 3261 .

Thanks Suguru. I'm going with the post you refer to, as I can read the English in that message, and it seems to me that Charmm is trying to open a file that is already open in another process. Often these things occur due to a virus scanner or something else scanning through files. Could you track down whether this might have been happening? I've seen this error quite a bit and really suspect the virus scanners...

Thanks
AK


____________
D@H the greatest project in the world... a while from now!
Conan
Volunteer tester

Joined: Sep 13 06
Posts: 219
ID: 100
Credit: 4,256,493
RAC: 0
Message 3284 - Posted 13 May 2007 0:14:57 UTC - in response to Message ID 3182 .

> I currently run Docking on 4 computers, 2 Windows ans 2 Linux.
The Windows machines are flying along, I have increased my % share of resouces so they will do more Docking (Intel P4 2.53@2.66 takes 8600 sec and AMD X2 4800+ takes 5200 sec).

My 2 Linux machines are both dual processor dual core AMD Opterons (a 275 and 285). These machines now take up to 18,000 sec (285) and 19,500 sec (275), which seems to be increasing, returning just 9 to 10 cobblestones per hour.

It does not matter whether one Docking WU is running or four; they all take the same amount of time.

At this stage I will complete my current WUs on Linux and stop any further work there, leaving just the Windows machines going, as Windows performs very well on this project, just not Linux/Mac. That was not the original desire of the Project Team, for it was Windows that originally did not run well while Linux zoomed along. It has all reversed.

I have not left; I am still here, waiting.



> I have only been using my Windows machines for the last 1 to 2 weeks due to the slow processing times on Linux.
I thought this might have changed, so I downloaded 5 WUs to my AMD Opteron 275 Linux Fedora Core 6 machine.
Unfortunately things have gotten worse. It was taking an average of about 10,500 seconds (somewhere around 2h 55m) before 5.07, and then went to 19,500 seconds on 5.07, with a few hitting 23,000 seconds.
Well, now it has increased to 28,000 seconds (about 7h 47m), and the corresponding granted cobblestones have dropped from around 17/h to 9-10/h, and now they are 6-6.5/h.

What I can't understand is that some other users using Linux, paired against me, still have run times similar to before, mostly less than 3 hours.
What gives?
I have changed nothing on my machine (my other Opteron has the same problem), and other projects work as normal.
____________
Profile David Ball
Forum moderator
Volunteer tester

Joined: Sep 18 06
Posts: 274
ID: 115
Credit: 1,634,401
RAC: 0
Message 3298 - Posted 13 May 2007 19:13:18 UTC


@Conan

If you look at "top", where it shows the percentage of CPU time spent in different states, other projects spend almost all of their time in either the "user" or "nice" CPU state, but Charmm is spending a LOT of time in the "system" state (running inside the OS). For instance, I have noticed on my Linux box (single-core Socket A Sempron 2500+) that Rosetta spends about 0.3% in the "system" state and Docking spends about 18% to 25% in the "system" state. Apparently the more cores you have, the worse off you are. The Core 2 Quads are really getting huge run times.
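For anyone who wants numbers rather than eyeballing "top", here is a minimal sketch of the same measurement. It assumes a Linux host where /proc/stat is available; the function names are made up for illustration and nothing here comes from BOINC or Charmm. It takes two snapshots of the aggregate "cpu" line and reports the share of elapsed ticks spent in each state.

```python
# Sketch: share of CPU time per state between two samples of the
# aggregate "cpu" line in Linux's /proc/stat.
# Field order per proc(5): user nice system idle iowait irq softirq steal.
# Older kernels expose fewer fields; missing ones are simply skipped.

FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn 'cpu  4705 150 1120 ...' into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def state_percentages(before, after):
    """Percentage of elapsed ticks spent in each state between two samples."""
    deltas = {k: after[k] - before[k] for k in before if k in after}
    total = sum(deltas.values())
    if total <= 0:
        return {k: 0.0 for k in deltas}
    return {k: 100.0 * d / total for k, d in deltas.items()}


def sample(path="/proc/stat"):
    """Read the current counters (Linux only)."""
    with open(path) as f:
        return parse_cpu_line(f.readline())
```

To use it, call sample(), wait a few seconds while only the task of interest is running, call sample() again, and pass both results to state_percentages(). A BOINC task's own computation should show up mostly under "nice"; a large "system" share is the symptom described above.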

Also, it's doing a lot of disk access, reading the same file over and over. Of course, after the first time, it's just reading it from the OS disk cache, but I thought it might be spending time updating the file access time on the file it was reading so I ran a test. I set "noatime" on a partition, rebooted, and ran a docking WU. It still spent all that time in the system state so I went back to using the defaults on that partition. Of course, "noatime" wouldn't prevent it from updating the time on the output file, which it is probably writing to continuously.

I still think there's something going on with the disk. On my systems (2 WinXP, 1 Vista, 1 Ubuntu Linux), the systems with older IDE drives slowed down and the systems with new IDE drives or SATA drives got up to 1/3 faster. Of course, the systems with new IDE or SATA drives also had faster CPUs. My PD 925 system experienced a 1/3 speedup. IIRC, it does a docking WU (per core) in just over 2 hours with both cores running docking.

On one of my XP machines (single-core Socket 754 Sempron 3100+), I've noticed a rhythmic sound from the disk about once per second where it seems to do a couple of really long seeks.

@Andre and Memo

Is there a thread in the BOINC core client or the Docking application which checks the time (or maybe checks a pipe or mutex for an event) and then yields the CPU if there's nothing to do (sort of a polling loop)? There might be an OS function to yield the CPU, but on many systems you can also yield the CPU by doing a delay for zero milliseconds. Normally, that would use very little "system" state time. If the application IS disk I/O bound, then that routine would get the CPU a lot and "system" state time would replace "wait" state time in "top".
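To make the question concrete, here is a small sketch of the kind of polling loop being asked about, next to the blocking alternative. It is purely hypothetical: the names are invented and it is not taken from the BOINC client or the Docking application. The point is only that a zero-length delay hands the CPU back but still goes through the OS on every pass, so a loop that spins like this for a long time would be expected to accumulate "system" time.

```python
import threading
import time


def poll_until(event, timeout, pause=0.0):
    """Poll `event`, yielding the CPU between checks.

    Returns (was_set, number_of_checks). With pause=0.0 the sleep is the
    "delay for zero milliseconds" style of yield.
    """
    deadline = time.monotonic() + timeout
    checks = 0
    while True:
        checks += 1
        if event.is_set():
            return True, checks
        if time.monotonic() >= deadline:
            return False, checks
        time.sleep(pause)  # give up the CPU, then come straight back


def block_until(event, timeout):
    """Alternative: sleep in the OS until the event fires; no polling."""
    return event.wait(timeout)
```

Calling poll_until() on an event that is never set and looking at the returned check count gives a rough feel for how many times such a loop goes around in a given interval. block_until() waits for the same condition without spinning.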

Hmm, if it's doing the "disk I/O" a LOT and what it's reading is coming from the disk cache, wouldn't that show up as "system" state time instead of "I/O wait" state time, since there's no actual wait, only a memory copy in the kernel or driver? Does Fortran buffer that input file like C/C++ does, or is there a chance it's actually issuing a million OS calls to read a 1 MB input file? Does Fortran think that's a block device or a character device?
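As an illustration of why buffering matters here, the sketch below counts how many low-level read requests it takes to consume the same data one byte at a time, with and without a buffer in between. It uses an in-memory stand-in rather than a real file, so each readinto() call only stands in for one OS read call. It says nothing about what the Fortran runtime actually does; the class and function names are invented for the example.

```python
import io


class CountingRaw(io.RawIOBase):
    """In-memory stand-in for a file that counts low-level read requests."""

    def __init__(self, data):
        self._data = data
        self._pos = 0
        self.calls = 0  # one increment per simulated OS read call

    def readable(self):
        return True

    def readinto(self, buf):
        self.calls += 1
        chunk = self._data[self._pos:self._pos + len(buf)]
        buf[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)  # 0 signals end of file


def count_reads(data, buffered):
    """Read `data` one byte at a time; return the number of raw requests."""
    raw = CountingRaw(data)
    stream = io.BufferedReader(raw, buffer_size=8192) if buffered else raw
    while stream.read(1):  # the application asks for a single byte each time
        pass
    return raw.calls
```

With 64 KiB of data, count_reads(data, buffered=False) makes one request per byte plus one for end-of-file, while the buffered version needs only a handful of 8 KiB refills. Scaled up to a 1 MB input file, that is the difference between roughly a million trips into the OS and a few hundred.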

Sorry. The programmer in me can't help but try debugging things. I guess I spent too many years writing OS code and runtime libraries :-)

-- David

____________
The views expressed are my own.
Facts are subject to memory error :-)
Have you read a good science fiction novel lately?

Odysseus

Joined: May 6 07
Posts: 18
ID: 376
Credit: 3,914
RAC: 0
Message 3328 - Posted 16 May 2007 0:45:06 UTC
Last modified: 16 May 2007 0:48:43 UTC

I just attached my G4/733 running BOINC v5.8.17 under Mac OS 10.3.9, and the first three results it returned errored out immediately with “exit status 5 (0x5)”. The output files say:

<message>
process got signal 5
</message>
<stderr_txt>
dyld: charmm_5.7_powerpc-apple-darwin Undefined symbols:
charmm_5.7_powerpc-apple-darwin undefined reference to _statvfs expected to be defined in /usr/lib/libSystem.B.dylib

P.S. I notice this host is being grouped with G5s running Mac OS 10.4.x—obviously these particular results never got that far, but could this degree of ‘heterogeneity’ affect validation?
Odysseus

Joined: May 6 07
Posts: 18
ID: 376
Credit: 3,914
RAC: 0
Message 3338 - Posted 16 May 2007 16:02:31 UTC

Here are the messages logged by BOINC Manager for one of those tasks:

Tue May 15 17:23:12 2007|Docking@Home|Starting 1tng_mod0011_7508_204189_1
Tue May 15 17:23:12 2007|Docking@Home|Starting task 1tng_mod0011_7508_204189_1 using charmm version 507
Tue May 15 17:23:13 2007|Docking@Home|Deferring communication for 1 min 0 sec
Tue May 15 17:23:13 2007|Docking@Home|Reason: Unrecoverable error for result 1tng_mod0011_7508_204189_1 (process got signal 5)
Tue May 15 17:23:13 2007|Docking@Home|Computation for task 1tng_mod0011_7508_204189_1 finished
Tue May 15 17:23:13 2007|Docking@Home|Output file 1tng_mod0011_7508_204189_1_0 for task 1tng_mod0011_7508_204189_1 absent
Tue May 15 17:23:13 2007|Docking@Home|Output file 1tng_mod0011_7508_204189_1_1 for task 1tng_mod0011_7508_204189_1 absent
Tue May 15 17:23:13 2007|Docking@Home|Output file 1tng_mod0011_7508_204189_1_2 for task 1tng_mod0011_7508_204189_1 absent
Tue May 15 17:23:13 2007|Docking@Home|Output file 1tng_mod0011_7508_204189_1_3 for task 1tng_mod0011_7508_204189_1 absent

Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 3343 - Posted 16 May 2007 19:54:48 UTC - in response to Message ID 3328 .

Hmmm, we compile on 10.4 (don't have 10.3). I wonder if there are functions in the 10.4 system library that are missing in 10.3. I think you are the first Panther user and so we didn't see this before. We'll check.
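A quick way to check a guess like this on a given machine is to ask the dynamic loader whether a symbol resolves at all. The sketch below is a generic Unix-only probe using Python's ctypes, not a project tool, and the helper name is made up. Note that the C-level name is "statvfs"; the leading underscore in the dyld message is just how Mach-O spells symbols.

```python
import ctypes


def has_symbol(name, library=None):
    """True if `name` resolves in `library`.

    library=None looks in the libraries already loaded into the current
    process (works on Linux and Mac OS X, not on Windows).
    """
    lib = ctypes.CDLL(library)
    try:
        getattr(lib, name)  # ctypes performs the symbol lookup on attribute access
    except AttributeError:
        return False
    return True
```

Running has_symbol("statvfs") on the 10.3.9 host and on a 10.4 host would show whether the function is simply absent from the older libSystem, which is what the dyld message suggests.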

Andre

I just attached my G4/733 running BOINC v5.8.17 under Mac OS 10.3.9, and the first three results it returned errored out immediately with “exit status 5 (0x5)”. The output files say:
<message>
process got signal 5
</message>
<stderr_txt>
dyld: charmm_5.7_powerpc-apple-darwin Undefined symbols:
charmm_5.7_powerpc-apple-darwin undefined reference to _statvfs expected to be defined in /usr/lib/libSystem.B.dylib

P.S. I notice this host is being grouped with G5s running Mac OS 10.4.x—obviously these particular results never got that far, but could this degree of ‘heterogeneity’ affect validation?


____________
D@H the greatest project in the world... a while from now!
Odysseus

Joined: May 6 07
Posts: 18
ID: 376
Credit: 3,914
RAC: 0
Message 3348 - Posted 17 May 2007 3:29:26 UTC - in response to Message ID 3343 .
Last modified: 17 May 2007 3:30:32 UTC

Hmmm, we compile on 10.4 (don't have 10.3). I wonder if there are functions in the 10.4 system library that are missing in 10.3. I think you are the first Panther user and so we didn't see this before. We'll check.

Thanks. I was preoccupied today and forgot to stop my host from asking for more work, so I’m afraid it’s coughed up a few more failed results, and will probably add a couple more overnight. Sorry about that: I’ll try and remember to set it to NNT for D@h tomorrow morning.
Odysseus

Joined: May 6 07
Posts: 18
ID: 376
Credit: 3,914
RAC: 0
Message 3351 - Posted 19 May 2007 8:57:16 UTC - in response to Message ID 3298 .
Last modified: 19 May 2007 8:57:55 UTC

If you look at "top", where it shows the percent of cpu time spent in different states, Other projects spend almost all of their time in either "user" or "nice" CPU state, but Charmm is spending a LOT of time in the "system" state (running inside the OS).

I’ve noticed this too on my G5 Mac (with Activity Monitor, essentially a GUI for top): when running other projects there’s very little non-“nice” activity, but a fair bit of “system” activity appears (on both CPUs) when a Docking task is running. The only other project whose app I’ve noticed behaving this way is SzTAKI Desktop Grid.
Profile Andre Kerstens
Forum moderator
Project tester
Volunteer tester

Joined: Sep 11 06
Posts: 749
ID: 1
Credit: 15,199
RAC: 0
Message 3353 - Posted 20 May 2007 0:41:47 UTC - in response to Message ID 3351 .

We're looking into this, but haven't found the issue causing the high system overhead yet. Thanks for the report.

AK

If you look at "top", where it shows the percent of cpu time spent in different states, Other projects spend almost all of their time in either "user" or "nice" CPU state, but Charmm is spending a LOT of time in the "system" state (running inside the OS).

I’ve noticed this too on my G5 Mac (with Activity Monitor, essentially a GUI for top): when running other projects there’s very little non-“nice” activity, but a fair bit of “system” activity appears (on both CPUs) when a Docking task is running. The only other project whose app I’ve noticed behaving this way is SzTAKI Desktop Grid.


____________
D@H the greatest project in the world... a while from now!
