Posts by Steven Meyer
1)
Message boards : Number crunching : Bug Report - Random Reboots ( Message 5360 )
Posted 2899 days ago by Steven Meyer

"Are you running the BOINC screensaver on that system, Steven? If so, maybe it's the D@H screensaver interacting with the CUDA code."

No screensaver at all, because that would cut into the CUDA throughput: the CUDA code by itself will peg the GPU at 100% usage.

In any case, I decided to just run S@H Op Apps, for CPU and GPU, on the Q6600. The other comp is running S@H Op Apps for CPU and D@H. Since it has no CUDA-capable GPU, it is running with no CUDA.

... and no troubles.
2)
Message boards : Number crunching : Bug Report - Random Reboots ( Message 5338 )
Posted 2912 days ago by Steven Meyer

Garrrrg... the quoting is somewhat messed up. I have cleaned it up below (I hope) to show who said what in reply to which ...

Topolm said: Steven said: I am a lunatic Will try turning off the GPU and turning on D@H.

Steven said: Steven said: As it took some doing to get a version of the nVidia drivers installed that would work with the optimized CUDA app, I'm not willing to go down that road again yet. Will start with checking to see if SETI and DOCKING will co-exist without CUDA, because that is just a setting in options to turn off the use of the GPU, so I will not have to download and install all sorts of drivers and CUDA apps to find an older pair that will work together.

Unfortunately, since the GPU can process about 3 WU per hour while the CPU does less than 1 per hour with each of the 4 processors, disabling the GPU will be a huge reduction in throughput. So I would rather find another way.
3)
Message boards : Number crunching : Bug Report - Random Reboots ( Message 5335 )
Posted 2912 days ago by Steven Meyer

"Yes, I'm aware that you need it for seti on cuda but to rule out that the problem is the nvidia driver you should disable the nvidia driver, reboot and run docking with the windows provided driver. As seti have fixed downtimes you can do that in that timeframe."

I know already that both the nVidia video driver and the D@H code are involved, because the problem began after I upgraded the nVidia video driver, and then it went away after I stopped running D@H.

The S@H CUDA code may also be involved.

It takes all of the components together to create the problem, because by removing just one (D@H) I have a stable system with the other two (nVidia and CUDA).

Now that we know what the components are, can anything be done to make them all co-exist? Or do I just have to give up on running more than the one project?

FYI, here is some of the info from the nVidia control panel.

CPU: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
Operating System: Microsoft Windows XP Professional Service Pack 3, Build 2600
Motherboard Vendor: NVIDIA
Motherboard Version: 2.0
Motherboard Model: NVK84CRB
DirectX: 9.0c (5.3.2600.5512)
Nforce Driver Package: 6.03
Graphics Driver: 190.38 (6.14.11.9038)
Ethernet Driver: 67.72 (1.00.02.06772)
IDE Driver: 9.99.0.8
nTune: 6.03.12
GPU: GeForce 8800 GT
4)
Message boards : Number crunching : Bug Report - Random Reboots ( Message 5332 )
Posted 2913 days ago by Steven Meyer

"Does it happen too if you set in the preferences: Leave applications in memory while suspended?"

I already have that option set. Should I unset it?

"Could you also try to run your system only with the windows provided drivers?"

The problem there is that the S@H CUDA code requires the nVidia video drivers.
5)
Message boards : Number crunching : Bug Report - Random Reboots ( Message 5330 )
Posted 2913 days ago by Steven Meyer

"Docking is the only other BOINC project you have tried... You have made the assumption that SETI is fine and Docking is the problem - if that is what you want to believe, well I doubt we will change your mind."

I could have just abandoned D@H, uninstalled all of its code, and happily gone on my way, with zero random reboots. Instead, I am still writing to this thread. This should suggest to you that I am interested in getting to the bottom of this and that I am not interested in just attaching blame to the easiest target.
6)
Message boards : Number crunching : Bug Report - Random Reboots ( Message 5329 )
Posted 2913 days ago by Steven Meyer

Reverting the video driver is easy, but it cannot really tell me what happened or why. If this method is successful, it will simply tell me that the video driver is involved, because the problem will go away.

Also, the CUDA code from S@H will not work with the older video driver, so that part of the situation will change along with the video driver. I will be testing not just 1 change, but 2.

I need to work on something that will let us see the problem happen, and why. I was thinking more along the lines of some settings that will make BOINC put some debugging info into the log, so that when the error occurs I will know what happened and why.

No one has yet answered my question, which I have asked 4 times, in 4 different ways:

Posted 8 Aug 2009 16:14:16 UTC
Posted 9 Aug 2009 17:03:13 UTC
Posted 16 Aug 2009 15:04:36 UTC
Posted 17 Aug 2009 8:31:08 UTC

If there really is no way to debug BOINC, then please just say so instead of ignoring the questions.

By the way, as of this posting D@H is still not running, and there have still been zero random reboots after almost 10 days. Zero response about debugging and zero reboots without D@H running do not make me want to turn D@H on again.
7)
Message boards : Number crunching : Bug Report - Random Reboots ( Message 5323 )
Posted 2914 days ago by Steven Meyer

"Docking is the only other BOINC project you have tried. My suggestion was to try others and see if they run okay or not. That could point you in the direction of the problem - all you know now is that SETI and Docking won't cohabit on your system."

I am not trying to point fingers, so chill. :D

I wonder if anything can be done to help locate the problem other than trying a bunch of other projects?
8)
Message boards : Number crunching : Bug Report - Random Reboots ( Message 5317 )
Posted 2915 days ago by Steven Meyer

It has been a week since my last post on this subject, and during that week D@H has not been running while S@H has been. Also, I have not had any spontaneous reboots during that week.

"By not doing so, you make the assumption that Docking is at fault and that SETI is fine. The thousands of other Docking users that do not have your problem must just be lucky? Rather than increasing the possibilities, the course of action I suggest narrows your problem. Turning Docking off achieves nothing."

What I have achieved is that now we know that it is very likely that D@H is involved. If the S@H optimized code is the cause, then at the very least D@H is the catalyst. One or the other of these two projects may be stepping on the other in such a way that causes these reboots.

It could also be that S@H is not involved at all. Something in my system is stepping on D@H, or vice-versa.
9)
Message boards : Number crunching : Bug Report - Random Reboots ( Message 5304 )
Posted 2922 days ago by Steven Meyer

"Try another co-project, POEM or Rosetta for example. Try to narrow the problem down, i.e. if the problem is occurring with other co-running projects as well. A lot of the 'optimised' stuff around is not 100% reliable across all hardware systems."

I may try that after a while. I want to reduce the possibilities rather than increase them.

The optimized code seems to be working fine. There have been zero reboots since I suspended the D@H tasks. So I have two time periods during which there were no reboots, and in both of those periods there were also no D@H tasks running. However, this second period has lasted only about 1 day, so I'll be letting it go on for a few more days to make sure.

What about the debugging?
10)
Message boards : Number crunching : Bug Report - Random Reboots ( Message 5301 )
Posted 2923 days ago by Steven Meyer

To clarify, "Random Reboot" is not the infamous "Blue Screen Of Death", which at least reports the probable cause of the problem. When one of these reboots happens, the screen simply goes black, and then the POST appears.