>> To the Docking project team,
While checking in the results which other computers are processing the same workunits as mine, I noticed that a couple of testers are having a problem and getting nothing but errors.
Computer Id 207
AMD-K6 (tm) 3D processor
Linux 2.6.16-2-486
Joined 18/9/06
Total cobblestones = 0.00
Recent RAC = 0.00
client version 5.4.11
number of results = 94
All have a cpu time of 0.00 and all have the error 'process got signal 4'.
Computer Id 597
AMD 64 X2 4200+
Linux 2.6.17-10-server
Joined 7/11/06
Total cobblestones = 0.00
Recent RAC = 0.00
number of results = 6
2 WU's have been aborted and the other 4 error out with 'exited with code 1'.
Unsure of the problem with Computer 207, but I think that Computer 597 needs the 'ulimit' fix.
Is it possible to contact and get these people working or do you have to wait for them to contact you? The owner of Computer 207 does not seem to have done this since connecting to the project.
____________
The onus to fix something should not be on the computer owner but on the project admins who should fix the client.
My own Linux box is not doing Docking@Home any more because of this.
____________
You were supposed to be an alpha tester.
If you were not, you were of little use, so don't worry.
Have a good stable day.
Conan, don't be afraid, your idea is very good indeed. Anyway, we are coming to the big problem: are these users alpha testers in name only? Can't they see there's something wrong with their machines? :)
Alpha testers aren't supposed to fix problems, they're supposed to test and find problems. Hence the word "tester". Short term workarounds to problems are fine, but to suggest that the solution is to have the project admins contact users to get them to implement a workaround is, IMHO, the wrong solution to the problem.
____________
You are probably right, I'm more an admin-inside than an alpha tester.
Anyway, the best alpha tester is the one who never complains if a problem needs a workaround and is not fixed in a few minutes. Hence the word alpha.
> Sorry Marky-UK and daniele, I wasn't out to cause any problems.
I could see that these testers have not returned any successful results, and because I had a bit of bother when I started I thought I would offer a form of help by getting the admin people to contact them.
The Docking project said it needed Linux testers and even sent out more access codes to get Linux people to join.
These are both Linux users but are not contributing with the computers I noted before.
Have also found this one
Computer ID 509
AMD K6 3D+ Processor
Joined 15/10/06
Total credit = 0.00
RAC = 0.00
Results = 22
All results are error "received signal 4" ( this is the same as the other K6 I mentioned above)
I am offering help but through the Project Staff, who have access to who owns what computers (I think).
If we can get people like this connected then it speeds up work flow with not having to keep resending out extra results all the time.
Just trying to be helpful.
Keep smiling, it makes people wonder what you have been up to.
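For anyone who wants to keep a similar tally, here is a minimal Python sketch of the triage described in this post. The host numbers and error strings are copied from the posts above; the `HOSTS` list and the `suggest_action` and `triage` names are made up for illustration, and the suggested follow-ups are only the guesses voiced in this thread, not anything the project has confirmed.

```python
# Hypothetical triage helper. Host data is copied from the posts above;
# the suggested actions are this thread's working guesses only.
HOSTS = [
    {"id": 207, "results": 94, "error": "process got signal 4"},
    {"id": 597, "results": 6, "error": "exited with code 1"},
    {"id": 509, "results": 22, "error": "received signal 4"},
]


def suggest_action(error_text):
    """Map an error string to the follow-up suggested in this thread."""
    text = error_text.lower()
    if "signal 4" in text:
        return "cause unclear, needs investigation by the project team"
    if "code 1" in text:
        return "likely needs the 'ulimit' fix"
    return "unknown error, needs a closer look"


def triage(hosts):
    """Return a dict of host id -> suggested follow-up."""
    return {host["id"]: suggest_action(host["error"]) for host in hosts}


for host_id, action in triage(HOSTS).items():
    print(f"Host {host_id}: {action}")
```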
____________
Signal 4 (SIGILL) is an illegal instruction. It would probably require an application compiled especially for that CPU, which is similar to a 486. Since the application is closed source, there's not really anything the user can do to fix it.
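As a quick sanity check on the signal number, Python's standard `signal` module can map it back to a name. This is only an illustration (the `describe_signal` helper is a made-up name), and signal numbering is platform-specific, so the result mentioned below applies to Linux.

```python
import signal


def describe_signal(number):
    """Return the symbolic name for a signal number, or a fallback string."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # The number is not a signal known on this platform.
        return f"unknown signal {number}"


print(describe_signal(4))
```

On Linux, `describe_signal(4)` gives `SIGILL`, which matches the 'signal 4' errors reported for the K6 hosts.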
BTW, you'll find that my machine number 223 (a Celeron 2.3 GHz with RHEL3) is running one workunit a day and always getting an error. I'm working on it. It's getting past the ulimit -s unlimited, but seems to be hitting some other resource limit. With 2 GB of RAM and 2 GB of swap, it's hard to track down what limit it could be hitting, but I'm working on it. I'm beginning to wonder if the WU is trying to do its own ulimit, and the lcap privilege has been dropped by the time the docking application is started by boinc.
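One way to see which limits a process actually inherits is to print them from the same context that launches the app. The sketch below uses Python's standard `resource` module (Unix only). The `LIMITS` table and the helper names are illustrative, and it reports the limits of the process running the script, which may differ from what the BOINC client hands to the science application.

```python
import resource

# Illustrative selection of limits; other RLIMIT_* constants exist as well.
LIMITS = {
    "stack": resource.RLIMIT_STACK,
    "data": resource.RLIMIT_DATA,
    "address space": resource.RLIMIT_AS,
    "core file": resource.RLIMIT_CORE,
    "open files": resource.RLIMIT_NOFILE,
}


def fmt(value):
    """Render a raw limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {name: (soft, hard)} with values rendered as strings."""
    return {
        name: tuple(fmt(v) for v in resource.getrlimit(res))
        for name, res in LIMITS.items()
    }


for name, (soft, hard) in current_limits().items():
    print(f"{name:14s} soft={soft:>12s} hard={hard:>12s}")
```

A soft limit that is lower than expected here would be a candidate for the "other resource limit" being hit, though that is only a guess.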
There are 3 error results, all of which came from the 0x1 error, and 1 unsent result. Has this workunit already been cancelled or not? I don't expect credit to be granted yet for the result my manager sent, but if there are many cases like this, I would like the team to grant credit for each result. So far I have 3 pending results because of this issue, such as
this
and
this
:( When my result was sent, the other three results had already hit the error! Hundreds of workunits could be running into this issue... what a mess!
@team: could you raise the limit on the total number of errors so that more copies of the workunit are sent out? Or would you re-issue these results and send them again once the error is solved or a way of avoiding it is well known?
Thanks for reading,
suguruhirahara
____________
I'm a volunteer participant; my views are not necessarily those of Docking@Home or its participating institutions.
We have been experimenting with letting the app set a ulimit high enough, but haven't released anything regarding that one yet. The current app shouldn't be setting any ulimits.
Andre
I'm beginning to wonder if the WU is trying to do its own ulimit, and the lcap privilege has been dropped by the time the docking application is started by boinc.
Regards,
-- David
____________
D@H the greatest project in the world... a while from now!
> Host 207 still has not had a successful result after 111 WU's. This host and host 509 are both AMD K6 processors and, according to David Ball (thanks David), the 'signal 4' error they are getting is due to the processor, which possibly is not compatible with the project?
> Have found some more hosts that have not gotten any successful workunits yet and are still on 0.00 cobblestones:
> Host 878 has 29 WU's, 28 with exit code 1 and 1 to process.
> Host 1006 has 7 WU's, 6 with exit code 1 and 1 to process.
> Host 771 has 19 WU's, all with exit code 1.
So all these latest ones only need the 'ulimit' fix (at least it appears so), and then they can begin contributing to this project.
> It would appear that these users have not scanned the boards much or do not understand the Linux stack problem, thinking that they just download a WU and away they go. So they may need a little extra help from the project team if they can spare the time. Just a short e-mail saying: "Please add 'ulimit -s unlimited' as the first line in your 'run_manager' file in your Boinc folder."
If they have dual cores then they should have Linux 2.6 or later and should add ulimit to the 'run_client' as well.
Doing this can add a few more Linux machines to the mix.
Thanks and happy crunching to the Docking team.
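The same idea can be sketched in Python for anyone wrapping the client in a script of their own. This is an illustration under assumptions, not the project's fix: it raises the soft stack limit to the hard limit (which matches 'ulimit -s unlimited' only when the hard limit is itself unlimited), and `raise_stack_limit` and `run_with_raised_stack` are made-up names.

```python
import resource
import subprocess


def raise_stack_limit():
    """Raise the soft stack limit to the hard limit; return (soft, hard)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft != hard:
        # An unprivileged process may raise its soft limit up to the hard limit.
        resource.setrlimit(resource.RLIMIT_STACK, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_STACK)


def run_with_raised_stack(command):
    """Raise the stack limit, then run `command`; children inherit the limit."""
    raise_stack_limit()
    return subprocess.run(command).returncode
```

Child processes inherit the raised limit, so a call such as `run_with_raised_stack(["./boinc"])` (the path is hypothetical) would start the client with the larger stack.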
____________
I thought Conan's idea about contacting this group of testers was a good one when I first read it; and in the weeks since, I've only come to agree more. It is clear that there are a good many Linux users who are unaware of the ulimit fix, but are still crunching these WUs.
The fact that they're still attached and crunching them shows that they are interested in this project. More than likely, they believe that they are helping the project by faithfully continuing to send in their errored results; when in fact they are actually causing a backlog of pending results, (possibly) contributing to the HR issue, and ABSOLUTELY wasting their CPU cycles needlessly. I am willing to bet that they would rather be getting credit for their time.
I don't run Linux myself (yet!), which is why I didn't comment on this thread earlier, but I do read all the threads here, even if the subject doesn't directly apply to me. It's still an issue for the project as a whole; and until all the problems are solved project-wide (OS-specific bugs, the P3 problem, the HR redundancy quandary, etc.) we cannot achieve our main goal of moving out of Alpha.
That said, it appears to me that the ulimit fix is relatively simple to implement on most platforms, and there has been a wide enough variety of some of the more stubborn or exotic breeds tested by now that there are workarounds posted for many of them already. Moreover, the support here has been fantastic - the devs and other Linux masterminds lurking in the forum have been quick to help fellow users troubleshoot their systems and patiently walk them through the steps to get them going - and they have been happy to do it.
I recommend doing one of two things:
1. Do as Conan suggested and notify all the Linux users here of the current status of the situation. Show them the basic ulimit fix, include links to the FAQ (and perhaps even to some of the applicable threads on the subject), let them know your best guess on when you might have a permanent fix (if you can), and tell them you'll send notification again when it's ready. This information will offer them a choice: Either they will try the workaround and begin producing valid results on their machines, or they will decide it's not worth the trouble and detach from the project. Personally, I believe that there will be many more that opt for the former, especially those who are still consistently sending in (0x1) errors. After all, why crunch for no reason? For those that can't or won't do the fix, at least they will know what is going on, and can reallocate their resources elsewhere. Regardless of what they decide, the project will benefit from a leaner, more efficient test group. And, as a strong believer in communication, I think that by being open and direct about this, you will build a foundation of trust and increase your chances of regaining any users that may leave.
2. This brings me to my second (and preferred) recommendation, which would be to do all of the above, and go a step further by sending a more comprehensive assessment of the project. This would be sent to ALL Docking users, and include not only the Linux fix, but info on the P3's, the HR issue, and a brief rundown on any other progress the team is making (or has already made!). This not only would bring everyone up to speed on everything, but you would also save yourselves the trouble of sifting through the userlist to find the individual Linux users you want to email about the ulimit fix.
Being an Alpha project, it is generally assumed that volunteers are going to be more involved than in a full release; however, in reality we know this is not always the case. Did some people rush to sign up for this project to get another notch in their "BOINC Belt"? Sure, that's always going to happen, but I don't think that is what's really lying at the heart of this matter. Very often, people just don't have the time, or feel the inclination to cruise the forums of all their projects, even though they mean well. Does D@H have an obligation to spell it all out neatly for everyone who isn't paying attention? No, but choosing to do so will help both the project, and its volunteers - active and inactive.
I will email those users. Let me know if you come up with any more.
I am not sure that the AMD K6 machines with signal 4 errors (illegal instruction) are incompatible with the app. I've googled around a bit and it seems that it might have to do with the OS as well (Debian in this case). There are other K6's that don't seem to have a problem. We'll have to keep an eye on these and see what is causing these errors.
Andre
____________
Very well said, and some very good suggestions. We are already working on a bigger news update (once we've got all the info gathered; we still hope that everybody is subscribed to our RSS feed as a minimum), but emailing everybody personally (like a newsletter) is not a bad idea at all and will be a good medium to reach the Linux users without the ulimit fix as well.
Thanks!
It's great to have people on board who are actively thinking with us and for us (and making web page headers too :-)
Andre
Just my $.02
Atomic
____________
Andre,
I am thrilled to hear that you are planning a newsletter style update! Personally, I love receiving info about projects in my mailbox! And as I mentioned in another related thread, I think there should be an automatic notification sent out to new users when they first sign up, so that people are informed of the issues (and expectations) from the start. Otherwise, as more people join the same problems will begin to compound themselves, and we will end up with the same troubles all over again.
Atomic
____________
Until we get most of our pressing problems fixed, we're not planning to take more volunteers aboard; that mishap of opening the user registration by accident hopefully won't happen anymore :-)
AK
Also Host 121, which has 21.89 cobblestones after 116 WU's, same error.
If we can get these hosts working then it will definitely speed up completion times.
Catch you later and keep flying.
> Another one for you Andre,
Host 1117 has a number of results all with 'exit code 1' and zero credit.
>> And here is another,
Host 1197, 26 results, 0.00 credit. Exit 1.
I shut this computer down and returned it to using W2K when I realized the Linux just wasn't working with that CPU and D@H.
Thanks for keeping tabs on things.
That's OK, j2satx. Just trying to keep as many machines working as possible, as it helps get the results back quicker. Andre said he would e-mail people having trouble to see if he can get them working. You may only need the 'ulimit -s unlimited' fix added to your 'run_manager' file to get that machine working. If you have taken it back to Windows, it will work without modification.
Have a good Christmas and crunch through the New Year.
Everybody that we know of who has the error 1 problem was emailed a while ago to make them aware; now it's up to them to act upon it :-)
Thanks!
Andre
____________
I had the ulimit instruction in the run_client and run_manager. I think the CPU is a K6 (1.2 GHz Thunderbird).
I'll put another computer on instead with a better processor.