Shared memory overview page available
Message boards : Number crunching : Shared memory overview page available
Author | Message | |
---|---|---|
A new web page has been made available that will give an overview of the currently scheduled work in the shared memory area. So if you get the message 'there was work, but it is committed to other platforms' please check this page first. Currently we are in need of many more Linux machines and Macs to take out all of that reserved work in the queue for them. If you know of any colleagues or friends that have such machines, please let them write an email to dockingadmin@utep.edu for an invitation code. Thanks!
|
||
ID: 2606 | Rating: 0 | rate: / | ||
A new web page has been made available that will give an overview of the currently scheduled work in the shared memory area. So if you get the message 'there was work, but it is committed to other platforms' please check this page first. Currently we are in need of many more linux machines and macs to take out all of that reserved work in the queue for them. If you know of any colleagues or friends that have such machines, please let them write an email to dockingadmin@utep.edu for an invitation code. Thanks! Does that show 549 WUs reserved for Intel Linux and no work avail for AMD Linux? |
||
ID: 2607 | Rating: 0 | rate: / | ||
Correct.
A new web page has been made available that will give an overview of the currently scheduled work in the shared memory area. So if you get the message 'there was work, but it is committed to other platforms' please check this page first. Currently we are in need of many more linux machines and macs to take out all of that reserved work in the queue for them. If you know of any colleagues or friends that have such machines, please let them write an email to dockingadmin@utep.edu for an invitation code. Thanks! ____________ D@H the greatest project in the world... a while from now! |
||
ID: 2608 | Rating: 0 | rate: / | ||
Darwin Intel really seems to be the WU hog. I'm not seeing much reserved for Intel Linux. I notice there is no pre-P4 class for Linux, so I don't feel so bad about retiring my Coppermine anymore.
|
||
ID: 2610 | Rating: 0 | rate: / | ||
We've put some Intel Linux boxes to work here in the lab to get rid of the large number of workunits in their queue, and that seems to have helped a bit. Intel Darwin machines are a problem though, since we only have one in the lab and there are not too many out there. I'm currently writing a tool that will show us the HR distribution of the active machines currently attached to Docking. That will help us research this issue a bit too.
Darwin Intel seems to really be the Wu hog. Im not seeing much reserved for Intel Linux. I notice there is no pre p4 for Linux so i don't feel so bad for retiring my Coppermine anymore. ____________ D@H the greatest project in the world... a while from now! |
||
ID: 2611 | Rating: 0 | rate: / | ||
Intel Darwin machines are a problem though since we only have one in the lab and not too many out there. There are tons of Mactels out there, but there's a reason for that. First, getting credits with an Intel Mac was a bit of a lottery: Mactel crunchers got lots of invalid results, project staff didn't really know the cause, and someone (Andre?) suggested that people temporarily stop crunching with an Intel Mac until a new version is released and the problem is sorted out. BUT: the app version is still 5.02, so how can you expect to get lots of Macs? Maybe the problem IS solved, but most people don't read the forums and don't know about it. Second, charmm still shows excessive disk writes every second. This IS also a problem for many crunchers. |
||
ID: 2613 | Rating: 0 | rate: / | ||
Intel Darwin machines are a problem though since we only have one in the lab and not too many out there. I can't say that the Intel Macs get invalid results. I have had 6 Intel Macs on Docking for the last 3 days, and after ca. 1000 workunits I have no invalid results. I would say the problems are solved. Frank |
||
ID: 2614 | Rating: 0 | rate: / | ||
OK, the Linux Intel results are building up fast again. I can only supply 2 Linux Intel machines, as the 3rd (my C2D) is running 64-bit on ABC and RS, since it does its best work on 64-bit projects. With the long crunch times on these older 2 and 2.4 GHz machines, they will make only a very small dent in the way those WUs build up. Anyone know a lot of people with Intel Linux machines?
|
||
ID: 2615 | Rating: 0 | rate: / | ||
We know what causes the occasional Mac invalid results: it has to do with how we do the checkpointing. This process is currently being changed, but it is quite tricky to implement. The checkpointing and logfile writing we currently do also cause the disk writing. We explained that we need to write a lot of log messages, because we are in alpha and have to find the issues with the system.
There are tons of Mactels out there, but there's a reason for that. ____________ D@H the greatest project in the world... a while from now! |
||
ID: 2616 | Rating: 0 | rate: / | ||
I just had an invalid result in my Intel iMac http://docking.utep.edu/workunit.php?wuid=26400
|
||
ID: 2617 | Rating: 0 | rate: / | ||
We know what causes the occasional Mac invalid results: it has to do with how we do the checkpointing. This process is currently being changed, but it is quite tricky to implement. The checkpointing and logfile writing we currently do also causes the disk writing. We explained that we need to write a lot of log messages, because we are in alpha and have to find the issues with the system. See the news item posted on Dec. 1st. It tells Mac users to suspend. Perhaps a new news item asking them to rejoin would be in order. ____________ Dublin, CA Team SETI.USA |
||
ID: 2618 | Rating: 0 | rate: / | ||
We'd rather fix the checkpointing issue first before asking all Macs to attach again. At the moment, things might seem better for the Macs, but we haven't really changed anything... So I feel a bit hesitant to ask people back before the issue has been fixed.
____________ D@H the greatest project in the world... a while from now! |
||
ID: 2619 | Rating: 0 | rate: / | ||
I think this has more to do with the new BOINC client (>5.8.11) than anything else. Boinc_dev changed the way Macs identify themselves to the server; initially I thought this broke our HR rules, but now I'm starting to think it might have fixed some problems :-) We are also working on a new set of HR rules for this reason.
I just had an invalid result in my Intel iMac http://docking.utep.edu/workunit.php?wuid=26400 ____________ D@H the greatest project in the world... a while from now! |
||
ID: 2620 | Rating: 0 | rate: / | ||
New host HR information is available on the
shared memory page
.
|
||
ID: 2623 | Rating: 0 | rate: / | ||
|
||
ID: 2624 | Rating: 0 | rate: / | ||
Great job on the new page!!! Thanks! After looking at it, I was wondering why There's a "Windows PreP4", but no "Linux PreP4" ? Linux PreP4 machines give the same results as Linux AMD machines, so they are going into that pool. *chuckle* Just my luck, as soon as I get totally involved securing a new server and moving websites, D@H goes from idle to everything happening at once :-) That's life :-) Good news: starting from next week, my involvement with the project goes from 2 days a week to 4 days a week. Means we can get a whole lot more done! Andre ____________ D@H the greatest project in the world... a while from now! |
||
ID: 2625 | Rating: 0 | rate: / | ||
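The class grouping described in the post above can be sketched roughly as follows. This is only an illustration, not the project's actual HR code: the function name `hr_class`, its parameters, and the "Windows Intel", "Windows AMD" and "Darwin PPC" labels are assumptions made for the example. The only rules taken from this thread are that a "Windows PreP4" class exists and that Linux pre-P4 machines go into the Linux AMD pool.

```python
def hr_class(os_name: str, cpu_vendor: str, pre_p4: bool = False) -> str:
    """Map a host to a homogeneous redundancy (HR) class label.

    Hypothetical sketch: pre_p4 means an Intel CPU older than the Pentium 4.
    """
    os_name = os_name.lower()
    vendor = cpu_vendor.lower()
    old_intel = vendor == "intel" and pre_p4

    if os_name == "linux":
        # From the thread: Linux pre-P4 hosts give the same results as
        # Linux AMD hosts, so they share that pool.
        if vendor == "amd" or old_intel:
            return "Linux AMD"
        return "Linux Intel"
    if os_name == "windows":
        # From the thread: Windows keeps a separate pre-P4 class.
        if old_intel:
            return "Windows PreP4"
        # Assumed labels for the remaining Windows hosts.
        return "Windows AMD" if vendor == "amd" else "Windows Intel"
    if os_name == "darwin":
        # "Darwin PPC" is an assumed label for non-Intel Macs.
        return "Darwin Intel" if vendor == "intel" else "Darwin PPC"
    return "unknown"
```

With this sketch, a retired Coppermine running Linux would land in the same pool as the AMD Linux boxes, while the same CPU under Windows would get its own class.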
New host HR information is available on the shared memory page . Why are the WUs not distributed in shared memory in the same approximate distribution as the distribution of hosts? |
||
ID: 2626 | Rating: 0 | rate: / | ||
Good question :-)
New host HR information is available on the shared memory page . ____________ D@H the greatest project in the world... a while from now! |
||
ID: 2627 | Rating: 0 | rate: / | ||
New host HR information is available on the shared memory page . Looks nice...!!! ;-) ____________ |
||
ID: 2628 | Rating: 0 | rate: / | ||
New host HR information is available on the shared memory page. Actually it looks kind of like the opposite... My first guess is that, because there are so many Windows machines and most users crunch on Windows, the number of replicas reserved in shared memory for Windows cannot grow. Mac Intels are the opposite of Windows. Every time computer X with owner Y requests work, no more than one replica of a given WU can belong to X or Y. So if only a few users have Macs attached, new WUs keep being assigned to the Mac HR class, and 2 replicas of each stay in memory waiting for another user to download them. |
||
ID: 2629 | Rating: 0 | rate: / | ||
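The effect guessed at in the post above can be reproduced with a toy model. This is a minimal sketch under stated assumptions, not the BOINC scheduler: `Workunit`, `request_work` and `reserved_per_class` are hypothetical names, and the initial replication of 3 is inferred from the "2 replicas stay in memory" remark. It only encodes the two rules from the post: the first download commits a WU to an HR class, and a user may hold at most one replica of a WU.

```python
from dataclasses import dataclass, field
from typing import Optional

INITIAL_REPLICATION = 3  # assumption: one replica sent, two left waiting


@dataclass
class Workunit:
    wu_id: int
    hr_class: Optional[str] = None            # None means "unassigned"
    unsent: int = INITIAL_REPLICATION         # replicas waiting in shared memory
    owners: set = field(default_factory=set)  # users already holding a replica


def request_work(wus, user, host_class):
    """Give one replica to the user if both rules allow it; return the WU id."""
    for wu in wus:
        if wu.unsent == 0 or user in wu.owners:
            continue  # nothing left, or this user already has a replica
        if wu.hr_class not in (None, host_class):
            continue  # "there was work, but it is committed to other platforms"
        wu.hr_class = host_class  # the first download commits the class
        wu.unsent -= 1
        wu.owners.add(user)
        return wu.wu_id
    return None


def reserved_per_class(wus):
    """Count the replicas still waiting in shared memory, grouped by HR class."""
    counts = {}
    for wu in wus:
        key = wu.hr_class or "unassigned"
        counts[key] = counts.get(key, 0) + wu.unsent
    return counts
```

In this model a single Mac user who asks for work twice commits two WUs to the Mac class and leaves four replicas waiting, while three Windows users can drain one WU completely. That matches the pattern on the overview page: the rare class piles up reserved work and the common class does not.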
A lot of people are complimenting the pretty picture, and fair enough, nice, but what for me was really attractive was a couple of lines of text.
We'd rather first fix the checkpointing issue before asking all macs to attach again. At the moment, things might seem better for the Macs, but we haven't changed anything really... So I feel a bit hesitant to ask people back before the issue has been fixed. That tells me more about the project than the picture. |
||
ID: 2630 | Rating: 0 | rate: / | ||
Good question :-) If you can swap the replicas in and out of shared memory back to the database, is it not possible to redistribute them in shared memory at that point? |
||
ID: 2631 | Rating: 0 | rate: / | ||
If you can swap the replicas in and out of shared memory back to the database, is it not possible to redistribute them in shared memory at that point? You make it sound easy :-) With the current BOINC code it is not that easy to implement though, because it requires knowing the HR class of hosts that will come and ask for work at some point in the future. And that's something you never know with volunteer computing projects: hosts disappear and attach all the time, so the best you can do is make as good an estimate as possible. But that's what all the computer science in this project is for: to improve the default BOINC scheduling algorithms. And we plan to do exactly that :-) Thanks Andre ____________ D@H the greatest project in the world... a while from now! |
||
ID: 2633 | Rating: 0 | rate: / | ||
If you can swap the replicas in and out of shared memory back to the database, is it not possible to redistribute them in shared memory at that point? Also, once one unassigned replica is downloaded by a host, that WU will belong to that host's HR class, and the other two replicas must go to the same HR class. So even if you send replicas back to the DB, they will have the same HR class when they come back to shared memory; otherwise, homogeneous redundancy would not work. |
||
ID: 2635 | Rating: 0 | rate: / | ||
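Why swapping does not help can be shown with a small sketch. The names here (`Replica`, `swap_out`, `swap_in`, `eligible`) and the use of plain lists for shared memory and the database are assumptions for illustration only; the real feeder and database code look nothing like this. The one point taken from the post is that the HR class travels with the replica.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    wu_id: int
    hr_class: Optional[str] = None  # None means "unassigned"


def swap_out(shared_memory, database, index):
    """Move one replica from shared memory back to the database."""
    database.append(shared_memory.pop(index))


def swap_in(shared_memory, database):
    """Move the oldest replica from the database into shared memory."""
    if database:
        shared_memory.append(database.pop(0))


def eligible(replica, host_class):
    """A host may take a replica only if it is unassigned or matches its class."""
    return replica.hr_class is None or replica.hr_class == host_class
```

A replica committed to Linux Intel that is swapped out and back in is still a Linux Intel replica, so an AMD Linux host remains ineligible for it. Only replicas that are still unassigned can end up in a different class.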
That tells me more about the project than the picture. Hmmm, what's that supposed to mean...? I hope these are positive thoughts ;-) AK ____________ D@H the greatest project in the world... a while from now! |
||
ID: 2636 | Rating: 0 | rate: / | ||
Please consider modifying the memory overview page so that both blocks have HR classes in the same order.
|
||
ID: 2637 | Rating: 0 | rate: / | ||
Since the quorum is now set to 2, would it help the shared memory situation to set the initial replication to 2 as well?
|
||
ID: 2638 | Rating: 0 | rate: / | ||
Fixed. Thanks for noticing :-)
Please consider modifying the memory overview page so that both blocks have HR classes in the same order. ____________ D@H the greatest project in the world... a while from now! |
||
ID: 2639 | Rating: 0 | rate: / | ||
If you can swap the replicas in and out of shared memory back to the database, is it not possible to redistribute them in shared memory at that point? Do stale unassigned replicas return to the database and then come back as the same HR class? |
||
ID: 2641 | Rating: 0 | rate: / | ||
Fixed. Thanks for noticing :-) Also noticed the February 29th news...... |
||
ID: 2642 | Rating: 0 | rate: / | ||
If by unassigned you mean not distributed, yes: when a result gets assigned to a certain HR class, it will keep that class until a canonical result is found. Andre ____________ D@H the greatest project in the world... a while from now! |
||
ID: 2643 | Rating: 0 | rate: / | ||
From news:
|
||
ID: 2644 | Rating: 0 | rate: / | ||
Linux Intel will come back as Linux Intel. Once a workunit gets assigned an HR class, because it is assigned to a certain type of host, it will keep that forever.
From news: ____________ D@H the greatest project in the world... a while from now! |
||
ID: 2645 | Rating: 0 | rate: / | ||
Linux Intel will come back as Linux Intel. Once a workunit gets assigned an HR class, because it is assigned to a certain type of host, it will keep that forever. So, the few that are in the "unassigned" are the only ones that could come back and be assigned another class? |
||
ID: 2646 | Rating: 0 | rate: / | ||
Correct! AK ____________ D@H the greatest project in the world... a while from now! |
||
ID: 2647 | Rating: 0 | rate: / | ||
Also noticed the February 29th news...... :-) Now I understand what you're getting at :-) I've corrected the date... AK ____________ D@H the greatest project in the world... a while from now! |
||
ID: 2648 | Rating: 0 | rate: / | ||
New host HR information is available on the shared memory page. Hope this is not a duplicate request. I was on vacation and have been quite busy the last week trying to catch up on everything. Yeah, I'm still around a little and keep an eye on you, Andre. This page looks nice, but could you label the columns? I can tell what the o/s column is, but I would have to guess what the next number is for unless I read this entire thread. Labels would make it easier for someone new to the project, or for those of us too lazy to read, to understand what they are looking at. |
||
ID: 2653 | Rating: 0 | rate: / | ||