Mega-long/slow workunit?
Patrick Vervoorn
Posted: Fri Aug 31, 2007 7:43 am
Guest
Hi,

I'm running SetiBOINC on several computers, one of which is a P4-2.8GHz
HT-enabled machine.

It is currently processing a WU which seems to take a very long time.
Using boinc_curses, I get:

setiathome_enhanced 527 04mr07ab.7106.' Running 47:24:28 0.089% 48:51:06


Full WU name/id:

04mr07ab.7106.4571.10.4.203

I've looked this one up via the BOINC account page, and the other
people/machines that crunched this WU have aborted with a 'too long
processing' error.

My SetiBOINC install hasn't reached that conclusion (yet). Any idea on
when it will do that, and what the criteria are for such a conclusion?

This mega-long WU is now blocking one of the two threads on this machine
from crunching 'useful' WU's. Is there a way to manually clear it?

First time I ever ran into a WU like this actually...

Regards,

Patrick.
Mark Conroe
Posted: Sat Sep 01, 2007 6:04 am
Guest
"Patrick Vervoorn" <patrick.vervoorn@NOSPAM.perihelion.demon.nl> wrote:

Quote:
This mega-long WU is now blocking one of the two threads on this machine
from crunching 'useful' WU's. Is there a way to manually clear it?

There is an 'abort' option on 5.x clients. Sounds like you may have one of
the buggy clients/WUs sent out accidentally. See these threads for more details:

http://setiathome.berkeley.edu/forum_thread.php?id=41585
http://setiathome.berkeley.edu/forum_thread.php?id=41736
Odysseus
Posted: Sat Sep 01, 2007 2:25 pm
Guest
In article <81d14$46d9831c$82a1d3bf$5590@news1.tudelft.nl>,
Patrick Vervoorn <patrick.vervoorn@NOSPAM.perihelion.demon.nl> wrote:

<snip>
Quote:

[BOINC's] estimates are totally wrong, since it finishes WU's in
around 8 or 9 hours each. This also means the client does not
maintain a 'realistic' cache anymore, since it 'thinks' it has
several days worth of WUs in the cache (which it doesn't).

Restarting the client has no effect on these estimates. Is there any way
to force these to go lower again, or will the client self-adjust once it
has crunched a few more WUs in the 'standard' time of ~8 hours?

Yes; it will take a while to come down, because the scheduler is
designed to err on the side of caution when figuring out how much work
your system can do. You can ascertain the basis of its decisions from
the "Result duration correction factor" (RDCF) shown near the bottom of
the host page(s) in your account: the project-supplied time estimate for
each task is multiplied by this figure to obtain the estimate that your
BOINC client uses to decide when to ask for work and so on. The
algorithm for calculating this value makes it rise much more easily than
fall.
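
To make the paragraph above concrete, here is a minimal sketch in Python. It is not BOINC's actual code: the function names and the `fall_weight` value are assumptions chosen purely to illustrate a factor that scales the project's estimate and that rises at once but falls only gradually.

```python
def client_estimate(project_estimate_hours, rdcf):
    """Estimate the client works with: project estimate times the RDCF."""
    return project_estimate_hours * rdcf


def update_rdcf(rdcf, project_estimate_hours, actual_hours, fall_weight=0.1):
    """Illustrative asymmetric update (assumed, not taken from BOINC source).

    If a task took longer than the current factor predicts, jump straight
    up to the observed ratio; otherwise move only a small step down.
    """
    ratio = actual_hours / project_estimate_hours
    if ratio > rdcf:
        return ratio  # rises immediately
    return (1 - fall_weight) * rdcf + fall_weight * ratio  # falls slowly


# One runaway task followed by several normal ones (hypothetical numbers).
factor = update_rdcf(1.0, project_estimate_hours=8.0, actual_hours=48.0)
for _ in range(5):
    factor = update_rdcf(factor, project_estimate_hours=8.0, actual_hours=8.0)
    print(round(factor, 3), round(client_estimate(8.0, factor), 1))
```

Under these assumed weights a single overlong task inflates the factor in one step, while each normally behaving task afterwards only removes a fraction of the excess.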

You can reset the RDCF by editing certain XML files in the BOINC data
folder, but unless the high estimates are extremely inconvenient for
you, I recommend letting it adjust on its own. Note that the new
Multibeam tasks have a different 'profile' (in terms of angle ranges and
the computations required), so everyone's clients will be adapting for a
while.

--
Odysseus
Patrick Vervoorn
Posted: Sat Sep 01, 2007 3:14 pm
Guest
In article <odysseus1479-at-06FFE5.13251701092007@news.telus.net>,
Odysseus <odysseus1479-at@yahoo-dot.ca> wrote:
Quote:
In article <81d14$46d9831c$82a1d3bf$5590@news1.tudelft.nl>,
Patrick Vervoorn <patrick.vervoorn@NOSPAM.perihelion.demon.nl> wrote:

snip

[BOINC's] estimates are totally wrong, since it finishes WU's in
around 8 or 9 hours each. This also means the client does not
maintain a 'realistic' cache anymore, since it 'thinks' it has
several days worth of WUs in the cache (which it doesn't).

Restarting the client has no effect on these estimates. Is there any way
to force these to go lower again, or will the client self-adjust once it
has crunched a few more WUs in the 'standard' time of ~8 hours?

Yes; it will take a while to come down, because the scheduler is
designed to err on the side of caution when figuring out how much work
your system can do. You can ascertain the basis of its decisions from
the "Result duration correction factor" (RDCF) shown near the bottom of
the host page(s) in your account: the project-supplied time estimate for
each task is multiplied by this figure to obtain the estimate that your
BOINC client uses to decide when to ask for work and so on. The
algorithm for calculating this value makes it rise much more easily than
fall.

You can reset the RDCF by editing certain XML files in the BOINC data
folder, but unless the high estimates are extremely inconvenient for
you, I recommend letting it adjust on its own. Note that the new
Multibeam tasks have a different 'profile' (in terms of angle ranges and
the computations required), so everyone's clients will be adapting for a
while.

It's not overly inconvenient, but the machine is plenty fast, and I'd set
the queues pretty long, so most of the time it caches around 15 - 20
WUs (which it sometimes ran out of when the servers were down for a
longer time).

Anyway, the new estimates the client does are set around 50 hours now, so
it seems to be adjusting. However, if this happens every time I get a
'faulty WU' as described in my OP, that would be pretty inconvenient....

Anyway, I'll keep an eye on it, funny to see the client self-adjusting. ;)

Regards,

Patrick.
Patrick Vervoorn
Posted: Thu Sep 06, 2007 3:51 am
Guest
In article <odysseus1479-at-06FFE5.13251701092007@news.telus.net>,
Odysseus <odysseus1479-at@yahoo-dot.ca> wrote:
Quote:
In article <81d14$46d9831c$82a1d3bf$5590@news1.tudelft.nl>,
Patrick Vervoorn <patrick.vervoorn@NOSPAM.perihelion.demon.nl> wrote:

snip

[BOINC's] estimates are totally wrong, since it finishes WU's in
around 8 or 9 hours each. This also means the client does not
maintain a 'realistic' cache anymore, since it 'thinks' it has
several days worth of WUs in the cache (which it doesn't).

Restarting the client has no effect on these estimates. Is there any way
to force these to go lower again, or will the client self-adjust once it
has crunched a few more WUs in the 'standard' time of ~8 hours?

Yes; it will take a while to come down, because the scheduler is
designed to err on the side of caution when figuring out how much work
your system can do. You can ascertain the basis of its decisions from
the "Result duration correction factor" (RDCF) shown near the bottom of
the host page(s) in your account: the project-supplied time estimate for
each task is multiplied by this figure to obtain the estimate that your
BOINC client uses to decide when to ask for work and so on. The
algorithm for calculating this value makes it rise much more easily than
fall.

I've finally found this. It's currently:

Result duration correction factor 5.659713


For this host, while other computers I have running SetiBOINC are lower
than 1.0. The machine is currently struggling to get work. It crunches two
WU's at a time (taking about 8 to 9 hours per WU), and it only starts
fetching new WU's when either of these is close to finishing (and the
estimated time to finishing a WU is finally getting more realistic).

What's the easy way to set this to a more realistic value, and _what_ is a
more realistic value? 1.0?
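
A rough way to see why an inflated factor starves the cache, sketched in Python. The function names, the per-task estimate, and the cache target are all hypothetical; the real work-fetch logic in the client is more involved than this.

```python
def estimated_queue_hours(n_tasks, project_estimate_hours, rdcf, n_cpus):
    """Hours of work the client believes it holds, spread over its CPUs."""
    return n_tasks * project_estimate_hours * rdcf / n_cpus


def wants_more_work(n_tasks, project_estimate_hours, rdcf, n_cpus,
                    cache_target_hours):
    """True when the believed queue is shorter than the cache target."""
    queued = estimated_queue_hours(n_tasks, project_estimate_hours,
                                   rdcf, n_cpus)
    return queued < cache_target_hours


# Hypothetical host: 4 queued tasks, 8.5 h estimate each, 2 threads,
# and a cache target of 72 hours.
print(wants_more_work(4, 8.5, 1.0, 2, 72.0))       # factor of 1.0
print(wants_more_work(4, 8.5, 5.659713, 2, 72.0))  # factor from the host page
```

With the factor at 1.0 the four tasks look like well under a day of work, so more would be requested; with the factor at 5.659713 the same four tasks look like several days of work, so the client holds off.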

Quote:
You can reset the RDCF by editing certain XML files in the BOINC data
folder, but unless the high estimates are extremely inconvenient for
you, I recommend letting it adjust on its own. Note that the new
Multibeam tasks have a different 'profile' (in terms of angle ranges and
the computations required), so everyone's clients will be adapting for a
while.

How fast will this re-adjust itself? I'll keep monitoring the Host page,
and see what this 'factor' does over the next days... I suppose it should
be coming down, since the machine is back to its usual turnaround
times...

I must say it's rather sloppy programming if this 'factor' is skewed like
this after the machine got handed 1 'buggy' WU....

Regards,

Patrick.
Patrick Vervoorn
Posted: Tue Sep 11, 2007 8:29 am
Guest
In article <25bd$46dfbfa6$82a1d3bf$15395@news1.tudelft.nl>,
Patrick Vervoorn <patrick.vervoorn@NOSPAM.perihelion.demon.nl> wrote:

[snip]

Quote:
How fast will this re-adjust itself? I'll keep monitoring the Host page,
and see what this 'factor' does over the next days... I suppose it should
be coming down, since the machine is back to its usual turnaround
times...

I must say it's rather sloppy programming if this 'factor' is skewed like
this after the machine got handed 1 'buggy' WU....

After looking at the machine struggling to get a few workunits, and being
idle during the entire weekend because the SetiBOINC pipelines dried up, I
finally just edited the BOINC/client_state.xml file.

In there, in the <time_stats> section, there was a field called
<duration_correction_factor>, which contained a value of ~5.x. I stopped
BOINC, changed this value to 1.0, then restarted the client again, which
immediately started requesting WUs and slurping them in.

It now has a healthy cache of about 15 WUs again, enough to keep it busy
for a few days if/when the servers are down again. I suppose BOINC will
start re-adjusting this 'factor' once it has finished some WUs, but it now
has a more realistic 'starting-value' again...

Hope this is of some help to other people having the same problem...

Regards,

Patrick.
Patrick Vervoorn
Posted: Tue Sep 11, 2007 8:35 am
Guest
In article <24f21$46e6983a$82a1d3bf$2276@news2.tudelft.nl>,
Patrick Vervoorn <patrick.vervoorn@NOSPAM.perihelion.demon.nl> wrote:

[Mega-snip, apologies for these follow-ups to myself, but I'd like to get
this into Google and other archives correctly]

Quote:
After looking at the machine struggling to get a few workunits, and being
idle during the entire weekend because the SetiBOINC pipelines dried up, I
finally just edited the BOINC/client_state.xml file.

In there, in the <time_stats> section, there was a field called
<duration_correction_factor>, which contained a value of ~5.x. I stopped
BOINC, changed this value to 1.0, then restarted the client again, which
immediately started requesting WUs and slurping them in.

My apologies, this is not in the <time_stats> section, but in the
<project> section.
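
For anyone who would rather script the edit than do it by hand, here is a minimal Python sketch. It assumes client_state.xml parses as ordinary XML and that each <project> element holds a <duration_correction_factor> child, as described above; the function name `reset_dcf` is made up for this example. I have not verified that the client accepts a file rewritten this way, so stop BOINC first and keep the backup copy the script makes.

```python
import shutil
import xml.etree.ElementTree as ET


def reset_dcf(state_path, new_value=1.0):
    """Set duration_correction_factor in every <project> section.

    Returns the number of fields changed. A copy of the original file
    is saved next to it with a .bak suffix before anything is written.
    """
    shutil.copy2(state_path, state_path + ".bak")
    tree = ET.parse(state_path)
    changed = 0
    for project in tree.getroot().iter("project"):
        field = project.find("duration_correction_factor")
        if field is not None:
            field.text = f"{new_value:.6f}"
            changed += 1
    tree.write(state_path)
    return changed
```

Usage would be something like `reset_dcf("BOINC/client_state.xml")` with the client stopped, then restarting the client. Note that this resets the factor for every attached project, not only SETI.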

Regards,

Patrick.
All times are GMT - 5 Hours