Latest workunits have bad runtime estimates

Message boards : Number crunching : Latest workunits have bad runtime estimates

Author Message
robertmiles
Joined: 22 Apr 13
Posts: 62
Credit: 436,328
RAC: 214
Message 2101 - Posted: 20 Mar 2015, 23:22:07 UTC

All the workunits I got today start out with an estimated runtime of 00:04:46, but actually take between 50 and 60 minutes to complete. Does this mean you have a new version of the application, for which the BOINC client needs to complete a certain number of workunits before it can calculate the estimated runtimes accurately?

robertmiles
Joined: 22 Apr 13
Posts: 62
Credit: 436,328
RAC: 214
Message 2102 - Posted: 20 Mar 2015, 23:29:09 UTC - in response to Message 2101.

Also, I see that these longer workunits do not have checkpoints yet, which will mean some lost CPU time when I shut down BOINC to install the latest Nvidia driver.

robertmiles
Joined: 22 Apr 13
Posts: 62
Credit: 436,328
RAC: 214
Message 2103 - Posted: 20 Mar 2015, 23:45:52 UTC

Since the workunits are taking several times as long as before, is that because they are also doing several times as much useful work as before?

Ananas
Joined: 8 Jun 13
Posts: 128
Credit: 1,947,833
RAC: 0
Message 2104 - Posted: 20 Mar 2015, 23:50:57 UTC - in response to Message 2101.
Last modified: 21 Mar 2015, 0:30:52 UTC

Not a new version. The workunits have (and always had) an extremely wide range of possible runtimes, and if your client had a bunch of short-running ones before, it had probably adjusted to those already.

I'm currently receiving mostly long running ones, but always mixed with a few very short ones now and then. This time there are not too many short ones in a row (yet) so the core client will not have the time to adjust to them I guess. For me this is a very good mix as the core client cache management can handle this quite well.

p.s.: Older core clients adjusted to longer running workunits immediately after completing only one single long one. I'm not sure yet how the 7.x client that I installed lately handles this situation ... it seems to be somewhat different.

TJ
Joined: 14 Apr 13
Posts: 17
Credit: 340,802
RAC: 0
Message 2105 - Posted: 21 Mar 2015, 0:40:05 UTC

I got WUs estimated at 2m25s. They run up to 100% quickly, in about 4m, and then keep running: 1h45m50s and counting. No idea if these are faulty or just have wrong estimates. I'll let them run overnight and see tomorrow what happened.
____________
Greetings from,
TJ

Ananas
Joined: 8 Jun 13
Posts: 128
Credit: 1,947,833
RAC: 0
Message 2106 - Posted: 21 Mar 2015, 1:11:00 UTC - in response to Message 2105.
Last modified: 21 Mar 2015, 1:13:05 UTC

From the previous batches I can tell that it's just an estimate problem; the results should be useful. If there is anything bad (a typo in the ligand), they fail really quickly. My top runtime has been nearly 3 hours, but my box isn't the fastest (dual 60W Xeon, those run at a reduced frequency).

Ananas
Joined: 8 Jun 13
Posts: 128
Credit: 1,947,833
RAC: 0
Message 2107 - Posted: 21 Mar 2015, 3:38:28 UTC - in response to Message 2106.

... My top runtime has been nearly 3 hours ...

Top is > 4.5 hours now.

TJ
Joined: 14 Apr 13
Posts: 17
Credit: 340,802
RAC: 0
Message 2108 - Posted: 21 Mar 2015, 10:08:55 UTC - in response to Message 2107.

Thanks for the information, Ananas. I let them run and they have now all finished without error. The longest ran for 6 hours and 35 minutes, but the shortest for only around 35 seconds.
____________
Greetings from,
TJ

Maxxina
Joined: 11 Dec 13
Posts: 22
Credit: 251,337
RAC: 0
Message 2109 - Posted: 21 Mar 2015, 10:40:04 UTC - in response to Message 2108.

Yeah, times are variable. Quad-core i5-4690K here. My run times range from 15 seconds to two hours. It's fun how they vary :)

jay
Joined: 12 Jul 14
Posts: 13
Credit: 51,444
RAC: 0
Message 2111 - Posted: 21 Mar 2015, 11:05:32 UTC
Last modified: 21 Mar 2015, 11:06:48 UTC

Greetings !!

Please bump the estimates up to 1.5 or 2 hours per WU.

With a short completion estimate, BOINC will fill itself with FiNDAH WUs.
Then FiNDAH takes way more than its project allocation as it finishes
WUs that take 50 times as long as the estimate.
As a result, FiNDAH does not play well with others.


On the occasion that a WU does finish within a few minutes, BOINC can then get
more WUs, maintaining a balance between projects according to Resource Share.
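
To put rough numbers on the over-fetch described above, here is a minimal back-of-the-envelope sketch. It is not BOINC's actual work-fetch algorithm; it only assumes the client queues enough tasks to fill its work buffer and trusts the estimate while doing so. The function names, the 4-core host, the 2-hour buffer, and the 5-minute versus 60-minute runtimes are illustrative assumptions.

```python
def tasks_fetched(buffer_hours, cores, estimate_minutes):
    """Tasks needed to fill the buffer if every task ran for exactly the estimate."""
    return int(buffer_hours * 60 * cores // estimate_minutes)


def hours_to_clear(tasks, cores, actual_minutes):
    """Wall-clock hours to finish the queue at the real per-task runtime."""
    return tasks * actual_minutes / cores / 60


# Hypothetical host: 4 cores, 2-hour work buffer,
# 5-minute estimate, 60-minute real runtime.
tasks = tasks_fetched(buffer_hours=2, cores=4, estimate_minutes=5)
print(tasks, "tasks queued")
print(hours_to_clear(tasks, cores=4, actual_minutes=60), "hours to clear")
```

Under these assumptions the client queues 96 tasks for what it believes is 2 hours of work, and then needs 24 hours on all four cores to clear them.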

Is there any alternative that lets my other projects run, other than the user dropping FiNDAH to less than 1.0% of resource share?


Thanks,
Jay

robertmiles
Joined: 22 Apr 13
Posts: 62
Credit: 436,328
RAC: 214
Message 2112 - Posted: 21 Mar 2015, 13:10:39 UTC

Is the scheduler able to calculate a better estimate of the runtime from the inputs to be used? That's what some other projects with highly variable runtime use.

Ananas
Joined: 8 Jun 13
Posts: 128
Credit: 1,947,833
RAC: 0
Message 2113 - Posted: 21 Mar 2015, 13:23:21 UTC - in response to Message 2111.
Last modified: 21 Mar 2015, 14:06:01 UTC

... Is there any alternative that lets my other projects run, other than the user dropping FiNDAH to less than 1.0% of resource share? ...

Yes, there is!

Create a text file named app_config.xml in the findah project directory (adjust max_concurrent as needed):

<app_config>
  <app>
    <name>vina</name>
    <max_concurrent>7</max_concurrent>
  </app>
</app_config>


I'm not sure which client version started to support this. 5.x did not and 7.4 does, so it was introduced somewhere in between.

p.s.: You'll have to reread the configuration files, either through the commandline tool, through the BOINC manager or by restarting the client.
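
For anyone who would rather generate the file than type it, here is a minimal sketch that writes the same app_config.xml with Python's standard library. The helper name write_app_config and the example path in the comment are assumptions; the app name vina and the limit of 7 come from the post above. Check where your own BOINC data directory lives before using it.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def write_app_config(project_dir, app_name="vina", max_concurrent=7):
    """Write an app_config.xml that caps concurrent tasks for one application."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)

    path = Path(project_dir) / "app_config.xml"
    # Serialize to a string first so the file is written as plain text.
    path.write_text(ET.tostring(root, encoding="unicode") + "\n")
    return path


# Example call (hypothetical path, adjust to your installation):
# write_app_config("/var/lib/boinc-client/projects/findah.ucd.ie", max_concurrent=4)
```

After writing the file, the client still has to reread its configuration. As far as I know the command-line route is `boinccmd --read_cc_config`, but verify that against your client version.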

fredh
Joined: 22 Feb 15
Posts: 1
Credit: 48,104
RAC: 0
Message 2114 - Posted: 21 Mar 2015, 14:00:49 UTC

FIND should really revise their time estimates. I currently have 38 WUs on one of my machines, each with a time estimate of around 8 and a half minutes. Most have been running 2 to 3 hours or more. This isn't fair to the other BOINC projects.

TJ
Joined: 14 Apr 13
Posts: 17
Credit: 340,802
RAC: 0
Message 2115 - Posted: 21 Mar 2015, 14:17:40 UTC - in response to Message 2112.

Is the scheduler able to calculate a better estimate of the runtime from the inputs to be used? That's what some other projects with highly variable runtime use.

That will be hard, I think, as the run times are so variable.
The "problem" I see is that with the very short runtime estimates, a lot of WUs are downloaded. And it may take more than a day to clear a queue of 30 or 40 WUs.
____________
Greetings from,
TJ

robertmiles
Joined: 22 Apr 13
Posts: 62
Credit: 436,328
RAC: 214
Message 2118 - Posted: 22 Mar 2015, 5:51:06 UTC - in response to Message 2115.
Last modified: 22 Mar 2015, 5:53:41 UTC

Is the scheduler able to calculate a better estimate of the runtime from the inputs to be used? That's what some other projects with highly variable runtime use.

That will be hard, I think, as the run times are so variable.
The "problem" I see is that with the very short runtime estimates, a lot of WUs are downloaded. And it may take more than a day to clear a queue of 30 or 40 WUs.


Would it be practical for the server to calculate a runtime estimate based on the size or sizes of some or all of the input files? That might at least be better than assigning the same runtime estimate to every workunit.
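
As a purely hypothetical sketch of that idea: fit a straight line to (input size, observed runtime) pairs from already-returned results, then use it to estimate new workunits. Nothing in this thread shows that input size actually predicts runtime for these workunits, and the function names, the floor value, and the sample numbers are all made up for illustration.

```python
def fit_linear(sizes, runtimes):
    """Ordinary least-squares fit of runtime = slope * size + intercept."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(runtimes) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, runtimes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_runtime(size, slope, intercept, floor=60.0):
    """Predicted runtime in seconds, never below a minimum floor."""
    return max(floor, slope * size + intercept)


# Made-up history: input size in KB, runtime in seconds.
slope, intercept = fit_linear([10, 20, 30], [100, 200, 300])
print(estimate_runtime(40, slope, intercept))
```

Even a crude fit like this would give different workunits different estimates, which is the point of the suggestion. Whether it helps depends entirely on how well size correlates with runtime in practice.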

Also, could the server allow the users to set a limit on how many workunits each computer can have downloaded but not returned yet?

Steve Dodd
Joined: 8 Nov 13
Posts: 1
Credit: 40,651
RAC: 0
Message 2120 - Posted: 22 Mar 2015, 15:28:42 UTC
Last modified: 22 Mar 2015, 15:29:18 UTC

You want long runtimes? :) 10+ hours on an old Xeon machine (1.66 GHz), and they were all estimated at 4:30 min.

Ananas
Joined: 8 Jun 13
Posts: 128
Credit: 1,947,833
RAC: 0
Message 2122 - Posted: 22 Mar 2015, 16:48:16 UTC - in response to Message 2120.
Last modified: 22 Mar 2015, 16:56:53 UTC

... old Xeon machine (1.66GHz) ...

That's not an old Xeon, THIS is an old Xeon (NetBurst architecture, worst of Intel). I still have it here but it isn't running/crunching anymore (same goes for one 486DX2, a P1 and a bunch of Tualatins, XP and MP boards too, nice home for dust bunnies).

Indigo Blue
Joined: 28 Dec 12
Posts: 78
Credit: 186,251
RAC: 0
Message 2123 - Posted: 22 Mar 2015, 17:16:54 UTC

I have 3 WUs coming up to the 2-hour mark. They have been showing 100% complete for the last hour with no remaining time left. Will they ever end?

Aurel
Joined: 10 Feb 13
Posts: 84
Credit: 275,734
RAC: 0
Message 2124 - Posted: 22 Mar 2015, 18:02:02 UTC - in response to Message 2123.

I have 3 WUs coming up to the 2-hour mark. They have been showing 100% complete for the last hour with no remaining time left. Will they ever end?


Abort them. They're odd.

Like this unit: http://findah.ucd.ie/result.php?resultid=22988538

Ananas
Joined: 8 Jun 13
Posts: 128
Credit: 1,947,833
RAC: 0
Message 2125 - Posted: 22 Mar 2015, 18:07:33 UTC - in response to Message 2123.
Last modified: 22 Mar 2015, 18:13:04 UTC

I have 3 WUs coming up to the 2-hour mark. They have been showing 100% complete for the last hour with no remaining time left. Will they ever end?

There is a high percentage of long running results this time and I had to abort a few myself because I had a cache overload (unable to crunch them in time) - but the long running ones are not junk. As long as they still consume CPU time, they are doing something useful.

p.s.: It's normal for the current Vina project application that the percentage goes to 100% much too quickly. This is caused by the wrong runtime estimates and is not a sign of failing results.


Copyright © 2017 Dr Anthony Chubb