WUs last 3 hours instead of 14 min.

Cees Nolet
Joined: 2 Sep 12
Posts: 3
Credit: 60,984
RAC: 19
Message 928 - Posted: 26 Aug 2013, 16:57:19 UTC

I have reset the workload. The client fetches a day's work at an estimated 14 min per WU, so I got more than 150 WUs on top of the work I do for other projects.
But they take 3 hours each to crunch.
With that load the work doesn't fit in the expected time. So I have reset the work and will fetch new work when this problem has been solved.

To be clear: I don't care how many points I get for a WU. But the client has to load the proper volume of work.

mikey
Joined: 21 Jul 13
Posts: 68
Credit: 1,289,610
RAC: 836
Message 931 - Posted: 27 Aug 2013, 11:25:14 UTC - in response to Message 928.

> I have reset the workload. The client fetches a day's work at an estimated 14 min per WU, so I got more than 150 WUs on top of the work I do for other projects.
> But they take 3 hours each to crunch.
> With that load the work doesn't fit in the expected time. So I have reset the work and will fetch new work when this problem has been solved.
>
> To be clear: I don't care how many points I get for a WU. But the client has to load the proper volume of work.


They are having problems with the new units: some are taking the normal amount of time and some are running very long.

Ant Chubb
Project administrator
Project developer
Project tester
Project scientist
Joined: 18 Jul 12
Posts: 118
Credit: 1,019,875
RAC: 0
Message 933 - Posted: 28 Aug 2013, 13:46:54 UTC - in response to Message 931.

This is weird. Please send the command lines for the very long WUs and I'll try to figure out what's happening. We've already reduced the search space, and filtered the ligands down to an acceptable number of rotatable bonds. So it's weird that things are going so far off center now.

ciao,
Ant

mikey
Joined: 21 Jul 13
Posts: 68
Credit: 1,289,610
RAC: 836
Message 934 - Posted: 29 Aug 2013, 12:05:03 UTC - in response to Message 933.

> This is weird. Please send the command lines for the very long WUs and I'll try to figure out what's happening. We've already reduced the search space, and filtered the ligands down to an acceptable number of rotatable bonds. So it's weird that things are going so far off center now.
>
> ciao,
> Ant


Here are links to some:
http://boinc.ucd.ie/fmah/result.php?resultid=136763587
http://boinc.ucd.ie/fmah/result.php?resultid=136763625
http://boinc.ucd.ie/fmah/result.php?resultid=136908353
http://boinc.ucd.ie/fmah/result.php?resultid=136908357

Each of those took around 75 minutes, so they are NOT the really long ones by any means.

This one, though, took around 177 minutes:
http://boinc.ucd.ie/fmah/result.php?resultid=136666451

Brian Priebe
Joined: 12 Sep 12
Posts: 4
Credit: 2,807,990
RAC: 0
Message 938 - Posted: 30 Aug 2013, 8:22:14 UTC - in response to Message 933.

FYI, a few more from the last 2 days that took more than 2 hours on a Xeon E5645, E5520, or E5540:

http://boinc.ucd.ie/fmah/result.php?resultid=137103130 (156min)
http://boinc.ucd.ie/fmah/result.php?resultid=137103052 (215min)
http://boinc.ucd.ie/fmah/result.php?resultid=137081381 (149min)
http://boinc.ucd.ie/fmah/result.php?resultid=137081377 (160min)
http://boinc.ucd.ie/fmah/result.php?resultid=137081376 (160min)
http://boinc.ucd.ie/fmah/result.php?resultid=137081373 (137min)
http://boinc.ucd.ie/fmah/result.php?resultid=137081368 (183min)
http://boinc.ucd.ie/fmah/result.php?resultid=137081367 (191min)
http://boinc.ucd.ie/fmah/result.php?resultid=137081366 (186min)
http://boinc.ucd.ie/fmah/result.php?resultid=137081362 (157min)
http://boinc.ucd.ie/fmah/result.php?resultid=137081361 (138min)
http://boinc.ucd.ie/fmah/result.php?resultid=137081360 (126min)
http://boinc.ucd.ie/fmah/result.php?resultid=137081358 (177min)
http://boinc.ucd.ie/fmah/result.php?resultid=137081354 (193min)
http://boinc.ucd.ie/fmah/result.php?resultid=137081353 (171min)
http://boinc.ucd.ie/fmah/result.php?resultid=137081347 (146min)
http://boinc.ucd.ie/fmah/result.php?resultid=137081346 (159min)
http://boinc.ucd.ie/fmah/result.php?resultid=137081345 (198min)
http://boinc.ucd.ie/fmah/result.php?resultid=137081342 (182min)
http://boinc.ucd.ie/fmah/result.php?resultid=137081338 (155min)
http://boinc.ucd.ie/fmah/result.php?resultid=137081336 (152min)
http://boinc.ucd.ie/fmah/result.php?resultid=137081325 (197min)
http://boinc.ucd.ie/fmah/result.php?resultid=137081312 (146min)
http://boinc.ucd.ie/fmah/result.php?resultid=137081305 (188min)
http://boinc.ucd.ie/fmah/result.php?resultid=137081304 (173min)
http://boinc.ucd.ie/fmah/result.php?resultid=137081294 (180min)
http://boinc.ucd.ie/fmah/result.php?resultid=137081284 (178min)
http://boinc.ucd.ie/fmah/result.php?resultid=137081279 (158min)
http://boinc.ucd.ie/fmah/result.php?resultid=137081274 (126min)
http://boinc.ucd.ie/fmah/result.php?resultid=137081272 (138min)
http://boinc.ucd.ie/fmah/result.php?resultid=137064986 (121min)
http://boinc.ucd.ie/fmah/result.php?resultid=137064834 (123min)
http://boinc.ucd.ie/fmah/result.php?resultid=137064662 (123min)
http://boinc.ucd.ie/fmah/result.php?resultid=137048991 (135min)
http://boinc.ucd.ie/fmah/result.php?resultid=136746790 (124min)

John C MacAlister
Joined: 19 Nov 12
Posts: 28
Credit: 371,441
RAC: 0
Message 944 - Posted: 4 Sep 2013, 13:22:09 UTC

Hey, Guys:

Your running times are really off the scale... far too long. Any comments?

Thanks,

John

Dayle Diamond
Joined: 5 Dec 12
Posts: 62
Credit: 4,116,833
RAC: 1,127
Message 2243 - Posted: 6 Apr 2015, 14:08:17 UTC

Hah, and to think we were unhappy with 3 hour long work units.

I've got one on its 21st hour, today. ><

adrianxw
Joined: 26 Aug 12
Posts: 10
Credit: 70,434
RAC: 716
Message 2244 - Posted: 8 Apr 2015, 7:38:43 UTC
Last modified: 8 Apr 2015, 7:39:31 UTC

The WU I mentioned elsewhere, vina_361594_83593_25p_1, is still running here. It has now used 203:20:15 of crunch time. It is still using CPU, but I have no way of seeing what, if anything, it is doing or achieving.

Ant Chubb
Project administrator
Project developer
Project tester
Project scientist
Joined: 18 Jul 12
Posts: 118
Credit: 1,019,875
RAC: 0
Message 2246 - Posted: 8 Apr 2015, 11:04:28 UTC - in response to Message 2244.

Please cancel this WU. They should not run for more than 2-3 hours - tops. There is probably something wrong with that ligand file - we'll look into it and remove it from our lists.

Thanks for the feedback.

ciao,
Ant

adrianxw
Joined: 26 Aug 12
Posts: 10
Credit: 70,434
RAC: 716
Message 2247 - Posted: 8 Apr 2015, 11:47:15 UTC
Last modified: 8 Apr 2015, 11:58:48 UTC

Aborted, but somewhat annoyed.

Task 27086101 | Workunit 22071990 | Computer 2647 | Sent 28 Mar 2015, 23:22:59 UTC | Reported 8 Apr 2015, 11:35:07 UTC | Aborted by user | Run time 745,744.52 sec | CPU time 732,102.50 sec | Credit --- | Autodock Vina v1.01

You could have said that when I reported the unit at 37 hours, 57 hours, 80 hours, 115 hours, or 140 hours.

Indigo Blue
Joined: 28 Dec 12
Posts: 78
Credit: 186,251
RAC: 0
Message 2253 - Posted: 8 Apr 2015, 20:22:32 UTC - in response to Message 2247.

> Aborted, but somewhat annoyed.
>
> Task 27086101 | Workunit 22071990 | Computer 2647 | Sent 28 Mar 2015, 23:22:59 UTC | Reported 8 Apr 2015, 11:35:07 UTC | Aborted by user | Run time 745,744.52 sec | CPU time 732,102.50 sec | Credit --- | Autodock Vina v1.01
>
> You could have said that when I reported the unit at 37 hours, 57 hours, 80 hours, 115 hours, or 140 hours.


Wow, what a lot of time wasted for nothing.

Leopards
Joined: 29 Jun 13
Posts: 19
Credit: 1,130,022
RAC: 484
Message 2254 - Posted: 9 Apr 2015, 2:10:13 UTC - in response to Message 2253.

Gee! 14 hrs and aborted by the server! Who knew! I've had a bunch between 2.5 and 5 hrs before and they completed with 100 credit. I wasn't averse to only getting 100 credit for the BIG ones as long as they were legitimate WUs and were helping the project! It worked out anyway, as I was waiting to reboot after the last kernel update.

adrianxw
Joined: 26 Aug 12
Posts: 10
Credit: 70,434
RAC: 716
Message 2255 - Posted: 10 Apr 2015, 9:53:51 UTC

> Wow, what a lot of time wasted for nothing.

Yes, as I said, somewhat annoyed. Reported numerous times, no action. The project is about neglected diseases; it seems to be worked by neglected crunchers.

Leopards
Joined: 29 Jun 13
Posts: 19
Credit: 1,130,022
RAC: 484
Message 2256 - Posted: 11 Apr 2015, 4:36:50 UTC - in response to Message 2255.

It wasn't all wasted: when I reported how long certain ones were taking, those got canceled rather quickly in batches. I had them disappearing in wholesale lots before I got to work on them, so that saved lots of time!

Michael H.W. Weber
Joined: 5 Mar 15
Posts: 17
Credit: 303,100
RAC: 0
Message 2257 - Posted: 11 Apr 2015, 8:57:38 UTC - in response to Message 2246.

> The WU I mentioned elsewhere, vina_361594_83593_25p_1, is still running here. It has now used 203:20:15 of crunch time. It is still using CPU, but I have no way of seeing what, if anything, it is doing or achieving.

> Please cancel this WU. They should not run for more than 2-3 hours - tops.

On what system?

This is nothing unusual:
8:31:10 for WU vina_363777_83604_25p_2 still taking CPU time.
6:39:06 for WU vina_363777_83622_25p_2 still taking CPU time.
Both on an AMD Phenom II 955BE.

> There is probably something wrong with that ligand file - we'll look into it and remove it from our lists.

...there seems to be a lot wrong with a lot of ligand files then.

Michael.

Michael H.W. Weber
Joined: 5 Mar 15
Posts: 17
Credit: 303,100
RAC: 0
Message 2258 - Posted: 11 Apr 2015, 10:30:32 UTC - in response to Message 2257.
Last modified: 11 Apr 2015, 10:32:22 UTC

> This is nothing unusual:
> 8:31:10 for WU vina_363777_83604_25p_2 still taking CPU time.
> Both on an AMD Phenom II 955BE.

These long runners do complete and validate as detailed below:

11.04.2015 12:06:47 | FiND@Home | Computation for task vina_363777_83604_25p_2 finished
11.04.2015 12:06:49 | FiND@Home | Started upload of vina_363777_83604_25p_2_0
11.04.2015 12:06:50 | FiND@Home | Finished upload of vina_363777_83604_25p_2_0
11.04.2015 12:06:54 | FiND@Home | Sending scheduler request: To report completed tasks.
11.04.2015 12:06:54 | FiND@Home | Reporting 1 completed tasks

...and the WU validated: http://findah.ucd.ie/workunit.php?wuid=25200240

Workunit 25200240
Name: vina_363777_83604_25p
Application: Autodock Vina
Created: 27 Mar 2015, 16:09:09 UTC
Canonical result: 28690580
Granted credit: 100.00
Minimum quorum: 1
Initial replication: 1
Max # of error/total/success tasks: 1, 5, 1

Task | Computer | Sent | Time reported or deadline | Status | Run time (sec) | CPU time (sec) | Credit | Application
26307236 | 44459 | 27 Mar 2015, 16:09:30 UTC | 3 Apr 2015, 16:10:17 UTC | Not started by deadline - canceled | 0.00 | 0.00 | --- | Autodock Vina v1.01
28333868 | 58533 | 3 Apr 2015, 16:09:39 UTC | 10 Apr 2015, 16:09:39 UTC | Timed out - no response | 0.00 | 0.00 | --- | Autodock Vina v1.01
28690580 | 73852 | 10 Apr 2015, 16:09:50 UTC | 11 Apr 2015, 10:06:50 UTC | Completed and validated | 34,965.37 | 33,620.62 | 100.00 | Autodock Vina v1.01

Michael.

Ananas
Joined: 8 Jun 13
Posts: 128
Credit: 1,947,833
RAC: 0
Message 2259 - Posted: 11 Apr 2015, 10:44:05 UTC - in response to Message 2258.

DB00290_54, and actually the whole DB00290_xx series, works perfectly fine; they are just all larger than 14k, usually with more than 40 torsions and more than 120 atoms.

The only problem is that there was a plan to limit this batch to ligands with no more than 20 torsions, and this plan somehow hasn't worked, either in this batch or in the previous one.

Despite that, the workunit and the result are perfectly fine, and the runtime isn't unusual with these parameters.
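
For illustration only, here is a minimal sketch of what a torsion limit like the one described above could look like. It assumes the ligands are PDBQT files (the input format Autodock Vina reads) in which each rotatable bond appears as one BRANCH record; the function names and the default limit of 20 are hypothetical, and this is not the project's actual filtering code.

```python
def count_torsions(pdbqt_text):
    """Count BRANCH records in a PDBQT ligand (assumed: one per rotatable bond).

    ENDBRANCH lines are not counted because they do not start with "BRANCH".
    """
    return sum(1 for line in pdbqt_text.splitlines() if line.startswith("BRANCH"))


def within_torsion_limit(pdbqt_text, max_torsions=20):
    """Return True if the ligand has at most `max_torsions` rotatable bonds."""
    return count_torsions(pdbqt_text) <= max_torsions
```

A batch generator could call within_torsion_limit() on each ligand file's contents before creating a workunit, so that a ligand with more than 40 torsions would be skipped instead of sent out.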

Cees Nolet
Joined: 2 Sep 12
Posts: 3
Credit: 60,984
RAC: 19
Message 2269 - Posted: 11 Apr 2015, 19:28:51 UTC

So after 1.5 years the problem isn't solved. I still have lots of WUs whose estimated crunch time is 6 min and 1 sec. So in one intake you get way too many WUs for a 24-hour buffer. If one WU takes 5 hours to crunch and I get 50 WUs in one intake, that intake is 50 times 5 hours, which is 250 hours. Even for my 6 CPUs that's way too much.

So the problem is not that the WUs take that long to crunch. The problem is that the intake exceeds 24 hours of work. For 6 CPUs it should take in no more than 24 WUs at an average crunch time of 5 hours.
It should average the crunch time over the last 10 WUs and work out from that how much work can be taken in next time.
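
The arithmetic above can be sketched in a few lines. This is only an illustration of the idea in this post (a rolling average over the last 10 completed WUs feeding the intake size); the class and function names are made up, and it is not how the BOINC client or scheduler actually estimates runtimes.

```python
import math
from collections import deque
from statistics import mean


class RuntimeEstimator:
    """Rolling mean of the most recent completed-WU runtimes, in seconds."""

    def __init__(self, initial_estimate_s, window=10):
        self.initial_estimate_s = initial_estimate_s  # used until a WU completes
        self.samples = deque(maxlen=window)           # keeps only the last `window` runtimes

    def record(self, runtime_s):
        self.samples.append(runtime_s)

    def estimate(self):
        return mean(self.samples) if self.samples else self.initial_estimate_s


def tasks_to_fetch(cores, buffer_s, est_runtime_s):
    """Number of WUs that fit into `buffer_s` seconds of work on `cores` cores."""
    return math.floor(cores * buffer_s / est_runtime_s)


est = RuntimeEstimator(initial_estimate_s=6 * 60 + 1)  # the 6 min 1 sec estimate
print(tasks_to_fetch(6, 24 * 3600, est.estimate()))    # 1436 WUs at the bad estimate

for _ in range(10):
    est.record(5 * 3600)                               # ten WUs that really took 5 hours
print(tasks_to_fetch(6, 24 * 3600, est.estimate()))    # 28 WUs once the average catches up
```

With 6 cores and a 24-hour buffer, 5-hour WUs work out to 28 per intake, the same order as the 24 suggested above and far from the count implied by a 6-minute estimate.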

Manuel
Joined: 13 Feb 13
Posts: 1
Credit: 55,387
RAC: 0
Message 2272 - Posted: 12 Apr 2015, 12:40:27 UTC

Come on, volunteers! I'm sure they're going to solve it by the next 60M batch. For now, let's crack those 30000 30h-per-WU units that are left. I have had some 20h WUs myself and it's not funny, but that's volunteering!! Let's keep crunching!!

Cheers!

Robert Gammon
Joined: 16 Jan 13
Posts: 2
Credit: 531,714
RAC: 689
Message 2274 - Posted: 13 Apr 2015, 16:03:47 UTC - in response to Message 2272.

I have had at least 11 in the last 24 hours that ran for 4 hours or more while showing 100% completed. I aborted them; they are in my user tasks list.

Irritating

Copyright © 2017 Dr Anthony Chubb