
[Condor-users] HowTo Speed up condor 6.7.20?



Hello, list!

I would like to know whether it is possible to speed up matchmaking in Condor and, if so, how much of a speed-up to expect.

With a pure Condor vanilla submission, matchmaking is quite quick (a few seconds for the first jobs). With submissions to the grid, however, it can take a while: at least 5 minutes for the first jobs.
I have sped up the collection of node state information with Ganglia (at most 10 seconds to collect all the data), and the data is forwarded to Condor every 3 seconds.
SchedLog shows some strange things, including a lot of lines like: "6/28 10:24:43 (pid:12556) IO: Failed to read packet header". The other log files seem OK.

These observations lead me to suspect that my tool generates bad or incomplete machine ads for my grid nodes. I have included a sample ad in the postscript, so if anyone sees a suspicious line, please let me know.

I am a bit hesitant to change the internal intervals in the config file, because Condor works very well in a Condor-only setup.

I would appreciate hearing about any experience, positive or not.
Thank you very much.

JF.

PS:
Context:
-------------------------
a Condor pool with:
* 2 pure condor (6.7.20) compute nodes (master, startd)
* 2 globus (GT4.0.2) grid nodes, monitored with ganglia
* 1 Condor central manager that can also submit jobs (6.7.20) (master, collector, negotiator, schedd), with a custom Perl script that converts Ganglia data into machine ad format and forwards it via condor_advertise

sample machine ad I generate:
------------------------------------
MyType = "Machine"
TargetType = "Job"
Requirements = (TARGET.JobUniverse == 9)
Rank = 0.000000
CurrentRank = 0.000000
COLLECTOR_HOST_STRING = "<collector FQDN>"
Name = "grid@<machine FQDN>"
Machine = "<machine FQDN>"
UidDomain = "<machine FQDN>"
FileSystemDomain = "<machine FQDN>"
Arch = "INTEL"
OpSys = "LINUX"
Cpus = 1
resource_name = "gt4 https://<machine FQDN>:8443 Fork"
UpdateSequenceNumber = 41270
StartdIpAddr = "10.24.247.214"
START = TRUE
Activity = Idle
State = Unclaimed
Memory = 192.0
LoadAvg = 0.022000
Activity = Idle
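
For illustration only, here is a minimal sketch of the kind of formatting and forwarding step described in the context above. It is written in Python rather than Perl and is not the actual script. The names Expr, format_value, format_machine_ad and advertise are hypothetical, as are the attribute values. It assumes that string-valued attributes should be emitted as quoted ClassAd string literals, and that condor_advertise is on the PATH and accepts the UPDATE_STARTD_AD command followed by a file name.

```python
import os
import subprocess
import tempfile


class Expr(str):
    """Hypothetical marker type: a raw ClassAd expression, emitted unquoted."""


def format_value(value):
    """Render a Python value as a ClassAd literal (assumed conventions)."""
    if isinstance(value, Expr):
        return str(value)
    if isinstance(value, bool):  # test bool before int: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):
        return f"{value:.6f}"
    # Anything else is emitted as a quoted string; embedded quotes are escaped.
    return '"' + str(value).replace('"', '\\"') + '"'


def format_machine_ad(attrs):
    """Return one 'Name = value' line per attribute, in insertion order."""
    lines = [f"{name} = {format_value(value)}" for name, value in attrs.items()]
    return "\n".join(lines) + "\n"


def advertise(ad_text, command="UPDATE_STARTD_AD"):
    """Write the ad to a temporary file and pass it to condor_advertise."""
    with tempfile.NamedTemporaryFile("w", suffix=".ad", delete=False) as handle:
        handle.write(ad_text)
        path = handle.name
    try:
        subprocess.run(["condor_advertise", command, path], check=True)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    # Placeholder values; a real script would fill these from Ganglia metrics.
    sample = {
        "MyType": "Machine",
        "TargetType": "Job",
        "Requirements": Expr("(TARGET.JobUniverse == 9)"),
        "Cpus": 1,
        "START": True,
        "LoadAvg": 0.022,
    }
    print(format_machine_ad(sample), end="")
```

Only the formatting step runs when the file is executed directly; advertise() is left uncalled so that the sketch can be tried without a running collector.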


