
Re: [HTCondor-users] Dynamic Slots in Parallel Universe



On 3/9/2018 11:40 AM, Larne Pekowsky wrote:
Hi Todd,

I'm resurrecting this thread because I think we're still seeing related problems. One of our users has a parallel universe job that has been idle for almost a day. The StartLog on the available nodes seems to indicate that the nodes are held for a while and then released without ever having enough nodes to start the job.

[snip]
Any suggestions? If you need any additional information please let me know.

Cheers,

- Larne


Hi Larne,

Looks like your schedd is indeed running with Greg's v8.7.7 code patch here
  https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=6517
so it should be working for you...

Does your condor_config on your central manager include
  ALLOW_PSLOT_PREEMPTION = True
?

And does the condor_config on all your execute nodes have a RANK expression that prefers your dedicated scheduler submit machine (e.g. like the example at http://tinyurl.com/yaolvshk)?
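For reference, here is a rough sketch of what those two settings might look like, loosely modeled on the dedicated-resource example in the HTCondor manual. It is not checked against your Syracuse configuration or this particular 8.7.x build. The host name `submit.example.edu` is a hypothetical placeholder for your dedicated scheduler's submit machine, and the partitionable-slot lines show just one common way to set up a single whole-machine slot.

```text
# Central manager condor_config: allow preemption of partitionable slots.
ALLOW_PSLOT_PREEMPTION = True

# Execute node condor_config.
# "submit.example.edu" is a placeholder; use your dedicated scheduler's host.
DedicatedScheduler = "DedicatedScheduler@submit.example.edu"
# Advertise the DedicatedScheduler attribute in the startd's ClassAd.
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler
# Rank jobs from the dedicated scheduler above all others.
RANK = Scheduler =?= $(DedicatedScheduler)

# One common way to define a single partitionable slot covering the whole machine.
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = 100%
SLOT_TYPE_1_PARTITIONABLE = True
```

Running `condor_config_val ALLOW_PSLOT_PREEMPTION` on the central manager and `condor_config_val RANK` on an execute node is a quick way to confirm which values the daemons actually see.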

If the answer to both of the above questions is yes, then the next step is that Greg will likely have more questions for you to get to the bottom of this... After the above patch, Greg observed parallel universe jobs working here at UW with partitionable slots, so I imagine he will need to figure out what is different at Syracuse...

Thanks
Todd