Re: [HTCondor-users] [CondorLIGO] Problem with parallel universe jobs and dynamic partitioning



On Tue, Sep 16, 2014 at 11:01:15AM -0500, Todd L Miller wrote:
> >In that case, it should be ensured that parallel jobs may use already
> >existing dynamic slots if they match the requirements. Something that
> >apparently doesn't happen now.
> 
> 	Well, yes.  Hence the need for startd rank on dynamic slots.
> 
> >I understand that, but I don't understand why it has to be done by a central
> >task, not a daemon running locally on the machine
> 
> 	The "pool-wide" part is important -- how would any individual
> startd be able to make policy decisions about how much less work it
> should do now in order to accept a wider variety of jobs later?

By returning to its initial (unpartitioned) state, or something as
close to it as possible. Yes, this can only consider idle dyn slots,
but that's all a defrag could do anyway if preemption is forbidden.
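
To make the idea concrete, here is a rough sketch of such a local
"reunite" step. It is only an illustration under my own assumptions:
the classes and the function reunite_idle() are hypothetical and are
not part of HTCondor or of how the startd is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class DynSlot:
    """Hypothetical model of a dynamic slot carved out of a partitionable slot."""
    name: str
    cpus: int
    memory_mb: int
    busy: bool  # True while a job is running in the slot


@dataclass
class PartitionableSlot:
    """Hypothetical model of the parent slot and its leftover resources."""
    free_cpus: int
    free_memory_mb: int
    dyn_slots: list = field(default_factory=list)


def reunite_idle(pslot):
    """Hand the resources of idle dynamic slots back to the parent slot.

    Busy slots are never touched, so no running job is preempted.
    Returns the names of the slots that were dissolved.
    """
    reclaimed = [s for s in pslot.dyn_slots if not s.busy]
    for s in reclaimed:
        pslot.free_cpus += s.cpus
        pslot.free_memory_mb += s.memory_mb
    # Only the busy slots remain partitioned off.
    pslot.dyn_slots = [s for s in pslot.dyn_slots if s.busy]
    return [s.name for s in reclaimed]
```

The point is that this decision needs nothing but local knowledge:
which dyn slots are idle and what they hold.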

> >Okay, so we've reached a catch-22 if there are only parallel jobs
> >on a pool?
> 
> 	The queued idle jobs would have to wait for one (or more) of the
> running jobs to finish.

And slots to become available, with or without defragmenting/reuniting.

> >I've been trying to do this using NEGOTIATOR_PRE_JOB_RANK, but
> >obviously failed (at least in the ParU case).
> 
> 	IIRC, NEGOTIATOR_PRE_JOB_RANK can not cause preemption.

It's not meant to (at least in our setup), but it should behave the
same in both universes. Until re-matching idle dyn slots (even ones
claimed by the same user) works as expected, I cannot enforce any
allocation scheme whenever ParU jobs are involved.

> >I cannot see the difference of a dynamically created slot and a
> >static one when it comes to matching (should there be any?)...
> 
> 	AFAIK, there isn't.  The problem (as I understand it) isn't the
> matching -- it's after the match, when the question comes up about
> what to do about the job that's already using the matched resources.

Hm, as these details seem to be properly hidden from my view at the
moment, which log-level settings would you recommend to get a better
insight into these processes?


Thanks,
 Steffen