Re: [HTCondor-users] Trying to set-up DedicatedScheduler for parallel universe, but not preempting serial jobs



> On Dec 6, 2016, at 9:47 AM, Carsten Aulbert <carsten.aulbert@xxxxxxxxxx> wrote:
> 
> Hi all
> 
> I'm currently stuck with this problem and don't really know where to
> continue.
> 
> I've set-up two execute nodes with four cores each to only run my jobs (
> 
> START = Owner == "carsten"
> 
> ). The set-up is with fully partitionable slots, i.e.
> 
> SLOT_TYPE_1 = ram=15023, swap=0%
> SLOT_TYPE_1_PARTITIONABLE = True
> 
> In the absence of any jobs running on the execute machines, I get both via
> 
> machine_count = 2
> request_cpus = 4
> 
> So far so good. To allow both parallel and serial jobs I followed the
> second policy of
> https://research.cs.wisc.edu/htcondor/manual/v8.4/3_12Setting_Up.html#SECTION004128000000000000000
> with the only exception for START described above and
> 
> PREEMPT = Scheduler =!= $(DedicatedScheduler)
> 
> after both PREEMPT = True as well as PREEMPT = false did not really work
> out:
> 
> I started four single core jobs on one of the execute nodes and
> hoped/expected HTCondor to preempt those to launch the parallel job, but
> so far to no avail. The empty second execute node is fully matched but
> no preemption occurs on the first one.
> 
> Other things tried so far:
> 
> * setting ALLOW_PSLOT_PREEMPTION = True on negotiator, schedd and
> execute node
> * reducing various timers to check if these have any say in preemption,
> but so far blanks only:
> 
> MaxJobRetirementTime = 1
> MachineMaxVacateTime = 10
> CLAIM_WORKLIFE = 60
> 
> As I'm running out of options, anyone succeeded with such a set-up?


For this setup to work, you will need ALLOW_PSLOT_PREEMPTION=True, since you want one 4-core request to preempt four 1-core requests.

It appears that pslot preemption doesn't work for parallel jobs, but that would be easy to fix. I will work on getting this in for a future release.

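With the current release, you can work around this by setting ALLOW_PSLOT_PREEMPTION=False and having your parallel job request 8 single-core allocations, so that each allocation should only need to preempt one single-core serial job, like this: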
  machine_count = 8
  request_cpus = 1
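
For context, a complete submit description file built around those two lines might look like the sketch below. This is only an illustration: the executable path and the output, error, and log file names are hypothetical placeholders, not taken from the original setup.

  # Sketch of a parallel universe submit file (paths and file names are placeholders)
  universe      = parallel
  executable    = /path/to/my_parallel_job
  machine_count = 8
  request_cpus  = 1
  output        = job.$(NODE).out
  error         = job.$(NODE).err
  log           = job.log
  queue

With two 4-core execute nodes, the 8 single-core allocations would be expected to land four per machine.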

Thanks and regards,
Jaime Frey
UW-Madison HTCondor Project