
[HTCondor-users] Submitting a job which occupies the whole machine



Dear All,

I have a question about how to configure Condor so that users can submit a job that occupies a whole machine.

Suppose our machine has 4 CPUs in a single node. We have configured it for the parallel universe as follows:

=======================================================
DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxxxxxx"
START        = True
SUSPEND      = False
CONTINUE     = True
PREEMPT      = False
KILL         = False
WANT_SUSPEND = False
WANT_VACATE  = False
RANK         = Scheduler =?= $(DedicatedScheduler)
=======================================================

Searching on Google, I found that I can add the following settings so that people can submit either a single-CPU job or a parallel job that blocks the whole machine:

==============================================================================
# require that whole-machine jobs only match to Slot1
START = ($(START)) && (TARGET.RequiresWholeMachine =!= TRUE || SlotID == 1)

# have the machine advertise when it is running a whole-machine job
STARTD_JOB_EXPRS = $(STARTD_JOB_EXPRS) RequiresWholeMachine

# Export the job expr to all other slots
STARTD_SLOT_EXPRS = RequiresWholeMachine

# require that no single-cpu jobs may start when a whole-machine job is running
START = ($(START)) && (SlotID == 1 || Slot1_RequiresWholeMachine =!= True)
===============================================================================

With these settings, when slot1 is occupied by a single-CPU job, the node will not run a job that has to block the whole node. And while a job is blocking the whole node, that node will not run any other single-CPU jobs. So far this looks good.
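
For reference, I assume a whole-machine job would be marked in its submit description file roughly as shown here. This is only a sketch: the executable name is made up, and machine_count = 1 reflects my assumption that the job claims only slot1 while the START expressions above keep the other slots idle.

=======================================================
# Sketch of a submit file for a whole-machine job (names are hypothetical)
universe      = parallel
executable    = my_whole_node_job
machine_count = 1

# custom job attribute tested by the START expressions above
+RequiresWholeMachine = True

queue
=======================================================

A job without the +RequiresWholeMachine line leaves the attribute undefined, so the "=!= TRUE" tests treat it as an ordinary single-CPU job.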

However, there is a problem. If the node already has a single-CPU job running in slot2, slot3, or slot4, a job that needs to block the whole node can still be started on this node (in slot1). How can we prevent this situation?

Presumably, when a job needs the whole node, Condor should check that the node is completely free (no jobs running at all) before sending that job to it. However, I don't know how to configure this.

Are there any suggestions for solving this problem?
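
To make the intended check concrete, my rough and untested guess is something like the following. I am assuming that State can be exported to the other slots in the same way as RequiresWholeMachine, so that slot1 can see Slot2_State, Slot3_State, and Slot4_State. OTHER_SLOTS_FREE is just a macro name I made up, and the slot numbers are hard-coded for our 4-CPU node.

=======================================================
# Assumption: also export each slot's State to all other slots
# (this would replace the STARTD_SLOT_EXPRS line above)
STARTD_SLOT_EXPRS = RequiresWholeMachine, State

# Hypothetical helper macro: true only if slots 2-4 are not claimed
OTHER_SLOTS_FREE = (Slot2_State =!= "Claimed" && Slot3_State =!= "Claimed" && Slot4_State =!= "Claimed")

# a whole-machine job may start only if the other slots are free
START = ($(START)) && (TARGET.RequiresWholeMachine =!= TRUE || $(OTHER_SLOTS_FREE))
=======================================================

I am not sure whether the exported State values are refreshed quickly enough, so a single-CPU job and a whole-machine job might still be matched to the same node within one negotiation cycle.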

Thank you very much.


T.H.Hsieh