
[Condor-users] Whole Machine Slot 'stealing' load



Hello,

I used the recipe for the whole machine slots: https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=WholeMachineSlots

We run a configuration in which Condor jobs are preempted by load so that PBS jobs take priority. We have noticed that when the machine is full of PBS jobs and the load equals the number of cores, Condor assigns part of that load to the whole-machine slot, leaving one regular slot with no load, idle and unclaimed. This is undesirable because a Condor job can then start on that slot, oversubscribing the node.

I guess I don't understand how Condor assigns slots to CPUs. How does it assign load to each slot?
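A simplified, hypothetical model may help show why an idle slot appears. It assumes (this is not confirmed HTCondor behavior) that non-Condor "owner" load is handed out in chunks of at most 1.0 per slot, walking the slots in some fixed order. The function name `distribute_owner_load` and the slot order below are illustrative only.

```python
# Hypothetical, simplified model of per-slot load assignment.
# ASSUMPTION: owner (non-Condor) load is assigned at most 1.0 per slot,
# in the given slot order. The real startd policy may differ.

def distribute_owner_load(owner_load, slot_order):
    """Return a dict mapping slot name -> owner load assigned to it."""
    assigned = {}
    remaining = owner_load
    for slot in slot_order:
        share = min(1.0, remaining)  # never more than one core's worth
        assigned[slot] = share
        remaining -= share
    return assigned


# 8 cores -> Slot1..Slot8, plus the whole-machine slot Slot9 (8+1).
# ASSUMPTION: the whole-machine slot happens to be visited first.
order = ["Slot9"] + [f"Slot{i}" for i in range(1, 9)]
loads = distribute_owner_load(8.0, order)  # 8 PBS processes on 8 cores

# Slots whose owner load passes the "(LoadAvg - CondorLoadAvg) <= 0.1" test
idle = [slot for slot, load in loads.items() if load <= 0.1]
print(idle)  # ['Slot8']
```

Whatever the real order is, the recipe gives an 8-core machine nine slots, so under this model 8.0 units of owner load capped at 1.0 per slot always leave at least one slot at zero.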

Here are the START, SUSPEND, PREEMPT, and CONTINUE expressions:

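# Start only if the keyboard has been idle 15 min and non-Condor load is negligible (or the slot is already claimed); regular jobs go to regular slots only while the whole-machine slot (Slot9) is not claimed, whole-machine jobs go to the whole-machine slot
START = (( (KeyboardIdle > 15 * 60) && ( ((LoadAvg - CondorLoadAvg) <= 0.1) || (State != "Unclaimed" && State != "Owner")) )) && ( (TARGET.RequiresWholeMachine =!= True && MY.IsWholeMachineSlot == False && eval(strcat("Slot",(8+1),"_State")) =!= "Claimed") || (TARGET.RequiresWholeMachine =?= True && MY.IsWholeMachineSlot) )

# Suspend on recent keyboard activity or sustained CPU use, or on a regular slot while the whole-machine slot is claimed
SUSPEND = (( (KeyboardIdle < 60) || ( (CpuBusyTime > 2 * 60 ) ) )) || ( MY.IsWholeMachineSlot =!= True && eval(strcat("Slot",(8+1),"_State")) =?= "Claimed" )

# Preempt after 10 min suspended, when SUSPEND is true but suspension is not wanted, or when ImageSize (KB) exceeds 80% of Memory (MB)
PREEMPT = (( ((Activity == "Suspended") && ((CurrentTime - EnteredCurrentActivity) > 10 * 60)) || (SUSPEND && (WANT_SUSPEND == False)) )) || (ImageSize/1024 > (Memory*0.8))

# Continue once the SUSPEND condition is false, non-Condor load is negligible, the job has been in its current activity over 10 s, and the keyboard has been idle 5 min
CONTINUE = ( ((( (KeyboardIdle < 60) || ( (CpuBusyTime > 2 * 60 ) ) )) || ( MY.IsWholeMachineSlot =!= True && eval(strcat("Slot",(8+1),"_State")) =?= "Claimed" )) =!= True ) && (( ((LoadAvg - CondorLoadAvg) <= 0.1) && ((CurrentTime - EnteredCurrentActivity) > 10) && (KeyboardIdle > 5 * 60) ))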
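To see how the START expression reacts to per-slot load, here is a rough Python translation of it, a sketch only. It assumes every attribute is defined and boolean where expected, so it ignores ClassAd UNDEFINED semantics except for `RequiresWholeMachine`, which may be absent from the job (`=?= True` and `=!= True` are modeled with `is True`). The function and variable names are hypothetical.

```python
# Rough Python translation of the START expression above (sketch only).
# ASSUMPTION: slot attributes are all defined; only the job's
# RequiresWholeMachine may be missing, mirroring =?= / =!= semantics.

def start(slot, job, whole_slot_state):
    """Return True if START would evaluate to True for this slot/job pair."""
    keyboard_idle = slot["KeyboardIdle"] > 15 * 60
    owner_load_ok = (slot["LoadAvg"] - slot["CondorLoadAvg"]) <= 0.1
    already_claimed = slot["State"] not in ("Unclaimed", "Owner")
    wants_whole = job.get("RequiresWholeMachine") is True  # =?= True

    # Regular job on a regular slot, only while Slot9 is not claimed
    regular_ok = (not wants_whole
                  and not slot["IsWholeMachineSlot"]
                  and whole_slot_state != "Claimed")
    # Whole-machine job on the whole-machine slot
    whole_ok = wants_whole and slot["IsWholeMachineSlot"]

    return keyboard_idle and (owner_load_ok or already_claimed) and (regular_ok or whole_ok)


# A regular slot that was assigned one unit of PBS load...
busy = {"KeyboardIdle": 3600, "LoadAvg": 1.0, "CondorLoadAvg": 0.0,
        "State": "Unclaimed", "IsWholeMachineSlot": False}
# ...and one that was assigned none, although the node is fully loaded
starved = dict(busy, LoadAvg=0.0)

print(start(busy, {}, "Unclaimed"))     # False
print(start(starved, {}, "Unclaimed"))  # True
```

The START expression only sees the load attributed to its own slot, so a regular slot that was assigned no owner load passes the load test even while the node as a whole is fully loaded.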

Let me know if you need any more configs.

Derek Weitzel
Graduate Research Assistant
University of Nebraska Holland Computing Center