Re: [Condor-users] Whole System Scheduling



So to distill my troubles...

Using Dan's whole-system recipe, with the whole-system job suspending
until the system clears:

This basically works unless it takes longer than MaxSuspendTime for
the other slots to clear.  At the end of MaxSuspendTime I'm seeing
overlapping runtime of anywhere from a couple of seconds (OK, I guess)
to 10 minutes (which is not).  How long you should wait for resources
to free up is a question for the ages, but once you decide not to wait
any longer, it seems like you should go right to Vacating; do not pass
Busy.


Attempting to kick everything out when a whole-system job starts
avoids the question of waiting:

SUSPEND = (SlotID =!= 1 && Slot1_RequiresWholeMachine =?= True)
PREEMPT = (SlotID =!= 1 && Slot1_RequiresWholeMachine =?= True)

That seems simple enough, if rude, but again I'm seeing one to three
minutes of overlapping runtime before the other slots clear out.

While I'd like it to be shorter, my main problem is the indeterminacy.
Is there any way on the execute site to see which slots are busy, so I
could write a wrapper that waits until they are free?
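
To make the wrapper idea concrete, here is a rough Python sketch of
what such a wait loop might look like.  It is only an illustration
under assumptions: that condor_status can be run on the execute node
with -direct and -format to print one "Name Activity" pair per slot,
that the whole-system job lives in slot1, and that any Activity other
than Idle (including Suspended or Vacating) counts as "not clear".
The function names busy_slots, query_slots and wait_for_clear are made
up for this example.

```python
import subprocess
import time


def busy_slots(status_text, my_slot="slot1"):
    """Return names of slots other than my_slot whose Activity is not Idle.

    status_text is expected to hold one "<slot name> <activity>" pair
    per line, e.g. "slot2@node01 Busy".
    """
    busy = []
    for line in status_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        name, activity = parts
        if name.split("@")[0] == my_slot:
            continue  # the whole-system slot itself is expected to be busy
        if activity != "Idle":
            busy.append(name)
    return busy


def query_slots(host):
    """Ask the startd on host for each slot's Name and Activity.

    Assumption: condor_status is on PATH and the startd answers
    direct queries from wherever the wrapper runs.
    """
    cmd = ["condor_status", "-direct", host,
           "-format", "%s ", "Name",
           "-format", "%s\n", "Activity"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def wait_for_clear(host, query=query_slots, poll=5, timeout=600,
                   sleep=time.sleep):
    """Poll until no other slot is active.

    Returns True once the other slots are clear, or False if they are
    still active after roughly timeout seconds.
    """
    waited = 0
    while True:
        if not busy_slots(query(host)):
            return True
        if waited >= timeout:
            return False
        sleep(poll)
        waited += poll
```

The wrapper would call wait_for_clear(hostname) before starting the
real payload and decide for itself what to do on a False return.  The
query and sleep arguments are injectable only so the loop can be tried
without a running pool.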

Thanks,
-Jon