
[Condor-users] GPU machines in "mixed" environment (conditional preemption)



Hi all,

We have a bunch of machines with GPU cards installed. I'm already advertising
these via ClassAds, e.g.

STARTD_ATTRS = GPU_DEV, GPU_NAME, GPU_CAPABILITY, GPU_GLOBALMEM_MB, \
               GPU_MULTIPROC, GPU_NUMCORES, GPU_CLOCK_GHZ
SLOT1_GPU_DEV=0
SLOT1_GPU_NAME="Tesla C2050"
SLOT1_GPU_CAPABILITY=2.0
SLOT1_GPU_GLOBALMEM_MB=2687
SLOT1_GPU_MULTIPROC=14
SLOT1_GPU_NUMCORES=448
SLOT1_GPU_CLOCK_GHZ=1.15
SLOT2_GPU_DEV=1
SLOT2_GPU_NAME="Tesla C2050"
SLOT2_GPU_CAPABILITY=2.0
SLOT2_GPU_GLOBALMEM_MB=2687
SLOT2_GPU_MULTIPROC=14
SLOT2_GPU_NUMCORES=448
SLOT2_GPU_CLOCK_GHZ=1.15


I'm disabling vanilla universe jobs on this machine because I want to use
preemption (and, of course, I'm only letting myself run jobs here for testing,
as the rest of the pool is a production system).

START = ( Owner =?= "carsten" ) && ( JobUniverse != 5 )

A possible change for the future: allow any universe when the job sets
NeedGpu, and otherwise allow the standard universe only (is this correct?):

START = ( JobUniverse == 1 ) || ( TARGET.NeedGpu =!= UNDEFINED )
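
To make the intended policy concrete (any universe if the job sets NeedGpu,
otherwise standard universe only), here is a minimal Python sketch of the
decision. It is not Condor syntax; the function name start_allows and its
parameters are hypothetical, and None stands in for an UNDEFINED attribute.

```python
STANDARD_UNIVERSE = 1  # Condor's numeric code for the standard universe


def start_allows(job_universe, need_gpu=None):
    """Return True if a job may start under the intended policy.

    need_gpu=None models a job ad that has no NeedGpu attribute (UNDEFINED).
    """
    has_need_gpu = need_gpu is not None
    # Jobs that set NeedGpu may run in any universe;
    # all other jobs must be standard universe jobs.
    return has_need_gpu or job_universe == STANDARD_UNIVERSE


if __name__ == "__main__":
    print(start_allows(5, need_gpu=True))  # vanilla job that sets NeedGpu
    print(start_allows(1))                 # standard universe job, no NeedGpu
    print(start_allows(5))                 # vanilla job, no NeedGpu
```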

However, now the big question is how to address preemption. Essentially, I
want to ensure that the machine works as a standard compute node with multiple
cores (set up identically to the others, sans vanilla universe jobs) in the
absence of any idle jobs which have "NeedGpu" set.

As soon as there are idle jobs which have this set and there are running jobs
which do not have it set, I'd like to preempt/checkpoint the running ones and
let the GPU jobs run. However, I'm not quite sure how to achieve this, as I
would need to access the currently running JobAd (MYRUNNINGJOB refers to this):

PREEMPTION_REQUIREMENTS = ( MYRUNNINGJOB.NeedGpu =?= UNDEFINED && \
    TARGET.NeedGpu =!= UNDEFINED ) || \
    ( $(StateTimer) > (4 * $(HOUR)) && RemoteUserPrio > SubmittorPrio * 1.2 )

Is there any way to achieve this? Which part of the manual do I need to look
at again?
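
For reference, the decision the expression above is meant to encode can be
sketched in plain Python. This is only an illustration of the logic, not
something Condor evaluates: should_preempt and its parameter names are
hypothetical, None stands in for UNDEFINED, and HOUR is assumed to be 3600
seconds.

```python
HOUR = 3600  # seconds; assumed to match the HOUR macro in the Condor config


def should_preempt(running_need_gpu, candidate_need_gpu,
                   state_timer, remote_user_prio, submittor_prio):
    """Return True if the running job should be preempted for the candidate.

    running_need_gpu / candidate_need_gpu are None when the respective job
    ad does not define NeedGpu.
    """
    # Rule 1: a non-GPU job is running and a GPU job is waiting.
    gpu_job_waiting = (running_need_gpu is None
                       and candidate_need_gpu is not None)
    # Rule 2: priority-based preemption after four hours in the current state.
    prio_rule = (state_timer > 4 * HOUR
                 and remote_user_prio > submittor_prio * 1.2)
    return gpu_job_waiting or prio_rule


if __name__ == "__main__":
    # Non-GPU job running, GPU job idle: preempt regardless of the timer.
    print(should_preempt(None, True, 0, 1.0, 1.0))
```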

Thanks a lot in advance

Carsten