Re: [Condor-users] GPU and CPU ressources managed seperately by Condor



Hi,

Did you get an answer yet? I have managed to get this working to a
certain extent, namely with the following slot-based policy (I have 8
cores and 4 GPUs per host). In the job submit file one has to add the line

+NeedGpu = TRUE

for GPU jobs and 

+NeedGpu = FALSE

for CPU-only jobs. In addition, you probably need to add HasGpus to the
requirements if you also have hosts that don't offer GPUs; otherwise
your GPU jobs will also be started on non-GPU hosts. The policy simply
assumes that every GPU in use occupies one CPU core.
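
As a minimal sketch, a submit file for a GPU job under this policy might
look like the following. The executable name gpu_job.sh is a placeholder
and the vanilla universe is an assumption; the HasGpus requirement relies
on that attribute being advertised through STARTD_ATTRS, as in the
configuration further down.

```
# Hypothetical submit file for a GPU job (file names are placeholders)
universe     = vanilla
executable   = gpu_job.sh

# Custom job attribute tested by SLOT1_START / SLOT2_START on the startd
+NeedGpu     = TRUE

# Only match hosts that advertise HasGpus = TRUE
requirements = (HasGpus =?= TRUE)

queue
```

For a CPU-only job, set +NeedGpu = FALSE and drop the HasGpus requirement.
A job that does not define NeedGpu at all matches neither slot group,
because =?= evaluates to FALSE when the attribute is undefined.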

One problem with this is that, in principle, a job in the non-GPU queue
can use the GPUs as well. I have not yet found a way to prevent this.
But at least I run the GPUs in compute-exclusive mode (see the
nvidia-smi man page), so that only one process may use a given GPU at
a time.

Hope this helps

Carsten

Here is my condor_config.local file:

##  What machine is your central manager?

CONDOR_HOST = $(FULL_HOSTNAME)

## Pool's short description

COLLECTOR_NAME = Personal Condor at $(FULL_HOSTNAME)

NUM_CPUS = 8

NUM_GPUS = 4
HasGpus = TRUE

##  Slots 1-4 accept only GPU jobs, slots 5-8 only CPU-only jobs

START = (((SlotId < 5) && $(SLOT1_START)) || ((SlotId > 4) && $(SLOT2_START))) || FALSE

SUSPEND        = False
CONTINUE       = True
PREEMPT        = False
KILL           = False
WANT_SUSPEND   = False
WANT_VACATE    = False
#RANK           = Scheduler =?= $(DedicatedScheduler)

SLOT1_START = (TARGET.NeedGpu =?= TRUE)
SLOT2_START = (TARGET.NeedGpu =?= FALSE)

##  This macro determines what daemons the condor_master will start
##  and keep its watchful eyes on.
##  The list is a comma or space separated list of subsystem names

DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, STARTD

##  Sets how often the condor_negotiator starts a negotiation cycle. 
##  It is defined in seconds and defaults to 60 (1 minute). 

NEGOTIATOR_INTERVAL = 20

##  Disable UID_DOMAIN check when submitting a job

TRUST_UID_DOMAIN = TRUE

STARTD_ATTRS = $(STARTD_ATTRS) , NUM_GPUS, HasGpus

--
Carsten Urbach
e-mail: curbach@xxxxxx
        urbach@xxxxxxxxxxxxxxxxx
Fon   : +49 (0) 228 73 2379
skype : carsten.urbach
URL: http://www.carsten-urbach.eu
