Re: [HTCondor-users] Machine with both CPUs and GPUs under condor



On 01/06/2013 09:11 AM, Tung-Han Hsieh wrote:
Dear All,

We are going to configure a multi-core machine that has several GPUs
under condor. Our machine has 8 cores and 4 GPUs. We want to configure
it with slot1, 2, 3, 4 dedicated to CPU jobs, and slot5, 6, 7, 8
dedicated to GPU jobs. Further, when a user specifies +RequiresWholeCPUs,
the job should occupy all CPU slots, i.e., slot1, 2, 3, 4. When a user specifies
+RequiresWholeGPUs, the job should occupy all GPU slots, i.e., slot5, 6, 7, 8.
In other words, we don't want a single job to occupy the whole machine.

Here is the local condor configuration we have tried, but it does not work:
(we installed condor-7.9.2)

==========================================================================
SLOT5_HAS_GPU = TRUE
SLOT5_GPU_DEV = 0
SLOT6_HAS_GPU = TRUE
SLOT6_GPU_DEV = 1
SLOT7_HAS_GPU = TRUE
SLOT7_GPU_DEV = 2
SLOT8_HAS_GPU = TRUE
SLOT8_GPU_DEV = 3

START = ($(START)) && \
         ((SlotID == 1 || TARGET.RequiresWholeCPUs    =!= True) && \
          (SlotID == 1 || Slot1_RequiresWholeCPUs     =!= True))
START = ($(START)) || \
         ((SlotID == 5 || TARGET.RequiresWholeGPUs    =!= True) && \
          (SlotID == 5 || Slot5_RequiresWholeGPUs     =!= True))
STARTD_JOB_EXPRS   = $(STARTD_JOB_EXPRS) RequiresWholeCPUs RequiresWholeGPUs
SLOT1_STARTD_EXPRS = RequiresWholeCPUs
SLOT2_STARTD_EXPRS = RequiresWholeCPUs
SLOT3_STARTD_EXPRS = RequiresWholeCPUs
SLOT4_STARTD_EXPRS = RequiresWholeCPUs
SLOT5_STARTD_EXPRS = RequiresWholeGPUs
SLOT6_STARTD_EXPRS = RequiresWholeGPUs
SLOT7_STARTD_EXPRS = RequiresWholeGPUs
SLOT8_STARTD_EXPRS = RequiresWholeGPUs
==========================================================================

May I ask what the correct way is to configure condor for our needs?

Thanks for your reply in advance.


Sincerely,

T.H.Hsieh

http://spinningmatt.wordpress.com/2012/11/19/extensible-machine-resources/

In your startd config:
SLOT_TYPE_1 = cpus=100%,auto
SLOT_TYPE_1_PARTITIONABLE = TRUE
NUM_SLOTS_TYPE_1 = 1
MACHINE_RESOURCE_NAMES = GPUS
MACHINE_RESOURCE_GPUS = 4

In the submit file of a job that needs all the CPUs reserved for CPU jobs:
request_cpus = 4

In the submit file of a job that needs all the GPUs:
request_gpus = 4

In the submit file of a job that needs a single GPU:
request_gpus = 1
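
For context, a complete submit file for the single-GPU case might look like the sketch below. The executable and the output, error, and log file names are hypothetical placeholders, and request_gpus only has an effect if the startd advertises the GPUS resource as configured above.

```
# Minimal sketch of a submit description for a one-GPU job.
# gpu_job.sh and the gpu_job.* file names are made up for illustration.
universe     = vanilla
executable   = gpu_job.sh
request_cpus = 1
request_gpus = 1
output       = gpu_job.out
error        = gpu_job.err
log          = gpu_job.log
queue
```

A whole-GPU job would differ only in using request_gpus = 4.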

If you need policy enforcement to prevent a single job from using all CPUs, you could write something along the lines of START = RequestCpus < 5, which limits any one job to at most 4 of the 8 CPUs.

If you want to make sure that you run no more than 4 CPU-only jobs (i.e., don't let a fifth non-GPU job run and take a core that a GPU job would need), you may be able to do something like START = Cpus > 4 || RequestGpus > 0 (I've not tested this). You can probably also replace 4 with (TotalCpus - TotalGpus).
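
If you want both policies at once, one possible way to combine them in the startd config is sketched below. This is equally untested. It assumes the partitionable slot advertises Cpus, TotalCpus and TotalGpus as described above, and that a job without request_gpus simply fails the RequestGpus > 0 test.

```
# Untested sketch: combine the two START conditions suggested above.
#  - no single job may request more than 4 CPUs
#  - a CPU-only job starts only while more than (TotalCpus - TotalGpus)
#    CPUs are still free in the partitionable slot; GPU jobs are exempt
START = (TARGET.RequestCpus < 5) && \
        ((Cpus > (TotalCpus - TotalGpus)) || (TARGET.RequestGpus > 0))
```

On the 8-core, 4-GPU machine, (TotalCpus - TotalGpus) works out to 4, so this matches the literal values used in the two expressions above.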

Best,


matt