
Re: [HTCondor-users] configuring a GPU machine



Hi Eddie, 

Thank you for your advice. 

Yes, I also tried both the static and the automatic configuration. For the latter I tried the output of the (not officially supported) condorgpu project. In both cases only CPU slots were shown. 

SLOT1_HAS_GPU=TRUE
SLOT1_GPU_DEV=0
...
SLOT4_HAS_GPU=TRUE
SLOT4_GPU_DEV=3
STARTD_ATTRS=HAS_GPU,GPU_DEV

Output: 
slot1@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.100 3018  0+00:00:04
...
slot16@xxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 3018  0+00:00:23


I also tried a configuration that I found on the users list that actually configured the same hardware combination (4 GPUs, 8 CPUs):

NUM_CPUS = 8

NUM_GPUS = 4
HasGpus = TRUE

START = (((SlotId < 5) && $(SLOT1_START)) || ((SlotId > 4) && $(SLOT2_START))) || FALSE

SUSPEND        = False
CONTINUE       = True
PREEMPT        = False
KILL           = False
WANT_SUSPEND   = False
WANT_VACATE    = False

SLOT1_START = (TARGET.NeedGpu =?= TRUE)
SLOT2_START = (TARGET.NeedGpu =?= FALSE)

This again only shows the CPUs (8 in this case).

slot1@xxxxxxxxxxxx LINUX      X86_64 Owner     Idle      0.080 6036  0+00:05:04
...
slot8@xxxxxxxxxxxx LINUX      X86_64 Owner     Idle      0.000 6036  0+00:05:03
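
The START policy above can be traced by hand. What follows is a minimal Python model, not HTCondor code. It assumes that `=?=` behaves as the ClassAd "is identical to" operator, which never evaluates to UNDEFINED, and it uses Python's None as a stand-in for a NeedGpu attribute that is not defined. The function names are hypothetical. Under those assumptions START is False on every slot when NeedGpu is undefined, which may be related to the Owner state in the listing above. START only governs matching, so this would not by itself explain the number of slots shown.

```python
UNDEFINED = None  # stand-in for a ClassAd attribute that is not defined


def meta_equals(a, b):
    """Model of ClassAd `=?=`: True only when both operands have the
    same type and value; the result is never UNDEFINED."""
    return type(a) is type(b) and a == b


def start(slot_id, need_gpu=UNDEFINED):
    """Model of the START expression from the configuration above."""
    slot1_start = meta_equals(need_gpu, True)    # SLOT1_START
    slot2_start = meta_equals(need_gpu, False)   # SLOT2_START
    return ((slot_id < 5 and slot1_start) or
            (slot_id > 4 and slot2_start)) or False


if __name__ == "__main__":
    for need_gpu in (True, False, UNDEFINED):
        accepted = [s for s in range(1, 9) if start(s, need_gpu)]
        print(f"NeedGpu={need_gpu!r}: accepted on slots {accepted}")
```

In this model, a job that sets NeedGpu to True matches only slots 1 to 4, a job that sets it to False matches only slots 5 to 8, and a job that does not define NeedGpu matches no slot at all.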



Btw., the configuration mentioned in my previous mail shows the following status:

slot1@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.030 12073  0+00:00:04
...
slot4@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 12073  0+00:00:07


So currently I can define slots for either the GPUs or the CPUs, but not both at the same time, and not the combined layout I intended. 
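
As a sanity check on the arithmetic only, the sketch below adds up what the two slot types from the quoted configuration request and compares it with the machine totals (16 CPUs, MACHINE_RESOURCE_gpu = 4, MACHINE_RESOURCE_actuator = 20). It is plain Python with hypothetical names and does not model how condor_startd actually allocates resources; in particular it assumes a resource that a slot type does not mention counts as zero.

```python
# Machine totals as stated in the quoted mail.
machine = {"cpu": 16, "gpu": 4, "actuator": 20}

# (number of slots, resources per slot) for the intended layout.
slot_types = [
    (4, {"cpu": 1, "gpu": 1, "actuator": 1}),  # SLOT_TYPE_1
    (3, {"cpu": 4, "actuator": 1}),            # SLOT_TYPE_2, cpu=4 variant
]


def total_requested(slot_types):
    """Sum each resource over all slots of all slot types."""
    totals = {}
    for count, resources in slot_types:
        for name, amount in resources.items():
            totals[name] = totals.get(name, 0) + count * amount
    return totals


def leftover(machine, slot_types):
    """Resources left after all slot types take their share
    (a negative value would mean the layout is oversubscribed)."""
    requested = total_requested(slot_types)
    return {name: machine[name] - requested.get(name, 0) for name in machine}


if __name__ == "__main__":
    print("requested:", total_requested(slot_types))
    print("leftover: ", leftover(machine, slot_types))
```

Nothing is oversubscribed in this layout: all 16 CPUs and all 4 GPUs are used and 13 actuators remain, so a simple resource shortfall does not look like the cause.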

Regards, 
Tobias 


> Hi Tobias,
> 
> Did you see this in the recipes section on the wiki?
> 
> https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToManageGpus
> 
> I am also a greenhorn, but I am about to head down this path: I have a couple of servers with GPUs that I would like to find a better way to advertise and utilize. Currently I am basically using the machine name to target the GPU machines, and there is no contention.
> 
> Eddie
> 
> -----Original Message-----
> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Tobias Beisel
> Sent: Tuesday, July 09, 2013 11:11 AM
> To: htcondor-users@xxxxxxxxxxx
> Subject: [HTCondor-users] configuring a GPU machine
> 
> Hi, 
> 
> I am new to condor and have problems configuring my machine. 
> 
> I'm using HTCondor V8.0.0 on an Ubuntu 12.04 machine with 16 CPUs (8 cores with Hyperthreading) and 4 NVIDIA Tesla C2070 GPUs. I would like to configure condor to 1. use each GPU combined with 1 CPU as a slot and 2. use each group of 4 of the remaining 12 CPUs as a single slot. 
> 
> I managed to provide the slots for GPUs using the following configuration: 
> 
> MACHINE_RESOURCE_gpu = 4
> MACHINE_RESOURCE_actuator = 20
> 
> SLOT_TYPE_1 = gpu=1, cpu=1, actuator=1
> NUM_SLOTS_TYPE_1 = 4
> 
> condor_status shows these slots correctly. 
> 
> Unfortunately, I cannot get the remaining CPUs to be configured as slots. The following does not show any slots: 
> 
> SLOT_TYPE_2 = cpu=1, actuator=1
> NUM_SLOTS_TYPE_2 = 12
> 
> or 
> 
> SLOT_TYPE_2 = cpu=4, actuator=1
> NUM_SLOTS_TYPE_2 = 3
> 
> I tried several other configurations I found in examples, but at best I could get only one slot type to show up. 
> 
> What would I need to change to make it work?
> 
> 
> Assuming the above would work, I'd have two more questions on how to create job submission files: 
> 
> 1. As configured, the GPU slots mentioned above show 'Arch X86_64', and so would the CPU slots. How can I then choose a different executable based on the provided architecture, as proposed in chapter 2.5.6 (heterogeneous submit), by using the $$(Arch) macro?
> 2. Is it also possible to pass different arguments to the executable based on the provided 'Arch'? This would allow selecting the code to execute within a single application binary, i.e., figuratively using a 'fat' binary. 
> 
> 
> Thank you for your help,
> Tobias
> 
> 
> 
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> 
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/