Re: [HTCondor-users] configuring a GPU machine



Hi everybody, 

After finding another example on the users list, I tried the following:

SLOT_TYPE_1 = cpus=100%,auto
SLOT_TYPE_1_PARTITIONABLE = TRUE
NUM_SLOTS_TYPE_1 = 1
MACHINE_RESOURCE_NAMES = GPUS
MACHINE_RESOURCE_GPUS = 4

Unfortunately, this shows only one slot:

slot1@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.090 48295  0+00:00:04


Does anybody have a multi-GPU, multi-CPU system running with HTCondor and could share an example config file?
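
For concreteness, here is a sketch of the static layout I am aiming for: 4 slots with 1 GPU and 1 CPU each, plus 3 CPU-only slots with 4 CPUs each, which together use all 16 CPUs. It is untested. The resource name "gpus" and the explicit "gpus=0" on the second slot type are my assumptions about how a custom resource has to be declared and excluded, not something I have confirmed. Memory and disk are deliberately left unspecified.

```
# Declare a custom machine resource with 4 units (the name "gpus" is assumed).
MACHINE_RESOURCE_gpus = 4

# Slot type 1: one GPU plus one CPU per slot, four such slots.
SLOT_TYPE_1 = cpus=1, gpus=1
NUM_SLOTS_TYPE_1 = 4

# Slot type 2: CPU-only slots with 4 CPUs each; 3 slots cover the remaining 12 CPUs.
# gpus=0 is an assumption, meant to keep these slots from claiming a share of the GPUs.
SLOT_TYPE_2 = cpus=4, gpus=0
NUM_SLOTS_TYPE_2 = 3
```

If this is the right direction, I would expect condor_status to list 7 slots in total.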

Best regards, 
Tobias


> Hi Eddie, 
> 
> Thank you for your advice. 
> 
> Yes, I also tried both the static and the automatic configuration. For the latter I tried the output of the (not officially supported) condorgpu project. In both cases only CPU slots were shown. 
> 
> SLOT1_HAS_GPU=TRUE
> SLOT1_GPU_DEV=0
> ...
> SLOT4_HAS_GPU=TRUE
> SLOT4_GPU_DEV=3
> STARTD_ATTRS=HAS_GPU,GPU_DEV
> 
> Output: 
> slot1@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.100 3018  0+00:00:04
> ...
> slot16@xxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 3018  0+00:00:23
> 
> 
> I also tried a configuration that I found on the users list that actually configured the same hardware combination (4 GPUs, 8 CPUs):
> 
> NUM_CPUS = 8
> 
> NUM_GPUS = 4
> HasGpus = TRUE
> 
> START = (((SlotId < 5) && $(SLOT1_START)) || ((SlotId > 4) && $(SLOT2_START))) || FALSE
> 
> SUSPEND        = False
> CONTINUE       = True
> PREEMPT        = False
> KILL           = False
> WANT_SUSPEND   = False
> WANT_VACATE    = False
> 
> SLOT1_START = (TARGET.NeedGpu =?= TRUE)
> SLOT2_START = (TARGET.NeedGpu =?= FALSE)
> 
> This again only shows the CPUs (8 in this case).
> 
> slot1@xxxxxxxxxxxx LINUX      X86_64 Owner     Idle      0.080 6036  0+00:05:04
> ...
> slot8@xxxxxxxxxxxx LINUX      X86_64 Owner     Idle      0.000 6036  0+00:05:03
> 
> 
> 
> Btw., the configuration mentioned in my previous mail shows the following status:
> 
> slot1@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.030 12073  0+00:00:04
> ...
> slot4@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 12073  0+00:00:07
> 
> 
> So, currently I can define slots either for the GPUs or for the CPUs, but not both at the same time, and not the combined layout I intended.
> 
> Regards, 
> Tobias 
> 
> 
>> Hi Tobias,
>> 
>> Did you see this in the recipes section on the wiki?
>> 
>> https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToManageGpus
>> 
>> I am also a greenhorn, but I am about to head down this path (I have a couple of servers with GPUs that I would like to find a better way to advertise and utilize). Currently I am basically using the machine name to target the GPU machines, and there is no contention.
>> 
>> Eddie
>> 
>> -----Original Message-----
>> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Tobias Beisel
>> Sent: Tuesday, July 09, 2013 11:11 AM
>> To: htcondor-users@xxxxxxxxxxx
>> Subject: [HTCondor-users] configuring a GPU machine
>> 
>> Hi, 
>> 
>> I am new to condor and have problems configuring my machine. 
>> 
>> I'm using HTCondor V8.0.0 on an Ubuntu 12.04 machine with 16 CPUs (8 cores with Hyperthreading) and 4 NVIDIA Tesla C2070 GPUs. I would like to configure condor so that (1) each GPU combined with 1 CPU forms a slot and (2) the remaining 12 CPUs are grouped into slots of 4 CPUs each.
>> 
>> I managed to provide the slots for GPUs using the following configuration: 
>> 
>> MACHINE_RESOURCE_gpu = 4
>> MACHINE_RESOURCE_actuator = 20
>> 
>> SLOT_TYPE_1 = gpu=1, cpu=1, actuator=1
>> NUM_SLOTS_TYPE_1 = 4
>> 
>> condor_status shows these slots correctly. 
>> 
>> Unfortunately, I cannot get the remaining CPUs to be configured as slots. Neither of the following shows any slots:
>> 
>> SLOT_TYPE_2 = cpu=1, actuator=1
>> NUM_SLOTS_TYPE_2 = 12
>> 
>> or 
>> 
>> SLOT_TYPE_2 = cpu=4, actuator=1
>> NUM_SLOTS_TYPE_2 = 3
>> 
>> I tried several other configurations I found in examples, but at best I got only one slot type to show up.
>> 
>> What would I need to change to make it work?
>> 
>> 
>> Assuming the above would work, I'd have two more questions on how to create job submission files: 
>> 
>> 1. As configured, the GPU slots mentioned above show Arch 'X86_64', and so would the CPU slots. How can I then choose a different executable based on the provided architecture, as proposed in chapter 2.5.6 (heterogeneous submit), which uses the $$(Arch) macro?
>> 2. Is it also possible to pass different arguments to the executable based on the provided 'Arch'? This would allow selecting the executed code within a single application binary, i.e., figuratively using a 'fat' binary.
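>> 
>> To make both questions concrete, this is roughly the kind of submit file I have in mind. It is only a sketch: the file name "myapp" and the "--target" option are made up for illustration, and I have not verified that $$(Arch) is expanded in "arguments" the same way as in "executable".
>> 
>> ```
>> universe     = vanilla
>> # $$(Arch) is replaced with the Arch value of the machine the job is matched to.
>> # "myapp" is a hypothetical name; one binary per architecture would be needed.
>> executable   = myapp.$$(Arch)
>> # Hypothetical option telling a 'fat' binary which code path to run.
>> arguments    = --target $$(Arch)
>> requirements = (Arch == "X86_64") || (Arch == "INTEL")
>> queue
>> ```
>> 
>> Since the GPU and CPU slots would both report the same Arch, this alone cannot tell them apart, which is the core of question 1.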
>> 
>> 
>> Thank you for your help,
>> Tobias
>> 
>> 
>> 
>> _______________________________________________
>> HTCondor-users mailing list
>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>> 
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/htcondor-users/