
Re: [HTCondor-users] configuring a GPU machine

Hi Tobias,

Did you see this in the recipes section on the wiki?


I am also a greenhorn, but I am about to head down this path: I have a couple of servers with GPUs that I would like to find a better way to advertise and utilize. Currently I am basically using the machine name to target the GPU machines, and there is no contention.
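
For what it's worth, here is an untested sketch of how I would expect a static split like yours to look. It assumes the custom resource is declared through MACHINE_RESOURCE_NAMES, that every slot type needs a matching NUM_SLOTS_TYPE_<N> line before any slots of that type appear, and that the CPU-only type has to ask for zero GPUs explicitly so it does not get an automatic share of them. I left out the "actuator" resource because I do not know what it stands for on your machine, and I wrote "cpus" as the manual spells it.

```
# Assumption: declare the custom resource and how many of it the machine has.
MACHINE_RESOURCE_NAMES = gpu
MACHINE_RESOURCE_gpu = 4

# Type 1: one GPU paired with one CPU, four slots of this type.
SLOT_TYPE_1 = cpus=1, gpu=1
NUM_SLOTS_TYPE_1 = 4

# Type 2: four CPUs and no GPU, three slots covering the remaining 12 CPUs.
# Assumption: gpu=0 keeps these slots from claiming a share of the GPUs.
SLOT_TYPE_2 = cpus=4, gpu=0
NUM_SLOTS_TYPE_2 = 3
```

If that is roughly right, condor_status should list seven slots after a restart of the startd: four with a GPU and three without. I have not tried this on 8.0.0 myself, so please treat it as a starting point only.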


-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Tobias Beisel
Sent: Tuesday, July 09, 2013 11:11 AM
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] configuring a GPU machine


I am new to Condor and am having problems configuring my machine.

I'm using HTCondor v8.0.0 on an Ubuntu 12.04 machine with 16 CPUs (8 cores with Hyper-Threading) and 4 NVIDIA Tesla C2070 GPUs. I would like to configure Condor so that 1. each GPU, combined with 1 CPU, forms a slot, and 2. the remaining 12 CPUs are grouped into slots of 4 CPUs each.

I managed to provide the slots for GPUs using the following configuration: 

MACHINE_RESOURCE_actuator = 20

SLOT_TYPE_1 = gpu=1, cpu=1, actuator=1

condor_status shows these slots correctly. 

Unfortunately, I cannot get the remaining CPUs configured as slots. The following does not show any slots:

SLOT_TYPE_2 = cpu=1, actuator=1


SLOT_TYPE_2 = cpu=4, actuator=1

I tried several other configurations that I found in examples, but at best I could get only one slot type to show up.

What would I need to change to make it work?

Assuming the above would work, I'd have two more questions on how to create job submission files: 

1. As configured, the GPU slots mentioned above show Arch 'X86_64', and so would the CPU slots. How, then, can I choose a different executable based on the provided architecture, as proposed in chapter 2.5.6 (heterogeneous submit), using the $$(Arch) macro?
2. Is it also possible to pass different arguments to the executable based on the provided 'Arch'? This would allow selecting the code to execute within a single application binary, i.e., figuratively, using a 'fat' binary.

Thank you for your help,
