
Re: [HTCondor-users] assigning correct GPU on multi-GPU hosts



Hi, Henning.

This is a known problem.

There is currently no way for the Negotiator to communicate the choice of a specific GPU to the Startd. 
So the Startd will always use the first available GPU when creating a slot that requires a GPU. 

You either need to have a single type of GPU in each Execute node, or use Static slots,
or have a separate partitionable slot for each type of GPU device. 

We are currently in the planning stages for fixing this problem, but the fix will very likely involve changes
to the Negotiator, Schedd and the Startd, and there are difficult forward and backward compatibility
issues that come with this sort of change.
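
Until a fix is available, a job can at least detect the mismatch itself. The following is a minimal Python sketch, not an official HTCondor tool. It assumes the starter sets _CONDOR_MACHINE_AD to the path of a file holding the machine ad as "Attr = value" lines, and that the ad carries attributes named like CUDA0DeviceName, as in the Requirements expression quoted below. The helper names (parse_machine_ad, assigned_gpu_names, check_gpu) are hypothetical.

```python
import os


def parse_machine_ad(text):
    """Parse 'Attr = value' lines into a dict of unquoted strings."""
    ad = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            ad[name.strip()] = value.strip().strip('"')
    return ad


def assigned_gpu_names(assigned, ad):
    """Map each assigned GPU id (e.g. 'CUDA0') to its advertised device name.

    Assumes attributes of the form <id>DeviceName; a missing one maps to None.
    """
    ids = [g.strip() for g in assigned.split(",") if g.strip()]
    return {g: ad.get(g + "DeviceName") for g in ids}


def check_gpu(wanted, env=os.environ):
    """Return True/False if every assigned GPU is (not) the wanted model.

    Returns None when the needed variables are not set, for example
    when running outside of an HTCondor job.
    """
    ad_path = env.get("_CONDOR_MACHINE_AD")  # assumed to be set by the starter
    assigned = env.get("_CONDOR_AssignedGPUs", "")
    if not ad_path or not assigned:
        return None
    with open(ad_path) as f:
        ad = parse_machine_ad(f.read())
    names = assigned_gpu_names(assigned, ad)
    return all(name == wanted for name in names.values())


if __name__ == "__main__":
    result = check_gpu("GeForce GTX 1660 Ti")
    print("assigned GPU matches request:", result)
```

This only reports the mismatch; it does not change which device the Startd assigns. A job could use the result to exit early so that it can be resubmitted, rather than run on the wrong GPU.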

-tj


-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Henning Fehrmann
Sent: Wednesday, May 27, 2020 7:18 AM
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] assigning correct GPU on multi-GPU hosts

Hi,

We are using HTCondor v8.8.9. We have a few nodes that host
different kinds of GPUs. When I start interactive jobs, I specify the GPU
of interest in the 'Requirements' line of the submit file, e.g.:

Requirements = (TARGET.CUDA1DeviceName == "GeForce GTX 1660 Ti")

The Negotiator assigns a correct node hosting my chosen GPU for this
job. If this is the first job running on this node, the environment
always seems to be set as follows:

CUDA_VISIBLE_DEVICES=0
_CONDOR_AssignedGPUs=CUDA0

regardless of which GPU was requested.

In my example this actually points to the other GPU.

Is there a way for the submitter to get the correct CUDA_VISIBLE_DEVICES
and _CONDOR_AssignedGPUs variables on the startd host?

I haven't tested it with non-interactive jobs though.

Cheers,
Henning