
Re: [HTCondor-users] matching gpu devicename



These are effectively the same device? It's annoying that they have separate names, then. You should complain to NVIDIA... ;)

But seriously, your best course for now would be to wrap the condor_gpu_discovery program in a script that rewrites its output, replacing the various CUDA[n]DeviceName attributes with a single CUDADeviceName = "Tesla K10" attribute.
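A minimal sketch of the rewriting step such a wrapper script might perform, in Python. It assumes the discovery output is plain text with one attribute per line, shaped like CUDA0DeviceName = "..."; that format is an assumption to check against what your version of the tool actually prints. The function name rewrite_discovery_output and its unified_name parameter are hypothetical.

```python
import re

# Assumption: per-device name attributes appear one per line as
#   CUDA<n>DeviceName = "<name>"
# with optional whitespace around the '='.
PER_DEVICE_NAME = re.compile(r'^\s*CUDA\d+DeviceName\s*=')


def rewrite_discovery_output(text, unified_name="Tesla K10"):
    """Drop CUDA<n>DeviceName lines and append a single CUDADeviceName line.

    Lines that are not per-device name attributes pass through unchanged.
    If no per-device name line is present, nothing is appended.
    """
    kept = []
    dropped = 0
    for line in text.splitlines():
        if PER_DEVICE_NAME.match(line):
            dropped += 1
        else:
            kept.append(line)
    if dropped:
        kept.append('CUDADeviceName = "%s"' % unified_name)
    return "\n".join(kept) + "\n"
```

A wrapper would run condor_gpu_discovery, pass its standard output through this function, and print the result. Note that this collapses every device name on the machine into one value, so it only makes sense where all the GPUs really are equivalent.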

Or tell your users to target the GPUs based on capability rather than by name.

If the '.' in the name is a standard pattern for GPU names, and if we can reasonably assume that everything after the first '.' is noise, then the condor_gpu_discovery tool could be changed to truncate each name at the first '.' before publishing it.
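The truncation rule itself is simple. Here is a sketch in Python, for illustration only; it is not code from condor_gpu_discovery, and the function name truncate_device_name is hypothetical.

```python
def truncate_device_name(name):
    """Return the part of name before the first '.'.

    Names without a '.' are returned unchanged.
    """
    return name.split(".", 1)[0]
```

Under this rule both "Tesla K10.G2.8GB" and "Tesla K10.G1.8GB" become "Tesla K10". Whether that is safe for every GPU name depends on the assumption above, that nothing meaningful follows the first '.'.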

-tj

On 6/1/2015 8:24 AM, Michael Di Domenico wrote:
How does one match a slot on a machine where the GPUs are mixed by devicename?

for example

if i have a machine with 6 slots where

slot1-4 are
CUDA0DeviceName = Tesla K10.G2.8GB
CUDA1DeviceName = Tesla K10.G2.8GB
CUDA2DeviceName = Tesla K10.G2.8GB
CUDA3DeviceName = Tesla K10.G2.8GB
slot5-6 are
CUDA4DeviceName = Tesla K10.G1.8GB
CUDA5DeviceName = Tesla K10.G1.8GB

because i have multiple GPU device name types in the machines there is
no global CUDADeviceName classad.

if i do

condor_status -constraint 'regexp("Tesla K10.G2.8GB", CUDA0DeviceName)'

everything works out fine

but is there a way to find all the cards in the pool regardless of
which CUDA<num> they are?

this will specifically apply to us because we have users that write
cuda code optimized for a specific gpu and they'll want to put in
their requirements expression regexp("Tesla K10", CUDADeviceName)
which doesn't match anything currently

the difference between the two cards above is airflow direction, not
anything technical, so they're really the same card