
Re: [HTCondor-users] Matching specific GPU model



The expression (CUDACapability >= 1.2) will only match if the slot ad has an attribute named CUDACapability.
It does NOT match CUDA[:digit:]Capability attributes. Something else must be going on if the job is matching and running.
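A minimal Python sketch of why a missing attribute prevents a match, assuming a simplified model of ClassAd semantics: a reference to an attribute the ad lacks evaluates to UNDEFINED, a comparison involving UNDEFINED is itself UNDEFINED, and only a Requirements value of TRUE counts as a match. The helper names (lookup, greater_equal, matches) are hypothetical; this is not HTCondor's evaluator.

```python
# Hypothetical, simplified model of ClassAd matching (NOT HTCondor's code).

UNDEFINED = object()  # sentinel standing in for the ClassAd UNDEFINED value


def lookup(ad, name):
    """Return the attribute value, or UNDEFINED if the ad has no such attribute.

    ClassAd attribute names are case-insensitive, so names are compared lowercased.
    """
    for key, value in ad.items():
        if key.lower() == name.lower():
            return value
    return UNDEFINED


def greater_equal(lhs, rhs):
    """Model of '>=': any UNDEFINED operand makes the result UNDEFINED."""
    if lhs is UNDEFINED or rhs is UNDEFINED:
        return UNDEFINED
    return lhs >= rhs


def matches(requirements_value):
    """A slot matches only when Requirements evaluates to exactly TRUE."""
    return requirements_value is True


# Per-device attributes like those in this thread; there is no plain CUDACapability.
slot_ad = {"CUDA0Capability": 5.0, "CUDA1Capability": 5.0, "CUDA2Capability": 3.0}
print(matches(greater_equal(lookup(slot_ad, "CUDACapability"), 1.2)))  # False

# With a single collapsed attribute present, the same expression does match.
print(matches(greater_equal(lookup({"CUDACapability": 3.0}, "CUDACapability"), 1.2)))  # True
```

The same reasoning applies to regexp("K10", TARGET.CUDADeviceName): if the slot ad has no CUDADeviceName attribute, the Requirements expression cannot evaluate to TRUE.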

HTCondor will only create attributes with a CUDA[:digit:] prefix when the value of that attribute is not the same
for all GPUs. In particular, this looks very weird to me:

CUDA2DeviceName = "Tesla K10.G2.8GB"
CUDA3DeviceName = "Tesla K10.G2.8GB"
CUDA4DeviceName = "Tesla K10.G2.8GB"
CUDA5DeviceName = "Tesla K10.G2.8GB"
CUDA2DeviceName = "Tesla K10.G2.8GB"

These all have the same value, but CUDA1DeviceName is missing entirely!! What you should be getting is a single
attribute:

CUDADeviceName = "Tesla K10.G2.8GB"

This indicates to me that something has gone badly wrong during GPU detection.
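A minimal Python sketch of the naming rule described above, assuming only what is stated here: a property shared by every GPU is published once as CUDA<Property>, and a property that differs is published per device as CUDA<index><Property>. The function name gpu_attributes and the device values are hypothetical; this is not HTCondor's GPU-discovery code.

```python
# Hypothetical model of the CUDA attribute-naming rule (NOT HTCondor's code).


def gpu_attributes(devices):
    """devices: list of dicts, one per GPU, each mapping property -> value."""
    if not devices:
        return {}
    attrs = {}
    for prop in devices[0]:
        values = [dev[prop] for dev in devices]
        if all(v == values[0] for v in values):
            # Same on every GPU: publish one collapsed attribute.
            attrs[f"CUDA{prop}"] = values[0]
        else:
            # Differs between GPUs: publish one attribute per device index.
            for index, value in enumerate(values):
                attrs[f"CUDA{index}{prop}"] = value
    return attrs


# Four identical GPUs (hypothetical values): everything collapses.
same = [{"DeviceName": "Tesla K10.G2.8GB", "Capability": 3.0} for _ in range(4)]
print(gpu_attributes(same))
# {'CUDADeviceName': 'Tesla K10.G2.8GB', 'CUDACapability': 3.0}

# Mixed GPUs (hypothetical names): differing properties get per-device names.
mixed = [
    {"DeviceName": "GPU-A", "Capability": 5.0},
    {"DeviceName": "GPU-B", "Capability": 3.0},
]
print(gpu_attributes(mixed))
# {'CUDA0DeviceName': 'GPU-A', 'CUDA1DeviceName': 'GPU-B', 'CUDA0Capability': 5.0, 'CUDA1Capability': 3.0}
```

Under this rule, several CUDA<N>DeviceName attributes that all carry the same value should never appear, which is why that part of the slot ad points to a detection problem.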

-tj

-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Stuart Anderson
Sent: Thursday, March 21, 2019 11:33 AM
To: condor-users@xxxxxxxxxxx
Subject: [HTCondor-users] Matching specific GPU model

How do I specify a specific GPU model in a Condor 8.8 submit file?

The CUDACapability requirement example in the manual works for me,
requirements = (CUDACapability >= 1.2) && $(requirements:True)
http://research.cs.wisc.edu/htcondor/manual/v8.8/SubmittingaJob.html#x17-510002.5.12

However, what am I doing wrong here?
requirements = regexp("K10", TARGET.CUDADeviceName)

Here is part of the condor_status -long output from a random GPU node:

CUDA2DeviceName = "Tesla K10.G2.8GB"
CUDA3DeviceName = "Tesla K10.G2.8GB"
CUDA4DeviceName = "Tesla K10.G2.8GB"
CUDA5DeviceName = "Tesla K10.G2.8GB"
CUDA2DeviceName = "Tesla K10.G2.8GB"

CUDA0Capability = 5.0
CUDA1Capability = 5.0
CUDA2Capability = 3.0
CUDA3Capability = 3.0
CUDA4Capability = 3.0

More generally, should all of the CUDA* attributes be able to match CUDA[:digit:]<attribute> (as seems to work for CUDACapability)?

Thanks.

--
Stuart Anderson  anderson@xxxxxxxxxxxxxxxx
http://www.ligo.caltech.edu/~anderson




_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/