[HTCondor-users] GPU selection in HTCondor 9.0.6 LT Release



Hello,

We have several A100 GPUs that we have partitioned using NVIDIA's MIG (Multi-Instance GPU) configuration. Each NVIDIA A100 80GB GPU is divided into three 19.955 GB partitions and one 9.721 GB partition.

Here is a snippet of the condor_gpu_discovery command output:

MIG_3f63dad5_849f_591e_9d4f_f7bacd6c2d97DeviceName="NVIDIA A100 80GB PCIe MIG 2g.20gb"
MIG_3f63dad5_849f_591e_9d4f_f7bacd6c2d97DeviceUuid="MIG-3f63dad5-849f-591e-9d4f-f7bacd6c2d97"
MIG_3f63dad5_849f_591e_9d4f_f7bacd6c2d97DriverVersion=11.60
MIG_3f63dad5_849f_591e_9d4f_f7bacd6c2d97GlobalMemoryMb=19955
MIG_3f63dad5_849f_591e_9d4f_f7bacd6c2d97MaxSupportedVersion=11060
MIG_56476b2d_78a8_5280_9fa9_02bf5b74dee1DeviceName="NVIDIA A100 80GB PCIe MIG 1g.10gb"
MIG_56476b2d_78a8_5280_9fa9_02bf5b74dee1DeviceUuid="MIG-56476b2d-78a8-5280-9fa9-02bf5b74dee1"
MIG_56476b2d_78a8_5280_9fa9_02bf5b74dee1DriverVersion=11.60
MIG_56476b2d_78a8_5280_9fa9_02bf5b74dee1GlobalMemoryMb=9721
MIG_56476b2d_78a8_5280_9fa9_02bf5b74dee1MaxSupportedVersion=11060

We are using partitionable slots. 

$CondorVersion: 9.0.6 Sep 23 2021 BuildID: racf PackageID: 9.0.6 $
$CondorPlatform: X86_64-ScientificLinux_7.9 $

Is there an easy way to add GPU memory to a job's requirements? We would like to let users who need more than 9.721 GB of GPU memory select one of the larger partitions.

Is there a ClassAd shorthand that would let us write something like *GlobalMemoryMb > 10000, with * matching any device ID prefix, to differentiate between the GPUs?
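To make the proposed GlobalMemoryMb > 10000 split concrete, here is a minimal Python sketch that groups the discovery output above by device and picks out the larger partitions. It assumes the `<DeviceId><Property>=<value>` line layout shown in the snippet and handles only the five property names that appear there. The helpers `parse_discovery` and `large_devices` are hypothetical and not part of HTCondor; this only illustrates the threshold, it does not tell HTCondor how to match jobs to a particular GPU.

```python
import re

# Property suffixes seen in the condor_gpu_discovery snippet above; any other
# properties the tool may print are ignored by this sketch (assumption).
PROPERTIES = ("DeviceName", "DeviceUuid", "DriverVersion",
              "GlobalMemoryMb", "MaxSupportedVersion")

# A device ID such as MIG_3f63dad5_..._2d97 is followed directly by a property name.
LINE_RE = re.compile(
    r"^(?P<dev>MIG_[0-9a-f_]+)(?P<prop>" + "|".join(PROPERTIES) + r")=(?P<val>.*)$"
)


def parse_discovery(text):
    """Group ClassAd-style lines into {device_id: {property: value}}."""
    devices = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip blank lines and anything this sketch does not know
        value = m.group("val").strip('"')  # drop quotes around string values
        devices.setdefault(m.group("dev"), {})[m.group("prop")] = value
    return devices


def large_devices(devices, min_mb=10000):
    """Return sorted device IDs whose GlobalMemoryMb exceeds min_mb."""
    return sorted(dev for dev, props in devices.items()
                  if int(props.get("GlobalMemoryMb", 0)) > min_mb)
```

Fed the ten lines above, `large_devices(parse_discovery(text))` returns only the 2g.20gb device `MIG_3f63dad5_849f_591e_9d4f_f7bacd6c2d97`, because the 1g.10gb partition reports 9721 MB.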

Regards,

Doug Benjamin