Re: [HTCondor-users] GPU benchmarking



For my GPU jobs, I set up a ranking based on the number of compute units times the number of cores per CU, i.e. the total core count. You might also add the free global memory. I do like the idea of factoring in the CUDA capability level as well, if your cluster has more than one type of card in it.

For example, in a submit description file:

rank = TARGET.CUDAComputeUnits * TARGET.CUDACoresPerCU + TARGET.CUDAFreeGlobalMemory
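
Here is a minimal Python sketch of what that expression computes, for illustration only. The slot names and attribute values below are invented; only the attribute names come from the rank line above. HTCondor evaluates the real ClassAd expression itself during matchmaking and prefers the match with the higher rank value. The sketch just reproduces the arithmetic so you can see how the two terms interact.

```python
# Minimal sketch: evaluate the rank arithmetic from the submit line above
# over hypothetical slot ads. Names and values are invented for illustration.
slots = [
    {"Name": "slot1@gpu-a", "CUDAComputeUnits": 40, "CUDACoresPerCU": 64,
     "CUDAFreeGlobalMemory": 15000},
    {"Name": "slot1@gpu-b", "CUDAComputeUnits": 80, "CUDACoresPerCU": 64,
     "CUDAFreeGlobalMemory": 30000},
    {"Name": "slot1@gpu-c", "CUDAComputeUnits": 28, "CUDACoresPerCU": 128,
     "CUDAFreeGlobalMemory": 10000},
]


def rank(slot: dict) -> int:
    """Compute units * cores per CU (total cores) plus free global memory."""
    total_cores = slot["CUDAComputeUnits"] * slot["CUDACoresPerCU"]
    return total_cores + slot["CUDAFreeGlobalMemory"]


def best_slot(candidates: list) -> dict:
    # A higher rank value is preferred, so take the maximum.
    return max(candidates, key=rank)


# Print every slot from most to least preferred, then the winner.
for s in sorted(slots, key=rank, reverse=True):
    print(s["Name"], rank(s))
print("preferred:", best_slot(slots)["Name"])
```

With these made-up numbers, gpu-c has more total cores than gpu-a (3584 vs 2560) yet ranks lower, because the two terms are simply added and the memory figure is the larger of the two. Scaling one term relative to the other is a tuning choice for your own pool.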

 

Michael V Pelletier
Principal Engineer
Raytheon Technologies
Digital Technology
HPC Support Team

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Todd Tannenbaum
Sent: Thursday, May 20, 2021 12:49 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>; Martin Sajdl <masaj.xxx@xxxxxxxxx>
Subject: [External] Re: [HTCondor-users] GPU benchmarking

On 5/20/2021 8:56 AM, Martin Sajdl wrote:

Hi!

We have a cluster of nodes with GPUs, and we need to assign a benchmark number to each slot with a GPU so that we can control job ranking correctly, i.e. start each job on the most powerful GPU available.
Does anyone use or know of a GPU benchmark tool? Ideally multi-platform (Linux, Windows)...


Hi Martin,

Just a quick thought:

While it is not strictly a benchmark, perhaps a decent proxy would be the CUDACapability attribute, which is likely already present in each slot with a GPU (assuming they are NVIDIA GPUs, that is).

You could enter the following condor_status command to see if you feel that CUDACapability makes intuitive sense as a performance metric on your pool:

    condor_status -cons 'gpus>0' -sort CUDACapability -af name CudaCapability CudaDevicename
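
The same filter-and-sort can be sketched in Python to reason about how a capability-based ordering would rank slots. This is a hypothetical stand-in, not the htcondor Python bindings: the slot ads are plain dictionaries with invented values, and the explicit highest-first sort is a choice made here rather than a statement about condor_status's own sort direction. Note that ClassAd attribute names are case-insensitive (which is why both CudaCapability and CUDACapability work in the command), whereas Python dictionary keys are not, so the sketch uses one spelling throughout.

```python
# Minimal sketch of the filter-and-sort done by the condor_status command
# above, applied to hypothetical slot ads (values invented for illustration).
slots = [
    {"Name": "slot1@node1", "Gpus": 1, "CUDACapability": 7.5, "CUDADeviceName": "GPU-A"},
    {"Name": "slot1@node2", "Gpus": 0},
    {"Name": "slot1@node3", "Gpus": 2, "CUDACapability": 8.6, "CUDADeviceName": "GPU-B"},
    {"Name": "slot1@node4", "Gpus": 1, "CUDACapability": 6.1, "CUDADeviceName": "GPU-C"},
]


def gpu_slots_by_capability(ads: list) -> list:
    # Like -cons 'gpus>0': keep only slots that advertise at least one GPU.
    gpu_ads = [ad for ad in ads if ad.get("Gpus", 0) > 0]
    # Highest CUDACapability first; a GPU slot without the attribute
    # (e.g. a non-NVIDIA card) is treated as 0.0 and sorts last.
    return sorted(gpu_ads, key=lambda ad: ad.get("CUDACapability", 0.0), reverse=True)


# Roughly what -af Name CUDACapability CUDADeviceName would print per slot.
for ad in gpu_slots_by_capability(slots):
    print(ad["Name"], ad.get("CUDACapability"), ad.get("CUDADeviceName"))
```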

Hope the above helps
Todd