For my GPU jobs, I rank machines by the number of compute units multiplied by the number of cores per CU. You could also add the global memory. I also like the
idea of factoring in the CUDA capability level, if your cluster has more than one type of card in it.
So for example, in a submit description:
rank = TARGET.CUDAComputeUnits * TARGET.CUDACoresPerCU + CUDAFreeGlobalMemory
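To get a feel for how that expression orders machines, here is a rough Python sketch of the same arithmetic. The slot names and attribute values are made up for illustration, and it assumes the memory attribute is reported in megabytes; it is not output from a real pool.

```python
# Hypothetical slot ads; the numbers are invented for illustration only.
slots = [
    {"Name": "slot1@nodeA", "CUDAComputeUnits": 80,
     "CUDACoresPerCU": 64, "CUDAFreeGlobalMemory": 16000},
    {"Name": "slot1@nodeB", "CUDAComputeUnits": 28,
     "CUDACoresPerCU": 128, "CUDAFreeGlobalMemory": 11000},
    {"Name": "slot1@nodeC", "CUDAComputeUnits": 13,
     "CUDACoresPerCU": 192, "CUDAFreeGlobalMemory": 4000},
]

def gpu_rank(ad):
    """Same arithmetic as the rank expression: CUs * cores per CU + free memory."""
    return ad["CUDAComputeUnits"] * ad["CUDACoresPerCU"] + ad["CUDAFreeGlobalMemory"]

# Higher rank is preferred, so sort in descending order.
ranked = sorted(slots, key=gpu_rank, reverse=True)
for ad in ranked:
    print(ad["Name"], gpu_rank(ad))
```

With numbers like these the memory term is larger than the core-count term, so if you want core count to dominate you may need to scale one of the two terms.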
Michael V Pelletier
HPC Support Team
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx>
On Behalf Of Todd Tannenbaum
Sent: Thursday, May 20, 2021 12:49 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>; Martin Sajdl <masaj.xxx@xxxxxxxxx>
Subject: [External] Re: [HTCondor-users] GPU benchmarking
On 5/20/2021 8:56 AM, Martin Sajdl wrote:
We have a cluster of nodes with GPUs, and we need to set a benchmark number for each GPU slot so that we can control job ranking correctly, i.e. start a job on the most powerful GPU available.
Does anyone use or know of a GPU benchmark tool? Ideally multi-platform (Linux, Windows)...
Just a quick thought:
While it is not strictly a benchmark, perhaps a decent proxy would be to use the CUDACapability attribute that is likely already present in each slot with a GPU (assuming they are NVIDIA gpus, that is).
You could enter the following condor_status command to see if you feel that CUDACapability makes intuitive sense as a performance metric on your pool:
condor_status -constraint 'GPUs > 0' -sort CUDACapability -af Name CUDACapability CUDADeviceName
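If you want to post-process that listing, a minimal Python sketch along these lines could sort it from highest to lowest capability. The sample text below is hypothetical, not real pool output, and it assumes each line is laid out as name, capability, device name, in that order.

```python
# Hypothetical text in the "name capability devicename" layout; not real output.
sample = """\
slot1@nodeA 7.0 Tesla V100-PCIE-16GB
slot1@nodeB 6.1 GeForce GTX 1080 Ti
slot1@nodeC 8.0 A100-PCIE-40GB
"""

def parse_line(line):
    # Device names may contain spaces, so split off only the first two fields.
    name, capability, device = line.split(maxsplit=2)
    # Compare capability as (major, minor) integers rather than as a string.
    major, minor = (int(part) for part in capability.split("."))
    return {"Name": name, "CUDACapability": (major, minor), "CUDADeviceName": device}

slots = [parse_line(line) for line in sample.splitlines() if line.strip()]

# Highest capability first.
by_capability = sorted(slots, key=lambda ad: ad["CUDACapability"], reverse=True)
for ad in by_capability:
    print(ad["Name"], ad["CUDACapability"], ad["CUDADeviceName"])
```

Keep in mind that capability tracks the architecture generation rather than measured speed, so treat the ordering as a rough proxy.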
Hope the above helps