For my GPU jobs, I set up a ranking based on the number of compute units multiplied by the number of cores per CU. You might also add the global memory. I do like the idea of factoring in the CUDA capability level as well, if your cluster has more than one type of card in it.
So for example, in a submit description:
rank = TARGET.CUDAComputeUnits * TARGET.CUDACoresPerCU + CUDAFreeGlobalMemory
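To get a feel for how that expression orders machines, here is a minimal Python sketch of the same arithmetic. The slot names and all of the numbers are made up for illustration, and gpu_rank is a hypothetical helper, not part of HTCondor; only the attribute names come from the rank expression above.

```python
# Hypothetical slot ads: the attribute names match the rank expression,
# but the slot names and values are invented for this example.
slots = [
    {"Name": "slot1@node-a", "CUDAComputeUnits": 80,
     "CUDACoresPerCU": 64, "CUDAFreeGlobalMemory": 16000},
    {"Name": "slot1@node-b", "CUDAComputeUnits": 40,
     "CUDACoresPerCU": 64, "CUDAFreeGlobalMemory": 15000},
    {"Name": "slot1@node-c", "CUDAComputeUnits": 28,
     "CUDACoresPerCU": 128, "CUDAFreeGlobalMemory": 11000},
]


def gpu_rank(ad):
    """Compute units times cores per CU, plus free global memory."""
    return (ad["CUDAComputeUnits"] * ad["CUDACoresPerCU"]
            + ad["CUDAFreeGlobalMemory"])


# Higher rank is preferred, so sort in descending order.
ranked = sorted(slots, key=gpu_rank, reverse=True)
for ad in ranked:
    print(ad["Name"], gpu_rank(ad))
```

With these invented numbers, node-c has more total cores than node-b (3584 vs. 2560) but still ranks lower, because the memory term is larger than the core term. If that is not what you want, you may need to scale one of the two terms.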
Michael V Pelletier
HPC Support Team
On 5/20/2021 8:56 AM, Martin Sajdl wrote:
We have a cluster of nodes with GPUs, and we need to set a benchmark number for each slot with a GPU so that we can correctly control job ranking: start a job on the most powerful GPU available.
Does anyone use or know of a GPU benchmark tool? Ideally multi-platform (Linux, Windows)...
Just a quick thought:
While it is not strictly a benchmark, perhaps a decent proxy would be to use the CUDACapability attribute that is likely already present in each slot with a GPU (assuming they are NVIDIA GPUs, that is).
You could enter the following condor_status command to see if you feel that CUDACapability makes intuitive sense as a performance metric on your pool:
  condor_status -cons 'gpus>0' -sort CUDACapability -af name CudaCapability CudaDevicename
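If you want to post-process that listing, a rough Python sketch might look like the following. It assumes each output line is the slot name, the capability, and then the device name separated by spaces, with the device name possibly containing spaces itself. The sample lines and the parse_line helper are hypothetical, not real pool output or an HTCondor API.

```python
# Made-up sample of what the condor_status command above might print.
sample_output = """\
slot1@node-c 6.1 GeForce GTX 1080 Ti
slot1@node-b 7.5 Tesla T4
slot1@node-a 8.0 A100-PCIE-40GB
"""


def parse_line(line):
    """Split into (slot name, capability, device name).

    maxsplit=2 keeps a device name that contains spaces in one piece.
    """
    name, capability, device = line.split(maxsplit=2)
    return name, float(capability), device


rows = [parse_line(line) for line in sample_output.splitlines() if line.strip()]

# Pick the slot with the highest CUDA capability.
best = max(rows, key=lambda row: row[1])
print(best)
```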
Hope the above helps