[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [HTCondor-users] GPU benchmarking



Thank you Michael!
The formula below looks like a good idea. I have one additional question. Is it Okay to use classads in the form TARGET.CUDAComputeUnits when the real slot classad looks like CUDA0ComputeUnits or
CUDA1ComputeUnits? Does Condor automatically able to translate to a correct value using AssignedGPU?

Regards,
Masaj


On 5/20/2021 10:14 PM, Michael Pelletier via HTCondor-users wrote:

For my GPU jobs, I set up a ranking based on the number of compute units, times the number of cores per CU. You might also add the global memory. I do like the idea of factoring in the CUDA capability level as well, if your cluster has more than one type of card in it.

Â

So for example, in a submit description:

Â

rank = TARGET. CUDAComputeUnits * TARGET. CUDACoresPerCU + CUDAFreeGlobalMemory

Â

Michael V Pelletier

Principal Engineer

Raytheon Technologies

Digital Technology

HPC Support Team

Â

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Todd Tannenbaum
Sent: Thursday, May 20, 2021 12:49 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>; Martin Sajdl <masaj.xxx@xxxxxxxxx>
Subject: [External] Re: [HTCondor-users] GPU benchmarking

Â

On 5/20/2021 8:56 AM, Martin Sajdl wrote:

Hi!

we have a cluster of nodes with GPUs and we would need to set a benchmark number for each slot with GPU to be able to correctly control jobs ranking - start a job on the most powerful GPU available.
Do someone use or know a GPU benchmark tool? Ideally multi-platform (Linux, Windows)...


Hi Martin,

Just a quick thought:

While it is not strictly a benchmark, perhaps a decent proxy would be to use the CUDACapability attribute that is likely already present in each slot with a GPU (assuming they are NVIDIA gpus, that is).Â

You could enter the following condor_status command to see if you feel that CUDACapability makes intuitive sense as a performance metric on your pool:

  condor_status -cons 'gpus>0' -sort CUDACapability -af name CudaCapability CudaDevicename

Hope the above helps
Todd


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/