
[HTCondor-users] GpusAverageUsage value when requesting more than 1 GPU



Hello all,

The update to HTCondor 10.0.3 fixed a bug in the GPU metrics: they now continue to be reported after the startd is reconfigured. We updated all our GPU nodes to HTCondor 10.0.X, and the reported GpusAverageUsage is now more reliable.

According to the documentation, GpusUsage was renamed to GpusAverageUsage, and it is the GPU usage over the lifetime of the job, reported as a fraction of the maximum possible utilization of one GPU. For all jobs requesting 1 GPU, the value is below or slightly above 1. However, when a user requests 2 GPUs, for instance:

# condor_history -const 'RequestGpus == 2 && GpusAverageUsage > 1' -limit 1 -af RequestGpus GpusAverageUsage RemoteWallclockTime
2 21.29038032397889 11584.0

As you can see, GpusAverageUsage is 21.29038. The machine where the job ran has 8 GPUs. We were not sure whether requesting 2 GPUs would give GpusAverageUsage values around 2, but 21.29? Is the GpusAverageUsage value only meaningful when 1 GPU is requested?
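To make the expectation above concrete: if GpusAverageUsage is a fraction of one GPU's maximum utilization, a job holding N GPUs should report at most about N. The sketch below is only an illustration under that assumption. It parses lines of the `condor_history ... -af RequestGpus GpusAverageUsage RemoteWallclockTime` output shown above and flags jobs above that bound. The helper names and the 5% tolerance (meant to allow for the "slightly over 1" values we see) are hypothetical, not part of HTCondor.

```python
def parse_af_line(line: str) -> tuple[int, float, float]:
    """Split one line of `-af RequestGpus GpusAverageUsage RemoteWallclockTime` output."""
    req, usage, wall = line.split()
    return int(req), float(usage), float(wall)


def exceeds_documented_bound(request_gpus: int, avg_usage: float,
                             tolerance: float = 0.05) -> bool:
    # Assumption: usage is measured in units of ONE GPU, so N requested GPUs
    # give an upper bound of about N. The tolerance value is hypothetical.
    return avg_usage > request_gpus * (1.0 + tolerance)


if __name__ == "__main__":
    sample = "2 21.29038032397889 11584.0"  # the job from the query above
    req, usage, wall = parse_af_line(sample)
    print(f"RequestGpus={req} GpusAverageUsage={usage:.5f} per-GPU={usage / req:.5f}")
    print("exceeds documented bound:", exceeds_documented_bound(req, usage))
```

Under this reading, even dividing by the 8 GPUs physically present on the machine would not bring 21.29 below 1, which is why the value looks inconsistent with the documented definition.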

Thank you in advance.

Best regards,

Carles


--
Carles Acosta i Silva
PIC (Port d'Informació Científica)
Campus UAB, Edifici D
E-08193 Bellaterra, Barcelona
Tel: +34 93 581 33 08
Fax: +34 93 581 41 10
Avís - Aviso - Legal Notice: http://legal.ifae.es