Re: [HTCondor-users] DeviceGpusAverageUsage and GpusAverageUsage
- Date: Thu, 19 May 2022 07:17:39 +0200
- From: Carles Acosta <cacosta@xxxxxx>
- Subject: Re: [HTCondor-users] DeviceGpusAverageUsage and GpusAverageUsage
Hi Todd M, Todd T,
Thank you for your responses. They helped me understand how GPU monitoring works in HTCondor.
I am checking condor_history on the schedd side as well as the startd history, and GpusAverageUsage is reported for some jobs but not for all. As Todd M commented, maybe this is because the jobs are not long enough. I'll continue to investigate.
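A minimal sketch of that check, in case it is useful to anyone else. It assumes the job ads have been dumped as plain "Attr = value" lines with a blank line between ads (the layout I see in long-format condor_history output); the helper names and the sample ads below are made up for illustration and are not part of HTCondor. Attribute names are compared case-insensitively, since ClassAd attribute names are case-insensitive and both GpusAverageUsage and GPUsAverageUsage appear in this thread.

```python
def parse_long_ads(text):
    """Split 'Attr = value' text into one dict per job ad.

    Assumes ads are separated by blank lines. Keys are lowercased
    because ClassAd attribute names are case-insensitive.
    """
    ads, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current ad, if any.
            if current:
                ads.append(current)
                current = {}
            continue
        key, sep, value = line.partition(" = ")
        if sep:
            current[key.lower()] = value
    if current:
        ads.append(current)
    return ads


def split_by_gpu_usage(ads, attr="GpusAverageUsage"):
    """Return (reported, missing): jobs with and without a numeric `attr`."""
    reported, missing = [], []
    for ad in ads:
        job_id = f"{ad.get('clusterid', '?')}.{ad.get('procid', '?')}"
        value = ad.get(attr.lower())
        if value is None or value.lower() == "undefined":
            missing.append(job_id)
        else:
            reported.append((job_id, float(value)))
    return reported, missing


# Hypothetical sample ads, invented for this example only.
SAMPLE = """ClusterId = 101
ProcId = 0
GPUsAverageUsage = 0.85

ClusterId = 102
ProcId = 0
"""

reported, missing = split_by_gpu_usage(parse_long_ads(SAMPLE))
print("reported:", reported)
print("missing:", missing)
```

Running the same two functions over a dump from the schedd side and a dump from the startd side should show whether the same jobs are missing the attribute in both places.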
On 5/18/2022 1:27 PM, Todd L Miller
So, at least, I understand that we can
play with DeviceGpusAverageUsage to
check if the utilization is 0, but I do not understand the difference
between DeviceGpusAverageUsage and GpusAverageUsage or why the
GpusAverageUsage is undefined while the DeviceGpusAverageUsage
If I recall correctly --
The GPU monitor can only monitor the utilization of a given
GPU; it knows nothing about which jobs are using which device. It
reports the "Device*" values for each GPU to the specific slot
assigned that GPU. "GPUsAverageUsage" is a per-_job_ attribute,
derived from the "Device*" values, and is set in the _job_ by the
startd. Those job-ad attributes are mirrored into the slot ad by
Additionally, none of this works for sufficiently-short jobs,
although since you're talking about checking four hours in, that
shouldn't be a problem.
I haven't tested this recently, but last time I did, average
GPU utilization and peak GPU memory usage were certainly being
recorded in the job log (where the other usage is reported), and I
believe in the job ad as well. AFAIK, there's no reason why the
whole job ad wouldn't be written to the history file.
Realize there are _two_ history files involved here - 1) the job
history (on the submit/access point machine) which contains all the
job classads that left the job queue, and 2) the startd history
(that lives on the execute machine) which contains all the job
classads that ran on the execute machine.
I think ToddM above was talking about the job history (item 1
above), and Christoph in his email was looking at startd history
(item 2 above).
@ToddM: do you expect the GPU usage attributes to appear in the
startd history as well as the job history?
Carles Acosta i Silva
PIC (Port d'Informació Científica)
Campus UAB, Edifici D
E-08193 Bellaterra, Barcelona
Tel: +34 93 581 33 08
Fax: +34 93 581 41 10