
Re: [HTCondor-users] [External] Additional GPU statistics



Hi,

Is there a way to have the startd cron job write somewhere other than the ClassAds, to get finer granularity?

Benedikt


On Wed, Mar 20, 2024 at 14:05 Pelletier, Michael V. RTX via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
Hello,

What you'd want to do is set up a startd cron job. The ClassAd output from this is pulled into the Machine ClassAd, where it becomes queryable with condor_status.

I do something similar with a job that calls ipmitool to check the power and cooling status of the machine and set a PowerOrCoolingFault Boolean attribute, allowing it to reject jobs if a PSU or fan fault is flagged.

You can set the interval for startd cron jobs in the configuration. Bear in mind that the collector is only updated periodically, so running the job more often than the collector is updated doesn't gain you anything. I think it's possible to push updates immediately from startd cron, but you'd want to keep an eye on the collector load in that case if you have a lot of machines.
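
As a rough illustration of the startd cron approach for GPU power draw, here is a minimal sketch of a script that prints one `Attribute = value` line per GPU for the startd to merge into the Machine ClassAd. The job name GPUPOWER, the script path, the 60s period, and the attribute name GPU<n>PowerDrawWatts are all assumptions made up for this example, and it assumes nvidia-smi is on the PATH of the execute node. Check the configuration knobs in the docstring against the manual for your HTCondor version.

```python
#!/usr/bin/env python3
"""Hypothetical startd cron script: publish per-GPU power draw as ClassAd attributes.

Assumed configuration on the execute node (the job name GPUPOWER, the path,
and the period are illustrative and not taken from this thread):

    STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST) GPUPOWER
    STARTD_CRON_GPUPOWER_EXECUTABLE = /usr/local/libexec/gpu_power.py
    STARTD_CRON_GPUPOWER_PERIOD = 60s
"""
import subprocess
import sys

# Ask nvidia-smi for "index, power.draw" rows with no header and no units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,power.draw",
    "--format=csv,noheader,nounits",
]


def to_classad_lines(csv_text):
    """Convert rows like '0, 45.32' into 'GPU0PowerDrawWatts = 45.32' lines.

    Rows with a non-numeric index or power value (for example '[N/A]')
    are skipped so that no malformed attribute is published.
    """
    lines = []
    for row in csv_text.strip().splitlines():
        fields = [field.strip() for field in row.split(",")]
        if len(fields) != 2:
            continue
        index, power = fields
        if not index.isdigit():
            continue
        try:
            watts = float(power)
        except ValueError:
            continue
        # The attribute name is an assumption chosen for this sketch.
        lines.append(f"GPU{index}PowerDrawWatts = {watts:.2f}")
    return lines


def main():
    """Run nvidia-smi and print the attribute lines on stdout."""
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError) as exc:
        # Publish nothing if nvidia-smi is missing or fails.
        print(f"nvidia-smi failed: {exc}", file=sys.stderr)
        return 1
    for line in to_classad_lines(result.stdout):
        print(line)
    return 0


if __name__ == "__main__":
    main()
```

With something like this in place, the new attributes could be inspected with condor_status -long on the machine in question. The values are only as fresh as the last collector update.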

-Michael Pelletier

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Benedikt Riedel <briedel@xxxxxxxxxxxxxxxx>
Sent: Wednesday, March 20, 2024 5:08:58 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [External] [HTCondor-users] Additional GPU statistics

Hi,

Is there a way to get additional GPU statistics like the power draw through condor? Is there a way to increase the query rate for GPU statistics from HTCondor?

Thanks,

Benedikt

--
Benedikt Riedel
Global Computing Coordinator IceCube Neutrino Observatory
Technical Coordinator IceCube Neutrino Observatory
Computing Manager Wisconsin IceCube Particle Astrophysics Center
University of Wisconsin-Madison
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/