Re: [HTCondor-users] startd statistics
- Date: Wed, 18 Mar 2020 14:17:01 +0000
- From: John M Knoeller <johnkn@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] startd statistics
Hi Christoph. Flattening the curve from home these days.
The STARTER collection of statistics exists only in the condor_starter, which never sends its ad to the collector. The only stats in that collection are
BlockReads, BlockWrites, BlockReadBytes, BlockWriteBytes
These are all at level 1, so they should be published by default, but not into the STARTD ads directly. I think they may end up there indirectly when a job is running.
From a startd_cron script you can get ads directly from the startd and pass an argument to condor_status that controls which statistics are published at that time (although this won't affect the STARTER collection, since that is happening indirectly, if at all):
condor_status -direct <name-of-local-startd> -statistics ALL:2 -long
Most of the stats in the STARTD are in the DC collection, but they mostly concern the performance of the STARTD rather than jobs. There are a few stats about jobs; these are not in any specific collection:
JobPreemptions, JobRankPreemptions, JobUserPrioPreemptions, JobStarts, JobBusyTime, JobDuration
If you aren't seeing these stats, try adding them to the STATISTICS_TO_PUBLISH_LIST.
Since these are not in any specific collection, the STATISTICS_TO_PUBLISH_LIST is the only way to make them publish if they are not being published by default.
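A startd_cron script along these lines could be sketched as below. This is only a rough illustration, not a tested recipe: it assumes the job stats show up as plain numeric attributes in the long-form output of the condor_status command above, it merges all slot ads into one dictionary (last value wins), and the output attribute name NodePreemptionRatio is made up for the example. Whether preemptions are a useful proxy for unsuccessful job starts is a separate question.

```python
#!/usr/bin/env python3
"""Sketch of a startd_cron helper that derives a per-node figure from startd job stats."""
import subprocess
import sys


def parse_long_ad(text):
    """Parse 'Attr = value' lines from long-form condor_status output into a dict.

    Assumption: if several slot ads are returned, they are merged and the
    last value seen for an attribute wins. A real script would pick one
    slot or aggregate across slots.
    """
    ad = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if sep:
            ad[name.strip()] = value.strip()
    return ad


def preemption_ratio(ad):
    """Return JobPreemptions / JobStarts, or None if the stats are missing or unusable."""
    try:
        starts = float(ad["JobStarts"])
        preemptions = float(ad["JobPreemptions"])
    except (KeyError, ValueError):
        return None
    if starts <= 0:
        return None
    return preemptions / starts


def query_local_startd(startd_name):
    """Run the condor_status command from this message and return its stdout."""
    cmd = ["condor_status", "-direct", startd_name, "-statistics", "ALL:2", "-long"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    # The startd name is passed as the first argument, e.g. the local hostname.
    ad = parse_long_ad(query_local_startd(sys.argv[1]))
    ratio = preemption_ratio(ad)
    if ratio is not None:
        # startd_cron reads "Attr = value" lines from stdout.
        # NodePreemptionRatio is a hypothetical attribute name.
        print(f"NodePreemptionRatio = {ratio:.4f}")
```

If JobStarts or JobPreemptions is absent from the ad, the script prints nothing, which is a hint to add those names to STATISTICS_TO_PUBLISH_LIST as described above.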
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Beyer, Christoph
Sent: Wednesday, March 18, 2020 6:11 AM
To: Condor-Users Mail List <condor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] startd statistics
I hope you are all well and working from home, helping to flatten the curve of new infections. I bet someone is using HTC somewhere to fight corona, by the way :)
Anyway, something completely different: I have been thinking for a while about establishing a kind of error counter for worker nodes that is published in the host ClassAd as a ratio of successful/unsuccessful job starts/job finishes.
I would like to use the startd-cron feature and the local startd statistics to calculate that number. Therefore I set
STATISTICS_TO_PUBLISH = STARTER:2
But that currently does not yield any helpful numbers using 'condor_status -l -startd'. Maybe I am on the wrong track here and someone did something similar using different tools?
I think I could come up with something by going through the job history on the schedd, but that sounds a bit over-engineered, as I suppose the startd should have some numbers that I could use?
Building 02b, Room 009