
Re: [HTCondor-users] startd statistics



Hi Christoph. Flattening the curve from home these days.

The STARTER collection of statistics exists only in the condor_starter, which never sends its ad to the collector. The only stats in that collection are:

BlockReads, BlockWrites, BlockReadBytes, BlockWriteBytes

These are all at level 1, so they should be published by default, but not directly into the STARTD ads. I think they may end up there indirectly when a job is running.

From a startd_cron script you can get ads directly from the startd and pass an argument to condor_status that controls which statistics are published at that time (although this won't affect the STARTER collection, since that is published indirectly, if at all).
 
Try:

condor_status -direct <name-of-local-startd> -statistics ALL:2 -long 
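
As a rough illustration of how a startd_cron script might consume that output, here is a minimal Python sketch. It assumes the -long output is a series of "Attr = value" lines with a blank line between ads, and it uses JobStarts and JobPreemptions from the job stats listed further down. The function names and the output attribute NodePreemptionsPerStart are made up for this example; they are not part of HTCondor.

```python
import subprocess


def parse_long_ads(text):
    """Split condor_status -long style text into a list of {attr: raw value} dicts."""
    ads, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current ad.
            if current:
                ads.append(current)
                current = {}
            continue
        name, sep, value = line.partition("=")
        if sep and name.strip():
            current[name.strip()] = value.strip()
    if current:
        ads.append(current)
    return ads


def job_counters(ad, names=("JobStarts", "JobPreemptions")):
    """Return the named counters as floats, skipping absent or non-numeric ones."""
    counters = {}
    for name in names:
        try:
            counters[name] = float(ad[name])
        except (KeyError, ValueError):
            pass
    return counters


def preemptions_per_start(counters):
    """Ratio of preemptions to job starts, or None if no starts were recorded."""
    starts = counters.get("JobStarts", 0)
    if not starts:
        return None
    return counters.get("JobPreemptions", 0) / starts


def fetch_startd_ads(startd_name):
    """Run the condor_status command shown above; needs condor_status on PATH."""
    result = subprocess.run(
        ["condor_status", "-direct", startd_name, "-statistics", "ALL:2", "-long"],
        capture_output=True, text=True, check=True)
    return result.stdout


def main(startd_name):
    """Print one hypothetical attribute line per ad that has usable counters."""
    for ad in parse_long_ads(fetch_startd_ads(startd_name)):
        ratio = preemptions_per_start(job_counters(ad))
        if ratio is not None:
            # NodePreemptionsPerStart is an invented attribute name.
            print(f"NodePreemptionsPerStart = {ratio}")
```

Calling main() with the name of the local startd would print the ratio lines to stdout, which is the form a startd_cron script uses to hand attributes back to the startd. Whether preemptions per start is a good stand-in for an error counter is a separate question.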

Most of the stats in the STARTD are in the DC collection, but they are mostly about the performance of the STARTD rather than about jobs. There are a few stats about jobs; these are not in any specific collection:

JobPreemptions, JobRankPreemptions, JobUserPrioPreemptions, JobStarts, JobBusyTime, JobDuration

If you aren't seeing these stats, try adding them to STATISTICS_TO_PUBLISH_LIST.

Since these are not in any specific collection, the STATISTICS_TO_PUBLISH_LIST is the only way to make them publish if they are not being published by default. 
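
For illustration, a config fragment along these lines might do it. This is a sketch: it assumes the list takes comma-separated attribute names and that the setting goes into the local config of the execute node.

```
# Local config on the execute node (file placement is an assumption)
STATISTICS_TO_PUBLISH_LIST = JobStarts, JobPreemptions, JobRankPreemptions, JobUserPrioPreemptions, JobBusyTime, JobDuration
```

After changing it, a condor_reconfig on that node should make the startd pick up the new setting.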

-tj

-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Beyer, Christoph
Sent: Wednesday, March 18, 2020 6:11 AM
To: Condor-Users Mail List <condor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] startd statistics


Hi,

I hope you are all well and working from home, helping to flatten the curve of new infections. I bet someone is using HTC somewhere to fight corona, by the way :)

Anyway, on to something completely different: I have been thinking for a while about establishing a kind of error counter for worker nodes, published in the host ClassAd as a ratio of successful/unsuccessful job starts/job finishes.

I would like to use the startd-cron feature and the local startd statistics to calculate that number. Therefore I set

STATISTICS_TO_PUBLISH = STARTER:2

But that is currently not leading to any helpful numbers using 'condor_status -l -startd'. Maybe I am on the wrong track here and someone has done something similar using different tools?

I think I could come up with something by going through the job history on the schedd, but that sounds a bit over-engineered, as I suppose the startd should have some numbers that I could use?

Best
Christoph

-- 
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/