
Re: [HTCondor-users] collect schedd Recent* classads for plotting in dashboard



Hi Vikrant,

STATISTICS_WINDOW_SECONDS is the span of time over which statistics are collected and kept. That window is broken up into buckets whose size is set by STATISTICS_WINDOW_QUANTUM. If the quantum is larger than the window, you will have only a single bucket collecting statistics. You can reduce the quantum for more granularity and smoothness, but only to a point, since the statistics themselves are recorded at a separate interval (around 30 seconds).
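
To illustrate the window/quantum relationship, here is a minimal Python sketch of a "recent" counter kept as a ring of buckets. It is a toy model only, not HTCondor's actual implementation: the class name RecentCounter, its methods, and the integer-division rounding are all assumptions made for illustration.

```python
from collections import deque


class RecentCounter:
    """Toy model of a 'Recent*' counter stored as a ring of time buckets."""

    def __init__(self, window_seconds, quantum_seconds):
        # Assumption: bucket count is window // quantum, with a floor of one
        # bucket when the quantum is larger than the window.
        n_buckets = max(1, window_seconds // quantum_seconds)
        self.buckets = deque([0] * n_buckets, maxlen=n_buckets)

    def add(self, count=1):
        # Events are counted in the newest (current) bucket.
        self.buckets[-1] += count

    def advance(self):
        # One quantum has elapsed: the oldest bucket is dropped and a new,
        # empty bucket becomes current.
        self.buckets.append(0)

    def recent(self):
        # The "recent" value is the sum over all buckets still in the window.
        return sum(self.buckets)
```

With the values shown in the original message (window 1200, quantum 240), this model keeps 5 buckets, so a "recent" value can only drop in steps of one whole bucket as old data ages out.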

The DaemonCoreDutyCycle is determined by the time an HTCondor daemon spends busy (working) versus the time it spends waiting for work. So, in the Schedd case, more jobs means more work, which increases the duty cycle. Job management is not the only factor, though: any work the Schedd does will affect its duty cycle.
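
As a rough illustration, the duty cycle can be thought of as the busy fraction of total time. The sketch below assumes that simple ratio; the function name duty_cycle and the exact formula are assumptions for illustration, not taken from the HTCondor source.

```python
def duty_cycle(busy_seconds, idle_seconds):
    """Fraction of time spent working, assuming busy / (busy + idle)."""
    total = busy_seconds + idle_seconds
    if total <= 0:
        # No time observed yet, so report no load rather than divide by zero.
        return 0.0
    return busy_seconds / total
```

Under this assumption, a value near 0.35 over a 1200 second window would correspond to roughly 419 seconds busy and 781 seconds waiting, whatever kind of work filled the busy time.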

Note also that the DaemonCoreDutyCycle and the job count statistics are part of different statistical pools, so the times at which they are recorded/collected may differ.

-Cole Bollig

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Vikrant Aggarwal <ervikrant06@xxxxxxxxx>
Sent: Monday, January 8, 2024 8:42 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] collect schedd Recent* classads for plotting in dashboard
 
Hello Experts, 

We found that RecentDaemonCoreDutyCycle is a crucial parameter in deciding whether a schedd's jobs will be considered for matchmaking. We noticed other Recent* parameters that are helpful for gathering more information about the schedd, but their values do not match the numbers from condor_q. I understand these parameters cover a window ranging from 0 to 20 minutes, but we never see more than 3-4k jobs submitted in the last 20 minutes, yet the reported number is 6k; similarly for started and exited.

    [
        Autoclusters = 16;
        RecentStatsLifetime = 1200;
        Name = "test.example.com";
        RecentDaemonCoreDutyCycle = 3.490321919877746E-01;
        RecentJobsExited = 9820;
        RecentJobsKilled = 2;
        RecentJobsStarted = 10078;
        RecentJobsSubmitted = 6049
    ]

Is it necessary to keep STATISTICS_WINDOW_SECONDS larger than STATISTICS_WINDOW_QUANTUM? If we change the window from the default of 20m to something like 5m to get more granular metrics from the schedd, does that increase the schedd's memory needs?

test:~# condor_config_val STATISTICS_WINDOW_SECONDS STATISTICS_WINDOW_QUANTUM
1200
240


What's the logic behind increases in RecentDaemonCoreDutyCycle? Is it based on the number of jobs in the queue, or on other parameters as well?


Thanks & Regards,
Vikrant Aggarwal