Re: [HTCondor-users] Time elapsed since last completed job



Hi,

On Thu, 2023-04-27 at 16:30:55 -0400, Miguel Garrido wrote:
> Hello
> 
> I'd like to determine how much time has elapsed since a startd completed
> its last job. Ideally I would like to know whether the system was busy
> running any job within the last X minutes.
> 
> I would prefer to do this within Condor with class ads if possible. Is it
> possible?
> 
> The other idea I've had is to periodically query the startd for Busy slots
> and save the query results in some external file to query against; but
> that means I would then need to manage that file.

Well, basically you could use the "EnteredCurrentState" attribute that the
ClassAd of every slot provides, together with the corresponding "State" and
"Activity". "EnteredCurrentState" is given in seconds since the epoch (iow,
a standard Unix timestamp).

Are you using dynamic slots or static ones? That determines which "slot"
prefixes to scan the 'condor_status -l' output for. Also, for dynamic slots
you may check whether "ChildActivity" is an empty list (otherwise there's
still a job running, that is, the startd isn't fully idle).
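
To illustrate the idea, here is a rough Python sketch (not a tested recipe).
It assumes you have already collected the slot ads of one machine as plain
dictionaries, for example by parsing 'condor_status -l' output yourself. The
function names and the dictionary layout are my own invention; only "State",
"Activity", "EnteredCurrentState" and "ChildActivity" are attribute names
mentioned above. It also assumes that a slot which is not "Busy" has been
without a job since it entered its current state, which may not hold for
every state/activity combination.

```python
import time


def seconds_idle(slot_ads, now=None):
    """Return how long the machine has been without a running job.

    slot_ads: list of dicts, one per slot (hypothetical layout), each with
    "Activity" and "EnteredCurrentState", optionally "ChildActivity".
    Returns 0 if any slot is busy right now, None if no ads were given.
    """
    now = time.time() if now is None else now
    idle_times = []
    for ad in slot_ads:
        # A busy static/dynamic slot, or a partitionable slot with a
        # non-empty ChildActivity list, means a job is still running.
        if ad.get("Activity") == "Busy" or ad.get("ChildActivity"):
            return 0
        # EnteredCurrentState is a Unix timestamp (seconds since the epoch).
        idle_times.append(now - ad["EnteredCurrentState"])
    # The slot that changed state most recently decides the answer.
    return min(idle_times) if idle_times else None


def was_busy_within(slot_ads, minutes, now=None):
    """True if the machine ran a job within the last `minutes` minutes."""
    idle = seconds_idle(slot_ads, now)
    return idle is not None and idle < minutes * 60
```

You would then call something like was_busy_within(ads, 30) from a cron job
or a monitoring script, with no external state file to manage.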

OTOH, there used to be recipes for hibernating nodes, using the same info
that you're trying to retrieve (node having been fully idle for some time).
Perhaps they still exist?


HTH,
 Steffen


-- 
Steffen Grunewald, Cluster Administrator
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Am Mühlenberg 1 * D-14476 Potsdam-Golm * Germany
~~~
Fon: +49-331-567 7274
Mail: steffen.grunewald(at)aei.mpg.de
~~~