Re: [HTCondor-users] SYSTEM_PERIODIC_HOLD ignored
- Date: Fri, 27 Aug 2021 12:16:32 -0500
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] SYSTEM_PERIODIC_HOLD ignored
On 8/27/2021 2:42 AM, Stefano Dal Pra wrote:
> Experts here might want to confirm: I think that some job ClassAd
> attributes (such as ResidentSetSize) are actually updated every 15
> minutes. If that is true, this policy could put a job on hold now
> based on a value measured up to 15 minutes earlier.
It is a bit complicated...
By default, the condor_starter on the execute node sends the
condor_shadow an update every 5 minutes containing dynamic attributes
about the job, such as ResidentSetSize. How often the starter updates
the shadow is controlled by the condor_starter config knobs
STARTER_UPDATE_INTERVAL (the interval between updates) and
STARTER_INITIAL_UPDATE_INTERVAL (how long until the first update is
sent).
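To make the roles of the two knobs concrete, here is a minimal Python sketch of when starter updates would be sent during a job. It assumes the first update fires STARTER_INITIAL_UPDATE_INTERVAL seconds after the job starts and each later one follows STARTER_UPDATE_INTERVAL seconds after the previous one. The function name and the 2-second initial interval are illustrative assumptions, not HTCondor code or a documented default; the 300-second update interval corresponds to the 5-minute default described above.

```python
def starter_update_times(initial_interval, update_interval, duration):
    """Return the times (seconds after job start) at which the starter
    would send an update to the shadow, up to and including `duration`.

    Hypothetical model: the first update comes after `initial_interval`,
    then one every `update_interval` seconds.
    """
    times = []
    t = initial_interval
    while t <= duration:
        times.append(t)
        t += update_interval
    return times


# Illustrative values: 2 s initial delay (assumed), 300 s between updates,
# over the first 20 minutes of a job.
print(starter_update_times(initial_interval=2, update_interval=300, duration=1200))
# [2, 302, 602, 902]
```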
Each time it receives an update from the condor_starter, the job's
condor_shadow evaluates job policy expressions such as
SYSTEM_PERIODIC_HOLD. While a job is running, its policy expressions
are evaluated and acted upon by the condor_shadow rather than the
schedd, which offloads work from the schedd.
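A rough model of this step: the shadow merges each starter update into its copy of the job ad and immediately re-evaluates the hold policy on the merged values. This is a hypothetical Python sketch, not HTCondor's implementation; `shadow_on_update`, `rss_limit_policy`, and the `RssLimit` attribute are invented for illustration and stand in for a real SYSTEM_PERIODIC_HOLD ClassAd expression.

```python
def shadow_on_update(job_ad, update, hold_policy):
    """Merge a starter update into the shadow's copy of the job ad, then
    evaluate the hold policy on the freshly merged values.

    Hypothetical model of the behavior described above, not HTCondor code.
    """
    job_ad.update(update)
    return hold_policy(job_ad)


def rss_limit_policy(ad):
    """Hypothetical stand-in for a SYSTEM_PERIODIC_HOLD expression: hold the
    job if ResidentSetSize exceeds an invented RssLimit attribute (both
    assumed to be in the same units)."""
    return ad.get("ResidentSetSize", 0) > ad["RssLimit"]


ad = {"RssLimit": 4_000_000}
print(shadow_on_update(ad, {"ResidentSetSize": 1_000_000}, rss_limit_policy))  # False
print(shadow_on_update(ad, {"ResidentSetSize": 5_000_000}, rss_limit_policy))  # True
```

Because the policy runs on every update, the hold decision in this model lags the real memory usage by at most one starter update interval.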
In addition, periodically and at a lower frequency (every 15 minutes
by default), the condor_shadow pushes those updated attributes to the
schedd, which is what makes them visible via condor_q. The lower
frequency keeps the schedd from being overloaded when thousands of
jobs are running. How often the shadow pushes attributes to the schedd
is controlled by the config knob SHADOW_QUEUE_UPDATE_INTERVAL.
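The two update paths can be modeled as a shadow holding two copies of the job attributes: its own, refreshed on every starter update, and the schedd's, refreshed only when the shadow's queue-update timer fires. The `ShadowModel` class below is a hypothetical sketch for illustration only; it does not reflect HTCondor's internal classes or method names.

```python
class ShadowModel:
    """Hypothetical model of the shadow's two update paths.

    Starter updates can arrive at any time and are applied to the shadow's
    own view immediately; the schedd's view (what condor_q shows) is only
    refreshed when the queue-update timer fires.
    """

    def __init__(self):
        self.job_ad = {}     # shadow's current view of the running job
        self.schedd_ad = {}  # what the schedd, and therefore condor_q, shows

    def on_starter_update(self, update):
        # Frequent path (STARTER_UPDATE_INTERVAL): shadow view only.
        self.job_ad.update(update)

    def on_queue_update_timer(self):
        # Infrequent path (SHADOW_QUEUE_UPDATE_INTERVAL): push everything
        # the shadow currently knows to the schedd in one batch.
        self.schedd_ad.update(self.job_ad)


s = ShadowModel()
s.on_starter_update({"ResidentSetSize": 1000})
s.on_queue_update_timer()
s.on_starter_update({"ResidentSetSize": 2000})
print(s.job_ad["ResidentSetSize"], s.schedd_ad["ResidentSetSize"])  # 2000 1000
```

Between timer firings the two views can disagree, which is why condor_q may show an older value than the one the shadow is using for policy evaluation.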
So, even though condor_q will only show changes to ResidentSetSize
every 15 minutes, the SYSTEM_PERIODIC_HOLD expression should be
operating on values that are no more than 5 minutes old.
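As a rough check on these numbers, the following sketch computes how old the newest ResidentSetSize sample is, as seen by the shadow and by the schedd, at a given time. It assumes a simplified model, not HTCondor's actual timer behavior, in which the starter reports at every multiple of the starter interval and the shadow pushes to the schedd at every multiple of the queue interval, both starting at time 0. The function names are hypothetical.

```python
def latest_at_or_before(times, t):
    """Most recent event time <= t, or None if no event has happened yet."""
    past = [x for x in times if x <= t]
    return max(past) if past else None


def sample_ages(t, starter_interval, queue_interval):
    """Return (shadow_age, schedd_age) in seconds at time t: how old the
    newest starter sample is in the shadow's view and in the schedd's view.

    Simplified, assumed model: starter updates at multiples of
    starter_interval, shadow-to-schedd pushes at multiples of
    queue_interval, both beginning at t = 0.
    """
    starter_times = range(0, t + 1, starter_interval)
    push_times = range(0, t + 1, queue_interval)
    shadow_sample = latest_at_or_before(starter_times, t)
    last_push = latest_at_or_before(push_times, t)
    # The schedd holds whatever sample the shadow had at the last push.
    schedd_sample = latest_at_or_before(starter_times, last_push)
    return t - shadow_sample, t - schedd_sample


# 5-minute starter updates, 15-minute pushes to the schedd.
print(sample_ages(1700, starter_interval=300, queue_interval=900))  # (200, 800)
```

Scanning every second of the first hour with these intervals, the shadow's sample in this model is never older than 299 s, while the schedd's copy can be up to 899 s old, i.e. just under 5 and 15 minutes respectively.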
Hope this helps,
Todd