Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] SYSTEM_PERIODIC_HOLD ignored

Date: Fri, 27 Aug 2021 09:42:16 +0200
From: Stefano Dal Pra <stefano.dalpra@xxxxxxxxxxxx>
Subject: Re: [HTCondor-users] SYSTEM_PERIODIC_HOLD ignored

Hi,

On 27/08/21 06:20, David Cohen wrote:

Hooray!!

It's working now, and jobs that run over time are evicted.

Now on to my next project: holding jobs that, after 30 minutes of running, still don't use more than 10% of their requested memory:

WastingMemory = (JobStatus == 2 && (time() - JobCurrentStartExecutingDate) > 1800) && (RequestMemory > 8192) && (ResidentSetSize/1024 < RequestMemory/10)
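To make the units in this expression explicit, here is a minimal Python sketch of the same test. It assumes ResidentSetSize is reported in KiB and RequestMemory in MiB, which is what the division by 1024 suggests. The function name and its parameters are hypothetical, and it uses real-valued division, so it is a model of the logic rather than of how the schedd evaluates the expression.

```python
RUNNING = 2  # the JobStatus value the expression tests for


def wasting_memory(job_status, now, start_executing, request_memory_mib,
                   resident_set_size_kib, min_runtime=1800):
    """Hypothetical model of the WastingMemory expression."""
    # Job is running and has been executing for longer than min_runtime seconds.
    ran_long_enough = (job_status == RUNNING
                       and (now - start_executing) > min_runtime)
    # Only large requests are considered.
    big_request = request_memory_mib > 8192
    # Convert KiB to MiB, then compare with 10% of the request.
    low_usage = resident_set_size_kib / 1024 < request_memory_mib / 10
    return ran_long_enough and big_request and low_usage


# A job that asked for 16384 MiB and holds about 500 MiB after 2000 seconds.
print(wasting_memory(RUNNING, 10_000, 8_000, 16384, 512_000))
```

With these sample values all three conditions hold, so the job would be a hold candidate.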

I believe this thread gives me all the tools I need to manage that one.

Experts here might want to confirm this: I think that some job ClassAd attributes (such as ResidentSetSize) are only updated every 15 minutes.
If that is true, this policy could put a job on hold now based on a value measured up to 15 minutes earlier.
A simple remedy would be to wait 2700 seconds instead of 1800.
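The arithmetic behind 2700 can be sketched as follows. The 900-second update interval is the unconfirmed assumption stated above, and the helper names are hypothetical.

```python
UPDATE_INTERVAL = 900      # assumed: attribute refreshed every 15 minutes
OBSERVATION_WINDOW = 1800  # the job should have run this long when measured


def oldest_sample_runtime(runtime_now, update_interval=UPDATE_INTERVAL):
    """Earliest point in the run at which the current value may have been taken."""
    return max(0, runtime_now - update_interval)


def safe_threshold(window=OBSERVATION_WINDOW, update_interval=UPDATE_INTERVAL):
    """Runtime after which even the stalest sample covers the full window."""
    return window + update_interval


print(safe_threshold())             # threshold to use in the policy
print(oldest_sample_runtime(1801))  # with the 1800 s rule, the sample may be this early
print(oldest_sample_runtime(2701))  # with the 2700 s rule
```

Under the 1800-second rule a job could be judged on a value taken about 15 minutes into its run; with 2700 seconds the value is always from after the first 30 minutes.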

When considering a hold policy, I use condor_q to check for candidate jobs and to verify that no "innocent" jobs would be caught.
Running something like this or a variant:

condor_q -glob -all -cons '(JobStatus == 2 && (time() - JobCurrentStartExecutingDate) > 1800)' -af:j owner '(RequestMemory > 8192)' '(ResidentSetSize < RequestMemory * 102.4)'

could help to confirm that the right jobs are affected before the rule is enforced.
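As an illustration of what that check reports, here is a small Python model of the same dry run: filter on the running-time constraint, then show the owner and the two boolean columns. The job records are made-up sample data, and `dry_run` is a hypothetical helper, not part of any HTCondor API.

```python
def dry_run(jobs, now, min_runtime=1800):
    """Return (owner, big_request, low_usage) for jobs matching the constraint."""
    rows = []
    for job in jobs:
        running = job["JobStatus"] == 2
        runtime = now - job["JobCurrentStartExecutingDate"]
        if running and runtime > min_runtime:
            big_request = job["RequestMemory"] > 8192
            # Same threshold as RSS/1024 < RequestMemory/10, written without division.
            low_usage = job["ResidentSetSize"] < job["RequestMemory"] * 102.4
            rows.append((job["Owner"], big_request, low_usage))
    return rows


NOW = 100_000
SAMPLE_JOBS = [  # invented records for illustration only
    {"Owner": "alice", "JobStatus": 2, "JobCurrentStartExecutingDate": NOW - 3600,
     "RequestMemory": 16384, "ResidentSetSize": 512_000},
    {"Owner": "bob", "JobStatus": 2, "JobCurrentStartExecutingDate": NOW - 3600,
     "RequestMemory": 4096, "ResidentSetSize": 3_000_000},
    {"Owner": "carol", "JobStatus": 2, "JobCurrentStartExecutingDate": NOW - 600,
     "RequestMemory": 16384, "ResidentSetSize": 100_000},
]

for row in dry_run(SAMPLE_JOBS, NOW):
    print(row)
```

Only rows where both booleans are true would be held by the policy; a job that started too recently is not listed at all.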

Stefano

Many thanks,

David

On Thu, Aug 26, 2021 at 4:48 PM Stefano Dal Pra <stefano.dalpra@xxxxxxxxxxxx> wrote:

On 26/08/21 15:12, Stefano Dal Pra wrote:
> [SNIP]
>>
>> That works perfectly for MEMORY_EXCEEDED but totally ignored for
>> TIME_EXCEEDED.
[SNIP]

I stumbled on a job that had somehow survived and been running for 21 days, so I wrote a
clause to get it held and to verify that it works:

TooMuchTime = (jobstatus == 2 && (time() - JobStartDate > 86400 * 7))

This clause works, but it only takes effect after a condor restart;
condor_reconfig is not enough.

Stefano
