
Re: [HTCondor-users] SYSTEM_PERIODIC_HOLD ignored



I can't think of anything that would normally cause a periodic hold expression to stop working.
Here are a couple of ideas for debugging the problem:

When there's a job in the queue that you think should be affected by the periodic hold expression, try running this command:
condor_q -all -nobatch -constraint "$(condor_config_val SYSTEM_PERIODIC_HOLD)"

If that doesn't display the problematic job(s), try altering the expression (removing or adjusting terms) to see what's needed to make the jobs appear. That can reveal differences between what you're checking for and what's in the job ads.
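
One way to compare the expression against the job ads is to print the attributes it references for each running job. The following is only a sketch: the attribute list is taken from the expression quoted below, and the JobStatus == 2 (running) constraint is my assumption about which jobs are of interest. Attributes missing from a job ad are printed as "undefined".

```shell
#!/bin/sh
# Attributes referenced by the hold expression quoted in the original message.
attrs="ClusterId ProcId Owner JobStatus JobCurrentStartDate HiMemUser RequestMemory"

if command -v condor_q >/dev/null 2>&1; then
    # $attrs is intentionally unquoted so each name becomes its own argument.
    # -af:h is condor_q's autoformat output with a heading row.
    condor_q -all -nobatch -constraint 'JobStatus == 2' -af:h $attrs
else
    echo "condor_q not found; run this on the submit host" >&2
fi
```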

To ensure the schedd is evaluating the periodic job expressions on a timely basis, you can try amending the expression to always hold special test jobs. For example, you can add this to the end of your config files:
SYSTEM_PERIODIC_HOLD = ($(SYSTEM_PERIODIC_HOLD)) || (AdminHoldJob =?= true)

Then, submit a test job with the following line in the submit file:
+AdminHoldJob=True

Then, wait and see if the job gets held.
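
For reference, a complete test could look roughly like this. It is a sketch under assumptions: the file name, the /bin/sleep executable, and the 3600-second runtime are placeholders I picked, and only the +AdminHoldJob line comes from the steps above.

```shell
#!/bin/sh
# Write a throwaway submit file for the test job (hypothetical file name).
submit_file="${TMPDIR:-/tmp}/admin_hold_test.sub"
cat > "$submit_file" <<'EOF'
# Test job: sleeps long enough for the periodic expressions
# to be evaluated at least once.
universe     = vanilla
executable   = /bin/sleep
arguments    = 3600
log          = admin_hold_test.log
+AdminHoldJob = True
queue
EOF

if command -v condor_submit >/dev/null 2>&1; then
    condor_submit "$submit_file"
    # Run this again after a few minutes: it lists held jobs with their hold reason.
    condor_q -all -nobatch -hold
else
    echo "condor_submit not found; run this on the submit host" >&2
fi
```

If the test job shows up in the held list, the schedd is evaluating SYSTEM_PERIODIC_HOLD, and the problem is more likely in the terms of the original expression.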

 - Jaime

> On Aug 17, 2021, at 5:09 AM, David Cohen <cdavid@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> 
> Hi,
> A SYSTEM_PERIODIC_HOLD, configured on the schedd, that used to work is ignored lately:
> 
> SYSTEM_PERIODIC_HOLD = (Time() - JobCurrentStartDate) > IfthenElse(HiMemUser && (RequestMemory > 40*1024), 120*3600 , 72*3600)
> SYSTEM_PERIODIC_HOLD_Reason = "Job Is Running over time"
> SYSTEM_PERIODIC_REMOVE = JobStatus == 5 && (Time() - EnteredCurrentStatus) > 600
> 
> I could find no reference to that in the system's log.
> How can I debug that?
> 
> Best,
> David