
Re: [HTCondor-users] SYSTEM_PERIODIC_HOLD ignored



Thanks Jaime for your reply,

condor_q -all -nobatch -constraint `condor_config_val SYSTEM_PERIODIC_HOLD`

-- Parse error in constraint expression "("
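
One possible explanation, not confirmed anywhere in this thread, is shell word splitting: an unquoted `...` substitution is split on whitespace, so condor_q receives only the first fragment of the expression as the value of -constraint. In the shell, wrapping the substitution in double quotes ("$(condor_config_val SYSTEM_PERIODIC_HOLD)") keeps it as one argument. The Python sketch below illustrates the difference. The helper names are hypothetical, and the split only approximates shell behavior (globbing is ignored).

```python
import subprocess

# Expression copied from the original post in this thread.
HOLD_EXPR = ("(Time() - JobCurrentStartDate) > "
             "IfthenElse(HiMemUser && (RequestMemory > 40*1024), 120*3600 , 72*3600)")

def unquoted_argv(expr):
    """Approximate the argv produced by an unquoted `...` substitution.

    The shell splits the substituted text on whitespace, so the constraint
    is scattered across many arguments.
    """
    return ["condor_q", "-all", "-nobatch", "-constraint"] + expr.split()

def quoted_argv(expr):
    """Build an argv that passes the whole expression as ONE argument."""
    return ["condor_q", "-all", "-nobatch", "-constraint", expr]

def run_condor_q_with_hold_expr():
    """Hypothetical helper: needs HTCondor installed, so it is not called here."""
    expr = subprocess.run(
        ["condor_config_val", "SYSTEM_PERIODIC_HOLD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    # No shell is involved, so no word splitting can occur.
    return subprocess.run(quoted_argv(expr), capture_output=True, text=True)
```

With the unquoted form, the argument that follows -constraint is only the first whitespace-delimited fragment of the expression, which cannot parse as a complete constraint.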

Looking at a job that should have been put on hold:
HiMemUser = 0
RequestMemory = 5120
JobCurrentStartDate = 1628598643   ## Time() - 1628598643 > 72*3600, assuming Time() is working properly and returning the current time as an epoch value.

The error seems to point to a typo, but I cannot find one.
All the arguments that need to be evaluated are present and have the expected values.
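
As a sanity check on the arithmetic, the hold expression can be evaluated by hand for the job attributes listed above. The following is a minimal Python sketch of that evaluation, not HTCondor's own ClassAd evaluator. The function names are hypothetical, RequestMemory is assumed to be in MiB, and the sketch does not model ClassAd UNDEFINED semantics for missing attributes.

```python
def hold_threshold_seconds(hi_mem_user, request_memory):
    """Mirror IfthenElse(HiMemUser && (RequestMemory > 40*1024), 120*3600, 72*3600)."""
    if hi_mem_user and request_memory > 40 * 1024:
        return 120 * 3600
    return 72 * 3600

def should_hold(now, job_current_start_date, hi_mem_user, request_memory):
    """Mirror (Time() - JobCurrentStartDate) > threshold, with `now` standing in for Time()."""
    threshold = hold_threshold_seconds(hi_mem_user, request_memory)
    return (now - job_current_start_date) > threshold

# Attribute values from the job shown above.
START = 1628598643
HI_MEM_USER = 0
REQUEST_MEMORY = 5120

threshold = hold_threshold_seconds(HI_MEM_USER, REQUEST_MEMORY)  # 72 * 3600 = 259200
first_hold_time = START + threshold + 1  # first epoch second at which the expression is true
print(threshold, first_hold_time)
```

For this job the 72-hour branch applies, so the expression becomes true once the current epoch time exceeds 1628598643 + 259200 = 1628857843.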




On Wed, Aug 18, 2021 at 12:03 AM Jaime Frey <jfrey@xxxxxxxxxxx> wrote:
I can't think of anything that would normally cause a periodic hold expression to stop working.
Here are a couple of ideas for debugging the problem:

When there's a job in the queue that you think should be affected by the periodic hold expression, try running this command:
condor_q -all -nobatch -constraint `condor_config_val SYSTEM_PERIODIC_HOLD`

If that doesn't display the problematic job(s), try altering the expression (removing or adjusting terms) to see what's needed to make the jobs appear. That can reveal differences between what you're checking for and what's in the job ads.

To ensure the schedd is evaluating the periodic job expressions on a timely basis, you can try amending the expression to always hold special test jobs. For example, you can add this to the end of your config files:
SYSTEM_PERIODIC_HOLD = ($(SYSTEM_PERIODIC_HOLD)) || AdminHoldJob=?=true

Then, submit a test job with the following line in the submit file:
+AdminHoldJob=True

Then, wait and see if the job gets held.

 - Jaime

> On Aug 17, 2021, at 5:09 AM, David Cohen <cdavid@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi,
> A SYSTEM_PERIODIC_HOLD expression, configured on the schedd, that used to work has lately been ignored:
>
> SYSTEM_PERIODIC_HOLD = (Time() - JobCurrentStartDate) > IfthenElse(HiMemUser && (RequestMemory > 40*1024), 120*3600 , 72*3600)
> SYSTEM_PERIODIC_HOLD_Reason = "Job Is Running over time"
> SYSTEM_PERIODIC_REMOVE = JobStatus == 5 && (Time() - EnteredCurrentStatus) > 600
>
> I could find no reference to that in the system's log.
> How can I debug that?
>
> Best,
> David

