
Re: [HTCondor-users] centrally force removal after some time even if leave_in_queue is true?



Hi Todd,

OK, great. I'll keep a cron job as a workaround while waiting to upgrade to 8.8.
Then the job transform will fix the issue cleanly.

Thanks!
Regards,
Andrea


On 08/11/2018 00:18, Todd Tannenbaum wrote:
On 11/7/2018 10:38 AM, Andrea Sartirana wrote:
Hi Todd,

your solution seems to work: the LeaveJobInQueue ClassAd attribute is changed
[1] and correctly evaluates to false
once the expiration time has passed [2]. But indeed, as Michael said,
it does not really fix my problem,
since the jobs are not removed from the queue (in the sense that they
still appear in condor_q output).
Is this because something is misconfigured on our schedd?
If not, I guess only a cron job running "condor_rm -forcex ..." can fix the
issue...
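A minimal Python model of an expiring LeaveJobInQueue policy may help make the behavior described above concrete. The actual expression referenced in [1] is not reproduced in this message, so the form used here is an assumption: a finished job (JobStatus 3 = Removed or 4 = Completed) is kept while time() - EnteredCurrentStatus stays below a fixed window. The function name and the keep_seconds parameter are hypothetical.

```python
# Hypothetical model of an expiring LeaveJobInQueue policy.
# JobStatus codes follow the HTCondor job ClassAd: 3 = Removed, 4 = Completed.
REMOVED = 3
COMPLETED = 4


def leave_job_in_queue(job_status: int, entered_current_status: int,
                       now: int, keep_seconds: int) -> bool:
    """Return True while a finished job should stay in the queue.

    Mirrors an (assumed) expression of the form:
      (JobStatus == 4 || JobStatus == 3)
        && (time() - EnteredCurrentStatus) < keep_seconds
    """
    if job_status not in (REMOVED, COMPLETED):
        # Only finished jobs are considered by this policy.
        return False
    # Keep the job until it has sat in its current state for keep_seconds.
    return (now - entered_current_status) < keep_seconds


if __name__ == "__main__":
    day = 24 * 3600
    # Removed one hour ago: still kept.
    print(leave_job_in_queue(REMOVED, 0, 3600, day))      # True
    # Removed two days ago: the expression has expired.
    print(leave_job_in_queue(REMOVED, 0, 2 * day, day))   # False
```

Evaluating to False is exactly the state Andrea reports; the problem is what the schedd does (or fails to do) afterwards.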

(anyway, job transforms do seem very powerful)

Regards,
Andrea

Hi Andrea,

Ugh, what you observe above is indeed the current behavior. It is a bug; thank
you for reporting it.

Things work properly for jobs that *complete*, but it turns out there's
a bug when LeaveJobInQueue evaluates to True for *removed* jobs. The
removed jobs stay in the queue even when the expression later evaluates
to False.
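The behavior described here can be summarized as a small decision function. This is an illustrative model of the reported symptoms, not the schedd's actual code; the function name and the version-tuple comparison are hypothetical, and 8.8.0 appears only because that is the release the fix is targeted for.

```python
# Illustrative model of the reported symptoms (not HTCondor source code).
from typing import Tuple

REMOVED = 3
COMPLETED = 4
FIXED_IN = (8, 8, 0)  # release the fix is targeted for, per the ticket


def schedd_purges(job_status: int, leave_in_queue: bool,
                  version: Tuple[int, int, int]) -> bool:
    """Return True if a finished job would leave the queue."""
    if job_status not in (REMOVED, COMPLETED):
        return False  # only finished jobs are candidates in this model
    if leave_in_queue:
        return False  # LeaveJobInQueue still True: the job stays
    if job_status == REMOVED and version < FIXED_IN:
        return False  # the bug: removed jobs stay even once it turns False
    return True
```

For example, schedd_purges(REMOVED, False, (8, 6, 0)) returns False (the stuck-job symptom), while the same call with (8, 8, 0) returns True.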

The good news is we fixed this bug for the upcoming v8.8.0 release.
Details are in this ticket:
    https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=6808

In the meantime, as you surmised, a potential immediate workaround
would be to run 'condor_rm -all -forcex' periodically.  This causes all
jobs that are already in X state to be immediately removed from the
queue, ignoring any conditions that would normally keep them in the
queue (like LeaveJobInQueue). It leaves jobs in other states alone.
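A minimal sketch of the periodic workaround, written as a Python helper that a cron-invoked script could call. It assumes condor_rm is on the PATH and that the script runs as a user permitted to remove the affected jobs (for example a queue superuser), since -all only acts on jobs the caller may remove. The function name and the injectable runner argument are hypothetical; the runner exists only so the command can be exercised without a schedd.

```python
# Hypothetical cron helper: force out jobs already in the X (removed) state.
import subprocess
import sys
from typing import Callable, List

FORCEX_CMD: List[str] = ["condor_rm", "-all", "-forcex"]


def run_forcex_cleanup(
        runner: Callable[..., subprocess.CompletedProcess] = subprocess.run
) -> int:
    """Run `condor_rm -all -forcex` and return its exit status."""
    result = runner(FORCEX_CMD, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface failures on stderr so they appear in cron's output.
        print(result.stderr.strip(), file=sys.stderr)
    return result.returncode
```

A cron-invoked script would typically end with sys.exit(run_forcex_cleanup()); how often to run it is a site choice that the thread does not prescribe.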

regards,
Todd