Re: [HTCondor-users] centrally force removal after some time even if leave_in_queue is true?



On 11/7/2018 10:38 AM, Andrea Sartirana wrote:
> Hi Todd,
> 
> your solution seems to work, as the LeaveJobInQueue ClassAd attribute is
> changed [1] and correctly evaluates to false
> when some expiration time has passed [2]. But indeed, as Michael said, 
> it does not really fix my problem
> since the jobs are not removed from the queue (in the sense that they 
> still appear in condor_q output).
> Is this because something is not well configured on our schedd?
> If not, I guess only a cron running "condor_rm -forcex ..." can fix the
> issue...
> 
> (anyways, job-transform seems indeed very powerful)
> 
> Regards,
> Andrea
> 

Hi Andrea,

Ugh, what you observe above is indeed how things currently behave.  It is
a bug; thank you for reporting it.

Things work properly for jobs that *complete*, but it turns out there's
a bug when LeaveJobInQueue evaluates to True for *removed* jobs. The 
removed jobs stay in the queue even when the expression later evaluates 
to False.

The good news is we fixed this bug for the upcoming v8.8.0 release. 
Details are in this ticket:
   https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=6808

In the meantime, as you surmised, a potential immediate workaround
would be to run 'condor_rm -all -forcex' periodically.  This causes all 
jobs that are already in X state to be immediately removed from the 
queue, ignoring any conditions that would normally keep them in the 
queue (like LeaveJobInQueue). It leaves jobs in other states alone.
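
If you would rather have the periodic job purge only jobs that have sat in X state for a while, a minimal sketch of a helper that cron could call is below. This is not something we ship: the function names and the one-hour default are hypothetical, and replacing -all with a -constraint on JobStatus and EnteredCurrentStatus is my assumption about what you want. It also assumes the script runs as a queue superuser on the schedd host.

```python
import shlex
import subprocess

# JobStatus 3 means "Removed", which condor_q displays as X.
REMOVED = 3


def build_forcex_command(min_age_seconds):
    """Build the condor_rm argv for jobs in X state longer than min_age_seconds.

    The age threshold is an assumption for illustration; plain
    'condor_rm -all -forcex' would purge every X-state job instead.
    """
    constraint = (
        f"JobStatus == {REMOVED} && "
        f"(time() - EnteredCurrentStatus) > {int(min_age_seconds)}"
    )
    return ["condor_rm", "-forcex", "-constraint", constraint]


def purge_removed_jobs(min_age_seconds=3600, dry_run=True):
    """Print the command (dry run) or execute it; return the argv either way."""
    argv = build_forcex_command(min_age_seconds)
    if dry_run:
        # Show what would be executed without touching the queue.
        print(shlex.join(argv))
        return argv
    subprocess.run(argv, check=True)
    return argv
```

Calling purge_removed_jobs() with the defaults only prints the command, so you can check the constraint against condor_q first; pass dry_run=False from the cron job once it looks right.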

regards,
Todd