
Re: [HTCondor-users] centrally force removal after some time even if leave_in_queue is true?



On 10/31/2018 5:49 AM, Andrea Sartirana wrote:
> Hi,
> 
> much is in the title.
> I was wondering if there is a way to force removal from the queue of the 
> X state jobs after some centrally defined time even if the 
> leave_in_queue expression given by the user at submission still 
> evaluates to true. I'm running 8.6.0, vanilla universe, direct submission.
> 
> I've tried to include garbage collection of the removed jobs in 
> SYSTEM_PERIODIC_REMOVE, but this does not seem to have the desired effect.
> 
> Regards
> Andrea
> 

Hi Andrea,

There may be an easier way, but one quick thought is that you could use Job Transforms to accomplish the above. Job Transforms allow you, the administrator, to edit job ClassAds upon submission; see this section of the v8.6 manual:

  http://htcondor.org/manual/v8.6/3_7Policy_Configuration.html#38930

So the idea here is to configure your schedd to edit the user's leave_in_queue expression (which ends up in the job classad as attribute LeaveJobInQueue) so that it will always evaluate to False for X state jobs after a specified amount of time, else fall back to whatever the user wanted.

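Try appending the below to the HTCondor configuration (it will be used by your submit machines, and ignored on machines not running a schedd) to allow jobs in X state to leave the queue after 120 seconds regardless of what the user's submit file says. The transform first copies the user's expression into a new attribute, SubmitterLeaveJobInQueue, and then sets LeaveJobInQueue to an expression that refers back to that copy: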

JOB_TRANSFORM_NAMES = $(JOB_TRANSFORM_NAMES) LeaveInQueue
JOB_TRANSFORM_LeaveInQueue @=end
[
    copy_LeaveJobInQueue = "SubmitterLeaveJobInQueue";
    set_LeaveJobInQueue = (JobStatus == 3 && (time() - EnteredCurrentStatus) > 120) ? False : SubmitterLeaveJobInQueue
]
@end

Warning: the above is off the top of my head; I have not tested it.
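
To make the logic of the set_LeaveJobInQueue expression concrete, here is a small Python model of it. This is only a sketch of the ternary, not HTCondor code: the function name, its parameters, and the REMOVED constant are made up for illustration, and it ignores ClassAd details such as what happens when SubmitterLeaveJobInQueue is undefined.

```python
REMOVED = 3  # JobStatus value for jobs in the X (removed) state


def leave_job_in_queue(job_status, entered_current_status, submitter_leave,
                       now, limit=120):
    """Hypothetical model of the transformed LeaveJobInQueue expression.

    Mirrors:
      (JobStatus == 3 && (time() - EnteredCurrentStatus) > 120)
          ? False : SubmitterLeaveJobInQueue
    """
    if job_status == REMOVED and (now - entered_current_status) > limit:
        return False          # the central limit overrides the user's choice
    return submitter_leave    # otherwise fall back to what the user asked for


# Removed 60 seconds ago, user asked to stay in the queue: still stays.
print(leave_job_in_queue(3, 1000, True, now=1060))   # True
# Removed 121 seconds ago: forced out regardless of the user's expression.
print(leave_job_in_queue(3, 1000, True, now=1121))   # False
# Completed job (JobStatus 4): the user's expression is left alone.
print(leave_job_in_queue(4, 1000, True, now=5000))   # True
```

Note that in this model only X state jobs are affected; completed jobs keep whatever the user's leave_in_queue said, which matches what the config above is meant to do.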

Seems like HTCondor would benefit from a SYSTEM_LEAVE_IN_QUEUE knob to make doing the above simpler.  But Job Transforms are a pretty powerful generic tool.

Hope the above helps.  

regards,
Todd


-- 
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685