
Re: [HTCondor-users] Condor jobs moving from hold to idle / run state automatically



Hi Gagan,

Jobs can be released automatically at the user level by adding a periodic_release expression to the job's submit description. More likely what you are looking for is the admin-level option, which applies to all jobs on an access point (where the schedd runs): the configuration knob SYSTEM_PERIODIC_RELEASE. It can be set to an expression that, when it evaluates to TRUE against a job ad, causes that job to be released automatically so it can run again. Do note that more recent versions of HTCondor have been fixed so that jobs put on hold manually with condor_hold are not affected by any periodic release expressions.
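As a minimal sketch, assuming the holds you see come from the execute-node crash and not from condor_hold, a policy could look like the fragment below. The thresholds (fewer than 5 job starts, at least 10 minutes on hold) are arbitrary illustrative values, and the config.d path is only an example location; adjust both for your site. Excluding HoldReasonCode 1 (hold requested by a user) keeps manual holds alone even on older versions.

```
# Access point (schedd) configuration, e.g. a file under /etc/condor/config.d/
# Release a held job if it was not held by a user (HoldReasonCode 1),
# has started fewer than 5 times, and has been held for over 600 seconds.
# The 5 and 600 values are illustrative assumptions, not recommendations.
SYSTEM_PERIODIC_RELEASE = (HoldReasonCode =!= 1) && (NumJobStarts < 5) && ((time() - EnteredCurrentStatus) > 600)

# Per-job alternative: the same idea as a command in a submit description file
# periodic_release = (NumJobStarts < 5) && ((time() - EnteredCurrentStatus) > 600)
```

After editing the configuration on the access point, run condor_reconfig so the schedd picks up the change. Looking at the actual hold reasons first with condor_q -hold can help you narrow the expression to just the holds caused by node failures.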

Hope this helps,
Cole Bollig


From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of gagan tiwari <gagan.tiwari@xxxxxxxxxxxxxxxxxx>
Sent: Monday, April 10, 2023 10:29 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Condor jobs moving from hold to idle / run state automatically
 
Hi Guys,
We have several execute nodes in our HTCondor cluster.
The issue we are facing is that if one of the execute nodes crashes for any reason, HTCondor moves all jobs running on that node to the "hold" state, and we have to run condor_release manually to move those jobs from "hold" to "idle" or "run" state (so that they move to other running execute nodes).

Is there any way for HTCondor to do this automatically, so that users don't have to run condor_release manually to move jobs from failed nodes to healthy nodes?


Thanks,
Gagan