Re: [Condor-users] Ways to limit schedd from accepting and/or starting jobs:
- Date: Fri, 12 May 2006 15:48:28 -0500 (CDT)
- From: Chris Green <greenc@xxxxxxxx>
- Subject: Re: [Condor-users] Ways to limit schedd from accepting and/or starting jobs:
On Fri, 12 May 2006, Steven Timm wrote:
The use case here is draining the queue nicely before a major system
upgrade, which we are doing next week.
I put START = 0 at the bottom of the global config file, but that only works because I don't override START anywhere else. If you do override START in local config files, then you could put
DISABLE_START = 1
in your global config, and make sure your local configs all say:
START = ($(DISABLE_START) =?= 0) && ... <local-stuff>
Followed by a condor_reconfig -all, of course.
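Spelled out as a minimal sketch of that pattern (file locations and the `<local-stuff>` expression are placeholders, not from the original mail):

```
# Global config: set to 1 to disable starting jobs everywhere;
# flip back to 0 when the upgrade is done.
DISABLE_START = 1

# Each local config: gate the machine's usual START expression
# on the global knob, so one global edit drains every startd.
START = ($(DISABLE_START) =?= 0) && <local-stuff>
```

With DISABLE_START = 1, the first clause is FALSE and no machine starts new jobs; with 0, the local expression applies as before.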
You could get this behavior now by setting the max-jobs-per-claim options in later 6.7 releases, and shutting down your negotiator (or using HOSTDENY for the specific schedd at your negotiator) - that way, running jobs will continue to run, but new jobs won't be able to get more resources.
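If CLAIM_WORKLIFE is the per-claim knob being referred to here (an assumption on my part; check the 6.7.x manual for the exact option name), the drain setup might look like:

```
# Startd config: a claim expires after its current job finishes,
# so a claim can never be reused for a second job.
CLAIM_WORKLIFE = 0

# Then stop matchmaking so no new claims are handed out:
#   condor_off -negotiator
```

Running jobs finish normally, but with no negotiator and no claim reuse, nothing new starts.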
The other option would be:
The schedd doesn't take any new submissions, but supervises
the draining of its existing queue, getting jobs run until there
are no more left in the queue to run.
I haven't seen anything in the manual that might accomplish this.
Has anyone figured out how to do it? If not, can we request this feature?
I _think_ you can do this by setting MAX_JOBS_SUBMITTED = 0. That should stop the schedd from creating any new jobs, so any submit attempts would fail.
This is probably a bad idea, though, because users will get an error message that their submit failed, and it would probably cause havoc with any DAGMan jobs in the queue. (I don't know how often DAGMan will retry submits that failed, but I know it's at least semi-robust against this failure, if not completely robust.)
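If you go the MAX_JOBS_SUBMITTED route anyway, it is a one-line change on the submit node (sketch; behavior on submit is as described above):

```
# Schedd config: refuse all new submissions.
# Jobs already in the queue keep running and draining out.
MAX_JOBS_SUBMITTED = 0
```

followed by a condor_reconfig on that machine.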
My guess is that most of the reasons people want features like this are
handled by disconnected operations, so you can reboot submit nodes and not
lose all of the running jobs. Right now the one thing that sucks is the
schedd can't shut itself down without killing all of the running jobs,
even if they could be reconnected to. We're fixing that, but for now if
you want that you have to use a 'kill -9' or condor_off -schedd -fast
to not give the schedd a chance to shutdown "cleanly". When it comes
back up, it will reconnect to the running jobs.
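Put together, a reconnect-friendly drain of a submit node, under the caveat above, would look something like this (command sketch; assumes a version with job-lease reconnect support):

```
# Kill the schedd fast, bypassing the "clean" shutdown path
# that would kill the running jobs:
condor_off -fast -schedd

# ... do the upgrade/reboot ...

# Bring the schedd back; it reconnects to the jobs that were
# left running on the execute nodes:
condor_on -schedd
```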
Chris Green, MiniBooNE / LANL. Email greenc@xxxxxxxx
Tel: (630) 840-2167. Fax: (630) 840-3867