
Re: [Condor-users] pool drainoff





Steven Timm wrote:
The other two features I've wanted for a long time are (1) an instruction
to tell a schedd to start all its existing jobs but not
accept any more new ones.  Also (2) an instruction to let existing
jobs on a schedd complete but not start any more new ones.  (yes
I know the latter could be accomplished with condor_hold -constraint ...)

In Condor 7.1.1, condor_off -peaceful -schedd will cause the schedd to
stop starting new jobs and shut down after all currently running jobs
finish.

A good start.  Now is there a way to coordinate the -peaceful
of the startd with the -peaceful of the schedd?

In 7.1.1, if you send a -peaceful shutdown to both startds and schedds, all existing running jobs should finish and then the schedds and startds should shut down, so I think there is no problem there. The lack of coordination is with the collector. If you send a -peaceful shutdown to everything (including the collector), then the collector will exit immediately, before the startds and schedds have finished. The lack of a collector _might_ not interfere with the wrapping up and shutting down of the rest of the pool, but I'm not 100% sure, and lacking a collector sure won't make it easy to see what is going on.
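
As a rough sketch of the above (not a tested recipe): the script below sends a -peaceful shutdown to the schedds and startds but leaves the collector alone, so the pool can still be watched while it drains. The -peaceful and -schedd options are the ones mentioned above; using -all to reach every machine and -startd for the execute side is an assumption on my part, as are the helper names run and drain_pool and the DRY_RUN switch, which only exist for this illustration. By default it just prints the commands.

```shell
#!/bin/sh
# Sketch: peacefully drain a pool while keeping the collector up.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: issue the two peaceful shutdowns.
drain_pool() {
    # Schedds stop starting new jobs and exit once running jobs finish.
    run condor_off -all -peaceful -schedd
    # Startds let their current jobs finish, then exit.
    run condor_off -all -peaceful -startd
    # The collector is deliberately left out so condor_status keeps working.
}

drain_pool
```

Run it with DRY_RUN=0 to actually send the commands. Once condor_status shows that the startds and schedds are gone, the collector can be shut down separately.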

I believe the answer to your request (1) above is to set
MAX_JOBS_SUBMITTED=0.  Or at least so says my new (not yet publicly
announced) How-to:
http://nmi.cs.wisc.edu/node/1466
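
For illustration only, the setting would look something like this in the submit machine's local configuration file (which file that is depends on your installation, so treat the placement as an assumption):

```
# Refuse new job submissions on this schedd; jobs already in the
# queue are left alone and can still be started.
MAX_JOBS_SUBMITTED = 0
```

Running condor_reconfig on the submit machine afterwards should make the schedd reread its configuration. I have not checked whether this particular setting takes effect on reconfig or needs a schedd restart.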

We'll try this whenever we get around to testing Condor 7.1.

FYI: MAX_JOBS_SUBMITTED has been lurking in obscurity for a long time. It should work in whatever version of Condor you are using. The general statement in the How-to that the advice is known to work in Condor 7.0 is just the standard thing I put in all the new How-to docs because I was in too much of a hurry to investigate how far back each feature goes.

--Dan