
Re: [Condor-users] upgrading schedd OS with non empty queue

Someone more knowledgeable should still respond...

Warning: putting a job on hold /will/ stop it from running, i.e. kill it on the execute machine.

Because of Condor's job leases, if you upgrade the machine fast enough you could probably just kill the Schedd, and when it is restarted it will reconnect to the jobs. I don't know how long job leases default to, though, or whether they are something you can reconfigure dynamically, or whether there is a way to tell the Schedd to "renew all leases now, I'm going to shut you down."
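
To make the timing question concrete, here is a minimal sketch of the arithmetic: the Schedd's downtime has to be shorter than the time left on the job lease. The lease length below is a placeholder, not Condor's default (which, as said, I don't know), and the function name fits_in_lease is made up for illustration. As far as I know the per-job value is stored in the JobLeaseDuration job attribute, so something like condor_q -format "%d\n" JobLeaseDuration should show what your jobs actually carry.

```shell
#!/bin/sh
# Sketch only: compare expected Schedd downtime against a job lease.
# Both arguments are in seconds; the values used below are illustrative.

# fits_in_lease DOWNTIME LEASE
# Prints "yes" if the downtime is strictly shorter than the lease, else "no".
fits_in_lease() {
    downtime=$1
    lease=$2
    if [ "$downtime" -lt "$lease" ]; then
        echo yes
    else
        echo no
    fi
}

# Hypothetical lease of 40 minutes; substitute the value your jobs report.
LEASE=2400

fits_in_lease $((20 * 60)) "$LEASE"   # the 20-minute install
fits_in_lease $((60 * 60)) "$LEASE"   # the one-hour worst case
```

With that placeholder lease, the 20-minute install would fit and the one-hour worst case would not, so the real lease value decides whether this approach is safe.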

Luckily, Condor also has job capability strings along with the job leases, so you very likely could kill the Schedd, move its spool directory to a temp machine, and start a Schedd on the temp machine to manage your jobs during the upgrade.



Maxim Kovgan wrote:

I wonder about this:
I have set up a really fast netboot-based OS installation service built on the FAI package. It takes about 20 minutes initially, including all the partitioning and formatting.

So, allowing for some mess after the upgrade, it could take up to an hour.

I have a schedd whose OS I want to upgrade,
but I am ... afraid to do this with a non-empty queue.

The queue is running, and each job can run for SEVERAL hours.
The jobs are in the vanilla universe.

So, killing the whole queue would not be fun.
Much more fun would be the following:

   * condor_hold -a
   * properly backup /var/condor
   * upgrade the OS
   * restore /var/condor
   * condor_release -a
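
The backup and restore steps in the list above could be sketched roughly as follows. This is only an illustration under assumptions: the helper names backup_dir and restore_dir are made up, the archive path is a placeholder, and the archive has to live somewhere the reinstall will not partition or format. The Condor daemons should be stopped before copying so the queue files are not changing underneath tar. As the reply above warns, holding the jobs first stops them on the execute machines, so this preserves the queue rather than the running work.

```shell
#!/bin/sh
# Sketch only: archive a directory tree and restore it later.
# Function names and paths are placeholders, not part of Condor.

# backup_dir SRC ARCHIVE
# Archive SRC (e.g. /var/condor) into the gzip-compressed tarball ARCHIVE.
backup_dir() {
    src=$1
    archive=$2
    # -C switches to the parent directory so the archive stores a relative path.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")"
}

# restore_dir ARCHIVE PARENT
# Unpack ARCHIVE under PARENT (e.g. /var), keeping the stored permissions.
restore_dir() {
    archive=$1
    parent=$2
    tar -xzpf "$archive" -C "$parent"
}

# Intended use around the OS upgrade (run as root, daemons stopped):
#   backup_dir /var/condor /some/safe/place/condor-backup.tar.gz
#   ... upgrade the OS ...
#   restore_dir /some/safe/place/condor-backup.tar.gz /var
```

After restoring, it is worth checking file ownership, since the condor user may not have the same UID on the freshly installed system.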

Would this work?
Has there been any testing of this?

Thanks in advance for the responses.
