[HTCondor-users] how to drain offline nodes ?

Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab

Date: Tue, 10 May 2016 11:59:20 +0000

From: SCHAER Frederic <frederic.schaer@xxxxxx>

Subject: [HTCondor-users] how to drain offline nodes ?

Hi,

Let’s say we’ve had a few nodes offline for a substantial amount of time.

We’d like to restart them now. But before they start processing jobs, we’d like to make sure the x509 CRLs are updated (there’s a cron job that runs every 6 hours, but it does not run at boot), and to update the system/kernel and reboot the nodes on the new kernels.

Last time I tried to drain a node using condor_drain, I got an error telling me that the node was offline (or unreachable, or something like that).

Question: what’s the correct way to handle this situation?

I was told to put START=false in the startd configs, but that doesn’t work for me: it requires starting up the nodes to change the configs, so the nodes will likely pick up and fail a few jobs before I manage to update all the configs.
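
For reference, the START=false suggestion would look roughly like the fragment below in a config file read by the startd. This is only a sketch: the file name is hypothetical, and any file in the node's local config directory would do.

```
# /etc/condor/config.d/99-no-new-jobs.conf   (hypothetical file name)
# Make the startd refuse all new job matches until this line is removed.
START = False
```

An already running startd would only pick this up after a condor_reconfig (or a restart), which is exactly the window in which the node can still accept jobs.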

Any ideas (other than “reinstall” ;) )?

Thanks && regards
