
[HTCondor-users] how to drain offline nodes ?



Hi,

 

Let’s say we’ve had a few nodes offline for a substantial amount of time.

We’d like to restart them now. But before they start processing jobs, we’d like to make sure the x509 CRLs are updated (there’s a cron job that runs every 6 hours, but it does not run at boot), and to update the system/kernel and reboot the nodes on those new kernels…

Last time I tried to drain a node using condor_drain, I got an error telling me… the node was offline (or unreachable, or something like that).

 

Question: what’s the correct way to handle this situation?

I was told to put START=false in the startd configs… but that doesn’t work for me, as it requires starting up the nodes to change the configs, so the nodes will likely pick up and fail a few jobs before I manage to update all the configs…
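
For reference, the suggestion I was given would look roughly like the fragment below. This is only a sketch: the file name and location are hypothetical and depend on how the local config directory is set up on the nodes.

```
# Hypothetical file: /etc/condor/config.d/99-no-start.conf
# While this is in place, the startd refuses to start any new job.
START = False
```

Once the CRLs and the kernel are up to date, the file would be removed and condor_reconfig run on the node so that it starts accepting jobs again.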

 

Any ideas (other than “reinstall” ;) )?

 

Thanks && regards