Re: [HTCondor-users] how to drain offline nodes ?


Date: Sun, 15 May 2016 13:26:38 -0500

From: Brian Bockelman <bbockelm@xxxxxxxxxxx>

Subject: Re: [HTCondor-users] how to drain offline nodes ?

Hi Frederic,

Locally, we have the START expression reference an attribute that is calculated based on the outcome of periodic startd cron tasks. This way, if the health check hasn't run, the attribute is missing - and hence we can keep the node idle.
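
For illustration, a minimal sketch of what such a setup might look like in the startd configuration. This is not the actual site configuration described here: the job name HEALTH, the script path, and the attribute name NodeIsHealthy are all hypothetical, and the period is an arbitrary choice.

```
# Hypothetical startd cron job named HEALTH; the script path is an assumption.
STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST) HEALTH
STARTD_CRON_HEALTH_EXECUTABLE = /usr/local/libexec/node_health_check.sh
STARTD_CRON_HEALTH_PERIOD = 5m
STARTD_CRON_HEALTH_MODE = Periodic

# The script is assumed to print a ClassAd attribute on stdout, for example:
#   NodeIsHealthy = True
# Until it has run successfully, NodeIsHealthy is absent from the machine ad.

# "=?=" never evaluates to UNDEFINED: a missing attribute compares as False,
# so a freshly booted node accepts no jobs until the check has passed.
# Combine this with any existing START policy as appropriate for the site.
START = (NodeIsHealthy =?= True)
```

The point of using `=?=` rather than a bare reference is that the node is held idle by default: nothing has to be changed on the node after boot to keep it from taking jobs.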

It's a good way to wait for things like CRLs, a puppet run, or successful mount of the SE.

Would this help in your case?

Brian


On May 10, 2016, at 6:59 AM, SCHAER Frederic <frederic.schaer@xxxxxx> wrote:

Hi,

Let's say we've had a few nodes offline for a substantial amount of time.

We'd like to restart them now... But before they start processing jobs, we'd like to make sure x509 CRLs are updated (there's a 6H cron, but that's not an @boot cron), and to update the system/kernel and reboot the nodes on those new kernels...

Last time I tried to drain a node using condor_drain, I got an error telling me... the node was offline (or unreachable, or something like that).

Question : what's the correct way to handle this situation ?

I was told to put a START=false in the startd configs... but that's not the correct way for me as it requires starting up the nodes to change the configs, hence the nodes will likely eat and fail a few jobs before I manage to update all configs...

Any ideas (other than : "reinstall" ;) ) ?

Thanks && regards

