
Re: [HTCondor-users] How to set a worker node offline in HTCondor



If

condor_drain -help

does not show a -reason argument, that would explain it. It would also mean I'm wrong about which version that option was released in.

I have 8.9.13, and this is what I see:

Usage: condor_drain [OPTIONS] machine

OPTIONS:
-cancel           Stop draining.
-graceful         (the default) Honor MaxVacateTime and MaxJobRetirementTime.
-quick            Honor MaxVacateTime but not MaxJobRetirementTime.
-fast             Honor neither MaxVacateTime nor MaxJobRetirementTime.
-reason <text>    While draining, advertise <text> as the reason.
-resume-on-completion    When done draining, resume normal operation.
-exit-on-completion      When done draining, STARTD should exit and not restart.
-restart-on-completion   When done draining, STARTD should restart.
-request-id <id>  Specific request id to cancel (optional).
-check <expr>     Must be true for all slots to be drained or request is aborted.
-start <expr>     Change START expression to this while draining.
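
Going by the usage text above, a manual drain that advertises its own reason could be assembled roughly as follows. This is only a sketch: the helper name build_drain_command and the host name node01.example.org are made up for illustration, and the only flags used are ones listed in the 8.9.13 help output. The command is built and printed, not executed.

```python
import shlex

# Drain modes listed in the condor_drain usage output above.
DRAIN_MODES = ("graceful", "quick", "fast")


def build_drain_command(machine, reason=None, mode="graceful"):
    """Return the argv list for a condor_drain call (hypothetical helper)."""
    if mode not in DRAIN_MODES:
        raise ValueError(f"unknown drain mode: {mode}")
    argv = ["condor_drain", f"-{mode}"]
    if reason is not None:
        # -reason <text>: advertised as the reason while draining.
        argv += ["-reason", reason]
    argv.append(machine)
    return argv


# Example: a graceful drain tagged with a free-text reason.
cmd = build_drain_command("node01.example.org", reason="maintenance")
print(shlex.join(cmd))
```

Building the argument list separately keeps quoting of a multi-word reason out of the shell; the list can be handed to subprocess.run unchanged if that is wanted.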

-tj


From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Anderson, Stuart B. <sba@xxxxxxxxxxx>
Sent: Thursday, April 8, 2021 1:01 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] How to set a worker node offline in HTCondor
 

> On Apr 8, 2021, at 10:47 AM, John M Knoeller <johnkn@xxxxxxxxxxx> wrote:
>
> In general the mechanism that we use to avoid cancelling a drain that was not started by defrag is to look for the DrainReason attribute of the p-slot.  
>
> Draining can be cancelled by the Defrag daemon if there is no DrainReason, or if the DrainReason is "defrag".
>
> There should always be a DrainReason attribute if draining was started by an 8.9.11 or later condor_drain command, or by an 8.9.11 or later DEFRAG daemon.
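
The cancellation rule quoted above can be written down as a small predicate. This is a sketch of the rule as stated in the message, not the Defrag daemon's actual code: the function name defrag_may_cancel is hypothetical, the p-slot ad is modelled as a plain dict, and the exact, case-sensitive comparison against "defrag" is an assumption.

```python
def defrag_may_cancel(slot_ad):
    """Return True if, per the rule above, Defrag may cancel this drain.

    slot_ad is a plain dict standing in for the p-slot ClassAd
    (an assumption made for illustration).
    """
    reason = slot_ad.get("DrainReason")
    # Defrag may cancel when no DrainReason is advertised,
    # or when the advertised reason is "defrag".
    return reason is None or reason == "defrag"


# A drain with no advertised reason is treated like one Defrag started.
print(defrag_may_cancel({}))
# A drain started with an explicit, different reason is left alone.
print(defrag_may_cancel({"DrainReason": "maintenance"}))
```

Under this rule, a drain that advertises no DrainReason at all is indistinguishable from one that Defrag started itself, which is consistent with the behavior reported below.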

OK, then there appears to be a bug in 8.9.11 (or I need to enable another HTCondor setting). In particular, I ran "condor_drain machine-name" with version 8.9.11, and DEFRAG restarted jobs on the machine after it had drained.

Note, I don't see a condor_drain option to specify DrainReason. If you agree the above is a bug then once it is fixed how should I specify DrainReason to indicate that a manual drain should be canceled by the Defrag daemon when it is done draining?

Thanks.

--
Stuart Anderson
sba@xxxxxxxxxxx



