
Re: [HTCondor-users] peaceful node drain and shutdown



I just confirmed: START=FALSE did put the node into Owner state, and the jobs died. I'm not sure why it seemed to work the first time.

Setting START to UNDEFINED instead did the right thing.
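
For reference, the config-file approach from earlier in the thread, with UNDEFINED swapped in for FALSE, might be wrapped up roughly as below. This is only a sketch: the function names and the CONFIG_DIR override are made up for illustration, and the file name 00shutdown is simply the one used in the original example.

```shell
#!/bin/sh
# Sketch only: helper names and CONFIG_DIR are illustrative, not part of HTCondor.

# Path of the drop-in file; defaults to the location used earlier in the thread.
drain_file() {
    echo "${CONFIG_DIR:-/etc/condor/config.d}/00shutdown"
}

# Stop new jobs from starting. UNDEFINED (not FALSE) is what left running
# jobs alone in the test reported above.
stop_new_jobs() {
    echo 'START = UNDEFINED' > "$(drain_file)"
}

# Undo the drain by removing the drop-in file.
resume_new_jobs() {
    rm -f "$(drain_file)"
}

# After either call, send SIGHUP to condor_master so the change is picked up.
```

As in the earlier message, the master still has to be signalled (kill -HUP on its PID) after each step.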

Thanks,
Kevin

From: HTCondor-users [htcondor-users-bounces@xxxxxxxxxxx] on behalf of Tom Downes [downes@xxxxxxx]
Sent: Wednesday, July 13, 2016 1:58 PM
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] peaceful node drain and shutdown

I have my START expression evaluate to FALSE when certain filesystems become unmounted/unmountable (using STARTD_CRON_*). Empirically the cluster keeps on rolling; it just doesn't start new jobs. That is the behavior I expect and want.
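
A startd cron job publishes attributes by printing "Name = Value" lines on stdout, so a probe along these lines could feed such a START expression. This is a rough sketch, not the script actually used here: the attribute name FilesystemsOK, the MOUNT_PROBE override and the example paths are all assumptions.

```shell
#!/bin/sh
# Sketch of a startd cron probe. MOUNT_PROBE and the attribute name
# FilesystemsOK are illustrative assumptions.

# Print one ClassAd attribute line saying whether every given path passes the probe.
report_mounts() {
    ok=True
    for path in "$@"; do
        # MOUNT_PROBE defaults to util-linux `mountpoint -q`.
        ${MOUNT_PROBE:-mountpoint -q} "$path" >/dev/null 2>&1 || ok=False
    done
    echo "FilesystemsOK = $ok"
}

# Example invocation (paths are placeholders):
#   report_mounts /home /data
```

The START expression would then reference the published attribute; whether it should fall to FALSE or UNDEFINED when the check fails is exactly what the rest of this thread is about.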

I use partitionable slots throughout.

--
Tom Downes
Senior Scientist and Data Center Manager
Center for Gravitation, Cosmology and Astrophysics
University of Wisconsin-Milwaukee
414.229.2678

On Wed, Jul 13, 2016 at 3:27 PM, Brian Bockelman <bbockelm@xxxxxxxxxxx> wrote:
You sure about this?

I also recall the same behavior that Bob describes - if START goes to FALSE instead of UNDEFINED, then the node transitions to Owner state, which then kills off running jobs.

(Again, might have changed at some point)
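
Todd's point in the message quoted below is that running jobs are only affected if $(START) is referenced in PREEMPT. One rough way to look for that in a node's config files is sketched here; the function name is made up, and a literal grep like this will miss references that go through intermediate macros.

```shell
#!/bin/sh
# Sketch only: the function name is illustrative.

# Print (with line numbers) any PREEMPT definition that references $(START)
# in the given config files. Exit status is non-zero when nothing matches.
preempt_references_start() {
    grep -E -i -n '^[[:space:]]*PREEMPT[[:space:]]*=.*\$\(START\)' "$@"
}

# Example invocation (path is a placeholder):
#   preempt_references_start /etc/condor/condor_config
```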

Brian

> On Jul 13, 2016, at 3:09 PM, Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:
>
> On 7/13/2016 3:03 PM, Bob Ball wrote:
>> Maybe this info is now obsolete, but I remember once setting START
>> to an expression that evaluated to FALSE, and that caused all the running jobs
>> to terminate....
>>
>> bob
>>
>
> Only if $(START) is referenced in the PREEMPT expression....
>
> START just controls when new jobs can be launched.
>
> PREEMPT controls when to kick off jobs (it really would have been more accurate to name it "Evict" instead of "Preempt", sigh...).
>
> regards
> Todd
>
>
>> On 7/13/2016 3:56 PM, Fox, Kevin M wrote:
>>> I'm guessing the condor_drain command will have similar issues to the
>>> condor_off -peaceful command, in that you have to have all the
>>> permissions set up correctly?
>>>
>>> The nice thing about the START=FALSE config trick is you only need
>>> root on the machine to do it.
>>>
>>> Thanks,
>>> Kevin
>>> ________________________________________
>>> From: HTCondor-users [htcondor-users-bounces@xxxxxxxxxxx] on behalf of
>>> Todd Tannenbaum [tannenba@xxxxxxxxxxx]
>>> Sent: Wednesday, July 13, 2016 12:46 PM
>>> To: HTCondor-Users Mail List
>>> Subject: Re: [HTCondor-users] peaceful node drain and shutdown
>>>
>>> On 7/13/2016 2:29 PM, Fox, Kevin M wrote:
>>>> Ah. I had seen the docs for START but didn't realize it would affect new
>>>> job startup too. It seemed to imply that it's for eviction.
>>>>
>>>> But, the following seems to work to drain the node gracefully, as you
>>>> suggested:
>>>> echo START=FALSE > /etc/condor/config.d/00shutdown
>>>> kill -HUP <PID OF MASTER>
>>>>
>>>> and to reverse it
>>>> rm -f /etc/condor/config.d/00shutdown
>>>> kill -HUP <PID OF MASTER>
>>>>
>>>> Thanks for the help. :)
>>>>
>>> Hi Kevin,
>>>
>>> If the above satisfies your needs, great.  But just wanted to point out
>>> you can do the same thing (drain a node gracefully) with the
>>> condor_drain tool.  Do "man condor_drain", or see
>>>   http://htcondor.org/manual/v8.4/condor_drain.html
>>>
>>> Also in the upcoming HTCondor v8.5.6, the condor_drain functionality is
>>> exposed via HTCondor's Python API. :)
>>>
>>> regards,
>>> Todd
>>>
>>>
>>> _______________________________________________
>>> HTCondor-users mailing list
>>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx
>>> with a
>>> subject: Unsubscribe
>>> You can also unsubscribe by visiting
>>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>>
>>> The archives can be found at:
>>> https://lists.cs.wisc.edu/archive/htcondor-users/
>>>
>>
>
>
> --
> Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
> Center for High Throughput Computing   Department of Computer Sciences
> HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
> Phone: (608) 263-7132                  Madison, WI 53706-1685