
Re: [HTCondor-users] Draining in htcondor



----- Original Message -----
> From: "Todd L Miller" <tlmiller@xxxxxxxxxxx>
> To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
> Sent: Tuesday, 8 June, 2021 16:11:54
> Subject: Re: [HTCondor-users] Draining in htcondor

>> I was a bit surprised that this change seems to be only internal -
>> condor_config_val will return the original one (on a 8.9.11 startd).
> 
> 	condor_config_val usually only queries the on-disk configuration,
> not the in-memory configuration.  Did you try
> 
> condor_config_val -startd -name <name of startd>
> 
> ?

Hm, I was not sure whether I had and was kind of hoping I hadn't - but I retried it just now, same effect.

Whether I run condor_drain with -start or without it, condor_config_val still returns the original config for START.

>> But on static slots the value is apparently not reset, at least not on
>> canceling the drain.
> 
> 	Sorry, which value?

The AcceptedWhileDraining attribute - well, more below...

>> A restart of the startd did it and accepting jobs on the slot when not
>> draining might do it as well - regardless, this feels like an oversight
>> or is there a reason for this?
> 
> 	You don't need to drain static slots, so I'm a little surprised
> the drain command did anything at all, honestly.  Is the startd configured
> with both dynamic and static slots?

Yes, we typically have a static slot1 reserved for Jupyter jobs and a partitionable slot2 for standard jobs.

When I start the drain with a start expression allowing for jobs during drain, the first extra one always goes to slot1, setting AcceptedWhileDraining to true.

This was also surprising, since my test jobs normally go to slot2; I haven't yet figured out why drain mode is different here.

Once the drain is complete and canceled, AcceptedWhileDraining remains true on the static slot1.

Oh, and one more thing, since I just saw it when testing condor_config_val: the startd was completely empty when I started the drain.

Nonetheless slot1 stayed in Drained/Retiring for almost 3 minutes - I was wondering what's taking so long here:

06/09/21 07:13:12 slot1: State change: entering Drained state
06/09/21 07:13:12 slot1: Changing state and activity: Unclaimed/Idle -> Drained/Retiring
06/09/21 07:15:47 slot1: State change: draining is complete.
06/09/21 07:15:47 slot1: Changing activity: Retiring -> Idle
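
To put a number on "almost 3 minutes", here is a minimal Python sketch that measures the time between the two activity changes in the excerpt above. It assumes the "MM/DD/YY HH:MM:SS" timestamp prefix and the exact message wording shown in those four lines; the helper names (parse_timestamp, retiring_seconds) are made up for illustration and are not part of HTCondor.

```python
from datetime import datetime

# The four log lines quoted above, copied verbatim.
LOG = """\
06/09/21 07:13:12 slot1: State change: entering Drained state
06/09/21 07:13:12 slot1: Changing state and activity: Unclaimed/Idle -> Drained/Retiring
06/09/21 07:15:47 slot1: State change: draining is complete.
06/09/21 07:15:47 slot1: Changing activity: Retiring -> Idle
"""


def parse_timestamp(line):
    # Assumption: every line starts with a 17-character "MM/DD/YY HH:MM:SS" stamp.
    return datetime.strptime(line[:17], "%m/%d/%y %H:%M:%S")


def retiring_seconds(log_text, slot="slot1"):
    """Seconds between entering Drained/Retiring and returning to Idle, or None."""
    start = end = None
    for line in log_text.splitlines():
        if f" {slot}: " not in line:
            continue  # message belongs to a different slot
        if "-> Drained/Retiring" in line:
            start = parse_timestamp(line)
        elif "Retiring -> Idle" in line and start is not None:
            end = parse_timestamp(line)
            break
    if start is None or end is None:
        return None
    return (end - start).total_seconds()


print(retiring_seconds(LOG))  # 155.0, i.e. 2 min 35 s
```

The same function could be pointed at a longer log excerpt to compare the retiring time across several drains, provided the messages keep this wording.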

I had seen that before on slot1, but I attributed it to AcceptedWhileDraining, which was true then - just now it was false, though...

BTW, I realize those "issues" are not big deals, even if they turn out to be unintended - for me this is more about understanding how it works exactly.

Best
  Kruno

> 
> - ToddM

-- 
------------------------------------------------------------------------
Krunoslav Sever            Deutsches Elektronen-Synchrotron (IT-Systems)
                        Ein Forschungszentrum der Helmholtz-Gemeinschaft
                                                            Notkestr. 85
phone:  +49-40-8998-1648                                   22607 Hamburg
e-mail: krunoslav.sever@xxxxxxx                                  Germany
------------------------------------------------------------------------