
Re: [HTCondor-users] DENY_WRITE and exclude execute node temporarily



Hello Zach,

First, thanks for your reply.

The reason I want to do this at the collector level is:

I could have an execute node with a hardware problem, and during the maintenance period I may reboot it several times for whatever reason. condor_off is not enough, because after a reboot condor_master automatically starts condor_startd again, so the node reappears in the pool and can take jobs. That is why I prefer to exclude it temporarily from the central manager.

I just noticed that my config file was purged by Puppet, so I need to redo my test to be sure. I will keep you informed.

Thanks,

Xiaomei





----- Original Message -----
From: "Zach Miller" <zmiller@xxxxxxxxxxx>
To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
Sent: Thursday, 22 August, 2019 16:33:07
Subject: Re: [HTCondor-users] DENY_WRITE and exclude execute node temporarily

Hi Xiaomei,

I agree that adding the machine to DENY_WRITE seems like it should exclude the host. I suspect there is an interaction between the 'WRITE' and 'ADVERTISE_STARTD' access levels, and I will look into that with some local testing.
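
A minimal, untested sketch of that idea: deny the host at the ADVERTISE_STARTD level directly instead of relying on WRITE. This assumes the explicit COLLECTOR.ALLOW_ADVERTISE_STARTD entry is what lets tbcondor05 through, and that no DENY_ADVERTISE_STARTD value is already set. The file name is hypothetical. The fragment only takes effect if it sits in a directory the pool's collector actually reads, and condor_config_val -config on the central manager lists the files in use:

    # Central manager: /etc/condor/config.d/99_exclude.config (hypothetical name)
    # Assumption (untested): an explicit deny at the ADVERTISE_STARTD level is
    # honored for startd ads even though the host is also listed in
    # COLLECTOR.ALLOW_ADVERTISE_STARTD.
    COLLECTOR.DENY_ADVERTISE_STARTD = tbcondor05.in2p3.fr

After editing, run condor_reconfig on the central manager. An ad the collector had already accepted may stay visible in condor_status until it expires (see CLASSAD_LIFETIME).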

At a higher level, there are other ways to take a machine out of the pool for maintenance. You could change the START expression on the machine to False, and it will no longer run NEW jobs (the current job would finish, I believe). However, the node will still be visible in condor_status. If you want to avoid that, you could simply run condor_off on the execute node. The master would still report to the collector, but the startd would not, and you can simply run 'condor_on' again to make the machine reappear in the pool. Note, however, that this policy is implemented at the execute node, whereas your solution was implemented at the collector. I'm not sure whether that was a requirement you had or just an artifact of your implementation.
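
As a hedged illustration of the execute-node approach (not tested here; the file name is hypothetical), START can be set in a local config file on the node itself. The daemons read this file every time they start, so the setting should survive the reboots described above, provided configuration management such as Puppet does not remove it:

    # On tbcondor05: /etc/condor/config.d/99_maintenance.config (hypothetical name)
    # Do not accept NEW jobs; per the reply above, running jobs should finish.
    START = False

    # Apply without restarting:      condor_reconfig
    # Hide the startd from the pool: condor_off -daemon startd
    # Bring it back afterwards:      condor_on -daemon startd
    #                                (and remove START = False, then condor_reconfig)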


Cheers,
-zach


On 8/22/19, 5:12 AM, "HTCondor-users on behalf of Xiaomei NIU" <htcondor-users-bounces@xxxxxxxxxxx on behalf of xiaomei.niu@xxxxxxxxxxx> wrote:

    Hello Everyone,
    
    I am testing how to exclude some execute nodes from the HTCondor pool temporarily under HTCondor 8.9, for maintenance on those nodes.
    From the documentation and the FAQ, I chose to test with DENY_WRITE on the central manager, which runs the NEGOTIATOR, COLLECTOR, ...
    
    Here is my setting:
    
    cat /etc/condor-ce/config.d/99_exclude.config
    DENY_WRITE = $(DENY_WRITE), tbcondor05.in2p3.fr
    
    Then I ran condor_reconfig -full on this machine.
    
    But one day after the change, this machine still appears when I run condor_status tbcondor05.
    
    I also tried with
    DENY_WRITE = $(DENY_WRITE), tbcondor05.in2p3.fr, condor_pool@$(UID_DOMAIN)/tbcondor05.in2p3.fr, root@$(UID_DOMAIN)/tbcondor05.in2p3.fr
    
    Same results.
    
    I didn't try HOSTDENY_WRITE; I think DENY_WRITE is the higher-level setting?
    This machine is allowed under ALLOW_WRITE, COLLECTOR.ALLOW_ADVERTISE_MASTER and COLLECTOR.ALLOW_ADVERTISE_STARTD,
    but I suppose DENY_WRITE has the higher priority?
    
    
    Another question: when the node is excluded, what happens to the jobs that were running before the change? Will they finish properly?
    
    Any help is welcome
    
    
    Thanks in advance,
    
    
    
    Xiaomei
    
    Centre de Calcul IN2P3/CNRS
    France
    _______________________________________________
    HTCondor-users mailing list
    To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
    subject: Unsubscribe
    You can also unsubscribe by visiting
    https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
    
    The archives can be found at:
    https://lists.cs.wisc.edu/archive/htcondor-users/
    