
Re: [HTCondor-users] DENY_WRITE and exclude execute node temporarily



Hi Xiaomei,

I agree that adding the machine to DENY_WRITE seems like it should exclude the host.  I suspect there is an interaction between the 'WRITE' and 'ADVERTISE_STARTD' access levels, and I will look into that with some local testing.
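An untested sketch of what I would try in the meantime. It assumes the collector checks the ADVERTISE_STARTD level on its own (where your COLLECTOR.ALLOW_ADVERTISE_STARTD entry lets the host in) rather than consulting DENY_WRITE. The idea is to deny the host at that same level on the central manager, reconfig, and then confirm which file the value is actually read from. The file name below is a hypothetical placeholder; use whichever config directory your collector really reads.

    # hypothetical /etc/condor/config.d/99_exclude.config on the central manager
    # deny startd ads from this host at the same level that currently allows them
    DENY_ADVERTISE_STARTD = tbcondor05.in2p3.fr

Then, on the central manager:

    condor_reconfig
    # -verbose also prints the file and line where the value was defined
    condor_config_val -verbose DENY_ADVERTISE_STARTD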

At a higher level, there are other ways to take a machine out of the pool for maintenance.  You could change the START expression on the machine to False, and it will no longer start NEW jobs (the current job would finish, I believe).  However, the node will still be visible in condor_status.  If you want to avoid that, you could simply run condor_off on the execute node: the master would then still report to the collector, but the startd would not.  Note that this is a policy implemented at the execute node, whereas your solution was implemented at the collector.  I'm not sure whether that was a requirement you had or just an artifact of your implementation.  When the maintenance is done, you can simply run 'condor_on' again and the machine will reappear in the pool.
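For reference, a sketch of both approaches, run on the execute node itself. The config file name is a hypothetical placeholder. The -peaceful option is my suggestion (not required by the approach above) for letting jobs that are already running finish before the daemons exit, which may also answer your question about running jobs.

    # hypothetical /etc/condor/config.d/99_maintenance.config on tbcondor05
    # stop matching NEW jobs; the node still appears in condor_status
    START = False

followed by condor_reconfig on that node.  Or, to drop the startd out of condor_status while the master keeps running:

    # before maintenance: stop all daemons except the master,
    # waiting for running jobs to finish
    condor_off -peaceful
    # after maintenance: start them again so the node reappears
    condor_on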


Cheers,
-zach


On 8/22/19, 5:12 AM, "HTCondor-users on behalf of Xiaomei NIU" <htcondor-users-bounces@xxxxxxxxxxx on behalf of xiaomei.niu@xxxxxxxxxxx> wrote:

    Hello Everyone,
    
    I am testing how to temporarily exclude some execute nodes from the condor pool under condor 8.9, for maintenance on those nodes.
    From the docs and the FAQ, I chose to test DENY_WRITE on the central manager, where the NEGOTIATOR, COLLECTOR... run.
    
    Here is my setting:
    
    cat /etc/condor-ce/config.d/99_exclude.config
    DENY_WRITE = $(DENY_WRITE), tbcondor05.in2p3.fr
    
    Then I ran condor_reconfig -full on this machine.
    
    But one day after the change, this machine still shows up as available when I run condor_status tbcondor05.
    
    I also tried with
    DENY_WRITE = $(DENY_WRITE), tbcondor05.in2p3.fr, condor_pool@$(UID_DOMAIN)/tbcondor05.in2p3.fr, root@$(UID_DOMAIN)/tbcondor05.in2p3.fr
    
    Same results.
    
    I didn't try HOSTDENY_WRITE; I think DENY_WRITE is the higher level?
    This machine is allowed under ALLOW_WRITE, COLLECTOR.ALLOW_ADVERTISE_MASTER and COLLECTOR.ALLOW_ADVERTISE_STARTD,
    but I suppose DENY_WRITE has the higher priority?
    
    
    Another question: when the node is excluded, what happens to the jobs that were running before the change? Will they finish properly?
    
    Any help is welcome
    
    
    Thanks in advance,
    
    
    
    Xiaomei
    
    Centre de Calcul IN2P3/CNRS
    France
    _______________________________________________
    HTCondor-users mailing list
    To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
    subject: Unsubscribe
    You can also unsubscribe by visiting
    https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
    
    The archives can be found at:
    https://lists.cs.wisc.edu/archive/htcondor-users/