
Re: [HTCondor-users] Authentication error after upgrade to 9.0.16



On 12/12/2022 7:06 AM, David Cohen wrote:
Hi,
After upgrading from 8.8 to 9.0.16 I can't drain nodes anymore.


Hi David,

A few (hopefully helpful!) thoughts on the below -

1. condor_drain is an administrative command, which requires ADMINISTRATOR authorization level at the startd.  The settings you show below are for the DAEMON authorization level, not the ADMINISTRATOR level.

2. condor_reconfig is also an administrative command, so to debug ADMINISTRATOR authorization without any real side effects (like starting to drain a node!), you could run (in bash):
   $ _condor_TOOL_DEBUG=D_SECURITY  condor_reconfig -debug tech-wn001
or to get even more debugging:
   $ _condor_TOOL_DEBUG=D_ALL  condor_reconfig -debug tech-wn001

3. The command "condor_config_val -summary" is nice because it displays only the config knobs that have been customized (i.e. knobs left at their default values are not displayed).  Maybe sending along the output of 'condor_config_val -summary' from your worker node would allow us to help better?

4. Just for the benefit of others on the list who may be reading this: IMHO, a good way to upgrade from v8.8.x to v9.x+ is to start off with a known, fully functional new installation, then move over your old config settings in a controlled manner.  Essentially, this means running 'condor_config_val -summary' on your old pool to capture your old custom settings.  Then use get.htcondor.org to set up your new pool with a default configuration; get.htcondor.org sets up everything securely with IDTOKENS by default ( see https://htcondor.readthedocs.io/en/latest/getting-htcondor/admin-quick-start.html ).  Finally, review your old custom settings and drop them into /etc/condor/config.d on your new installation (probably leaving out your old security config).
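
As a rough sketch of the last step in point 4, a saved summary from the old pool could be filtered so that the old security knobs are left behind before review. This assumes the summary was saved as plain `NAME = value` lines; the function name is hypothetical, and the list of prefixes treated as security-related (SEC_, ALLOW_, DENY_, GSI_) is an assumption, not an official HTCondor list.

```shell
# Hypothetical helper: read a saved 'condor_config_val -summary' on stdin
# and print everything except knobs that look security-related.
# The prefix list is an assumption, not an official HTCondor list.
strip_security_knobs() {
    grep -Ev '^[[:space:]]*(SEC_|ALLOW_|DENY_|GSI_)'
}

# Possible use (file names are placeholders):
#   strip_security_knobs < old-pool-summary.txt > old-settings-to-review.txt
```

The output is meant to be reviewed by hand before anything is copied into /etc/condor/config.d.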

regards,
Todd


(Adding LEGACY_ALLOW_SEMANTICS = TRUE doesn't solve the problem. )

Trying to drain a node using:
cm ~]# condor_drain -graceful tech-wn001          
Attempt to send DRAIN_JOBS to startd <192.114.101.1:9618?addrs=192.114.101.1-9618&alias=tech-wn001.hep.technion.ac.il&noUDP&sock=startd_2694_703a> failed
Failed to start DRAIN_JOBS command to slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx

The worker node seems to look only for GSI
2/04/22 10:34:41 DC_AUTHENTICATE: required authentication of CM_IP failed: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using GSI|GSI:5003:Failed to authenticate.  Globus is reporting error (851968:254).  There is probably a problem with your credentials.  (Did you run grid-proxy-init?)|AUTHENTICATE:1004:Failed to authenticate using KERBEROS|AUTHENTICATE:1004:Failed to authenticate using FS|FS:1004:Unable to lstat(/tmp/FS_XXXZpyt30)
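
A small sketch for reading an error like the one above: it pulls out the authentication methods the daemon reports having tried. The function name is made up for illustration, and the parsing assumes the "Failed to authenticate using <METHOD>" wording shown in this log line.

```shell
# Hypothetical helper: list the methods named in
# "Failed to authenticate using <METHOD>" fragments read from stdin.
methods_tried() {
    grep -o 'Failed to authenticate using [A-Z_][A-Z_]*' | awk '{ print $NF }'
}

# Possible use (log path is a placeholder):
#   grep DC_AUTHENTICATE /path/to/StartLog | methods_tried
```

For the log line above this lists GSI, KERBEROS and FS. PASSWORD is not among them, which would be consistent with the SEC_DAEMON_* settings not being the ones consulted for this command.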

Looking at the DAEMON knobs at both the CM and the startd:
cm ~]# sudo grep -R DAEMON /etc/condor/*
/etc/condor/config.d/50-security:SEC_DAEMON_AUTHENTICATION = REQUIRED
/etc/condor/config.d/50-security:SEC_DAEMON_INTEGRITY = REQUIRED
/etc/condor/config.d/50-security:SEC_DAEMON_AUTHENTICATION_METHODS = PASSWORD
/etc/condor/config.d/50-security:ALLOW_DAEMON = condor_pool@*/*, condor@*/$(IP_ADDRESS)

cm ~]# condor_config_val -dump | grep DAEMON      
ALLOW_DAEMON = condor_pool@*/*, condor@*/$(IP_ADDRESS)
AUTO_INCLUDE_CREDD_IN_DAEMON_LIST = false
AUTO_INCLUDE_SHARED_PORT_IN_DAEMON_LIST = true
DAEMON_LIST = MASTER COLLECTOR NEGOTIATOR
DAEMON_SOCKET_DIR = auto
DC_DAEMON_LIST =  
GSI_DAEMON_CERT =  
GSI_DAEMON_DIRECTORY =  
GSI_DAEMON_KEY =  
GSI_DAEMON_NAME =  
GSI_DAEMON_PROXY =  
GSI_DAEMON_TRUSTED_CA_DIR =  
MASTER_DAEMON_AD_FILE =  
SCHEDD_DAEMON_AD_FILE = $(SPOOL)/.schedd_classad
SEC_DAEMON_AUTHENTICATION = REQUIRED
SEC_DAEMON_AUTHENTICATION_METHODS = PASSWORD
SEC_DAEMON_INTEGRITY = REQUIRED
SHARED_PORT_DAEMON_AD_FILE = $(LOCK)/shared_port_ad
START_DAEMONS =


wn001:~$ sudo grep -R DAEMON /etc/condor/*          
/etc/condor/config.d/50-security:SEC_DAEMON_AUTHENTICATION = REQUIRED
/etc/condor/config.d/50-security:SEC_DAEMON_INTEGRITY = REQUIRED
/etc/condor/config.d/50-security:SEC_DAEMON_AUTHENTICATION_METHODS = PASSWORD
/etc/condor/config.d/50-security:ALLOW_DAEMON = condor_pool@*/*, condor@*/$(IP_ADDRESS)

wn001:~$ condor_config_val -dump | grep DAEMON
ALLOW_DAEMON = condor_pool@*/*, condor@*/$(IP_ADDRESS)
AUTO_INCLUDE_CREDD_IN_DAEMON_LIST = false
AUTO_INCLUDE_SHARED_PORT_IN_DAEMON_LIST = true
DAEMON_LIST = MASTER, STARTD
DAEMON_SOCKET_DIR = auto
DC_DAEMON_LIST =  
GSI_DAEMON_CERT =  
GSI_DAEMON_DIRECTORY =  
GSI_DAEMON_KEY =  
GSI_DAEMON_NAME =  
GSI_DAEMON_PROXY =  
GSI_DAEMON_TRUSTED_CA_DIR =  
MASTER_DAEMON_AD_FILE =  
SCHEDD_DAEMON_AD_FILE = $(SPOOL)/.schedd_classad
SEC_DAEMON_AUTHENTICATION = REQUIRED
SEC_DAEMON_AUTHENTICATION_METHODS = PASSWORD
SEC_DAEMON_INTEGRITY = REQUIRED
SHARED_PORT_DAEMON_AD_FILE = $(LOCK)/shared_port_ad
START_DAEMONS =
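
The greps above only cover DAEMON-level knobs. A sketch of the same check for another authorization level, such as ADMINISTRATOR, follows. The function name is hypothetical, and it assumes the `NAME = value` layout of the `condor_config_val -dump` output shown above.

```shell
# Hypothetical helper: from 'condor_config_val -dump' text on stdin, keep
# only the ALLOW_/DENY_/SEC_ knobs for the authorization level given as $1.
level_knobs() {
    grep -E "^(ALLOW_${1}|DENY_${1}|SEC_${1}_[A-Z_]+)[[:space:]]*="
}

# Possible use on the worker node:
#   condor_config_val -dump | level_knobs ADMINISTRATOR
```

Empty output would suggest that nothing is set specifically for that level, in which case defaults (for example SEC_DEFAULT_* knobs, if set) would presumably apply.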


-- 
Todd Tannenbaum <tannenba@xxxxxxxxxxx>  University of Wisconsin-Madison
Center for High Throughput Computing    Department of Computer Sciences
Calendar: https://tinyurl.com/yd55mtgd  1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                   Madison, WI 53706-1685