
[HTCondor-users] (no subject)




Hi HTCondor Community,

We use sssd for authentication. Previously, the nscd service was also running. We recently disabled nscd and found that FS authentication now fails frequently for some users on some of our submit machines. To get job submission working again, we frequently have to remove any running jobs on the affected submit machines and restart the condor service on those machines.

Any advice on how to troubleshoot and debug this kind of issue is appreciated.

Thanks

Here are the related condor settings that we set:
# Parameters with names that match sec:
DCSTATISTICS_WINDOW_SECONDS =
ENCRYPT_SECRETS = true
IGNORE_ATTEMPTS_TO_SET_SECURE_JOB_ATTRS = true
SEC_CLAIMTOBE_INCLUDE_DOMAIN = false
SEC_CLAIMTOBE_USER =
SEC_DEBUG_PRINT_KEYS = false
SEC_DEFAULT_AUTHENTICATION_METHODS = FS
SEC_DEFAULT_AUTHENTICATION_TIMEOUT = 10
SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION = true
SEC_INVALIDATE_SESSIONS_VIA_TCP = true
SEC_PASSWORD_DOMAIN =
SEC_PASSWORD_FILE =
SEC_SESSION_DURATION_SLOP = 20
SEC_TCP_SESSION_TIMEOUT = 20
SECURE_JOB_ATTRS =
STATISTICS_WINDOW_SECONDS = 1200
SYSTEM_SECURE_JOB_ATTRS = x509userProxySubject x509UserProxyEmail x509UserProxyVOName x509UserProxyFirstFQAN x509UserProxyFQAN
SCHEDD_DEBUG = D_PID D_FULLDEBUG D_SECURITY

Here are the corresponding error messages that we saw in SchedLog:

05/11/18 11:35:51 (pid:1512632) ============ Begin clean_shadow_recs =============
05/11/18 11:35:51 (pid:1512632) ============ End clean_shadow_recs =============
05/11/18 11:35:55 (pid:1512632) DC_AUTHENTICATE: received DC_AUTHENTICATE from <10.40.243.245:49415>
05/11/18 11:35:55 (pid:1512632) DC_AUTHENTICATE: received following ClassAd:
NewSession = "YES"
Subsystem = "TOOL"
AuthMethods = "FS"
CryptoMethods = "3DES,BLOWFISH"
Authentication = "OPTIONAL"
Integrity = "OPTIONAL"
Command = 519
Encryption = "OPTIONAL"
ServerPid = 1586331
SessionDuration = "60"
OutgoingNegotiation = "PREFERRED"
Enact = "NO"
SessionLease = 3600
RemoteVersion = "$CondorVersion: 8.5.8 Dec 13 2016 BuildID: 390781 $"
05/11/18 11:35:55 (pid:1512632) DC_AUTHENTICATE: our_policy:
SessionDuration = "86400"
AuthMethods = "FS"
Authentication = "REQUIRED"
Subsystem = "SCHEDD"
Enact = "NO"
ParentUniqueID = "htdsubmit1:1512588:1525992838"
Integrity = "OPTIONAL"
CryptoMethods = "3DES,BLOWFISH"
OutgoingNegotiation = "REQUIRED"
Encryption = "OPTIONAL"
SessionLease = 3600
ServerPid = 1512632
05/11/18 11:35:55 (pid:1512632) DC_AUTHENTICATE: the_policy:
Authentication = "YES"
Integrity = "NO"
SessionDuration = "60"
AuthMethodsList = "FS"
Encryption = "NO"
SessionLease = 3600
CryptoMethods = "3DES,BLOWFISH"
Enact = "YES"
AuthMethods = "FS"
05/11/18 11:35:55 (pid:1512632) DC_AUTHENTICATE: generating 3DES key for session htdsubmit1:1512632:1526052955:1047...
05/11/18 11:35:55 (pid:1512632) SECMAN: Sending following response ClassAd:
Authentication = "YES"
Integrity = "NO"
SessionDuration = "60"
AuthMethodsList = "FS"
Encryption = "NO"
RemoteVersion = "$CondorVersion: 8.5.8 Dec 13 2016 BuildID: 390781 $"
SessionLease = 3600
CryptoMethods = "3DES,BLOWFISH"
Enact = "YES"
AuthMethods = "FS"
05/11/18 11:35:55 (pid:1512632) SECMAN: new session, doing initial authentication.
05/11/18 11:35:55 (pid:1512632) Returning to DC while we wait for socket to authenticate.
05/11/18 11:35:55 (pid:1512632) DC_AUTHENTICATE: authenticating RIGHT NOW.
05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: setting timeout for (unknown) to 10.
05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: in authenticate( addr == '(unknown)', methods == 'FS')
05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: can still try these methods: FS
05/11/18 11:35:55 (pid:1512632) HANDSHAKE: in handshake(my_methods = 'FS')
05/11/18 11:35:55 (pid:1512632) HANDSHAKE: handshake() - i am the server
05/11/18 11:35:55 (pid:1512632) HANDSHAKE: client sent (methods == 4)
05/11/18 11:35:55 (pid:1512632) HANDSHAKE: i picked (method == 4)
05/11/18 11:35:55 (pid:1512632) HANDSHAKE: client received (method == 4)
05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: will try to use 4 (FS)
05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: do_authenticate is 1.
05/11/18 11:35:55 (pid:1512632) FS: client template is /tmp/FS_XXXXXXXXX
05/11/18 11:35:55 (pid:1512632) FS: client filename is /tmp/FS_XXXZFbeht
05/11/18 11:35:55 (pid:1512632) Will return to DC because authentication is incomplete.
05/11/18 11:35:55 (pid:1512632) AUTHENTICATE_FS: used dir /tmp/FS_XXXZFbeht, status: 0
05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: do_authenticate is 0.
05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: method -1 (FS) failed.
05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: can still try these methods: FS
05/11/18 11:35:55 (pid:1512632) HANDSHAKE: in handshake(my_methods = 'FS')
05/11/18 11:35:55 (pid:1512632) HANDSHAKE: handshake() - i am the server
05/11/18 11:35:55 (pid:1512632) HANDSHAKE: client sent (methods == 0)
05/11/18 11:35:55 (pid:1512632) HANDSHAKE: i picked (method == 0)
05/11/18 11:35:55 (pid:1512632) HANDSHAKE: client received (method == 0)
05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: no available authentication methods succeeded!
05/11/18 11:35:55 (pid:1512632) DC_AUTHENTICATE: authentication of <10.40.243.245:49415> did not result in a valid mapped user name, which is required for this command (519 QUERY_JOB_ADS_WITH_AUTH), so aborting.
05/11/18 11:35:55 (pid:1512632) DC_AUTHENTICATE: reason for authentication failure: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using FS|FS:1006:Unable to lookup uid 1262
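
The final log line points at a failed uid-to-name lookup (uid 1262) rather than at the FS handshake itself. As a minimal sketch of how this could be checked outside of HTCondor, the Python below pulls the uids out of "FS:1006:Unable to lookup uid" messages and then repeats the lookup through the system name service a few times. The helper names (failed_uids, lookup, probe) are made up for illustration, and this only approximates what the schedd does: it assumes the schedd resolves uids through the same NSS path as Python's pwd module, and a long-running daemon may behave differently from a freshly started process.

```python
import pwd
import re
import time

# Pattern for the failure reason seen in SchedLog (taken from the log above).
UID_FAILURE = re.compile(r"FS:1006:Unable to lookup uid (\d+)")


def failed_uids(log_lines):
    """Return the set of uids named in FS:1006 lookup failures."""
    uids = set()
    for line in log_lines:
        match = UID_FAILURE.search(line)
        if match:
            uids.add(int(match.group(1)))
    return uids


def lookup(uid):
    """Return the user name for uid, or None if the lookup fails."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return None


def probe(uid, attempts=5, delay=1.0):
    """Repeat the lookup to expose intermittent failures (hypothetical helper)."""
    results = []
    for i in range(attempts):
        results.append(lookup(uid))
        if i + 1 < attempts:
            time.sleep(delay)
    return results


if __name__ == "__main__":
    sample = ["... AUTHENTICATE:1004:Failed to authenticate using FS|"
              "FS:1006:Unable to lookup uid 1262"]
    for uid in sorted(failed_uids(sample)):
        print(uid, probe(uid, attempts=3, delay=0.5))
```

If the probe returns None for some attempts and a user name for others on an affected submit machine, that would suggest the problem is in the sssd lookup path rather than in the HTCondor configuration.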