
Re: [HTCondor-users] (no subject)



Are you saying that condor_submit is failing when you run it? If not, what symptoms are you seeing as a result of the FS failure?


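It appears from the included SchedLog that the client does create its directory in /tmp (/tmp/FS_XXXZFbeht), but the schedd then cannot map that directory's owner to a user name: the final error is "FS:1006:Unable to lookup uid 1262".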
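To make the failing step concrete, here is a rough Python sketch of the check that error message suggests: stat the directory the client created and resolve its owner uid through the system passwd lookup, which on your machines goes through NSS (sssd). The helpers fs_owner_name and uid_resolves are hypothetical, written for illustration only; this is a simplified model, not HTCondor's actual code.

```python
import os
import pwd
import tempfile


def uid_resolves(uid):
    """Return True if the system passwd lookup (NSS, e.g. sssd) knows this uid."""
    try:
        pwd.getpwuid(uid)
        return True
    except KeyError:
        return False


def fs_owner_name(path):
    """Hypothetical, simplified model of the server side of FS authentication:
    stat the directory the client created and map its owner uid to a name."""
    st = os.lstat(path)  # FileNotFoundError if the client never created it
    try:
        return pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        # Corresponds to the SchedLog's "Unable to lookup uid N" message
        raise LookupError(f"Unable to lookup uid {st.st_uid}") from None


if __name__ == "__main__":
    # Play the client's role: create an FS_-style directory as ourselves.
    d = tempfile.mkdtemp(prefix="FS_")
    try:
        print(d, "->", fs_owner_name(d))
    except LookupError as exc:
        print(exc)
    finally:
        os.rmdir(d)
    # The uid from the SchedLog; run this on the affected submit machine.
    print("uid 1262 resolves:", uid_resolves(1262))
```

If uid_resolves(1262) comes back False on an affected machine while the problem is occurring, the issue is in the sssd lookup rather than in /tmp.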

You can get a detailed log from the client side by setting the environment variable _condor_TOOL_DEBUG to D_ALL:2.
Then, as the user who is having trouble submitting, run:
  condor_ping -debug WRITE

This essentially simulates the authentication step of a job submission. Capture the stderr and look at the part where it does FS authentication; perhaps there is a clue there. If not, please forward me the captured stderr (off-list) and I will see if I can diagnose the problem. Thanks!
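If it helps, here is a small Python sketch of the same steps: run `condor_ping -debug WRITE` with _condor_TOOL_DEBUG=D_ALL:2 in the environment, capture stderr, and keep only the authentication-related lines. The marker strings it filters on are my own choice, taken from the SchedLog lines quoted below, not an official log format; capture_ping_stderr and fs_auth_lines are hypothetical helper names.

```python
import os
import shutil
import subprocess

# Substrings marking authentication-related debug lines (assumed for this sketch).
FS_MARKERS = ("AUTHENTICATE", "HANDSHAKE", "FS:")


def capture_ping_stderr(level="WRITE"):
    """Run `condor_ping -debug LEVEL` with _condor_TOOL_DEBUG=D_ALL:2 and
    return its stderr, or None if condor_ping is not on PATH."""
    if shutil.which("condor_ping") is None:
        return None
    env = dict(os.environ, _condor_TOOL_DEBUG="D_ALL:2")
    proc = subprocess.run(["condor_ping", "-debug", level],
                          env=env, capture_output=True, text=True)
    return proc.stderr


def fs_auth_lines(text):
    """Keep only the lines that mention authentication or the FS method."""
    return [line for line in text.splitlines()
            if any(marker in line for marker in FS_MARKERS)]


if __name__ == "__main__":
    err = capture_ping_stderr()
    if err is None:
        print("condor_ping not found on PATH")
    else:
        print("\n".join(fs_auth_lines(err)))
```

Run it as the affected user on the submit machine. The filter is only a convenience; the full stderr is still what to send if the filtered lines show nothing useful.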

Cheers,
-zach


> -----Original Message-----
> From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of
> Weiming Shi
> Sent: Friday, May 11, 2018 11:06 AM
> To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
> Subject: [HTCondor-users] (no subject)
> 
> 
> Hi HTCondor Community,
> 
> We use sssd for authentication. Previously the nscd service was also running.
> Recently we disabled the nscd service and found that FS authentication
> fails frequently for some users on some of our submit machines. We often
> have to remove all running jobs on the affected submit machines and
> restart the condor service on those machines to get job submission
> working again.
> 
> Any advice on how to troubleshoot and debug this kind of issue is
> appreciated.
> 
> Thanks
> 
> Here are the related condor settings that we set:
> # Parameters with names that match sec:
> DCSTATISTICS_WINDOW_SECONDS =
> ENCRYPT_SECRETS = true
> IGNORE_ATTEMPTS_TO_SET_SECURE_JOB_ATTRS = true
> SEC_CLAIMTOBE_INCLUDE_DOMAIN = false
> SEC_CLAIMTOBE_USER =
> SEC_DEBUG_PRINT_KEYS = false
> SEC_DEFAULT_AUTHENTICATION_METHODS = FS
> SEC_DEFAULT_AUTHENTICATION_TIMEOUT = 10
> SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION = true
> SEC_INVALIDATE_SESSIONS_VIA_TCP = true
> SEC_PASSWORD_DOMAIN =
> SEC_PASSWORD_FILE =
> SEC_SESSION_DURATION_SLOP = 20
> SEC_TCP_SESSION_TIMEOUT = 20
> SECURE_JOB_ATTRS =
> STATISTICS_WINDOW_SECONDS = 1200
> SYSTEM_SECURE_JOB_ATTRS = x509userProxySubject x509UserProxyEmail x509UserProxyVOName x509UserProxyFirstFQAN x509UserProxyFQAN
> SCHEDD_DEBUG = D_PID D_FULLDEBUG D_SECURITY
> 
> 
> Here are the corresponding error messages that we saw in SchedLog:
> 
> 05/11/18 11:35:51 (pid:1512632) ============ Begin clean_shadow_recs =============
> 05/11/18 11:35:51 (pid:1512632) ============ End clean_shadow_recs =============
> 05/11/18 11:35:55 (pid:1512632) DC_AUTHENTICATE: received DC_AUTHENTICATE from <10.40.243.245:49415>
> 05/11/18 11:35:55 (pid:1512632) DC_AUTHENTICATE: received following ClassAd:
> NewSession = "YES"
> Subsystem = "TOOL"
> AuthMethods = "FS"
> CryptoMethods = "3DES,BLOWFISH"
> Authentication = "OPTIONAL"
> Integrity = "OPTIONAL"
> Command = 519
> Encryption = "OPTIONAL"
> ServerPid = 1586331
> SessionDuration = "60"
> OutgoingNegotiation = "PREFERRED"
> Enact = "NO"
> SessionLease = 3600
> RemoteVersion = "$CondorVersion: 8.5.8 Dec 13 2016 BuildID: 390781 $"
> 05/11/18 11:35:55 (pid:1512632) DC_AUTHENTICATE: our_policy:
> SessionDuration = "86400"
> AuthMethods = "FS"
> Authentication = "REQUIRED"
> Subsystem = "SCHEDD"
> Enact = "NO"
> ParentUniqueID = "htdsubmit1:1512588:1525992838"
> Integrity = "OPTIONAL"
> CryptoMethods = "3DES,BLOWFISH"
> OutgoingNegotiation = "REQUIRED"
> Encryption = "OPTIONAL"
> SessionLease = 3600
> ServerPid = 1512632
> 05/11/18 11:35:55 (pid:1512632) DC_AUTHENTICATE: the_policy:
> Authentication = "YES"
> Integrity = "NO"
> SessionDuration = "60"
> AuthMethodsList = "FS"
> Encryption = "NO"
> SessionLease = 3600
> CryptoMethods = "3DES,BLOWFISH"
> Enact = "YES"
> AuthMethods = "FS"
> 05/11/18 11:35:55 (pid:1512632) DC_AUTHENTICATE: generating 3DES key for session htdsubmit1:1512632:1526052955:1047...
> 05/11/18 11:35:55 (pid:1512632) SECMAN: Sending following response ClassAd:
> Authentication = "YES"
> Integrity = "NO"
> SessionDuration = "60"
> AuthMethodsList = "FS"
> Encryption = "NO"
> RemoteVersion = "$CondorVersion: 8.5.8 Dec 13 2016 BuildID: 390781 $"
> SessionLease = 3600
> CryptoMethods = "3DES,BLOWFISH"
> Enact = "YES"
> AuthMethods = "FS"
> 05/11/18 11:35:55 (pid:1512632) SECMAN: new session, doing initial authentication.
> 05/11/18 11:35:55 (pid:1512632) Returning to DC while we wait for socket to authenticate.
> 05/11/18 11:35:55 (pid:1512632) DC_AUTHENTICATE: authenticating RIGHT NOW.
> 05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: setting timeout for (unknown) to 10.
> 05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: in authenticate( addr == '(unknown)', methods == 'FS')
> 05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: can still try these methods: FS
> 05/11/18 11:35:55 (pid:1512632) HANDSHAKE: in handshake(my_methods = 'FS')
> 05/11/18 11:35:55 (pid:1512632) HANDSHAKE: handshake() - i am the server
> 05/11/18 11:35:55 (pid:1512632) HANDSHAKE: client sent (methods == 4)
> 05/11/18 11:35:55 (pid:1512632) HANDSHAKE: i picked (method == 4)
> 05/11/18 11:35:55 (pid:1512632) HANDSHAKE: client received (method == 4)
> 05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: will try to use 4 (FS)
> 05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: do_authenticate is 1.
> 05/11/18 11:35:55 (pid:1512632) FS: client template is /tmp/FS_XXXXXXXXX
> 05/11/18 11:35:55 (pid:1512632) FS: client filename is /tmp/FS_XXXZFbeht
> 05/11/18 11:35:55 (pid:1512632) Will return to DC because authentication is incomplete.
> 05/11/18 11:35:55 (pid:1512632) AUTHENTICATE_FS: used dir /tmp/FS_XXXZFbeht, status: 0
> 05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: do_authenticate is 0.
> 05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: method -1 (FS) failed.
> 05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: can still try these methods: FS
> 05/11/18 11:35:55 (pid:1512632) HANDSHAKE: in handshake(my_methods = 'FS')
> 05/11/18 11:35:55 (pid:1512632) HANDSHAKE: handshake() - i am the server
> 05/11/18 11:35:55 (pid:1512632) HANDSHAKE: client sent (methods == 0)
> 05/11/18 11:35:55 (pid:1512632) HANDSHAKE: i picked (method == 0)
> 05/11/18 11:35:55 (pid:1512632) HANDSHAKE: client received (method == 0)
> 05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: no available authentication methods succeeded!
> 05/11/18 11:35:55 (pid:1512632) DC_AUTHENTICATE: authentication of <10.40.243.245:49415> did not result in a valid mapped user name, which is required for this command (519 QUERY_JOB_ADS_WITH_AUTH), so aborting.
> 05/11/18 11:35:55 (pid:1512632) DC_AUTHENTICATE: reason for authentication failure: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using FS|FS:1006:Unable to lookup uid 1262
> 
> 
>