
Re: [HTCondor-users] (no subject)



Okay.  Just FYI the basic way FS authentication works is:

Server (SchedD) picks a path in /tmp (or somewhere else if configured) and asks the client (tool) to create a directory with that name.
Client creates directory and notifies server.
Server checks that file ownership of that directory matches who the client claims to be.
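
The three steps above can be sketched roughly as follows. This is only an illustration of the idea, not HTCondor's actual code: the function names are made up, the random suffix scheme is an assumption (only the FS_ prefix is taken from the /tmp/FS_XXXXXXXXX template in the SchedLog), and the client and server run in one process here.

```python
import os
import secrets
import stat
import tempfile


def server_pick_path(base_dir):
    """Server: choose a not-yet-existing path for the client to create.

    The FS_ prefix mirrors the template seen in the SchedLog; the random
    suffix scheme is an assumption made for this sketch.
    """
    return os.path.join(base_dir, "FS_" + secrets.token_hex(6))


def client_create_dir(path):
    """Client: create the directory the server asked for."""
    os.mkdir(path, 0o700)


def server_verify(path, claimed_uid):
    """Server: check that the directory exists and is owned by claimed_uid."""
    try:
        info = os.lstat(path)
    except OSError as exc:
        return False, f"cannot stat {path}: {exc.strerror}"
    if not stat.S_ISDIR(info.st_mode):
        return False, f"{path} is not a directory"
    if info.st_uid != claimed_uid:
        return False, f"owner uid {info.st_uid} != claimed uid {claimed_uid}"
    return True, "ownership matches"


def run_handshake(base_dir, claimed_uid):
    """Run all three steps and report which one failed, if any."""
    path = server_pick_path(base_dir)
    try:
        client_create_dir(path)
    except OSError as exc:
        return False, f"client cannot create {path}: {exc.strerror}"
    try:
        return server_verify(path, claimed_uid)
    finally:
        os.rmdir(path)  # clean up the scratch directory


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        print(run_handshake(base, os.geteuid()))
```

The sketch only covers directory creation, the stat, and the ownership comparison; mapping the authenticated user to a canonical name is a separate step and is not modeled here.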

So, there are basically four things that could go wrong:
1) Client can't create directory.
2) Server can't find/stat directory to see ownership.
3) The client claimed it was something other than what the server detected.
4) Server can't map the client to a canonical name.

Based on the logs I've seen, it looks like #1, #2, or #3.  Sounds like you ruled out #1.

You can also get (insanely) detailed logs for the SchedD by setting SCHEDD_DEBUG=D_ALL:2 in your configuration.  WARNING: this produces A LOT of debugging messages, but perhaps it will help us distinguish between #2 and #3.  You may need to increase the size of the log (MAX_SCHEDD_LOG) to prevent it from rotating too quickly.

Regarding #4:  In some sssd environments, I've seen a situation where the client claims to be "zmiller@xxxxxxxxxxx@cs.wisc.edu" (with the two '@' signs).  HTCondor currently cannot handle multiple '@' signs in a name and so the mapping fails.

If you run "whoami" as the user from the command line, do you see something like "zmiller" or more like "zmiller@xxxxxxxxxxx"?
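
If it is easier to script than to eyeball, the same check can be sketched in Python. This is a minimal sketch under a few assumptions: the function names are made up, the "plain"/"qualified"/"unresolved" labels are mine, and I am assuming that a name which already contains an '@' is the case that ends up with two '@' signs once a domain is appended. The "unresolved" case is the situation where the system cannot turn the uid into a name at all.

```python
import os
import pwd


def effective_user_name():
    """Return the name the system reports for the effective uid, like whoami.

    Returns None when the uid cannot be resolved to a name.
    """
    try:
        return pwd.getpwuid(os.geteuid()).pw_name
    except KeyError:
        return None


def classify_user_name(name):
    """Label a user name as 'plain', 'qualified' (contains '@'), or 'unresolved'."""
    if name is None:
        return "unresolved"
    if "@" in name:
        return "qualified"
    return "plain"


if __name__ == "__main__":
    name = effective_user_name()
    print(name, classify_user_name(name))
```

Run it as the affected user on the submit machine; anything other than "plain" would be worth reporting back.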


Cheers,
-zach

> -----Original Message-----
> From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of
> Weiming Shi
> Sent: Friday, May 11, 2018 2:46 PM
> To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
> Subject: Re: [HTCondor-users] (no subject)
> 
> Hi Zach,
> 
> Almost all the condor user commands will fail in the case of this
> authentication error. Sometimes I found condor_q worked but condor_submit
> failed for certain users.
> 
> This is also my first reaction. I did a check with a tool called
> inotifywatch and found that the file was actually created successfully.
> 
> Thanks for your advice on the further investigation. I will give it a trial
> when this issue shows up again. Usually it shows up several times a week.
> 
> 
> Thanks
> 
> 
> 
> On Fri, May 11, 2018 at 3:03 PM, Zach Miller <zmiller@xxxxxxxxxxx> wrote:
> 
> 
> 	Are you saying that condor_submit is failing when you run it?  Or
> what are the symptoms you are seeing as a result of the FS failure?
> 
> 
> 	It appears from the included SchedLog that the submit process is
> unable to create the file required in /tmp.
> 
> 	You can get a detailed log from the client side by setting the
> environment variable _condor_TOOL_DEBUG to D_ALL:2
> 	Then as the user having trouble submitting, run:
> 	  condor_ping -debug WRITE
> 
> 	This will essentially simulate a job submission and you can capture
> the stderr and look at where it is doing FS authentication.  Perhaps there
> is a clue there.  Otherwise please forward me the captured stderr (off-
> list) and I will see if I can diagnose the problem.  Thanks!
> 
> 	Cheers,
> 	-zach
> 
> 
> 
> 	> -----Original Message-----
> 	> From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of
> 	> Weiming Shi
> 	> Sent: Friday, May 11, 2018 11:06 AM
> 	> To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
> 	> Subject: [HTCondor-users] (no subject)
> 	>
> 	>
> 	> Hi HTCondor Community,
> 	>
> 	> We use sssd for authentication. Previously the nscd service was also
> 	> running. Recently we disabled the nscd service and found that FS
> 	> authentication fails frequently for some users on some of our submit
> 	> machines. We have to frequently remove any running job on the affected
> 	> submit machines and restart the condor service on those machines to
> 	> make job submission work again.
> 	>
> 	> Any advice on how to troubleshoot and debug this kind of issue is
> 	> appreciated.
> 	>
> 	> Thanks
> 	>
> 	> Here are the related condor settings that we set:
> 	> # Parameters with names that match sec:
> 	> DCSTATISTICS_WINDOW_SECONDS =
> 	> ENCRYPT_SECRETS = true
> 	> IGNORE_ATTEMPTS_TO_SET_SECURE_JOB_ATTRS = true
> 	> SEC_CLAIMTOBE_INCLUDE_DOMAIN = false
> 	> SEC_CLAIMTOBE_USER =
> 	> SEC_DEBUG_PRINT_KEYS = false
> 	> SEC_DEFAULT_AUTHENTICATION_METHODS = FS
> 	> SEC_DEFAULT_AUTHENTICATION_TIMEOUT = 10
> 	> SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION = true
> 	> SEC_INVALIDATE_SESSIONS_VIA_TCP = true
> 	> SEC_PASSWORD_DOMAIN =
> 	> SEC_PASSWORD_FILE =
> 	> SEC_SESSION_DURATION_SLOP = 20
> 	> SEC_TCP_SESSION_TIMEOUT = 20
> 	> SECURE_JOB_ATTRS =
> 	> STATISTICS_WINDOW_SECONDS = 1200
> 	> SYSTEM_SECURE_JOB_ATTRS = x509userProxySubject x509UserProxyEmail
> 	> x509UserProxyVOName x509UserProxyFirstFQAN x509UserProxyFQAN
> 	> SCHEDD_DEBUG = D_PID D_FULLDEBUG D_SECURITY
> 	>
> 	>
> 	> Here are the corresponding error messages that we saw in SchedLog:
> 	>
> 	> 05/11/18 11:35:51 (pid:1512632) ============ Begin
> clean_shadow_recs
> 	> =============
> 	> 05/11/18 11:35:51 (pid:1512632) ============ End clean_shadow_recs
> 	> =============
> 	> 05/11/18 11:35:55 (pid:1512632) DC_AUTHENTICATE: received DC_AUTHENTICATE
> 	> from <10.40.243.245:49415>
> 
> 	> 05/11/18 11:35:55 (pid:1512632) DC_AUTHENTICATE: received
> following
> 	> ClassAd:
> 	> NewSession = "YES"
> 	> Subsystem = "TOOL"
> 	> AuthMethods = "FS"
> 	> CryptoMethods = "3DES,BLOWFISH"
> 	> Authentication = "OPTIONAL"
> 	> Integrity = "OPTIONAL"
> 	> Command = 519
> 	> Encryption = "OPTIONAL"
> 	> ServerPid = 1586331
> 	> SessionDuration = "60"
> 	> OutgoingNegotiation = "PREFERRED"
> 	> Enact = "NO"
> 	> SessionLease = 3600
> 	> RemoteVersion = "$CondorVersion: 8.5.8 Dec 13 2016 BuildID: 390781
> $"
> 	> 05/11/18 11:35:55 (pid:1512632) DC_AUTHENTICATE: our_policy:
> 	> SessionDuration = "86400"
> 	> AuthMethods = "FS"
> 	> Authentication = "REQUIRED"
> 	> Subsystem = "SCHEDD"
> 	> Enact = "NO"
> 	> ParentUniqueID = "htdsubmit1:1512588:1525992838"
> 	> Integrity = "OPTIONAL"
> 	> CryptoMethods = "3DES,BLOWFISH"
> 	> OutgoingNegotiation = "REQUIRED"
> 	> Encryption = "OPTIONAL"
> 	> SessionLease = 3600
> 	> ServerPid = 1512632
> 	> 05/11/18 11:35:55 (pid:1512632) DC_AUTHENTICATE: the_policy:
> 	> Authentication = "YES"
> 	> Integrity = "NO"
> 	> SessionDuration = "60"
> 	> AuthMethodsList = "FS"
> 	> Encryption = "NO"
> 	> SessionLease = 3600
> 	> CryptoMethods = "3DES,BLOWFISH"
> 	> Enact = "YES"
> 	> AuthMethods = "FS"
> 	> 05/11/18 11:35:55 (pid:1512632) DC_AUTHENTICATE: generating 3DES
> key for
> 	> session htdsubmit1:1512632:1526052955:1047...
> 	> 05/11/18 11:35:55 (pid:1512632) SECMAN: Sending following response
> ClassAd:
> 	> Authentication = "YES"
> 	> Integrity = "NO"
> 	> SessionDuration = "60"
> 	> AuthMethodsList = "FS"
> 	> Encryption = "NO"
> 	> RemoteVersion = "$CondorVersion: 8.5.8 Dec 13 2016 BuildID: 390781
> $"
> 	> SessionLease = 3600
> 	> CryptoMethods = "3DES,BLOWFISH"
> 	> Enact = "YES"
> 	> AuthMethods = "FS"
> 	> 05/11/18 11:35:55 (pid:1512632) SECMAN: new session, doing initial
> 	> authentication.
> 	> 05/11/18 11:35:55 (pid:1512632) Returning to DC while we wait for
> socket to
> 	> authenticate.
> 	> 05/11/18 11:35:55 (pid:1512632) DC_AUTHENTICATE: authenticating
> RIGHT NOW.
> 	> 05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: setting timeout for
> (unknown)
> 	> to 10.
> 	> 05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: in authenticate(
> addr ==
> 	> '(unknown)', methods == 'FS')
> 	> 05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: can still try these
> methods:
> 	> FS
> 	> 05/11/18 11:35:55 (pid:1512632) HANDSHAKE: in handshake(my_methods
> = 'FS')
> 	> 05/11/18 11:35:55 (pid:1512632) HANDSHAKE: handshake() - i am the
> server
> 	> 05/11/18 11:35:55 (pid:1512632) HANDSHAKE: client sent (methods ==
> 4)
> 	> 05/11/18 11:35:55 (pid:1512632) HANDSHAKE: i picked (method == 4)
> 	> 05/11/18 11:35:55 (pid:1512632) HANDSHAKE: client received (method
> == 4)
> 	> 05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: will try to use 4
> (FS)
> 	> 05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: do_authenticate is
> 1.
> 	> 05/11/18 11:35:55 (pid:1512632) FS: client template is
> /tmp/FS_XXXXXXXXX
> 	> 05/11/18 11:35:55 (pid:1512632) FS: client filename is
> /tmp/FS_XXXZFbeht
> 	> 05/11/18 11:35:55 (pid:1512632) Will return to DC because
> authentication is
> 	> incomplete.
> 	> 05/11/18 11:35:55 (pid:1512632) AUTHENTICATE_FS: used dir
> 	> /tmp/FS_XXXZFbeht, status: 0
> 	> 05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: do_authenticate is
> 0.
> 	> 05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: method -1 (FS)
> failed.
> 	> 05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: can still try these
> methods:
> 	> FS
> 	> 05/11/18 11:35:55 (pid:1512632) HANDSHAKE: in handshake(my_methods
> = 'FS')
> 	> 05/11/18 11:35:55 (pid:1512632) HANDSHAKE: handshake() - i am the
> server
> 	> 05/11/18 11:35:55 (pid:1512632) HANDSHAKE: client sent (methods ==
> 0)
> 	> 05/11/18 11:35:55 (pid:1512632) HANDSHAKE: i picked (method == 0)
> 	> 05/11/18 11:35:55 (pid:1512632) HANDSHAKE: client received (method
> == 0)
> 	> 05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: no available
> authentication
> 	> methods succeeded!
> 	> 05/11/18 11:35:55 (pid:1512632) DC_AUTHENTICATE: authentication of
> 	> <10.40.243.245:49415> did not result in a
> 	> valid mapped user name, which is required for this command (519
> 	> QUERY_JOB_ADS_WITH_AUTH), so aborting.
> 	> 05/11/18 11:35:55 (pid:1512632) DC_AUTHENTICATE: reason for
> authentication
> 	> failure: AUTHENTICATE:1003:Failed to authenticate with any
> 	> method|AUTHENTICATE:1004:Failed to authenticate using
> FS|FS:1006:Unable to
> 	> lookup uid 1262
> 	>
> 	>
> 	>
> 
> 