Re: [Condor-users] 6.7.18 problem: Kerberos authentication issues post-upgrade



David McBride wrote:

It looks like it either cannot determine its local identity properly (note the "Client is condor@(null)" entry) or it is unable to process the local /etc/krb5.keytab file properly -- perhaps it is attempting to do so as the local 'condor' user, and not as root?

Follow up: interestingly, as soon as I (as root) had logged out of the machine Lightyear, Kerberos authentication started working again:

==> MasterLog <==
3/29 17:25:41 STARTCOMMAND: starting 2 to <146.169.1.113:9618> on UDP port 47920.
3/29 17:25:41 SECMAN: command 2 to <146.169.1.113:9618> on UDP port 47920.
3/29 17:25:41 SECMAN: command 60010 to <146.169.1.113:9618> on TCP port 55823.
3/29 17:25:41 SECMAN: new session, doing initial authentication.
3/29 17:25:41 SECMAN: Auth methods: KERBEROS
3/29 17:25:41 HANDSHAKE: in handshake(my_methods = 'KERBEROS')
3/29 17:25:41 HANDSHAKE: handshake() - i am the client
3/29 17:25:41 HANDSHAKE: sending (methods == 64) to server
3/29 17:25:41 HANDSHAKE: server replied (method = 64)
3/29 17:25:41 KERBEROS: krb5_unparse_name: host/skimmer.doc.ic.ac.uk@xxxxxxxxxxxx
3/29 17:25:41 KERBEROS: no user yet determined, will grab up to slash
3/29 17:25:41 KERBEROS: picked user: host
3/29 17:25:41 KERBEROS: remapping 'host' to 'condor'
3/29 17:25:41 unable to open map file (null), errno 14
3/29 17:25:41 Client is condor@(null)
3/29 17:25:41 KERBEROS: Server principal is host/skimmer.doc.ic.ac.uk@xxxxxxxxxxxx
3/29 17:25:41 init_daemon: client principal is 'host/lightyear.doc.ic.ac.uk@xxxxxxxxxxxx'
3/29 17:25:41 init_daemon: Using default keytab FILE:/etc/krb5.keytab
3/29 17:25:41 AUTH_ERROR: Internal credentials cache error
3/29 17:25:41 AUTHENTICATE: method 64 (KERBEROS) failed.
3/29 17:25:41 HANDSHAKE: in handshake(my_methods = '')
3/29 17:25:41 HANDSHAKE: handshake() - i am the client
3/29 17:25:41 HANDSHAKE: sending (methods == 0) to server
3/29 17:25:41 HANDSHAKE: server replied (method = 0)
3/29 17:25:41 AUTHENTICATE: no available authentication methods succeeded, failing!
3/29 17:25:41 SECMAN: unable to start session via TCP, failing.
3/29 17:25:41 ERROR: SECMAN:2004:Failed to start a session with TCP|AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using KERBEROS
3/29 17:26:41 STARTCOMMAND: starting 2 to <146.169.1.113:9618> on UDP port 47926.
3/29 17:26:41 SECMAN: command 2 to <146.169.1.113:9618> on UDP port 47926.
3/29 17:26:41 SECMAN: command 60010 to <146.169.1.113:9618> on TCP port 33339.
3/29 17:26:41 SECMAN: new session, doing initial authentication.
3/29 17:26:41 SECMAN: Auth methods: KERBEROS
3/29 17:26:41 HANDSHAKE: in handshake(my_methods = 'KERBEROS')
3/29 17:26:41 HANDSHAKE: handshake() - i am the client
3/29 17:26:41 HANDSHAKE: sending (methods == 64) to server
3/29 17:26:41 HANDSHAKE: server replied (method = 64)
3/29 17:26:41 KERBEROS: krb5_unparse_name: host/skimmer.doc.ic.ac.uk@xxxxxxxxxxxx
3/29 17:26:41 KERBEROS: no user yet determined, will grab up to slash
3/29 17:26:41 KERBEROS: picked user: host
3/29 17:26:41 KERBEROS: remapping 'host' to 'condor'
3/29 17:26:41 unable to open map file (null), errno 14
3/29 17:26:41 Client is condor@(null)
3/29 17:26:41 KERBEROS: Server principal is host/skimmer.doc.ic.ac.uk@xxxxxxxxxxxx
3/29 17:26:41 init_daemon: client principal is 'host/lightyear.doc.ic.ac.uk@xxxxxxxxxxxx'
3/29 17:26:41 init_daemon: Using default keytab FILE:/etc/krb5.keytab
3/29 17:26:41 init_daemon: Trying to get tgt credential
3/29 17:26:41 Success..........................
3/29 17:26:42 Remote host is 146.169.1.113
3/29 17:26:42 Authentication was a Success.
3/29 17:26:42 SECMAN: successfully enabled message authenticator!
3/29 17:26:42 SECMAN: added session skimmer:4921:1143649601:10363 to cache for 8640000 seconds.
3/29 17:26:42 SECMAN: startCommand succeeded.
3/29 17:26:42 SECMAN: using session skimmer:4921:1143649601:10363 for {<146.169.1.113:9618>,<2>}.
3/29 17:26:42 SECMAN: UDP, have_session == 1, can_neg == 1
3/29 17:26:42 SECMAN: successfully enabled message authenticator!
3/29 17:26:42 SECMAN: startCommand succeeded.

I suspect this is because all of the credential caches used by myself and mwj (who had also logged out by this point) had been kdestroy'd, suggesting that perhaps the modified Kerberos implementation in 6.7.18 doesn't always try to inspect the correct keytab / cache file?
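
One way to check this hypothesis on a submit machine is to list whatever credential caches are lying around while the failure occurs. The following is a minimal sketch only: it assumes file-based caches named krb5cc_* under /tmp (a common default, but site configuration may differ), and the function name find_credential_caches is made up for illustration; it is not part of Condor or of any Kerberos library.

```python
import glob
import os


def find_credential_caches(tmp_dir="/tmp", environ=None):
    """Return (KRB5CCNAME value or None, sorted list of cache files in tmp_dir)."""
    environ = os.environ if environ is None else environ
    # KRB5CCNAME, when set, overrides the default credential cache location.
    override = environ.get("KRB5CCNAME")
    # Assumption: file-based caches follow the common krb5cc_* naming scheme.
    caches = sorted(glob.glob(os.path.join(tmp_dir, "krb5cc_*")))
    return override, caches


if __name__ == "__main__":
    override, caches = find_credential_caches()
    print("KRB5CCNAME =", override)
    for path in caches:
        info = os.stat(path)
        print(f"{path} owner_uid={info.st_uid} size={info.st_size}")
```

Run as root on the affected host before and after the interactive users log out. If a root-owned cache is present during the failures and gone once authentication succeeds, that would be consistent with the daemon picking up the wrong cache. Each file found can then be examined with klist -c.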

However, this doesn't appear to have resolved the whole problem, as the jobs themselves are unable to set up the necessary file staging between Lightyear and the selected worker node:

(ray15.doc.ic.ac.uk is an execution node.)

==> ShadowLog <==
3/29 17:51:23 KEYCACHE: created: 0x8427b40
3/29 17:51:23 ******************************************************
3/29 17:51:23 ** condor_shadow (CONDOR_SHADOW) STARTING UP
3/29 17:51:23 ** /vol/condor/releases/6.7.18/linux-x86-glibc23/sbin/condor_shadow
3/29 17:51:23 ** $CondorVersion: 6.7.18 Mar 22 2006 $
3/29 17:51:23 ** $CondorPlatform: I386-LINUX_RH9 $
3/29 17:51:23 ** PID = 4570
3/29 17:51:23 ******************************************************
3/29 17:51:23 Using config file: /etc/condor/condor_config
3/29 17:51:23 Using local config files: /vol/condor/pool/doc/config/host/lightyear/condor_config.local /vol/condor/pool/doc/config/platform/INTEL.LINUX/condor_config.arch
3/29 17:51:23 DaemonCore: Command Socket at <146.169.1.103:39048>
3/29 17:51:23 Initializing a VANILLA shadow for job 283.0
3/29 17:51:23 (283.0) (4570): STARTCOMMAND: starting 444 to <146.169.49.115:33110> on TCP port 43344.
3/29 17:51:23 (283.0) (4570): SECMAN: command 444 to <146.169.49.115:33110> on TCP port 43344.
3/29 17:51:23 (283.0) (4570): SECMAN: new session, doing initial authentication.
3/29 17:51:23 (283.0) (4570): SECMAN: Auth methods: KERBEROS
3/29 17:51:23 (283.0) (4570): HANDSHAKE: in handshake(my_methods = 'KERBEROS')
3/29 17:51:23 (283.0) (4570): HANDSHAKE: handshake() - i am the client
3/29 17:51:23 (283.0) (4570): HANDSHAKE: sending (methods == 64) to server
3/29 17:51:23 (283.0) (4570): HANDSHAKE: server replied (method = 64)
3/29 17:51:23 (283.0) (4570): KERBEROS: krb5_unparse_name: host/ray15.doc.ic.ac.uk@xxxxxxxxxxxx
3/29 17:51:23 (283.0) (4570): KERBEROS: no user yet determined, will grab up to slash
3/29 17:51:23 (283.0) (4570): KERBEROS: picked user: host
3/29 17:51:23 (283.0) (4570): KERBEROS: remapping 'host' to 'condor'
3/29 17:51:23 (283.0) (4570): unable to open map file (null), errno 14
3/29 17:51:23 (283.0) (4570): Client is condor@(null)
3/29 17:51:23 (283.0) (4570): KERBEROS: Server principal is host/ray15.doc.ic.ac.uk@xxxxxxxxxxxx
3/29 17:51:23 (283.0) (4570): init_daemon: client principal is 'host/lightyear.doc.ic.ac.uk@xxxxxxxxxxxx'
3/29 17:51:23 (283.0) (4570): init_daemon: Using default keytab FILE:/etc/krb5.keytab
3/29 17:51:23 (283.0) (4570): AUTH_ERROR: Internal credentials cache error
3/29 17:51:23 (283.0) (4570): AUTHENTICATE: method 64 (KERBEROS) failed.
3/29 17:51:23 (283.0) (4570): HANDSHAKE: in handshake(my_methods = '')
3/29 17:51:23 (283.0) (4570): HANDSHAKE: handshake() - i am the client
3/29 17:51:23 (283.0) (4570): HANDSHAKE: sending (methods == 0) to server
3/29 17:51:23 (283.0) (4570): HANDSHAKE: server replied (method = 0)
3/29 17:51:23 (283.0) (4570): AUTHENTICATE: no available authentication methods succeeded, failing!
3/29 17:51:23 (283.0) (4570): ERROR: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using KERBEROS
3/29 17:51:23 (283.0) (4570): DCStartd::activateClaim: Failed to send command ACTIVATE_CLAIM to the startd
3/29 17:51:23 (283.0) (4570): Job 283.0 is being evicted

So again, there still seems to be some kind of credential cache problem.
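
To compare the failing and succeeding attempts across MasterLog and ShadowLog without reading every line, the log text can be reduced to one entry per authentication attempt. This is a rough sketch that relies only on the message strings visible in the excerpts above; the function name summarize_auth_attempts is hypothetical, and other Condor versions or debug levels may word these messages differently.

```python
import re

# Marker strings copied from the log excerpts in this message.
ATTEMPT_START = "SECMAN: new session, doing initial authentication."
SUCCESS = "Authentication was a Success."
FAILURE = "AUTH_ERROR:"
TIMESTAMP = re.compile(r"^(\d+/\d+ \d+:\d+:\d+)")


def summarize_auth_attempts(log_text):
    """Return a list of (start_timestamp, outcome, detail) tuples."""
    attempts = []
    for line in log_text.splitlines():
        stamp = TIMESTAMP.match(line)
        if stamp is None:
            continue  # skip "==> MasterLog <==" headers and blank lines
        if ATTEMPT_START in line:
            attempts.append([stamp.group(1), "unknown", ""])
        if not attempts:
            continue  # outcome lines before any attempt are ignored
        if FAILURE in line:
            attempts[-1][1] = "failure"
            attempts[-1][2] = line.split(FAILURE, 1)[1].strip()
        elif SUCCESS in line:
            attempts[-1][1] = "success"
    return [tuple(a) for a in attempts]


if __name__ == "__main__":
    import sys
    for started, outcome, detail in summarize_auth_attempts(sys.stdin.read()):
        print(started, outcome, detail)
```

Feeding it the MasterLog excerpt above should report a failure starting at 17:25:41 with the detail "Internal credentials cache error", followed by a success for the attempt starting at 17:26:41.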

Cheers,
David
--
David McBride <dwm@xxxxxxxxxxxx>
Department of Computing, Imperial College, London