[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [HTCondor-users] High rate of random "Failed to securely exchange session key" errors in 8.6.x



In case this is helpful, below is an example of what happens in MasterLog with "MASTER_DEBUG = D_SECURITY" when "condor_reconfig" is successful [1] and when it fails [2].

Regards,
Andrew.

[1]
05/15/17 21:46:36 DC_AUTHENTICATE: received DC_AUTHENTICATE from <X.X.X.X:17558>
05/15/17 21:46:36 DC_AUTHENTICATE: generating 3DES key for session hostname:11227:1494881196:63...
05/15/17 21:46:36 SECMAN: new session, doing initial authentication.
05/15/17 21:46:36 Returning to DC while we wait for socket to authenticate.
05/15/17 21:46:36 AUTHENTICATE: setting timeout for (unknown) to 20.
05/15/17 21:46:36 HANDSHAKE: in handshake(my_methods = 'PASSWORD,FS')
05/15/17 21:46:36 HANDSHAKE: handshake() - i am the server
05/15/17 21:46:36 HANDSHAKE: client sent (methods == 516)
05/15/17 21:46:36 HANDSHAKE: i picked (method == 512)
05/15/17 21:46:36 HANDSHAKE: client received (method == 512)
05/15/17 21:46:36 PW.
05/15/17 21:46:36 Will return to DC because authentication is incomplete.
05/15/17 21:46:36 PASSWORD: entered authenticate_continue, state==100
05/15/17 21:46:36 PW: Server receiving 1.
05/15/17 21:46:36 Received: 0, 27(condor_pool@domain), 256
05/15/17 21:46:36 PW: Server fetching password.
05/15/17 21:46:36 PW: Server generating rb.
05/15/17 21:46:36 PW: Server sending.
05/15/17 21:46:36 In server_send: 0.
05/15/17 21:46:36 Calculating hkt 'condor_pool@domain' (27), 'condor_pool@domain' (27).
05/15/17 21:46:36 Server send 'condor_pool@domain', 'condor_pool@domain', 256 256 20
05/15/17 21:46:36 PASSWORD: leaving authenticate_continue, state==101, return=2
05/15/17 21:46:36 AUTHENTICATE: auth would still block
05/15/17 21:46:36 Will return to DC to continue authentication..
05/15/17 21:46:36 PASSWORD: entered authenticate_continue, state==101
05/15/17 21:46:36 PW: Server receiving 2.
05/15/17 21:46:36 PW: Server checking hk.
05/15/17 21:46:36 In calculate_hk.
05/15/17 21:46:36 Setting session key.
05/15/17 21:46:36 Key length: 20
05/15/17 21:46:36 PW: Server set session key.
05/15/17 21:46:36 PASSWORD: leaving authenticate_continue, state==101, return=1
05/15/17 21:46:36 Authentication was a Success.
05/15/17 21:46:36 ZKM: setting default map to condor_pool@domain
05/15/17 21:46:36 ZKM: post-map: current user is 'condor_pool'
05/15/17 21:46:36 ZKM: post-map: current domain is 'domain'
05/15/17 21:46:36 ZKM: post-map: current FQU is 'condor_pool@domain'
05/15/17 21:46:36 AUTHENTICATE: Exchanging keys with remote side.
05/15/17 21:46:36 In Condor_Auth_Passwd::wrap.
05/15/17 21:46:36 AUTHENTICATE: Result of end of authenticate is 1.
05/15/17 21:46:36 DC_AUTHENTICATE: authentication of X.X.X.X complete.
05/15/17 21:46:36 DC_AUTHENTICATE: message authenticator enabled with key id hostname:11227:1494881196:63.
05/15/17 21:46:36 DC_AUTHENTICATE: Success.


[2]
05/15/17 21:46:37 DC_AUTHENTICATE: received DC_AUTHENTICATE from <X.X.X.X:28260>
05/15/17 21:46:37 DC_AUTHENTICATE: generating 3DES key for session hostname:11227:1494881197:64...
05/15/17 21:46:37 SECMAN: new session, doing initial authentication.
05/15/17 21:46:37 Returning to DC while we wait for socket to authenticate.
05/15/17 21:46:37 AUTHENTICATE: setting timeout for (unknown) to 20.
05/15/17 21:46:37 HANDSHAKE: in handshake(my_methods = 'PASSWORD,FS')
05/15/17 21:46:37 HANDSHAKE: handshake() - i am the server
05/15/17 21:46:37 HANDSHAKE: client sent (methods == 516)
05/15/17 21:46:37 HANDSHAKE: i picked (method == 512)
05/15/17 21:46:37 HANDSHAKE: client received (method == 512)
05/15/17 21:46:37 PW.
05/15/17 21:46:37 PW: Server receiving 1.
05/15/17 21:46:37 Received: 0, 27(condor_pool@domain), 256
05/15/17 21:46:37 PW: Server fetching password.
05/15/17 21:46:37 PW: Server generating rb.
05/15/17 21:46:37 PW: Server sending.
05/15/17 21:46:37 In server_send: 0.
05/15/17 21:46:37 Calculating hkt 'condor_pool@domain' (27), 'condor_pool@domain' (27).
05/15/17 21:46:37 Server send 'condor_pool@domain', 'condor_pool@domain', 256 256 20
05/15/17 21:46:37 Authentication was a Success.
05/15/17 21:46:37 ZKM: setting default map to (null)
05/15/17 21:46:37 ZKM: post-map: current user is '(null)'
05/15/17 21:46:37 ZKM: post-map: current domain is '(null)'
05/15/17 21:46:37 ZKM: post-map: current FQU is '(null)'
05/15/17 21:46:37 AUTHENTICATE: Exchanging keys with remote side.
05/15/17 21:46:37 In Condor_Auth_Passwd::wrap.
05/15/17 21:46:37 AUTHENTICATE: Result of end of authenticate is 0.
05/15/17 21:46:37 DC_AUTHENTICATE: required authentication of X.X.X.X failed: AUTHENTICATE:1005:Failed to securely exchange session key

________________________________________
From: HTCondor-users [htcondor-users-bounces@xxxxxxxxxxx] on behalf of andrew.lahiff@xxxxxxxxxx [andrew.lahiff@xxxxxxxxxx]
Sent: Monday, May 15, 2017 7:49 PM
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] High rate of random "Failed to securely exchange session key" errors in 8.6.x

Hi,

I was recently upgrading lots of worker nodes from 8.6.1 to 8.6.2, and found that despite having:

MASTER_NEW_BINARY_RESTART=PEACEFUL

jobs were being killed on some (but not all) of the worker nodes. I then noticed that "condor_reconfig" randomly fails. Here's an example running it 4 times on a worker node:

[root@lcg1377 ~]# condor_reconfig
Sent "Reconfig" command to local master
[root@lcg1377 ~]# condor_reconfig
Sent "Reconfig" command to local master
[root@lcg1377 ~]# condor_reconfig
Sent "Reconfig" command to local master
[root@lcg1377 ~]# condor_reconfig
ERROR
AUTHENTICATE:1005:Failed to securely exchange session key
Can't send Reconfig command to local master

And with the "-debug" option:

[root@lcg1377 ~]# condor_reconfig -debug
05/15/17 19:41:12 condor_read() failed: recv(fd=3) returned -1, errno = 104 Connection reset by peer, reading 5 bytes from <X.X.X.X:9618>.
05/15/17 19:41:12 IO: Failed to read packet header
05/15/17 19:41:12 SECMAN: required authentication with <X.X.X.X:9618> failed, so aborting command DC_RECONFIG_FULL.
ERROR
AUTHENTICATE:1005:Failed to securely exchange session key
Can't send Reconfig command to local master

Today I upgraded the central managers to 8.6.2, and just noticed large numbers of "Failed to securely exchange session key" errors in CollectorLog and NegotiatorLog. Despite these frequent errors things are generally working fine.

We are using FS and PASSWORD authentication.

Has anyone else seen this? With 8.4.x I don't ever remember seeing issues like this.

Thanks,
Andrew.

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/