[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Condor-users] Need help diagnosing flocking problems



Ben,

I believe the problem happens when authentication is attempted but fails between the negotiator and schedd. In your configuration, that is happening, because the schedd is using this:

SEC_NEGOTIATOR_AUTHENTICATION_METHODS = PASSWORD
SEC_NEGOTIATOR_AUTHENTICATION = OPTIONAL

but the negotiator is using this:

SEC_DEFAULT_AUTHENTICATION_METHODS = FS, CLAIMTOBE, ANONYMOUS
SEC_DEFAULT_AUTHENTICATION = PREFERRED

This policy results in Condor going through the motions of attempting authentication, discovering that there are no methods in common, and then emitted the following message in the schedd log with D_SECURITY enabled:

DC_SECURITY: authentication of 128.210.9.99 failed but was not required, so continuing.

I am investigating this code path to see if I can find a bug that would explain the behavior. In the mean time, I would suggest avoiding this code path by ensuring that authentication succeeds or is disabled.

--Dan

On 2/24/12 2:17 PM, Ben Cotton wrote:
On Thu, Feb 23, 2012 at 6:11 PM, Dan Bradley<dan@xxxxxxxxxxxx>  wrote:

I've reproduced the problem you reported.

I'm glad to know I'm not crazy (in this specific instance)

I haven't had time to fully understand what is going wrong.  I did find that
the following change in configuration made the problem go away:
I've set
SEC_DEFAULT_AUTHENTICATION_METHODS = FS
and I'm still seeing problems. Am I missing something else?

#
# Negotiator log
#
02/24/12 13:33:26 SECMAN: resume, other side is $CondorVersion: 7.6.4
Oct 20 2011 BuildID: 379441 $, NOT reauthenticating.
02/24/12 13:33:26 Getting authenticated user from cached session:
unauthenticated@unmapped
02/24/12 13:33:26 SECMAN: startCommand succeeded.
02/24/12 13:33:26 Authorizing server 'unauthenticated@unmapped/128.210.9.99'.
02/24/12 13:33:26 Got ads: 8 public and 1 private
02/24/12 13:33:26 Public ads include 1 submitter, 1 startd
02/24/12 13:33:26 Phase 2:  Performing accounting ...
02/24/12 13:33:26 Entering compute_significant_attrs()
02/24/12 13:33:26 Leaving compute_significant_attrs() -
result=JobUniverse,LastCheckpointPlatform,NumCkpts
02/24/12 13:33:26 Phase 3:  Sorting submitter ads by priority ...
02/24/12 13:33:26 Phase 4.1:  Negotiating with schedds ...
02/24/12 13:33:26     numSlots = 1
02/24/12 13:33:26     slotWeightTotal = 1.000000
02/24/12 13:33:26     pieLeft = 1.000
02/24/12 13:33:26     NormalFactor = 1.000000
02/24/12 13:33:26     MaxPrioValue = 0.799077
02/24/12 13:33:26     NumSubmitterAds = 1
02/24/12 13:33:26   Negotiating with bcotton@xxxxxxxxxxxxxxx at
<128.211.148.45:43260>
02/24/12 13:33:26 0 seconds so far
02/24/12 13:33:26   Calculating submitter limit with the following parameters
02/24/12 13:33:26     SubmitterPrio       = 0.799077
02/24/12 13:33:26     SubmitterPrioFactor = 1.000000
02/24/12 13:33:26     submitterShare      = 1.000000
02/24/12 13:33:26     submitterAbsShare   = 1.000000
02/24/12 13:33:26     submitterLimit    = 1.000000
02/24/12 13:33:26     submitterUsage    = 0.000000
02/24/12 13:33:26 Socket to bcotton@xxxxxxxxxxxxxxx
(<128.211.148.45:43260>) not in cache, creating one
02/24/12 13:33:26 SECMAN: command 416 NEGOTIATE to schedd
bcotton@xxxxxxxxxxxxxxx from TCP port 53927 (blocking).
02/24/12 13:33:26 SECMAN: using session carter-fe00:12491:1330098717:2
for {<128.211.148.45:43260>,<416>}.
02/24/12 13:33:26 SECMAN: found cached session id
carter-fe00:12491:1330098717:2 for {<128.211.148.45:43260>,<416>}.
Enact = "YES"
Subsystem = "NEGOTIATOR"
Sid = "carter-fe00:12491:1330098717:2"
CryptoMethods = "3DES,BLOWFISH"
TriedAuthentication = true
UseSession = "YES"
AuthMethodsList = ""
AuthMethods = "FS"
Encryption = "NO"
OutgoingNegotiation = "PREFERRED"
Integrity = "NO"
Command = 416
ValidCommands = "416,493,60007,1111,441,443,457,471"
SessionDuration = "86400"
User = "unauthenticated@unmapped"
CurrentTime = time()
MyRemoteUserName = "unauthenticated@unmapped"
SessionLease = 3600
RemoteVersion = "$CondorVersion: 7.6.4 Oct 20 2011 BuildID: 379441 $"
AuthRequired = false
Authentication = "YES"
02/24/12 13:33:26 SECMAN: Security Policy:
Enact = "YES"
Subsystem = "NEGOTIATOR"
Sid = "carter-fe00:12491:1330098717:2"
CryptoMethods = "3DES,BLOWFISH"
TriedAuthentication = true
UseSession = "YES"
AuthMethodsList = ""
AuthMethods = "FS"
Encryption = "NO"
OutgoingNegotiation = "PREFERRED"
Integrity = "NO"
Command = 416
ValidCommands = "416,493,60007,1111,441,443,457,471"
SessionDuration = "86400"
User = "unauthenticated@unmapped"
CurrentTime = time()
MyRemoteUserName = "unauthenticated@unmapped"
SessionLease = 3600
RemoteVersion = "$CondorVersion: 7.6.4 Oct 20 2011 BuildID: 379441 $"
AuthRequired = false
Authentication = "YES"
02/24/12 13:33:26 SECMAN: negotiating security for command 416.
02/24/12 13:33:26 SECMAN: sending DC_AUTHENTICATE command
02/24/12 13:33:26 SECMAN: sending following classad:
Enact = "YES"
Subsystem = "NEGOTIATOR"
Sid = "carter-fe00:12491:1330098717:2"
CryptoMethods = "3DES,BLOWFISH"
TriedAuthentication = true
UseSession = "YES"
AuthMethodsList = ""
AuthMethods = "FS"
Encryption = "NO"
ServerCommandSock = "<128.210.9.99:44546>"
OutgoingNegotiation = "PREFERRED"
Integrity = "NO"
Command = 416
ValidCommands = "416,493,60007,1111,441,443,457,471"
SessionDuration = "86400"
User = "unauthenticated@unmapped"
CurrentTime = time()
MyRemoteUserName = "unauthenticated@unmapped"
SessionLease = 3600
RemoteVersion = "$CondorVersion: 7.6.4 Oct 20 2011 BuildID: 379441 $"
AuthRequired = false
Authentication = "YES"
02/24/12 13:33:26 SECMAN: resume, other side is $CondorVersion: 7.6.4
Oct 20 2011 BuildID: 379441 $, NOT reauthenticating.
02/24/12 13:33:26 Getting authenticated user from cached session:
unauthenticated@unmapped
02/24/12 13:33:26 SECMAN: startCommand succeeded.
02/24/12 13:33:26 Authorizing server 'unauthenticated@unmapped/128.211.148.45'.
02/24/12 13:33:26 SocketCache:  Found unused slot 0
02/24/12 13:33:26     Sending SEND_JOB_INFO/eom
02/24/12 13:33:26     Getting reply from schedd ...
02/24/12 13:33:26     Got JOB_INFO command; getting classad/eom
02/24/12 13:33:26     Request 00001.00000:
02/24/12 13:33:26 matchmakingAlgorithm: limit 1.000000 used 0.000000
pieLeft 1.000000
02/24/12 13:33:26       Sending PERMISSION, claim id, startdAd to schedd
02/24/12 13:33:26       Matched 1.0 bcotton@xxxxxxxxxxxxxxx
<128.211.148.45:43260>  preempting none<128.210.9.99:43282>
steele-cgf.rcac.purdue.edu
02/24/12 13:33:26       Notifying the accountant
02/24/12 13:33:26       Successfully matched with steele-cgf.rcac.purdue.edu
02/24/12 13:33:26     Over submitter resource limit (1.000000, used
1.000000) ... only consider startd ranks
02/24/12 13:33:26     Sending SEND_JOB_INFO/eom
02/24/12 13:33:26 condor_write(): Socket closed when trying to write
13 bytes to schedd bcotton@xxxxxxxxxxxxxxx, fd is 6
02/24/12 13:33:26 Buf::write(): condor_write() failed
02/24/12 13:33:26     Failed to send SEND_JOB_INFO/eom
02/24/12 13:33:26   Error: Ignoring submitter for this cycle
02/24/12 13:33:26  resources used by bcotton@xxxxxxxxxxxxxxx are 1.000000
02/24/12 13:33:26  resources used scheddUsed= 1.000000
02/24/12 13:33:26  negotiateWithGroup resources used scheddAds length 0
02/24/12 13:33:26 ---------- Finished Negotiation Cycle ----------
02/24/12 13:33:56 Getting monitoring info for pid 28217

02/24/12 13:33:26 SECMAN: other side is $CondorVersion: 7.6.4 Oct 20
2011 BuildID: 379441 $, NOT reauthenticating.
02/24/12 13:33:26 DC_AUTHENTICATE: not authenticating.
02/24/12 13:33:26 DC_AUTHENTICATE: Success.
02/24/12 13:33:26 PERMISSION GRANTED to unauthenticated@unmapped from
host 128.210.9.99 for command 416 (NEGOTIATE), access level
NEGOTIATOR: reason: cached result for NEGOTIATOR; see first case for
the full reason
02/24/12 13:33:26
02/24/12 13:33:26 Entered negotiate
02/24/12 13:33:26 Using negotiation protocol: NEGOTIATE
02/24/12 13:33:26 *** SwapSpace = 2147483647
02/24/12 13:33:26 *** ReservedSwap = 0
02/24/12 13:33:26 *** Shadow Size Estimate = 1800
02/24/12 13:33:26 *** Start Limit For Swap = 1193046
02/24/12 13:33:26 Negotiating for owner: bcotton@xxxxxxxxxxxxxxx
(flock level 1, pool steele-cgf.rcac.purdue.edu)
02/24/12 13:33:26
AutoCluster:config(JobUniverse,LastCheckpointPlatform,NumCkpts)
invoked
02/24/12 13:33:26 Reusing prioritized runnable job list because
nothing has changed.
02/24/12 13:33:26 Job 1.0: is runnable
02/24/12 13:33:26 Sent job 1.0 (autocluster=1) to the negotiator
02/24/12 13:33:26 Can't receive ClaimId from negotiator
02/24/12 13:33:26 Failed to send NEGOTIATE to<128.210.9.99:53927>:
02/24/12 13:33:51 DC_INVALIDATE_KEY: security session
carter-fe00:12491:1330108113:236 lifetime expired.
02/24/12 13:33:51 DC_INVALIDATE_KEY: removed key id
carter-fe00:12491:1330108113:236.

#
# SchedLog
#