[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Condor-users] Need help diagnosing flocking problems



On Thu, Feb 23, 2012 at 6:11 PM, Dan Bradley <dan@xxxxxxxxxxxx> wrote:

> I've reproduced the problem you reported.
>
I'm glad to know I'm not crazy (in this specific instance)

> I haven't had time to fully understand what is going wrong.  I did find that
> the following change in configuration made the problem go away:

I've set
> SEC_DEFAULT_AUTHENTICATION_METHODS = FS
and I'm still seeing problems. Am I missing something else?

#
# Negotiator log
#
02/24/12 13:33:26 SECMAN: resume, other side is $CondorVersion: 7.6.4
Oct 20 2011 BuildID: 379441 $, NOT reauthenticating.
02/24/12 13:33:26 Getting authenticated user from cached session:
unauthenticated@unmapped
02/24/12 13:33:26 SECMAN: startCommand succeeded.
02/24/12 13:33:26 Authorizing server 'unauthenticated@unmapped/128.210.9.99'.
02/24/12 13:33:26 Got ads: 8 public and 1 private
02/24/12 13:33:26 Public ads include 1 submitter, 1 startd
02/24/12 13:33:26 Phase 2:  Performing accounting ...
02/24/12 13:33:26 Entering compute_significant_attrs()
02/24/12 13:33:26 Leaving compute_significant_attrs() -
result=JobUniverse,LastCheckpointPlatform,NumCkpts
02/24/12 13:33:26 Phase 3:  Sorting submitter ads by priority ...
02/24/12 13:33:26 Phase 4.1:  Negotiating with schedds ...
02/24/12 13:33:26     numSlots = 1
02/24/12 13:33:26     slotWeightTotal = 1.000000
02/24/12 13:33:26     pieLeft = 1.000
02/24/12 13:33:26     NormalFactor = 1.000000
02/24/12 13:33:26     MaxPrioValue = 0.799077
02/24/12 13:33:26     NumSubmitterAds = 1
02/24/12 13:33:26   Negotiating with bcotton@xxxxxxxxxxxxxxx at
<128.211.148.45:43260>
02/24/12 13:33:26 0 seconds so far
02/24/12 13:33:26   Calculating submitter limit with the following parameters
02/24/12 13:33:26     SubmitterPrio       = 0.799077
02/24/12 13:33:26     SubmitterPrioFactor = 1.000000
02/24/12 13:33:26     submitterShare      = 1.000000
02/24/12 13:33:26     submitterAbsShare   = 1.000000
02/24/12 13:33:26     submitterLimit    = 1.000000
02/24/12 13:33:26     submitterUsage    = 0.000000
02/24/12 13:33:26 Socket to bcotton@xxxxxxxxxxxxxxx
(<128.211.148.45:43260>) not in cache, creating one
02/24/12 13:33:26 SECMAN: command 416 NEGOTIATE to schedd
bcotton@xxxxxxxxxxxxxxx from TCP port 53927 (blocking).
02/24/12 13:33:26 SECMAN: using session carter-fe00:12491:1330098717:2
for {<128.211.148.45:43260>,<416>}.
02/24/12 13:33:26 SECMAN: found cached session id
carter-fe00:12491:1330098717:2 for {<128.211.148.45:43260>,<416>}.
Enact = "YES"
Subsystem = "NEGOTIATOR"
Sid = "carter-fe00:12491:1330098717:2"
CryptoMethods = "3DES,BLOWFISH"
TriedAuthentication = true
UseSession = "YES"
AuthMethodsList = ""
AuthMethods = "FS"
Encryption = "NO"
OutgoingNegotiation = "PREFERRED"
Integrity = "NO"
Command = 416
ValidCommands = "416,493,60007,1111,441,443,457,471"
SessionDuration = "86400"
User = "unauthenticated@unmapped"
CurrentTime = time()
MyRemoteUserName = "unauthenticated@unmapped"
SessionLease = 3600
RemoteVersion = "$CondorVersion: 7.6.4 Oct 20 2011 BuildID: 379441 $"
AuthRequired = false
Authentication = "YES"
02/24/12 13:33:26 SECMAN: Security Policy:
Enact = "YES"
Subsystem = "NEGOTIATOR"
Sid = "carter-fe00:12491:1330098717:2"
CryptoMethods = "3DES,BLOWFISH"
TriedAuthentication = true
UseSession = "YES"
AuthMethodsList = ""
AuthMethods = "FS"
Encryption = "NO"
OutgoingNegotiation = "PREFERRED"
Integrity = "NO"
Command = 416
ValidCommands = "416,493,60007,1111,441,443,457,471"
SessionDuration = "86400"
User = "unauthenticated@unmapped"
CurrentTime = time()
MyRemoteUserName = "unauthenticated@unmapped"
SessionLease = 3600
RemoteVersion = "$CondorVersion: 7.6.4 Oct 20 2011 BuildID: 379441 $"
AuthRequired = false
Authentication = "YES"
02/24/12 13:33:26 SECMAN: negotiating security for command 416.
02/24/12 13:33:26 SECMAN: sending DC_AUTHENTICATE command
02/24/12 13:33:26 SECMAN: sending following classad:
Enact = "YES"
Subsystem = "NEGOTIATOR"
Sid = "carter-fe00:12491:1330098717:2"
CryptoMethods = "3DES,BLOWFISH"
TriedAuthentication = true
UseSession = "YES"
AuthMethodsList = ""
AuthMethods = "FS"
Encryption = "NO"
ServerCommandSock = "<128.210.9.99:44546>"
OutgoingNegotiation = "PREFERRED"
Integrity = "NO"
Command = 416
ValidCommands = "416,493,60007,1111,441,443,457,471"
SessionDuration = "86400"
User = "unauthenticated@unmapped"
CurrentTime = time()
MyRemoteUserName = "unauthenticated@unmapped"
SessionLease = 3600
RemoteVersion = "$CondorVersion: 7.6.4 Oct 20 2011 BuildID: 379441 $"
AuthRequired = false
Authentication = "YES"
02/24/12 13:33:26 SECMAN: resume, other side is $CondorVersion: 7.6.4
Oct 20 2011 BuildID: 379441 $, NOT reauthenticating.
02/24/12 13:33:26 Getting authenticated user from cached session:
unauthenticated@unmapped
02/24/12 13:33:26 SECMAN: startCommand succeeded.
02/24/12 13:33:26 Authorizing server 'unauthenticated@unmapped/128.211.148.45'.
02/24/12 13:33:26 SocketCache:  Found unused slot 0
02/24/12 13:33:26     Sending SEND_JOB_INFO/eom
02/24/12 13:33:26     Getting reply from schedd ...
02/24/12 13:33:26     Got JOB_INFO command; getting classad/eom
02/24/12 13:33:26     Request 00001.00000:
02/24/12 13:33:26 matchmakingAlgorithm: limit 1.000000 used 0.000000
pieLeft 1.000000
02/24/12 13:33:26       Sending PERMISSION, claim id, startdAd to schedd
02/24/12 13:33:26       Matched 1.0 bcotton@xxxxxxxxxxxxxxx
<128.211.148.45:43260> preempting none <128.210.9.99:43282>
steele-cgf.rcac.purdue.edu
02/24/12 13:33:26       Notifying the accountant
02/24/12 13:33:26       Successfully matched with steele-cgf.rcac.purdue.edu
02/24/12 13:33:26     Over submitter resource limit (1.000000, used
1.000000) ... only consider startd ranks
02/24/12 13:33:26     Sending SEND_JOB_INFO/eom
02/24/12 13:33:26 condor_write(): Socket closed when trying to write
13 bytes to schedd bcotton@xxxxxxxxxxxxxxx, fd is 6
02/24/12 13:33:26 Buf::write(): condor_write() failed
02/24/12 13:33:26     Failed to send SEND_JOB_INFO/eom
02/24/12 13:33:26   Error: Ignoring submitter for this cycle
02/24/12 13:33:26  resources used by bcotton@xxxxxxxxxxxxxxx are 1.000000
02/24/12 13:33:26  resources used scheddUsed= 1.000000
02/24/12 13:33:26  negotiateWithGroup resources used scheddAds length 0
02/24/12 13:33:26 ---------- Finished Negotiation Cycle ----------
02/24/12 13:33:56 Getting monitoring info for pid 28217

02/24/12 13:33:26 SECMAN: other side is $CondorVersion: 7.6.4 Oct 20
2011 BuildID: 379441 $, NOT reauthenticating.
02/24/12 13:33:26 DC_AUTHENTICATE: not authenticating.
02/24/12 13:33:26 DC_AUTHENTICATE: Success.
02/24/12 13:33:26 PERMISSION GRANTED to unauthenticated@unmapped from
host 128.210.9.99 for command 416 (NEGOTIATE), access level
NEGOTIATOR: reason: cached result for NEGOTIATOR; see first case for
the full reason
02/24/12 13:33:26
02/24/12 13:33:26 Entered negotiate
02/24/12 13:33:26 Using negotiation protocol: NEGOTIATE
02/24/12 13:33:26 *** SwapSpace = 2147483647
02/24/12 13:33:26 *** ReservedSwap = 0
02/24/12 13:33:26 *** Shadow Size Estimate = 1800
02/24/12 13:33:26 *** Start Limit For Swap = 1193046
02/24/12 13:33:26 Negotiating for owner: bcotton@xxxxxxxxxxxxxxx
(flock level 1, pool steele-cgf.rcac.purdue.edu)
02/24/12 13:33:26
AutoCluster:config(JobUniverse,LastCheckpointPlatform,NumCkpts)
invoked
02/24/12 13:33:26 Reusing prioritized runnable job list because
nothing has changed.
02/24/12 13:33:26 Job 1.0: is runnable
02/24/12 13:33:26 Sent job 1.0 (autocluster=1) to the negotiator
02/24/12 13:33:26 Can't receive ClaimId from negotiator
02/24/12 13:33:26 Failed to send NEGOTIATE to <128.210.9.99:53927>:
02/24/12 13:33:51 DC_INVALIDATE_KEY: security session
carter-fe00:12491:1330108113:236 lifetime expired.
02/24/12 13:33:51 DC_INVALIDATE_KEY: removed key id
carter-fe00:12491:1330108113:236.

#
# SchedLog
#

-- 
Ben Cotton
Systems Research Engineer
IT Research Systems
Purdue University