
Re: [Condor-users] Windows, Credd, and run_as_owner question



Also, is CREDD_HOST defined in the condor_config for both Machine A and Machine B?
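For reference, here is a minimal sketch of that setting. It assumes the condor_credd runs on Machine A, which the CollectorLog further down suggests because it shows a CredD ad from A.dom1.jhuapl.edu. It also assumes the same line is placed in a config file that both machines read. The host name comes from the LocalCredd clause in the job's Requirements and should be treated as an example value:

```
## Hypothetical condor_config.local fragment for BOTH Machine A and Machine B.
## Assumes the credd runs on Machine A; change the host if it runs elsewhere.
CREDD_HOST = A.dom1.jhuapl.edu
```

After editing, run condor_reconfig (or restart Condor) on each machine so the daemons pick up the change. Running condor_config_val CREDD_HOST on each machine is a quick way to confirm that the value is actually being read.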
-----Original Message-----
From: Jones, Torrin A (US SSA)
Sent: Wednesday, December 05, 2007 12:38
To: 'Condor-Users Mail List'
Subject: RE: [Condor-users] Windows, Credd, and run_as_owner question

Did you also run condor_store_cred for the user you want to run as?
 
condor_store_cred add
 
-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Valencia, Matthew C.
Sent: Wednesday, December 05, 2007 10:45
To: condor-users@xxxxxxxxxxx
Subject: Re: [Condor-users] Windows, Credd, and run_as_owner question

Hi,
 
I'm trying to set up a simple Condor (6.9.5) pool where:
 
Machine A is the Collector / Negotiator / Submit machine
Machine B is the Execute machine
 
So far, I've been able to run jobs successfully *except* when I set 'run_as_owner = true' in the submit file. When I do that, the jobs just sit in the queue (the output of condor_q -an follows):
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
005.004:  Run analysis summary.  Of 2 machines,
      2 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      0 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 are available to run your job
 
WARNING:  Be advised:
   No resources matched request's constraints
   Check the Requirements expression below:
 
Requirements = (Arch == "INTEL") && (OpSys == "WINNT51") && (Disk >= DiskUsage)
&& ((Memory * 1024) >= ImageSize) && (HasFileTransfer) && (HasWindowsRunAsOwner
&& (LocalCredd =?= "A.dom1.jhuapl.edu"))
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 
 
The only difference from the cases where I don't use run_as_owner is the last two requirements (HasWindowsRunAsOwner and LocalCredd). I verified that Machine B's ClassAd has HasWindowsRunAsOwner = TRUE, but LocalCredd doesn't appear to be defined. I suspected I had misconfigured the credd, so I looked at its log file (ASDSUser is logged into Machine A):
 
 
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
12/5 13:04:51 ******************************************************
12/5 13:04:52 ** condor_credd.exe (CONDOR_CREDD) STARTING UP
12/5 13:04:52 ** C:\condor\bin\condor_credd.exe
12/5 13:04:53 ** $CondorVersion: 6.9.5 Nov 28 2007 $
12/5 13:04:53 ** $CondorPlatform: INTEL-WINNT50 $
12/5 13:04:53 ** PID = 1716
12/5 13:04:53 ** Log last touched time unavailable (No such file or directory)
12/5 13:04:53 ******************************************************
12/5 13:04:53 Using config source: C:\condor\condor_config
12/5 13:04:53 Using local config sources:
12/5 13:04:53    C:\condor/condor_config.local
12/5 13:04:53    C:\condor/condor_config.local.credd
12/5 13:04:53 DaemonCore: Command Socket at <128.244.140.226:9620>
12/5 13:04:53 main_init() called
12/5 13:04:53 Calling Handler <<128.244.140.226:9618>>
12/5 13:04:53 ZKM: setting default map to (null)
12/5 13:04:53 Return from Handler <<128.244.140.226:9618>>
12/5 13:04:54 ZKM: setting default map to (null)
12/5 13:05:16 Calling Handler <DaemonCore::HandleReqSocketHandler>
12/5 13:05:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 5 bytes from <128.244.140.110:4207>.
12/5 13:05:16 IO: Failed to read packet header
12/5 13:05:16 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 5 bytes from <128.244.140.110:4207>.
12/5 13:05:16 IO: Failed to read packet header
12/5 13:05:16 AUTHENTICATE: handshake failed!
12/5 13:05:16 DC_AUTHENTICATE: authenticate failed: AUTHENTICATE:1002:Failure performing handshake|AUTHENTICATE:1004:Failed to authenticate using PASSWORD
12/5 13:05:16 Return from Handler <DaemonCore::HandleReqSocketHandler>
12/5 13:06:34 Calling Handler <DaemonCore::HandleReqSocketHandler>
12/5 13:06:34 ZKM: setting default map to ASDSUser@jhuapl
12/5 13:06:34 Calling HandleReq <store_cred_handler> (0)
12/5 13:06:34 Return from HandleReq <store_cred_handler>
12/5 13:06:34 Return from Handler <DaemonCore::HandleReqSocketHandler>
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 
I googled a bit and thought I might have forgotten to set the condor_pool password, so I set it on both machines (condor_store_cred -c -n A.dom1.jhuapl.edu add and condor_store_cred -c -n B.dom1.jhuapl.edu add). The same behavior occurred, even though each condor_store_cred command returned 'Operation succeeded.'. Here are the contents of Machine A's MasterLog:
 
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
12/5 13:04:47 SetEnvironmentVariable failed, errno=203
12/5 13:04:47 ******************************************************
12/5 13:04:47 ** Condor (CONDOR_MASTER) STARTING UP
12/5 13:04:47 ** C:\condor\bin\condor_master.exe
12/5 13:04:47 ** $CondorVersion: 6.9.5 Nov 28 2007 $
12/5 13:04:47 ** $CondorPlatform: INTEL-WINNT50 $
12/5 13:04:47 ** PID = 772
12/5 13:04:47 ** Log last touched time unavailable (No such file or directory)
12/5 13:04:47 ******************************************************
12/5 13:04:47 Using config source: C:\condor\condor_config
12/5 13:04:47 Using local config sources:
12/5 13:04:47    C:\condor/condor_config.local
12/5 13:04:47    C:\condor/condor_config.local.credd
12/5 13:04:47 DaemonCore: Command Socket at <128.244.140.226:3975>
12/5 13:04:48 Started DaemonCore process "C:\condor/bin/condor_collector.exe", pid and pgroup = 3788
12/5 13:04:51 Started DaemonCore process "C:\condor/bin/condor_negotiator.exe", pid and pgroup = 2536
12/5 13:04:51 Started DaemonCore process "C:\condor/bin/condor_schedd.exe", pid and pgroup = 2620
12/5 13:04:51 Started DaemonCore process "C:\condor/bin/condor_credd.exe", pid and pgroup = 1716
12/5 13:04:51 ZKM: setting default map to SYSTEM@nt authority
12/5 13:04:53 ZKM: setting default map to SYSTEM@nt authority
12/5 13:04:54 ZKM: setting default map to SYSTEM@nt authority
12/5 13:04:54 ZKM: setting default map to SYSTEM@nt authority
12/5 13:04:56 ZKM: setting default map to (null)
12/5 13:08:06 ZKM: setting default map to ASDSUser@jhuapl
12/5 13:08:06 store_pool_cred: failed to receive all parameters
and my Machine A's CollectorLog (just in case that's important):

12/5 13:04:48 ******************************************************
12/5 13:04:48 ** condor_collector.exe (CONDOR_COLLECTOR) STARTING UP
12/5 13:04:48 ** C:\condor\bin\condor_collector.exe
12/5 13:04:48 ** $CondorVersion: 6.9.5 Nov 28 2007 $
12/5 13:04:48 ** $CondorPlatform: INTEL-WINNT50 $
12/5 13:04:48 ** PID = 3788
12/5 13:04:48 ** Log last touched time unavailable (No such file or directory)
12/5 13:04:48 ******************************************************
12/5 13:04:48 Using config source: C:\condor\condor_config
12/5 13:04:48 Using local config sources:
12/5 13:04:48    C:\condor/condor_config.local
12/5 13:04:48    C:\condor/condor_config.local.credd
12/5 13:04:48 DaemonCore: Command Socket at <128.244.140.226:9618>
12/5 13:04:48 In ViewServer::Init()
12/5 13:04:48 In CollectorDaemon::Init()
12/5 13:04:48 In ViewServer::Config()
12/5 13:04:48 In CollectorDaemon::Config()
12/5 13:04:48 enable: Creating stats hash table
12/5 13:04:49 ZKM: setting default map to ANONYMOUS LOGON@
12/5 13:04:51 ZKM: setting default map to (null)
12/5 13:04:52 MasterAd     : Inserting ** "< B.dom1.jhuapl.edu >"
12/5 13:04:52 stats: Inserting new hashent for 'Master':'B.dom1.jhuapl.edu':'128.244.140.110'
12/5 13:04:53 ZKM: setting default map to SYSTEM@nt authority
12/5 13:04:53 creating new table for type CredD
12/5 13:04:53 CredD: Inserting ** "< A.dom1.jhuapl.edu >"
12/5 13:04:53 stats: Inserting new hashent for 'CredD':'A.dom1.jhuapl.edu':'128.244.140.226'
12/5 13:04:54 ZKM: setting default map to SYSTEM@nt authority
12/5 13:04:54 (Sending 2 ads in response to query)
12/5 13:04:54 ZKM: setting default map to SYSTEM@nt authority
12/5 13:04:54 Got QUERY_STARTD_PVT_ADS
12/5 13:04:54 (Sending 0 ads in response to query)
12/5 13:04:54 NegotiatorAd  : Inserting ** "< A.dom1.jhuapl.edu >"
12/5 13:04:54 stats: Inserting new hashent for 'Negotiator':'A.dom1.jhuapl.edu':'128.244.140.226'
12/5 13:04:56 ZKM: setting default map to SYSTEM@nt authority
12/5 13:04:57 MasterAd     : Inserting ** "< A.dom1.jhuapl.edu >"
12/5 13:04:57 stats: Inserting new hashent for 'Master':'A.dom1.jhuapl.edu':'128.244.140.226'
12/5 13:04:58 ZKM: setting default map to SYSTEM@nt authority
12/5 13:04:58 ScheddAd     : Inserting ** "< A.dom1.jhuapl.edu , 128.244.140.226 >"
12/5 13:04:58 stats: Inserting new hashent for 'Schedd':'A.dom1.jhuapl.edu':'128.244.140.226'
12/5 13:05:03 DC_AUTHENTICATE: attempt to open invalid session A:2336:1196877304:8, failing.
12/5 13:05:04 ZKM: setting default map to ANONYMOUS LOGON@
12/5 13:05:04 WARNING:  No master ad for < slot2@xxxxxxxxxxxxxxxxx >
12/5 13:05:04 StartdAd     : Inserting ** "< slot2@xxxxxxxxxxxxxxxxx , 128.244.140.110 >"
12/5 13:05:04 stats: Inserting new hashent for 'Start':'slot2@xxxxxxxxxxxxxxxxx':'128.244.140.110'
12/5 13:05:04 StartdPvtAd  : Inserting ** "< slot2@xxxxxxxxxxxxxxxxx , 128.244.140.110 >"
12/5 13:05:04 stats: Inserting new hashent for 'StartdPvt':'slot2@xxxxxxxxxxxxxxxxx':'128.244.140.110'
12/5 13:05:08 Got INVALIDATE_STARTD_ADS
12/5 13:05:08   **** Removing stale ad: "< slot2@xxxxxxxxxxxxxxxxx , 128.244.140.110 >"
12/5 13:05:08 (Invalidated 1 ads)
12/5 13:05:08   **** Removing stale ad: "< slot2@xxxxxxxxxxxxxxxxx , 128.244.140.110 >"
12/5 13:05:08 (Invalidated 1 ads)
12/5 13:05:08 Got INVALIDATE_STARTD_ADS
12/5 13:05:08 (Invalidated 0 ads)
12/5 13:05:08 (Invalidated 0 ads)
12/5 13:05:15 ZKM: setting default map to ANONYMOUS LOGON@
12/5 13:05:15 (Sending 1 ads in response to query)
12/5 13:05:18 DaemonCore: Can't receive command request from 128.244.140.226 (perhaps a timeout?)
12/5 13:05:19 ZKM: setting default map to ANONYMOUS LOGON@
12/5 13:05:35 ZKM: setting default map to ANONYMOUS LOGON@
12/5 13:05:35 WARNING:  No master ad for < slot1@xxxxxxxxxxxxxxxxx >
12/5 13:05:35 StartdAd     : Inserting ** "< slot1@xxxxxxxxxxxxxxxxx , 128.244.140.110 >"
12/5 13:05:35 stats: Inserting new hashent for 'Start':'slot1@xxxxxxxxxxxxxxxxx':'128.244.140.110'
12/5 13:05:35 StartdPvtAd  : Inserting ** "< slot1@xxxxxxxxxxxxxxxxx , 128.244.140.110 >"
12/5 13:05:35 stats: Inserting new hashent for 'StartdPvt':'slot1@xxxxxxxxxxxxxxxxx':'128.244.140.110'
12/5 13:05:36 StartdAd     : Inserting ** "< slot2@xxxxxxxxxxxxxxxxx , 128.244.140.110 >"
12/5 13:05:36 StartdPvtAd  : Inserting ** "< slot2@xxxxxxxxxxxxxxxxx , 128.244.140.110 >"
12/5 13:06:15 ZKM: setting default map to ASDSUser@jhuapl
12/5 13:06:15 Got QUERY_STARTD_ADS
12/5 13:06:15 (Sending 2 ads in response to query)
12/5 13:06:35 SubmittorAd  : Inserting ** "< ASDSUser@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx , 128.244.140.226 >"
12/5 13:06:35 stats: Inserting new hashent for 'Submittor':'ASDSUser@xxxxxxxxxxxxxxx':'128.244.140.226'
12/5 13:06:35 ZKM: setting default map to SYSTEM@nt authority
12/5 13:06:35 (Sending 1 ads in response to query)
12/5 13:06:36 (Sending 8 ads in response to query)
12/5 13:06:36 Got QUERY_STARTD_PVT_ADS
12/5 13:06:36 (Sending 2 ads in response to query)
12/5 13:06:40 ZKM: setting default map to ASDSUser@jhuapl
12/5 13:06:40 Got QUERY_STARTD_ADS
12/5 13:06:40 (Sending 2 ads in response to query)
12/5 13:06:40 (Sending 1 ads in response to query)
12/5 13:08:06 ZKM: setting default map to ASDSUser@jhuapl
12/5 13:08:06 Got QUERY_MASTER_ADS
12/5 13:08:06 (Sending 1 ads in response to query)

------------------------------------------------------------------------------------------------------------------------------------------------------------------------

I would be very grateful for any suggestions.

Thanks, Matt