
[Condor-users] jobs don't run when using condor_credd



Hi,
 
I am trying to set up condor_credd on Windows XP. I have a central manager machine (nes30700) and one submit/execute (i.e. slave) machine (nes15300). The slave machine is configured to always run jobs:
 
=================================================================
> condor_status
 
Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime
 
vm1@NES30700. WINNT51     INTEL  Owner      Idle       0.040  1023  0+00:05:15
vm2@NES30700. WINNT51     INTEL  Owner      Idle       0.000  1023  0+00:05:16
nes15300.land WINNT51     INTEL  Unclaimed  Idle       -0.010 1022  0+00:09:55
=================================================================
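To check which machines are free to take jobs, the condor_status output above can be parsed with a few lines of Python. This is a minimal sketch, assuming the default eight-column layout shown here; parse_condor_status is a hypothetical helper, not part of Condor.

```python
# Hypothetical helper (not part of Condor): parse machine rows from
# condor_status output like the one pasted above. Assumes the default
# eight whitespace-separated columns; wrapped or truncated rows are skipped.

STATUS_COLUMNS = ["Name", "OpSys", "Arch", "State", "Activity",
                  "LoadAv", "Mem", "ActvtyTime"]


def parse_condor_status(text):
    """Return one dict per machine row found in condor_status output."""
    machines = []
    for line in text.splitlines():
        fields = line.split()
        # Machine rows have exactly eight fields and an ActvtyTime of the
        # form D+HH:MM:SS; this skips blank lines and the header row.
        if len(fields) != len(STATUS_COLUMNS) or "+" not in fields[-1]:
            continue
        machines.append(dict(zip(STATUS_COLUMNS, fields)))
    return machines


sample = """\
Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime

vm1@NES30700. WINNT51     INTEL  Owner      Idle       0.040  1023  0+00:05:15
vm2@NES30700. WINNT51     INTEL  Owner      Idle       0.000  1023  0+00:05:16
nes15300.land WINNT51     INTEL  Unclaimed  Idle       -0.010 1022  0+00:09:55
"""

for machine in parse_condor_status(sample):
    print(machine["Name"], machine["State"])
```

On this output it lists the two vm slots on NES30700 as Owner and nes15300.land as Unclaimed.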
 
To run jobs, I had to use "condor_store_cred" to store my password. I did this on both the central manager and the slave machine. (Is that correct?)
Once that was done, I could successfully run a test program using condor_submit.
 
I want to use a shared filesystem, so I tried to set up condor_credd. I did the following:
1. copied the example file (etc/condor_config.local.credd) into condor_config.local in the condor main directory on both the central manager and the slave machines;
2. added the following lines to the condor_config file (on both the central manager and the slave machines):
    STARTER_ALLOW_RUNAS_OWNER = True
    CREDD_HOST = nes30700.lands.resnet.qg
    CREDD_CACHE_LOCALLY = True
    SEC_CLIENT_AUTHENTICATION_METHODS = NTSSPI, PASSWORD
3. modified the condor_config file (on both the central manager and the slave machines):
   COLLECTOR_NAME = QCCCE_condor
   where "QCCCE_condor" is the name of my condor pool
4. started condor on both the central manager and the slave machines (using net start condor)
   The condor_master, condor_collector, condor_credd, condor_negotiator, condor_schedd and condor_startd daemons started on both machines. I thought condor_negotiator and condor_collector were only supposed to run on the central manager, but they were running on both the central manager and the slave machine.
5. added "run_as_owner = true" to the job's submit file
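Since the same settings have to be present on both the central manager and the slave machine, a small script can compare a config file against the values from steps 2 and 3. This is a rough sketch under stated assumptions: it only reads plain NAME = value lines, the config snippets in it are hypothetical, and read_settings/check_settings are illustrative helpers, not Condor tools (condor_config_val is the real way to query a running configuration).

```python
# Hypothetical checker (not part of Condor) for the settings from steps 2
# and 3. It understands only plain "NAME = value" lines: comments are
# skipped, names are compared in upper case, and Condor's macro expansion,
# line continuations and include rules are NOT implemented.

EXPECTED = {
    "STARTER_ALLOW_RUNAS_OWNER": "True",
    "CREDD_HOST": "nes30700.lands.resnet.qg",
    "CREDD_CACHE_LOCALLY": "True",
    "SEC_CLIENT_AUTHENTICATION_METHODS": "NTSSPI, PASSWORD",
    "COLLECTOR_NAME": "QCCCE_condor",
}


def read_settings(text):
    """Map each upper-cased NAME to its value; later assignments win."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip().upper()] = value.strip()
    return settings


def check_settings(text, expected=EXPECTED):
    """Return {name: (found, wanted)} for each missing or differing setting."""
    found = read_settings(text)
    return {name: (found.get(name), wanted)
            for name, wanted in expected.items()
            if found.get(name) != wanted}


# Hypothetical file contents. The global file is fed first and the local
# file second, so local values override global ones (an assumption that
# mirrors the "later assignments win" rule in read_settings).
global_cfg = "COLLECTOR_NAME = QCCCE_condor\nCREDD_HOST = nes30700.lands.resnet.qg\n"
local_cfg = "STARTER_ALLOW_RUNAS_OWNER = True\nCREDD_CACHE_LOCALLY = True\n"
print(check_settings(global_cfg + local_cfg))
```

With these made-up snippets it reports only SEC_CLIENT_AUTHENTICATION_METHODS as missing.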
 
When I submit a job, it appears in the queue but stays idle and never runs:
=================================================================
> condor_q

-- Submitter: NES30700.lands.resnet.qg : <131.242.63.124:1144> : NES30700.lands.resnet.qg
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD              
   6.0   jeffreysj       3/7  14:07   0+00:00:00 I  0   9.8  output_name.exe  
 
1 jobs; 1 idle, 0 running, 0 held
=================================================================
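To confirm from a script that the job is sitting in the idle state rather than running or held, the ST column of condor_q can be read as below. A minimal sketch, assuming the default column layout shown above and a command name without spaces; parse_condor_q and STATUS_NAMES are illustrative, not part of Condor.

```python
import re

# Hypothetical parser (not part of Condor) for job rows in condor_q output
# like the one above. Assumes the default columns ID, OWNER, SUBMITTED
# (date and time), RUN_TIME, ST, PRI, SIZE, CMD and a CMD without spaces.
JOB_ROW = re.compile(
    r"^\s*(?P<id>\d+\.\d+)\s+(?P<owner>\S+)\s+(?P<date>\d+/\d+)\s+"
    r"(?P<time>\d+:\d+)\s+(?P<run_time>\d+\+\d\d:\d\d:\d\d)\s+"
    r"(?P<status>[A-Z])\s+(?P<pri>-?\d+)\s+(?P<size>[\d.]+)\s+(?P<cmd>\S+)"
)

# Only the three states counted in condor_q's summary line are named here.
STATUS_NAMES = {"I": "idle", "R": "running", "H": "held"}


def parse_condor_q(text):
    """Return one dict per job row; header and summary lines do not match."""
    jobs = []
    for line in text.splitlines():
        m = JOB_ROW.match(line)
        if m:
            job = m.groupdict()
            job["state"] = STATUS_NAMES.get(job["status"], "other")
            jobs.append(job)
    return jobs


sample_q = """\
-- Submitter: NES30700.lands.resnet.qg : <131.242.63.124:1144> : NES30700.lands.resnet.qg
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
   6.0   jeffreysj       3/7  14:07   0+00:00:00 I  0   9.8  output_name.exe

1 jobs; 1 idle, 0 running, 0 held
"""

for job in parse_condor_q(sample_q):
    print(job["id"], job["owner"], job["state"])
```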
 
Before I installed condor_credd, this same job ran immediately.
 
The credd log file contains an authentication error:
 
=================================================================
3/8 11:53:30 ******************************************************
3/8 11:53:30 ** condor_credd.exe (CONDOR_CREDD) STARTING UP
3/8 11:53:30 ** D:\condor\bin\condor_credd.exe
3/8 11:53:30 ** $CondorVersion: 6.9.1 Jan  8 2007 $
3/8 11:53:30 ** $CondorPlatform: INTEL-WINNT50 $
3/8 11:53:30 ** PID = 2180
3/8 11:53:30 ** Log last touched 3/8 11:34:43
3/8 11:53:30 ******************************************************
3/8 11:53:30 Using config source: D:\condor\condor_config
3/8 11:53:30 Using local config sources:
3/8 11:53:30    D:\condor/condor_config.local
3/8 11:53:30 DaemonCore: Command Socket at <131.242.63.124:9620>
3/8 11:53:30 main_init() called
3/8 11:53:30 Calling Timer handler 0 (dc_touch_log_file)
3/8 11:53:31 Return from Timer handler 0 (dc_touch_log_file)
3/8 11:53:31 Calling Timer handler 1 (check_session_cache)
3/8 11:53:31 Return from Timer handler 1 (check_session_cache)
3/8 11:53:31 Calling Timer handler 2 (handle_cookie_refresh)
3/8 11:53:31 Return from Timer handler 2 (handle_cookie_refresh)
3/8 11:53:31 Calling Timer handler 3 (self_monitor)
3/8 11:53:31 Return from Timer handler 3 (self_monitor)
3/8 11:53:31 Calling Timer handler 6 (update_collector)
3/8 11:53:31 Return from Timer handler 6 (update_collector)
3/8 11:53:31 Calling Timer handler 5 (DaemonCore::SendAliveToParent)
3/8 11:53:31 Return from Timer handler 5 (DaemonCore::SendAliveToParent)
3/8 11:53:31 Calling Handler <<131.242.63.124:9618>>
3/8 11:53:31 Return from Handler <<131.242.63.124:9618>>
3/8 11:54:31 Calling Timer handler 7 (dc_touch_log_file)
3/8 11:54:31 Return from Timer handler 7 (dc_touch_log_file)
3/8 11:55:31 Calling Timer handler 8 (dc_touch_log_file)
3/8 11:55:31 Return from Timer handler 8 (dc_touch_log_file)
3/8 11:56:12 Calling Handler <DaemonCore::HandleReqSocketHandler>
3/8 11:56:12 getStoredCredential(): Could not locate credential for user 'condor_pool@xxxxxxxxxxxxxxx'
3/8 11:56:12 getStoredCredential(): Could not locate credential for user 'condor_pool@xxxxxxxxxxxxxxx'
3/8 11:56:32 AUTHENTICATE: no available authentication methods succeeded, failing!
3/8 11:56:32 DC_AUTHENTICATE: authenticate failed: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using PASSWORD
3/8 11:56:32 Return from Handler <DaemonCore::HandleReqSocketHandler>
3/8 11:56:32 Calling Timer handler 9 (dc_touch_log_file)
3/8 11:56:32 Return from Timer handler 9 (dc_touch_log_file)
3/8 11:57:13 Calling Handler <DaemonCore::HandleReqSocketHandler>
3/8 11:57:13 Calling HandleReq <store_cred_handler> (0)
3/8 11:57:13 Return from HandleReq <store_cred_handler>
3/8 11:57:13 Return from Handler <DaemonCore::HandleReqSocketHandler>
3/8 11:57:31 Calling Timer handler 3 (self_monitor)
3/8 11:57:31 Return from Timer handler 3 (self_monitor)
3/8 11:57:32 Calling Timer handler 11 (dc_touch_log_file)
3/8 11:57:32 Return from Timer handler 11 (dc_touch_log_file)
=================================================================
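Because the job stays idle while condor_credd logs these errors, it can help to pull just the credential and authentication lines out of a longer log. A minimal sketch, assuming the message wording of this 6.9.1 log; scan_credd_log is a hypothetical helper, not a Condor tool.

```python
import re

# Hypothetical scanner (not part of Condor) for a condor_credd log like the
# one above. Both patterns copy message text from that log; other Condor
# versions may word these messages differently.
STAMP = r"(?P<stamp>\d+/\d+ \d+:\d+:\d+)"
MISSING_CRED = re.compile(
    STAMP + r" getStoredCredential\(\): "
    r"Could not locate credential for user '(?P<user>[^']*)'")
AUTH_FAILED = re.compile(
    STAMP + r" DC_AUTHENTICATE: authenticate failed: (?P<reason>.*)$")


def scan_credd_log(text):
    """Return (missing, failures) lists of events found in the log text."""
    missing, failures = [], []
    for line in text.splitlines():
        m = MISSING_CRED.match(line)
        if m:
            missing.append((m["stamp"], m["user"]))
            continue
        m = AUTH_FAILED.match(line)
        if m:
            # The reason is a "|"-separated chain of AUTHENTICATE:code:message.
            failures.append((m["stamp"], m["reason"].split("|")))
    return missing, failures


sample_log = """\
3/8 11:56:12 Calling Handler <DaemonCore::HandleReqSocketHandler>
3/8 11:56:12 getStoredCredential(): Could not locate credential for user 'condor_pool@xxxxxxxxxxxxxxx'
3/8 11:56:12 getStoredCredential(): Could not locate credential for user 'condor_pool@xxxxxxxxxxxxxxx'
3/8 11:56:32 AUTHENTICATE: no available authentication methods succeeded, failing!
3/8 11:56:32 DC_AUTHENTICATE: authenticate failed: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using PASSWORD
"""

missing, failures = scan_credd_log(sample_log)
for stamp, user in missing:
    print(stamp, "no stored credential for", user)
for stamp, reasons in failures:
    print(stamp, "authentication failed:", "; ".join(reasons))
```

Run against the full log above, it would report the two failed credential lookups at 11:56:12 and the PASSWORD authentication failure at 11:56:32.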
 
 
Does anyone know what the problem could be?
 
cheers
steve
