
[Condor-users] Condor 7 + Windows Vista user policies



Dear all,

I am currently running some small tests on a single computer with Windows Vista SP1 and Condor 7.0.3 installed. The machine acts as both the management unit and the computation node.

My jobs ran successfully as long as I used a local administrator account. However, they no longer run once I switch to a domain account with (restricted) administrator rights. Each job starts running for a few seconds and is then set back to "idle". The Condor output file states: "All jobs matched but rejected for unknown reasons".

The following information appeared in the shadow log and the StarterLog.slot1 file.

Shadow log output:

8/18 16:21:04 ******************************************************
8/18 16:21:04 ** condor_shadow (CONDOR_SHADOW) STARTING UP
8/18 16:21:04 ** C:\condor\bin\condor_shadow.exe
8/18 16:21:04 ** $CondorVersion: 7.0.3 Jun 20 2008 BuildID: 91405 $
8/18 16:21:04 ** $CondorPlatform: INTEL-WINNT50 $
8/18 16:21:04 ** PID = 11408
8/18 16:21:04 ** Log last touched 8/18 15:21:00
8/18 16:21:04 ******************************************************
8/18 16:21:04 Using config source: C:\condor\condor_config
8/18 16:21:04 Using local config sources: 
8/18 16:21:04    C:\condor/condor_config.local
8/18 16:21:04 DaemonCore: Command Socket at <10.2.177.2:65237>
8/18 16:21:04 Initializing a VANILLA shadow for job 5.2
8/18 16:21:04 (5.0) (11596): condor_read(): recv() returned -1, errno = 10054, assuming failure reading 5 bytes from <10.2.177.2:55898>.
8/18 16:21:04 (5.0) (11596): IO: Failed to read packet header
8/18 16:21:04 (5.0) (11596): Can no longer talk to condor_starter <10.2.177.2:55898>
8/18 16:21:04 (5.0) (11596): Trying to reconnect to disconnected job
8/18 16:21:04 (5.0) (11596): LastJobLeaseRenewal: 1219069264 Mon Aug 18 16:21:04 2008
8/18 16:21:04 (5.0) (11596): JobLeaseDuration: 1200 seconds
8/18 16:21:04 (5.0) (11596): JobLeaseDuration remaining: 1200
8/18 16:21:04 (5.0) (11596): Attempting to locate disconnected starter
8/18 16:21:04 (5.0) (11596): locateStarter(): ClaimId (<10.2.177.2:55898>#1218719401#179#4286547542) and GlobalJobId ( des-josef64.eu.trimblecorp.net#1219062330#5.0 ) not found
8/18 16:21:04 (5.0) (11596): Reconnect FAILED: Job not found at execution machine
8/18 16:21:04 (5.2) (11408): Request to run on <10.2.177.2:55898> was ACCEPTED
8/18 16:21:04 (5.0) (11596): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 107
8/18 16:21:05 (5.2) (11408): condor_read(): recv() returned -1, errno = 10054, assuming failure reading 5 bytes from <10.2.177.2:55898>.
8/18 16:21:05 (5.2) (11408): IO: Failed to read packet header
8/18 16:21:05 (5.2) (11408): Can no longer talk to condor_starter <10.2.177.2:55898>
8/18 16:21:05 (5.2) (11408): Trying to reconnect to disconnected job
8/18 16:21:05 (5.2) (11408): LastJobLeaseRenewal: 1219069265 Mon Aug 18 16:21:05 2008
8/18 16:21:05 (5.2) (11408): JobLeaseDuration: 1200 seconds
8/18 16:21:05 (5.2) (11408): JobLeaseDuration remaining: 1200
8/18 16:21:05 (5.2) (11408): Attempting to locate disconnected starter
8/18 16:21:05 (5.2) (11408): locateStarter(): ClaimId (<10.2.177.2:55898>#1218719401#180#2803095825) and GlobalJobId ( des-josef64.eu.trimblecorp.net#1219062330#5.2 ) not found
8/18 16:21:05 (5.2) (11408): Reconnect FAILED: Job not found at execution machine
8/18 16:21:05 (5.2) (11408): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 107
8/18 16:21:08 (5.1) (8960): Attempting to locate disconnected starter
8/18 16:21:08 (5.1) (8960): locateStarter(): ClaimId (<10.2.177.2:55898>#1218719401#178#1006994303) and GlobalJobId ( des-josef64.eu.trimblecorp.net#1219062330#5.1 ) not found
8/18 16:21:08 (5.1) (8960): Reconnect FAILED: Job not found at execution machine
8/18 16:21:08 (5.1) (8960): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 107

StarterLog.slot1 output:

8/18 16:20:59 ******************************************************
8/18 16:20:59 ** condor_starter (CONDOR_STARTER) STARTING UP
8/18 16:20:59 ** C:\condor\bin\condor_starter.exe
8/18 16:20:59 ** $CondorVersion: 7.0.3 Jun 20 2008 BuildID: 91405 $
8/18 16:20:59 ** $CondorPlatform: INTEL-WINNT50 $
8/18 16:20:59 ** PID = 11628
8/18 16:20:59 ** Log last touched 8/18 15:16:00
8/18 16:20:59 ******************************************************
8/18 16:20:59 Using config source: C:\condor\condor_config
8/18 16:20:59 Using local config sources: 
8/18 16:20:59    C:\condor/condor_config.local
8/18 16:20:59 DaemonCore: Command Socket at <10.2.177.2:65235>
8/18 16:20:59 Setting resource limits not implemented!
8/18 16:21:04 Communicating with shadow <10.2.177.2:65224>
8/18 16:21:04 Submitting machine is "des-josef64.eu.trimblecorp.net"
8/18 16:21:04 setting the orig job name in starter
8/18 16:21:04 setting the orig job iwd in starter
8/18 16:21:04 ERROR: Could not locate valid credential for user 'jbraun@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
8/18 16:21:04 Could not initialize user_priv as "DES-JOSEF64.eu.timblecorp.net\jbraun".
	Make sure this account's password is securely stored with condor_store_cred.
8/18 16:21:04 ERROR: Failed to determine what user to run this job as, aborting
8/18 16:21:04 Failed to initialize JobInfoCommunicator, aborting
8/18 16:21:04 Unable to start job.
8/18 16:21:04 **** condor_starter (condor_STARTER) EXITING WITH STATUS 1
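
In case it helps anyone reading logs like the two above, here is a minimal sketch of a filter that keeps only the lines that look like problems. The function name find_problem_lines and the keyword list are my own assumptions, chosen from the excerpts above; nothing here is part of Condor itself.

```python
import re

# Matches the "M/D HH:MM:SS " timestamp that starts the log lines shown above.
TIMESTAMP = re.compile(r"^\d{1,2}/\d{1,2} \d{2}:\d{2}:\d{2} ")

# Assumption: these keywords were picked by eye from the excerpts above.
KEYWORDS = ("ERROR", "FAILED", "Could not", "Unable to")


def find_problem_lines(log_text):
    """Return the timestamped lines that contain one of KEYWORDS."""
    hits = []
    for line in log_text.splitlines():
        if TIMESTAMP.match(line) and any(key in line for key in KEYWORDS):
            hits.append(line.strip())
    return hits


# Three lines copied from the StarterLog.slot1 excerpt above.
sample = """8/18 16:21:04 setting the orig job iwd in starter
8/18 16:21:04 ERROR: Failed to determine what user to run this job as, aborting
8/18 16:21:04 Unable to start job.
"""

for hit in find_problem_lines(sample):
    print(hit)
```

Continuation lines without a timestamp (such as the indented condor_store_cred hint) are skipped by this sketch, so it is only a first pass and not a replacement for reading the full log.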


Does anyone have an idea what might be causing this problem? I would be grateful for any hint or additional information. Thank you very much in advance.

Best regards,

Thomas Laue
 
-- 
INPHO GmbH   *   Smaragdweg 1   *   70174 Stuttgart   *   Germany
phone: +49 711 2288 10  *  fax: +49 711 2288 111  *  web: www.inpho.de
place of business: Stuttgart    *   managing director: Johannes Saile
commercial register: Stuttgart, HRB 9586
Leader in Photogrammetry and Digital Surface Modelling
Please visit www.inpho.de