
[Condor-users] Problems matching jobs.



Hello to all of you,

I have a little issue with a type of job I am trying to submit. I have a Condor pool of 20 nodes. I initially upgraded the whole pool to version 7.05, but after reading about the issues that version was having with pre-empting jobs, I decided to downgrade the central manager to version 7.01. The description file is as follows:

#########################################################################################
# Description file for Batch File for TESTING purposes
#########################################################################################

universe = vanilla
requirements = (Arch == "INTEL" && OpSys == "WINNT51") || \
               (Arch == "INTEL" && OpSys == "WINNT52")
getenv = True
notify_user = usename@xxxxxxxxxx
initialdir = c:\condor\execute_bk
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
Transfer_input_files = c:\windows\system32\systeminfo.exe
run_as_owner = true
executable = Batch4testv2.bat
output = Batch4testv3.out.$(Process)
error = Batch4testv3.err.$(Process)
log = Batch4testv3.log
queue 10

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

 

If the job is submitted like that, it will only run on one machine. If I omit the run_as_owner line, it runs fine on all the different nodes, so this job is not a problem once the line is removed.

However, this Condor project was originally implemented to run jobs over network shares. For that I configured the pool to have a credd_host (which is the central manager), and then I created a condoruser account with read access and limited rights to run those jobs. I stored the condor_pool and condoruser credentials/passwords on all the computers set up as execute machines. When I run condor_store_cred query -c and condor_store_cred query -u condoruser, all the computers come back saying: A credential is stored and is valid.

The description file for this type of job is included at the end of this message. When I try to run this type of job, it will only run on one computer, the same computer as the other jobs. If I remove the RUN_AS_OWNER line, the central manager will try to match the job with all the pool's nodes, but the job errors out with: Logon failure: unknown user name or bad password.
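For reference, the credd setup described above corresponds roughly to configuration entries like the ones below. This is only a sketch under stated assumptions, not a copy of the actual configuration files: the host name is a placeholder (the post only says the credd runs on the central manager), and the DAEMON_LIST shown is a guess at what a central manager that also hosts the credd might run.

```
## Sketch only; values are assumptions, not the pool's real settings.

## On every machine: point at the credd. Placeholder host name.
CREDD_HOST = centralmanager.example.com

## On the central manager: run the credd next to the usual daemons (assumed list).
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, CREDD

## On the execute machines: allow the starter to run jobs as the submitting user,
## which jobs with run_as_owner = true rely on.
STARTER_ALLOW_RUNAS_OWNER = True
```

Comparing these entries between the one machine that accepts the jobs and the others may be a reasonable first check.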

Does anyone have any ideas about which log I should look into to find answers? Any suggestions for solving this issue are more than welcome.

Thanks in advance for your input,

Alex

 

###################################################
## DESCRIPTION FILE FOR CONDOR JOBS
## PREPARED BY ALEX ALAS
###################################################

UNIVERSE = VANILLA
REQUIREMENTS = (Arch == "INTEL" && OpSys == "WINNT51") || \
               (Arch == "INTEL" && OpSys == "WINNT52")
GETENV = TRUE
NOTIFY_USER = username@xxxxxxxxxx
INITIALDIR = c:\condor\execute_bk
SHOULD_TRANSFER_FILES = YES
WHEN_TO_TRANSFER_OUTPUT = ON_EXIT
TRANSFER_INPUT_FILES = \\fileserver\Sharedfolder1\Sharedfolder2\Sharedfolder3\lasEnvelop.exe
RUN_AS_OWNER = TRUE
EXECUTABLE = \\fileserver\Sharedfolder1\Sharedfolder2\Sharedfolder3\Batchfile_lasEnvelop1.bat
OUTPUT = Batchfile_lasEnvelop1.out.$(Process)
ERROR = Batchfile_lasEnvelop1.err.$(Process)
LOG = Batchfile_lasEnvelop1.log
QUEUE 25

 

 

Respectfully,

Alex Alas

Systems Administrator
Fugro EarthData Inc.

Tel. 301-948-8550 x219 Fax 301-963-2064 E-mail: aalas@xxxxxxxxxxxxx

7320 Executive Way, Frederick, MD  21704

Website: http://www.fugroearthdata.com