
Re: [Condor-users] Problems matching jobs.



To add to this troubleshooting: since I cannot run condor_q -better-analyze to see all the requirements of my job, I intentionally mistyped the submit file. I got the message below. As you can see, I never specify a memory requirement in my description file. Where are these settings coming from?

Any input will be much appreciated.

Alex

 

Please see below:

Submitting job(s)

ERROR: Parse error in expression:

        Requirements = (((Arch == "INTEL" && OpSys == "WINNT51") || (Arch == " INTEL" && OpSys == "WINNT52"))) && (Disk >= DiskUsage) && ( (Memory * 1024) >= ImageSize )&& (HasFileTransfer) && (HasWindowsRunAsOwner && (LocalCredd =?= "centralmanager.domain.com:9620"))

                                                                   ^^^

Error in submit file
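The Requirements expression printed above contains more than the OpSys/Arch test written in the submit file. condor_submit typically appends default clauses: the Disk and Memory checks when the submitted Requirements does not mention them, HasFileTransfer when file transfer is enabled, and the HasWindowsRunAsOwner/LocalCredd test when run_as_owner is set. The Python sketch below is a simplified, hypothetical model of ClassAd evaluation (not Condor's evaluator; string comparison is case-sensitive here, unlike ClassAd `==`), and the job and machine values are invented for illustration. It shows how the `=?=` clause behaves on a machine whose ad does not define LocalCredd.

```python
# Simplified sketch, NOT Condor's ClassAd evaluator: checks each clause of the
# Requirements expression printed by condor_submit against one machine ad.
# UNDEFINED is modeled as a missing key (dict.get returns None).

CREDD = "centralmanager.domain.com:9620"  # value taken from the error output above


def ge(a, b):
    """ClassAd '>=': an UNDEFINED operand makes the result UNDEFINED, which never matches."""
    return a is not None and b is not None and a >= b


def meta_equals(a, b):
    """ClassAd '=?=': never UNDEFINED; a missing attribute is simply unequal to a string."""
    if a is None or b is None:
        return a is None and b is None
    return type(a) is type(b) and a == b


def clauses(machine, job):
    """Return each clause of the generated Requirements as True/False for this machine."""
    m = machine.get
    memory = m("Memory")
    return {
        "platform": m("Arch") == "INTEL" and m("OpSys") in ("WINNT51", "WINNT52"),
        "disk": ge(m("Disk"), job.get("DiskUsage")),
        # Memory is advertised in MB and ImageSize in KB, hence the * 1024.
        "memory": ge(None if memory is None else memory * 1024, job.get("ImageSize")),
        "file_transfer": m("HasFileTransfer") is True,
        "run_as_owner": m("HasWindowsRunAsOwner") is True
        and meta_equals(m("LocalCredd"), CREDD),
    }


# Hypothetical job and machine ads, for illustration only.
job = {"ImageSize": 20_000, "DiskUsage": 500}
slot = {"Arch": "INTEL", "OpSys": "WINNT52", "Memory": 511, "Disk": 5_000_000,
        "HasFileTransfer": True, "HasWindowsRunAsOwner": True, "LocalCredd": CREDD}
slot_no_credd = {k: v for k, v in slot.items() if k != "LocalCredd"}

for name, ad in (("slot", slot), ("slot_no_credd", slot_no_credd)):
    result = clauses(ad, job)
    # Prints whether the whole expression matches, then the failing clauses.
    print(name, all(result.values()), [k for k, ok in result.items() if not ok])
# slot True []
# slot_no_credd False ['run_as_owner']
```

In this model, a slot whose ad lacks LocalCredd fails only the run_as_owner clause, because `=?=` treats a missing attribute as "not equal" rather than as UNDEFINED.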

 

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Alas, Alex [FEDI]
Sent: Monday, December 08, 2008 4:26 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] Problems matching jobs.

 

Again, hello to all of you,

In addition to my previous e-mail, I ran condor_q -analyze, and the results are:

084.049:  Run analysis summary.  Of 20 machines,

     19 are rejected by your job's requirements

      0 reject your job because of their own requirements

      1 match but are serving users with a better priority in the pool

      0 match but reject the job for unknown reasons

      0 match but will not currently preempt their existing job

      0 are available to run your job

When I run condor_status, I get the following results:

C:\WINDOWS\system32>condor_status

 

Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

 

Computer1.domain.com WINNT51    INTEL  Unclaimed Idle     0.060  1022  0+00:45:03

Computer2.domain.com WINNT51    INTEL  Unclaimed Idle     0.230  1022  0+00:00:49

slot1@xxxxxxxxxxxxxxxx WINNT51    INTEL  Unclaimed Idle     0.000  1022  5+22:33:03

slot2@xxxxxxxxxxxxxxxx WINNT51    INTEL  Unclaimed Idle     0.030  1022  0+02:30:05

slot1@xxxxxxxxxxxxxxxx WINNT52    INTEL  Unclaimed Idle     0.000   511  2+20:21:17

slot2@xxxxxxxxxxxxxxxx WINNT52    INTEL  Unclaimed Idle     0.000   511  0+00:20:05

slot3@xxxxxxxxxxxxxxxx WINNT52    INTEL  Unclaimed Idle     0.000   511  2+20:21:19

slot4@xxxxxxxxxxxxxxxx WINNT52    INTEL  Unclaimed Idle     0.000   511  2+20:21:20

slot1@xxxxxxxxxxxxxxxx WINNT52    INTEL  Unclaimed Idle     0.000   511  2+21:24:31

slot2@xxxxxxxxxxxxxxxx WINNT52    INTEL  Unclaimed Idle     0.000   511  2+21:28:45

slot3@xxxxxxxxxxxxxxxx WINNT52    INTEL  Unclaimed Idle     0.000   511  0+02:30:06

slot4@xxxxxxxxxxxxxxxx WINNT52    INTEL  Unclaimed Idle     0.000   511  2+21:33:45

slot1@xxxxxxxxxxxxx WINNT52    INTEL  Unclaimed Idle     0.000   511  2+20:26:28

slot2@xxxxxxxxxxxxx WINNT52    INTEL  Unclaimed Idle     0.000   511  0+00:25:05

slot3@xxxxxxxxxxxxx WINNT52    INTEL  Unclaimed Idle     0.000   511  2+20:26:30

slot4@xxxxxxxxxxxxx WINNT52    INTEL  Unclaimed Idle     0.000   511  2+20:26:31

slot1@xxxxxxxxxxxxxxxx WINNT52    INTEL  Unclaimed Idle     0.000   511  0+03:35:41

slot2@xxxxxxxxxxxxxxxx WINNT52    INTEL  Unclaimed Idle     0.000   511  0+03:35:42

slot3@xxxxxxxxxxxxxxxx WINNT52    INTEL  Unclaimed Idle     0.050   511  0+03:35:43

slot4@xxxxxxxxxxxxxxxx WINNT52    INTEL  Unclaimed Idle     0.000   511  0+00:25:07

 

                     Total Owner Claimed Unclaimed Matched Preempting Backfill

 

       INTEL/WINNT51     4     0       0         4       0          0        0

       INTEL/WINNT52    16     0       0        16       0          0        0

 

               Total    20     0       0        20       0          0        0
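As a cross-check of the table above, the Python sketch below parses the slot rows (copied from this condor_status output, with hostnames masked as in the archive) and applies only the OpSys/Arch requirement from the submit file. It is a hypothetical helper, not a Condor tool. Every slot passes that clause, which suggests the 19 rejections reported by condor_q -analyze come from one of the other clauses in the job's full Requirements expression.

```python
from collections import Counter

# Slot rows copied from the condor_status output in this thread.
STATUS = """\
Computer1.domain.com WINNT51    INTEL  Unclaimed Idle     0.060  1022  0+00:45:03
Computer2.domain.com WINNT51    INTEL  Unclaimed Idle     0.230  1022  0+00:00:49
slot1@xxxxxxxxxxxxxxxx WINNT51    INTEL  Unclaimed Idle     0.000  1022  5+22:33:03
slot2@xxxxxxxxxxxxxxxx WINNT51    INTEL  Unclaimed Idle     0.030  1022  0+02:30:05
slot1@xxxxxxxxxxxxxxxx WINNT52    INTEL  Unclaimed Idle     0.000   511  2+20:21:17
slot2@xxxxxxxxxxxxxxxx WINNT52    INTEL  Unclaimed Idle     0.000   511  0+00:20:05
slot3@xxxxxxxxxxxxxxxx WINNT52    INTEL  Unclaimed Idle     0.000   511  2+20:21:19
slot4@xxxxxxxxxxxxxxxx WINNT52    INTEL  Unclaimed Idle     0.000   511  2+20:21:20
slot1@xxxxxxxxxxxxxxxx WINNT52    INTEL  Unclaimed Idle     0.000   511  2+21:24:31
slot2@xxxxxxxxxxxxxxxx WINNT52    INTEL  Unclaimed Idle     0.000   511  2+21:28:45
slot3@xxxxxxxxxxxxxxxx WINNT52    INTEL  Unclaimed Idle     0.000   511  0+02:30:06
slot4@xxxxxxxxxxxxxxxx WINNT52    INTEL  Unclaimed Idle     0.000   511  2+21:33:45
slot1@xxxxxxxxxxxxx WINNT52    INTEL  Unclaimed Idle     0.000   511  2+20:26:28
slot2@xxxxxxxxxxxxx WINNT52    INTEL  Unclaimed Idle     0.000   511  0+00:25:05
slot3@xxxxxxxxxxxxx WINNT52    INTEL  Unclaimed Idle     0.000   511  2+20:26:30
slot4@xxxxxxxxxxxxx WINNT52    INTEL  Unclaimed Idle     0.000   511  2+20:26:31
slot1@xxxxxxxxxxxxxxxx WINNT52    INTEL  Unclaimed Idle     0.000   511  0+03:35:41
slot2@xxxxxxxxxxxxxxxx WINNT52    INTEL  Unclaimed Idle     0.000   511  0+03:35:42
slot3@xxxxxxxxxxxxxxxx WINNT52    INTEL  Unclaimed Idle     0.050   511  0+03:35:43
slot4@xxxxxxxxxxxxxxxx WINNT52    INTEL  Unclaimed Idle     0.000   511  0+00:25:07
"""


def parse(text):
    """Turn condor_status rows (Name OpSys Arch State Activity LoadAv Mem ActvtyTime) into dicts."""
    slots = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue  # skip blank or malformed lines
        name, opsys, arch, state, _activity, _loadavg, mem, _time = fields
        slots.append({"Name": name, "OpSys": opsys, "Arch": arch,
                      "State": state, "Memory": int(mem)})
    return slots


def os_clause(slot):
    """The Requirements line written in the submit file in this thread."""
    return (slot["Arch"] == "INTEL" and slot["OpSys"] == "WINNT51") or \
           (slot["Arch"] == "INTEL" and slot["OpSys"] == "WINNT52")


slots = parse(STATUS)
by_platform = Counter(f'{s["Arch"]}/{s["OpSys"]}' for s in slots)
passing = sum(os_clause(s) for s in slots)
print(len(slots), dict(by_platform), passing)
# 20 {'INTEL/WINNT51': 4, 'INTEL/WINNT52': 16} 20
```

The per-platform counts reproduce the 4/16/20 summary that condor_status prints.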

Unfortunately, I am not enough of a Condor expert to fully understand what this output is trying to tell me or how best to interpret it. Also, when I tried to run condor_q -better-analyze, I got the following message:

Sorry, the -better-analyze option is not available on this platform.

From this output, I now know that something in my job's requirements is preventing the job from matching the other nodes, but I don't know what. If anyone has experienced a similar issue and knows roughly how to get it to work, I would really appreciate your input.

Alex

 

 

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Alas, Alex [FEDI]
Sent: Monday, December 08, 2008 12:59 PM
To: Condor-Users Mail List
Subject: [Condor-users] Problems matching jobs.

 

Hello to all of you,

I have a small issue with a type of job I am trying to submit. I have a Condor pool of 20 nodes. I initially upgraded the whole pool to version 7.0.5, but after reading about the issues that version was having with preempting jobs, I decided to downgrade the central manager to version 7.0.1. The description file is as follows:

#########################################################################################

# Description file for Batch File for TESTING purposes

#########################################################################################

universe = vanilla

requirements = (Arch == "INTEL" && OpSys == "WINNT51") || \

                          (Arch == "INTEL" && OpSys == "WINNT52")

getenv = True

notify_user=usename@xxxxxxxxxx

initialdir = c:\condor\execute_bk

should_transfer_files = YES

when_to_transfer_output = ON_EXIT

Transfer_input_files = c:\windows\system32\systeminfo.exe

run_as_owner = true

executable = Batch4testv2.bat

output = Batch4testv3.out.$(Process)

error = Batch4testv3.err.$(Process)

log = Batch4testv3.log

queue 10

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

 

If the job is submitted like that, it will only run on one machine; if I omit the run_as_owner line, it runs fine on all the different nodes. As I said, that is not a problem once the line is removed.

However, this Condor project was originally implemented to run jobs over network shares. For that, I configured the pool to have a CREDD_HOST (which is the central manager), and then I created a condoruser account with read access and limited rights to run those jobs. I stored the condor_pool and condoruser credentials/passwords on all the different computers set up as execute machines. When I run condor_store_cred query -c and condor_store_cred query -u condoruser, all the computers come back saying: A credential is stored and is valid. The description file is attached below. When I try to run this type of job, it will only run on one computer, the same computer that runs the other jobs. If I remove the RUN_AS_OWNER line, the central manager tries to match the job with all of the pool's nodes, but the job errors out with: Logon failure: unknown user name or bad password.

Does anyone have an idea which log I should look at to find answers? Any suggestions to solve this issue are more than welcome.

Thanks in advance for your input,

Alex

 

###################################################

## DESCRIPTION FILE FOR CONDOR JOBS

## PREPARED BY ALEX ALAS

###################################################

 

UNIVERSE = VANILLA

REQUIREMENTS = (Arch == "INTEL" && OpSys == "WINNT51") || \

                               (Arch == "INTEL" && OpSys == "WINNT52")

GETENV = TRUE

NOTIFY_USER = username@xxxxxxxxxx

INITIALDIR = c:\condor\execute_bk

SHOULD_TRANSFER_FILES = YES

WHEN_TO_TRANSFER_OUTPUT = ON_EXIT

TRANSFER_INPUT_FILES = \\fileserver\Sharedfolder1\Sharedfolder2\Sharedfolder3\lasEnvelop.exe

RUN_AS_OWNER = TRUE

EXECUTABLE =  \\fileserver\Sharedfolder1\Sharedfolder2\Sharedfolder3\Batchfile_lasEnvelop1.bat

OUTPUT = Batchfile_lasEnvelop1.out.$(Process)

ERROR = Batchfile_lasEnvelop1.err.$(Process)

LOG = Batchfile_lasEnvelop1.log

QUEUE 25

 

 

Respectfully,

Alex Alas

Systems Administrator
Fugro EarthData Inc.

Tel. 301-948-8550 x219 Fax 301-963-2064 E-mail: aalas@xxxxxxxxxxxxx

7320 Executive Way, Frederick, MD  21704

Website: http://www.fugroearthdata.com