
[HTCondor-users] Unable to run a standard universe job.



Greetings,

I am trying to run a standard universe job in our Condor pool, but I cannot get a test job to execute. The matchmaker does not find a match, even though my only requirement is a hostname. I have never run a standard universe job in our pool before, so I am not sure the pool is configured for it. Here is my submit file:

universe = standard
executable = ./Cicero_CC_12750
should_transfer_files = YES
Requirements = machine == "bane.hq.ierustech.com"
when_to_transfer_output = ON_EXIT_OR_EVICT
log = $(Cluster).log

input = test_run.inp
output = test_run.out
error = test_run.err
transfer_input_files = test_run.inp
queue
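This submit file sets no request_memory or request_disk, so the job carries default request expressions. They appear in the condor_q output further down as 'RequestDisk = DiskUsage' and an ifthenelse(...) expression for RequestMemory. The Python below is a small sketch that evaluates those two expressions with the values from that output (ImageSize = 3500, DiskUsage = 3750). It makes two assumptions: the usual HTCondor units apply (KiB for ImageSize and DiskUsage, MiB for memory), and None stands in for the ClassAd UNDEFINED value. The function names are hypothetical. The result is 4 MiB of memory and 3750 KiB of disk. Requests that small are unlikely to be what blocks the match.

```python
# Sketch of the default request expressions reported by condor_q for job 183.0:
#   RequestDisk   = DiskUsage
#   RequestMemory = ifthenelse(MemoryUsage =!= undefined, MemoryUsage,
#                              ( ImageSize + 1023 ) / 1024)
# None stands in for the ClassAd UNDEFINED value (a simplification).
# Function names are hypothetical, chosen for illustration only.
from typing import Optional


def request_memory_mb(memory_usage: Optional[int], image_size_kib: int) -> int:
    """Use MemoryUsage if it is defined, else round ImageSize (KiB) up to whole MiB."""
    if memory_usage is not None:
        return memory_usage
    # ClassAd integer division truncates, so (x + 1023) / 1024 rounds up.
    return (image_size_kib + 1023) // 1024


def request_disk_kib(disk_usage_kib: int) -> int:
    """RequestDisk simply mirrors DiskUsage."""
    return disk_usage_kib


# Values taken from the condor_q output in this post (MemoryUsage not yet known):
print(request_memory_mb(None, 3500))
print(request_disk_kib(3750))
```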

The executable is compiled FORTRAN code relinked with condor_compile. 

To find out why the job is not being matched to the execute host, I ran 'condor_q -better -analyze <JOB ID>', which gave the following output:

[michael.murphy@banzai Condor_checkpoint_test]$ condor_q -better -analyze 183.0 
-- Schedd: banzai.hq.ierustech.com : <192.168.6.67:9618?...
The Requirements expression for job 183.000 is

    ( machine == "bane.hq.ierustech.com" ) && ( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" ) && ( ( CkptArch == TARGET.Arch ) || ( CkptArch is undefined ) ) && ( ( CkptOpSys == TARGET.OpSys ) ||
      ( CkptOpSys is undefined ) ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory )

Job 183.000 defines the following attributes:

    DiskUsage = 3750
    ImageSize = 3500
    RequestDisk = DiskUsage
    RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,( ImageSize + 1023 ) / 1024)

The Requirements expression for job 183.000 reduces to these conditions:

         Slots
Step    Matched  Condition
-----  --------  ---------
[0]           2  machine == "bane.hq.ierustech.com"
[6]         560  CkptArch is undefined
[10]        560  CkptOpSys is undefined

No successful match recorded.
Last failed match: Fri Jun 14 08:24:48 2019

Reason for last match failure: no match found 

183.000:  Run analysis summary ignoring user priority.  Of 560 machines,
    544 are rejected by your job's requirements 
      2 reject your job because of their own requirements 
     14 are exhausted partitionable slots 
      0 match and are already running your jobs 
      0 match but are serving other users 
      0 are available to run your job

WARNING:  Be advised:
   Job did not match any machines's constraints
   To see why, pick a machine that you think should match and add
     -reverse -machine <name>
   to your query.
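Steps [6] and [10] in the analysis match all 560 slots because of ClassAd three-valued logic. A job that has never checkpointed has no CkptArch or CkptOpSys attribute. For such a job, 'CkptArch == TARGET.Arch' evaluates to UNDEFINED, 'CkptArch is undefined' evaluates to TRUE, and the '||' makes the whole clause TRUE. The Python below is a simplified model of those rules, written for illustration. UNDEFINED, eq, is_undefined, or_ and ckpt_clause are hypothetical names, not the ClassAd library API.

```python
# Simplified model of ClassAd three-valued logic (TRUE / FALSE / UNDEFINED).
# Illustrative only: these names are hypothetical, not the ClassAd library API.

UNDEFINED = object()  # sentinel standing in for the ClassAd UNDEFINED value


def eq(a, b):
    """ClassAd '==': UNDEFINED if either side is UNDEFINED; strings compare case-insensitively."""
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    if isinstance(a, str) and isinstance(b, str):
        return a.lower() == b.lower()
    return a == b


def is_undefined(a):
    """ClassAd 'is undefined' (meta-equality): always TRUE or FALSE, never UNDEFINED."""
    return a is UNDEFINED


def or_(a, b):
    """ClassAd '||': TRUE if either side is TRUE, else UNDEFINED if either side is UNDEFINED."""
    if a is True or b is True:
        return True
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return False


def ckpt_clause(job_ckpt_value, slot_value):
    """Models ( CkptArch == TARGET.Arch ) || ( CkptArch is undefined )."""
    return or_(eq(job_ckpt_value, slot_value), is_undefined(job_ckpt_value))


# A job that has never checkpointed has no CkptArch attribute, so the clause holds:
print(ckpt_clause(UNDEFINED, "X86_64"))
# A (hypothetical) checkpoint written on another architecture would block the match:
print(ckpt_clause("INTEL", "X86_64"))
```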


The submitting machine's name is "banzai.hq.ierustech.com" and the execution machine is called "bane.hq.ierustech.com".
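The summary line "2 reject your job because of their own requirements" reflects that matchmaking is two-sided. A slot must satisfy the job's Requirements, and the job must also satisfy the slot's own policy (its START expression). Since only 2 slots passed step [0], those are presumably bane's slots. The sketch below models only that two-sided structure. The job and slot dictionaries are hypothetical, and the 'vanilla_only' START policy is purely an assumption for illustration. The actual reason bane's slots refuse the job is what the query suggested in the warning, 'condor_q -better -analyze -reverse -machine bane.hq.ierustech.com 183.0', would be expected to show.

```python
"""Simplified two-sided matchmaking sketch (not HTCondor's negotiator).

A slot is usable only if BOTH the job's Requirements accept the slot
AND the slot's own policy (START) accepts the job.
"""
from collections import Counter


def classify(job, slots):
    """Bucket slots the way the condor_q summary does (simplified)."""
    counts = Counter()
    for slot in slots:
        if not job["requirements"](slot):
            counts["rejected by job requirements"] += 1
        elif not slot["start"](job):
            counts["reject job by own requirements"] += 1
        else:
            counts["available"] += 1
    return counts


# Hypothetical job ad: only the hostname clause from the submit file is modelled.
job = {
    "universe": "standard",
    "requirements": lambda slot: slot["machine"].lower() == "bane.hq.ierustech.com",
}


def vanilla_only(job_ad):
    """ASSUMED slot policy for illustration: START accepts only vanilla jobs."""
    return job_ad["universe"] == "vanilla"


# Hypothetical slots: two on bane with the assumed policy, one on banzai.
slots = [
    {"machine": "bane.hq.ierustech.com", "start": vanilla_only},
    {"machine": "bane.hq.ierustech.com", "start": vanilla_only},
    {"machine": "banzai.hq.ierustech.com", "start": lambda job_ad: True},
]

print(dict(classify(job, slots)))
```

Under these assumptions both bane slots land in the "reject job by own requirements" bucket, mirroring the shape (not the cause) of the real summary.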

Have I forgotten to set some configuration macros needed to enable standard universe jobs? Thanks for your time.

-- 
Michael McInerny Murphy
IERUS Technologies, Inc.
2904 Westcorp Blvd., Suite 210
Huntsville, AL  35805
(O): (256) 319-2026 ext 107