
Re: [HTCondor-users] Unable to run a standard universe job.



I got the standard universe job to run by dropping the submitter's
firewall. Vanilla jobs work just fine through our host firewalls thanks
to the shared port. Is there a way to constrain the ports required for
standard universe jobs for firewall traversal?
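
One possible starting point, as an untested sketch: HTCondor has LOWPORT and HIGHPORT configuration macros that restrict the ports its daemons bind to. Whether the standard universe shadow honors them in this pool's version is an assumption, and the 9600-9700 range below is only a placeholder. The script prints the two configuration lines and, if the HTCondor tools are installed, reports the values currently in effect.

```shell
#!/bin/sh
# Placeholder range (an assumption); choose one that suits the site
# and open the same range in the submit host's firewall.
low=9600
high=9700

# Lines to add to the submit host's HTCondor configuration.
fragment=$(printf 'LOWPORT = %s\nHIGHPORT = %s' "$low" "$high")
echo "$fragment"

# Report the values HTCondor currently sees, if the tools are present.
if command -v condor_config_val >/dev/null 2>&1; then
    condor_config_val LOWPORT HIGHPORT
fi
```

A change to the port range most likely needs the daemons restarted before it takes effect, since ports that are already bound are not reopened.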

Thanks!

-- 
Michael McInerny Murphy
IERUS Technologies, Inc.
2904 Westcorp Blvd., Suite 210
Huntsville, AL  35805
(O): (256) 319-2026 ext 107

-----Original Message-----
From: Collin Mehring <collin.mehring@xxxxxxxxxxxxxx>
Reply-To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Unable to run a standard universe job.
Date: Fri, 14 Jun 2019 10:23:38 -0700

Hi Michael,

From the analyze output it seems like that machine is rejecting your
job. I would either check the START expression on that machine directly
(1) or do a reverse analyze with condor_q (2) to find out why.

1: condor_config_val -name bane.hq.ierustech.com -v START
2: condor_q 183.0 -better-analyze -reverse -machine bane.hq.ierustech.com
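
The two checks above can be combined into a small script, as a sketch. The job ID and hostname are the ones from this thread; the `command -v` guard and the `status` variable are additions for illustration, so that the script exits cleanly on a host without the HTCondor tools.

```shell
#!/bin/sh
# Job and execute host taken from this thread.
job_id="183.0"
machine="bane.hq.ierustech.com"

if command -v condor_q >/dev/null 2>&1; then
    # (1) Show the START expression as configured on the execute host.
    condor_config_val -name "$machine" -v START
    # (2) Analyze the match from the machine's side for this one job.
    condor_q "$job_id" -better-analyze -reverse -machine "$machine"
    status="ran"
else
    echo "HTCondor tools not found in PATH; nothing to check" >&2
    status="skipped"
fi
```

If the reverse analysis shows the machine's own requirements rejecting the job, the START output from step (1) is the expression to read next.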

Best,
Collin

On Fri, Jun 14, 2019 at 6:43 AM Michael Murphy <
Michael.Murphy@xxxxxxxxxxxxx> wrote:
> Greetings,
> 
> I am trying to run a standard job in our condor pool. However, I
> cannot get a test job to execute. The matchmaker is not finding a
> match even though my requirement only specifies a hostname. I have
> never run a standard job in our pool before. I am not sure it's
> configured properly. Here's my submit script:
> 
> universe = standard
> executable = ./Cicero_CC_12750
> should_transfer_files = YES
> Requirements = machine == "bane.hq.ierustech.com"
> when_to_transfer_output = ON_EXIT_OR_EVICT
> log = $(Cluster).log
> 
> input = test_run.inp
> output = test_run.out
> error = test_run.err
> transfer_input_files = test_run.inp
> queue
> 
> The executable is compiled FORTRAN code relinked with
> condor_compile. 
> 
> When I check the status and try to determine why it's not matched to
> the execute host I use 'condor_q -analyze -better <JOB ID>' with the
> following output:
> 
> [michael.murphy@banzai Condor_checkpoint_test]$ condor_q -better
> -analyze 183.0 
> -- Schedd: banzai.hq.ierustech.com : <192.168.6.67:9618?...
> The Requirements expression for job 183.000 is
> 
>     ( machine == "bane.hq.ierustech.com" ) && ( TARGET.Arch ==
> "X86_64" ) && ( TARGET.OpSys == "LINUX" ) && ( ( CkptArch ==
> TARGET.Arch ) || ( CkptArch is undefined ) ) && ( ( CkptOpSys ==
> TARGET.OpSys ) ||
>       ( CkptOpSys is undefined ) ) && ( TARGET.Disk >= RequestDisk )
> && ( TARGET.Memory >= RequestMemory )
> 
> Job 183.000 defines the following attributes:
> 
>     DiskUsage = 3750
>     ImageSize = 3500
>     RequestDisk = DiskUsage
>     RequestMemory = ifthenelse(MemoryUsage =!=
> undefined,MemoryUsage,( ImageSize + 1023 ) / 1024)
> 
> The Requirements expression for job 183.000 reduces to these
> conditions:
> 
>          Slots
> Step    Matched  Condition
> -----  --------  ---------
> [0]           2  machine == "bane.hq.ierustech.com"
> [6]         560  CkptArch is undefined
> [10]        560  CkptOpSys is undefined
> 
> No successful match recorded.
> Last failed match: Fri Jun 14 08:24:48 2019
> 
> Reason for last match failure: no match found 
> 
> 183.000:  Run analysis summary ignoring user priority.  Of 560
> machines,
>     544 are rejected by your job's requirements 
>       2 reject your job because of their own requirements 
>      14 are exhausted partitionable slots 
>       0 match and are already running your jobs 
>       0 match but are serving other users 
>       0 are available to run your job
> 
> WARNING:  Be advised:
>    Job did not match any machines's constraints
>    To see why, pick a machine that you think should match and add
>      -reverse -machine <name>
>    to your query.
> 
> The submitting machine's name is "banzai.hq.ierustech.com" and the
> execution machine is called "bane.hq.ierustech.com".
> 
> Have I forgotten to specify some macros to enable std universe jobs?
> Thanks for your time.
> 
>  -- 
> Michael McInerny Murphy
> IERUS Technologies, Inc.
> 2904 Westcorp Blvd., Suite 210
> Huntsville, AL  35805
> (O): (256) 319-2026 ext 107
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx
> with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> 
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/

