
[Condor-users] jobs fail to run, with "Warning: Found no submitters"



Hello.  I've been struggling with a problem that is basically identical to the
one described in this post from last year:

https://lists.cs.wisc.edu/archive/condor-users/pre-2004-June/msg01340.shtml

The problem is that I can submit jobs, but whatever jobs are submitted are
rejected by all available nodes.

My cluster consists of one dual-CPU head node and three diskless client nodes:

------------------------
~> condor_status

Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime

node1.cluster LINUX       X86_64 Unclaimed  Idle       0.950   435[?????]
node2.cluster LINUX       X86_64 Unclaimed  Idle       1.120   435  0+00:53:42
node3.cluster LINUX       X86_64 Unclaimed  Idle       1.000   435  0+01:00:47
vm1@xxxxxxxxx LINUX       X86_64 Owner      Idle       1.000  1002  4+20:07:37
vm2@xxxxxxxxx LINUX       X86_64 Unclaimed  Idle       0.210  1002  0+00:00:00

                     Machines Owner Claimed Unclaimed Matched Preempting

        X86_64/LINUX        5     1       0         4       0          0

               Total        5     1       0         4       0          0
------------------------

The Condor setup is very simple, pretty much the default.  The head node has the
following condor_config.local file:

------------------------
NETWORK_INTERFACE = 10.0.0.1
DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, STARTD
------------------------

and the other nodes are using the
<release_dir>/etc/examples/condor_config.local.dedicated.resource file which
specifies the DedicatedScheduler as the head node.
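
For context, the stock dedicated-resource example is expected to contain
settings roughly like the sketch below.  This is an assumption based on the
dedicated scheduling section of the Condor manual, not a copy of the file on
these nodes: full.host.name is the manual's placeholder for the head node's
hostname, and the START/RANK policy shown is only one of the policy options
the example offers.

------------------------
# Assumed sketch of the dedicated-resource example, not the actual file.
# Advertise which scheduler this machine is dedicated to.
DedicatedScheduler = "DedicatedScheduler@full.host.name"
STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler

# One possible policy: only start jobs that come from the dedicated scheduler.
START = Scheduler =?= $(DedicatedScheduler)
RANK = Scheduler =?= $(DedicatedScheduler)
SUSPEND = False
CONTINUE = True
PREEMPT = False
KILL = False
WANT_SUSPEND = False
WANT_VACATE = False
------------------------

If a START expression like this is in effect on the client nodes, they would
accept only jobs handed out by the dedicated scheduler, so it is one setting
worth checking when machines reject a job because of their own requirements.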

I have built a single executable that calculates pi to 10000 digits (it works
fine when run outside Condor), which I am trying to submit with the following
command file:

------------------------
Executable = pi2
output = pi2.out
Log = pi2.log
Universe = vanilla
Queue
------------------------

The result is the following:

------------------------
~> condor_q -analyze
Warning:  Found no submitters


-- Submitter: zajos.cluster : <10.0.0.1:44160> : zajos.cluster
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
---
012.000:  Run analysis summary.  Of 5 machines,
      0 are rejected by your job's requirements
      3 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      2 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 are available to run your job

1 jobs; 1 idle, 0 running, 0 held
------------------------

Does anyone have any idea what's going wrong?  I'm wondering what types of
misconfigurations to look for, or how I can debug more specifically what's
going on.  Unfortunately the thread mentioned above ended with a phone call
instead of a posting to the list.  Any help would be most appreciated.

Thanks.

jamie.