
[Condor-users] Scheduling problem



Hello,

I've installed Condor version 6.1.11. For now, I'm working with just
two nodes: a central manager (node01) and a node to run the jobs
(node03).

The problem appears when I try to submit jobs from the central manager.
No job is scheduled on node03; all are rejected. If I set the START
attribute in the central manager's config file to FALSE (in order to force
jobs to be executed on node03), no job runs at all.
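
For reference, the change described above amounts to a single line like the
following in node01's configuration. This is only a sketch of the setting,
not a copy of the actual file, and which config file it lives in is not
stated here.

```
# Hypothetical excerpt from node01's Condor configuration.
# With START set to FALSE, the startd on this machine refuses to start any job.
START = FALSE
```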

When I start Condor on both nodes, everything looks fine in the log
files, except for:


 WARNING:  No master ad for < vm2@node03 >
9/15 20:20:45 StartdAd     : Inserting ** "< vm2@node03 , 192.168.1.3 >"
9/15 20:20:45 stats: Inserting new hashent for 'Start':'vm2@node03':'192.168.1.3'


I get this message for every CPU in the node. I also get it for the
CPUs in node01 (the central manager), but that node can accept jobs.
When I submit a job, I get this message in the SchedLog:

9/15 20:37:26 Tables are consistent
9/15 20:37:26 Out of servers - 0 jobs matched, 1 jobs idle, 1 jobs rejected
9/15 20:39:05 IO: Failed to read packet header


The output of condor_status is always the same:
Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime

vm1@node01    LINUX       INTEL  Owner      Idle       0.000   252  0+00:18:10
vm2@node01    LINUX       INTEL  Owner      Idle       0.000   252  0+00:18:10
vm3@node01    LINUX       INTEL  Owner      Idle       0.000   252  0+00:18:10
vm4@node01    LINUX       INTEL  Owner      Idle       0.000   252  0+00:18:10
vm1@node03    LINUX       INTEL  Unclaimed  Idle       0.000   252  0+00:17:06
vm2@node03    LINUX       INTEL  Unclaimed  Idle       0.000   252  0+00:17:06
vm3@node03    LINUX       INTEL  Unclaimed  Idle       0.000   252  0+00:17:06
vm4@node03    LINUX       INTEL  Unclaimed  Idle       0.000   252  0+00:17:06

                     Machines Owner Claimed Unclaimed Matched Preempting

         INTEL/LINUX        8     4       0         4       0          0

               Total        8     4       0         4       0          0


I've checked the manual, but I haven't been able to find the cause. Does
anyone know what the problem might be?


Thanks in advance