
[Condor-users] Jobs are executed only on the submitting machines



Hello,

I'm new to installing and configuring Condor, so I don't know if there is
something else I need to do to avoid this behaviour:
I installed Condor 6.8.1 on a cluster built from 9 P4@xxxxxx with 1 master
host and 9 working nodes (basically a Beowulf system) running Fedora Core 4
Linux. All nodes share /home and /opt. I followed the install procedure,
choosing a full installation on the master node and configuring it as the
Condor central manager. All daemons on the controller start up correctly.
In the CONTROL_CONFIG file I chose:
LOCAL_DIR               = /home/condor/hosts/$(HOSTNAME)
LOCAL_CONFIG_FILE       = $(RELEASE_DIR)/etc/$(HOSTNAME).local
REQUIRE_LOCAL_CONFIG_FILE = FALSE
HOSTALLOW_WRITE = *

Then I configured each working node, setting
CONDOR_HOME=/opt/condor-6.8.1 and
CONDOR_CONFIG=/opt/condor-6.8.1/etc/condor_config. Condor starts up correctly
on each working node. The condor_status command shows all machines in the
pool:

Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime

vm1@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000  1012  0+20:51:56
vm2@xxxxxxxxx LINUX       INTEL  Owner      Idle       0.070  1012  0+00:10:05
vm1@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000   250  0+03:05:04
vm2@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000   250  0+20:50:50
vm1@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000   250  0+03:05:04
vm2@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000   250  0+20:50:53
vm1@xxxxxxxxx LINUX       INTEL  Owner      Idle       1.000   250  0+22:20:50
vm2@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000   250  0+02:50:08
vm1@xxxxxxxxx LINUX       INTEL  Owner      Idle       1.000   250  0+22:20:42
vm2@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000   250  0+02:55:05
vm1@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000   250  0+03:05:06
vm2@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000   250  0+20:50:54
vm1@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000   504  0+03:05:04
vm2@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000   504  0+20:50:55
vm1@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000   250  0+03:05:05
vm2@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000   250  0+20:50:53
vm1@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.010   250  0+03:05:05
vm2@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000   250  0+20:50:54

                     Total Owner Claimed Unclaimed Matched Preempting Backfill

         INTEL/LINUX    18     3       0        15       0          0        0

               Total    18     3       0        15       0          0        0


It appears to be working correctly, but when I submit a job using the
following script with the command
condor_submit -a "log = out.log" -a "error = error.log" ex02.submit:

Executable     = /bin/hostname
Universe       = vanilla
Requirements   = OpSys == "LINUX" && Arch == "INTEL"
Error          = err.$(Process)
Output         = out.$(Process)
Log            = foo.log

Queue 50

The jobs are queued but executed only on the submitting machine.
I tried with more jobs, for example 500, with all machines unclaimed, but
the result is the same. If I submit from the master node, all jobs are
executed on the master node; if I submit from node01, all jobs are executed
on node01, and so on.

What is wrong?
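
One possible cause, stated here as an assumption and not a confirmed
diagnosis: when a vanilla-universe submit file does not enable file
transfer, condor_submit adds a requirement that the execute machine be in
the same FileSystemDomain as the submit machine. If FILESYSTEM_DOMAIN and
UID_DOMAIN are left at a per-host value such as $(FULL_HOSTNAME), only the
submitting machine can match. Since all nodes share /home and /opt, a
minimal sketch of a fix is to give every node the same domain values in the
shared configuration file. The name cluster.example is a placeholder and
does not come from the configuration above.

```
# Shared condor_config, read by every node.
# ASSUMPTION: "cluster.example" is a hypothetical placeholder domain name.
# ASSUMPTION: user accounts (UIDs) are identical on all nodes.
UID_DOMAIN        = cluster.example
FILESYSTEM_DOMAIN = cluster.example
```

After editing the file, condor_reconfig on each node should pick up the new
values, and condor_config_val FILESYSTEM_DOMAIN shows what a given host
actually uses. While jobs are still idle, condor_q -analyze reports which
requirement is rejecting the other machines.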

Thank you,
  Raffaele