[Condor-users] Jobs don't run



Hello, I have two machines. machine1 is my Central Manager, and machine2 should send jobs to be executed on machine1. Below are the configuration file and some information about machine1:

COLLECTOR_NAME = Collector at fenix
FILESYSTEM_DOMAIN = mydomain
SUSPEND = FALSE
LOCK = /tmp/condor-lock.$(HOSTNAME)0.401902962549688
JAVA_MAXHEAP_ARGUMENT =
CONDOR_ADMIN = root@machine1
START = TRUE
MAIL = /bin/mail
RELEASE_DIR = /opt/condor
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, STARTD, SCHEDD
COLLECTOR = $(SBIN)/condor_collector
PREEMPT = FALSE
UID_DOMAIN = mydomain
NEGOTIATOR = $(SBIN)/condor_negotiator
JAVA = /usr/bin/java
VACATE = FALSE
CONDOR_HOST = machine1
CONDOR_IDS = 509.509
LOCAL_DIR = /opt/condor/local.$(HOSTNAME)

[aryjr@machine1 ~]$ ps axu | grep condor
condor    1550  0.0  0.0  5120 2036 ?        Ss   Oct26   0:09 condor_master
condor    1551  0.0  0.0  5496 2340 ?        Ss   Oct26   0:00 condor_collector -f
condor    1552  0.0  0.0  5176 2096 ?        Ss   Oct26   0:00 condor_negotiator -f
condor    1553  0.0  0.0  6112 2672 ?        Ss   Oct26   0:32 condor_startd -f
condor    1554  0.0  0.0  6064 2700 ?        Ss   Oct26   0:00 condor_schedd -f
 
[aryjr@machine1 bin]$ ./condor_q
 
-- Submitter: machine1 : <192.168.1.182:48988> : machine1
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
   1.0   aryjr          10/27 00:00   0+00:00:00 I  0   0.0  simple 4 10
1 jobs; 1 idle, 0 running, 0 held

[aryjr@machine1 bin]$ ./condor_status

Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime

vm1@machine1 LINUX       INTEL  Unclaimed  Idle       0.000   256  0+03:16:56
vm2@machine1 LINUX       INTEL  Unclaimed  Idle       0.000   256  0+03:16:46

                     Machines Owner Claimed Unclaimed Matched Preempting

         INTEL/LINUX        2     0       0         2       0          0

               Total        2     0       0         2       0          0

Now the configuration file and some information about machine2:

FILESYSTEM_DOMAIN = mydomain
LOCK = /tmp/condor-lock.$(HOSTNAME)0.336442166547389
JAVA_MAXHEAP_ARGUMENT =
CONDOR_ADMIN = root@machine2
MAIL = /bin/mail
RELEASE_DIR = /opt/condor
DAEMON_LIST = MASTER,SCHEDD,STARTD
COLLECTOR = $(SBIN)/condor_collector
UID_DOMAIN = mydomain
NEGOTIATOR = $(SBIN)/condor_negotiator
JAVA = /usr/bin/java
CONDOR_HOST = machine1
CONDOR_IDS = 504.504
LOCAL_DIR = /opt/condor/local.$(HOSTNAME)
#COLLECTOR_NAME =
#PREEMPT =
#START =
#SUSPEND =
#VACATE =

[aryjr@machine2 ~]$ ps axu | grep condor
condor   20469  0.0  0.4  5128 2032 ?        Ss   Oct26   0:09 condor_master
condor   20470  0.0  0.5  6064 2628 ?        Ss   Oct26   0:00 condor_schedd -f
condor   20471  0.0  0.5  6140 2696 ?        Ss   Oct26   0:27 condor_startd -f

[aryjr@machine2 bin]$ ./condor_q

-- Submitter: machine2 : <192.168.1.107:33670> : machine2
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
   3.0   aryjr          10/25 17:50   0+00:00:00 I  0   0.0  sh_loop 60
   4.0   aryjr          10/26 16:45   0+00:00:00 I  0   0.0  sh_loop 60

2 jobs; 2 idle, 0 running, 0 held

[aryjr@machine2 bin]$ ./condor_status

Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime

vm1@machine1 LINUX       INTEL  Unclaimed  Idle       0.000   256  0+03:20:14
vm2@machine1 LINUX       INTEL  Unclaimed  Idle       0.000   256  0+03:20:05

                     Machines Owner Claimed Unclaimed Matched Preempting

         INTEL/LINUX        2     0       0         2       0          0

               Total        2     0       0         2       0          0

As you can see, I've submitted three jobs to Condor: two on machine2 and one on machine1 (the master). All of them stay idle. Why have the jobs not been sent to machine1? What's wrong with my configuration?

Thanks very much

Ary Junior