
[Condor-users] A question about the Condor cluster: I cannot determine whether the submit machine is connected to the central manager



Hi condor-users,

First, thanks for reading my question. I installed Condor on one machine, which acts as both the central manager and an execute node.

It is running as follows:

  [root@cngrid219 condor]# ps -ef| egrep condor

root      2720     1  0 Jun29 ?        00:00:10 condor_master
root      2721  2720  0 Jun29 ?        00:00:01 condor_collector -f
root      2722  2720  0 Jun29 ?        00:00:00 condor_negotiator -f
root      2723  2720  0 Jun29 ?        00:00:19 condor_startd -f
root      3483  3309  0 11:09 pts/0    00:00:00 grep -E condor

[root@cngrid219 condor]# condor_status

Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@cngrid219    LINUX      INTEL  Owner     Idle     0.000  1007  0+00:05:04
slot2@cngrid219    LINUX      INTEL  Unclaimed Idle     0.000  1007  0+01:00:09

                     Total Owner Claimed Unclaimed Matched Preempting Backfill

         INTEL/LINUX     2     1       0         1       0          0        0

               Total     2     1       0         1       0          0        0



Then I installed Condor on a second machine in the submit role.

It is running as follows:

[root@cngrid239 ~]# ps -ef | grep condor
condor    4550     1  3 11:53 ?        00:01:09 ./condor_master
condor    4551  4550  3 11:53 ?        00:00:56 condor_schedd -f
root      4552  4551  0 11:53 ?        00:00:00 condor_procd -A /tmp/condor-lock.cngrid2390.791864523737789/procd_pipe.SCHEDD -S 60 -C 501

When I submit 10 jobs, the queue looks like this:

    -- Submitter: cngrid239.localdomain : <127.0.0.1:32869> : cngrid239.localdomain
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
   2.0   condor          6/30 12:08   0+00:00:00 I  0   0.0  nodejob.exe       
   2.1   condor          6/30 12:08   0+00:00:00 I  0   0.0  nodejob.exe       
   2.2   condor          6/30 12:08   0+00:00:00 I  0   0.0  nodejob.exe       
   2.3   condor          6/30 12:08   0+00:00:00 I  0   0.0  nodejob.exe       
   2.4   condor          6/30 12:08   0+00:00:00 I  0   0.0  nodejob.exe       
   2.5   condor          6/30 12:08   0+00:00:00 I  0   0.0  nodejob.exe       
   2.6   condor          6/30 12:08   0+00:00:00 I  0   0.0  nodejob.exe       
   2.7   condor          6/30 12:08   0+00:00:00 I  0   0.0  nodejob.exe       
   2.8   condor          6/30 12:08   0+00:00:00 I  0   0.0  nodejob.exe       
   2.9   condor          6/30 12:08   0+00:00:00 I  0   0.0  nodejob.exe       

10 jobs; 10 idle, 0 running, 0 held
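
One detail in the output above that may be worth checking is the address in the Submitter line, `<127.0.0.1:32869>`. A loopback address is reachable only from the local host, so if the schedd really advertises it, other machines could not contact the schedd at that address. Whether that is what keeps the jobs idle here is an assumption, not something the output confirms. The sketch below only parses that header line and tests the address; `submitter_address` and `is_loopback` are hypothetical helper names, not part of Condor.

```python
import ipaddress
import re

# Header line copied from the queue output above.
HEADER = "-- Submitter: cngrid239.localdomain : <127.0.0.1:32869> : cngrid239.localdomain"


def submitter_address(header):
    """Return (ip, port) from the <ip:port> part of the header, or None if absent."""
    match = re.search(r"<(\d{1,3}(?:\.\d{1,3}){3}):(\d+)>", header)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def is_loopback(ip):
    """True when the address lies in 127.0.0.0/8 (local host only)."""
    return ipaddress.ip_address(ip).is_loopback


address = submitter_address(HEADER)
if address is None:
    print("no <ip:port> address found in the header")
else:
    ip, port = address
    kind = "a loopback" if is_loopback(ip) else "not a loopback"
    print(f"schedd address {ip}:{port} is {kind} address")
```

If the check reports a loopback address, how the host name of the submit machine resolves (for example in /etc/hosts) would be a natural next thing to look at.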

    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!*****


All the jobs are idle. Why? My job is very simple: it just prints something out.

Is my submit machine cngrid239 connected to the central manager 219?


I installed 239 using: #condor-configure --install --type=submit --local-dir=/home/condor --central-manager=cngird219.xxxx
I pinged cngird219.xxxx and it responds fine.

Can anyone tell me why the jobs stay idle instead of running?

Thanks!


        

	
Regards,
Jiazhen Zhang
zhangjiazhen@xxxxxxxxxxxxxxxxxx
2008-06-30