
[Condor-users] problem in running condor with globus tool kit 4.0



I am having difficulty running Condor with the Globus Toolkit and urgently
need advice, please.

Here is the sequence of steps I took:

Step 1: Start the database

          /etc/init.d/postgresql start

Step 2:

          gridoot>globus-start-container

Step 3:

          condor>condor_master

Step 4:

          condor>grid-proxy-init
          condor>globus-personal-gatekeeper -start
          condor>condor_submit /usr/local/condor/testjobs/globusjob.submit

Step 5:

          condor_q
          condor_q -globus


The response I get to "condor_q" is

-- Submitter: pc-p31972.somedomain.com : <192.168.2.140:33105> : pc-p31972.somedomain.com
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
   1.0   condor          9/2  16:11   0+00:00:00 I  0   0.0  date

However, I'm not sure what to do next. If I run the command
"condor_q -globu" (or any similar command of the form
"condor_q globusANYCHARACTERS", where ANYCHARACTERS are any random
characters), I get a response of the form

-- Submitter: pc-p31972.somedomain.com : <192.168.2.140:33105> : pc-p31972.somedomain.com
 ID      OWNER          STATUS  MANAGER  HOST                EXECUTABLE
   1.0   condor        UNSUBMITTED fork     pc-p31972.somedomain.co /bin/date


The log file reports that the Globus resource at port 2119 was detected
as down. I don't know the reason.
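
One check that might narrow this down is whether anything accepts TCP connections on the port the log complains about. The following is only a minimal sketch, assuming Python is available on the submit machine; the function name port_is_open is made up for illustration, and the host and port in the usage comment are simply the ones that appear in the log.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example usage (host and port copied from the log message):
# print(port_is_open("pc-p31972.somedomain.com", 2119))
```

If this returns False, nothing is listening on that port (or it is blocked), which would be consistent with the "resource down" messages. A True result only shows that something accepts connections there, not that it is a working gatekeeper.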

**********************************************************
Kindly advise how to SUBMIT the above jobs over Globus
**********************************************************


By the way, the log file shows the following:
9/2 20:19:19 [6685] Resources down for more than 900 secs -- killing
GAHP
9/2 20:19:19 [6685] GAHP command 'RESULTS' failed
9/2 20:19:19 [6685] ERROR "Gahp Server (pid=6686) died due to signal 9
" at line 359 in file gahp-client.C
9/2 20:19:19 [6843] Resources down for 658 seconds!
9/2 20:19:35 [7274] Resources down for 238 seconds!
9/2 20:20:19 [6843] Resources down for 718 seconds!
9/2 20:20:34 ******************************************************
9/2 20:20:34 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
9/2 20:20:34 ** /usr/local/condor/sbin/condor_gridmanager
9/2 20:20:34 ** $CondorVersion: 6.6.10 Jun 13 2005 $
9/2 20:20:34 ** $CondorPlatform: I386-LINUX_RH80 $
9/2 20:20:34 ** PID = 7633
9/2 20:20:34 ******************************************************
9/2 20:20:34 Using config file: /home/condor/condor_config
9/2 20:20:34 Using local config files:
/usr/local/condor/var/condor_config.local
9/2 20:20:34 DaemonCore: Command Socket at <192.168.2.140:38494>
9/2 20:20:34 [7633] GAHP server pid = 7634
9/2 20:20:35 [7274] Resources down for 298 seconds!
9/2 20:20:37 [7633] DaemonCore: Command received via UDP from host
<192.168.2.140:32795>
9/2 20:20:37 [7633] DaemonCore: received command 60000
(DC_RAISESIGNAL), calling handler (HandleSigCommand())
9/2 20:20:37 [7274] resource pc-p31972.somedomain.com:2119 is still
down
9/2 20:20:37 [7633] Found job 8.0 --- inserting
9/2 20:20:37 [7633] (8.0) doEvaluateState called: gmState GM_INIT,
globusState 32
9/2 20:20:37 [7633] (8.0) proxy not cached yet, waiting...
9/2 20:20:37 [7633] (8.0) doEvaluateState called: gmState GM_INIT,
globusState 32
9/2 20:20:37 [7633] resource pc-p31972.somedomain.com:2119 is now down
9/2 20:20:37 [7633] (8.0) doEvaluateState called: gmState GM_INIT,
globusState 32
9/2 20:21:19 [6843] Resources down for 778 seconds!
9/2 20:21:35 [7274] Resources down for 358 seconds!
9/2 20:21:35 [7633] Resources down for 58 seconds!
9/2 20:22:19 [6843] Resources down for 838 seconds!
9/2 20:22:35 [7274] Resources down for 418 seconds!
9/2 20:22:35 [7633] Resources down for 118 seconds!

<stuff deleted>

9/2 20:35:34 Using config file: /home/condor/condor_config
9/2 20:35:34 Using local config files:
/usr/local/condor/var/condor_config.local
9/2 20:35:34 DaemonCore: Command Socket at <192.168.2.140:38875>
9/2 20:35:34 [7916] GAHP server pid = 7917
9/2 20:35:35 [7633] Resources down for 898 seconds!
9/2 20:35:37 [7916] DaemonCore: Command received via UDP from host
<192.168.2.140:32797>
9/2 20:35:37 [7916] DaemonCore: received command 60000
(DC_RAISESIGNAL), calling handler (HandleSigCommand())
9/2 20:35:37 [7633] resource pc-p31972.somedomain.com:2119 is still
down
9/2 20:35:37 [7916] Found job 6.0 --- inserting
9/2 20:35:37 [7916] Found job 7.0 --- inserting
9/2 20:35:37 [7916] (7.0) doEvaluateState called: gmState GM_INIT,
globusState 32
9/2 20:35:37 [7916] (7.0) proxy not cached yet, waiting...
9/2 20:35:37 [7916] (6.0) doEvaluateState called: gmState GM_INIT,
globusState 32
9/2 20:35:37 [7916] (6.0) proxy not cached yet, waiting...
9/2 20:35:37 [7916] (7.0) doEvaluateState called: gmState GM_INIT,
globusState 32
9/2 20:35:37 [7916] (6.0) doEvaluateState called: gmState GM_INIT,
globusState 32
9/2 20:35:37 [7916] resource pc-p31972.somedomain.com:2119 is now down
9/2 20:35:37 [7916] (7.0) doEvaluateState called: gmState GM_INIT,
globusState 32
9/2 20:35:37 [7916] (6.0) doEvaluateState called: gmState GM_INIT,
globusState 32
9/2 20:36:19 [7682] Resources down for 778 seconds!
9/2 20:36:29 [7881] Resources down for 718 seconds!
9/2 20:36:29 [7883] Resources down for 718 seconds!
9/2 20:36:35 [7633] Resources down for more than 900 secs -- killing
GAHP
9/2 20:36:35 [7633] GAHP command 'RESULTS' failed
9/2 20:36:35 [7633] ERROR "Gahp Server (pid=7634) died due to signal 9"
at line 359 in file gahp-client.C
9/2 20:36:35 [7916] Resources down for 58 seconds!
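
Several gridmanager processes (the PIDs in square brackets) report "Resources down" in the log above, so it can help to see the longest downtime each one reported. This is a rough sketch only; the function name longest_downtime is hypothetical, and the regular expression assumes the message format exactly as it appears in this log.

```python
import re
from typing import Dict, Iterable

# Matches lines such as: 9/2 20:19:19 [6843] Resources down for 658 seconds!
DOWN_RE = re.compile(r"\[(\d+)\] Resources down for (\d+) seconds!")


def longest_downtime(lines: Iterable[str]) -> Dict[int, int]:
    """Map each gridmanager PID to the largest downtime (seconds) it reported."""
    longest: Dict[int, int] = {}
    for line in lines:
        match = DOWN_RE.search(line)
        if match:
            pid, secs = int(match.group(1)), int(match.group(2))
            longest[pid] = max(secs, longest.get(pid, 0))
    return longest


# Three lines copied from the log above.
sample = [
    "9/2 20:19:19 [6843] Resources down for 658 seconds!",
    "9/2 20:19:35 [7274] Resources down for 238 seconds!",
    "9/2 20:20:19 [6843] Resources down for 718 seconds!",
]
print(longest_downtime(sample))  # {6843: 718, 7274: 238}
```

To run it over a whole log, pass an open file object instead of the sample list, since a file iterates line by line.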