
[Condor-users] Need help running Condor with Globus Toolkit



Hi,
  I am very new to the Condor world. I have installed Condor 6.6.9 and
Globus Toolkit 4.0 on my system and created all the necessary certificates.
When I run my job using the globus-job-run command, the job executes fine.
But when I try to run it using condor_submit, it reports that the job is
submitted, yet when I check the status with condor_q, it shows the job as
idle. When I check with condor_q -globus, it shows the job status as
UNSUBMITTED, and the gridmanager log file says that the Globus resource
is down.

I don't know what this means. Please also give some suggestions on how to
set up machines in a Condor-G pool or a simple grid.

Here is the sequence of steps I took:

Step 1: Start database -

                     /etc/init.d/postgresql start

Step 2:

                     globus>globus-start-container

Step 3:

                     condor>condor_master

Step 4:

               condor>grid-proxy-init
               condor>globus-personal-gatekeeper -start
               condor>condor_submit /usr/local/condor/testjobs/globusjob.submit

Step 5:
               condor_q
               condor_q -globus


The response I get to "condor_q" is

-- Submitter: pc-p31972.somedomain.com : <192.168.2.140:33105> : pc-p31972.somedomain.com
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
   1.0   condor          9/2  16:11   0+00:00:00 I  0   0.0  date

However, I'm not sure what to do next. If I run the command "condor_q
globu" (or any similar command of the form "condor_q globusANYCHARACTERS",
where ANYCHARACTERS are any random characters), I get a response of the
form

-- Submitter: pc-p31972.somedomain.com : <192.168.2.140:33105> : pc-p31972.somedomain.com
ID   OWNER  STATUS       MANAGER     HOST                         EXECUTABLE
1.0  condor  UNSUBMITTED fork         pc-p31972.somedomain.co     /bin/date


**********************************************************
Kindly advise how to SUBMIT the above jobs over Globus
**********************************************************


By the way, the log file shows the following -
9/2 20:19:19 [6685] Resources down for more than 900 secs -- killing GAHP
9/2 20:19:19 [6685] GAHP command 'RESULTS' failed
9/2 20:19:19 [6685] ERROR "Gahp Server (pid=6686) died due to signal 9" at line 359 in file gahp-client.C
9/2 20:19:19 [6843] Resources down for 658 seconds!
9/2 20:19:35 [7274] Resources down for 238 seconds!
9/2 20:20:19 [6843] Resources down for 718 seconds!
9/2 20:20:34 ******************************************************
9/2 20:20:34 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
9/2 20:20:34 ** /usr/local/condor/sbin/condor_gridmanager
9/2 20:20:34 ** $CondorVersion: 6.6.10 Jun 13 2005 $
9/2 20:20:34 ** $CondorPlatform: I386-LINUX_RH80 $
9/2 20:20:34 ** PID = 7633
9/2 20:20:34 ******************************************************
9/2 20:20:34 Using config file: /home/condor/condor_config
9/2 20:20:34 Using local config files: /usr/local/condor/var/condor_config.local
9/2 20:20:34 DaemonCore: Command Socket at <192.168.2.140:38494>
9/2 20:20:34 [7633] GAHP server pid = 7634
9/2 20:20:35 [7274] Resources down for 298 seconds!
9/2 20:20:37 [7633] DaemonCore: Command received via UDP from host <192.168.2.140:32795>
9/2 20:20:37 [7633] DaemonCore: received command 60000 (DC_RAISESIGNAL), calling handler (HandleSigCommand())
9/2 20:20:37 [7274] resource pc-p31972.somedomain.com:2119 is still down
9/2 20:20:37 [7633] Found job 8.0 --- inserting
9/2 20:20:37 [7633] (8.0) doEvaluateState called: gmState GM_INIT, globusState 32
9/2 20:20:37 [7633] (8.0) proxy not cached yet, waiting...
9/2 20:20:37 [7633] (8.0) doEvaluateState called: gmState GM_INIT, globusState 32
9/2 20:20:37 [7633] resource pc-p31972.somedomain.com:2119 is now down
9/2 20:20:37 [7633] (8.0) doEvaluateState called: gmState GM_INIT, globusState 32
9/2 20:21:19 [6843] Resources down for 778 seconds!
9/2 20:21:35 [7274] Resources down for 358 seconds!
9/2 20:21:35 [7633] Resources down for 58 seconds!
9/2 20:22:19 [6843] Resources down for 838 seconds!
9/2 20:22:35 [7274] Resources down for 418 seconds!
9/2 20:22:35 [7633] Resources down for 118 seconds!

<stuff deleted>

9/2 20:35:34 Using config file: /home/condor/condor_config
9/2 20:35:34 Using local config files: /usr/local/condor/var/condor_config.local
9/2 20:35:34 DaemonCore: Command Socket at <192.168.2.140:38875>
9/2 20:35:34 [7916] GAHP server pid = 7917
9/2 20:35:35 [7633] Resources down for 898 seconds!
9/2 20:35:37 [7916] DaemonCore: Command received via UDP from host <192.168.2.140:32797>
9/2 20:35:37 [7916] DaemonCore: received command 60000 (DC_RAISESIGNAL), calling handler (HandleSigCommand())
9/2 20:35:37 [7633] resource pc-p31972.somedomain.com:2119 is still down
9/2 20:35:37 [7916] Found job 6.0 --- inserting
9/2 20:35:37 [7916] Found job 7.0 --- inserting
9/2 20:35:37 [7916] (7.0) doEvaluateState called: gmState GM_INIT, globusState 32
9/2 20:35:37 [7916] (7.0) proxy not cached yet, waiting...
9/2 20:35:37 [7916] (6.0) doEvaluateState called: gmState GM_INIT, globusState 32
9/2 20:35:37 [7916] (6.0) proxy not cached yet, waiting...
9/2 20:35:37 [7916] (7.0) doEvaluateState called: gmState GM_INIT, globusState 32
9/2 20:35:37 [7916] (6.0) doEvaluateState called: gmState GM_INIT, globusState 32
9/2 20:35:37 [7916] resource pc-p31972.somedomain.com:2119 is now down
9/2 20:35:37 [7916] (7.0) doEvaluateState called: gmState GM_INIT, globusState 32
9/2 20:35:37 [7916] (6.0) doEvaluateState called: gmState GM_INIT, globusState 32
9/2 20:36:19 [7682] Resources down for 778 seconds!
9/2 20:36:29 [7881] Resources down for 718 seconds!
9/2 20:36:29 [7883] Resources down for 718 seconds!
9/2 20:36:35 [7633] Resources down for more than 900 secs -- killing GAHP
9/2 20:36:35 [7633] GAHP command 'RESULTS' failed
9/2 20:36:35 [7633] ERROR "Gahp Server (pid=7634) died due to signal 9" at line 359 in file gahp-client.C
9/2 20:36:35 [7916] Resources down for 58 seconds!
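
Since the log is long and several gridmanager processes are interleaved in
it, here is a rough Python sketch that may help when reading it. It is not
part of Condor. The function name summarize and the two patterns are my own,
and the line shapes are assumed only from the excerpt above, so other Condor
versions may log differently. It reports, per gridmanager PID, the longest
"Resources down" time seen, and which host:port the gridmanager says is
down. It also tolerates a line break before the word "down", as can happen
when a log is pasted into mail.

```python
import re
from collections import defaultdict

# Assumed line shape (from the excerpt in this message):
#   9/2 20:19:19 [6843] Resources down for 658 seconds!
DOWN_RE = re.compile(r"\[(\d+)\] Resources down for (\d+) seconds!")

# Assumed line shape; \s+ also matches a newline if the line was wrapped:
#   9/2 20:20:37 [7633] resource host:port is now down
RESOURCE_RE = re.compile(r"\[(\d+)\] resource (\S+) is (?:now|still)\s+down")


def summarize(log_text):
    """Return (longest down time in seconds per PID, set of resources reported down)."""
    longest = defaultdict(int)
    for pid, secs in DOWN_RE.findall(log_text):
        longest[pid] = max(longest[pid], int(secs))
    resources = {res for _pid, res in RESOURCE_RE.findall(log_text)}
    return dict(longest), resources


# A few lines copied from the log above, used here only as sample input.
sample = """\
9/2 20:19:19 [6843] Resources down for 658 seconds!
9/2 20:19:35 [7274] Resources down for 238 seconds!
9/2 20:20:19 [6843] Resources down for 718 seconds!
9/2 20:20:37 [7274] resource pc-p31972.somedomain.com:2119 is still
down
9/2 20:20:37 [7633] resource pc-p31972.somedomain.com:2119 is now down
"""

longest, resources = summarize(sample)
print(longest)
print(resources)
```

In the full log, every "resource ... down" line names the same host and
port, pc-p31972.somedomain.com:2119.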

Thanks in advance.