
[Condor-users] unable to submit to GT3.2



Hi,

I did a fresh install of Condor from VDT-1.3.3, this time installing Ant
and the JDK, sourcing setup.sh, and then installing Condor. This solved
the apache/java bug I was facing earlier, but I am still unable to
submit a job successfully through the GT3.2 web services environment.
The Gridmanager log is below. Although the log shows that the
Gridmanager sees the GAHP's PID, it thinks the GAHP server is not up.
The process list shows that both the GAHP and Gridmanager processes are
running while the submitted job remains in the Idle state.

ps awux | grep condor
vdt      11137  0.1  0.2  5420 2080 ?        S    15:03   0:00 /opt/vdt-1.3.3/condor/sbin/condor_master
vdt      11138  0.3  0.2  5732 2324 ?        S    15:03   0:00 condor_collector -f
vdt      11139  0.1  0.2  5524 2156 ?        S    15:03   0:00 condor_negotiator -f
vdt      11140  1.0  0.2  6624 2808 ?        S    15:03   0:00 condor_schedd -f
vdt      11141 24.7  0.2  6224 2420 ?        R    15:03   0:02 condor_startd -f
murali   11157  1.0  0.2  6744 2976 ?        S    15:03   0:00 condor_gridmanager -f -C (Owner=?="murali"&&JobUniverse==9) -S /tmp/condor_g_scratch.0x8439880.11140
murali   11159 73.0  1.7 228196 17560 ?      S    15:03   0:00 /opt/vdt-1.3.3/jdk1.4/bin/java -Dorg.globus.ogsa.server.webroot=/opt/vdt-1.3.3/condor/lib/gt3 condor.gahp.Gahp

Any help is appreciated.

Thanks,
Murali

----------------
Gridmanager Log
----------------

4/1 15:03:56 passwd_cache::cache_uid(): getpwnam("condor") failed: Success
4/1 15:03:56 passwd_cache::cache_uid(): getpwnam("condor") failed: Success
4/1 15:03:56 ******************************************************
4/1 15:03:56 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
4/1 15:03:56 ** /opt/vdt-1.3.3/condor/sbin/condor_gridmanager
4/1 15:03:56 ** $CondorVersion: 6.7.6 Mar 15 2005 $
4/1 15:03:56 ** $CondorPlatform: I386-LINUX_RH9 $
4/1 15:03:56 ** PID = 11157
4/1 15:03:56 ******************************************************
4/1 15:03:56 Using config file: /opt/vdt-1.3.3/condor/etc/condor_config
4/1 15:03:56 Using local config files: /opt/vdt-1.3.3/condor/local.pleiades/condor_config.local
4/1 15:03:56 DaemonCore: Command Socket at <128.118.2.93:34063>
4/1 15:03:59 [11157] DaemonCore: Command received via UDP from host <128.118.2.93:32809>
4/1 15:03:59 [11157] DaemonCore: received command 60000 (DC_RAISESIGNAL), calling handler (HandleSigCommand())
4/1 15:03:59 [11157] Found job 3.0 --- inserting
4/1 15:03:59 [11157] (3.0) doEvaluateState called: gmState GM_INIT, globusState 32
4/1 15:03:59 [11157] GAHP server pid = 11159
4/1 15:04:02 [11157] gahp server not up yet, delaying ping
4/1 15:04:02 [11157] (3.0) doEvaluateState called: gmState GM_SUBMIT, globusState 32
4/1 15:04:07 [11157] gahp server not up yet, delaying ping
4/1 15:04:12 [11157] gahp server not up yet, delaying ping
4/1 15:04:17 [11157] gahp server not up yet, delaying ping
4/1 15:04:22 [11157] gahp server not up yet, delaying ping
4/1 15:04:27 [11157] gahp server not up yet, delaying ping
4/1 15:04:32 [11157] gahp server not up yet, delaying ping
4/1 15:04:37 [11157] gahp server not up yet, delaying ping
[snipped the repeated messages]
4/1 15:08:42 [11157] gahp server not up yet, delaying ping
4/1 15:08:47 [11157] gahp server not up yet, delaying ping
4/1 15:08:52 [11157] gahp server not up yet, delaying ping
4/1 15:08:57 [11157] gahp server not up yet, delaying ping
4/1 15:09:02 [11157] gahp server not up yet, delaying ping
4/1 15:09:03 [11157] (3.0) doEvaluateState called: gmState GM_SUBMIT, globusState 32
4/1 15:09:03 [11157] (3.0) gmState GM_SUBMIT, globusState 32: globus_gram_client_job_create() returned Globus error -103
4/1 15:09:03 [11157] (3.0)    RSL='&(rsl_substitution=(GRIDMANAGER_GASS_URL https://128.118.2.93:34074))(executable=$(GRIDMANAGER_GASS_URL)#'//bin/hostname')(scratchdir='')(directory=$(SCRATCH_DIRECTORY))(stdout=$(GLOBUS_CACHED_STDOUT))(stderr=$(GLOBUS_CACHED_STDERR))(file_stage_out=($(GLOBUS_CACHED_STDOUT) $(GRIDMANAGER_GASS_URL)#'/usr1/home/murali/1072/out')($(GLOBUS_CACHED_STDERR) $(GRIDMANAGER_GASS_URL)#'/usr1/home/murali/1072/err'))(proxy_timeout=240)(remote_io_url=$(GRIDMANAGER_GASS_URL))'
4/1 15:09:03 [11157] No jobs left, shutting down
4/1 15:09:03 [11157] Got SIGTERM. Performing graceful shutdown.
4/1 15:09:03 [11157] **** condor_gridmanager (condor_GRIDMANAGER) EXITING WITH STATUS 0
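
For anyone reading a log like this, it can help to condense the repeated
"not up yet" messages into a count and a duration before looking at the
final error. Below is a rough Python sketch of that idea, not a Condor
tool. The names summarize and SAMPLE_LOG are made up for illustration,
the sample lines are copied from the log above, and the script assumes
all timestamps fall on the same day.

```python
import re

# A few lines copied from the Gridmanager log in this message.
SAMPLE_LOG = """\
4/1 15:03:59 [11157] GAHP server pid = 11159
4/1 15:04:02 [11157] gahp server not up yet, delaying ping
4/1 15:04:07 [11157] gahp server not up yet, delaying ping
4/1 15:09:02 [11157] gahp server not up yet, delaying ping
4/1 15:09:03 [11157] (3.0) gmState GM_SUBMIT, globusState 32:
globus_gram_client_job_create() returned Globus error -103
"""

# Matches "M/D HH:MM:SS [pid] message" as seen in the log above.
LINE_RE = re.compile(r"^\d+/\d+ (\d+):(\d+):(\d+) \[\d+\] (.*)$")
ERROR_RE = re.compile(r"returned Globus error (-?\d+)")


def summarize(log_text):
    """Count the GAHP wait messages and collect Globus error codes.

    Hypothetical helper: assumes every timestamp is on the same day,
    so only the time of day is used to measure the wait.
    """
    wait_times = []
    errors = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m and "gahp server not up yet" in m.group(4):
            h, mi, s = (int(m.group(i)) for i in (1, 2, 3))
            wait_times.append(h * 3600 + mi * 60 + s)
        # Checked on every line, so wrapped continuation lines also match.
        err = ERROR_RE.search(line)
        if err:
            errors.append(int(err.group(1)))
    waited = wait_times[-1] - wait_times[0] if wait_times else 0
    return {
        "wait_messages": len(wait_times),
        "waited_seconds": waited,
        "globus_errors": errors,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_LOG))
```

To run it against a real log, read the Gridmanager log file into a
string and pass that string to summarize() in place of SAMPLE_LOG.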