
Re: [Condor-users] condor and globus webservices



On Tue, 2005-04-05 at 13:04, Jaime Frey wrote:
> On Wed, 30 Mar 2005, Murali Ramsunder wrote:
> 
> > On Wed, 2005-03-30 at 12:02, Erik Paulson wrote:
> > > On Wed, Mar 30, 2005 at 11:56:37AM -0500, Murali Ramsunder wrote:
> > > > Hi,
> > > >
> > > > I have Condor installed from vdt-1.3.2 and I'm trying to submit to
> > > > another machine that runs GT-3.2 WS (webservices). Submitting to the other
> > > > machine with managed-job-factory works; however, when I use Condor it fails.
> > > > Has anyone tried this? Any help appreciated.
> > > >
> > >
> > > What does the Gridmanager log say?
> > >
> > > -Erik
> >
> > Hi Erik,
> >
> > I hope this captures the essence of the Gridmanager log for the run.
> > And, thanks for posting the message to the list.
> 
> The hostnames look suspicious.
> 
> > 3/30 10:39:40 [27954] GAHP[27955] <- 'GASS_SERVER_INIT 3 0'
> ...
> > 3/30 10:39:41 [27954] GAHP[27955] -> '3' '0' 'https://IPaddr:53590'
> 
> The server is going to attempt to connect to this URL to transfer files.
> What does 'hostname' report as the name of your machine?
> 
> Also, is the name of the server really 'hostname'?
> 
> > 3/30 10:39:41 [27954] GAHP[27955] <- 'GT3_GRAM_JOB_CREATE 4
> > http://hostname:8080/ogsa/services/base/gram/MasterForkManagedJobFactoryService 1 &(rsl_substitution=(GRIDMANAGER_GASS_URL\ https://IPaddr:53590))(executable=$(GRIDMANAGER_GASS_URL)#'//bin/hostname')(scratchdir='')(directory=$(SCRATCH_DIRECTORY))(stdout=$(GRIDMANAGER_GASS_URL)#'//usr1/home/murali/1072/out')(stderr=$(GRIDMANAGER_GASS_URL)#'//usr1/home/murali/1072/err')(proxy_timeout=240)(remote_io_url=$(GRIDMANAGER_GASS_URL))'
> 

[murali@xxxxxxxxxxxxxxxxxxxxx ~]$ managed-job-globusrun -factory
http://ligo-grid.aset.psu.edu:8080/ogsa/services/base/gram/MasterForkManagedJobFactoryService -file /opt/gt3.2-ws/schema/base/gram/examples/test.xml
WAITING FOR JOB TO FINISH
========== Status Notification ==========
Job Status: Active
=========================================
========== Status Notification ==========
Job Status: Done
=========================================
DESTROYING SERVICE
SERVICE DESTROYED


I'm submitting from pleiades to ligo-grid, and managed-job-globusrun
works fine. /bin/hostname returns the FQDN on both machines, and the IP
addresses are correct in /etc/hosts on both machines. The Gridmanager
log from the latest run is below.

thanks,
Murali
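
For anyone repeating the hostname check asked about above, here is a
minimal sketch in Python. It only reports what the local resolver says
for this machine's name and flags a loopback address, since the remote
server has to connect back to the GASS URL. The function names
(describe_host, is_loopback) are made up for illustration and are not
part of Condor or Globus.

```python
import ipaddress
import socket


def is_loopback(addr):
    """Return True if addr (a dotted IP string) is a loopback address."""
    return ipaddress.ip_address(addr).is_loopback


def describe_host(name=None):
    """Collect what the local resolver reports for a host name.

    name defaults to this machine's own host name.
    """
    name = name or socket.gethostname()
    fqdn = socket.getfqdn(name)
    try:
        addr = socket.gethostbyname(name)
    except OSError:
        # Resolution failed; report that instead of raising.
        addr = None
    return {
        "name": name,
        "fqdn": fqdn,
        "address": addr,
        "loopback": is_loopback(addr) if addr else None,
    }


if __name__ == "__main__":
    for key, value in describe_host().items():
        print(f"{key}: {value}")
```

Running it on both the submit machine and the server shows whether each
name resolves to an address the other side can actually reach.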

4/5 13:34:23 passwd_cache::cache_uid(): getpwnam("condor") failed: Success
4/5 13:34:23 passwd_cache::cache_uid(): getpwnam("condor") failed: Success
4/5 13:34:23 ******************************************************
4/5 13:34:23 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
4/5 13:34:23 ** /opt/vdt-1.3.3/condor/sbin/condor_gridmanager
4/5 13:34:23 ** $CondorVersion: 6.7.6 Mar 15 2005 $
4/5 13:34:23 ** $CondorPlatform: I386-LINUX_RH9 $
4/5 13:34:23 ** PID = 6071
4/5 13:34:23 ******************************************************
4/5 13:34:23 Using config file: /opt/vdt-1.3.3/condor/etc/condor_config
4/5 13:34:23 Using local config files: /opt/vdt-1.3.3/condor/local.pleiades/condor_config.local
4/5 13:34:23 DaemonCore: Command Socket at <128.118.2.93:37651>
4/5 13:34:26 [6071] DaemonCore: Command received via UDP from host <128.118.2.93:33012>
4/5 13:34:26 [6071] DaemonCore: received command 60000 (DC_RAISESIGNAL), calling handler (HandleSigCommand())
4/5 13:34:26 [6071] Found job 5.0 --- inserting
4/5 13:34:26 [6071] (5.0) doEvaluateState called: gmState GM_INIT, globusState 3 2
4/5 13:34:26 [6071] GAHP server pid = 6073
4/5 13:34:30 [6071] gahp server not up yet, delaying ping
4/5 13:34:30 [6071] (5.0) doEvaluateState called: gmState GM_SUBMIT, globusState 32
4/5 13:34:35 [6071] gahp server not up yet, delaying ping

[snip]

4/5 13:39:30 [6071] gahp server not up yet, delaying ping
4/5 13:39:31 [6071] (5.0) doEvaluateState called: gmState GM_SUBMIT, globusState 32
4/5 13:39:31 [6071] (5.0) gmState GM_SUBMIT, globusState 32: globus_gram_client_job_create() returned Globus error -103
4/5 13:39:31 [6071] (5.0) RSL='&(rsl_substitution=(GRIDMANAGER_GASS_URL https://128.118.2.93:37663))(executable=$(GRIDMANAGER_GASS_URL)#'//bin/hostname')(scratchdir='')(directory=$(SCRATCH_DIRECTORY))(stdout=$(GLOBUS_CACHED_STDOUT))(stderr=$(GLOBUS_CACHED_STDERR))(file_stage_out=($(GLOBUS_CACHED_STDOUT) $(GRIDMANAGER_GASS_URL)#'/usr1/home/murali/1072/out')($(GLOBUS_CACHED_STDERR) $(GRIDMANAGER_GASS_URL)#'/usr1/home/murali/1072/err'))(proxy_timeout=240)(remote_io_url=$(GRIDMANAGER_GASS_URL))'
4/5 13:39:31 [6071] No jobs left, shutting down
4/5 13:39:31 [6071] Got SIGTERM. Performing graceful shutdown.
4/5 13:39:31 [6071] **** condor_gridmanager (condor_GRIDMANAGER) EXITING WITH STATUS 0
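
Gridmanager log lines pasted into mail can end up wrapped across two
lines, which makes the one line that matters (the "returned Globus
error" line) easy to miss. Below is a minimal sketch in Python that
rejoins continuation lines and then pulls out the job id, the failing
call, and the error code. The timestamp pattern and the helper names
(unwrap, find_errors) are assumptions based only on the log excerpt in
this message, not on any documented Condor log format.

```python
import re

# A record starts with a timestamp such as "4/5 13:39:31" (assumed format).
TIMESTAMP = re.compile(r"^\d+/\d+ \d+:\d+:\d+ ")

# Matches e.g. "(5.0) ... some_call() returned Globus error -103".
ERROR = re.compile(
    r"\((\d+\.\d+)\).*?(\w+)\(\) returned Globus error (-?\d+)"
)


def unwrap(lines):
    """Join lines that lack a leading timestamp onto the previous record."""
    records = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        if TIMESTAMP.match(text) or not records:
            records.append(text)
        else:
            records[-1] += " " + text
    return records


def find_errors(records):
    """Return (job_id, call_name, error_code) for each error record."""
    found = []
    for record in records:
        match = ERROR.search(record)
        if match:
            job, call, code = match.groups()
            found.append((job, call, int(code)))
    return found


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as log:
        for job, call, code in find_errors(unwrap(log)):
            print(f"job {job}: {call}() returned Globus error {code}")
```

Run against the log above, this reports a single failure for job 5.0:
globus_gram_client_job_create() returned Globus error -103.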