
Re: [Condor-users] [gt-user] Globus - CondorG setup



What does the container.log of the globus server say?

Steve Timm


On Fri, 12 Feb 2010, Kunal Patel wrote:


Hi, I have set up a Condor pool: a Linux central manager that can both execute and submit jobs, plus a mix of Linux and Windows machines that can also execute and submit jobs.
I am now trying to use the Globus grid manager. I have been through the tutorial at https://bi.offis.de/wisent/tiki-index.php?page=Condor-GT4-Admin. I have installed Globus on the central manager itself and am attempting to submit from there as well. Certificates have been created for my user, and the HIGH/LOW PORT macros have been set.
I am having trouble, though: it seems as though the Globus server (GRAM, I think) is never actually being started, so the job never leaves the idle state. This is part of the gridmanager log:
GAHP[4439] <- 'GT4_GRAM_PING 4 https://10.1.207.26/wsrf/services/ManagedJobFactoryService'
02/12 10:36:35 [4432] GAHP[4439] -> 'S'
02/12 10:36:35 [4432] GAHP[4439] (stderr) -> AxisFault
02/12 10:36:35 [4432] GAHP[4439] (stderr) ->  faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.userException
02/12 10:36:35 [4432] GAHP[4439] (stderr) ->  faultSubcode: 
02/12 10:36:35 [4432] GAHP[4439] (stderr) ->  faultString: java.net.ConnectException: Connection refused
02/12 10:36:35 [4432] GAHP[4439] (stderr) ->  faultActor: 
02/12 10:36:35 [4432] GAHP[4439] (stderr) ->  faultNode: 
02/12 10:36:35 [4432] GAHP[4439] (stderr) ->  faultDetail: 
02/12 10:36:35 [4432] GAHP[4439] (stderr) -> 	{http://xml.apache.org/axis/}stackTrace:java.net.ConnectException: Connection refused
02/12 10:36:35 [4432] GAHP[4439] (stderr) -> 	at java.net.PlainSocketImpl.socketConnect(Native Method)
02/12 10:36:35 [4432] GAHP[4439] (stderr) -> 	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:3
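The "Connection refused" in the log means nothing accepted a TCP connection at the address the GAHP tried. A minimal Python sketch of how that could be checked by hand follows; the helper names `endpoint_port` and `can_connect` are hypothetical, not part of Condor or Globus. Note that the URL in the log carries no explicit port, so a client would normally fall back to the HTTPS default of 443, whereas a GT4 container commonly listens on 8443 unless configured otherwise (an assumption about this setup, worth verifying against the container's startup output).

```python
import socket
from urllib.parse import urlparse


def endpoint_port(url):
    """Return the (host, port) a client would connect to for this URL.

    If the URL has no explicit port, fall back to the scheme default
    (443 for https, 80 otherwise).
    """
    parts = urlparse(url)
    default = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and unreachable hosts.
        return False
```

Calling `endpoint_port` on the `grid_resource` URL and passing the result to `can_connect` shows whether anything is listening there; repeating the check with port 8443 would show whether the container is up on a different port than the one being contacted.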
My Condor submit file looks like this:
######
universe = grid
grid_resource = gt4 https://10.1.207.26/wsrf/services/ManagedJobFactoryService Condor
executable = helloworld.bat
requirements = OpSys == "MSWin32_NT51" && Arch == "X86"
output = hellowin.out
error = hellowin.error
log = hellowin.log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
Queue
######
I would appreciate any help in working out what is going on.
Thanks,
Kunal



--
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.