
Re: [Condor-users] can't get Condor-G job to run





--On 06 June 2005 18:03 -0500 Jaime Frey <jfrey@xxxxxxxxxxx> wrote:



On Jun 6, 2005, at 6:09 AM, Dr I.C. Smith wrote:


On Fri, 3 Jun 2005, Jaime Frey wrote:


On Jun 3, 2005, at 5:33 AM, Dr Ian C. Smith wrote:


--On 02 June 2005 14:50 -0500 Jaime Frey <jfrey@xxxxxxxxxxx> wrote:


On Jun 1, 2005, at 8:29 AM, Dr Ian C. Smith wrote:


I'm trying to get Condor-G working and I've tried submitting a job similar to the example in the guide:


executable = hello.ksh
globusscheduler = ulgsmp1.liv.ac.uk/jobmanager-fork
universe = globus
output = test.out
log = test.log
queue




but it just remains in the idle state. The logfile (/tmp/GridmanagerLog.smithic) shows:




6/1 14:23:43 ******************************************************
6/1 14:23:43 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
6/1 14:23:43 ** /opt1/condor/sbin/condor_gridmanager
6/1 14:23:43 ** $CondorVersion: 6.6.7 Oct 11 2004 $
6/1 14:23:43 ** $CondorPlatform: SUN4X-SOLARIS29 $
6/1 14:23:43 ** PID = 9333
6/1 14:23:43 ******************************************************
6/1 14:23:43 Using config file: /etc/condor/condor_config
6/1 14:23:43 Using local config files: /opt1/condor/home/condor_config.local
6/1 14:23:43 DaemonCore: Command Socket at <138.253.100.177:60984>
6/1 14:23:43 [9333] GAHP server pid = 9334
6/1 14:23:46 [9333] DaemonCore: Command received via UDP from host <138.253.100.177:52097>
6/1 14:23:46 [9333] DaemonCore: received command 60000 (DC_RAISESIGNAL), calling handler (HandleSigCommand())
6/1 14:23:46 [9333] Found job 142108.0 --- inserting
6/1 14:23:46 [9333] Found job 142109.0 --- inserting
6/1 14:23:46 [9333] Found job 142110.0 --- inserting
6/1 14:23:46 [9333] (142110.0) doEvaluateState called: gmState GM_INIT, globusState 32
6/1 14:23:46 [9333] (142110.0) proxy not cached yet, waiting...
6/1 14:23:46 [9333] proxy near expiration or invalid, delaying ping
6/1 14:23:46 [9333] (142109.0) doEvaluateState called: gmState GM_INIT, globusState 32
6/1 14:23:46 [9333] (142109.0) proxy not cached yet, waiting...
6/1 14:23:46 [9333] (142108.0) doEvaluateState called: gmState GM_INIT, globusState 32
6/1 14:23:46 [9333] (142108.0) proxy not cached yet, waiting...
6/1 14:23:46 [9333] GAHP command 'CACHE_PROXY_FROM_FILE' failed: Failed to import credential maj=851968 min=5
6/1 14:23:46 [9333] ERROR "GAHP cache command failed!" at line 357 in file proxymanager.C
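The key line is the failed CACHE_PROXY_FROM_FILE command. Assuming the `maj=` value is a GSS-API major status code laid out as in the GSS-API C bindings (RFC 2744), which Globus GSI implements, 851968 is 13 << 16, i.e. the routine error GSS_S_FAILURE. That is a generic failure whose real cause would be carried in the mechanism-specific minor status (`min=5` here), which this sketch does not try to interpret. The following is a minimal Python sketch that pulls such failures out of a gridmanager log and decodes the major status; the regular expression is only fitted to the line format in the log above, and the function names are hypothetical.

```python
import re

# Routine-error names from the GSS-API C bindings (RFC 2744); the routine
# error occupies bits 16-23 of a 32-bit major status value.
GSS_ROUTINE_ERRORS = {
    1: "GSS_S_BAD_MECH",
    2: "GSS_S_BAD_NAME",
    3: "GSS_S_BAD_NAMETYPE",
    4: "GSS_S_BAD_BINDINGS",
    5: "GSS_S_BAD_STATUS",
    6: "GSS_S_BAD_SIG",
    7: "GSS_S_NO_CRED",
    8: "GSS_S_NO_CONTEXT",
    9: "GSS_S_DEFECTIVE_TOKEN",
    10: "GSS_S_DEFECTIVE_CREDENTIAL",
    11: "GSS_S_CREDENTIALS_EXPIRED",
    12: "GSS_S_CONTEXT_EXPIRED",
    13: "GSS_S_FAILURE",
    14: "GSS_S_BAD_QOP",
    15: "GSS_S_UNAUTHORIZED",
    16: "GSS_S_UNAVAILABLE",
    17: "GSS_S_DUPLICATE_ELEMENT",
    18: "GSS_S_NAME_NOT_MN",
}

# Matches lines like:
# GAHP command 'CACHE_PROXY_FROM_FILE' failed: Failed to import credential maj=851968 min=5
FAILED_CMD = re.compile(
    r"GAHP command '(?P<cmd>[^']+)' failed: (?P<msg>.*?) "
    r"maj=(?P<maj>\d+) min=(?P<min>\d+)"
)


def routine_error(major: int) -> str:
    """Return the RFC 2744 routine-error name encoded in a major status."""
    code = (major >> 16) & 0xFF
    return GSS_ROUTINE_ERRORS.get(code, f"unknown routine error {code}")


def failed_gahp_commands(log_text: str):
    """Yield (command, message, routine-error name, minor status) tuples."""
    for m in FAILED_CMD.finditer(log_text):
        yield (
            m.group("cmd"),
            m.group("msg"),
            routine_error(int(m.group("maj"))),
            int(m.group("min")),
        )


if __name__ == "__main__":
    line = ("6/1 14:23:46 [9333] GAHP command 'CACHE_PROXY_FROM_FILE' failed: "
            "Failed to import credential maj=851968 min=5")
    for cmd, msg, err, minor in failed_gahp_commands(line):
        print(cmd, msg, err, minor)
```

Run against the whole log file, this would list every failed GAHP command; for the log above it reports only CACHE_PROXY_FROM_FILE.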


When I use globus-job-run it's fine so the globus bit seems OK.


Any ideas on what is going wrong?


What version of Globus did you create the proxy with, and what command did you execute (including command-line options)? Globus 4.0 introduces a new proxy format that Condor may have trouble understanding.




I'm still using GT2. I've used this on another host with condor-G and it works OK. The command I used was:


$ globus-job-run ulgsmp1 -s /home/qcl/smithic/.lfs/condor-g/hello.ksh




I've run a series of Globus integration tests and these work OK apart from gsissh and gsiftp. It looks as though Condor-G isn't even contacting the remote gatekeeper.




Condor-G is failing to acquire its local credentials (i.e. read the proxy). Can you turn on D_FULLDEBUG for the gridmanager, try again, and post the resulting log?




I've attached the log file. The proxy certificate is in /tmp/x509up_u<MY_UID> and is readable only by me; is this OK? AFAIK Globus complains if you grant read permission to any other users.
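Owner-only access is the usual requirement for a proxy file, so the permissions described here look right. A minimal Python sketch for checking this, assuming the rule is simply "owned by the current user, readable by the owner, no group or other permission bits"; the exact check Globus performs may be stricter or looser. The default path follows the /tmp/x509up_u<uid> convention from the thread and is overridden by the X509_USER_PROXY environment variable if set. The function name check_proxy_file is hypothetical.

```python
import os
import stat


def check_proxy_file(path: str) -> list[str]:
    """Return a list of problems found with a proxy file (empty if none).

    Assumed rule: owned by the current user, readable by the owner, and
    no permission bits at all for group or others.
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return [f"{path} does not exist"]

    problems = []
    if st.st_uid != os.getuid():
        problems.append(f"{path} is owned by uid {st.st_uid}, not {os.getuid()}")
    if st.st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(
            f"{path} is accessible by group/others "
            f"(mode {stat.S_IMODE(st.st_mode):04o})"
        )
    if not st.st_mode & stat.S_IRUSR:
        problems.append(f"{path} is not readable by its owner")
    return problems


if __name__ == "__main__":
    # X509_USER_PROXY takes precedence; otherwise use /tmp/x509up_u<uid>.
    path = os.environ.get("X509_USER_PROXY", f"/tmp/x509up_u{os.getuid()}")
    for problem in check_proxy_file(path) or ["no permission problems found"]:
        print(problem)
```

If this reports no problems, the permissions are probably not the cause and the next suspect is the proxy contents.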




Can you try starting the gahp_server at the command line and typing the following line into it:

CACHE_PROXY_FROM_FILE 1 /tmp/x509up_u41269


That is the command that's failing. What does the gahp server reply? If the command succeeds, it will print a single 'S'.
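The same test can be scripted. A rough Python sketch, relying on what the thread states about the reply ('S' on success) plus two protocol details treated here as assumptions: that gahp_server may print a version banner beginning with '$' before any results, and that it accepts a QUIT command (or exits at end of input). The helper name gahp_command is hypothetical.

```python
import subprocess


def gahp_command(argv: list[str], command: str, timeout: float = 30.0) -> str:
    """Send one command line to a GAHP server and return its result line.

    The server is started from `argv`, the command is written followed by
    QUIT, and the first reply line not starting with '$' (assumed to be the
    version banner) is returned, e.g. 'S' for success or 'E' for an error.
    """
    proc = subprocess.run(
        argv,
        input=command + "\nQUIT\n",
        capture_output=True,
        text=True,
        timeout=timeout,  # don't hang forever if the server never answers
    )
    for line in proc.stdout.splitlines():
        line = line.strip()
        if line and not line.startswith("$"):
            return line
    raise RuntimeError("no result line from GAHP server")
```

For the test suggested above, the call would be `gahp_command(["/opt1/condor/sbin/gahp_server"], "CACHE_PROXY_FROM_FILE 1 /tmp/x509up_u41269")`. The gahp_server path is a guess based on where condor_gridmanager lives in the log, so adjust it for your installation.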



No, I just get a single 'E' printed.

regards,

-ian.