
[Condor-users] gt4 grid universe problem



Hi all,

Having recently upgraded from condor 6.6 to 6.8(.0), I'm trying to submit a grid universe gt4 job to a remote gatekeeper in front of a condor pool. Currently my job is failing with the error "Failed to create proxy delegation" (which is Code 0 Subcode 0 in the user log file). Does anybody have any idea how to debug this?

The gatekeeper is running Globus 4.0.1 and I can successfully submit jobs using the pre-WS GRAM (using both the gt2 grid universe and the globus universe). At the moment I have pre-staged the executable and am not attempting to transfer the output back to the submit machine - all I want to do is run a shell script on a Condor node and return the output to the gatekeeper. I think my problem is with the Condor-G submit machine, but I have access to log and configuration files at both ends.

Using the following submit file:


Universe        = grid
grid_resource = gt4 cartman.niees.group.cam.ac.uk Condor
Executable      = /home/andreww/globus_tests/test_9_mins.sh
Notification    = NEVER
GlobusRSL       = (condorsubmit=(initialdir /home/andreww/globus_tests)(transfer_files always))
Transfer_Executable = false
Transfer_Output = false

Stream_Output   = false
Stream_Error    = false

Output          = /home/andreww/globus_tests/task_$(PROCESS).out
Error           = job.err
Log             = job.log

Queue 1


Once I submit the job I see it sit idle for a couple of minutes and a gridftp server starts locally (also visible in condor's queue). After three minutes or so the main job fails and goes into a held state with the following in job.log:


000 (292.000.000) 09/26 13:47:22 Job submitted from host: <193.62.125.72:45828>
...
012 (292.000.000) 09/26 13:50:41 Job was held.
        Failed to create proxy delegation
        Code 0 Subcode 0
...  


One possibility is that gridftp is not correctly traversing the firewalls between the gatekeeper and the condor submit machine (I have two firewalls to worry about - both filter traffic in both directions). What are the network requirements for a gt4 resource? I guess the gatekeeper has to connect back to the submitting machine on TCP port 2811. However, I don't think this is the immediate problem as I'm not seeing any activity (or failing outbound network connections) from the gatekeeper.
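
To rule the firewalls in or out, a quick reachability test run from the gatekeeper against the submit machine might help. The following is only a minimal sketch, assuming Python is available on the gatekeeper and that port 2811 really is the port that matters (that is my guess, not something I have confirmed). The function name can_connect and the command-line handling are made up for illustration; the host argument stands in for the submit machine.

```python
import socket
import sys


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This only shows that the TCP handshake succeeds. It says nothing
    about whether GridFTP or the delegation step works afterwards.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False


if __name__ == "__main__":
    # Hypothetical usage: python check_port.py <submit-host> 2811
    # The defaults below are placeholders, not values from my setup.
    host = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    port = int(sys.argv[2]) if len(sys.argv) > 2 else 2811
    print(f"{host}:{port} reachable: {can_connect(host, port)}")
```

A False result here would point at the firewalls (or at nothing listening on that port); a True result would suggest the delegation failure lies elsewhere.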

Any suggestions would be most welcome.


Cheers,

Andrew




 


Dr Andrew Walker

Department of Earth Sciences
University of Cambridge
Downing Street
Cambridge 
CB2 3EQ
UK

phone +44 (0)1223 333432