[Condor-users] condor_g problem

Hey everyone,

We're trying to get Condor-G working with some one-off scripts. (We
would've used glidein, but originally it seemed easier to just point
at our CE than to deal with the administrative overhead of getting
attached to the UCSD glidein factory. I'm now regretting that
decision.) I've got a couple of CEs that I can submit to with
globus-job-run okay, but when submitting with Condor, I keep getting
this in the CE logs:

Failed reading length 0
GSS authentication failure
    globus_gss_assist token :3: read failure: Connection closed
Failure: GSS failed Major:01090000 Minor:00000000 Token:00000003

Looking around, I found the page:


And the checklist checks out (times are within ~10 secs of each other,
the right certificates are there, etc.).

I'm really not sure what else to look at. The two CEs I'm testing
against accept glidein/gLite jobs fine (and used to accept Condor-G
jobs). One of the clients I'm testing from is the same machine as one
of the CEs, and the other client is a VM running on the external
network. As I said, the Globus tools work fine from both clients to
both servers, so there shouldn't be a firewall or cert issue going on.
The last hint I have is a bunch of lines in the GridManager logs on
the client boxes that look like:

10/24/11 12:48:04 [29922] grid_monitor job submit failed for resource
ce1.accre.vanderbilt.edu:2119, gram error 12 (the connection to the
server failed (check host and port))
10/24/11 12:48:04 [29922] Giving up on grid_monitor for site
ce1.accre.vanderbilt.edu:2119.  Will retry in 15 seconds (0 minutes)
10/24/11 12:48:04 [29922] Stopping grid_monitor for resource

but I can connect to that machine/port just fine using telnet. I'm not
sure whether the CE is just terminating the connection before the
handshake completes.
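
One way to narrow down that last suspicion is to check what the server
does with a connection that stays silent, which goes a step beyond the
telnet test. The following is a minimal sketch, not a GSI client: it
opens a plain TCP connection, sends nothing, and reports whether the
peer closes the socket within a few seconds. The function name probe
and the wait parameter are made up for illustration. It assumes that a
healthy gatekeeper waits for the client to start the handshake, so an
immediate close would point at something on the server side (for
example a TCP wrapper or service limit) rather than at the
certificates.

```python
import socket


def probe(host, port, wait=3.0):
    """Connect to host:port, stay silent, and report what the peer does.

    Returns one of: "refused/unreachable", "closed by peer",
    "sent data", "held open".
    This is a plain TCP check only; no GSI/SSL handshake is attempted.
    """
    try:
        sock = socket.create_connection((host, port), timeout=wait)
    except OSError:
        # Covers connection refused, timeouts, and name resolution errors.
        return "refused/unreachable"
    with sock:
        sock.settimeout(wait)
        try:
            data = sock.recv(1)
        except socket.timeout:
            # Nothing arrived and the socket is still open after `wait` seconds.
            return "held open"
        except OSError:
            # For example a connection reset by the peer.
            return "closed by peer"
        # An empty read means the peer closed the connection cleanly.
        return "sent data" if data else "closed by peer"


# Hypothetical usage against the gatekeeper port seen in the log above:
#   print(probe("ce1.accre.vanderbilt.edu", 2119))
```

Running it from both clients against both CEs, and comparing with a CE
that is known to accept Condor-G jobs, would show whether the early
disconnect depends on the client address or on the CE itself.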

Thanks for the help,

Andrew Melo