
Re: [Condor-users] condor_g error when globus-job-run works



Hello again,

So, as is always the way, as soon as you ask for help you find the answer yourself!

I replaced the /etc/grid-security directory with a copy from another machine, and everything started working correctly! Unfortunately I can't say what the differences were because (stupidly) I overwrote the old directory and all of its files, so I have nothing to compare against. I do know it has nothing to do with the grid-mapfile, which this error code often relates to, as there wasn't one either before or after.
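For anyone in the same position, here is a rough sketch of how the old and new directories could be compared, assuming a copy of the old one is kept before it is replaced. It uses only Python's standard filecmp module; the function name diff_trees and the backup path in the usage note are made up for illustration and are not part of Globus or Condor.

```python
import filecmp
import os


def diff_trees(old_dir, new_dir):
    """Compare two directory trees recursively.

    Returns three sorted lists of relative paths:
    files/dirs only in old_dir, only in new_dir, and files whose contents differ.
    """
    only_old, only_new, differing = [], [], []

    def walk(dc, rel):
        # Entries present on one side only (files or whole subdirectories).
        only_old.extend(os.path.join(rel, name) for name in dc.left_only)
        only_new.extend(os.path.join(rel, name) for name in dc.right_only)
        # dircmp compares by stat() signature only, so re-check the common
        # files byte-for-byte with shallow=False.
        _, mismatch, errors = filecmp.cmpfiles(
            dc.left, dc.right, dc.common_files, shallow=False
        )
        differing.extend(os.path.join(rel, name) for name in mismatch + errors)
        # Recurse into subdirectories that exist on both sides.
        for name, sub in dc.subdirs.items():
            walk(sub, os.path.join(rel, name))

    walk(filecmp.dircmp(old_dir, new_dir), "")
    return sorted(only_old), sorted(only_new), sorted(differing)
```

With the old directory saved as, say, /etc/grid-security.bak (a hypothetical name), calling diff_trees("/etc/grid-security.bak", "/etc/grid-security") would list what was missing, extra, or changed. It compares contents only, so differences in ownership or permissions would still need checking separately.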

Sorry I can't be more specific about the solution, but thanks for reading!

Rich

Rich Bruin wrote:
Hello All,

I'm trying to debug a problem we're having with a submit machine here but can't find any help via Google or the list archive, so hopefully someone here can help!

In short, I can run Globus jobs using Globus directly (via globus-job-run etc.), but jobs submitted through Condor are quickly put on hold and report the following error message (from the job's log file):

Globus job submission failed!
Reason: 7 authentication failed: GSS Major Status: Authentication Failed GSS Minor Status Error Chain: init.c:499: globus_gss_assist_init_sec_context_async: Error during context initialization init_sec_context

In more detail, I am submitting via condor_g from a Debian sarge installation (2.4.27-2-386 kernel) running Globus Toolkit 4.0.1 and Condor version 6.6.10 to any of a few remote resources running various versions of Linux and Globus Toolkit 2.4.3, 3.2.1 and 4.0.1. Running globus-job-run style commands directly works fine (directed to both the fork and pbs jobmanagers), but any job run via condor_g fails with the above error message in the local logs.

The logs on the remote machine read as follows:

Notice: 5: Authenticated globus user: /C=UK/O=eScience/OU=Cambridge/L=UCS/CN=richard bruin
Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6
Notice: 5: Requested service: jobmanager-fork
Notice: 5: Authorized as local user: rbru03
Notice: 5: Authorized as local uid: 501
Notice: 5:           and local gid: 501
Notice: 0: executing /usr/local/globus/libexec/globus-job-manager
Notice: 0: GRID_SECURITY_CONTEXT_FD=9
Notice: 0: Child 10758 started
Notice: 6: globus-gatekeeper pid=10838 starting at Tue Mar  7 16:25:06 2006

Notice: 6: Got connection 128.232.232.27 at Tue Mar  7 16:25:06 2006

Failed reading length 0
GSS authentication failure
     globus_gss_assist token :3: read failure: Connection closed
Failure: GSS failed Major:01090000 Minor:00000000 Token:00000003

Failure: GSS failed Major:01090000 Minor:00000000 Token:00000003

Notice: 6: globus-gatekeeper pid=10839 starting at Tue Mar  7 16:25:06 2006

Notice: 6: Got connection 128.232.232.27 at Tue Mar  7 16:25:06 2006

Failed reading length 0
GSS authentication failure
     globus_gss_assist token :3: read failure: Connection closed
Failure: GSS failed Major:01090000 Minor:00000000 Token:00000003

Failure: GSS failed Major:01090000 Minor:00000000 Token:00000003

Does anyone have any idea what is happening here? I have other machines with near enough identical installations and they work fine; it just seems to be this one client machine!

Any help you could provide would be much appreciated, thanks in advance,

Rich

-------------------------------
Richard Bruin
PhD Student
Department of Earth Sciences
University of Cambridge
eMinerals project www.eminerals.org
rbru03@xxxxxxxxxxxxx