
[HTCondor-users] authentication question



I have an older Condor pool with the master and all compute nodes running version 7.4.1 on CentOS 5.x, installed and configured from the tar files of the day. The master had to be moved, so it is now on a CentOS 6.4 machine, installed from the UW RPM condor-7.8.8-110288.rhel6.4.i686.rpm. I updated /etc/condor/condor_config as I have done in the past, and the master started and is running. I'm seeing two problems.

1) Authentication issues: the master can see the jobs on the remote machines, but cannot remove them:

[root@condor condor]# condor_q -name leaf43.cc.lehigh.edu 2.0

-- Schedd: leaf43.cc.lehigh.edu : <128.180.3.109:50926>
ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
  2.0   lusol           5/23 09:15   0+00:00:00 I  0   0.0  where-am-i        

[root@condor condor]# condor_rm -name leaf43.cc.lehigh.edu 2.0
AUTHENTICATE:1003:Failed to authenticate with any method
AUTHENTICATE:1004:Failed to authenticate using GSI
GSI:5003:Failed to authenticate.  Globus is reporting error (851968:45).  There is probably a problem with your credentials.  (Did you run grid-proxy-init?)
AUTHENTICATE:1004:Failed to authenticate using KERBEROS
AUTHENTICATE:1004:Failed to authenticate using FS
No result found for job 2.0


This kind of general authentication issue crops up all the time, but I've never had a problem like this one before. A user trying to submit a job gets a similar result:

[sol0@condor condor]$ condor_submit condorSumbit.job 
Submitting job(s)
ERROR: Failed to connect to local queue manager
AUTHENTICATE:1003:Failed to authenticate with any method
AUTHENTICATE:1004:Failed to authenticate using GSI
GSI:5003:Failed to authenticate.  Globus is reporting error (851968:33).  There is probably a problem with your credentials.  (Did you run grid-proxy-init?)
AUTHENTICATE:1004:Failed to authenticate using KERBEROS
AUTHENTICATE:1004:Failed to authenticate using FS



2) When a non-root user on the master runs condor_q, it fails with:

[sol0@condor condor]$ condor_q -global

-- Failed to fetch ads from: <128.180.3.10:50503> : leaf20.cc.lehigh.edu


-- Failed to fetch ads from: <128.180.3.21:50836> : leaf31.cc.lehigh.edu



..... etc .....




So, is the master too new to work with this older version of condor?  Or have I missed something simple?

Many thanks,
Steve