
Re: [HTCondor-users] authentication question



On May 27, 2013, at 9:12 AM, Steve Lidie <sol0@xxxxxxxxxx> wrote:

> I have an older condor pool with the master and all compute nodes running version 7.4.1 on Cent 5.x, installed and configured from the tar files of the day.  The master must be moved, so it's now on a Cent 6.4 machine installed via the UW RPM condor-7.8.8-110288.rhel6.4.i686.rpm.  I updated /etc/condor/condor_config as I was accustomed to do in the past, and the master started and runs.   There are two problems I'm seeing.
> 
> 1) Authentication issues -  the master can see the jobs on the remote machines, but cannot remove jobs:
> 
> [root@condor condor]# condor_q -name leaf43.cc.lehigh.edu 2.0
> 
> -- Schedd: leaf43.cc.lehigh.edu : <128.180.3.109:50926>
> ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
>  2.0   lusol           5/23 09:15   0+00:00:00 I  0   0.0  where-am-i        
> 
> [root@condor condor]# condor_rm -name leaf43.cc.lehigh.edu 2.0
> AUTHENTICATE:1003:Failed to authenticate with any method
> AUTHENTICATE:1004:Failed to authenticate using GSI
> GSI:5003:Failed to authenticate.  Globus is reporting error (851968:45).  There is probably a problem with your credentials.  (Did you run grid-proxy-init?)
> AUTHENTICATE:1004:Failed to authenticate using KERBEROS
> AUTHENTICATE:1004:Failed to authenticate using FS
> No result found for job 2.0
> 
> 
> This general authentication issue crops up all the time, but I've never had an issue like this in the past. A user trying to submit a job gets a similar result:
> 
> [sol0@condor condor]$ condor_submit condorSumbit.job 
> Submitting job(s)
> ERROR: Failed to connect to local queue manager
> AUTHENTICATE:1003:Failed to authenticate with any method
> AUTHENTICATE:1004:Failed to authenticate using GSI
> GSI:5003:Failed to authenticate.  Globus is reporting error (851968:33).  There is probably a problem with your credentials.  (Did you run grid-proxy-init?)
> AUTHENTICATE:1004:Failed to authenticate using KERBEROS
> AUTHENTICATE:1004:Failed to authenticate using FS
> 


I discovered the cause of this issue, but have no reasonable resolution. The machine running condor_master uses LDAP authentication via sssd, so there are no user records in the local passwd file. Once I manually added a user to /etc/passwd, that user was able to submit jobs and query the queue.
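
For anyone wanting to check the same condition on their own machine, here is a minimal diagnostic sketch in Python. It only reports whether an account has a line in the local passwd file or is resolvable solely through the system's name service (sssd/LDAP in my case). The function names and the "local" / "nss-only" / "unknown" labels are my own invention for illustration, and the script does not reproduce how HTCondor itself looks up users, so treat the result as a hint rather than proof.

```python
import pwd


def in_local_passwd(username, passwd_path="/etc/passwd"):
    """Return True if username has its own line in the local passwd file."""
    try:
        with open(passwd_path) as f:
            # The account name is the first colon-separated field.
            return any(line.split(":", 1)[0] == username for line in f)
    except OSError:
        # Missing or unreadable file: treat as "no local entry".
        return False


def resolvable_via_nss(username):
    """Return True if the system name service can resolve username.

    pwd.getpwnam() goes through the C library, so it consults whatever
    sources are configured on the host (local files, sssd, and so on).
    """
    try:
        pwd.getpwnam(username)
        return True
    except KeyError:
        return False


def diagnose(username, passwd_path="/etc/passwd"):
    """Classify an account with hypothetical labels chosen for this sketch."""
    if in_local_passwd(username, passwd_path):
        return "local"
    if resolvable_via_nss(username):
        return "nss-only"
    return "unknown"


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(name, diagnose(name))
```

An account reported as "nss-only" is in the same situation as my users were before I added them to /etc/passwd.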

Is this behavior expected? Is there a better solution than the hack I found?

Thanks,
Steve