
Re: [Condor-users] Credd Issues between Amazon Cloud Machines



Just to update anyone who might have similar issues: I was able to fix this.

When the second machine asked the first machine to authenticate, it
did so as the UID condor_pool@AMAZONA-MHXXXXX.  However, when a
password was pushed out to AMAZONA-MHXXXXX from the first machine, it
was the password for condor_pool@amazona-csxxxxx.  The authentication
request therefore came from the wrong domain and was denied.  So, on
the second computer (or any computer that is not the credd manager),
I hardcoded UID_DOMAIN via "UID_DOMAIN=AMAZONA-CSXXXXX" and restored
the pool password.  Now the authentication request arrives at
amazona-csxxxxx as condor_pool@AMAZONA-CSXXXXX and is handled properly.
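
For anyone who wants to try the same thing, here is a rough sketch of
what the change looks like in the local config file on the second
machine.  The file location and the verification step are assumptions
on my part and may differ on your install; the UID_DOMAIN value is the
(redacted) hostname of the machine running the credd.

```
# Local condor_config on every machine that is NOT the credd host.
# Force the UID domain to match the credd host, so the pool password
# is looked up as condor_pool@AMAZONA-CSXXXXX rather than under the
# machine's own hostname.
UID_DOMAIN = AMAZONA-CSXXXXX

# After saving (assumed steps, adjust to your setup):
#   1. restart or reconfigure Condor so the new value is picked up
#   2. check it with:        condor_config_val UID_DOMAIN
#   3. store the pool password again:  condor_store_cred -c add
```

I have only tested this on my two Windows nodes, so treat it as a
starting point rather than a general recipe.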

On Wed, Aug 15, 2012 at 11:50 AM, Michael Aschenbeck
<m.g.aschenbeck@xxxxxxxxx> wrote:
> Hello,
>
> I have two Amazon cloud machines set up.  The first one I set up has
> the CREDD daemon, and each node's "LocalCredd" output via condor_status
> is correct.  I can successfully run "run_as_owner = True" jobs on this
> machine.
>
> The second machine is not able to authenticate.  My CreddLog shows the
> following:
>
> 08/15/12 17:18:39 Received TCP command 81100 (CREDD_NOP) from
> condor_pool@amazona-csxxxxx <10.16.5.1xx:57358>, access level DAEMON
> 08/15/12 17:18:39 Calling HandleReq <nop_handler> (0) for command
> 81100 (CREDD_NOP) from condor_pool@amazona-csxxxxx <10.16.5.1xx:57358>
>
> 08/15/12 17:18:41 getStoredCredential(): Could not locate credential
> for user 'condor_pool@AMAZONA-MHXXXXX'
> 08/15/12 17:19:01 condor_read(): timeout reading 5 bytes from
> <10.16.9.2xx:54640>.
> 08/15/12 17:19:01 IO: Failed to read packet header
> 08/15/12 17:19:01 AUTHENTICATE: handshake failed!
> 08/15/12 17:19:01 DC_AUTHENTICATE: required authentication of
> 10.16.9.2xx failed: AUTHENTICATE:1002:Failure performing
> handshake|AUTHENTICATE:1004:Failed to authenticate using PASSWORD
>
> The first two lines are the first machine authenticating
> successfully, and the other lines are the second machine failing to
> authenticate.  I have run condor_store_cred -c add on both machines,
> so that is not the issue.  I hypothesize that the machines are not in
> a common domain and thus amazona-csxxxxx has no record of the user
> condor_pool@AMAZONA-MHXXXXX, but I really don't know.
>
> The weird thing is that my user account authenticates fine from the
> second machine:
> 08/15/12 16:57:31 Received TCP command 479 (STORE_CRED) from
> maschenbec@amazona-csxxxxx <10.16.9.2xx:54541>, access level WRITE
> 08/15/12 16:57:31 Calling HandleReq <store_cred_handler> (0) for
> command 479 (STORE_CRED) from maschenbec@amazona-csxxxxx
> <10.16.9.2xx:54541>
>
>
> Any thoughts on how to get my condor_config settings shored up so this
> will work?