
Re: [HTCondor-users] HTCondor high availability



Do I need to configure any other authentication methods in addition to all servers using LDAP via PAM ?

Yes, of course. Security between different nodes has nothing to do with how users log in.

I tried to set the variable as you suggested, to no avail. Master2 now says
it can't connect to master1 ("Failed to fetch ads")

From your description, master1 is the original "master" node. I don't know if HAD will work for machines that are both submit nodes and central managers, but for now let's assume that it will. Note that the HA instructions do NOT address security at all; that's deliberate, because security is complicated and nothing in HA changes anything about how your security should work, except the addition of another server. It's perhaps a bit more of a surprise to you because you didn't separate your central manager from your submit server (and thus FS worked for all your client-to-daemon connections).

From your serverfault question, it looks like you basically don't have any security at all -- your ALLOW lists include *, so the problem must be in authentication, not authorization.

Note that condor_q, by default in recent HTCondor versions, requires authentication so that it only returns the jobs of the user who ran the command. Try running 'condor_q -allusers'; I think that will use a different command that doesn't require authentication.

For this purpose, given that you know that the two masters share a filesystem and user IDs, FS_REMOTE is not a bad choice. You'll need to set SEC_DEFAULT_AUTHENTICATION_METHODS on master1 and master2 to include FS and FS_REMOTE; I would remove KERBEROS (since you're not using it). Both master1 and master2 need to set FS_REMOTE_DIR to the same value, which should be a directory on the shared filesystem. Be sure to restart HTCondor on both machines after you've done that (I can't keep straight which configuration changes only require a reconfig). Try running condor_q again; it should work. If it doesn't, try running

_CONDOR_TOOL_DEBUG=D_FULLDEBUG condor_q -debug

and we'll see what we can see.
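
For concreteness, the configuration change described above might look roughly like the following on both master1 and master2. This is only a sketch: the directory path is a made-up placeholder (use any directory on the filesystem the two machines share), and the remote filesystem method is written as FS_REMOTE, which is how the HTCondor manual spells it.

```
# Same settings on master1 and master2.
# KERBEROS is left out because it isn't in use.
SEC_DEFAULT_AUTHENTICATION_METHODS = FS, FS_REMOTE

# Placeholder path (an assumption): must be on the shared filesystem
# and must be identical on both machines.
FS_REMOTE_DIR = /shared/condor/fs_remote
```

After changing this, restart HTCondor on both machines before retrying condor_q.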

- ToddM