
Re: [HTCondor-users] HTCondor high availability



Hi Todd,

Exactly. While security is obviously important and has nothing to do with
the HA setup itself, it was a surprise to me to have to configure security
for the communication between the masters. That's mainly because I
"inherited" this cluster and the original config contained * in the allow
list, so I never experienced this type of issue. Securing the HTCondor
part of the cluster is now added to my list of planned security changes :)
For now, since the cluster is completely separated from the rest of the
network, working job processing and high availability of all services were
more of a priority.

After changing SEC_DEFAULT_AUTHENTICATION_METHODS to FS,FS_REMOTE, condor_q
now works as expected and jobs can be submitted and started. They change to
Idle after a while, but maybe that's not related to the HTCondor config.

Kind regards
Christian Hennen


-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Todd
L Miller
Sent: Friday, October 9, 2020 22:21
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] HTCondor high availability

> Do I need to configure any other authentication methods in addition to 
> all servers using LDAP via PAM ?

 	Yes, of course.  Security between different nodes has nothing to do
with how users log in.

> I tried to set the variable as you suggested, to no avail. Master2 now 
> says it can't connect to master1 ("Failed to fetch ads")

 	From your description, master1 is the original "master" node.  I
don't know if HAD will work for machines that are both submit nodes and
central managers, but for now let's assume that it will.  Note that HA
instructions do NOT address security at all; that's deliberate, because
security is complicated and nothing in HA changes anything about how your
security should work, except the addition of another server.  It's a bit
more of a surprise to you, perhaps, because you didn't separate your central
manager from your submit server (and thus FS worked for all your
client-to-daemon connections).

 	From your serverfault question, it looks like you basically don't
have any security at all -- your ALLOW lists include *, so the problem must
be in authentication, not authorization.

 	Note that condor_q, by default in recent HTCondor versions, requires
authentication so that it only returns the jobs of the user who ran the
command.  Try running 'condor_q -allusers'; I think that will use a
different command that doesn't require authentication.

 	For this purpose, given that you know that the two masters share a
filesystem and user IDs, FS_REMOTE is not a bad choice.  You'll need to set
SEC_DEFAULT_AUTHENTICATION_METHODS on master1 and master2 to include FS and
FS_REMOTE; I would remove KERBEROS (since you're not using it).
Both master1 and master2 need to set FS_REMOTE_DIR to the same value.  Be
sure to restart HTCondor on both machines after you've done that (I can't
keep straight which configuration changes only require a reconfig).  Try
running condor_q again; it should work.  If it doesn't, try running

_CONDOR_TOOL_DEBUG=D_FULLDEBUG condor_q -debug

and we'll see what we can see.
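
For reference, a minimal sketch of the configuration described above, assuming
it goes into the local HTCondor config on both master1 and master2. The
authentication method is spelled FS_REMOTE in the configuration. The directory
path is a hypothetical placeholder for a directory on the filesystem the two
masters share; substitute your own.

```
# Sketch only; use the same settings on master1 and master2.
# KERBEROS is left out because it is not in use on this cluster.
SEC_DEFAULT_AUTHENTICATION_METHODS = FS, FS_REMOTE

# Hypothetical path; it must be the same value on both machines and
# must live on the shared filesystem.
FS_REMOTE_DIR = /shared/condor/fs_remote
```

After restarting HTCondor on both machines, running
condor_config_val SEC_DEFAULT_AUTHENTICATION_METHODS on each one should show
the value that is actually in effect.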

- ToddM