
Re: [HTCondor-users] HTCondor high availability



Hi Thomas,

I tried setting the variable as you suggested, to no avail. master2 now says
it cannot connect to master1 ("Failed to fetch ads").

The documentation states that you have to include FS_REMOTE in the list of
allowed methods, so I did:

FS_REMOTE_DIR = /clients/condor/sec
SEC_DEFAULT_AUTHENTICATION_METHODS = KERBEROS, FS, FS_REMOTE
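
(I am not sure whether the same list also has to go into the daemon- and
client-level knobs on both masters, i.e. something like

  SEC_DAEMON_AUTHENTICATION_METHODS = KERBEROS, FS, FS_REMOTE
  SEC_CLIENT_AUTHENTICATION_METHODS = KERBEROS, FS, FS_REMOTE

or whether the default list is enough.)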

After this change, condor_q on master1 now produces a segmentation fault with
the same error message as in my last mail, except that only Kerberos is
mentioned as the method that was tried...

What I don't understand is why the documentation about high availability
doesn't mention anything about securing the daemons via authentication when
using the configs described there.
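
(For context, the job-queue part of my configuration is essentially the
example from that page, roughly along these lines, with the spool path being
the shared one mentioned further down in the thread:

  MASTER_HA_LIST = SCHEDD
  SPOOL = /clients/condor/spool
  HA_LOCK_URL = file:/clients/condor/spool
  VALID_SPOOL_FILES = $(VALID_SPOOL_FILES) SCHEDD.lock

so nothing security-related beyond the existing Kerberos setup.)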

Kind regards
Christian

-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of
Thomas Hartmann
Sent: Thursday, 8 October 2020 13:50
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] HTCondor high availability

Hi Christian,

my first guess would be that the schedds on both machines want to
authenticate each other. Daemons on the same node normally do this by
writing/reading a file under /tmp, but to authenticate daemons across
different nodes, you need to secure that path separately.
Maybe the easiest(?) option would be to use the shared file system (assuming
that it is secure) and let the daemons authenticate each other through it,
i.e. with FS_REMOTE.

Maybe you can try putting a shared path on both nodes with
  FS_REMOTE_DIR = /path/foo/condor/sec

(the other authentication options such as SSL are probably more secure, but
the shared fs could be faster to set up for testing)
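
If you go that route, I would expect (untested on my side) that both nodes
need something along these lines, with the directory living on the shared fs
and writable by the user the daemons run as:

  FS_REMOTE_DIR = /path/foo/condor/sec
  SEC_DEFAULT_AUTHENTICATION_METHODS = FS, FS_REMOTE, KERBEROS
  SEC_DAEMON_AUTHENTICATION_METHODS = FS, FS_REMOTE, KERBEROS

followed by a condor_reconfig on both machines so the running daemons pick it
up.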

Cheers,
  Thomas

On 08/10/2020 12.04, Hennen, Christian wrote:
> Hi again,
> 
> searching through the log files once more, something caught my eye:
> When running condor_q on master2 while master1 is active, the
> following lines appear in SchedLog (along with the segmentation fault
> message):
> 
> 10/08/20 11:50:30 (pid:47347) Number of Active Workers 0
> 10/08/20 11:50:41 (pid:47347) AUTHENTICATE: handshake failed!
> 10/08/20 11:50:41 (pid:47347) DC_AUTHENTICATE: authentication of
> <192.168.1.22:10977> did not result in a valid mapped user name, which
> is required for this command (519 QUERY_JOB_ADS_WITH_AUTH), so aborting.
> 10/08/20 11:50:41 (pid:47347) DC_AUTHENTICATE: reason for authentication
> failure: AUTHENTICATE:1002:Failure performing
> handshake|AUTHENTICATE:1004:Failed to authenticate using
> KERBEROS|AUTHENTICATE:1004:Failed to authenticate using
> KERBEROS|FS|FS:1004:Unable to lstat(/tmp/FS_XXXGNYmKn)
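> 
> (Is there a way to test the handshake directly, without going through
> condor_q? I was thinking of something like
> 
>   condor_ping -verbose -name <schedd name> -type SCHEDD READ WRITE
> 
> run from master2, but I am not sure which authorization levels the query
> command actually needs.)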
> 
> Do I need to configure any other authentication methods, in addition to
> all servers using LDAP via PAM?
> 
> Kind regards
> 
> Christian
> 
> 
> 
> -----Original Message-----
> From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of
> Hennen, Christian
> Sent: Thursday, 1 October 2020 12:58
> To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
> Subject: Re: [HTCondor-users] HTCondor high availability
> 
> Hello Thomas,
> 
> the spool directory (/clients/condor/spool) is located on an NFS v3
> share (/clients) that every server has access to. All machines have a
> local user (r-admin) with uid and gid 1000, and the spool directory is
> owned by that user, since it is configured as the Condor user (see
> condor_config.local in the Serverfault thread). Every other user is
> mapped via LDAP on every server, including the storage cluster. On both
> master servers the user "condor" has the same uid and gid.
> 
> Kind regards
> 
> Christian
> 
> 
> -----Original Message-----
> From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of
> Thomas Hartmann
> Sent: Thursday, 1 October 2020 11:46
> To: htcondor-users@xxxxxxxxxxx
> Subject: Re: [HTCondor-users] HTCondor high availability
> 
> Hi Christian,
> 
> the spool dir resides on a shared file system between both nodes, right?
> Maybe you can check whether it is writable from both clients and whether
> the users/permissions work on both? (Sometimes NFS is a bit fiddly with
> the ID mapping...)
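> 
> Something simple run from both nodes would probably already show whether
> the IDs line up, e.g. (path and user name are placeholders for wherever
> your spool actually lives and whatever user the daemons run as):
> 
>   ls -ln /path/to/shared/spool
>   su -s /bin/sh -c 'touch /path/to/shared/spool/write_test' condor
> 
> If the numeric uid/gid differ between the nodes, or the touch fails on one
> of them, that would point to the ID mapping.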
> 
> Cheers,
>   Thomas
> 
> 
> On 01/10/2020 09.58, Hennen, Christian wrote:
>> Hi,
>>
>>  
>>
>> I am currently trying to make the job queue and submission mechanism
>> of a local, isolated HTCondor cluster highly available. The cluster
>> consists of 2 master servers (previously 1), several compute nodes,
>> and a central storage system. DNS, LDAP and other services are
>> provided by the master servers.
>>
>>  
>>
>> I followed the directions under
>> https://htcondor.readthedocs.io/en/latest/admin-manual/high-availability.html
>> but it doesn't seem to work the way it should. Further information
>> about the setup and the problems has been posted to Serverfault:
>> https://serverfault.com/questions/1035879/htcondor-high-availability
>>
>>  
>>
>> Maybe some of you have insights on this? Any help would be
>> appreciated!
>>
>>  
>>
>> Kind regards
>>
>> Christian Hennen, M.Sc.
>>
>> Project Manager Infrastructural Services
>>
>> Zentrum für Informations-, Medien-
>> und Kommunikationstechnologie (ZIMK)
>>
>> Universität Trier | Universitätsring 15 | 54296 Trier | Germany 
>> www.uni-trier.de <http://www.uni-trier.de/>
>>
>> <https://50jahre.uni-trier.de/>
>>
> 
> 
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx 
> with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> 
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/
> 
