
Re: [HTCondor-users] [External] - Help with authentication and condor mapfile for strong security



I am replying to this thread because I have debugged a few things that are most certainly related, but I still haven't solved my problem.

The first step was increasing the logging. I thought I had it set higher than I did, so I raised it to D_SECURITY:3. After doing this, I found that Condor was failing DNS lookups for other machines because it was using the wrong interface, so the machines were unable to match their domain names against their allow-list.
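
A lookup mismatch of this kind can be checked outside of Condor by comparing a forward lookup with the reverse lookup of the resulting address. The following is a minimal Python sketch, not part of HTCondor; the function name `forward_reverse_consistent` and its injectable `resolve` and `reverse` parameters are assumptions made for illustration.

```python
import socket


def forward_reverse_consistent(hostname, resolve=socket.gethostbyname,
                               reverse=socket.gethostbyaddr):
    """Return (ok, ip, reverse_name) for a hostname.

    ok is True when the name returned by the reverse lookup of the
    host's address equals the original hostname (case-insensitive,
    ignoring a trailing dot).
    """
    ip = resolve(hostname)
    # gethostbyaddr returns (primary_name, alias_list, address_list).
    reverse_name, _aliases, _addresses = reverse(ip)
    ok = reverse_name.lower().rstrip(".") == hostname.lower().rstrip(".")
    return ok, ip, reverse_name
```

Calling it with each machine's expected FQDN from every other machine shows which address the name resolves to and whether that address maps back to the same name.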

Then I fixed an issue with getting my machines to report FQDNs instead of normal domain names, using the DEFAULT_DOMAIN_NAME macro.

After this, it seemed like rolling with Kerberos authentication would be a better fit, but I still can't get my machines to authenticate with each other, nor is Condor able to authenticate any of my domain users. 

I updated my security config to look like the following:
==============================================
@use SECURITY : Strong
SEC_DEFAULT_AUTHENTICATION_METHODS = KERBEROS

ALLOW_READ            = */*
ALLOW_WRITE           = */*
ALLOW_ADMINISTRATOR   = condor-admin*/*
ALLOW_CONFIG          = condor-admin*/*
ALLOW_NEGOTIATOR      = condor*/submit1*
ALLOW_DAEMON          = condor*/*
==============================================
condor-admin is a valid domain user, and submit1 is where my condor_schedd daemon lives.

But when I start Condor, as far as I can tell my schedd daemon sends the following ClassAd to try to authenticate with the manager:

================================================================================================
ServerCommandSock = "<192.168.0.68:9618?addrs=192.168.0.68-9618&noUDP&sock=3949_4396_3>"
Enact = "YES"
Subsystem = "SCHEDD"
ParentUniqueID = "submit1:3949:1594324900"
TriedAuthentication = true
Integrity = "YES"
ServerPid = 3988
Encryption = "YES"
Authentication = "NO"
RemoteVersion = "$CondorVersion: 8.8.9 May 07 2020 BuildID: 503236 PackageID: 8.8.9-1 FIPS $"
SessionLease = 3600
OutgoingNegotiation = "REQUIRED"
User = "condor@parent"
UseSession = "YES"
CryptoMethods = "3DES"
Sid = "3e7fbe4351131b1ebe8437b870ffb34994c8a91b8ba1e0f9"
ValidCommands = "60000,60008,60026,60017,60004,60012,60021,60043,60007,457,60020,60044"
Command = 60008
SessionDuration = "86400"
AuthMethods = "PASSWORD"
====================================================================================================

Which is throwing me for a loop, because PASSWORD is not listed as an authentication method in my security config.

My manager node is sending back the following response:
====================================================================================================
Encryption = "YES"
Integrity = "YES"
AuthMethodsList = ""
CryptoMethods = "3DES,BLOWFISH"
Authentication = "YES"
SessionDuration = "86400"
SessionLease = 3600
RemoteVersion = "$CondorVersion: 8.8.9 May 07 2020 BuildID: 503236 PackageID: 8.8.9-1 FIPS $"
Enact = "YES"
=====================================================================================================
Which seems to suggest it isn't finding any authentication methods in common.
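
An empty list is what an intersection of the two sides' method lists would produce if the manager accepts only KERBEROS while the schedd offers only PASSWORD. The following minimal Python sketch illustrates that reading with the values from the ads above. The helper names are hypothetical, the manager's accepted list is assumed from SEC_DEFAULT_AUTHENTICATION_METHODS in the config, and this is not HTCondor's actual negotiation code.

```python
def parse_ad(text):
    """Parse 'Key = value' lines into a dict, stripping surrounding quotes."""
    ad = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        # Split on the first '=' only, so values containing '=' stay intact.
        key, _, value = line.partition("=")
        ad[key.strip()] = value.strip().strip('"')
    return ad


def common_methods(offered, accepted):
    """Return the methods in `offered` that also appear in `accepted`."""
    accepted_set = {m.strip().upper() for m in accepted.split(",") if m.strip()}
    return [m.strip().upper() for m in offered.split(",")
            if m.strip().upper() in accepted_set]


schedd_ad = parse_ad('AuthMethods = "PASSWORD"\nAuthentication = "NO"')
manager_methods = "KERBEROS"  # assumed from SEC_DEFAULT_AUTHENTICATION_METHODS

print(common_methods(schedd_ad["AuthMethods"], manager_methods))  # prints []
```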

Even after I switched from KERBEROS to PASSWORD authentication, when I run condor_q as the user condor-admin on the machine running the schedd, the following appears in the log file:
================================================================================================================
07/09/20 13:58:50 DC_AUTHENTICATE: authentication of <192.168.0.68:13883> did not result in a valid mapped user name, which is required for this command (519 QUERY_JOB_ADS_WITH_AUTH), so aborting.
================================================================================================================
Which I don't think makes sense, because that username would match with the rule I have in my config, wouldn't it?
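
Whether a given name would match the rule can be tested in isolation. The sketch below approximates an ALLOW_* entry of the form user/host with shell-style wildcards; HTCondor's real matching rules may be stricter about where wildcards are allowed, and the function name, the `example.com` domain, and the sample identities are hypothetical. Note that the log message refers to an earlier step: it reports that authentication produced no valid mapped user name, so this only exercises the pattern, not the mapping.

```python
from fnmatch import fnmatchcase


def allow_entry_matches(entry, user, host):
    """Approximate matching of a 'user-pattern/host-pattern' entry.

    Only the user/host form is handled; other entry forms raise
    ValueError rather than guessing at their meaning.
    """
    user_pattern, sep, host_pattern = entry.partition("/")
    if not sep:
        raise ValueError("sketch only handles entries of the form user/host")
    return fnmatchcase(user, user_pattern) and fnmatchcase(host, host_pattern)


# A mapped domain user matches the ALLOW_ADMINISTRATOR pattern.
print(allow_entry_matches("condor-admin*/*", "condor-admin@example.com", "192.168.0.68"))
# An identity that was never mapped does not.
print(allow_entry_matches("condor-admin*/*", "unauthenticated@unmapped", "192.168.0.68"))
```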

Is there anything here which is standing out as something I can investigate further? I seem to be a little stuck in the water.

Thanks all,
Wes

Wesley Taylor - Cluster Manager
Numerica Corporation (www.numerica.us)
5042 Technology Parkway #100
Fort Collins, Colorado 80528
(970) 207 2233
wesley.taylor@xxxxxxxxxxx



Public Content

-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Wesley Taylor
Sent: Tuesday, July 7, 2020 6:34 PM
To: 'htcondor-users@xxxxxxxxxxx' <htcondor-users@xxxxxxxxxxx>
Subject: [External] - [HTCondor-users] Help with authentication and condor mapfile for strong security


Hi all,

I had my Condors hissing and being silent as they should, but then I enabled the Strong security template and as expected, everything stopped working.

I read through the HTCondor security documentation in its entirety, located at https://htcondor.readthedocs.io/en/stable/admin-manual/security.html?highlight=mapfile#security, but I still have a few questions:
1. If I am using realmd to configure Kerberos and sssd to work with an Active Directory server, how do I configure Active Directory to have appropriate properties so that I can use Kerberos authentication with HTCondor?
2. How can I verify my HTCondor mapfile is correct? It appears below that my condor_schedd is unable to authenticate with the shared port because there is no mapped uid, but based on the documentation, I am a little fuzzy on how to make a correct mapping for my condor_schedd.
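
One way to sanity-check a mapfile outside of Condor is to replay its lines against a sample authenticated name. The sketch below assumes the three-column layout (authentication method, regular expression, canonical name) and uses Python's `re` module, which may differ from the regex dialect and anchoring HTCondor applies. The function name, the `EXAMPLE.COM` realm, and the sample mapfile lines are hypothetical.

```python
import re


def map_identity(mapfile_text, method, authenticated_name):
    """Return the canonical name from the first matching line, else None."""
    for line in mapfile_text.splitlines():
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        line_method, pattern, canonical = fields
        if line_method.upper() != method.upper():
            continue
        match = re.search(pattern, authenticated_name)
        if match:
            # Substitute backreferences such as \1 in the canonical name.
            return match.expand(canonical.strip())
    return None


sample_mapfile = (
    "KERBEROS ^(.*)@EXAMPLE\\.COM$ \\1\n"
    "PASSWORD ^condor_pool@.*$ condor\n"
)
print(map_identity(sample_mapfile, "KERBEROS", "condor-admin@EXAMPLE.COM"))
```

A result of None for a daemon's or user's authenticated name would indicate that no line in the sample maps it.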

Security config:
===================================================
@use SECURITY : Strong
SEC_PASSWORD_FILE = /etc/condor/passwords.d/POOL
SEC_DEFAULT_AUTHENTICATION_METHODS = PASSWORD
ALLOW_DAEMON = *
ALLOW_NEGOTIATOR = *
===================================================

SchedLog:
===================================================================================================================================================================================================
07/02/20 19:16:19 ******************************************************
07/02/20 19:16:19 ** condor_schedd (CONDOR_SCHEDD) STARTING UP
07/02/20 19:16:19 ** /usr/sbin/condor_schedd
07/02/20 19:16:19 ** SubsystemInfo: name=SCHEDD type=SCHEDD(5) class=DAEMON(1)
07/02/20 19:16:19 ** Configuration: subsystem:SCHEDD local:<NONE> class:DAEMON
07/02/20 19:16:19 ** $CondorVersion: 8.8.9 May 07 2020 BuildID: 503236 PackageID: 8.8.9-1 FIPS $
07/02/20 19:16:19 ** $CondorPlatform: x86_64_CentOS7 $
07/02/20 19:16:19 ** PID = 24136
07/02/20 19:16:19 ** Log last touched time unavailable (No such file or directory)
07/02/20 19:16:19 ******************************************************
07/02/20 19:16:19 Using config source: /etc/condor/condor_config
07/02/20 19:16:19 Using local config sources:
07/02/20 19:16:19    /etc/condor/config.d/49-common
07/02/20 19:16:19    /etc/condor/config.d/50-security
07/02/20 19:16:19    /etc/condor/config.d/51-role-exec
07/02/20 19:16:19    /etc/condor/condor_config.local
07/02/20 19:16:19 config Macros = 71, Sorted = 71, StringBytes = 1922, TablesBytes = 2620
07/02/20 19:16:19 CLASSAD_CACHING is ENABLED
07/02/20 19:16:19 Daemon Log is logging: D_ALWAYS D_ERROR
07/02/20 19:16:19 SharedPortEndpoint: waiting for connections to named socket 24123_f333_3
07/02/20 19:16:19 DaemonCore: command socket at <172.20.0.56:9618?addrs=172.20.0.56-9618&noUDP&sock=24123_f333_3>
07/02/20 19:16:19 DaemonCore: private command socket at <172.20.0.56:9618?addrs=172.20.0.56-9618&noUDP&sock=24123_f333_3>
07/02/20 19:16:19 History file rotation is enabled.
07/02/20 19:16:19   Maximum history file size is: 20971520 bytes
07/02/20 19:16:19   Number of rotated history files is: 2
07/02/20 19:16:19 my_popenv: Failed to exec in child, errno=2 (No such file or directory)
07/02/20 19:16:19 Failed to execute /usr/sbin/condor_shadow.std, ignoring
07/02/20 19:16:19 Reloading job factories
07/02/20 19:16:19 Loaded 0 job factories, 0 were paused, 0 failed to load
07/02/20 19:16:25 TransferQueueManager stats: active up=0/100 down=0/100; waiting up=0 down=0; wait time up=0s down=0s
07/02/20 19:16:25 TransferQueueManager upload 1m I/O load: 0 bytes/s  0.000 disk load  0.000 net load
07/02/20 19:16:25 TransferQueueManager download 1m I/O load: 0 bytes/s  0.000 disk load  0.000 net load
07/02/20 19:16:51 DC_AUTHENTICATE: authentication of <172.20.0.56:41253> did not result in a valid mapped user name, which is required for this command (519 QUERY_JOB_ADS_WITH_AUTH), so aborting.
07/02/20 19:16:51 DC_AUTHENTICATE: reason for authentication failure: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using PASSWORD ===================================================================================================================================================================================================

Thank you all for the help as always,
Wes

Wesley Taylor - Cluster Manager
Numerica Corporation (http://www.numerica.us/)
5042 Technology Parkway #100
Fort Collins, Colorado 80528
(970) 207 2233
wesley.taylor@xxxxxxxxxxx


