Re: [HTCondor-users] [External] - Help with authentication and condor mapfile for strong security



Hi Wesley,

(Sorry I hadn't yet followed up on your earlier email)

I did a quick scan through your ALLOW_* settings and I noticed that you have things like "condor*/*".

I think you will need to change this to "condor@*/*".  Without digging into the code, I'm fairly certain that it looks for the presence of an '@' sign to figure out how to interpret each entry and not having one might be causing (some of the) trouble.
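
As a rough illustration of that suggestion, the sketch below scans ALLOW_*/DENY_* lines and flags user/host entries whose user part has no '@'. It is not HTCondor code: the function name `entries_missing_at` is made up for this example, it assumes entries are separated by commas or whitespace, and it skips the bare wildcard "*/*", which was not called out above.

```python
def entries_missing_at(config_text):
    """Return (setting, entry) pairs for ALLOW_*/DENY_* entries written in
    user/host form whose user part contains no '@'."""
    flagged = []
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        if not name.startswith(("ALLOW_", "DENY_")):
            continue
        # Assumption: list entries are separated by commas and/or whitespace.
        for entry in value.replace(",", " ").split():
            user, sep, _host = entry.partition("/")
            # Only user/host entries are checked; the bare wildcard user is left alone.
            if sep and user != "*" and "@" not in user:
                flagged.append((name, entry))
    return flagged


# The ALLOW_* settings as posted in the quoted message.
posted = """
ALLOW_READ            = */*
ALLOW_WRITE           = */*
ALLOW_ADMINISTRATOR   = condor-admin*/*
ALLOW_CONFIG          = condor-admin*/*
ALLOW_NEGOTIATOR      = condor*/submit1*
ALLOW_DAEMON          = condor*/*
"""

for name, entry in entries_missing_at(posted):
    print(f"{name}: {entry}")
```

Each flagged entry is a candidate for the "user@domain/host" form, for example "condor@*/*" in place of "condor*/*".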

Also, re: using PASSWORD unexpectedly, run this:
	condor_config_val -dump SEC_

And verify that everything is set the way you think it should be.  Have you reconfigured or restarted the daemons since making the config changes?
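
One way to make that check less error-prone is to paste the dump into a small script and let it list any authentication-method setting that contains something you did not expect. The sketch below assumes the dump consists of "NAME = value" lines plus '#' comment lines; the function names and the sample text are hypothetical and are not output from the pool in question.

```python
def auth_method_settings(dump_text):
    """Collect every *AUTHENTICATION_METHODS setting from 'NAME = value' lines."""
    found = {}
    for line in dump_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        if name.endswith("AUTHENTICATION_METHODS"):
            found[name] = [m.strip().upper() for m in value.split(",") if m.strip()]
    return found


def unexpected_methods(dump_text, expected):
    """Return the settings whose method list holds anything outside `expected`."""
    allowed = {m.upper() for m in expected}
    return {
        name: methods
        for name, methods in auth_method_settings(dump_text).items()
        if any(m not in allowed for m in methods)
    }


# Hypothetical dump text, for illustration only.
sample = """
SEC_DEFAULT_AUTHENTICATION_METHODS = KERBEROS
SEC_CLIENT_AUTHENTICATION_METHODS = PASSWORD
SEC_DEFAULT_ENCRYPTION = REQUIRED
"""

print(unexpected_methods(sample, expected=["KERBEROS"]))
```

A non-empty result points at a setting that overrides the default you thought was in effect.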

Let me read the rest of your mail more thoroughly and get back to you.  In the meantime try the above and see if you get any further.


Cheers,
-zach


On 7/9/20, 3:32 PM, "HTCondor-users on behalf of Wesley Taylor" <htcondor-users-bounces@xxxxxxxxxxx on behalf of wesley.taylor@xxxxxxxxxxx> wrote:

    I am replying to this thread because I have debugged a few things that are most certainly related, but I still haven't solved my problem.

    The first step was increasing the logging. I thought I had it set higher than I did, so I went up to D_SECURITY:3. After doing this, I found that Condor was failing DNS lookups for other machines because it was using the wrong interface, so the machines could not match their domain names against the allow-list.

    Then I used the DEFAULT_DOMAIN_NAME macro to fix an issue with getting my machines to report FQDNs instead of normal domain names.

    After this, it seemed like rolling with Kerberos authentication would be a better fit, but I still can't get my machines to authenticate with each other, nor is Condor able to authenticate any of my domain users. 

    I updated my security config to look like the following:
    ==============================================
    @use SECURITY : Strong
    SEC_DEFAULT_AUTHENTICATION_METHODS = KERBEROS

    ALLOW_READ            = */*
    ALLOW_WRITE           = */*
    ALLOW_ADMINISTRATOR   = condor-admin*/*
    ALLOW_CONFIG          = condor-admin*/*
    ALLOW_NEGOTIATOR      = condor*/submit1*
    ALLOW_DAEMON          = condor*/*
    ==============================================
    condor-admin is a valid domain user, and submit1 is where my condor_schedd daemon lives.

    But when I start Condor, as far as I can tell, my schedd daemon sends the following ClassAd to try to authenticate with the manager:

    ================================================================================================
    ServerCommandSock = "<192.168.0.68:9618?addrs=192.168.0.68-9618&noUDP&sock=3949_4396_3>"
    Enact = "YES"
    Subsystem = "SCHEDD"
    ParentUniqueID = "submit1:3949:1594324900"
    TriedAuthentication = true
    Integrity = "YES"
    ServerPid = 3988
    Encryption = "YES"
    Authentication = "NO"
    RemoteVersion = "$CondorVersion: 8.8.9 May 07 2020 BuildID: 503236 PackageID: 8.8.9-1 FIPS $"
    SessionLease = 3600
    OutgoingNegotiation = "REQUIRED"
    User = "condor@parent"
    UseSession = "YES"
    CryptoMethods = "3DES"
    Sid = "3e7fbe4351131b1ebe8437b870ffb34994c8a91b8ba1e0f9"
    ValidCommands = "60000,60008,60026,60017,60004,60012,60021,60043,60007,457,60020,60044"
    Command = 60008
    SessionDuration = "86400"
    AuthMethods = "PASSWORD"
    ====================================================================================================

    Which is throwing me for a loop, because PASSWORD is not listed as an authentication method in my security config.

    My manager node is sending back the following response:
    ====================================================================================================
    Encryption = "YES"
    Integrity = "YES"
    AuthMethodsList = ""
    CryptoMethods = "3DES,BLOWFISH"
    Authentication = "YES"
    SessionDuration = "86400"
    SessionLease = 3600
    RemoteVersion = "$CondorVersion: 8.8.9 May 07 2020 BuildID: 503236 PackageID: 8.8.9-1 FIPS $"
    Enact = "YES"
    =====================================================================================================
    Which seems to suggest it isn't finding any authentication methods in common.

    Even after I switched from KERBEROS to PASSWORD authentication, running condor_q as the user condor-admin on the machine with the schedd produces the following in the log file:
    ================================================================================================================
    07/09/20 13:58:50 DC_AUTHENTICATE: authentication of <192.168.0.68:13883> did not result in a valid mapped user name, which is required for this command (519 QUERY_JOB_ADS_WITH_AUTH), so aborting.
    ================================================================================================================
    Which I don't think makes sense, because that username would match with the rule I have in my config, wouldn't it?

    Is there anything here which is standing out as something I can investigate further? I seem to be a little stuck in the water.

    Thanks all,
    Wes

    Wesley Taylor - Cluster Manager
    Numerica Corporation (www.numerica.us)
    5042 Technology Parkway #100
    Fort Collins, Colorado 80528
    (970) 207 2233
    wesley.taylor@xxxxxxxxxxx



    Public Content

    -----Original Message-----
    From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Wesley Taylor
    Sent: Tuesday, July 7, 2020 6:34 PM
    To: 'htcondor-users@xxxxxxxxxxx' <htcondor-users@xxxxxxxxxxx>
    Subject: [External] - [HTCondor-users] Help with authentication and condor mapfile for strong security


    Hi all,

    I had my Condors hissing and being silent as they should, but then I enabled the Strong security template and as expected, everything stopped working.

    I read through the HTCondor security documentation in its entirety (https://htcondor.readthedocs.io/en/stable/admin-manual/security.html?highlight=mapfile#security), but I still have a few questions:
    1. If I am using realmd to configure Kerberos and sssd to work with an Active Directory server, how do I configure Active Directory to have appropriate properties so that I can use Kerberos authentication with HTCondor?
    2. How can I verify my HTCondor mapfile is correct? It appears below that my condor_schedd is unable to authenticate with the shared port because there is no mapped uid, but based on the documentation, I am a little fuzzy on how to make a correct mapping for my condor_schedd.

    Security config:
    ===================================================
    @use SECURITY : Strong
    SEC_PASSWORD_FILE = /etc/condor/passwords.d/POOL
    SEC_DEFAULT_AUTHENTICATION_METHODS = PASSWORD
    ALLOW_DAEMON = *
    ALLOW_NEGOTIATOR = *
    ===================================================

    SchedLog:
    ===================================================================================================================================================================================================
    07/02/20 19:16:19 ******************************************************
    07/02/20 19:16:19 ** condor_schedd (CONDOR_SCHEDD) STARTING UP
    07/02/20 19:16:19 ** /usr/sbin/condor_schedd
    07/02/20 19:16:19 ** SubsystemInfo: name=SCHEDD type=SCHEDD(5) class=DAEMON(1)
    07/02/20 19:16:19 ** Configuration: subsystem:SCHEDD local:<NONE> class:DAEMON
    07/02/20 19:16:19 ** $CondorVersion: 8.8.9 May 07 2020 BuildID: 503236 PackageID: 8.8.9-1 FIPS $
    07/02/20 19:16:19 ** $CondorPlatform: x86_64_CentOS7 $
    07/02/20 19:16:19 ** PID = 24136
    07/02/20 19:16:19 ** Log last touched time unavailable (No such file or directory)
    07/02/20 19:16:19 ******************************************************
    07/02/20 19:16:19 Using config source: /etc/condor/condor_config
    07/02/20 19:16:19 Using local config sources:
    07/02/20 19:16:19    /etc/condor/config.d/49-common
    07/02/20 19:16:19    /etc/condor/config.d/50-security
    07/02/20 19:16:19    /etc/condor/config.d/51-role-exec
    07/02/20 19:16:19    /etc/condor/condor_config.local
    07/02/20 19:16:19 config Macros = 71, Sorted = 71, StringBytes = 1922, TablesBytes = 2620
    07/02/20 19:16:19 CLASSAD_CACHING is ENABLED
    07/02/20 19:16:19 Daemon Log is logging: D_ALWAYS D_ERROR
    07/02/20 19:16:19 SharedPortEndpoint: waiting for connections to named socket 24123_f333_3
    07/02/20 19:16:19 DaemonCore: command socket at <172.20.0.56:9618?addrs=172.20.0.56-9618&noUDP&sock=24123_f333_3>
    07/02/20 19:16:19 DaemonCore: private command socket at <172.20.0.56:9618?addrs=172.20.0.56-9618&noUDP&sock=24123_f333_3>
    07/02/20 19:16:19 History file rotation is enabled.
    07/02/20 19:16:19   Maximum history file size is: 20971520 bytes
    07/02/20 19:16:19   Number of rotated history files is: 2
    07/02/20 19:16:19 my_popenv: Failed to exec in child, errno=2 (No such file or directory)
    07/02/20 19:16:19 Failed to execute /usr/sbin/condor_shadow.std, ignoring
    07/02/20 19:16:19 Reloading job factories
    07/02/20 19:16:19 Loaded 0 job factories, 0 were paused, 0 failed to load
    07/02/20 19:16:25 TransferQueueManager stats: active up=0/100 down=0/100; waiting up=0 down=0; wait time up=0s down=0s
    07/02/20 19:16:25 TransferQueueManager upload 1m I/O load: 0 bytes/s  0.000 disk load  0.000 net load
    07/02/20 19:16:25 TransferQueueManager download 1m I/O load: 0 bytes/s  0.000 disk load  0.000 net load
    07/02/20 19:16:51 DC_AUTHENTICATE: authentication of <172.20.0.56:41253> did not result in a valid mapped user name, which is required for this command (519 QUERY_JOB_ADS_WITH_AUTH), so aborting.
    07/02/20 19:16:51 DC_AUTHENTICATE: reason for authentication failure: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using PASSWORD ===================================================================================================================================================================================================

    Thank you all for the help as always,
    Wes
