
Re: [HTCondor-users] Kerberos AS-REPs for Daemon communication not cached



Hi Oliver,

Thank you for the report.  I will indeed look into this.  I believe the addition of Kerberos support in HTCondor predates the existence of kernel keyrings, so it is definitely worth revisiting the API calls made by HTCondor to make sure we are correctly supporting that mode of operation.

I doubt that you do, but if you want to try configuring your Kerberos setup on some machine to use a FILE: credential cache instead of a KEYRING: credential cache, you could see if that is indeed the cause of the problem.

Either way, we will investigate and let you know what we find.  Thanks again.


Cheers,
-zach


On 5/22/19, 7:56 PM, "HTCondor-users on behalf of Oliver Freyermuth" <htcondor-users-bounces@xxxxxxxxxxx on behalf of freyermuth@xxxxxxxxxxxxxxxxxx> wrote:

    Dear HTCondor experts,
    
    I tried reading through the "kinit" source itself:
    https://github.com/krb5/krb5/blob/master/src/clients/kinit/kinit.c
    I mainly see two major differences from the HTCondor code:
    - They always call "krb5_cc_resolve" first to check whether a credential cache exists, and if it does, they use
      "krb5_get_init_creds_opt_set_in_ccache" to have the init command use it.
    - They use "krb5_get_init_creds_opt_set_out_ccache" to enable storage of the fetched credential in the cache.
    
    Maybe that is already sufficient? 
    I am unsure about the parameters and implications, but maybe the HTCondor authentication expert can use this information. 
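
Independent of the exact libkrb5 parameters, the behaviour being asked for in this thread can be modelled as "look in the cache first, contact the KDC only on a miss, then store the result". The following is a minimal toy sketch of that pattern in plain C. It does not call libkrb5 and is not HTCondor code: the types cred_cache and cached_cred, the function get_credential, and the kdc_requests counter are all hypothetical names invented for illustration, and the "KDC round trip" is only simulated by incrementing a counter.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define MAX_ENTRIES 16
#define NAME_LEN 128

/* Hypothetical cache entry: the service principal and the expiry time. */
struct cached_cred {
    char server[NAME_LEN];
    time_t endtime;
    int in_use;
};

/* Hypothetical cache; kdc_requests counts simulated KDC round trips. */
struct cred_cache {
    struct cached_cred entries[MAX_ENTRIES];
    int kdc_requests;
};

/* Return an unexpired entry for this server, or NULL on a cache miss. */
static struct cached_cred *cache_lookup(struct cred_cache *cc,
                                        const char *server, time_t now)
{
    for (int i = 0; i < MAX_ENTRIES; i++) {
        struct cached_cred *e = &cc->entries[i];
        if (e->in_use && strcmp(e->server, server) == 0 && e->endtime > now)
            return e;
    }
    return NULL;
}

/* Store a credential: overwrite an entry for the same server if present,
 * otherwise use the first free or expired slot. */
static void cache_store(struct cred_cache *cc, const char *server,
                        time_t endtime, time_t now)
{
    struct cached_cred *slot = NULL;
    for (int i = 0; i < MAX_ENTRIES; i++) {
        struct cached_cred *e = &cc->entries[i];
        if (e->in_use && strcmp(e->server, server) == 0) {
            slot = e;
            break;
        }
        if (slot == NULL && (!e->in_use || e->endtime <= now))
            slot = e;
    }
    if (slot == NULL)
        return; /* cache full: the credential is simply not cached */
    snprintf(slot->server, NAME_LEN, "%s", server);
    slot->endtime = endtime;
    slot->in_use = 1;
}

/* Return the expiry time of a usable credential for `server`.
 * The simulated KDC is contacted only when the cache has no valid entry. */
time_t get_credential(struct cred_cache *cc, const char *server,
                      time_t now, time_t lifetime)
{
    assert(cc != NULL && server != NULL);
    struct cached_cred *hit = cache_lookup(cc, server, now);
    if (hit != NULL)
        return hit->endtime;

    cc->kdc_requests++; /* stands in for one request/reply exchange */
    time_t endtime = now + lifetime;
    cache_store(cc, server, endtime, now);
    return endtime;
}
```

With a zero-initialised cred_cache, calling get_credential repeatedly for the same service principal within the credential lifetime increments kdc_requests only once; a further request is counted only for a different principal or after the stored entry has expired. A real fix would of course keep the credentials in the Kerberos credential cache (FILE: or KEYRING:) rather than in a private table like this.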
    
    For reference, the manual steps would be:
    ------------------------------------------
    $ kinit -k host/condor-cm1.domain@REALM
    $ kvno host/schedd1.domain@REALM
    $ kvno host/condor-cm1.domain@REALM
    ------------------------------------------
    
    These give the desired behaviour (i.e. with kinit and kvno, credentials go to the cache by default):
    ------------------------------------------
    $ klist -Af
    Ticket cache: KEYRING:persistent:0:0
    Default principal: host/condor-cm1.domain@REALM
    
    Valid starting     Expires            Service principal
    05/23/19 02:54:29  05/24/19 02:52:58  host/condor-cm1.domain@REALM
            renew until 05/30/19 02:52:58, Flags: FRT
    05/23/19 02:53:01  05/24/19 02:52:58  host/schedd1.domain@REALM
            renew until 05/30/19 02:52:58, Flags: FRT
    05/23/19 02:52:58  05/24/19 02:52:58  krbtgt/REALM@REALM
            renew until 05/30/19 02:52:58, Flags: FRI
    ------------------------------------------
    
    Hope this helps! 
    
    Cheers,
    	Oliver
    
    Am 23.05.19 um 02:06 schrieb Oliver Freyermuth:
    > Dear HTCondor experts,
    > 
    > we've observed heavy AS-REQ (Kerberos Authentication Service Request) traffic, with rates of up to several hundred requests per second,
    > issued by the central manager node (running negotiator and collector) when a lot of jobs are started and daemons (using Kerberos auth) need to talk to each other.
    > 
    > I can also reproduce this more easily by running "condor_q -all -global" on our condor-cm (central manager) as the "root" user, who does not have Kerberos credentials
    > but can access the host principal (and hence use the service credentials to authenticate). A snippet from the debug logs of running condor_q confirms my observation:
    > 
    > 05/23/19 01:48:15 (fd:4) (pid:2411) (D_SECURITY) KERBEROS: Server principal is host/schedd1.domain@REALM
    > 05/23/19 01:48:15 (fd:4) (pid:2411) (D_SECURITY) init_daemon: client principal is 'host/condor-cm1.domain@REALM'
    > 05/23/19 01:48:15 (fd:4) (pid:2411) (D_SECURITY) init_daemon: Using default keytab FILE:/etc/krb5.keytab
    > 05/23/19 01:48:15 (fd:4) (pid:2411) (D_SECURITY) init_daemon: Trying to get tgt credential for service host/schedd1@REALM
    > 05/23/19 01:48:15 (fd:4) (pid:2411) (D_PRIV) PRIV_UNKNOWN --> PRIV_ROOT at /slots/10/dir_2560730/userdir/.tmpV7H12D/BUILD/condor-8.8.2/src/condor_io/condor_auth_kerberos.cpp:632
    > 05/23/19 01:48:15 (fd:4) (pid:2411) (D_PRIV) PRIV_ROOT --> PRIV_UNKNOWN at /slots/10/dir_2560730/userdir/.tmpV7H12D/BUILD/condor-8.8.2/src/condor_io/condor_auth_kerberos.cpp:634
    > 05/23/19 01:48:15 (fd:4) (pid:2411) (D_SECURITY) init_daemon: gic_kt creds_->client is 'host/condor-cm1.domain@REALM'
    > 05/23/19 01:48:15 (fd:4) (pid:2411) (D_SECURITY) init_daemon: gic_kt creds_->server is 'host/schedd1.domain@REALM'
    > 05/23/19 01:48:15 (fd:4) (pid:2411) (D_SECURITY) Success..........................
    > 
    > It seems that in daemon authentication, a fresh credential is fetched for every single daemon-to-daemon interaction. We noticed this because the KDC of our computing centre was DoSed by it
    > and the service failed (twice so far).
    > Fetching a credential means, in "Kerberos speak", issuing an AS-REQ and having the KDC generate an AS-REP. This is computationally quite expensive on the KDC end.
    > 
    > Our computing centre is trying to improve the situation on their end so that it withstands this heavy load better, but it is still best practice in Kerberos to cache AS-REPs.
    > 
    > Could caching be added? 
    > Sadly, I do not have a straightforward suggestion as to what the implementation is missing to achieve that. For user credentials, the Kerberos library takes care of it automatically
    > (by using credential caches in files or the persistent kernel keyring), but that does not seem to happen for host / service credentials with HTCondor. Maybe HTCondor purges them after use?
    > I did not find that explicitly in the code, though.
    > However, issuing:
    > kinit -k host/condor-cm1.domain@REALM
    > successfully adds a TGT to the credential cache (in our case, the persistent kernel keyring), as I would expect. But that does not happen with HTCondor.
    > 
    > Cheers,
    > 	Oliver
    > 
    
    
    -- 
    Oliver Freyermuth
    Universität Bonn
    Physikalisches Institut, Raum 1.047
    Nußallee 12
    53115 Bonn
    --
    Tel.: +49 228 73 2367
    Fax:  +49 228 73 7869
    --