
Re: [HTCondor-users] Kerberos AS-REPs for Daemon communication not cached



Hi Oliver,

Yeah, I didn't figure you would want to do any live experimentation.  On a closer reading, I see I also missed your point about the missing krb5_get_init_creds_opt_set_out_ccache() call.
 
Thanks again for your detailed investigation and I'll get back to you once I've had a chance to look more closely.
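To make the difference discussed in this thread concrete, here is a minimal C sketch. It is a hypothetical model, not HTCondor or MIT krb5 code. It never talks to a KDC and only counts the requests a client would send. connect_uncached() assumes the behaviour shown in the init_daemon log lines of the original report, where every connection sends one AS-REQ straight for the service principal. connect_cached() assumes kinit/kvno-style caching: one AS-REQ for a TGT, one TGS-REQ per service, and then cache hits until expiry. The names CredCache, KdcStats, connect_cached and connect_uncached, the 16-slot cache and the 24-hour lifetime (taken from the klist output) are all illustrative assumptions. In the real library, storing the fetched TGT is roughly what krb5_get_init_creds_opt_set_out_ccache() enables.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical model, not real Kerberos: it only counts the requests a
 * client would send to the KDC. All names here are illustrative.
 */
#define MAX_TICKETS 16
#define TICKET_LIFETIME 86400L /* assumed 24 h, matching the klist output */

typedef struct {
    char service[128]; /* service principal, e.g. host/schedd1.domain@REALM */
    long expires;      /* absolute expiry time in seconds */
} Ticket;

typedef struct {
    bool   has_tgt;
    long   tgt_expires;
    Ticket tickets[MAX_TICKETS];
    int    n_tickets;
} CredCache;

typedef struct {
    int as_req;  /* AS-REQs: the requests that overloaded the KDC in this report */
    int tgs_req; /* TGS-REQs: service tickets obtained with the cached TGT */
} KdcStats;

/* Behaviour seen in the logs: one AS-REQ directly for the service,
 * and nothing is kept for the next connection. */
void connect_uncached(KdcStats *kdc, const char *service) {
    (void)service;
    kdc->as_req++;
}

/* kinit/kvno-style behaviour: consult the cache first, contact the KDC
 * only on a miss, and store whatever was fetched. */
void connect_cached(CredCache *cc, KdcStats *kdc, const char *service, long now) {
    assert(cc != NULL && kdc != NULL);
    Ticket *slot = NULL;
    for (int i = 0; i < cc->n_tickets; i++) {
        if (strcmp(cc->tickets[i].service, service) == 0) {
            slot = &cc->tickets[i];
            break;
        }
    }
    if (slot != NULL && now < slot->expires)
        return; /* cache hit: no KDC traffic at all */

    if (!cc->has_tgt || now >= cc->tgt_expires) {
        kdc->as_req++; /* like kinit -k: one AS-REQ for a fresh TGT */
        cc->has_tgt = true;
        cc->tgt_expires = now + TICKET_LIFETIME;
    }

    kdc->tgs_req++; /* like kvno: one TGS-REQ for the service ticket */
    if (slot == NULL && cc->n_tickets < MAX_TICKETS)
        slot = &cc->tickets[cc->n_tickets++];
    if (slot != NULL) { /* a full cache simply does not store the ticket */
        strncpy(slot->service, service, sizeof slot->service - 1);
        slot->service[sizeof slot->service - 1] = '\0';
        /* a service ticket never outlives the TGT it was obtained with */
        long end = now + TICKET_LIFETIME;
        slot->expires = end < cc->tgt_expires ? end : cc->tgt_expires;
    }
}
```

Under these assumptions, 100 connections alternating between two schedds cost 100 AS-REQs without a cache, but only one AS-REQ plus two TGS-REQs with one; a new AS-REQ is needed only once the TGT has expired.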


Cheers,
-zach

On 5/23/19, 3:02 AM, "Oliver Freyermuth" <freyermuth@xxxxxxxxxxxxxxxxxx> wrote:

    Dear Zach,
    
    many thanks for your reply!
    I would prefer not to test on our production central managers (they use the CentOS 7 default, which is the kernel keyring).
    But I would be astonished if the behaviour changed when using a different KRB5CCNAME. It seems the credentials are not stored in any cache at all; judging from the kinit code, this needs two dedicated API calls which are missing.
    
    Cheers and many thanks for looking into it!
    	Oliver
    
    On 23.05.19 at 03:37, Zach Miller wrote:
    > Hi Oliver,
    > 
    > Thank you for the report.  I will indeed look into this.  I believe the addition of Kerberos support in HTCondor predates the existence of kernel keyrings, so it is definitely worth revisiting the API calls made by HTCondor to make sure we are correctly supporting that mode of operation.
    > 
    > I doubt that you want to, but if you do, you could configure your Kerberos setup on some machine to use a FILE: credential cache instead of a KEYRING: credential cache and see whether that is indeed the cause of the problem.
    > 
    > Either way, we will investigate and let you know what we find.  Thanks again.
    > 
    > 
    > Cheers,
    > -zach
    > 
    > 
    > On 5/22/19, 7:56 PM, "HTCondor-users on behalf of Oliver Freyermuth" <htcondor-users-bounces@xxxxxxxxxxx on behalf of freyermuth@xxxxxxxxxxxxxxxxxx> wrote:
    > 
    >      Dear HTCondor experts,
    >      
    >      reading through the source of "kinit" itself:
    >      https://github.com/krb5/krb5/blob/master/src/clients/kinit/kinit.c
    >      I mainly see two major differences from the HTCondor code:
    >      - They always call "krb5_cc_resolve" first to check whether a credential cache exists, and if it does, they call
    >        "krb5_get_init_creds_opt_set_in_ccache" so that the init call uses it.
    >      - They call "krb5_get_init_creds_opt_set_out_ccache" so that the fetched credential is stored in the cache.
    >      
    >      Maybe that is already sufficient?
    >      I am unsure about the parameters and implications, but maybe the HTCondor authentication expert can use this information.
    >      
    >      For reference, the manual steps would be:
    >      ------------------------------------------
    >      $ kinit -k host/condor-cm1.domain@REALM
    >      $ kvno host/schedd1.domain@REALM
    >      $ kvno host/condor-cm1.domain@REALM
    >      ------------------------------------------
    >      
    >      to get the desired behaviour (i.e. kinit and kvno store the credentials in the cache by default):
    >      ------------------------------------------
    >      $ klist -Af
    >      Ticket cache: KEYRING:persistent:0:0
    >      Default principal: host/condor-cm1.domain@REALM
    >      
    >      Valid starting     Expires            Service principal
    >      05/23/19 02:54:29  05/24/19 02:52:58  host/condor-cm1.domain@REALM
    >              renew until 05/30/19 02:52:58, Flags: FRT
    >      05/23/19 02:53:01  05/24/19 02:52:58  host/schedd1.domain@REALM
    >              renew until 05/30/19 02:52:58, Flags: FRT
    >      05/23/19 02:52:58  05/24/19 02:52:58  krbtgt/REALM@REALM
    >              renew until 05/30/19 02:52:58, Flags: FRI
    >      ------------------------------------------
    >      
    >      Hope this helps!
    >      
    >      Cheers,
    >      	Oliver
    >      
    >      On 23.05.19 at 02:06, Oliver Freyermuth wrote:
    >      > Dear HTCondor experts,
    >      >
    >      > we've observed heavy bursts of AS-REQs (Kerberos Authentication Service Requests), at rates of up to several hundred requests per second,
    >      > issued by the central manager node (running negotiator and collector) whenever a lot of jobs are started and daemons (using Kerberos auth) need to talk to each other.
    >      >
    >      > I can also reproduce this more easily by running "condor_q -all -global" on our condor-cm (central manager) as the "root" user, who has no Kerberos credentials of its own
    >      > but can access the host principal (and hence use the service credentials to authenticate). A snippet from the debug log of this condor_q run confirms my observation:
    >      >
    >      > 05/23/19 01:48:15 (fd:4) (pid:2411) (D_SECURITY) KERBEROS: Server principal is host/schedd1.domain@REALM
    >      > 05/23/19 01:48:15 (fd:4) (pid:2411) (D_SECURITY) init_daemon: client principal is 'host/condor-cm1.domain@REALM'
    >      > 05/23/19 01:48:15 (fd:4) (pid:2411) (D_SECURITY) init_daemon: Using default keytab FILE:/etc/krb5.keytab
    >      > 05/23/19 01:48:15 (fd:4) (pid:2411) (D_SECURITY) init_daemon: Trying to get tgt credential for service host/schedd1@REALM
    >      > 05/23/19 01:48:15 (fd:4) (pid:2411) (D_PRIV) PRIV_UNKNOWN --> PRIV_ROOT at /slots/10/dir_2560730/userdir/.tmpV7H12D/BUILD/condor-8.8.2/src/condor_io/condor_auth_kerberos.cpp:632
    >      > 05/23/19 01:48:15 (fd:4) (pid:2411) (D_PRIV) PRIV_ROOT --> PRIV_UNKNOWN at /slots/10/dir_2560730/userdir/.tmpV7H12D/BUILD/condor-8.8.2/src/condor_io/condor_auth_kerberos.cpp:634
    >      > 05/23/19 01:48:15 (fd:4) (pid:2411) (D_SECURITY) init_daemon: gic_kt creds_->client is 'host/condor-cm1.domain@REALM'
    >      > 05/23/19 01:48:15 (fd:4) (pid:2411) (D_SECURITY) init_daemon: gic_kt creds_->server is 'host/schedd1.domain@REALM'
    >      > 05/23/19 01:48:15 (fd:4) (pid:2411) (D_SECURITY) Success..........................
    >      >
    >      > It seems that in daemon authentication, a fresh credential is fetched for every single daemon-to-daemon interaction. We noticed this because the KDC of our computing centre was effectively DoSed by it
    >      > and the service failed (twice so far).
    >      > In Kerberos terms, fetching a credential means issuing an AS-REQ and having the KDC generate an AS-REP, which is computationally fairly expensive on the KDC side.
    >      >
    >      > Our computing centre is trying to improve the situation on their end so that the KDC withstands this heavy load better, but it is still best practice in Kerberos to cache AS-REPs.
    >      >
    >      > Could caching be added?
    >      > Sadly, I do not have a straightforward suggestion as to what the implementation is missing to achieve that: for user credentials, the Kerberos library takes care of this automatically
    >      > (by using credential caches in files or in the persistent kernel keyring), but that does not seem to happen for host/service credentials with HTCondor. Maybe HTCondor purges them after use?
    >      > But I did not find that explicitly in the code.
    >      > However, running:
    >      > kinit -k host/condor-cm1.domain@REALM
    >      > successfully adds a TGT to the credential cache (in our case, the persistent kernel keyring), as I would expect. That does not happen with HTCondor.
    >      >
    >      > Cheers,
    >      > 	Oliver
    >      >
    >      
    >      
    >      --
    >      Oliver Freyermuth
    >      Universität Bonn
    >      Physikalisches Institut, Raum 1.047
    >      Nußallee 12
    >      53115 Bonn
    >      --
    >      Tel.: +49 228 73 2367
    >      Fax:  +49 228 73 7869
    >      --
    >      
    >      
    > 
    > 
    > _______________________________________________
    > HTCondor-users mailing list
    > To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
    > subject: Unsubscribe
    > You can also unsubscribe by visiting
    > https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
    > 
    > The archives can be found at:
    > https://lists.cs.wisc.edu/archive/htcondor-users/
    > 
    
    