
Re: [HTCondor-users] Krb for grid universe jobs



Hello,

I just wanted to let you know that I am looking at this email and your log files.

The GridManager on the SchedD node runs as root, and then spawns a "condor_c-gahp" process as the user.  It's this process that will need access to your Kerberos credential.  Could you also send the log file for that process?

In general, we are working to add better "1st class" Kerberos support in the same way that HTCondor understands GSI proxies.  However, using Kerberos for remote Condor-C is not something we have yet tackled, and your Kerberos credentials are not automatically forwarded along just because condor_submit authenticated to the SchedD with krb.  We may be able to get something going but it will likely require a fair amount of configuration and some supporting scripts.  The other unfortunate part is this is not yet well documented because it is still a work in progress.

Perhaps we can continue this discussion off-list while we work out the details and then report back once we know more.  But let's start by taking a look at the condor_c-gahp logs.  Thank you.


Cheers,
-zach


On 9/11/19, 12:02 AM, "HTCondor-users on behalf of Asvija B" <htcondor-users-bounces@xxxxxxxxxxx on behalf of asvijab@xxxxxxx> wrote:

    Hi,
    I am trying to submit a grid universe job to a remote machine.  The Schedd on the submit node correctly recognizes and authenticates my Kerberos credentials.  However, the remote Schedd still fails to authenticate with Kerberos.  I have enabled debugging in the GridManager log on the submit node with D_ALL:2.  Upon inspection, the GridManager on the submit node is not selecting the proper Kerberos credential for authenticating to the remote Schedd; instead it is using 'unauthenticated@unmapped' as the user.
    
    How do I make the GridManager on the submit node select the proper Kerberos credential?  (The Schedd on the submit node recognizes the proper credentials, and the client debug output also shows valid Kerberos credentials.)  The various log outputs are below.
    
    
    Config file on submit node (gridfs.nsgtest.cdac.in  IP: 10.180.141.148) :
    
    SEC_DEFAULT_AUTHENTICATION_METHODS = KERBEROS
    KERBEROS_MAP_FILE = $(RELEASE_DIR)/etc/condor.kmap
    CERTIFICATE_MAPFILE = /usr/local/nsg/condor/etc/usermap
    SCHEDD_DEBUG            = D_SECURITY
    GRIDMANAGER_DEBUG       = D_ALL:2
    
    Condor map file on submit node:
    
    [root@gridfs log]# cat /usr/local/nsg/condor/etc/condor.kmap
    NSGTEST.CDAC.IN = nsgtest.cdac.in
    
    User map file on submit node:
    
    [root@gridfs log]# cat /usr/local/nsg/condor/etc/usermap
    FS (.*) \1
    FS_REMOTE (.*) \1
    GSI (.*) GSS_ASSIST_GRIDMAP
    SSL (.*) ssl@unmapped
    KERBEROS ([^/]*)/?[^@]*@(.*) \1@\2
    NTSSPI (.*) \1
    CLAIMTOBE (.*) \1
    PASSWORD (.*) \1
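
To make the effect of the KERBEROS line in the usermap concrete, here is a minimal Python sketch that applies the same pattern and replacement to two principals. It is only an illustration: Python's `re` module stands in for HTCondor's own matcher, whole-string matching is an assumption, `map_principal` is a hypothetical helper name, and `EXAMPLE.ORG` is a placeholder realm because the real realm is masked in the logs.

```python
import re

# Pattern and replacement copied from the KERBEROS line of the usermap.
PATTERN = r"([^/]*)/?[^@]*@(.*)"
REPLACEMENT = r"\1@\2"


def map_principal(principal):
    """Return the canonical user for a Kerberos principal, or None if no match."""
    # Assumption: the whole principal must match. HTCondor's own matching
    # may anchor differently; this only illustrates the regex itself.
    match = re.fullmatch(PATTERN, principal)
    return match.expand(REPLACEMENT) if match else None


# EXAMPLE.ORG is a placeholder realm, not the one used in the thread.
print(map_principal("host/gridfs.nsgtest.cdac.in@EXAMPLE.ORG"))
print(map_principal("asvija@EXAMPLE.ORG"))
```

Under these assumptions the instance part after the slash is dropped, so a host principal collapses to `host@REALM`, while a plain user principal passes through unchanged.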
    
    
    Config file on remote node (grid-1-0.nsgtest.cdac.in  IP:  10.180.141.111) :
    SEC_DEFAULT_AUTHENTICATION_METHODS = KERBEROS
    KERBEROS_MAP_FILE = $(RELEASE_DIR)/etc/condor.kmap
    CERTIFICATE_MAPFILE = /usr/local/nsg/condor/etc/usermap
    
    
    Job script:
    [asvija@gridfs condor]$ cat condor-universe.job
    universe = grid
    executable = /bin/hostname
    output = myoutput
    error = myerror
    log = mylog
    
    grid_resource = condor grid-1-0.nsgtest.cdac.in grid-1-0.nsgtest.cdac.in
    +remote_jobuniverse = 5
    +remote_requirements = True
    +remote_ShouldTransferFiles = "YES"
    +remote_WhenToTransferOutput = "ON_EXIT"
    queue
    
    
    
    Client side debug output:
    
    [asvija@gridfs condor]$ _condor_TOOL_DEBUG=D_SECURITY condor_submit -debug condor-universe.job 2>&1 | tee out
    09/09/19 12:02:55 KEYCACHE: created: 0xf46150
    09/09/19 12:02:55 Can't open directory "/opt/condor//config" as PRIV_UNKNOWN, errno: 2 (No such file or directory)
    09/09/19 12:02:55 Cannot open /opt/condor//config: No such file or directory
    Submitting job(s)09/09/19 12:02:55 CRED: NO MODULES REQUESTED
    09/09/19 12:02:55 SECMAN: command 1112 QMGMT_WRITE_CMD to schedd at <10.180.141.148:9618> from TCP port 22978 (blocking).
    09/09/19 12:02:55 SECMAN: new session, doing initial authentication.
    09/09/19 12:02:55 SECMAN: Auth methods: KERBEROS
    09/09/19 12:02:55 AUTHENTICATE: setting timeout for <10.180.141.148:9618?addrs=10.180.141.148-9618&noUDP&sock=95471_0ec0_4> to 20.
    09/09/19 12:02:55 HANDSHAKE: in handshake(my_methods = 'KERBEROS')
    09/09/19 12:02:55 HANDSHAKE: handshake() - i am the client
    09/09/19 12:02:55 HANDSHAKE: sending (methods == 64) to server
    09/09/19 12:02:55 HANDSHAKE: server replied (method = 64)
    09/09/19 12:02:55 KERBEROS: krb5_unparse_name: host/gridfs.nsgtest.cdac.in@xxxxxxxxxxxxxxx
    09/09/19 12:02:55 KERBEROS: no user yet determined, will grab up to slash
    09/09/19 12:02:55 KERBEROS: picked user: host
    09/09/19 12:02:55 KERBEROS: remapping 'host' to 'condor'
    09/09/19 12:02:55 Client is condor@xxxxxxxxxxxxxxx
    09/09/19 12:02:55 KERBEROS: Server principal is host/gridfs.nsgtest.cdac.in@xxxxxxxxxxxxxxx
    09/09/19 12:02:55 Acquiring credential for user
    09/09/19 12:02:55 Successfully located credential cache
    09/09/19 12:02:55 Remote host is 10.180.141.148
    09/09/19 12:02:55 Authentication was a Success.
    09/09/19 12:02:55 ZKM: setting default map to condor@xxxxxxxxxxxxxxx
    09/09/19 12:02:55 ZKM: name to map is 'host/gridfs.nsgtest.cdac.in@xxxxxxxxxxxxxxx'
    09/09/19 12:02:55 ZKM: pre-map: current user is 'condor'
    09/09/19 12:02:55 ZKM: pre-map: current domain is 'nsgtest.cdac.in'
    09/09/19 12:02:55 ZKM: Parsing map file.
    09/09/19 12:02:55 ZKM: attempting to map 'host/gridfs.nsgtest.cdac.in@xxxxxxxxxxxxxxx'
    09/09/19 12:02:55 ZKM: 1: attempting to map 'host/gridfs.nsgtest.cdac.in@xxxxxxxxxxxxxxx'
    09/09/19 12:02:55 ZKM: 2: mapret: 0 included_voms: 0 canonical_user: host@xxxxxxxxxxxxxxx
    09/09/19 12:02:55 ZKM: found user host@xxxxxxxxxxxxxxx, splitting.
    09/09/19 12:02:55 ZKM: post-map: current user is 'host'
    09/09/19 12:02:55 ZKM: post-map: current domain is 'nsgtest.cdac.in'
    09/09/19 12:02:55 ZKM: post-map: current FQU is 'host@xxxxxxxxxxxxxxx'
    09/09/19 12:02:55 AUTHENTICATE: Exchanging keys with remote side.
    09/09/19 12:02:55 AUTHENTICATE: Result of end of authenticate is 1.
    09/09/19 12:02:55 SECMAN: added session gridfs:95518:1568010775:0 to cache for 60 seconds (3600s lease).
    09/09/19 12:02:55 SECMAN: startCommand succeeded.
    09/09/19 12:02:55 IPVERIFY: Subsystem SUBMIT
    09/09/19 12:02:55 IPVERIFY: Permission ALLOW
    09/09/19 12:02:55 IPVERIFY: Subsystem SUBMIT
    09/09/19 12:02:55 IPVERIFY: Permission READ
    09/09/19 12:02:55 ipverify: READ optimized to allow anyone
    09/09/19 12:02:55 IPVERIFY: Subsystem SUBMIT
    09/09/19 12:02:55 IPVERIFY: Permission WRITE
    09/09/19 12:02:55 ipverify: WRITE optimized to allow anyone
    09/09/19 12:02:55 IPVERIFY: Subsystem SUBMIT
    09/09/19 12:02:55 IPVERIFY: Permission NEGOTIATOR
    09/09/19 12:02:55 ipverify: NEGOTIATOR optimized to allow anyone
    09/09/19 12:02:55 IPVERIFY: Subsystem SUBMIT
    09/09/19 12:02:55 IPVERIFY: Permission ADMINISTRATOR
    09/09/19 12:02:55 ipverify: ADMINISTRATOR optimized to allow anyone
    09/09/19 12:02:55 IPVERIFY: Subsystem SUBMIT
    09/09/19 12:02:55 IPVERIFY: Permission OWNER
    09/09/19 12:02:55 ipverify: OWNER optimized to allow anyone
    09/09/19 12:02:55 IPVERIFY: Subsystem SUBMIT
    09/09/19 12:02:55 IPVERIFY: Permission CONFIG
    09/09/19 12:02:55 ipverify: CONFIG optimized to deny everyone
    09/09/19 12:02:55 IPVERIFY: Subsystem SUBMIT
    09/09/19 12:02:55 IPVERIFY: Permission DAEMON
    09/09/19 12:02:55 ipverify: DAEMON optimized to allow anyone
    09/09/19 12:02:55 IPVERIFY: Subsystem SUBMIT
    09/09/19 12:02:55 IPVERIFY: Permission SOAP
    09/09/19 12:02:55 ipverify: SOAP optimized to allow anyone
    09/09/19 12:02:55 IPVERIFY: Subsystem SUBMIT
    09/09/19 12:02:55 IPVERIFY: Permission DEFAULT
    09/09/19 12:02:55 ipverify: DEFAULT optimized to allow anyone
    09/09/19 12:02:55 IPVERIFY: Subsystem SUBMIT
    09/09/19 12:02:55 IPVERIFY: Permission CLIENT
    09/09/19 12:02:55 ipverify: CLIENT optimized to allow anyone
    09/09/19 12:02:55 IPVERIFY: Subsystem SUBMIT
    09/09/19 12:02:55 IPVERIFY: Permission ADVERTISE_STARTD
    09/09/19 12:02:55 ipverify: ADVERTISE_STARTD optimized to allow anyone
    09/09/19 12:02:55 IPVERIFY: Subsystem SUBMIT
    09/09/19 12:02:55 IPVERIFY: Permission ADVERTISE_SCHEDD
    09/09/19 12:02:55 ipverify: ADVERTISE_SCHEDD optimized to allow anyone
    09/09/19 12:02:55 IPVERIFY: Subsystem SUBMIT
    09/09/19 12:02:55 IPVERIFY: Permission ADVERTISE_MASTER
    09/09/19 12:02:55 ipverify: ADVERTISE_MASTER optimized to allow anyone
    .
    1 job(s) submitted to cluster 27.
    09/09/19 12:02:55 SECMAN: command 421 RESCHEDULE to local schedd from TCP port 11296 (blocking).
    09/09/19 12:02:55 SECMAN: using session gridfs:95518:1568010775:0 for {<10.180.141.148:9618?addrs=10.180.141.148-9618&noUDP&sock=95471_0ec0_4>,<421>}.
    09/09/19 12:02:55 SECMAN: resume, other side is $CondorVersion: 8.8.4 Jul 09 2019 BuildID: 474941 $, NOT reauthenticating.
    09/09/19 12:02:55 SECMAN: startCommand succeeded.
    [asvija@gridfs condor]$
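
Debug output like the above is long, so it can help to pull out just the negotiated methods, the success marker, and the final mapped identity. The following is a rough Python sketch, not an HTCondor tool: `summarize_auth` is a hypothetical helper, and the regular expressions are based only on the line formats visible in this excerpt, which may differ in other HTCondor versions.

```python
import re

# Line formats taken from the D_SECURITY excerpt in this thread.
METHODS_RE = re.compile(r"SECMAN: Auth methods: (\S+)")
FQU_RE = re.compile(r"ZKM: post-map: current FQU is '([^']*)'")


def summarize_auth(log_text):
    """Collect auth methods, success marker, and mapped identities from log text."""
    return {
        "methods": METHODS_RE.findall(log_text),
        "succeeded": "Authentication was a Success." in log_text,
        "identities": FQU_RE.findall(log_text),
    }


# Three lines copied from the client-side output above.
SAMPLE = """\
09/09/19 12:02:55 SECMAN: Auth methods: KERBEROS
09/09/19 12:02:55 Authentication was a Success.
09/09/19 12:02:55 ZKM: post-map: current FQU is 'host@xxxxxxxxxxxxxxx'
"""

print(summarize_auth(SAMPLE))
```

Run against the full client output, this would report the mapped identity as `host@xxxxxxxxxxxxxxx`, which is the value the `post-map: current FQU` line shows.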
    
    
    
    Schedd Log on Submit node (gridfs.nsgtest.cdac.in)
    Please see the contents at this link:
    https://github.com/asvija/condor-slurm/blob/master/Schedd-gridfs.txt
    
    
    
    GridManager Log on Submit node:
    
    Please see the contents at this link:
    
    https://github.com/asvija/condor-slurm/blob/master/GridmanagerLog.asvija
    
    
    
    Schedd Log on Remote node (grid-1-0.nsgtest.cdac.in)
    09/09/19 12:02:03 KEYCACHE: created: 0xa27150
    09/09/19 12:02:03 Can't open directory "/opt/condor//config" as PRIV_UNKNOWN, errno: 2 (No such file or directory)
    09/09/19 12:02:03 Cannot open /opt/condor//config: No such file or directory
    09/09/19 12:02:03 Setting maximum file descriptors to 4096.
    09/09/19 12:02:03 ******************************************************
    09/09/19 12:02:03 ** condor_schedd (CONDOR_SCHEDD) STARTING UP
    09/09/19 12:02:03 ** /usr/local/nsg/condor/sbin/condor_schedd
    09/09/19 12:02:03 ** SubsystemInfo: name=SCHEDD type=SCHEDD(5) class=DAEMON(1)
    09/09/19 12:02:03 ** Configuration: subsystem:SCHEDD local:<NONE> class:DAEMON
    09/09/19 12:02:03 ** $CondorVersion: 8.8.4 Jul 09 2019 BuildID: 474941 $
    09/09/19 12:02:03 ** $CondorPlatform: x86_64_RedHat7 $
    09/09/19 12:02:03 ** PID = 309007
    09/09/19 12:02:03 ** Log last touched 9/9 12:01:56
    09/09/19 12:02:03 ******************************************************
    09/09/19 12:02:03 Using config source: /usr/local/nsg/condor/etc/condor_config
    09/09/19 12:02:03 Using local config sources:
    09/09/19 12:02:03    /opt/condor//condor_config.local
    09/09/19 12:02:03 config Macros = 100, Sorted = 100, StringBytes = 4012, TablesBytes = 3648
    09/09/19 12:02:03 CLASSAD_CACHING is ENABLED
    09/09/19 12:02:03 Daemon Log is logging: D_ALWAYS D_ERROR D_SECURITY
    09/09/19 12:02:03 SharedPortEndpoint: waiting for connections to named socket 308958_61ed_4
    09/09/19 12:02:03 SECMAN: created non-negotiated security session 828b90d9a353477b5f987995937491be00f0d6e46f223ce0 for 0 (inf) seconds.
    09/09/19 12:02:03 SECMAN: now creating non-negotiated command mappings
    09/09/19 12:02:03 IpVerify::PunchHole: opened DAEMON level to condor@parent
    09/09/19 12:02:03 IpVerify::PunchHole: opened WRITE level to condor@parent
    09/09/19 12:02:03 IpVerify::PunchHole: opened READ level to condor@parent
    09/09/19 12:02:03 IpVerify::PunchHole: open count at level READ for condor@parent now 2
    09/09/19 12:02:03 DaemonCore: command socket at <10.180.141.111:9618?addrs=10.180.141.111-9618&noUDP&sock=308958_61ed_4>
    09/09/19 12:02:03 DaemonCore: private command socket at <10.180.141.111:9618?addrs=10.180.141.111-9618&noUDP&sock=308958_61ed_4>
    09/09/19 12:02:03 History file rotation is enabled.
    09/09/19 12:02:03   Maximum history file size is: 20971520 bytes
    09/09/19 12:02:03   Number of rotated history files is: 20
    09/09/19 12:02:03 IpVerify::PunchHole: opened CLIENT level to execute-side@matchsession
    09/09/19 12:02:03 Reloading job factories
    09/09/19 12:02:03 Loaded 0 job factories, 0 were paused, 0 failed to load
    09/09/19 12:02:03 SECMAN: command 60008 DC_CHILDALIVE to daemon at <10.180.141.111:9618> from TCP port 28294 (blocking).
    09/09/19 12:02:03 SECMAN: using session 828b90d9a353477b5f987995937491be00f0d6e46f223ce0 for {<10.180.141.111:9618?addrs=10.180.141.111-9618&noUDP&sock=308958_61ed>,<60008>}.
    09/09/19 12:02:03 SECMAN: startCommand succeeded.
    09/09/19 12:02:03 IPVERIFY: Subsystem SCHEDD
    09/09/19 12:02:03 IPVERIFY: Permission ALLOW
    09/09/19 12:02:03 IPVERIFY: Subsystem SCHEDD
    09/09/19 12:02:03 IPVERIFY: Permission READ
    09/09/19 12:02:03 IPVERIFY: allow READ: * (from config value ALLOW_READ)
    09/09/19 12:02:03 ipverify: READ optimized to allow anyone
    09/09/19 12:02:03 IPVERIFY: Subsystem SCHEDD
    09/09/19 12:02:03 IPVERIFY: Permission WRITE
    09/09/19 12:02:03 IPVERIFY: allow WRITE: grid-1-0.nsgtest.cdac.in, 10.180.141.111, 10.180.141.148, 10.180.141.111 (from config value ALLOW_WRITE)
    09/09/19 12:02:03 IPVERIFY: Subsystem SCHEDD
    09/09/19 12:02:03 IPVERIFY: Permission NEGOTIATOR
    09/09/19 12:02:03 IPVERIFY: allow NEGOTIATOR: grid-1-0.nsgtest.cdac.in, , 10.180.141.111 (from config value ALLOW_NEGOTIATOR_SCHEDD)
    09/09/19 12:02:03 IPVERIFY: Subsystem SCHEDD
    09/09/19 12:02:03 IPVERIFY: Permission ADMINISTRATOR
    09/09/19 12:02:03 IPVERIFY: allow ADMINISTRATOR: grid-1-0.nsgtest.cdac.in, 10.180.141.111 (from config value ALLOW_ADMINISTRATOR)
    09/09/19 12:02:03 IPVERIFY: Subsystem SCHEDD
    09/09/19 12:02:03 IPVERIFY: Permission OWNER
    09/09/19 12:02:03 IPVERIFY: allow OWNER: grid-1-0.nsgtest.cdac.in, grid-1-0.nsgtest.cdac.in, 10.180.141.111 (from config value ALLOW_OWNER)
    09/09/19 12:02:03 IPVERIFY: Subsystem SCHEDD
    09/09/19 12:02:03 IPVERIFY: Permission CONFIG
    09/09/19 12:02:03 ipverify: CONFIG optimized to deny everyone
    09/09/19 12:02:03 IPVERIFY: Subsystem SCHEDD
    09/09/19 12:02:03 IPVERIFY: Permission DAEMON
    09/09/19 12:02:03 IPVERIFY: allow DAEMON: grid-1-0.nsgtest.cdac.in, 10.180.141.111, 10.180.141.148, 10.180.141.111 (from config value ALLOW_WRITE)
    
    
    
    Thanks and regards,
    Asvija
    
    
    
    
    
    
     