
Re: [HTCondor-users] Condor Credd for multiple pools



Hi.  

So CREDD_HOST = CM02.XXXXXXXX.com, but not all pools can find the address of the CREDD.

This message

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Querying collector <10.1.22.53:9618> (CM01.XXXXXXXX.com) with classad:

LocationQuery = "CM02.XXXXXXXX.com"

Projection = "CondorVersion CondorPlatform MyAddress AddressV1 Name Machine"

TargetType = "CredD"

LimitResults = 1

MyType = "Query"

Requirements = ((Name == "CM02.XXXXXXXX.com"))


shows that this machine is querying the collector at 10.1.22.53 to find the full address and port of the CREDD with the name CM02.XXXXXX.com.  That query is failing, probably because the configuration of the CREDD does not tell it to put its ClassAd into the collector at 10.1.22.53.

Did you add this collector to the COLLECTOR_HOST configuration of the CREDD? 
Did you perhaps forget to reconfig the CREDD after changing its configuration?

If you don't use a fixed port for the CREDD, then the configuration of the CREDD should have

   CREDD.COLLECTOR_HOST = CM01.XXXXX.COM CM02.XXXXX.COM CM03.XXXXX.COM
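
As a rough sketch of where each setting would live under this first option (assuming CM02 is the machine running the CREDD, and using the redacted host names as written above):

   # On the CREDD machine (assumed to be CM02): advertise the CREDD
   # to the collector of every pool whose nodes need to locate it
   CREDD.COLLECTOR_HOST = CM01.XXXXX.COM CM02.XXXXX.COM CM03.XXXXX.COM

   # On the schedd and execute nodes of every pool: host name only, no port,
   # so the address is looked up in that pool's collector
   CREDD_HOST = CM02.XXXXX.COM

The CREDD has to be reconfigured (or restarted) before the new collector list takes effect.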

The other way you can solve this is to configure the CREDD to have a fixed port, rather than have it use the shared port of 9618.   With a fixed port, it will not be necessary to look up the address from the collector before opening a connection, so it will not matter if the CREDD is advertising to all of the collectors.

If you wish to use a fixed port, then the configuration of the CREDD should have

 CREDD_PORT = 9619

And all other machines should have an address and port

  CREDD_HOST = CM02.XXXXXX.com:9619

-tj


From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Lachlan Palmer <LPalmer@xxxxxxxxxxxx>
Sent: Tuesday, September 7, 2021 6:00 PM
To: 'HTCondor-Users Mail List' <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Condor Credd for multiple pools
 

TJ,

 

I attempted to set up your solution, but I am running into issues with the CREDD on CM02 not being located.

 

The configuration information for the schedd (CM01) is:

CONDOR_HOST = $(FULL_HOSTNAME)

DAEMON_LIST = MASTER COLLECTOR NEGOTIATOR SCHEDD

 

COLLECTOR     = $(SBIN)/condor_collector.exe

NEGOTIATOR    = $(SBIN)/condor_negotiator.exe

 

CREDD_HOST = CM02.XXXXXXXX.com

CREDD_CACHE_LOCALLY = True

 

 

This was the SchedLog output with D_ALL when trying to run a job:

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) New Daemon obj (credd) name: "NULL", pool: "NULL", addr: "NULL"

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) No name given, but CREDD_HOST defined to "CM02.XXXXXXXX.com"

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Finding proper daemon name for "CM02.XXXXXXXX.com"

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Daemon name contains no '@', treating as a regular hostname

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Returning daemon name: "CM02.XXXXXXXX.com"

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Using "CM02.XXXXXXXX.com" for name in Daemon object

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Using "CM02.XXXXXXXX.com" for full hostname in Daemon object

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Local daemon name would be "CM01.XXXXXXXX.com"

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) COLLECTOR_HOST is set to "CM01.XXXXXXXX.com"

09/07/21 15:36:19 (fd:5) (pid:92) (D_DAEMONCORE) *** TIMEOUT_MULTIPLIER :: 0

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Checking if CM01.XXXXXXXX.com is a sinful address

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) CM01.XXXXXXXX.com is not a sinful address: does not begin with "<"

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) New Daemon obj (collector) name: "CM01.XXXXXXXX.com", pool: "NULL", addr: "NULL"

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Using name "CM01.XXXXXXXX.com" to find daemon

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Port not specified, using default (9618)

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Host info "CM01.XXXXXXXX.com" is a hostname, finding IP address

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) DNS returned:

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME)   fe80::a4ff:5e4c:bb0:ea3a

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME)   10.1.22.53

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) We returned:

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME)   10.1.22.53

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME)   fe80::a4ff:5e4c:bb0:ea3a

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Found IP address and port <10.1.22.53:9618>

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Daemon client (collector) address determined: name: "CM01.XXXXXXXX.com", pool: "CM01.XXXXXXXX.com", alias: "CM01.XXXXXXXX.com", addr: "<10.1.22.53:9618>"

09/07/21 15:36:19 (fd:5) (pid:92) (D_DAEMONCORE) *** TIMEOUT_MULTIPLIER :: 0

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Checking if <10.1.22.53:9618> is a sinful address

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) <10.1.22.53:9618> is a sinful address!

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Daemon client (collector) address determined: name: "NULL", pool: "NULL", alias: "NULL", addr: "<10.1.22.53:9618>"

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) New Daemon obj (collector) name: "NULL", pool: "NULL", addr: "<10.1.22.53:9618>"

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Checking if <10.1.22.53:9618> is a sinful address

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) <10.1.22.53:9618> is a sinful address!

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Checking if <10.1.22.53:9618> is a sinful address

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) <10.1.22.53:9618> is a sinful address!

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Already have address, no info to locate

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Address "<10.1.22.53:9618>" specified but no name, looking up host info

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) DNS returned:

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME)   fe80::a4ff:5e4c:bb0:ea3a

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME)   10.1.22.53

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) We returned:

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME)   10.1.22.53

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME)   fe80::a4ff:5e4c:bb0:ea3a

09/07/21 15:36:19 (fd:5) (pid:92) (D_SECURITY) IPVERIFY: for CM01.XXXXXXXX.com matched 10.1.22.53 to 10.1.22.53

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Querying collector <10.1.22.53:9618> (CM01.XXXXXXXX.com) with classad:

LocationQuery = "CM02.XXXXXXXX.com"

Projection = "CondorVersion CondorPlatform MyAddress AddressV1 Name Machine"

TargetType = "CredD"

LimitResults = 1

MyType = "Query"

Requirements = ((Name == "CM02.XXXXXXXX.com"))

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME)  --- End of Query ClassAd ---

09/07/21 15:36:19 (fd:5) (pid:92) (D_COMMAND) Daemon::startCommand(QUERY_ANY_ADS,...) making connection to <10.1.22.53:9618>

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Guess address string for host = <10.1.22.53:9618>, port = 0

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) it was sinful string. ip = 10.1.22.53, port = 9618

09/07/21 15:36:19 (fd:5) (pid:92) (D_NETWORK) get_port_range - not checking LOWPORT, HIGHPORT for outgoing connection on Windows.

09/07/21 15:36:19 (fd:5) (pid:92) (D_NETWORK) CONNECT bound to <10.1.22.53:60022> fd=128 peer=<10.1.22.53:9618>

09/07/21 15:36:19 (fd:5) (pid:92) (D_SECURITY) SECMAN: command 48 QUERY_ANY_ADS to collector at <10.1.22.53:9618> (CM01.XXXXXXXX.com) from TCP port 60022 (blocking).

09/07/21 15:36:19 (fd:5) (pid:92) (D_NETWORK) condor_write(fd=128 collector at <10.1.22.53:9618> (CM01.XXXXXXXX.com),,size=554,timeout=60,flags=0,non_blocking=0)

09/07/21 15:36:19 (fd:5) (pid:92) (D_NETWORK) condor_read(fd=128 collector at <10.1.22.53:9618> (CM01.XXXXXXXX.com),,size=5,timeout=60,flags=0,non_blocking=0)

09/07/21 15:36:19 (fd:5) (pid:92) (D_NETWORK) condor_read(fd=128 collector at <10.1.22.53:9618> (CM01.XXXXXXXX.com),,size=280,timeout=60,flags=0,non_blocking=0)

09/07/21 15:36:19 (fd:5) (pid:92) (D_NETWORK) condor_read(fd=128 collector at <10.1.22.53:9618> (CM01.XXXXXXXX.com),,size=5,timeout=60,flags=0,non_blocking=0)

09/07/21 15:36:19 (fd:5) (pid:92) (D_NETWORK) condor_read(fd=128 collector at <10.1.22.53:9618> (CM01.XXXXXXXX.com),,size=187,timeout=60,flags=0,non_blocking=0)

09/07/21 15:36:19 (fd:5) (pid:92) (D_SECURITY) SECMAN: added session CM01:3372:1631054179:21 to cache for 86400 seconds (3600s lease).

09/07/21 15:36:19 (fd:5) (pid:92) (D_SECURITY) SECMAN: startCommand succeeded.

09/07/21 15:36:19 (fd:5) (pid:92) (D_NETWORK) condor_write(fd=128 collector at <10.1.22.53:9618> (CM01.XXXXXXXX.com),,size=274,timeout=60,flags=0,non_blocking=0)

09/07/21 15:36:19 (fd:5) (pid:92) (D_NETWORK) condor_read(fd=128 collector at <10.1.22.53:9618> (CM01.XXXXXXXX.com),,size=5,timeout=60,flags=0,non_blocking=0)

09/07/21 15:36:19 (fd:5) (pid:92) (D_NETWORK) condor_read(fd=128 collector at <10.1.22.53:9618> (CM01.XXXXXXXX.com),,size=8,timeout=60,flags=0,non_blocking=0)

09/07/21 15:36:19 (fd:5) (pid:92) (D_NETWORK) CLOSE TCP <10.1.22.53:60022> fd=128

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Destroying Daemon object:

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Type: 5 (collector), Name: (null), Addr: <10.1.22.53:9618>

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) FullHost: CM01.XXXXXXXX.com, Host: CM01, Pool: (null), Port: 9618

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) IsLocal: N, IdStr: collector at <10.1.22.53:9618> (CM01.XXXXXXXX.com), Error: (null)

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME)  --- End of Daemon object info ---

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Destroying Daemon object:

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Type: 5 (collector), Name: CM01.XXXXXXXX.com, Addr: <10.1.22.53:9618>

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) FullHost: CM01.XXXXXXXX.com, Host: CM01, Pool: CM01.XXXXXXXX.com, Port: 9618

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) IsLocal: N, IdStr: (null), Error: (null)

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME)  --- End of Daemon object info ---

09/07/21 15:36:19 (fd:5) (pid:92) (D_ALWAYS) Can't find address for credd CM02.XXXXXXXX.com

09/07/21 15:36:19 (fd:5) (pid:92) (D_COMMAND) Daemon::startCommand(CREDD_GET_PASSWD,...) making connection to NULL

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Destroying Daemon object:

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) Type: 13 (credd), Name: CM02.XXXXXXXX.com, Addr: (null)

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) FullHost: CM02.XXXXXXXX.com, Host: (null), Pool: (null), Port: -1

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME) IsLocal: N, IdStr: (null), Error: Can't find address for credd CM02.XXXXXXXX.com

09/07/21 15:36:19 (fd:5) (pid:92) (D_HOSTNAME)  --- End of Daemon object info ---

09/07/21 15:36:19 (fd:5) (pid:92) (D_ALWAYS) ERROR: Could not locate valid credential for user 'lpalmer@XXXXXXXX'

 

Have I missed something in the configuration? Is it the CREDD_PORT you were referring to that I need to add?

 

Thanks,

 

Lachlan

 

From: Lachlan Palmer
Sent: Tuesday, September 7, 2021 8:07 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: RE: Condor Credd for multiple pools

 

Cheers TJ and Greg

 

Both of these options are great.

 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Hitchen, Greg (IM&T, Kensington WA)
Sent: Sunday, September 5, 2021 5:21 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Condor Credd for multiple pools

 

Hi Lachlan

 

We have 9 separate pools of windows execute nodes, each with linux central managers.

 

We have all our submit nodes in one of those pools. We also have a standalone windows credd machine in that same pool.

 

So that all Windows execute nodes in all the pools can also see the credd machine, its configuration for CONDOR_HOST points to multiple pools:

 

CONDOR_HOST = pool1.xxx.xxx, pool2.xxx.xxx, pool3.xxx.xxx, etc.

 

That way the central managers in every pool will know about the credd machine.

 

I will point out that we have not gone production on this yet, but we have tested it and everything seems to work OK.

 

Cheers

 

Greg

 

 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of John M Knoeller
Sent: Saturday, 4 September 2021 6:35 AM
To: 'HTCondor-Users Mail List' <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Condor Credd for multiple pools

 

There is no particular need to have the condor_credd running on the same machine as the condor_collector.   

The central manager does not need to know about the Credd at all.  The Schedd and the execute nodes need to know how to locate it, but the central manager does not.

 

If you have only a single Schedd, you might consider running a single condor_credd on that machine.  Otherwise you can run the condor_credd on any machine you choose.

 

If you have a domain controller or active directory, you might consider running the condor_credd on that machine. 

 

You just need to set the CREDD_HOST configuration variable on all of the Schedd and Execute nodes to point to the machine  where the condor_credd is running.  If you use a dedicated CREDD_PORT, make sure to include that in the value of the CREDD_HOST 
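
A minimal sketch of that setup, assuming a hypothetical CREDD machine named credd.example.com and an example dedicated port of 9619 (neither value comes from your configuration):

   # On the machine chosen to run the condor_credd
   DAEMON_LIST = $(DAEMON_LIST) CREDD
   CREDD_PORT = 9619

   # On every Schedd and execute node, in every pool
   CREDD_HOST = credd.example.com:9619

Without a dedicated CREDD_PORT, the value of CREDD_HOST would be just the host name.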

 

-tj

 

 


From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Lachlan Palmer <LPalmer@xxxxxxxxxxxx>
Sent: Friday, September 3, 2021 11:59 AM
To: 'HTCondor-Users Mail List' <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Condor Credd for multiple pools

 

Hi All,

 

I am running into issues with running jobs in different pools. We have three pools of Windows machines, each with its own central manager running a condor_credd daemon. Everything works fine when submitting jobs within the submit node's own pool, but when jobs are sent to another pool the match fails: the job’s LocalCredd points to the submit node’s central manager, while the LocalCredd of the execute nodes in the other pool is that pool’s central manager.

 

What is the recommended configuration in this case? Should we just pick one of the pools' central managers to be the sole condor_credd? What is the appropriate way to configure the other central managers to point to this credd? Is it simply the same as for the submit and execute config files?

 

For more information, here are our condor_credd config lines for a central manager (CM01):

## CREDD logging settings

## Customize these if you wish.

CREDD_LOG = $(LOG)/CreddLog

CREDD_DEBUG = D_COMMAND

MAX_CREDD_LOG = 50000000

 

# Timeout session quickly since we normally only get contacted

# once per starter

SEC_CREDD_SESSION_TIMEOUT = 10

 

# Set security settings so that full security to the credd is required

CREDD.SEC_DEFAULT_AUTHENTICATION = REQUIRED

CREDD.SEC_DEFAULT_ENCRYPTION = REQUIRED

CREDD.SEC_DEFAULT_INTEGRITY = REQUIRED

CREDD.SEC_DEFAULT_NEGOTIATION = REQUIRED

 

# Require PASSWORD auth for password fetching

CREDD.SEC_DAEMON_AUTHENTICATION_METHODS = PASSWORD

 

# Only honor password fetch requests to the trusted "condor_pool" user

CREDD.ALLOW_DAEMON = condor_pool@$(UID_DOMAIN)

 

# Require NTSSPI for storing credentials

CREDD.SEC_DEFAULT_AUTHENTICATION_METHODS = NTSSPI

 

CREDD_HOST = $(CONDOR_HOST)

CREDD_CACHE_LOCALLY = True

 

And for the execute and submit config:

CREDD_HOST = CM01.XXXXXXX.com

CREDD_CACHE_LOCALLY = True

STARTER_ALLOW_RUNAS_OWNER = True

 

Thanks,

 

Lachlan

This communication (both the message and any attachments or links) is confidential and only intended for the use of the person or persons to whom it is addressed unless we have expressly authorized otherwise. It also may contain information that is protected by solicitor-client privilege. If you are reading this communication and are not an addressee or authorized representative of an addressee, we hereby notify you that any distribution, copying or other use of it without our express authorization is strictly prohibited. If you have received this communication in error, please delete both the message and any attachments from your system and notify us immediately by e-mail or phone. In addition, we note that this communication and its transmission of data have not been secured by encryption. Therefore, we are not able to confirm or guarantee that the communication has not been intercepted, amended, or read by an unintended third party.
