
Re: [HTCondor-users] credmon not starting in 23.5.0 on RHEL8



Hi Jason,

thanks a lot for your reply!

- The CREDD and CREDMON_OAUTH are not listed in the DAEMON_LIST.
The output of "condor_config_val DAEMON_LIST" only shows:
MASTER, SCHEDD, ADSTASH

In my configuration below [6], the daemons are listed in that way:
DAEMON_LIST = $(DAEMON_LIST), CREDD, CREDMON_OAUTH

- The CREDD shows up in the output of "condor_who -quick", but not the CREDMON_OAUTH:

ADSTASH = "WaitForStartup"
ADSTASH_PID = 0
CREDD = "Alive"
CREDD_Addr = "<192.108.45.8:9620>"
CREDD_PID = 2395446
IsReady = false

MASTER = "Alive"
MASTER_Addr = "<192.108.45.8:9618?addrs=192.108.45.8-9618+[2a00-139c-3-2e5-0-21-d2-6c]-9618&alias=c4p-login-dev.gridka.de&noUDP&sock=master_2395386_2283>"
MASTER_PID = 2395386
NumAlive = 4
NumDaemons = 5
NumDead = 0
NumHold = 0
NumHung = 0
NumStartup = 1
SCHEDD = "Alive"
SCHEDD_Addr = "<192.108.45.8:9618?addrs=192.108.45.8-9618+[2a00-139c-3-2e5-0-21-d2-6c]-9618&alias=c4p-login-dev.gridka.de&noUDP&sock=schedd_2395386_2283>"
SCHEDD_PID = 2395443
SHARED_PORT = "Alive"
SHARED_PORT_Addr = "<192.108.45.8:9618?noUDP&sock=self>"
SHARED_PORT_PID = 2395440

- In the MasterLog, there is only a repetition of this block related to the condor adstash wrapper:

02/15/24 16:48:54 (pid:2395386) (D_ALWAYS:2) Setting maximum accepts per cycle 8.
02/15/24 16:48:54 (pid:2395386) (D_ALWAYS:2) Setting maximum UDP messages per cycle 100.
02/15/24 16:48:54 (pid:2395386) (D_ALWAYS:2) Will use TCP to update collector c4p-htcondor.gridka.de <192.108.45.28:9618?alias=c4p-htcondor.gridka.de>
02/15/24 16:48:54 (pid:2395386) (D_ALWAYS) Adding SHARED_PORT to DAEMON_LIST, because USE_SHARED_PORT=true (to disable this, set AUTO_INCLUDE_SHARED_PORT_IN_DAEMON_LIST=False)
02/15/24 16:48:54 (pid:2395386) (D_ALWAYS) Adding CREDD to DAEMON_LIST. This machine is running a SCHEDD and AUTO_INCLUDE_CREDD_IN_DAEMON_LIST is TRUE)
02/15/24 16:48:54 (pid:2395386) (D_ALWAYS:2) enter Daemons::CheckForNewExecutable
02/15/24 16:48:54 (pid:2395386) (D_ALWAYS:2) Time stamp of running /usr/sbin/condor_master: 1708004804
02/15/24 16:48:54 (pid:2395386) (D_ALWAYS:2) GetTimeStamp returned: 1708004804
02/15/24 16:48:54 (pid:2395386) (D_ALWAYS) Reconfiguring all managed daemons.
02/15/24 16:48:54 (pid:2395386) (D_ALWAYS:2) Send_Signal(): Doing kill(2395446,1) [SIGHUP]
02/15/24 16:48:54 (pid:2395386) (D_ALWAYS) Sent SIGHUP to CREDD (pid 2395446)
02/15/24 16:48:54 (pid:2395386) (D_ALWAYS:2) Send_Signal(): Doing kill(2395443,1) [SIGHUP]
02/15/24 16:48:54 (pid:2395386) (D_ALWAYS) Sent SIGHUP to SCHEDD (pid 2395443)
02/15/24 16:48:54 (pid:2395386) (D_ALWAYS:2) Send_Signal(): Doing kill(2395440,1) [SIGHUP]
02/15/24 16:48:54 (pid:2395386) (D_ALWAYS) Sent SIGHUP to SHARED_PORT (pid 2395440)
02/15/24 16:48:54 (pid:2395386) (D_ALWAYS:2) enter Daemons::UpdateCollector
02/15/24 16:48:54 (pid:2395386) (D_ALWAYS:2) Trying to update collector <192.108.45.28:9618?alias=c4p-htcondor.gridka.de>
02/15/24 16:48:54 (pid:2395386) (D_ALWAYS:2) Attempting to send update via TCP to collector c4p-htcondor.gridka.de <192.108.45.28:9618?alias=c4p-htcondor.gridka.de>
02/15/24 16:48:54 (pid:2395386) (D_ALWAYS:2) File descriptor limits: max 32768, safe 26215
02/15/24 16:48:54 (pid:2395386) (D_ALWAYS:2) exit Daemons::UpdateCollector
02/15/24 16:49:16 (pid:2395386) (D_ALWAYS:2) ::RealStart; ADSTASH
02/15/24 16:49:16 (pid:2395386) (D_ALWAYS:2) start recover timer (415)
02/15/24 16:49:16 (pid:2395386) (D_ALWAYS) Started process "/opt/condor/py3venv/condor_adstash_wrapper.sh", pid and pgroup = 2398272
02/15/24 16:49:16 (pid:2395386) (D_ALWAYS:2) enter Daemons::UpdateCollector
02/15/24 16:49:16 (pid:2395386) (D_ALWAYS:2) Trying to update collector <192.108.45.28:9618?alias=c4p-htcondor.gridka.de>
02/15/24 16:49:16 (pid:2395386) (D_ALWAYS:2) Attempting to send update via TCP to collector c4p-htcondor.gridka.de <192.108.45.28:9618?alias=c4p-htcondor.gridka.de>
02/15/24 16:49:16 (pid:2395386) (D_ALWAYS:2) exit Daemons::UpdateCollector
02/15/24 16:49:16 (pid:2395386) (D_ALWAYS) PERMISSION DENIED to root@xxxxxxxxx from host 192.108.45.8 for command 60043 (DC_SET_READY), access level DAEMON: reason: DAEMON authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 192.108.45.8,c4p-login-dev.gridka.de, hostname size = 1, original ip address = 192.108.45.8
02/15/24 16:49:16 (pid:2395386) (D_ALWAYS) DC_AUTHENTICATE: Command not authorized, done!
02/15/24 16:49:16 (pid:2395386) (D_ERROR) The ADSTASH (pid 2398272) exited with status 1
02/15/24 16:49:16 (pid:2395386) (D_ALWAYS) restarting /opt/condor/py3venv/condor_adstash_wrapper.sh in 60 seconds

- In the CredLog, I have some information concerning the CREDMON:

02/15/24 16:56:45 (pid:2395446) (D_ALWAYS:2) CREDD: calling and resetting sweep_timer_handler()
02/15/24 16:56:45 (pid:2395446) (D_ALWAYS:2) CREDMON: scandir(/var/lib/condor/mytoken_credentials)
02/15/24 16:56:45 (pid:2395446) (D_ALWAYS:2) CREDMON: CRED_DIR: /var/lib/condor/mytoken_credentials, MARK: manuel_giffels.mark
02/15/24 16:56:45 (pid:2395446) (D_ALWAYS:2) CREDMON: File manuel_giffels.mark has mtime 1708012356 which is less than 3600 seconds old. Skipping...
02/15/24 16:56:45 (pid:2395446) (D_ALWAYS:2) CREDMON: CRED_DIR: /var/lib/condor/mytoken_credentials, MARK: condor.mark
02/15/24 16:56:45 (pid:2395446) (D_ALWAYS:2) CREDMON: File condor.mark has mtime 1708012356 which is less than 3600 seconds old. Skipping...

So my feedback is somewhat limited, sorry for that.

Thanks a lot again!

Cheers,
ben


On 15/02/2024 15:52, Jason Patton via HTCondor-users wrote:
Hi Ben,

A couple of diagnostics you can check...

Do you still see the CREDD and CREDMON_OAUTH listed if you run "condor_config_val DAEMON_LIST"?

Do the CREDD and CREDMON_OAUTH show up in the output of "condor_who -quick"? For example:

$ condor_who -quick
CREDD = "Alive"
CREDD_Addr = "<snipped>"
CREDD_PID = 799083
CREDMON_OAUTH = "Startup"
CREDMON_OAUTH_PID = 799082
...

Are there any hints in the MasterLog (/var/log/condor/MasterLog) that the credmon is being started and/or its status?
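The first two checks above could be scripted roughly as follows. This is only a minimal sketch: the helper names (parse_condor_who_quick, daemon_states, run_checks) are hypothetical, and it assumes that "condor_who -quick" prints plain "Key = value" lines as in the example above and that DAEMON_LIST is a comma- or space-separated list.

```python
import re
import subprocess


def parse_condor_who_quick(text):
    """Parse 'Key = value' lines (as printed by condor_who -quick) into a dict.

    Lines without ' = ' (blank lines, '...') are ignored; surrounding
    double quotes are stripped from string values.
    """
    ad = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if not sep:
            continue
        ad[key.strip()] = value.strip().strip('"')
    return ad


def daemon_states(ad, daemons):
    """Return the reported state of each daemon, or 'missing' if absent."""
    return {name: ad.get(name, "missing") for name in daemons}


def run_checks(daemons=("CREDD", "CREDMON_OAUTH")):
    """Hypothetical helper: run both commands and summarize the result."""
    daemon_list = subprocess.run(
        ["condor_config_val", "DAEMON_LIST"],
        capture_output=True, text=True, check=True,
    ).stdout
    # Assumption: entries are separated by commas and/or whitespace.
    configured = {d.upper() for d in re.split(r"[,\s]+", daemon_list) if d}
    who = subprocess.run(
        ["condor_who", "-quick"],
        capture_output=True, text=True, check=True,
    ).stdout
    states = daemon_states(parse_condor_who_quick(who), daemons)
    return {name: (name in configured, states[name]) for name in daemons}
```

Calling run_checks() on the submit host would return, for each daemon, whether it appears in DAEMON_LIST and what state condor_who reports for it ("missing" if there is no entry at all).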

Jason

On Thu, Feb 15, 2024 at 3:40 AM Benoit Roland <benoit.roland@xxxxxxx> wrote:
Dear all,

I have compiled HTCondor version 23.5.0 using the x86_64_AlmaLinux8-23050000 container [1], adding to the existing code
some plugins to produce [2], monitor and refresh [3,4] Helmholtz AAI access tokens.

The credential monitor [4] is based on the abstract class [5].

While I can successfully run the executables /usr/sbin/condor_producer_mytoken and /usr/sbin/condor_credmon_mytoken standalone,
only the producer runs when I submit a condor test job (sleep 1800). It seems that the credmon does not start.

My configuration is given by [6].

The credmon used to run successfully before I migrated to 23.5.0.
I no longer have the details of the version I was using back then.

I also tried to run the OAUTH credmon, but here again, the credmon does not start when I submit a condor test job.

The main change with respect to my previous code is to make it compliant with the 23.5.0 update of [5].

Running my credmon standalone, I can see that these changes seem to have been applied successfully: the credmon runs fine and does its job.

Would you have any clue about what I am missing?

Thanks a lot in advance for your help!

Cheers,
ben

[1] https://github.com/benoitroland/C4P-HTCondor/blob/devel_rhel8/c4p-condor-utils/build-c4p-condor.sh
[2] https://github.com/benoitroland/C4P-HTCondor/blob/devel_rhel8/src/condor_credd/condor_credmon_oauth/condor_producer_mytoken
[3] https://github.com/benoitroland/C4P-HTCondor/blob/devel_rhel8/src/condor_credd/condor_credmon_oauth/condor_credmon_mytoken
[4] https://github.com/benoitroland/C4P-HTCondor/blob/devel_rhel8/src/condor_credd/condor_credmon_oauth/credmon/CredentialMonitors/MytokenCredmon.py
[5] https://github.com/benoitroland/C4P-HTCondor/blob/devel_rhel8/src/condor_credd/condor_credmon_oauth/credmon/CredentialMonitors/AbstractCredentialMonitor.py
[6] DAEMON_LIST = $(DAEMON_LIST), CREDD, CREDMON_OAUTH

use feature : OAUTH

SEC_PROCESS_SUBMIT_TOKENS = True
SendCredential = True

CREDD_HOST = $(FULL_HOSTNAME)

SEC_DEFAULT_ENCRYPTION = REQUIRED

OAUTH_ISSUER_URL = https://login.helmholtz.de/oauth2/
OAUTH_ISSUER_NAME = helmholtz

MYTOKEN_ISSUER_URL = https://mytoken.data.kit.edu
MYTOKEN_PROFILE = kit/c4p-htcondor

CREDMON_OAUTH = /usr/sbin/condor_credmon_mytoken
CREDMON_OAUTH_DEBUG = D_FULLDEBUG:2

SEC_CREDENTIAL_DIRECTORY_OAUTH = /var/lib/condor/mytoken_credentials
SEC_ENCRYPTION_KEY_DIRECTORY = /etc/condor/encryption.d/ENCRYPTION-KEY

# interval at which the credd checks the remaining lifetime of stored credentials
CRED_CHECK_INTERVAL = 60

# period at which the collector is updated - default value 5 minutes
CREDD_UPDATE_INTERVAL = 60


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
