
Re: [HTCondor-users] htcondor/execute container cannot connect to central manager



You say

 

> container-based execution does not spawn the required processes

 

Does that mean that the HTCondor daemons aren’t running at all? 

 

But then you post log snippets that seem to show that the daemons are in fact running. 

 

09/19/23 11:14:11 (D_ALWAYS) DaemonCore: command socket at <192.168.56.101:33407?addrs=192.168.56.101-33407&alias=wor>

The IP address 192.168.56.101 makes me a bit concerned, since it falls in one of the private IP ranges. Is your Central Manager also on the 192.168.56 subnet?
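
A quick way to check this is sketched below. It is only an illustration: the collector address 192.168.56.1 is taken from the "Will use TCP to update collector" line in the quoted StartLog, the /24 prefix length is an assumption (the real netmask is not shown anywhere in this thread), and `same_subnet` is a hypothetical helper name.

```python
import ipaddress


def same_subnet(addr_a: str, addr_b: str, prefix_len: int = 24) -> bool:
    """Return True if addr_b lies in the network that contains addr_a.

    prefix_len defaults to 24, which is an assumption; use the real netmask
    of the interface if it is known.
    """
    # strict=False lets us build the network from a host address.
    network = ipaddress.ip_network(f"{addr_a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(addr_b) in network


startd_addr = "192.168.56.101"   # command socket address from the StartLog
collector_addr = "192.168.56.1"  # collector address from the StartLog

print("startd address is private:", ipaddress.ip_address(startd_addr).is_private)
print("same subnet as collector:", same_subnet(startd_addr, collector_addr))
```

If both checks come back true, a private address by itself is not the problem, as long as the Central Manager really is reachable on that network from wherever the startd is running.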

 

You also show some messages from the hibernation daemon in red, but I don’t see anything that suggests a failure of any kind.

 

-tj

 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Salkhordehhaghighi, Reza
Sent: Friday, September 22, 2023 5:02 AM
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] htcondor/execute container cannot connect to central manager

 

Hi all,

 

I have been trying to use the htcondor/execute container to connect to a central manager with a minimal config. After many attempts, container-based execution does not spawn the required processes. Running the execute node as a normal service works, but giving the same config to the htcondor/execute container does not.


Here is the command I use. I gave the container exactly the same config as my working service. Using the examples in https://github.com/htcondor/htcondor/tree/master/build/docker/services also doesn't work. I have tried both the el8 and Ubuntu containers; neither works.

 

docker run --rm --network host --env-file=env --name condor -v /etc/condor:/etc/condor htcondor/execute

 

cat /etc/redhat-release
AlmaLinux release 9.2 (Turquoise Kodkod)

 

 

Here is the log file when using the container:

 

root@worker0:/var/log/condor# cat StartLog
09/19/23 10:07:25 (D_ALWAYS:2) Result of reading /etc/issue:  Ubuntu 20.04.4 LTS \n \l
 
09/19/23 10:07:25 (D_ALWAYS:2) Using IDs: 1 processors, 1 CPUs, 0 HTs
09/19/23 10:07:25 (D_ALWAYS:2) Reading condor configuration from '/etc/condor/condor_config'
09/19/23 10:07:25 (D_ALWAYS:2) Enumerating interfaces: lo 127.0.0.1 up
09/19/23 10:07:25 (D_ALWAYS:2) Enumerating interfaces: enp0s3 10.0.2.15 up
09/19/23 10:07:25 (D_ALWAYS:2) Enumerating interfaces: enp0s8 192.168.56.101 up
09/19/23 10:07:25 (D_ALWAYS:2) Enumerating interfaces: docker0 172.17.0.1 up
09/19/23 10:07:25 (D_ALWAYS:2) Enumerating interfaces: lo ::1 up
09/19/23 10:07:25 (D_ALWAYS:2) Enumerating interfaces: enp0s3 fe80::a00:27ff:fe5c:373e up
09/19/23 10:07:25 (D_ALWAYS:2) Enumerating interfaces: enp0s8 fe80::4f26:c:cb9d:5de4 up
09/19/23 10:07:25 (D_ALWAYS:2) Enumerating interfaces: docker0 fe80::42:56ff:fe11:aeef up
09/19/23 10:07:25 (D_ALWAYS) ******************************************************
09/19/23 10:07:25 (D_ALWAYS) ** condor_startd (CONDOR_STARTD) STARTING UP
09/19/23 10:07:25 (D_ALWAYS) ** /usr/sbin/condor_startd
09/19/23 10:07:25 (D_ALWAYS) ** SubsystemInfo: name=STARTD type=STARTD(6) class=DAEMON(1)
09/19/23 10:07:25 (D_ALWAYS) ** Configuration: subsystem:STARTD local:<NONE> class:DAEMON
09/19/23 10:07:25 (D_ALWAYS) ** $CondorVersion: 10.1.1 2022-11-10 BuildID: 612938 PackageID: 10.1.1-1.1 RC $
09/19/23 10:07:25 (D_ALWAYS) ** $CondorPlatform: X86_64-Ubuntu_20.04 $
09/19/23 10:07:25 (D_ALWAYS) ** PID = 1
09/19/23 10:07:25 (D_ALWAYS) ** Log last touched time unavailable (No such file or directory)
09/19/23 10:07:25 (D_ALWAYS) ******************************************************
09/19/23 10:07:25 (D_ALWAYS) Using config source: /etc/condor/condor_config
09/19/23 10:07:25 (D_ALWAYS) Using local config sources:
09/19/23 10:07:25 (D_ALWAYS)    /etc/condor/config.d/01-env.conf
09/19/23 10:07:25 (D_ALWAYS)    /etc/condor/config.d/02-execute.config
09/19/23 10:07:25 (D_ALWAYS)    /etc/condor/config.d/10-stash-plugin.conf
09/19/23 10:07:25 (D_ALWAYS)    /etc/condor/condor_config.local
09/19/23 10:07:25 (D_ALWAYS) config Macros = 71, Sorted = 71, StringBytes = 1912, TablesBytes = 2620
09/19/23 10:07:25 (D_ALWAYS) CLASSAD_CACHING is ENABLED
09/19/23 10:07:25 (D_ALWAYS) Daemon Log is logging: D_ALWAYS:2 D_ERROR D_STATUS
09/19/23 10:07:25 (D_ALWAYS:2) Not using shared port because USE_SHARED_PORT=false
09/19/23 10:07:25 (D_ALWAYS) Daemoncore: Listening at <0.0.0.0:44747> on TCP (ReliSock) and UDP (SafeSock).
09/19/23 10:07:25 (D_ALWAYS) DaemonCore: command socket at <192.168.56.101:44747?addrs=192.168.56.101-44747&alias=worker0>
09/19/23 10:07:25 (D_ALWAYS) DaemonCore: private command socket at <192.168.56.101:44747?addrs=192.168.56.101-44747&alias=worker0>
09/19/23 10:07:25 (D_ALWAYS:2) Setting maximum accepts per cycle 8.
09/19/23 10:07:25 (D_ALWAYS:2) Setting maximum UDP messages per cycle 100.
09/19/23 10:07:25 (D_ALWAYS:2) Will use TCP to update collector <192.168.56.1:9618>
09/19/23 10:07:25 (D_ALWAYS:2) Not using shared port because USE_SHARED_PORT=false
09/19/23 10:07:25 (D_ALWAYS:2) Memory: Detected 1024 megs RAM
09/19/23 10:07:25 (D_ALWAYS:2) Found interface enp0s8 that matches <192.168.56.101:0>
09/19/23 10:07:25 (D_ALWAYS:2) Found interface enp0s8 with ip 192.168.56.101
09/19/23 10:07:25 (D_ALWAYS:2) enp0s8 supports Wake-on: no (raw: 0x00)
09/19/23 10:07:25 (D_ALWAYS:2) enp0s8 enabled Wake-on: no (raw: 0x00)
09/19/23 10:07:25 (D_ALWAYS:2) Using network interface enp0s8 for hibernation

====================================================================================================

And here is the log file when using the standard service. The red lines are not written in the container log above, so I suspect something is stuck at this stage.

 

09/19/23 11:14:11 (D_ALWAYS:2) Result of reading /etc/issue:  \S

09/19/23 11:14:11 (D_ALWAYS:2) Result of reading /etc/redhat-release:  AlmaLinux release 9.2 (Turquoise Kodkod)

09/19/23 11:14:11 (D_ALWAYS:2) Using IDs: 1 processors, 1 CPUs, 0 HTs
09/19/23 11:14:11 (D_ALWAYS:2) Reading condor configuration from '/etc/condor/condor_config'
09/19/23 11:14:11 (D_ALWAYS:2) Enumerating interfaces: lo 127.0.0.1 up
09/19/23 11:14:11 (D_ALWAYS:2) Enumerating interfaces: enp0s3 10.0.2.15 up
09/19/23 11:14:11 (D_ALWAYS:2) Enumerating interfaces: enp0s8 192.168.56.101 up
09/19/23 11:14:11 (D_ALWAYS:2) Enumerating interfaces: docker0 172.17.0.1 up
09/19/23 11:14:11 (D_ALWAYS:2) Enumerating interfaces: lo ::1 up
09/19/23 11:14:11 (D_ALWAYS:2) Enumerating interfaces: enp0s3 fe80::a00:27ff:fe5c:373e up
09/19/23 11:14:11 (D_ALWAYS:2) Enumerating interfaces: enp0s8 fe80::4f26:c:cb9d:5de4 up
09/19/23 11:14:11 (D_ALWAYS:2) Enumerating interfaces: docker0 fe80::42:56ff:fe11:aeef up
09/19/23 11:14:11 (D_ALWAYS) ******************************************************
09/19/23 11:14:11 (D_ALWAYS) ** condor_startd (CONDOR_STARTD) STARTING UP
09/19/23 11:14:11 (D_ALWAYS) ** /usr/sbin/condor_startd
09/19/23 11:14:11 (D_ALWAYS) ** SubsystemInfo: name=STARTD type=STARTD(6) class=DAEMON(1)
09/19/23 11:14:11 (D_ALWAYS) ** Configuration: subsystem:STARTD local:<NONE> class:DAEMON
09/19/23 11:14:11 (D_ALWAYS) ** $CondorVersion: 10.7.0 2023-07-31 BuildID: 665155 PackageID: 10.7.0-1 $
09/19/23 11:14:11 (D_ALWAYS) ** $CondorPlatform: x86_64_AlmaLinux9 $
09/19/23 11:14:11 (D_ALWAYS) ** PID = 334599
09/19/23 11:14:11 (D_ALWAYS) ** Log last touched time unavailable (No such file or directory)
09/19/23 11:14:11 (D_ALWAYS) ******************************************************

09/19/23 11:14:11 (D_ALWAYS) Using config source: /etc/condor/condor_config
09/19/23 11:14:11 (D_ALWAYS) Using local config sources:
09/19/23 11:14:11 (D_ALWAYS)    /etc/condor/config.d/01-env.conf
09/19/23 11:14:11 (D_ALWAYS)    /etc/condor/config.d/02-execute.config
09/19/23 11:14:11 (D_ALWAYS)    /etc/condor/config.d/10-stash-plugin.conf
09/19/23 11:14:11 (D_ALWAYS)    /etc/condor/condor_config.local
09/19/23 11:14:11 (D_ALWAYS) config Macros = 73, Sorted = 73, StringBytes = 2019, TablesBytes = 2692
09/19/23 11:14:11 (D_ALWAYS) CLASSAD_CACHING is ENABLED
09/19/23 11:14:11 (D_ALWAYS) Daemon Log is logging: D_ALWAYS:2 D_ERROR D_STATUS
09/19/23 11:14:11 (D_ALWAYS:2) Internal pipe for signals resized to 4096 from 65536
09/19/23 11:14:11 (D_ALWAYS:2) Not using shared port because USE_SHARED_PORT=false
09/19/23 11:14:11 (D_ALWAYS) Daemoncore: Listening at <0.0.0.0:33407> on TCP (ReliSock) and UDP (SafeSock).
09/19/23 11:14:11 (D_ALWAYS) DaemonCore: command socket at <192.168.56.101:33407?addrs=192.168.56.101-33407&alias=wor>
09/19/23 11:14:11 (D_ALWAYS) DaemonCore: private command socket at <192.168.56.101:33407?addrs=192.168.56.101-33407&a>
09/19/23 11:14:11 (D_ALWAYS:2) Setting maximum accepts per cycle 8.
09/19/23 11:14:11 (D_ALWAYS:2) Setting maximum UDP messages per cycle 100.
09/19/23 11:14:11 (D_ALWAYS:2) Will use TCP to update collector <192.168.56.1:9618>
09/19/23 11:14:11 (D_ALWAYS:2) Not using shared port because USE_SHARED_PORT=false
09/19/23 11:14:11 (D_ALWAYS:2) Memory: Detected 1024 megs RAM
09/19/23 11:14:11 (D_ALWAYS:2) Found interface enp0s8 that matches <192.168.56.101:0>
09/19/23 11:14:11 (D_ALWAYS:2) Found interface enp0s8 with ip 192.168.56.101
09/19/23 11:14:11 (D_ALWAYS:2) enp0s8 supports Wake-on: yes (raw: 0x2e)
09/19/23 11:14:11 (D_ALWAYS:2) enp0s8 enabled Wake-on: no (raw: 0x00)
09/19/23 11:14:11 (D_ALWAYS:2) Using network interface enp0s8 for hibernation
09/19/23 11:14:11 (D_ALWAYS:2) Initially invoking hibernation plugin '/usr/libexec/condor/condor_power_state ad'
09/19/23 11:14:11 (D_ALWAYS:2) Detected hibernation states: S3,S4,S5
09/19/23 11:14:18 (D_ALWAYS) VM universe will be tested to check if it is available
09/19/23 11:14:18 (D_ALWAYS) History file rotation is enabled.
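
Since colour highlighting does not survive in a plain-text archive, the comparison between the two logs can also be made explicit with a small script. This is a rough sketch, not an HTCondor tool: `messages` and `missing_from` are hypothetical helper names, and the regular expression only assumes the "MM/DD/YY HH:MM:SS (D_CATEGORY) " prefix seen in the lines above. The sample input is the tail of each log as posted.

```python
import re

# Timestamp and debug-category prefix as it appears in the StartLog lines.
PREFIX = re.compile(r"^\d{1,2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} \(D_[A-Z_]+(?::\d+)?\) ")


def messages(log_text):
    """Return the message part of every line that carries the log prefix."""
    result = []
    for line in log_text.splitlines():
        match = PREFIX.match(line)
        if match:
            result.append(line[match.end():].rstrip())
    return result


def missing_from(reference_log, other_log):
    """Messages that occur in reference_log but never in other_log."""
    seen = set(messages(other_log))
    return [msg for msg in messages(reference_log) if msg not in seen]


service_tail = """\
09/19/23 11:14:11 (D_ALWAYS:2) Using network interface enp0s8 for hibernation
09/19/23 11:14:11 (D_ALWAYS:2) Initially invoking hibernation plugin '/usr/libexec/condor/condor_power_state ad'
09/19/23 11:14:11 (D_ALWAYS:2) Detected hibernation states: S3,S4,S5
09/19/23 11:14:18 (D_ALWAYS) VM universe will be tested to check if it is available
09/19/23 11:14:18 (D_ALWAYS) History file rotation is enabled.
"""

container_tail = """\
09/19/23 10:07:25 (D_ALWAYS:2) Using network interface enp0s8 for hibernation
"""

for msg in missing_from(service_tail, container_tail):
    print(msg)
```

Run against the full logs, this would also report lines that differ only in details such as port numbers or version strings, so the output still needs to be read by eye.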

Kind regards,

Reza