
Re: [HTCondor-users] Startd fails on fresh install due to missing execute directory



Hello,

Thanks for your reply. That is quite likely what happened, as /etc/passwd still contained the following line:

 

condor:x:115:119:HTCondor Daemons,,,:/var/lib/condor:/usr/sbin/nologin

 

After further testing:

 

$ apt-get -y remove --purge htcondor && apt-get -y autoremove --purge && rm -fr /etc/condor

-> deletes /var/lib/condor

-> does not delete existing condor user

 

Condor installation (get_htcondor or apt-get install)

-> will not recreate /var/lib/condor if the condor user is already defined in /etc/passwd
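For anyone wanting to check for the same situation before reinstalling, here is a minimal sketch. It assumes the account is named condor, as in the /etc/passwd line above; the helper names home_from_passwd_line and check_home are made up for illustration and are not part of HTCondor.

```shell
#!/bin/sh
# Hypothetical helpers, not part of HTCondor.

# Print the home directory (6th colon-separated field) of a passwd-format line.
home_from_passwd_line() {
    printf '%s\n' "$1" | cut -d: -f6
}

# Report whether the given account exists and whether its home directory is present.
check_home() {
    user=$1
    entry=$(getent passwd "$user") || { echo "no $user account found"; return 0; }
    home=$(home_from_passwd_line "$entry")
    if [ -d "$home" ]; then
        echo "$user home $home exists"
    else
        echo "$user home $home is MISSING"
    fi
}

check_home condor
```

On a machine in the state described above, this would be expected to report the condor home as missing after the purge, since the account survives but /var/lib/condor does not.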

 

 

From: Tim Theisen <tim@xxxxxxxxxxx>
Sent: Thursday, October 5, 2023 10:57 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>; Fabrice Bouye <FabriceB@xxxxxxx>
Subject: Re: [HTCondor-users] Startd fails on fresh install due to missing execute directory

 

We test installing HTCondor on Ubuntu and have not seen that problem.

The installation script creates the condor account with a home directory of /var/lib/condor.

Is it possible that the condor account already exists with a different home directory? That would be one possible explanation for the failure.

Let me know. I can add some defensive coding for installations where condor's default home directory has been changed.

...Tim

On 10/4/23 17:50, Fabrice Bouye wrote:

Hello,

I am in the process of trying to update an Ubuntu 20 8.8.x flock to 23.0.  

 

I am not sure if this is a new or a well-known issue, but there appears to be a problem with the HTCondor install for 23.0, and possibly 10.x and 10.0 too.

I haven’t tried 9.0, 9.x or 23.x. My 8.8.x installs are old, but I do not remember experiencing this issue when doing fresh 8.8.x installs.

As part of the testing, I tried upgrading directly from 8.8.x, but I also tried fresh installs using both the get_htcondor script and a manual install.

In both of the latter cases the previous install was stripped using the command line from the get_htcondor script: apt-get -y remove --purge htcondor && apt-get -y autoremove --purge && rm -fr /etc/condor (basically what get_htcondor suggests doing to remove older installs).

 

On Ubuntu 20, this issue seems to appear every time when installing with get_htcondor (minicondor or other roles) or manually with apt-get install htcondor or apt-get install minicondor.

When installing on a condor-free system, /etc/condor is recreated, but it seems that /var/lib/condor, the parent of the execute directory, is not recreated by the installation process.

This issue does not occur when updating from 8.8.x, of course, as the /var/lib/condor directory already exists.

 

This path is defined in the default /etc/condor/condor_config that is deployed by the installation:

LOCAL_DIR = /var

 

[…]

 

EXECUTE = $(LOCAL_DIR)/lib/condor/execute

 

This issue makes startd fail in a loop on execute machines when condor starts (see logs below), probably because the condor user cannot create a directory in /var/lib.
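To see which component of the path is actually absent (here /var/lib/condor rather than only the execute sub-directory), a small sketch like the following can help. The function name first_missing is made up for illustration; condor_config_val is the standard HTCondor tool for querying configuration values.

```shell
#!/bin/sh
# Hypothetical helper, not part of HTCondor.
# Walk up from the given path and print the topmost component that does not exist.
# Prints an empty line if the path already exists.
first_missing() {
    path=$1
    missing=
    while [ ! -e "$path" ]; do
        missing=$path
        path=$(dirname "$path")
    done
    printf '%s\n' "$missing"
}

# Usage on an execute machine (assumes condor_config_val is on the PATH):
#   first_missing "$(condor_config_val EXECUTE)"
```

If the topmost missing directory sits directly under a root-owned directory such as /var/lib, an unprivileged account would not normally be able to create it, which fits the explanation above.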

The fix is for root to create the missing directory /var/lib/condor. startd will then recreate the execute sub-directory, owned by the condor user, the next time condor is restarted.

 

$ cd /var/lib
$ mkdir condor

$ chmod 755 condor
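The same workaround can be written so that it is safe to run more than once, for example from a provisioning script. This is only a sketch: the function name ensure_dir is made up, and the 755 mode simply mirrors the commands above rather than anything the HTCondor packages are known to set.

```shell
#!/bin/sh
# Hypothetical helper, not part of HTCondor.
# Create the directory if it is absent and set its mode to 755.
# mkdir -p succeeds when the directory already exists, so reruns are harmless.
ensure_dir() {
    mkdir -p "$1" && chmod 755 "$1"
}

# Usage (as root), followed by a restart of condor:
#   ensure_dir /var/lib/condor
```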

 

In /var/log/condor/MasterLog:

10/05/23 10:09:11 Started DaemonCore process "/usr/sbin/condor_startd", pid and pgroup = 69386

10/05/23 10:09:11 Daemons::StartAllDaemons all daemons were started

10/05/23 10:09:14 The STARTD (pid 69386) exited with status 4

10/05/23 10:09:14 Sending obituary for "/usr/sbin/condor_startd"

10/05/23 10:09:14 restarting /usr/sbin/condor_startd in 10 seconds

[…] loops from here

 

In /var/log/condor/StartLog:

10/05/23 10:10:42 ******************************************************

10/05/23 10:10:42 ** condor_startd (CONDOR_STARTD) STARTING UP

10/05/23 10:10:42 ** /usr/sbin/condor_startd

10/05/23 10:10:42 ** SubsystemInfo: name=STARTD type=STARTD(6) class=DAEMON(1)

10/05/23 10:10:42 ** Configuration: subsystem:STARTD local:<NONE> class:DAEMON

10/05/23 10:10:42 ** $CondorVersion: 23.0.0 2023-09-29 BuildID: 678686 PackageID: 23.0.0-1 $

10/05/23 10:10:42 ** $CondorPlatform: X86_64-Ubuntu_20.04 $

10/05/23 10:10:42 ** PID = 70350

10/05/23 10:10:42 ** Log last touched 10/5 10:10:17

10/05/23 10:10:42 ******************************************************

10/05/23 10:10:42 Using config source: /etc/condor/condor_config

10/05/23 10:10:42 Using local config sources:

10/05/23 10:10:42    /etc/condor/config.d/00-htcondor-9.0.config

10/05/23 10:10:42    /etc/condor/condor_config.local

10/05/23 10:10:42 config Macros = 93, Sorted = 93, StringBytes = 2628, TablesBytes = 3404

10/05/23 10:10:42 CLASSAD_CACHING is ENABLED

10/05/23 10:10:42 Daemon Log is logging: D_ALWAYS D_ERROR D_STATUS

10/05/23 10:10:42 SharedPortEndpoint: waiting for connections to named socket startd_69340_0d0f

10/05/23 10:10:42 DaemonCore: command socket at <192.168.8.246:9618?addrs=192.168.8.246-9618&alias=suvofpcand20.corp.spc.int&noUDP&sock=startd_69340_0d0f>

10/05/23 10:10:42 DaemonCore: private command socket at <192.168.8.246:9618?addrs=192.168.8.246-9618&alias=suvofpcand20.corp.spc.int&noUDP&sock=startd_69340_0d0f>

10/05/23 10:10:45 VM universe will be tested to check if it is available

10/05/23 10:10:45 History file rotation is enabled.

10/05/23 10:10:45   Maximum history file size is: 20971520 bytes

10/05/23 10:10:45   Number of rotated history files is: 2

10/05/23 10:10:45 Startd will not enforce disk limits via logical volume management.

10/05/23 10:10:45 Failed to stat /var/lib/condor/execute: (errno 2) No such file or directory

10/05/23 10:10:45 ERROR "Error accessing execute directory /var/lib/condor/execute specified in the configuration setting SLOT1_EXECUTE: (errno=2) No such file or directory" at line 78 in file /var/lib/condor/execute/slot1/dir_3182108/userdir/build-I2xw6a/condor-23.0.0/src/condor_startd.V6/slot_builder.cpp



-- 
Tim Theisen (he, him, his)
Release Manager
HTCondor & Open Science Grid
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin - Madison
4261 Computer Sciences and Statistics
1210 W Dayton St
Madison, WI 53706-1685
+1 608 265 5736