Re: [HTCondor-users] startd name
- Date: Fri, 3 Apr 2020 11:30:03 -0500
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] startd name
On 4/3/2020 10:36 AM, David Schultz wrote:
We've started running glideins inside singularity, and have noticed that because they always have the same PID multiple
startds on a single host will have the same name. To work around this, we've tried setting the STARTD_NAME classad, but
it seems not to have an effect. Is this classad broken?
Not sure why you mention the PID of the startd above, as the default used by the startd to name the machine (slot)
classads is the fully qualified host name of the server.... it has nothing to do with the pid. But yes, if you are
running multiple startds on the same server, a condor_config file (or environment variable) will need to specify an
alternative value for STARTD_NAME. If your singularity container is also starting a condor_master, you will also want
to customize MASTER_NAME.
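As a rough sketch of the environment-variable route: HTCondor treats environment variables prefixed with _CONDOR_ as configuration overrides, so a wrapper script could set both names before starting the daemons. The helper glidein_instance_name and the GLIDEIN_ID variable are hypothetical names made up for this illustration; any value that is unique per container on a host would do.

```shell
#!/bin/sh
# Hypothetical helper: build a per-instance daemon name from a unique ID.
glidein_instance_name() {
    # $1: an ID unique per container on this host (assumed to come from
    # your glidein wrapper, e.g. a UUID or a batch job ID)
    printf 'glidein_%s' "$1"
}

# Placeholder ID for the sketch; replace with a real per-container value.
GLIDEIN_ID="${GLIDEIN_ID:-example01}"
INSTANCE_NAME="$(glidein_instance_name "$GLIDEIN_ID")"

# Environment variables prefixed with _CONDOR_ override the matching
# configuration knobs for daemons started from this environment.
export _CONDOR_STARTD_NAME="$INSTANCE_NAME"
export _CONDOR_MASTER_NAME="$INSTANCE_NAME"
```

These exports need to be in place before condor_master is launched inside the container. The same two settings could instead go in a condor_config file.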
In addition, each instance of HTCondor running on the same server will need its own LOCAL_DIR path. The LOCAL_DIR,
specified in the default condor_config that ships with HTCondor, is used to create the file path for the LOG, EXECUTE,
SPOOL, and LOCK subdirectories, and these subdirectories cannot be shared across multiple instances of HTCondor running
on the same server. The names of some of the files HTCondor creates in these subdirectories are indeed based on the PID,
so perhaps this is why you mentioned PID collisions above.
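One way to give each instance a private LOCAL_DIR, sketched under some assumptions: the helper make_local_dir, the GLIDEIN_ID variable, and the condor_local_ prefix are hypothetical, and the subdirectory names assume a stock layout where LOG, EXECUTE, SPOOL, and LOCK are defined relative to $(LOCAL_DIR). Check your own condor_config for the actual definitions.

```shell
#!/bin/sh
# Hypothetical helper: create a private LOCAL_DIR for one HTCondor instance.
make_local_dir() {
    # $1: base directory, $2: per-instance ID (both chosen for this sketch)
    dir="$1/condor_local_$2"
    # Subdirectory names assume a stock layout such as LOG = $(LOCAL_DIR)/log.
    mkdir -p "$dir/log" "$dir/execute" "$dir/spool" "$dir/lock" || return 1
    printf '%s' "$dir"
}

BASE_DIR="${TMPDIR:-/tmp}"
INSTANCE_ID="${GLIDEIN_ID:-example01}"

# _CONDOR_LOCAL_DIR overrides the LOCAL_DIR knob for daemons started
# from this environment.
_CONDOR_LOCAL_DIR="$(make_local_dir "$BASE_DIR" "$INSTANCE_ID")" || exit 1
export _CONDOR_LOCAL_DIR
```

The point is only that no two instances on the same host end up with the same LOCAL_DIR. How the ID is generated is up to you.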
Also, you may find it useful to check out (and maybe contribute?) our work to package an HTCondor execute node into a
container. Take a look at
Finally, be aware that an HTCondor v8.8.x+ startd running on the server OS (i.e. not inside a container) has the ability
to launch every job inside of a Singularity container. There are certainly reasons why you may want the startd inside
of a container as well, but if your primary goal is to place each job into its own container, running the HTCondor
service outside of any container and using the Singularity support built into newer releases of HTCondor may be the
superior solution (for one thing, it will ensure each job is isolated in its own container...).
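For illustration, a minimal sketch of what the startd-side configuration for that might look like, written out by a small script. SINGULARITY, SINGULARITY_JOB, and SINGULARITY_IMAGE_EXPR are the knob names as I recall them from the v8.8 manual; the output file name, the singularity binary location, and the image path are placeholders assumed for this example, so please check the manual for your version before relying on it.

```shell
#!/bin/sh
# Placeholder location; on a real host this fragment would normally be
# dropped into your local config directory instead.
CONFIG_FRAGMENT="${TMPDIR:-/tmp}/99-singularity-jobs.conf"

# Write a config fragment asking the startd to run every job in Singularity.
# The binary path and the image path below are assumptions for this sketch.
cat > "$CONFIG_FRAGMENT" <<'EOF'
SINGULARITY = /usr/bin/singularity
SINGULARITY_JOB = true
SINGULARITY_IMAGE_EXPR = "/path/to/image.sif"
EOF
```

SINGULARITY_IMAGE_EXPR is a ClassAd expression, which is why the image path is quoted as a string. It could instead reference a job attribute so that each job chooses its own image.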
Hope the above helps,
We're currently using condor version 8.6.1. While this is an older version, I don't see any obvious changes around this
in newer versions.
Thanks for any insight you can provide.
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing Department of Computer Sciences
HTCondor Technical Lead 1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132 Madison, WI 53706-1685