
Re: [HTCondor-users] startd name



On 4/3/2020 10:36 AM, David Schultz wrote:
Hi all,

We've started running glideins inside singularity, and have noticed that because they always have the same PID multiple startds on a single host will have the same name. To work around this, we've tried setting the STARTD_NAME classad, but it seems not to have an effect. Is this classad broken?


Hi David,

Not sure why you mention the PID of the startd above: by default the startd names the machine (slot) classads after the fully qualified host name of the server, which has nothing to do with the PID. But yes, if you are running multiple startds on the same server, a condor_config file (or environment variable) will need to specify an alternative value for STARTD_NAME. If your Singularity container is also starting a condor_master, you will also want to customize MASTER_NAME.

In addition, each instance of HTCondor running on the same server will need its own LOCAL_DIR path. The LOCAL_DIR, specified in the default condor_config that ships with HTCondor, is used to create the file paths for the LOG, EXECUTE, SPOOL, and LOCK subdirectories, and these subdirectories cannot be shared across multiple instances of HTCondor running on the same server. Some of the files HTCondor creates in these subdirectories are indeed named based on the PID, so perhaps this is why you mentioned PID collisions above.
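
To make the two points above concrete, here is a minimal sketch (not an official tool) of a small Python helper that renders a per-instance condor_config fragment. STARTD_NAME, MASTER_NAME, LOCAL_DIR, LOG, EXECUTE, SPOOL, and LOCK are the settings discussed above; the helper name, the "startd_"/"master_" naming scheme, the base directory, and the subdirectory names are assumptions for illustration only, so adjust them to your site. The sketch only produces text; it does not create the directories.

```python
from pathlib import PurePosixPath


def render_instance_config(instance_id: str,
                           base_dir: str = "/var/lib/condor-instances") -> str:
    """Return condor_config text for one HTCondor instance.

    Hypothetical helper: the naming scheme and base_dir are assumptions,
    not HTCondor defaults.
    """
    # Each instance gets its own LOCAL_DIR so nothing below it is shared.
    local_dir = PurePosixPath(base_dir) / instance_id
    lines = [
        # Unique daemon names so instances on one host do not collide.
        f"STARTD_NAME = startd_{instance_id}",
        f"MASTER_NAME = master_{instance_id}",
        f"LOCAL_DIR = {local_dir}",
        # Derive the per-instance subdirectories from LOCAL_DIR
        # (subdirectory names here are assumed, not verified defaults).
        "LOG = $(LOCAL_DIR)/log",
        "EXECUTE = $(LOCAL_DIR)/execute",
        "SPOOL = $(LOCAL_DIR)/spool",
        "LOCK = $(LOCAL_DIR)/lock",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print a fragment for a made-up instance id.
    print(render_instance_config("glidein01"))
```

The output could be written to a config file that each container reads at startup, using something unique per container (rather than the PID) as the instance id.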

Also, you may find it useful to check out (and maybe contribute?) our work to package an HTCondor execute node into a container. Take a look at
  https://hub.docker.com/r/htcondor/execute

Finally, be aware that an HTCondor v8.8.x+ startd running on the server OS (i.e., not inside a container) has the ability to launch every job inside of a Singularity container. There are certainly reasons why you may want the startd inside of a container as well, but if your primary goal is to place each job into its own container, running the HTCondor service outside of any container and using the Singularity support built into newer releases of HTCondor may be the superior solution (for one thing, it will ensure each job is isolated in its own container).

Hope the above helps,
Todd


We're currently using condor version 8.6.1. While this is an older version, I don't see any obvious changes around this in newer versions.

Thanks for any insight you can provide.

David Schultz




--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685