Re: [HTCondor-users] startd name



Hi Todd,

On Fri, Apr 3, 2020 at 11:30 AM Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:
On 4/3/2020 10:36 AM, David Schultz wrote:
> Hi all,
>
> We've started running glideins inside Singularity, and have noticed that, because they always have the same PID, multiple
> startds on a single host will have the same name. To work around this, we've tried setting the STARTD_NAME classad, but
> it seems not to have an effect. Is this classad broken?
>

Hi David,

Not sure why you mention the PID of the startd above, as the default used by the startd to name the machine (slot)
classads is the fully qualified host name of the server; it has nothing to do with the PID.

The collector normally sees startds registered as <slot>@<pid>@hostname, for example:
slot1@10563@cobalt01.icecube.wisc.edu

I'm not sure where that comes from, but it's nothing I did. Maybe because we're starting them as a user, instead of as root?

But yes, if you are
running multiple startds on the same server, a condor_config file (or environment variable) will need to specify an
alternative value for STARTD_NAME. If your singularity container is also starting a condor_master, you will also want
to customize MASTER_NAME.

In addition, each instance of HTCondor running on the same server will need its own LOCAL_DIR path. The LOCAL_DIR,
specified in the default condor_config that ships with HTCondor, is used to create the file path for the LOG, EXECUTE,
SPOOL, and LOCK subdirectories, and these subdirectories cannot be shared across multiple instances of HTCondor running
on the same server. Some of the files HTCondor will create in these subdirectories are indeed named based on the PID, so
perhaps this is why you mentioned PID collisions above.
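
As a rough sketch of the settings described above, a per-instance configuration might look something like the following. The INSTANCE_ID macro, its value "glidein_a", and the /scratch/condor path are hypothetical placeholders chosen for illustration, not values from this thread; only STARTD_NAME, MASTER_NAME, LOCAL_DIR, LOG, EXECUTE, SPOOL, and LOCK are the knobs named above.

```
# Hypothetical per-instance label; each HTCondor instance on the
# host would need a different value here.
INSTANCE_ID = glidein_a

# Give the startd (and the master, if one runs in the container)
# a name that is unique on this host.
STARTD_NAME = $(INSTANCE_ID)
MASTER_NAME = $(INSTANCE_ID)

# Each instance gets its own LOCAL_DIR (placeholder path), so the
# subdirectories derived from it are never shared.
LOCAL_DIR = /scratch/condor/$(INSTANCE_ID)
LOG       = $(LOCAL_DIR)/log
EXECUTE   = $(LOCAL_DIR)/execute
SPOOL     = $(LOCAL_DIR)/spool
LOCK      = $(LOCAL_DIR)/lock
```

The same values can be supplied through the environment instead of a file, since HTCondor reads variables prefixed with _CONDOR_ as configuration overrides (for example, _CONDOR_STARTD_NAME=glidein_a).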

No, that shouldn't be an issue. They always get their own directories to start in.

Also, you may find it useful to check out (and maybe contribute?) our work to package an HTCondor execute node into a
container. Take a look at
  https://hub.docker.com/r/htcondor/execute

That does look interesting. Do you know if it will run in singularity as well as docker?

Finally, be aware that an HTCondor v8.8.x+ startd running on the server OS (i.e. not inside a container) has the ability
to launch every job inside of a Singularity container. There are certainly reasons why you may want the startd inside
of a container as well, but if your primary goal is to place each job into its own container, running the HTCondor
service outside of any container and using the Singularity support built into newer releases of HTCondor may be the
superior solution (for one thing, it will ensure each job is isolated in its own container...).
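
A minimal sketch of what that built-in Singularity support could look like in the startd's configuration, assuming the knob names documented in the HTCondor 8.8 manual (SINGULARITY, SINGULARITY_JOB, SINGULARITY_IMAGE_EXPR). Both paths are hypothetical placeholders, and the exact knobs should be checked against the manual for the version in use.

```
# Location of the singularity binary on the execute host
# (placeholder path; adjust for the local install).
SINGULARITY = /usr/bin/singularity

# Run every job inside a Singularity container.
SINGULARITY_JOB = true

# Image to use for each job (placeholder path). This is a ClassAd
# expression, so the string literal needs its quotes.
SINGULARITY_IMAGE_EXPR = "/path/to/image.sif"
```

Because SINGULARITY_JOB and SINGULARITY_IMAGE_EXPR are expressions, they could in principle depend on job attributes rather than being constants; the constant form above is only the simplest case.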

The main reason to run HTCondor itself inside a container is that the underlying OS is strange, in that it is not a normal RHEL- or Ubuntu-based distro. Some sites think building their own distro is a great thing to do; we disagree.

David

Hope the above helps,
Todd


> We're currently using condor version 8.6.1. While this is an older version, I don't see any obvious changes around this
> in newer versions.
>
> Thanks for any insight you can provide.
>
> David Schultz


--
Todd Tannenbaum <tannenba@xxxxxxxxxxx>   University of Wisconsin-Madison
Center for High Throughput Computing     Department of Computer Sciences
HTCondor Technical Lead                  1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                    Madison, WI 53706-1685