
Re: [HTCondor-users] Several HTCondor startd per node (pilot jobs)


> On 20 Jan 2021, at 21:18, Todd L Miller <tlmiller@xxxxxxxxxxx> wrote:
>> What is the advised configuration to allow for several condor_startd daemons on each execution node (e.g. several pilot jobs per node)?
>> I assume this is done by using some random port between the condor_startd and the collector, but I could not figure out how to do this.
> 	Not sure what you're asking about here.  The root HTCondor configuration doesn't change if it's running pilots, and jobs can't use the root HTCondor's shared port daemon.  What's the specific problem you're seeing?

Sorry if I was not clear. I was talking about pilot jobs that run HTCondor daemons in order to join an external HTCondor pool managed by a VO.
These pilot jobs run on different clusters, which may be managed by a batch system other than HTCondor.

> 	For most firewalls, you just have the pilot use its usual collector for CCB and that's all you need to do.

When several pilots run on the same node, their shared port daemons can't all listen on the same port:

Sock::bind failed: errno = 98 Address already in use
Failed to listen(34339) on TCP/IPv4 command socket. Does this computer have IPv4 support?
Warning: Failed to create IPv4 command socket for ports 34339/34339no UDP.

I disabled the shared port daemon and it works. In any case, I think this daemon is no longer needed, since everything goes through the CCB. Is that correct?
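
For reference, a minimal sketch of the kind of pilot-side configuration described above. The collector host name is a placeholder and the daemon list is an assumption about what a pilot runs; only the shared port setting reflects the change mentioned in this message.

```
# Hypothetical pilot-local configuration (sketch only, not a tested setup).
# collector.example.org is a placeholder for the VO pool's collector.
COLLECTOR_HOST = collector.example.org

# Use the pool collector as the CCB server, as suggested in the quoted reply.
CCB_ADDRESS = $(COLLECTOR_HOST)

# Disable the shared port daemon so that several pilots on one node
# do not compete for the same listening port.
USE_SHARED_PORT = False

# Assumed daemon list for a pilot: a master and a startd only.
DAEMON_LIST = MASTER, STARTD
```

An alternative that may be worth trying is to keep the shared port daemon but set SHARED_PORT_PORT = 0, which, as far as I understand the manual, makes it pick a port in the LOWPORT to HIGHPORT range instead of a fixed one. I have not tested this with pilots.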


> - ToddM
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/
