
Re: [HTCondor-users] Condor Schedd and Condor workers in docker containers on separate hosts



Yeah, that would be interesting to find out. I suspect it can be done without net=host but haven't actually tried it yet. Using shared ports helps in other parts of the system, so it might help the compute nodes too. If you do get it working, please let me know. :)

I have not tried the bridge driver or Rancher. I've had good luck with flannel though, which I think Rancher supports? So if you get stuck on the bridge driver, flannel might be another option to try.

Thanks,
Kevin

From: HTCondor-users [htcondor-users-bounces@xxxxxxxxxxx] on behalf of Stephen Chan [sychan@xxxxxxx]
Sent: Thursday, April 12, 2018 9:58 AM
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] Condor Schedd and Condor workers in docker containers on separate hosts

Hi Kevin,
   We're running things on-premises using docker-compose and Rancher, but I'm trying to figure out whether we really need "net=host" to get things working. That isn't a problem in the production environment, but in our testing/CI environments we often run things using the bridge network driver.
   So far it looks like the host network driver is the only way to get this going.

On Wed, Apr 11, 2018 at 5:37 PM, Fox, Kevin M <Kevin.Fox@xxxxxxxx> wrote:
I'm running an HTCondor instance successfully in a Kubernetes-managed cluster. It has been in production for over a year. :)

Some notes that may or may not help in implementing your own:
I'm using k8s service discovery for the negotiator, with COLLECTOR_HOST = $(CONDOR_HOST)?sock=collector and SCHEDD_NAME = schedd@. I plumbed job draining into the k8s lifecycle hooks so I could do a safe rolling upgrade of all compute services. It works very well. I did use net=host for the computes, though, since that matches our particular case very well and meant I didn't need to watch node names so closely. I think you could probably use something like the k8s downward API and the CONDOR_HOST variable to plumb that info through instead and avoid net=host. I started the whole thing on k8s 1.3, though, so it didn't have some of the features it does now.
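
As a rough illustration of plumbing that information through, here is a minimal, untested sketch of a container entrypoint helper that renders a config fragment from environment variables. The COLLECTOR_HOST line is the one quoted above. Everything else is an assumption and not something described in this thread: the NODE_IP variable name is hypothetical (it would have to be populated by the deployment, for example from the k8s downward API field status.hostIP), and using TCP_FORWARDING_HOST to advertise the node's address is only one possible approach. With a bridge network, the daemon's port would presumably also need to be published on the host.

```python
def render_condor_config(env):
    """Render an HTCondor config fragment from a mapping of environment variables.

    CONDOR_HOST is required. NODE_IP is a hypothetical variable name that the
    deployment is assumed to set to the address other hosts can reach.
    """
    lines = [
        f"CONDOR_HOST = {env['CONDOR_HOST']}",
        # Taken from the message above: reach the collector via the shared port.
        "COLLECTOR_HOST = $(CONDOR_HOST)?sock=collector",
        "USE_SHARED_PORT = True",
    ]
    node_ip = env.get("NODE_IP")
    if node_ip:
        # Assumption: advertise the node's address instead of the container's.
        lines.append(f"TCP_FORWARDING_HOST = {node_ip}")
    return "\n".join(lines) + "\n"


# Example with made-up values; in a container one would pass os.environ.
example = render_condor_config({"CONDOR_HOST": "condor-central", "NODE_IP": "10.0.0.15"})
print(example, end="")
```

The output could be written to a file in the daemon's local config directory before the HTCondor daemons start. When NODE_IP is not set, the fragment simply omits the forwarding line.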

Thanks,
Kevin

From: HTCondor-users [htcondor-users-bounces@cs.wisc.edu] on behalf of Stephen Chan [sychan@xxxxxxx]
Sent: Wednesday, April 11, 2018 5:23 PM
To: HTCondor-Users Mail List
Subject: [HTCondor-users] Condor Schedd and Condor workers in docker containers on separate hosts

Hello,
   Another question!

   We run an environment where all services run in docker containers, including all the daemons for the CONDOR_HOST as well as the startd on a condor worker.
   When all the containers run on the same docker host, and we can use the docker DNS service to set the name of the containers to fixed values (instead of the docker assigned IDs), everything works fine.
    When a condor worker on another host entirely connects to the services on the CONDOR_HOST container (through an exposed port in the docker config), the worker seems able to advertise its slots to the schedd, and we can see the slots. But when jobs are submitted and the negotiator matches them to the remote worker, they never get picked up and run.
    Has anyone tried this before? Is there a doc (or an old thread) that describes the network traffic that occurs for a startd to pick up jobs (or is it pushed by the negotiator?), and what hostnames need to line up? It looks kind of like a name resolution issue between the IP addresses that the daemons see and advertise from within their own containers and the actual IP addresses on the LAN.
   I haven't spun up the packet sniffer yet to test my theory, but I've seen some threads where internal and external IP addresses in the StartdIpAddr and MyAddress attributes have been an issue.
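
One way to check this theory without a packet sniffer might be to compare the address a daemon advertises with the LAN's address range. The sketch below assumes the advertised value has the angle-bracket form <ip:port?params> and handles IPv4 only. The function names, sample addresses, and the LAN range are made up for illustration.

```python
import ipaddress
import re

# Host and port at the start of an address such as
# "<172.17.0.2:9618?sock=startd_1234>". IPv6 forms are not handled here.
_ADDR = re.compile(r"^<(?P<host>[^:>?]+):(?P<port>\d+)(?:\?[^>]*)?>$")


def advertised_endpoint(address):
    """Return (host, port) from an advertised address string."""
    match = _ADDR.match(address.strip())
    if match is None:
        raise ValueError(f"unrecognized address format: {address!r}")
    return match.group("host"), int(match.group("port"))


def addr_on_lan(address, lan_cidr):
    """True if the advertised IP lies inside the given LAN network.

    Raises ValueError if the advertised host is a name rather than an IP.
    """
    host, _port = advertised_endpoint(address)
    return ipaddress.ip_address(host) in ipaddress.ip_network(lan_cidr)


# Hypothetical values: a container-internal address versus a LAN address.
print(addr_on_lan("<172.17.0.2:9618?sock=startd_1234>", "10.0.0.0/24"))
print(addr_on_lan("<10.0.0.15:9618>", "10.0.0.0/24"))
```

If the StartdIpAddr or MyAddress value of the remote worker falls outside the LAN range, other hosts would be trying to connect to an address that only exists inside the container's network.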

--
Steve Chan
KBase - Environ Genomics & Systems Biology
Lawrence Berkeley National Lab




