
[HTCondor-users] Condor Schedd and Condor workers in docker containers on separate hosts



Hello,
  Another question!

  We run an environment where all services run in docker containers, including all the daemons for the CONDOR_HOST as well as the startd on a condor worker.
  When all the containers run on the same docker host, and we can use the docker DNS service to set the names of the containers to fixed values (instead of the docker-assigned IDs), everything works fine.
  When a condor worker on another host entirely connects to the services in the CONDOR_HOST container (through an exposed port in the docker config), the worker seems able to advertise its slots to the schedd, and we can see the slots. But when jobs are submitted and the negotiator matches them to the remote worker, they never get picked up and run.
  Has anyone tried this before? Is there a doc (or an old thread) that describes the network traffic that occurs for a startd to pick up jobs (or are they pushed by the negotiator?), and which hostnames need to line up? It looks like a name resolution issue: a mismatch between the IP addresses that the daemons see and advertise from within their own containers and the actual IP addresses on the LAN.
  I haven't spun up the packet sniffer yet to test my theory, but I've seen some threads where internal and external IP addresses in the StartdIpAddr and MyAddress have been an issue.
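
  A rough way to check this theory before reaching for the packet sniffer might be to take the StartdIpAddr and MyAddress values (for example as printed by `condor_status -af Name MyAddress StartdIpAddr`) and test whether the advertised IP is actually on the LAN. The sketch below is only an illustration under assumptions: it assumes the values look like `<ip:port?params>` with an IPv4 address, the function names are made up for this example, and the LAN range `10.0.0.0/8` and the sample addresses are placeholders, not values from our setup.

```python
import ipaddress
import re

# Assumed shape of an advertised address: <a.b.c.d:port> with optional ?params.
# IPv4 only; anything else is rejected rather than guessed at.
ADDR_RE = re.compile(r"^<(\d{1,3}(?:\.\d{1,3}){3}):(\d+)(?:\?.*)?>$")


def parse_advertised(addr_string):
    """Return (ip, port) parsed from an advertised address string."""
    match = ADDR_RE.match(addr_string.strip())
    if match is None:
        raise ValueError(f"unrecognized address format: {addr_string!r}")
    return ipaddress.ip_address(match.group(1)), int(match.group(2))


def reachable_on_lan(addr_string, lan_cidr):
    """True if the advertised IP falls inside the given LAN network."""
    ip, _port = parse_advertised(addr_string)
    return ip in ipaddress.ip_network(lan_cidr)


if __name__ == "__main__":
    lan = "10.0.0.0/8"  # placeholder for the real LAN range
    samples = [
        "<172.17.0.3:9618?noUDP>",  # placeholder: a container-internal address
        "<10.1.2.3:9618>",          # placeholder: an address on the LAN
    ]
    for sample in samples:
        print(sample, "on LAN:", reachable_on_lan(sample, lan))
```

  If the worker's advertised address falls outside the LAN range, that would be consistent with the daemons advertising container-internal addresses that the other host cannot reach.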

--
Steve Chan
KBase - Environ Genomics & Systems Biology
Lawrence Berkeley National Lab