
Re: [HTCondor-users] condor and docker - advise for a newbie

On Mon, Dec 11, 2017 at 12:40:56PM -0500, Larry Martell wrote:
> Just getting started with condor and I am looking for some guidance.
> Currently I have 2 docker containers that are linked to each other.
> One has a crontab that runs many jobs throughout the day. Many of
> these jobs are multithreaded and/or fork off other processes. The jobs
> require services from both its own and the other container. My goal is
> to use HTCondor to distribute these jobs, threads, and forked
> processes across multiple machines. From reading the docs I think I
> need the docker universe for this. Is that correct? But how can I have
> condor start up both containers? Is it possible to already have the
> containers running on the remote hosts and have condor invoke the jobs
> inside them?


I believe that the docker universe is probably unsuitable for this use
case, but it should be possible to do what you want by way of vanilla
universe jobs -- with the caveat that HTCondor's resource tracking will
probably not work as you expect. It may also be possible to run HTCondor
startds within your existing containers as a way of scheduling jobs
into them.

First, re: the docker universe. By design, it does not expose every
potential feature of Docker; it's designed to be a way of specifying an
environment to run a job in, and a way to isolate that job from the
surrounding host, and not really more. Notably for your use case, it
does not (as far as I'm aware) support docker's links or networking
features, nor would it allow running jobs inside an already-running
container. Basically, it's a good way to specify that you want the job
to run on Debian with X, Y, and Z packages installed, but not to specify
connected network resources, other processes, etc.

On to the parts which might help solve your case:

* Use the vanilla universe, but sacrifice HTCondor's resource tracking:
  You can run a vanilla universe job and write a script that calls out
  to 'docker run', 'docker exec', etc., so long as the user the job will
  run as is allowed to run docker. If you wanted to have the job start
  up the prerequisite containers, it could do so in the script, or you
  could set up your nodes to have the containers already running and
  then use 'docker exec' to run things within the containers. However,
  only the actual 'docker run' or 'docker exec' process (and thus not
  the containers themselves or the processes being run within them) will
  fall within HTCondor's jurisdiction, due to how Docker works. There
  are some unorthodox ways to change this, which probably aren't
  advisable unless you're really attached to having HTCondor's resource
  tracking work as expected. (Specifically, if anyone needs to go down
  this road: 'docker run' accepts a cgroup parent, so with HTCondor's
  cgroup-based tracking you can determine the parent script's cgroup
  (the HTCondor-created one) and pass it as the parent of the docker
  container. However, you also need to pass down the resource
  constraints, probably slightly smaller than the slot's; if you don't,
  the wrapper script will get killed off but the container will
  persist, at least in the testing of this approach I've done.)
* Run a startd inside the container:
  Instead of using a script from outside the container to run things
  within the container, you could instead run HTCondor itself inside a
  container where the environment you want is available, and have your
  jobs be routed there. To do so, you'd need to construct an appropriate
  configuration file -- most likely, you would turn on the shared port
  daemon, expose its port to the outside world when running the docker
  container, and use TCP_FORWARDING_HOST to specify the surrounding
  host's IP as the appropriate place to connect to. If you're running
  more than these jobs in your HTCondor cluster, you'll probably want to
  advertise a custom attribute which identifies these special slots as
  inside the docker container, add that as a requirement on your job,
  and set up the START expression of these slots to refuse jobs which
  don't explicitly request them.
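
To make the second option a bit more concrete, here is a rough sketch of
what the configuration inside the container might look like. The host
names, the IP address, and the attribute names IsDockerSlot and
WantsDockerSlot are placeholders I made up for illustration; the knobs
themselves (USE_SHARED_PORT, SHARED_PORT_PORT, TCP_FORWARDING_HOST,
STARTD_ATTRS, START) are standard HTCondor configuration macros. I have
not tested this exact fragment, so treat it as a starting point only.
The container would also need the shared port published, for example
with 'docker run -p 9618:9618 ...'.

```
# condor_config.local inside the container (sketch, placeholder values)
DAEMON_LIST = MASTER, STARTD
CONDOR_HOST = cm.example.org

# Funnel all incoming connections through one port that docker publishes
USE_SHARED_PORT = True
SHARED_PORT_PORT = 9618

# Tell peers to connect to the surrounding host, not the container's IP
TCP_FORWARDING_HOST = 192.0.2.10

# Advertise a custom attribute and refuse jobs that do not ask for it
IsDockerSlot = True
STARTD_ATTRS = $(STARTD_ATTRS) IsDockerSlot
START = (TARGET.WantsDockerSlot =?= True)
```

The matching lines in the job's submit file would then be something
like:

```
+WantsDockerSlot = True
requirements = (TARGET.IsDockerSlot =?= True)
```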

Hopefully what I'm saying makes sense. The first option is most likely
easier to implement, and the second is arguably cleaner but more finicky
to set up.
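
For the first option, a minimal wrapper sketch in Python follows; a
vanilla universe job could use it as its executable. It assumes the
docker CLI is on the PATH and that the job's user is allowed to run it.
The function names are my own, and the cgroup path shown in the comments
is only an example of what an HTCondor-created cgroup might look like.
The 'docker run' variant passes --cgroup-parent, --cpus, and --memory as
described above; whether the parent path is accepted as-is depends on
which cgroup driver your Docker daemon uses, so I would test that part
carefully.

```python
#!/usr/bin/env python3
"""Sketch of a vanilla-universe wrapper that drives the docker CLI."""
import subprocess
import sys


def exec_argv(container, command):
    """argv that runs `command` inside an already-running container."""
    return ["docker", "exec", container] + list(command)


def own_cgroup(proc_text, controller="memory"):
    """Find our cgroup path in the text of /proc/self/cgroup.

    Lines look like 'ID:controller-list:path' (cgroup v1)
    or '0::path' (cgroup v2).
    """
    v2_path = None
    for line in proc_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        if controller in parts[1].split(","):
            return parts[2]          # exact v1 controller match wins
        if parts[1] == "":
            v2_path = parts[2]       # remember the unified (v2) entry
    return v2_path


def run_argv(image, command, cgroup_parent, cpus, memory_mb):
    """argv that starts a container under the given parent cgroup.

    cpus and memory_mb should be somewhat below the slot's limits.
    """
    return [
        "docker", "run", "--rm",
        "--cgroup-parent", cgroup_parent,
        "--cpus", str(cpus),
        "--memory", f"{memory_mb}m",
        image,
    ] + list(command)


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: wrapper.py CONTAINER CMD [ARGS...]
    # Only the 'docker exec' client process is tracked by HTCondor here.
    sys.exit(subprocess.call(exec_argv(sys.argv[1], sys.argv[2:])))
```

To try the cgroup-parent route instead, the wrapper would read
/proc/self/cgroup, pass the text to own_cgroup(), and hand the result to
run_argv() together with limits a little under the slot's.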

> Thanks!