
Re: [HTCondor-users] HTCondor and Docker



I've done something related, which is to run Condor in Docker using Kubernetes on GCE.

http://jimwhite.github.io/blog/2014/11/12/condor-in-a-container/

That is just a proof of concept to work out the mechanics; further development would be needed to make it generally useful, but I don't plan to continue with it since I'm at Google now. Of course, native Docker support in Condor is the way to go.

Jim

On Tue, Apr 7, 2015 at 8:02 AM, Brian Candler <b.candler@xxxxxxxxx> wrote:
I'd like to know the current state of HTCondor's Docker support. I see some notes at the end of
https://indico.cern.ch/event/320819/session/3/contribution/56/material/slides/0.pptx
but as far as I can tell these may just be a "wish list".

There are three different things I'm thinking of.

(1) Running a HTCondor worker node as a Docker container.

This should be straightforward. All the jobs would run within the same container and would therefore be subject to an enforced limit on their total resource usage.

This would be a quick way to add HTCondor execution capability to an existing Docker-aware server, just by
"docker run -d htcondor-worker"
or somesuch.
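As a rough sketch of that one-liner, assuming a hypothetical image called `htcondor-worker` (no such published image is implied), here is a helper that only assembles the `docker run` argument list. `--memory` and `--cpus` are standard Docker flags that cap the container as a whole, which is what gives all jobs inside it a shared resource limit. The values shown are illustrative.

```python
from typing import List, Optional


def worker_run_argv(image: str = "htcondor-worker",  # hypothetical image name
                    name: str = "condor-worker",
                    memory: Optional[str] = None,
                    cpus: Optional[str] = None) -> List[str]:
    """Build the argv for starting a detached HTCondor worker container.

    Every job the worker runs shares this one container, so any
    --memory / --cpus cap applies to all of those jobs together.
    """
    argv = ["docker", "run", "-d", "--name", name]
    if memory is not None:
        argv += ["--memory", memory]  # e.g. "8g": hard cap for all jobs combined
    if cpus is not None:
        argv += ["--cpus", cpus]      # e.g. "4": CPU quota for all jobs combined
    argv.append(image)
    return argv


if __name__ == "__main__":
    # Only prints the command; pass the list to subprocess.run() to execute it.
    print(" ".join(worker_run_argv(memory="8g", cpus="4")))
```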

(2) A "docker universe" where each job instance launches within a new Docker container, from a chosen container template.

When the job starts, a container is created, and when the job terminates the container is destroyed (except perhaps on failure, in which case we could keep it around for a post-mortem?)

condor_exec would need to fire off "docker run" (preferably via the docker API) and track it until the container terminated. Plumbing for stdin/stdout and file transfer would also be required. Hence maybe part of condor_exec itself should run within the container?
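The per-job lifecycle could look roughly like this: run a container named after the job in the foreground, wait for it to exit, and remove it only if the job succeeded, so that a failed container stays around for a post-mortem. This is a hypothetical sketch, not how HTCondor does anything. The container naming scheme is made up, it shells out to the docker CLI for brevity rather than using the Docker API, and stdin/stdout plumbing and file transfer are left out.

```python
import subprocess
from typing import Callable, List, Sequence


def job_container_name(cluster: int, proc: int) -> str:
    # Hypothetical naming scheme: one container per HTCondor job (cluster.proc).
    return f"condor-job-{cluster}-{proc}"


def run_argv(name: str, image: str, command: Sequence[str]) -> List[str]:
    # Foreground `docker run`: returns when the container exits. Its exit status
    # is normally the job command's (Docker reserves 125-127 for its own errors).
    return ["docker", "run", "--name", name, image, *command]


def cleanup_argv(name: str, exit_code: int, keep_on_failure: bool = True) -> List[str]:
    # Remove the container after success; keep it for post-mortem on failure.
    if exit_code != 0 and keep_on_failure:
        return []
    return ["docker", "rm", name]


def run_job(cluster: int, proc: int, image: str, command: Sequence[str],
            runner: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> int:
    """Run one job in its own container and return the job's exit status."""
    name = job_container_name(cluster, proc)
    exit_code = runner(run_argv(name, image, command)).returncode
    rm = cleanup_argv(name, exit_code)
    if rm:
        runner(rm)
    return exit_code
```

The `runner` parameter exists only so the flow can be exercised without a Docker daemon.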

Note: in principle it should be possible to combine (1) and (2)
https://blog.docker.com/2013/09/docker-can-now-run-within-docker/

(3) Docker containers on the submit host

A docker container would be a convenient abstraction to use on the submission host. Normally when you start a HTCondor DAG you need to create an empty working directory, run a script to create the DAG and/or SUB files, run condor_submit_dag, monitor progress to wait for completion, check the exit status to see if all DAG nodes completed successfully, fix/restart if necessary, then tidy up the work directory.

Docker on the submission host could handle this lifecycle: the container would be the work directory; it would run the scripts you want, submit the DAG, and remain visible as a running container until the DAG has completed; and the container's own exit status, shown by "docker ps", would indicate whether the DAG completed successfully or not.
https://docs.docker.com/reference/commandline/cli/#filtering_2
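With the filtering described at that link, finished DAGs could be listed by outcome. This sketch assumes each DAG container was started with a label whose key is `htcondor.dag`, which is a hypothetical convention. `-a` is needed because finished containers are no longer running, and the `exited` filter matches on the container's exit code.

```python
from typing import List, Optional


def list_dag_containers_argv(exit_code: Optional[int] = None,
                             label: str = "htcondor.dag") -> List[str]:
    """argv for listing DAG containers, optionally only those with a given exit code."""
    # -a includes stopped containers; without it only still-running DAGs appear.
    argv = ["docker", "ps", "-a", "--filter", f"label={label}"]
    if exit_code is not None:
        argv += ["--filter", f"exited={exit_code}"]
    return argv
```

`list_dag_containers_argv(0)` would list the DAGs that completed successfully. There is no single "non-zero" filter, so spotting failures means listing everything and reading the STATUS column.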

When you are finished with the results then you would destroy the container.

This one might be a bit tricky to implement, as I don't see any way to have condor_submit_dag or condor_submit run in the foreground. I think it would be necessary to run "condor_dagman -f" directly as the process within the container.
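Such an entrypoint might run DAGMan in the foreground as the container's main process and hand DAGMan's exit status back as the container's, which is what makes "docker ps" useful here. The `-f` flag is the one mentioned above. The `-Dag` argument, and whatever else condor_dagman normally receives from condor_submit_dag, are assumptions to check against the condor_dagman documentation, as is whether DAGMan behaves correctly at all outside the scheduler universe.

```python
import subprocess
from typing import Callable, List, Sequence


def dagman_argv(dag_file: str, extra: Sequence[str] = ()) -> List[str]:
    # "-f" keeps condor_dagman in the foreground. "-Dag" and any `extra`
    # arguments are assumptions; condor_submit_dag normally supplies more.
    return ["condor_dagman", "-f", "-Dag", dag_file, *extra]


def main(dag_file: str,
         runner: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> int:
    # Block until DAGMan exits and return its status for use as the container's.
    return runner(dagman_argv(dag_file)).returncode
```

As the container's ENTRYPOINT this would end with something like `sys.exit(main("workflow.dag"))`, where `workflow.dag` is a placeholder name.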

The container also needs to communicate with the condor schedd, and I'm not sure whether it needs access to parts of the host filesystem as well (e.g. condor_config). If necessary, /etc/condor/ can be bind-mounted as a volume within the container.
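Mounting the configuration is a standard `docker run -v host_path:container_path:ro` bind mount. This sketch assumes the host keeps its configuration in `/etc/condor`. Whether the configuration alone lets the container reach the schedd (networking, authentication) is still the open question above.

```python
from typing import List


def submit_container_argv(image: str, name: str,
                          condor_config_dir: str = "/etc/condor") -> List[str]:
    """argv for a DAG-submission container that sees the host's HTCondor config."""
    return ["docker", "run", "-d", "--name", name,
            # Read-only bind mount: the container can read condor_config
            # but cannot modify the host's copy.
            "-v", f"{condor_config_dir}:/etc/condor:ro",
            image]
```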

The user-provided scripts need to be available (e.g. by including them inside a custom docker image from "docker build") and they need to have parameters passed to them - this could be via environment variables (docker run -e). If the container is restarted, it should re-submit the existing DAG, not run the scripts to create a new DAG.
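The restart rule could be a check at container start: generate the DAG only if it is not already in the work directory, and otherwise just resubmit what is there. Because `docker restart` keeps the container's filesystem, files written on the first run are still present on the second. The file name `workflow.dag` and the `DAG_` prefix for parameters passed with `docker run -e` are hypothetical conventions.

```python
import os
from typing import Dict, List, Mapping


def dag_params(environ: Mapping[str, str], prefix: str = "DAG_") -> Dict[str, str]:
    # Collect parameters passed as `docker run -e DAG_NAME=value` (prefix is hypothetical).
    return {k[len(prefix):]: v for k, v in environ.items() if k.startswith(prefix)}


def startup_plan(workdir: str, dag_file: str = "workflow.dag") -> List[str]:
    """Actions to take when the container starts, in order."""
    if os.path.exists(os.path.join(workdir, dag_file)):
        # Restart: the DAG already exists, so resubmit it without regenerating.
        return ["submit"]
    # First start: run the user's scripts (with dag_params) to create the DAG.
    return ["generate", "submit"]
```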

If anyone has done any of the above, I'd be very interested to hear about your experiences.

Regards,

Brian.

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@cs.wisc.edu with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/