
Re: [HTCondor-users] HTCondor and Docker



On 04/07/2015 10:02 AM, Brian Candler wrote:
There are three different things I'm thinking of.

(1) Running an HTCondor worker node as a Docker container.

This should be straightforward. All the jobs would run within the same container and therefore have an enforced limit on total resource usage.

This would be a quick way to add HTCondor execution capability to an existing Docker-aware server, just by
"docker run -d htcondor-worker"
or somesuch.

We've looked at this, and it is a bit more work than you might think: the htcondor-worker would need to be configured to point at the central manager and to be compatible with the rest of the pool. Docker containers generally run behind NAT, and worker nodes need inbound connections, so CCB needs to be set up on the central manager as well. You might also want to volume-mount the execute directory; out of the box, Docker limits container growth to 10 GB, though that limit can be raised.
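
To make that concrete, here is a minimal sketch of the wiring involved; the host name, image name, and paths are hypothetical:

    # condor_config inside the container
    CONDOR_HOST = cm.example.com       # the pool's central manager
    CCB_ADDRESS = $(CONDOR_HOST)       # reverse connections via CCB, since we're behind NAT

    # start the worker, keeping job sandboxes on the host
    # so they aren't subject to the container growth limit
    docker run -d \
        -v /scratch/condor/execute:/var/lib/condor/execute \
        htcondor-worker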

Also, depending on your security posture, you probably don't want to run the worker node as root within the container, which may or may not be a problem for your HTCondor usage.
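
One way to do that, sketched here with a hypothetical base image, is to drop privileges in the image itself. Note that without root the daemons can't switch UIDs, so every job would run as the same account:

    FROM htcondor-worker-base          # hypothetical image with HTCondor installed
    RUN useradd -m condor              # unprivileged account for the daemons
    USER condor
    CMD ["condor_master", "-f"]        # run condor_master in the foreground as PID 1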


(2) A "docker universe" where each job instance launches within a new Docker container, from a chosen container template.

When the job starts, a container is created, and when the job terminates the container is destroyed (except perhaps on failure, in which case we can keep it around for post-mortem?)

condor_exec would need to fire off "docker run" (preferably via the docker API) and track it until the container terminated. Plumbing for stdin/stdout and file transfer would also be required. Hence maybe part of condor_exec itself should run within the container?


This is something we are actively working on. If you have ideas, or use cases, we'd love to hear them.
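
For a sense of the lifecycle condor_exec would have to drive, here is a rough shell sketch; the image and job names are hypothetical:

    # create a container from the chosen template and start the job in it
    CID=$(docker run -d htcondor-job-template /work/my_job arg1)

    # track it until the job terminates; docker wait prints the exit code
    STATUS=$(docker wait "$CID")

    # destroy on success, keep it around for post-mortem on failure
    if [ "$STATUS" -eq 0 ]; then
        docker rm "$CID"
    fi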

(3) Docker containers on the submit host

A docker container would be a convenient abstraction to use on the submission host. Normally, when you start an HTCondor DAG, you need to create an empty working directory, run a script to create the DAG and/or SUB files, run condor_submit_dag, monitor progress and wait for completion, check the exit status to see whether all DAG nodes completed successfully, fix and restart if necessary, then tidy up the work directory.
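
Spelled out as a shell session (the generator script and file names are hypothetical):

    mkdir work && cd work
    ./generate_dag.sh                     # writes my.dag and its node .sub files
    condor_submit_dag my.dag
    condor_wait my.dag.dagman.log         # block until the DAGMan job completes
    # DAGMan records its final status in the .dagman.out file
    grep 'EXITING WITH STATUS' my.dag.dagman.out
    cd .. && rm -rf work                  # tidy up once the results are saved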

Docker on the submission host could handle this lifecycle: the container would be the work directory; it would run the scripts you want, submit the DAG, and remain visible as a running container until the DAG has completed; and the container's own exit status, visible under "docker ps", would show whether the DAG completed successfully.
https://docs.docker.com/reference/commandline/cli/#filtering_2
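
For example, completed runs could then be picked out by exit status:

    docker ps -a --filter 'exited=0'      # DAG runs that finished successfully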

When you are finished with the results then you would destroy the container.

This one might be a bit tricky to implement, as I don't see any way to have condor_submit_dag or condor_submit run in the foreground. I think it would be necessary to run "condor_dagman -f" directly as the process within the container.
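
A rough sketch of how that could work; the exact condor_dagman arguments should be copied from the my.dag.condor.sub file that condor_submit_dag generates:

    condor_submit_dag -no_submit my.dag   # writes my.dag.condor.sub, submits nothing
    # run DAGMan itself as the container's foreground process
    exec condor_dagman -f -l . -Dag my.dag -Lockfile my.dag.lock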

The container also needs to communicate with the condor schedd, and I'm not sure whether it needs access to bits of the filesystem as well (e.g. condor_config). If necessary, /etc/condor/ can be bind-mounted as a volume within the container.
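
Something along these lines, with a hypothetical image name:

    docker run -d \
        -v /etc/condor:/etc/condor:ro \   # share the host's HTCondor configuration
        my-dag-runner my.dag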

This is a use case we haven't considered, but DAGMan really works best today when it runs as a job managed by the schedd.

-greg