
Re: [HTCondor-users] HTCondor and Docker



On 04/07/2015 10:02 AM, Brian Candler wrote:
There are three different things I'm thinking of.

(1) Running an HTCondor worker node as a Docker container.

This should be straightforward. All the jobs would run within the same container and therefore have an enforced limit on total resource usage.

This would be a quick way to add HTCondor execution capability to an existing Docker-aware server, just by
"docker run -d htcondor-worker"
or somesuch.

We've looked at this, and it is a bit more work than you might think: the htcondor-worker would need to be configured to point at the central manager and to be compatible with the rest of the pool. Docker containers generally run behind NAT, and worker nodes need inbound connections, so CCB needs to be set up on the central manager as well. You might also want to volume-mount the execute directory; out of the box, Docker limits container growth to 10 GB, though that limit can be raised.
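
To make that concrete, here is a minimal sketch of the wiring involved; the host name, image name, and paths are hypothetical:

    # condor_config inside the container
    CONDOR_HOST = cm.example.com       # the pool's central manager
    CCB_ADDRESS = $(CONDOR_HOST)       # reverse connections via CCB, since we're behind NAT

    # start the worker, keeping job sandboxes on the host
    # so they aren't subject to the container growth limit
    docker run -d \
        -v /scratch/condor/execute:/var/lib/condor/execute \
        htcondor-worker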

Also, depending on your security posture, you probably don't want to run the worker node as root within the container, which may or may not be a problem for your HTCondor usage.
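
One way to do that, sketched here with a hypothetical base image, is to drop privileges in the image itself. Note that without root the daemons can't switch UIDs, so every job would run as the same account:

    FROM htcondor-worker-base          # hypothetical image with HTCondor installed
    RUN useradd -m condor              # unprivileged account for the daemons
    USER condor
    CMD ["condor_master", "-f"]        # run condor_master in the foreground as PID 1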


(2) A "docker universe" where each job instance launches within a new Docker container, from a chosen container template.

When the job starts, a container is created, and when the job terminates the container is destroyed (except perhaps on failure, in which case we can keep it around for post-mortem?)

condor_exec would need to fire off "docker run" (preferably via the docker API) and track it until the container terminated. Plumbing for stdin/stdout and file transfer would also be required. Hence maybe part of condor_exec itself should run within the container?


This is something we are actively working on. If you have ideas, or use cases, we'd love to hear them.
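
For a sense of the lifecycle condor_exec would have to drive, here is a rough shell sketch; the image and job names are hypothetical:

    # create a container from the chosen template and start the job in it
    CID=$(docker run -d htcondor-job-template /work/my_job arg1)

    # track it until the job terminates; docker wait prints the exit code
    STATUS=$(docker wait "$CID")

    # destroy on success, keep it around for post-mortem on failure
    if [ "$STATUS" -eq 0 ]; then
        docker rm "$CID"
    fi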

(3) Docker containers on the submit host

A docker container would be a convenient abstraction to use on the submission host. Normally, when you start an HTCondor DAG, you need to create an empty working directory, run a script to create the DAG and/or SUB files, run condor_submit_dag, monitor progress and wait for completion, check the exit status to see whether all DAG nodes completed successfully, fix and restart if necessary, then tidy up the work directory.
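
Spelled out as a shell session (the generator script and file names are hypothetical):

    mkdir work && cd work
    ./generate_dag.sh                     # writes my.dag and its node .sub files
    condor_submit_dag my.dag
    condor_wait my.dag.dagman.log         # block until the DAGMan job completes
    # DAGMan records its final status in the .dagman.out file
    grep 'EXITING WITH STATUS' my.dag.dagman.out
    cd .. && rm -rf work                  # tidy up once the results are saved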

Docker on the submission host could handle this lifecycle: the container would be the work directory; it would run the scripts you want, submit the DAG, and remain visible as a running container until the DAG has completed; and the container's own exit status, visible under "docker ps", would show whether the DAG completed successfully.
https://docs.docker.com/reference/commandline/cli/#filtering_2
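
For example, completed runs could then be picked out by exit status:

    docker ps -a --filter 'exited=0'      # DAG runs that finished successfully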

When you are finished with the results then you would destroy the container.

This one might be a bit tricky to implement, as I don't see any way to have condor_submit_dag or condor_submit run in the foreground. I think it would be necessary to run "condor_dagman -f" directly as the process within the container.
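
A rough sketch of how that could work; the exact condor_dagman arguments should be copied from the my.dag.condor.sub file that condor_submit_dag generates:

    condor_submit_dag -no_submit my.dag   # writes my.dag.condor.sub, submits nothing
    # run DAGMan itself as the container's foreground process
    exec condor_dagman -f -l . -Dag my.dag -Lockfile my.dag.lock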

The container also needs to communicate with the condor schedd, and I'm not sure whether it needs access to bits of the filesystem as well (e.g. condor_config). If necessary, /etc/condor/ can be bind-mounted as a volume within the container.
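
Something along these lines, with a hypothetical image name:

    docker run -d \
        -v /etc/condor:/etc/condor:ro \   # share the host's HTCondor configuration
        my-dag-runner my.dag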

This is a use case we haven't considered, but DAGMan really works best today when it runs as a job managed by the schedd.

-greg