
Re: [HTCondor-users] Submitting to a remote condor queue



On 08/02/2016 16:37, Todd Tannenbaum wrote:
> Consider an alternative similar to the following:
>
> Do a volume mount so that your container and your host share some subdirectory on the host file system. In this subdirectory, create a "runme" directory where your container will atomically write out DAG files along with their corresponding submit files. Meanwhile, on the host schedd, have a local universe job (submitted by whatever user you choose) that periodically scans the "runme" directory for .dag files, submits them, and then renames the submitted .dag file to .dag.submitted.jobX.Y.
>
> This way you do not need to do any reconfiguration of your host schedd, you don't need to have any trust relationships between your container and your host schedd, and you don't need to pay any extra overhead of having HTCondor move files in and out of the container via file transfer.
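For concreteness, a minimal sketch of such a scanner (in Python) might look like the following. The shared path, poll interval and rename suffix are my assumptions; it simply shells out to condor_submit_dag and pulls the cluster id out of its condor_submit-style "submitted to cluster N" output.

#!/usr/bin/env python
# Minimal sketch of the "runme" scanner, intended to run as the local universe
# job on the host schedd.  Adjust RUNME_DIR to the host side of the volume mount.
import os
import re
import subprocess
import time

RUNME_DIR = "/shared/runme"     # assumed host-side path of the shared volume
POLL_SECONDS = 30

def submit_dag(dag_path):
    """Run condor_submit_dag on one DAG and return the cluster id (or None)."""
    out = subprocess.check_output(
        ["condor_submit_dag", os.path.basename(dag_path)],
        cwd=os.path.dirname(dag_path),
        universal_newlines=True)
    m = re.search(r"submitted to cluster (\d+)", out)
    return m.group(1) if m else None

while True:
    for name in sorted(os.listdir(RUNME_DIR)):
        if not name.endswith(".dag"):
            continue
        dag = os.path.join(RUNME_DIR, name)
        try:
            cluster = submit_dag(dag)
        except subprocess.CalledProcessError:
            continue            # leave the file in place; retry on the next pass
        # Rename so the DAG is not picked up again on the next scan.
        suffix = ".submitted.job%s.0" % cluster if cluster else ".submitted"
        os.rename(dag, dag + suffix)
    time.sleep(POLL_SECONDS)

In practice this script itself would be the local universe job Todd mentions (or a cron job), and because the container writes the DAG and its submit files atomically, the scanner can safely act as soon as the .dag file appears.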

Thank you - I was sort-of coming to that conclusion myself. Indeed, if the container only wants to fire-and-forget a single DAG, I can just run the container and wait for it to exit; if it exits with success (rc=0) then I pick up and submit the .dag file that it wrote. This does mean that the container is only responsible for the preparation of the job, not for managing its lifecycle. So for example, it can't do any post-processing actions when the DAG completes.
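That fire-and-forget variant could be as simple as the sketch below, assuming the container writes exactly one .dag file (plus its submit files) into a bind-mounted work directory before it exits; the image name and paths are placeholders.

import glob
import os
import subprocess

workdir = "/scratch/job-0001"        # host directory bind-mounted into the container
image = "example/dag-preparer"       # hypothetical image that prepares the DAG

# Run the container to completion; check_call raises if it exits non-zero,
# which gives the "only submit on rc=0" behaviour described above.
subprocess.check_call(
    ["docker", "run", "--rm", "-v", "%s:/work" % workdir, image])

# Pick up the DAG the container wrote and submit it on the host schedd.
dags = glob.glob(os.path.join(workdir, "*.dag"))
if len(dags) != 1:
    raise RuntimeError("expected one .dag file in %s, found %d" % (workdir, len(dags)))
subprocess.check_call(["condor_submit_dag", os.path.basename(dags[0])], cwd=workdir)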

Alternatively, if I submit jobs in the way you suggest (by polling for drop files), then the container can carry on running and check for DAG completion itself, e.g. by polling the node status file. It seems pretty crude to use the filesystem in this way instead of proper condor APIs, but it should be functional.
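A container-side completion check could poll that file along these lines. This assumes the DAG declares NODE_STATUS_FILE so DAGMan keeps the status file in the shared directory, and that the ClassAd-format status file carries a DAG-level DagStatus line annotated with STATUS_DONE / STATUS_ERROR; the exact markers depend on the HTCondor version, so check what yours actually writes.

import time

STATUS_FILE = "/work/mydag.status"   # path as seen from inside the container (assumed)
POLL_SECONDS = 60

def dag_state(path):
    """Return 'done', 'error' or None by scanning the node status file."""
    try:
        lines = open(path).read().splitlines()
    except IOError:
        return None                  # DAGMan has not written the file yet
    for line in lines:
        # DAG-level status line, e.g.  DagStatus = 5; /* "STATUS_DONE (success)" */
        if "DagStatus" in line:
            if "STATUS_DONE" in line:
                return "done"
            if "STATUS_ERROR" in line:
                return "error"
            return None              # DAG still running
    return None

while True:
    state = dag_state(STATUS_FILE)
    if state is not None:
        print("DAG finished: %s" % state)   # post-processing would go here
        break
    time.sleep(POLL_SECONDS)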

Either way, I need to allocate working directories outside the container, and have some external system which tracks which DAGs are running and deletes the working directories afterwards.
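A cleanup pass for that external system could be as small as the sketch below: it asks condor_q for the working directories (Iwd) of DAGMan jobs still in the queue (DAGMan runs in the scheduler universe, JobUniverse == 7) and removes any work directory that is no longer referenced. The WORK_ROOT layout, and the assumption that each DAG was submitted from its own subdirectory there, are mine; in practice you would also want to restrict the query to the submitting user.

import os
import shutil
import subprocess

WORK_ROOT = "/scratch/dag-workdirs"   # assumed: one subdirectory per DAG

# Iwd (initial working directory) of every DAGMan job still in the queue.
out = subprocess.check_output(
    ["condor_q", "-constraint", "JobUniverse == 7", "-autoformat", "Iwd"],
    universal_newlines=True)
active = set(line.strip() for line in out.splitlines() if line.strip())

for name in os.listdir(WORK_ROOT):
    path = os.path.join(WORK_ROOT, name)
    if os.path.isdir(path) and path not in active:
        shutil.rmtree(path)           # DAG no longer queued; reclaim its directory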

If the container itself were responsible for running the DAG then each container would *be* the working directory, and "docker ps" would be my list of running tasks. But having read a bit more about remote submission, I see there are a number of difficulties with this. The foremost seems to be that the jobs condor_dagman submits would need to be able to write their outputs inside the container, which in turn I think means condor_dagman itself would have to run inside the container; that would be a very non-standard way of deploying HTCondor, unless you also had a schedd running inside the container.

Thanks again,

Brian.