Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] HTCondor and Docker

Date: Mon, 13 Apr 2015 12:53:36 -0500 (CDT)
From: "R. Kent Wenger" <wenger@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] HTCondor and Docker

On Mon, 13 Apr 2015, Brian Candler wrote:

So being able to kick off a manual retry would be a good feature. Another approach I thought of would be for DAGman to delay its retries until the last possible moment - i.e. when there are no other jobs which can proceed - instead of retrying as soon as possible. Or perhaps just the *last* retry should be handled this way.

Hmm -- DAGMAN_RETRY_SUBMIT_FIRST defaults to false, which means when a node fails it goes to the end of the ready queue (if it has retries). But other nodes that become ready after the first node fails get added after that node. So I guess what you're looking for is a setting that keeps the retry attempt at the end of the ready queue even as other stuff is added.
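
As a toy illustration of the queue ordering described above, the difference between the current behaviour and the setting being asked for could be modelled roughly like this. It is not DAGMan's actual implementation; the function name and the keep_retry_last flag are made up for this sketch.

```python
def requeue_order(ready, failed, newly_ready, keep_retry_last=False):
    """Return the order in which nodes would leave a simple ready queue.

    ready           -- nodes already waiting when the failure happens
    failed          -- the node that failed and still has retries left
    newly_ready     -- nodes that become ready after the failure
    keep_retry_last -- hypothetical switch: keep the retry at the tail
    """
    queue = list(ready)
    queue.append(failed)  # the retry goes to the end of the ready queue
    for node in newly_ready:
        if keep_retry_last:
            # Slot new work in just ahead of the pending retry.
            queue.insert(len(queue) - 1, node)
        else:
            # New work lands behind the retry.
            queue.append(node)
    return queue


print(requeue_order(["B", "C"], "A", ["D", "E"]))
print(requeue_order(["B", "C"], "A", ["D", "E"], keep_retry_last=True))
```

In the first call the retry of A runs before D and E; in the second it stays last until nothing else is ready.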

Anyway... this is just a tweak. The main issue for me is creating a DAG dynamically (in response to a request received in an AMQP message), which in turn means a lifecycle of:
* create a working directory
* run the script to create the DAG/submit/input files in this directory
* submit the DAG
* wait for DAG to complete
* send back success/fail message to submitter, and results
* tidy up (i.e. remove the working directory) on DAG success
* on failure, keep all the temp files for post-mortem analysis; after fixes, resubmit the rescue DAG
* management tools: e.g. list the working directories, clusterID for running jobs, exit status for finished jobs (eventually a web interface)
I was initially surprised that HTCondor doesn't come with any tooling for that sort of lifecycle - it seems the assumption is that all workflows are set up by hand at the CLI.
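
For illustration, the lifecycle above could be driven by a small wrapper along these lines. This is only a sketch under stated assumptions: generate is a hypothetical callback that writes the DAG, submit and input files and returns the DAG file name; the DAGMan log is assumed to be named <dag file>.dagman.log; and the presence of a rescue DAG file is used as the failure signal. condor_submit_dag and condor_wait are the only HTCondor commands it calls. The AMQP reply and the management tooling are left out.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_dag_lifecycle(generate, run=subprocess.run, base_dir=None):
    """Run one request through create / submit / wait / tidy up.

    Returns (True, None) on success, or (False, workdir) on failure so
    the caller can inspect the files and later resubmit the rescue DAG.
    """
    # Create a fresh working directory for this request.
    workdir = Path(tempfile.mkdtemp(prefix="dag-", dir=base_dir))

    # Hypothetical callback: writes the files, returns the DAG file name.
    dag_name = generate(workdir)

    # Submit the DAG from inside the working directory.
    run(["condor_submit_dag", dag_name], cwd=workdir, check=True)

    # Block until the DAGMan job itself has finished (assumed log name).
    run(["condor_wait", dag_name + ".dagman.log"], cwd=workdir, check=True)

    # Assumption: a rescue DAG file means the DAG did not complete.
    failed = any(workdir.glob(dag_name + ".rescue*"))
    if failed:
        return False, workdir  # keep everything for post-mortem analysis

    shutil.rmtree(workdir)  # tidy up on success
    return True, None
```

Passing run as a parameter keeps the sketch testable without a pool: a stub can stand in for subprocess.run.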

Well, one option is to make that top-level lifecycle into a DAG, and have the "main" DAG be a sub-DAG of the top-level DAG.
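
A rough sketch of what such a top-level DAG might look like, written as a small generator so it can sit alongside the script that creates the main DAG. The node names (setup, main, report) and the submit file names are made up for illustration, and the FINAL node assumes a DAGMan version that supports it; treat this as one possible layout, not a recommended one.

```python
def top_level_dag(main_dag="main.dag",
                  setup_sub="setup.sub",
                  report_sub="report.sub"):
    """Return the text of a hypothetical top-level DAG file.

    setup  -- job that creates the main DAG and its submit/input files
    main   -- the generated DAG, run as an external sub-DAG
    report -- FINAL node: runs at the end whether or not main succeeded
    """
    lines = [
        f"JOB setup {setup_sub}",
        f"SUBDAG EXTERNAL main {main_dag}",
        "PARENT setup CHILD main",
        f"FINAL report {report_sub}",
    ]
    return "\n".join(lines) + "\n"


print(top_level_dag(), end="")
```

The report node is where the success/fail message and the clean-up decision would go.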


Kent
