Re: [HTCondor-users] Getting DAG node to fail on file transfer error
- Date: Mon, 3 Nov 2014 09:57:57 -0600
- From: Zachary Miller <zmiller@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Getting DAG node to fail on file transfer error
On Mon, Nov 03, 2014 at 03:51:43PM +0000, Brian Candler wrote:
> (related to my previous post)
> If I submit a DAG which uses a http:// URL for an input file, and the
> file transfer fails, the job goes into a "hold" state. Is it possible to
> configure this so that it fails the node entirely?
> If the DAG node failed then the whole DAG would fail, and this gets
> noticed by the user. However if a job ends up in 'held' state then it's
> just as if the job is taking forever to run, and needs additional
> monitoring to check.
I would look into setting "periodic_remove" in your job submit file. You can
condition it on the HoldReasonCode that indicates a failed file transfer, so
that jobs held for other reasons are left alone. I'll defer to Kent Wenger on
this, but I believe that if a job gets removed, its node fails and that causes
the DAG to fail.
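
A minimal sketch of what such a submit file might look like. The executable
name and the URL are placeholders, and the value 13 is an assumption about
which HoldReasonCode marks a failed input file transfer; check the list of
hold reason codes in the manual for your HTCondor version before relying on it.

```
# Hypothetical submit description; executable and URL are placeholders.
universe                = vanilla
executable              = my_job.sh
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = http://example.com/input.dat

# JobStatus == 5 means the job is held. HoldReasonCode == 13 is assumed
# here to mean "input file transfer failed"; verify it for your version.
periodic_remove = (JobStatus == 5) && (HoldReasonCode == 13)

queue
```

Testing JobStatus as well as HoldReasonCode keeps the expression from matching
a stale HoldReasonCode left in the job ad after an earlier hold was released.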