Re: [Condor-users] Is it possible to remove a job started by a DAG not wanted anymore?
- Date: Wed, 20 Apr 2011 11:17:33 -0500 (CDT)
- From: "R. Kent Wenger" <wenger@xxxxxxxxxxx>
On Wed, 20 Apr 2011, Carsten Aulbert wrote:
> We are facing a small problem but don't know how to tackle it.
> A large number of jobs has been started with DAGMan. Most of the jobs run
> fine, but some have very severe memory needs. We would like to "kill" those
> without DAGMan restarting them.
> Is it possible to tell the master DAGMan process that it should ignore a
> specific job while the DAG is running? Or is there some other way to get around this?

Since you mention DAGMan restarting the jobs, I assume you have retries
turned on for the relevant DAG nodes (the RETRY keyword in the DAG file).
(If you don't have retries turned on, you should be able to condor_rm the
offending jobs; DAGMan would just consider those nodes failed and continue
to make as much progress as possible given the failures, running every
node that does not depend on a failed node.)
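To make "as much progress as possible" concrete, here is a toy Python model (not DAGMan's own code) of which nodes can still run once some nodes have failed: everything except the failed nodes and their descendants. The DAG, the node names, and the function name are hypothetical.

```python
from collections import deque


def nodes_still_runnable(parents, failed):
    """Toy model: which nodes can still run after the nodes in `failed` fail.

    `parents` maps every node name to the set of its parents. A node is
    blocked if any of its ancestors failed; every other node can still run
    (or has already run).
    """
    # Invert the parent map so we can walk downward from each failed node.
    children = {node: set() for node in parents}
    for node, node_parents in parents.items():
        for parent in node_parents:
            children[parent].add(node)

    # Breadth-first search: every descendant of a failed node is blocked.
    blocked = set()
    queue = deque(failed)
    while queue:
        for child in children[queue.popleft()]:
            if child not in blocked:
                blocked.add(child)
                queue.append(child)

    return set(parents) - blocked - set(failed)


if __name__ == "__main__":
    # Hypothetical DAG: A -> B, A -> C, B -> D, C -> E.
    dag = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B"}, "E": {"C"}}
    print(sorted(nodes_still_runnable(dag, {"B"})))  # ['A', 'C', 'E']
```

If node B is removed, only D is lost; the C branch still completes.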
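Assuming that you have retries set for the nodes, you could condor_hold
the jobs you want to get rid of. A held job has neither succeeded nor
failed, so DAGMan does not retry it; it just waits. Once the DAG stops
making progress (every remaining node is either held or depends on a held
node), condor_rm the DAGMan job itself. That should remove the held node
jobs as well, and DAGMan should write a rescue DAG.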
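The hold-then-remove sequence can be scripted. The sketch below only builds the two command lines and, by default, prints them instead of running them. The job IDs (cluster.proc form) and the helper names are hypothetical, and you still have to wait between the two steps until the DAG has stopped making progress.

```python
import subprocess


def hold_command(node_job_ids):
    """condor_hold command line for the node jobs to stop (cluster.proc IDs)."""
    return ["condor_hold", *node_job_ids]


def remove_dagman_command(dagman_job_id):
    """condor_rm command line for the DAGMan job itself.

    Per the advice above, removing DAGMan should also remove its held node jobs.
    """
    return ["condor_rm", dagman_job_id]


def execute(argv, dry_run=True):
    """Print the command (the default) or run it, raising if it exits non-zero."""
    if dry_run:
        print(" ".join(argv))
    else:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Step 1: hold the memory-hungry node jobs (made-up IDs).
    execute(hold_command(["4711.0", "4712.0"]))
    # Step 2, later, once the DAG has stopped making progress:
    execute(remove_dagman_command("4700.0"))
```

Keeping the two steps as separate calls mirrors the manual procedure: removing DAGMan immediately would also kill nodes that could still have finished.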
Unfortunately, there's currently no way to remove any dependencies from a
running DAG. I think you'll have to edit the rescue DAG file and then
re-run the DAG.
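One way to do that edit, sketched here as an assumption rather than a tested recipe, is to mark the unwanted nodes DONE in the rescue DAG so that DAGMan does not submit them when it is re-run. DAGMan then treats those nodes as completed, so their children become eligible to run. The helper name and the example file are hypothetical, and the rescue DAG layout varies between Condor versions, so check yours before editing.

```python
def mark_nodes_done(dag_text, nodes):
    """Return `dag_text` with DONE appended to the JOB lines of `nodes`.

    Assumes JOB lines of the form "JOB <name> <submit file> [options...]".
    Lines that already carry DONE, and all non-JOB lines, are left unchanged.
    """
    out = []
    for line in dag_text.splitlines():
        fields = line.split()
        is_target_job = (
            len(fields) >= 3
            and fields[0].upper() == "JOB"
            and fields[1] in nodes
        )
        already_done = any(f.upper() == "DONE" for f in fields[3:])
        if is_target_job and not already_done:
            line = line.rstrip() + " DONE"
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical rescue DAG: A already finished, B is the node to skip.
    rescue = "JOB A a.sub DONE\nJOB B b.sub\nJOB C c.sub\nPARENT A CHILD B C\nRETRY B 3\n"
    print(mark_nodes_done(rescue, {"B"}), end="")
```

If the downstream nodes cannot run without the skipped node's output, editing the PARENT/CHILD lines instead may be the better choice.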