
Re: [Condor-users] Is it possible to remove a job started by a DAG not wanted anymore?



On Wed, 20 Apr 2011, Carsten Aulbert wrote:

> we are facing a small little problem, but don't know how to tackle it.
>
> A large number of jobs has been started with dagman, most of the jobs run
> fine, but some have very severe memory needs. We would like to "kill" those
> without dagman restarting those.
>
> Is it possible to tell the master dagman process that it should ignore a
> specific job while the dag is running? Or any other way to navigate around
> this problem?

Since you talk about DAGMan restarting the jobs, I assume you have retries turned on for the relevant DAG nodes. (If you don't have retries turned on, you should be able to condor_rm the offending jobs; DAGMan would just consider those nodes failed, and continue to make as much progress as possible given the failures.)

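Assuming that you have retries set for the nodes, you could condor_hold the jobs you want to get rid of. A held job stays in the queue, so DAGMan does not see it as failed and will not retry it. Once the DAG stops making progress, condor_rm the DAGMan job itself, and that should remove the held node jobs as well.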
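
As a rough sketch, the hold-then-remove sequence might look like the following. The job IDs (1234.0, 1235.0 for the node jobs, 1200.0 for the DAGMan job) are made-up placeholders, and the run, hold_jobs, and remove_dagman helpers are hypothetical wrappers, not part of Condor. By default the script only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
#!/bin/sh
# Sketch only: all job IDs below are hypothetical placeholders.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Put each of the given node jobs on hold (they stay in the queue).
hold_jobs() {
    for id in "$@"; do
        run condor_hold "$id"
    done
}

# Remove the DAGMan job itself once the DAG has stopped making progress.
remove_dagman() {
    run condor_rm "$1"
}

hold_jobs 1234.0 1235.0   # the memory-hungry node jobs (placeholder IDs)
remove_dagman 1200.0      # the DAGMan job (placeholder ID)
```

You can find the real IDs with condor_q; the second step should wait until the remaining nodes have finished.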

Unfortunately, there's no way at this point to remove any dependencies from a running DAG. I think you'll have to edit the rescue DAG file, and then re-run the DAG.
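
One possible edit, sketched below, is to mark the unwanted node as DONE on its JOB line in the rescue DAG, so that DAGMan treats it as already complete when the DAG is re-run. This assumes it is acceptable for the node's children to run without it; if it is not, you would need to remove the node and its dependencies by hand instead. The mark_node_done helper, the node names, and the demo file contents are made up for illustration.

```shell
#!/bin/sh
# Sketch only: mark_node_done is a hypothetical helper, and the node
# names and file contents below are illustrative.

# Print the DAG file with " DONE" appended to the JOB line of one node.
# Usage: mark_node_done NODE_NAME DAG_FILE
mark_node_done() {
    awk -v node="$1" '
        toupper($1) == "JOB" && $2 == node && toupper($NF) != "DONE" {
            print $0 " DONE"
            next
        }
        { print }
    ' "$2"
}

# Demo on a tiny made-up DAG file.
demo=$(mktemp)
cat > "$demo" <<'EOF'
JOB A a.sub
JOB B b.sub
PARENT A CHILD B
RETRY B 3
EOF

mark_node_done B "$demo"
rm -f "$demo"
```

The function writes the edited DAG to standard output, so you can inspect the result before replacing the rescue DAG file and re-running the DAG.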

Kent Wenger
Condor Team