
Re: [HTCondor-users] condor_dagman crashed when suspend it in 7.8.2 version



On Thu, 5 Sep 2013, 钱晓明 wrote:

I found that running condor_suspend on a DAGMan job makes it crash and go into
RECOVERY mode. This is the output from DAGMan when the suspend command is issued:

That doesn't surprise me. I did some testing with 8.0 and I'm not seeing exactly the same behavior, but I'm not sure what condor_suspend is supposed to do to a scheduler universe job (which DAGMan is, unless you've changed the normal .condor.sub file generated by condor_submit_dag).

You're probably better off doing condor_hold/condor_release on the DAGMan job instead of condor_suspend/condor_continue.

Note that if you do condor_hold/condor_release on a DAGMan job, it *will* go into recovery mode, but that's the correct behavior.
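
For illustration, the hold/release approach might be wrapped roughly like this. It's only a sketch: the helper names (run, pause_dag, resume_dag) and the DRY_RUN switch are made up for this example, and the cluster ID is whatever condor_submit_dag reported for your DAGMan job. By default it just prints the commands instead of running them.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 to actually call the HTCondor tools.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Put the DAGMan job on hold (argument: cluster ID of the DAGMan job).
pause_dag() { run condor_hold "$1"; }

# Release it again; expect DAGMan to come back in recovery mode.
resume_dag() { run condor_release "$1"; }
```

With a placeholder cluster ID of 1234, "pause_dag 1234" followed later by "resume_dag 1234" issues condor_hold 1234 and then condor_release 1234.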

Kent Wenger
CHTC Team