Re: [HTCondor-users] Stuck dagman jobs after restart



On Mon, 15 Dec 2014, Brian Bockelman wrote:

Hi Brian,

> It might be worth it to look at the UserLog of these jobs - it's possible they are switching quickly between R and I?

Hmm, you could look, but I'd be really surprised if that were happening.
Could you send us your SchedLog? I think that's the most likely log to give us some useful information.
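
If it is not obvious where the SchedLog lives on the submit machine, condor_config_val can report the path from the SCHEDD_LOG configuration macro. A minimal sketch in Python, assuming the HTCondor command-line tools are on the PATH; the function names are hypothetical and used only for illustration:

```python
import subprocess


def schedd_log_command():
    """Build the command that asks HTCondor where the schedd writes its log."""
    # SCHEDD_LOG is the configuration macro holding the SchedLog path.
    return ["condor_config_val", "SCHEDD_LOG"]


def find_schedd_log(run=subprocess.run):
    """Return the SchedLog path as reported by condor_config_val.

    `run` is injectable (a hypothetical convenience) so the function can be
    exercised without an HTCondor installation.
    """
    result = run(schedd_log_command(), capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Calling find_schedd_log() on the submit machine should return the file to attach; it raises if condor_config_val exits with an error.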

We actually have a test for DAGs getting correctly restarted across a Condor restart, so I'm a little surprised this is happening.

Something else I just thought of -- you might want to try doing condor_hold and then condor_release on one of the DAGs, to see if that gets it to run (just a wild guess).
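
The hold-and-release experiment could be scripted roughly as follows. This is only a sketch, assuming the condor_hold and condor_release tools are on the PATH and that you pass the cluster ID of the DAGMan job itself (as shown by condor_q); the function names are hypothetical:

```python
import subprocess


def hold_release_commands(dag_cluster_id):
    """Build the two commands: hold the DAGMan job, then release it."""
    job = str(int(dag_cluster_id))  # int() rejects anything that is not a plain cluster ID
    return [["condor_hold", job], ["condor_release", job]]


def hold_then_release(dag_cluster_id, run=subprocess.run):
    """Run condor_hold followed by condor_release on one DAG.

    `run` is injectable (a hypothetical convenience) so the sequence can be
    checked without touching a real queue.
    """
    for cmd in hold_release_commands(dag_cluster_id):
        run(cmd, check=True)  # stop if either command fails
```

Try it on a single stuck DAG first, then check condor_q to see whether the DAG starts running after the release.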

Kent Wenger
CHTC Team