
Re: [Condor-users] 'ERROR while bootstrapping' in subdag



On Tue, 4 May 2010, Alexander Dietz wrote:

I have a problem with a DAG within my Uberdag, and I would appreciate any
help. I needed to stop the uberdag process because it seemed to have hung.
When I started the (rescue) Uberdag, all but one of the sub-DAGs ran
fine. The dagman.out file shows the following error, which I have
never seen before:

....
05/03 17:16:51     ------------------------------
05/03 17:16:51        Condor Recovery Complete
05/03 17:16:51     ------------------------------
05/03 17:16:51 Disabling log line cache.
05/03 17:16:51 ERROR while bootstrapping
05/03 17:16:51 **** condor_scheduniv_exec.9943595.0 (condor_DAGMAN) pid 2502 EXITING WITH STATUS 1
05/03 17:16:51 Warning: ReadMultipleUserLogs destructor called, but still monitoring 1 log(s)!

Is this from the uberdag or the low-level dag? It's hard to tell what's going on from this snippet of the file. The best thing would be to send the dagman.out files for both the uberdag and the subdag that's failing.
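
To work out which DAG an error like this came from, one option is to search every dagman.out file under the run directory for the error line. The following is only a rough sketch: the function name find_bootstrap_errors is made up for illustration, and it assumes the files follow the usual <dagfile>.dagman.out naming and that the message text matches the excerpt above exactly.

```python
from pathlib import Path

# Message text copied from the log excerpt above.
MARKER = "ERROR while bootstrapping"


def find_bootstrap_errors(root, pattern="*.dagman.out", marker=MARKER):
    """Return {path: [matching lines]} for each dagman.out file under root.

    Hypothetical helper: the default pattern assumes the usual
    <dagfile>.dagman.out file naming.
    """
    hits = {}
    for path in sorted(Path(root).rglob(pattern)):
        # errors="replace" keeps the scan going if a log has odd bytes.
        with open(path, errors="replace") as fh:
            lines = [line.rstrip("\n") for line in fh if marker in line]
        if lines:
            hits[str(path)] = lines
    return hits


if __name__ == "__main__":
    # Scan the current directory and print each file containing the error.
    for name, lines in find_bootstrap_errors(".").items():
        print(name)
        for line in lines:
            print("    " + line)
```

Run from the top-level run directory, this lists each dagman.out file that contains the error, which should show whether the uberdag or one of the subdags logged it.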

This also happens if I try to restart this DAG on its own. It seems
that no rescue DAG has been created, and the DAG tried to recover from
the information in the dagman.out file?
Anyway, what can I do to recover this DAG? Or do I need to rerun this
DAG from scratch? I also tried to find some help via Google, but I
found nothing helpful. Any help is appreciated!

Hmm -- it sounds like DAGMan was killed (not condor_rm'ed) or held while the lower-level DAG was running. (Just as a note, recovery mode means that DAGMan is trying to recover the DAG state from the node job log files, not the dagman.out file.)

Kent Wenger
Condor Team