
[Condor-users] no rescue file created



Hi,

Are there circumstances when a rescue file fails to get created? And if so, is there a way to force its recreation?

This is what happened: we were running a reasonably large DAG over about 10 days. One of the main machines (actually the submitting machine) rebooted; I'm not sure whether the reboot is relevant. Eventually the DAG seemed to finish, in that nothing was actually running on any machine, but the "dag" job showed 1 job on hold in addition to the DAG job itself. So I ran condor_rm on the held job. That removed both the held job and the DAG job itself, but no rescue file was created. Is this normal? (I've also noticed that running condor_rm on the DAG job itself does not produce a rescue file either. Is that normal too?)

-Gautam