
Re: [Condor-users] administrator SIGQUIT vs condor_vacate SIGTERM



Rob de Graaf wrote:
If we can't "catch" jobs that are being killed outside condor, I suppose the only way is to re-queue them after reviewing the logs with non-zero return values?


Of course, the worry there is: what if your job legitimately exits with a non-zero status?

Another idea is to ask Condor to rerun the job if it is killed with a SIGTERM or a SIGQUIT signal. It seems unlikely that a job would exit of its own accord with either of those signals.

Off the top of my head, I think you could do the above by placing the following in your condor submit file:

   on_exit_remove = (ExitBySignal == False) ||
                    ((ExitSignal != 3) && (ExitSignal != 15))
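
To spell out what that expression decides, here is a minimal Python sketch of the same logic. The function name should_leave_queue and the constants are made up for illustration and are not part of Condor; 3 and 15 are assumed to be the usual Linux signal numbers for SIGQUIT and SIGTERM.

```python
from typing import Optional

# Assumed Linux signal numbers, matching the values in the submit expression.
SIGQUIT = 3
SIGTERM = 15


def should_leave_queue(exit_by_signal: bool, exit_signal: Optional[int] = None) -> bool:
    """Hypothetical mirror of the on_exit_remove expression.

    True means the job is removed from the queue when it exits;
    False means it stays in the queue so it can be run again.
    """
    if not exit_by_signal:
        # Normal exit (any return value): let the job leave the queue.
        return True
    # Killed by a signal: keep the job only for SIGQUIT or SIGTERM.
    return exit_signal != SIGQUIT and exit_signal != SIGTERM


if __name__ == "__main__":
    cases = [(False, None), (True, SIGQUIT), (True, SIGTERM), (True, 9)]
    for by_signal, sig in cases:
        print(by_signal, sig, should_leave_queue(by_signal, sig))
```

Note that with this logic a job killed by any other signal (signal 9, for example) still leaves the queue, as does a job that exits normally with a non-zero value.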


Hope this is helpful,
Todd