Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [Condor-users] administrator SIGQUIT vs condor_vacate SIGTERM

Date: Wed, 19 Dec 2007 11:58:06 -0600
From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
Subject: Re: [Condor-users] administrator SIGQUIT vs condor_vacate SIGTERM

Rob de Graaf wrote:

If we can't "catch" jobs that are being killed outside condor, I suppose the only way is to re-queue them after reviewing the logs with non-zero return values?


Of course, the worry there is: what if your job actually exits with a non-zero value on its own?

Another idea is to ask Condor to rerun the job if it is killed with a SIGTERM or a SIGQUIT signal. It seems unlikely that a job would exit of its own accord with either of those signals.

Off the top of my head, I think you could do the above by placing the following in your Condor submit file:


   on_exit_remove = (ExitBySignal == False) ||
                    ((ExitSignal != 3) && (ExitSignal != 15))
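
As a sanity check on that expression, here is a minimal Python sketch of how it would evaluate for a few exit outcomes. The function name on_exit_remove and its arguments are illustrative stand-ins for the job attributes, not part of Condor, and the signal numbers 3 (SIGQUIT) and 15 (SIGTERM) are simply those used in the expression, which match typical Linux numbering. The sketch does not model how ClassAd expressions treat undefined attributes.

```python
# Signal numbers as written in the submit-file expression
# (assumed to follow the usual Linux numbering).
SIGQUIT = 3
SIGTERM = 15


def on_exit_remove(exit_by_signal, exit_signal=None):
    """Mirror the submit-file expression in plain Python.

    Returns True if the job would leave the queue,
    False if it would stay queued and be run again.
    """
    # A job that exited on its own (any exit code) is removed.
    if not exit_by_signal:
        return True
    # A job killed by a signal is removed unless the signal
    # was SIGQUIT or SIGTERM.
    return exit_signal != SIGQUIT and exit_signal != SIGTERM


if __name__ == "__main__":
    # Hypothetical outcomes: (exit_by_signal, exit_signal)
    for outcome in [(False, None), (True, 3), (True, 15), (True, 9)]:
        print(outcome, "-> remove from queue:", on_exit_remove(*outcome))
```

A True result means the job leaves the queue; False means it stays queued and is rerun. Note that a job exiting normally with a non-zero code is still removed, which is the point of keying on the signal rather than the return value.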


Hope this is helpful,
Todd

