
Re: [Condor-users] USER_JOB_WRAPPER and Unix signals





Bruce Beckles wrote:

1. When it says "all user jobs" does it REALLY mean all user jobs
  regardless of the job's universe (so including Standard, Java, MPI, PVM
  and Scheduler universe jobs)?


It doesn't (currently) apply to scheduler universe jobs, but it applies to all of the others.


2. Are the reasons for exec()-ing the user job rather than fork()-ing the
  following?:

	- So that Condor 'knows' which process (PID) to send the Unix
	  control signals to cause the job to suspend, checkpoint or
	  vacate as necessary?


Yes.
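
To illustrate the exec() point: below is a minimal Python sketch of a wrapper that replaces itself with the job instead of fork()-ing, so the PID that receives the signals is the job's own. It assumes the wrapper is handed the job's command line as its own arguments. `run_job` is a hypothetical name for illustration, not part of Condor.

```python
import os
import sys


def run_job(argv):
    """Replace the current (wrapper) process with the job, keeping the same PID."""
    if not argv:
        raise ValueError("no job command given")
    # Any environment setup the wrapper needs would go here, before the exec.
    # os.execvp does not return on success: the job takes over this process,
    # so a signal sent to the wrapper's PID is delivered to the job itself.
    os.execvp(argv[0], argv)


# Assumption: the job's executable and arguments arrive as sys.argv[1:].
if __name__ == "__main__" and len(sys.argv) > 1:
    run_job(sys.argv[1:])
```

Because no second process is created, there is nothing for the wrapper to forward: whatever signal arrives is handled (or not) by the job.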


...and so I need to know what signals Condor will send to the user job -
trawling the manual seems to reveal the following:

- SIGUSR2:
   cause a job in the Standard universe to checkpoint and then continue
   executing.

- SIGTSTP (or the value of the KillSig ClassAd attribute):
   cause a job in the Standard universe to try to shut down gracefully
   (i.e. checkpoint).

- SIGTERM (or the value of the KillSig ClassAd attribute):
   cause a job in the Vanilla universe to try to shut down gracefully,
   i.e. normal Unix termination (noting that the program may catch
   SIGTERM and try to clean up).  Is this also true for jobs in the other
   non-Standard (Java, MPI, PVM and Scheduler) universes?

- SIGKILL:
   kill (i.e. send the hard-kill signal to) the job, if the job takes too
   long to shut down gracefully or doesn't respond to the appropriate
   signal.

...but what about when it suspends a user job?  Does it send it a SIGSTOP?
Does it do anything else (as well/instead of)?
...and similarly when it unsuspends a user job does it send a SIGCONT?
Does it do anything else (as well/instead of)?


I'll let the "condor_starter" experts comment. Glancing at the code, I think there will be trouble with SIGSTOP in the standard universe: there the signal is sent only to the parent process, and SIGSTOP cannot be trapped, so a wrapper that fork()s the job has no chance to pass it on. In the other universes, this signal is sent to all of the children, so you should be fine.
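
To make the trapping problem concrete: below is a rough Python sketch of a wrapper that fork()s the job and forwards whatever signals it is able to catch. It is only an illustration under assumptions, not how Condor or any real wrapper is implemented. `install_forwarders` and `run_wrapped` are hypothetical names, and the signal list is simply the set mentioned in this thread.

```python
import signal
import subprocess
import sys

# Signals discussed in this thread that a fork()-ing wrapper might want to pass on.
CANDIDATES = [signal.SIGTERM, signal.SIGTSTP, signal.SIGUSR2,
              signal.SIGCONT, signal.SIGSTOP, signal.SIGKILL]


def install_forwarders(child, signals=CANDIDATES):
    """Forward each catchable signal to `child`; return those that cannot be caught."""
    untrappable = []
    for sig in signals:
        try:
            # The handler relays the received signal number to the child process.
            signal.signal(sig, lambda signum, frame: child.send_signal(signum))
        except OSError:
            # The OS refuses handlers for SIGSTOP and SIGKILL, so they land here.
            untrappable.append(sig)
    return untrappable


def run_wrapped(argv):
    """Start the job as a child, forward what can be forwarded, wait for it."""
    child = subprocess.Popen(argv)
    untrappable = install_forwarders(child)
    return child.wait(), untrappable
```

Calling `run_wrapped(sys.argv[1:])` from a wrapper script would run the job this way. The list it returns alongside the exit status contains SIGSTOP and SIGKILL: if only the wrapper is stopped, the child keeps running, which is the problem described above.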


Dan Bradley