[Condor-users] USER_JOB_WRAPPER and Unix signals
- Date: Wed, 11 Aug 2004 16:44:47 +0100 (BST)
- From: Bruce Beckles <mbb10@xxxxxxxxx>
- Subject: [Condor-users] USER_JOB_WRAPPER and Unix signals
Section 3.3.12 of the Condor 6.6 manual, in the section that
documents the USER_JOB_WRAPPER setting, says:
"This macro allows the administrator to specify a ''wrapper'' script to
handle the execution of all user jobs. ... This wrapper program must
ultimately replace its image with the user job; in other words, it must
exec() the user job, not fork() it."
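For concreteness, the exec()-style wrapper the manual describes might look something like the sketch below. It assumes the wrapper is handed the job's command line as its own arguments; the function name exec_job is hypothetical and nothing here is taken from Condor itself.

```python
import os
import sys


def exec_job(argv):
    """Replace the current process image with the job (argv = program + args).

    os.execvp() keeps the same PID, environment and open file descriptors,
    so whatever set up the wrapper's stdio and environment is inherited as is.
    """
    os.execvp(argv[0], argv)


# Assumption: the job's command line arrives as the wrapper's arguments.
if __name__ == "__main__" and len(sys.argv) > 1:
    # Any per-job setup would go here, before the process image is replaced.
    exec_job(sys.argv[1:])
```

Because the wrapper's image is replaced, there is only ever one process, so any signal sent to the wrapper's PID reaches the job directly.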
I have two questions about this:
1. When it says "all user jobs" does it REALLY mean all user jobs
regardless of the job's universe (so including Standard, Java, MPI, PVM
and Scheduler universe jobs)?
2. Are the reasons for exec()-ing the user job rather than fork()-ing it:
- To ensure that the user job inherits the environment Condor has
prepared for it, including environment variables and
redirection of standard error and standard output? Is there
anything else that needs to be preserved?
- So that Condor 'knows' which process (PID) to send the Unix
control signals to cause the job to suspend, checkpoint or
vacate as necessary?
... or are the reasons something else entirely? Or are there other
reasons in addition to the ones I've suggested above?
This leads me on to what I really want to know, which is: if my "wrapper"
(a) ensures it passes the environment variables to, and preserves
Condor's redirection of standard error and standard input for, the
user job, and
(b) traps signals from the Condor starter and passes them on to the
user job,
...can I fork() the user job instead of exec()-ing it, or will it all go
wrong?
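To make the question concrete, the kind of fork()-ing wrapper I have in mind is sketched below. It is only an illustration: the name run_and_forward and the list of forwarded signals are my own choices, and whether Condor tolerates such a wrapper is exactly what I am asking. Note that SIGKILL and SIGSTOP cannot be caught by any process, so a wrapper like this could never pass those two on.

```python
import os
import signal
import sys

# Catchable signals mentioned in this message. SIGKILL and SIGSTOP are left
# out because the kernel does not allow a process to catch them.
FORWARDED = (signal.SIGTERM, signal.SIGTSTP, signal.SIGCONT)


def run_and_forward(argv):
    """Fork, exec argv in the child, relay FORWARDED signals, return a status."""
    child = {"pid": None}

    def relay(signum, _frame):
        # Pass the signal on to the job once its PID is known.
        if child["pid"]:
            try:
                os.kill(child["pid"], signum)
            except ProcessLookupError:
                pass  # the job has already exited

    # Install handlers before forking so nothing hits the wrapper unhandled.
    for sig in FORWARDED:
        signal.signal(sig, relay)

    pid = os.fork()
    if pid == 0:
        # Child: environment and stdio are inherited from the wrapper; exec
        # resets the caught signals to their default actions.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            pass
        os._exit(127)  # only reached if exec failed

    child["pid"] = pid
    _, status = os.waitpid(pid, 0)  # retried automatically after a handler runs
    if os.WIFSIGNALED(status):
        return 128 + os.WTERMSIG(status)  # shell-style convention, my choice
    return os.WEXITSTATUS(status)


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_and_forward(sys.argv[1:]))
```

With this design there are two processes, and the PID that Condor knows about belongs to the wrapper rather than to the job, which is why the signal relaying matters.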
...and so I need to know what signals Condor will send to the user job -
trawling the manual seems to reveal the following:
cause a job in the Standard universe to checkpoint and then continue
- SIGTSTP (or the value of the KillSig ClassAd attribute):
cause a job in the Standard universe to try and gracefully shutdown
- SIGTERM (or the value of the KillSig ClassAd attribute):
cause a job in the Vanilla universe to try and gracefully shutdown,
i.e. normal Unix termination (noting that the program may catch
SIGTERM and try to clean up). Is this also true for jobs in the other
non-Standard (Java, MPI, PVM and Scheduler) universes?
kill (i.e. send the hard-kill signal to) the job, if the job takes too
long to gracefully shutdown or doesn't respond to the appropriate
...but what about when it suspends a user job? Does it send it a SIGSTOP?
Does it do anything else (as well/instead of)?
...and similarly when it unsuspends a user job does it send a SIGCONT?
Does it do anything else (as well/instead of)?
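The reason the suspend case matters to me is that, whatever Condor actually sends, a wrapper can only relay signals it is able to catch. The small check below (the helper name can_catch is hypothetical) illustrates the general Unix constraint: handlers can be installed for SIGTERM and SIGCONT, but not for SIGSTOP or SIGKILL.

```python
import signal


def can_catch(signum):
    """Return True if this process is allowed to install a handler for signum."""
    try:
        previous = signal.signal(signum, lambda s, f: None)
    except (OSError, ValueError):
        # The kernel refuses handlers for SIGKILL and SIGSTOP.
        return False
    if previous is not None:
        signal.signal(signum, previous)  # put the old handler back
    return True


for name in ("SIGTERM", "SIGCONT", "SIGTSTP", "SIGSTOP", "SIGKILL"):
    print(name, can_catch(getattr(signal, name)))
```

So if suspension is done with SIGSTOP sent to the wrapper's PID, a fork()-ing wrapper would itself be stopped while the job carried on running.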
Any help much appreciated!
University of Cambridge Computing Service.