Re: [Condor-users] condor_rm not killing subprocesses

I thought I might include a quick example of the bash behavior I
mentioned.  Run test3.sh, then send various signals to it.  You'll see
they are not received by test2.sh; test2.sh will continue until it
receives a SIGKILL itself.


$ cat test2.sh

trap trap_int INT
trap trap_hup HUP
trap trap_term TERM
trap_int() { echo int; }
trap_hup() { echo hup; }
trap_term() { echo term; }

while (( 1 )); do true; done

$ cat test3.sh

for x in "0"; do
    bash test2.sh
done

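A self-contained sketch of that experiment (it recreates both scripts in a temp directory, so the paths here are illustrative):

```shell
# Recreate the two scripts above and show that SIGTERM delivered to
# test3.sh is not forwarded to test2.sh, which keeps running until it
# is sent SIGKILL directly.
cd "$(mktemp -d)"

cat > test2.sh <<'EOF'
trap 'echo int'  INT
trap 'echo hup'  HUP
trap 'echo term' TERM
while (( 1 )); do true; done
EOF

cat > test3.sh <<'EOF'
for x in "0"; do
    bash test2.sh
done
EOF

bash test3.sh &
pid=$!
sleep 1
kill -TERM "$pid"                  # signal only the parent script
sleep 1

still_running=0                    # did the child outlive its parent?
pgrep -f "bash test2.sh" >/dev/null && still_running=1
pkill -KILL -f "bash test2.sh"     # only SIGKILL actually stops it
```

The point of `still_running` is that test2.sh outlives a plain SIGTERM to its parent: the signal stops at test3.sh, and the child's TERM trap never even fires.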

Jacob Joseph wrote:
> Thanks for the reply.  I'm not sure it solves my troubles though.  Does
> condor send a SIGTERM only to the parent bash process it spawned?  If
> so, I can reproduce the behavior outside of condor by simply killing
> (SIGTERM) the bash script.  Bash does not forward this signal to
> processes started from within a loop.  I believe the correct terminology
> is that it is no longer the controlling shell.  The end result is that
> Condor never manages to deliver a signal to the subprocess, which
> continues running.
> What does work is to send a kill to all processes in the same process
> group ID. (kill does this with a negative <pgid> argument).  Is there a
> way to have condor do this as well?  Can condor be modified?  Can condor
> spawn my own script to accomplish this?
> -Jacob
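The process-group kill Jacob describes can be sketched outside of Condor like this (wrapper.sh is a stand-in for the users' loop scripts; this relies on bash job control via `set -m`):

```shell
# With job control enabled, a background job is placed in its own
# process group whose PGID equals the job leader's PID.  A negative
# PID argument then makes kill signal the entire group at once.
cd "$(mktemp -d)"

cat > wrapper.sh <<'EOF'
for x in 1 2 3; do
    sleep 12345 &           # long-running children, as in a user job
done
wait
EOF

set -m                      # enable job control: new PGID per job
bash wrapper.sh &
pid=$!                      # PGID of the whole job == $pid
sleep 1
kill -- "-$pid"             # SIGTERM to every process in the group
```

Unlike a plain `kill $pid`, this reaches the backgrounded sleep processes as well as the wrapper itself.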
> Mark Silberstein wrote:
>>It seems that your Condor setup doesn't give the program time to
>>finish cleanly when Condor evicts it - look at the KILL expression.
>>Usually Condor first tries to kill with SIGTERM, and then, when the
>>KILL expression becomes true, it kills with SIGKILL (-9).  It seems
>>that bash doesn't get a chance to clean up its child processes, which
>>it does when you kill with Ctrl-C.
>>You may also want to specify kill_sig=SIGQUIT, which will cause Condor
>>to kill the job with SIGQUIT first.
>>On Fri, 2005-06-03 at 01:18 -0400, Jacob Joseph wrote:
>>>Hi.  I have a number of users who have taken to wrapping their jobs
>>>within shell scripts.  Often, they'll use a for or while loop to execute
>>>a single command with various permutations.  When such a job is removed
>>>with condor_rm, the main script is killed, but subprocesses spawned from
>>>inside a loop will not be killed and will continue to run on the compute
>>>machine.  This naturally interferes with jobs which are later assigned
>>>to that machine.
>>>Does anyone know of a way to force bash subprocesses to be killed along
>>>with the parent upon removal with condor_rm?  (This behavior is not
>>>unique to condor_rm.  A kill to the parent also leaves the subprocess
>>>running.)
> _______________________________________________
> Condor-users mailing list
> Condor-users@xxxxxxxxxxx
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users