Re: [Condor-users] condor_rm not killing subprocesses



Thanks for the reply.  I'm not sure it solves my problem, though.  Does
Condor send SIGTERM only to the parent bash process it spawned?  If
so, I can reproduce the behavior outside of Condor by simply sending
SIGTERM to the bash script.  Bash does not forward this signal to
processes started from within a loop.  I believe the correct terminology
is that it is no longer the controlling shell.  The end result is that
the signal Condor sends never reaches the subprocess, which continues
running.
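
A minimal sketch of that reproduction outside of Condor, assuming bash and
a POSIX sh are available.  The job.sh script, the marker files, and the
`sh -c` command are hypothetical stand-ins for a real job; the plain
`kill -TERM` on the parent is only my assumption about what condor_rm
effectively does.

```shell
#!/bin/bash
# Hypothetical stand-in for a user's job script: a loop running one command per pass.
workdir=$(mktemp -d)
cat > "$workdir/job.sh" <<'EOF'
for i in 1 2 3; do
    # Stand-in for the real command; it leaves a marker file if it runs to completion.
    sh -c 'sleep 2; touch "$1"' sh "$0.done.$i"
done
EOF

bash "$workdir/job.sh" &
parent=$!
sleep 1                            # let the loop start its first subprocess

kill -TERM "$parent"               # signal only the parent bash process
wait "$parent" 2>/dev/null || true
sleep 2                            # long enough for the first subprocess to finish

# If the marker exists, the subprocess kept running after its parent was killed.
if [ -e "$workdir/job.sh.done.1" ]; then
    result="subprocess survived"
else
    result="subprocess was killed"
fi
echo "$result"
```

The marker file is used instead of checking the PID so that the result
does not depend on how quickly the orphaned process is reaped.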

What does work is to send the signal to every process in the same
process group (kill does this when given a negative <pgid> argument).
Is there a way to have Condor do this as well?  Can Condor be modified?
Can Condor spawn my own script to accomplish this?
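
A minimal sketch of the process-group kill, assuming bash.  The job.sh
script and marker files are hypothetical stand-ins for a real job, and
`set -m` is used here only as one way to put the job into its own process
group; I don't know whether Condor starts jobs that way.

```shell
#!/bin/bash
# Hypothetical stand-in for a user's job script: a loop running one command per pass.
workdir=$(mktemp -d)
cat > "$workdir/job.sh" <<'EOF'
for i in 1 2 3; do
    # Stand-in for the real command; it leaves a marker file if it runs to completion.
    sh -c 'sleep 2; touch "$1"' sh "$0.done.$i"
done
EOF

set -m                             # job control on: the background job gets its own process group
bash "$workdir/job.sh" &
pgid=$!                            # the job leader's PID is also the process group ID
set +m
sleep 1                            # let the loop start its first subprocess

kill -TERM -- "-$pgid"             # negative argument: signal every process in the group
wait "$pgid" 2>/dev/null || true
sleep 2                            # the subprocess would have finished by now had it survived

# No marker means the subprocess was terminated together with its parent.
if [ -e "$workdir/job.sh.done.1" ]; then
    result="subprocess survived"
else
    result="subprocess was killed"
fi
echo "$result"
```

The `--` keeps kill from reading the negative process group ID as an
option.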

-Jacob

Mark Silberstein wrote:
> It seems that your Condor setup doesn't give the program time to
> finish cleanly when Condor is evicting it - look at the KILL expression.
> Usually Condor first tries to kill with SIGTERM, and then, when the KILL
> expression is true, it will kill with -9. It seems that bash doesn't
> have a chance to clean up all its processes, which it does when you kill
> with Ctrl-C.
> You may also want to specify kill_sig=SIGQUIT, which will cause Condor
> to kill it with SIGQUIT first.
> 
> 
> 
> On Fri, 2005-06-03 at 01:18 -0400, Jacob Joseph wrote:
> 
>>Hi.  I have a number of users who have taken to wrapping their jobs
>>within shell scripts.  Often, they'll use a for or while loop to execute
>>a single command with various permutations.  When such a job is removed
>>with condor_rm, the main script is killed, but subprocesses spawned from
>>inside a loop will not be killed and will continue to run on the compute
>>machine.  This naturally interferes with jobs which are later assigned
>>to that machine.
>>
>>Does anyone know of a way to force bash subprocesses to be killed along
>>with the parent upon removal with condor_rm?  (This behavior is not
>>unique to condor_rm.  A kill to the parent also leaves the subprocess
>>running.)
>>
>>-Jacob
>>_______________________________________________
>>Condor-users mailing list
>>Condor-users@xxxxxxxxxxx
>>https://lists.cs.wisc.edu/mailman/listinfo/condor-users