
Re: [HTCondor-users] condor_rm & the docker universe

Hi Greg,

OK, I didn't realize it worked like this; I had assumed HTCondor would do something like "docker stop" rather than send a signal to the actual executable running inside the container. Isn't this rather unsafe? It makes it very easy for people to run jobs that escape HTCondor's control: according to HTCondor the job has been killed, but the Docker container keeps running for as long as it wants.

Just running the job under a shell doesn't seem to work either. I've also been trying scripts that catch SIGTERM, but I haven't managed to get them to have any effect either. Still looking at it...


From: HTCondor-users [htcondor-users-bounces@xxxxxxxxxxx] on behalf of Greg Thain [gthain@xxxxxxxxxxx]
Sent: Wednesday, July 29, 2015 5:30 PM
To: htcondor-users@xxxxxxxxxxx
Subject: Re: [HTCondor-users] condor_rm & the docker universe


I think what's going on here is that docker uses linux pid namespaces,
and your job runs with pid 1 inside the namespace.  The Linux kernel has
a (mis)feature wherein it does not deliver signals to pid 1 if there is
no signal handler for that signal installed (for handle-able signals).
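
A minimal sketch of what "catching SIGTERM" means for a job that runs as pid 1, written in Python purely for illustration. `cleanup()` is a hypothetical placeholder, and exiting with 128 + the signal number is just the usual shell convention. Note that the Python interpreter installs a handler for SIGINT by default but not for SIGTERM, so an unmodified Python job running as pid 1 would ignore condor's SIGTERM in exactly the way described above.

```python
import signal
import sys
import time


def cleanup() -> None:
    # Hypothetical placeholder: flush output files, remove temp data, etc.
    print("cleaning up before exit", flush=True)


def handle_term(signum, frame):
    # With a handler installed, the kernel delivers SIGTERM/SIGQUIT even
    # when this process is pid 1 inside the container's pid namespace.
    cleanup()
    sys.exit(128 + signum)  # e.g. 143 for SIGTERM, by shell convention


def install_handlers() -> None:
    for sig in (signal.SIGTERM, signal.SIGQUIT):
        signal.signal(sig, handle_term)


def main() -> None:
    install_handlers()
    while True:  # stand-in for the job's real work loop
        time.sleep(1)
```

The container's entrypoint would call `main()`; nothing here depends on how the docker universe actually launches the job.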

Condor, by default, sends SIGTERM on remove (and on preemption and
eviction) so that the job can clean up gracefully. If the job still
hasn't exited after a longer timeout, condor then sends SIGKILL, which
can't be caught and which the kernel will deign to deliver correctly.
During this interval, the job will be in the X state.
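
The remove sequence above (a catchable SIGTERM, then SIGKILL if the job outlives a grace period) is the same pattern `docker stop` uses. The sketch below shows that escalation from the sender's side with Python's `subprocess` module. It illustrates the pattern only and is not HTCondor's implementation; `stop_gracefully` and `grace_seconds` are hypothetical names, not HTCondor configuration knobs.

```python
import signal
import subprocess


def stop_gracefully(proc: subprocess.Popen, grace_seconds: float) -> int:
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL.

    Returns the child's return code (-N if it was killed by signal N).
    """
    proc.send_signal(signal.SIGTERM)  # polite request: the job may clean up
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL: cannot be caught or ignored
        return proc.wait()
```

A job that ignores SIGTERM simply sits out the grace period and is then killed, which is the interval during which condor shows it in the X state.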

I believe the job will exit promptly if it catches SIGTERM/SIGQUIT.
Perhaps the easiest way to do this is to run it under a shell.
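
Running the job under a shell only helps if the shell process itself installs a handler, and bash, for one, defers a trapped signal until the foreground command it is waiting on has completed. A common workaround is a tiny pid-1 wrapper that catches the signals and relays them to the real job. The sketch below is a hypothetical Python version of such a wrapper; `run_forwarding`, `exit_status`, and `main` are illustrative names, not part of HTCondor or Docker.

```python
import signal
import subprocess
import sys
from typing import Optional, Sequence


def exit_status(returncode: int) -> int:
    # Popen reports "killed by signal N" as -N; shells report it as 128 + N.
    return 128 - returncode if returncode < 0 else returncode


def run_forwarding(cmd: Sequence[str]) -> int:
    """Run cmd as a child process and relay SIGTERM/SIGQUIT to it."""
    child: Optional[subprocess.Popen] = None

    def forward(signum, frame):
        # This process has a handler, so the kernel delivers the signal here
        # even as pid 1; pass the same signal on to the real job.
        if child is not None:
            child.send_signal(signum)

    # Install handlers first; in this sketch a signal that arrives before
    # the job has started is simply dropped.
    for sig in (signal.SIGTERM, signal.SIGQUIT):
        signal.signal(sig, forward)

    child = subprocess.Popen(list(cmd))
    return child.wait()  # -N if the job died from signal N


def main() -> None:
    # Hypothetical entrypoint: wrapper.py <job command> [args...]
    sys.exit(exit_status(run_forwarding(sys.argv[1:])))
```

The wrapper mirrors the job's exit status, so condor sees the job's own result rather than the wrapper's.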
