
Re: [HTCondor-users] condor_rm & the docker universe

On 7/30/2015 1:53 PM, Brian Bockelman wrote:

On Jul 30, 2015, at 10:01 AM, andrew.lahiff@xxxxxxxxxx wrote:

Hi Greg,

Ok, I didn't realize it worked like this - I had assumed HTCondor
would do something like "docker stop", rather than send a signal to
the actual executable running inside the container. Isn't this
rather unsafe? It makes it very easy for people to run jobs which
escape HTCondor's control - according to HTCondor the job has been
killed but the Docker container continues running for as long as it

PID 1 is a very special creature.

In addition to having the somewhat-bizarre signal handling mentioned
below (PS Greg - why don't we use the same trick here as in vanilla
universe to avoid the problem there?)

Hi Brian,

Do you mean why don't we invoke the job via the $(LIBEXEC)/condor_pid_ns_init wrapper like we do when running a vanilla job in a private pid namespace? I asked Greg the same thing. The issue is that condor_pid_ns_init is dynamically linked with a bunch of libraries we cannot ensure will exist in the docker image. I suppose one way around this would be to statically link condor_pid_ns_init; I'm not sure how involved that change is to the build/cmake process. Looking at the source for condor_pid_ns_init (see https://goo.gl/P8ejqY - Andrew, you may find this interesting...), it doesn't look like it really uses much beyond libc (i.e. it doesn't use or need HTCondor libraries per se), so maybe this isn't as hard as it may seem. Of course, an ldd of condor_pid_ns_init shows it is dynamically linked with all kinds of things it doesn't really need, but I think that is just an artifact of our cmake setup.


Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685