
Re: [HTCondor-users] condor_rm & the docker universe


I think what's going on here is that Docker uses Linux PID namespaces, and your job runs as pid 1 inside its namespace. The Linux kernel has a (mis)feature wherein it does not deliver a catchable signal to pid 1 unless the process has installed a handler for that signal.

Condor, by default, sends SIGTERM on remove (and on preemption and eviction), so that the job can clean up gracefully. To be sure, if the job hasn't exited after a longer timeout, condor will send SIGKILL, which can't be caught, and which the kernel will deign to correctly deliver. During this interval, the job will be in the X state.

I believe the job will exit promptly if it catches SIGTERM/SIGQUIT. Perhaps the easiest way to do this is to run it under a shell.
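
If the job happens to be a Python program, another option is to have it catch the signals itself. Below is a minimal sketch, assuming the job is the container's pid 1 and that a polling loop is an acceptable stand-in for the real work. The names state, request_shutdown, install_handlers and main are made up for illustration and are not part of HTCondor or Docker.

```python
import signal
import time

# Hypothetical shared flag; the handler flips it and the main loop polls it.
state = {"shutdown": False}


def request_shutdown(signum, frame):
    """Signal handler: only record that a shutdown was requested."""
    state["shutdown"] = True


def install_handlers():
    """Install handlers so the kernel will deliver these signals to pid 1."""
    for sig in (signal.SIGTERM, signal.SIGQUIT):
        signal.signal(sig, request_shutdown)


def main(poll_interval=1.0):
    """Run until a shutdown is requested, then return an exit code."""
    install_handlers()
    while not state["shutdown"]:
        time.sleep(poll_interval)  # placeholder for one unit of real work
    # Graceful cleanup (flushing output, removing temp files) would go here.
    return 0
```

The job's entry point would then end with something like sys.exit(main()). The handler does nothing but set a flag, so cleanup runs in the main loop rather than inside the handler.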