Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] condor_rm & the docker universe

Date: Thu, 30 Jul 2015 15:01:37 +0000
From: andrew.lahiff@xxxxxxxxxx
Subject: Re: [HTCondor-users] condor_rm & the docker universe

Hi Greg,

Ok, I didn't realize it worked like this. I had assumed HTCondor would do something like "docker stop" rather than send a signal to the actual executable running inside the container. Isn't this rather unsafe? It makes it very easy for people to run jobs that escape HTCondor's control: according to HTCondor the job has been killed, but the Docker container continues running for as long as it wants.

Just running the job under a shell doesn't seem to work either. I've also been trying scripts that catch SIGTERM, but I haven't managed to get this to have any effect either. Still looking at it...

Thanks,
Andrew.

________________________________________
From: HTCondor-users [htcondor-users-bounces@xxxxxxxxxxx] on behalf of Greg Thain [gthain@xxxxxxxxxxx]
Sent: Wednesday, July 29, 2015 5:30 PM
To: htcondor-users@xxxxxxxxxxx
Subject: Re: [HTCondor-users] condor_rm & the docker universe

Andrew:

I think what's going on here is that Docker uses Linux PID namespaces,
and your job runs as PID 1 inside the namespace.  The Linux kernel has
a (mis)feature wherein it does not deliver a signal to PID 1 if no
handler for that signal is installed (for handle-able signals).

Condor, by default, sends SIGTERM on removal (and on preemption and
eviction) so that the job can clean up gracefully. To be sure, if
the job hasn't exited after a longer timeout, condor will send SIGKILL,
which can't be caught, and which the kernel will deign to correctly
deliver.  During this interval, the job will be in the X state.
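
As a rough illustration of the SIGTERM-then-SIGKILL sequence described above, here is a minimal Python sketch. It is not HTCondor's actual code: the helper name `stop_gracefully` and the `grace_seconds` parameter are made up for this example, and it acts on an ordinary child process rather than a container.

```python
import signal
import subprocess


def stop_gracefully(proc, grace_seconds):
    """Ask proc to exit with SIGTERM; fall back to SIGKILL after a timeout.

    Hypothetical helper for illustration only. Returns the child's return
    code; a negative value -N means the child was ended by signal N.
    """
    proc.send_signal(signal.SIGTERM)      # polite request, can be caught
    try:
        proc.wait(timeout=grace_seconds)  # give the job time to clean up
    except subprocess.TimeoutExpired:
        proc.kill()                       # SIGKILL cannot be caught or ignored
        proc.wait()
    return proc.returncode
```

A process that ignores SIGTERM survives the first step and is only ended by the SIGKILL fallback, which roughly mirrors the interval in which the job sits in the X state.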

I believe the job will exit promptly if it catches SIGTERM/SIGQUIT.
Perhaps the easiest way to do this is to run it under a shell.
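
One way to make sure a handler is installed is to put a small wrapper in front of the real job, so that the wrapper is PID 1 and passes the signal on. The following is a minimal Python sketch under that assumption; the names `run_with_forwarding` and `exit_status` are hypothetical, and nothing here is specific to HTCondor or Docker.

```python
import signal
import subprocess


def run_with_forwarding(argv):
    """Run argv as a child process, forwarding SIGTERM/SIGQUIT to it.

    Returns the child's return code (negative if a signal ended it).
    """
    child = None

    def forward(signum, _frame):
        # Having a handler installed is what lets the signal reach PID 1,
        # per the explanation above; here it is simply passed to the job.
        if child is not None:
            child.send_signal(signum)

    # Install the handlers before starting the job so no signal is missed.
    for sig in (signal.SIGTERM, signal.SIGQUIT):
        signal.signal(sig, forward)

    child = subprocess.Popen(argv)
    return child.wait()


def exit_status(returncode):
    """Map a Popen return code to a shell-style exit status (128 + signal)."""
    return returncode if returncode >= 0 else 128 - returncode
```

A wrapper script built on this would end with something like `sys.exit(exit_status(run_with_forwarding(sys.argv[1:])))` and be used as the container's entry point, with the real job command as its arguments.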

-Greg

Follow-Ups:
- Re: [HTCondor-users] condor_rm & the docker universe
  - From: Brian Bockelman
- Re: [HTCondor-users] condor_rm & the docker universe
  - From: Dimitri Maziuk

References:
- [HTCondor-users] condor_rm & the docker universe
  - From: andrew . lahiff
- Re: [HTCondor-users] condor_rm & the docker universe
  - From: Greg Thain
