
[HTCondor-users] condor_rm & the docker universe



Hi,

Has anyone been able to successfully kill a docker universe job using "condor_rm"? When I try this (with 8.3.6 and also 8.3.7), the job just stays in the X (removed) state:

[root@vm168 condor]# condor_q


-- Schedd: vm168.nubes.stfc.ac.uk : <130.246.221.109:38993>
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
 115.0   alahiff         7/28 11:42   0+00:00:00 X  0   0.0  sleep 10000

1 jobs; 0 completed, 1 removed, 0 idle, 0 running, 0 held, 0 suspended

while the container keeps on running:

[root@vm168 condor]# docker ps
CONTAINER ID        IMAGE               COMMAND              CREATED             STATUS              PORTS               NAMES
261d23866b71        busybox             "/bin/sleep 10000"   3 minutes ago       Up 3 minutes                            HTCJob115_0_slot1_1_PID8325

In ShadowLog I see:

07/28/15 11:42:34 (fd:6) (pid:8324) (D_ALWAYS) (115.0) (8324): Requesting graceful removal of job.

but nothing else.

Note that the job does eventually disappear from condor_q, about 10 minutes later (i.e. condor thinks that the job has finished being removed), but the container itself continues running (!)
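
For anyone hitting the same thing, the leftover container can be removed by hand. The following is only a rough sketch of a workaround, not a fix: it assumes the container name always starts with HTCJob<cluster>_<proc>_ (as in the "docker ps" output above) and that the installed Docker supports the "name" filter for "docker ps". The function names are made up for illustration.

```shell
#!/bin/sh
# Build the assumed container-name prefix for a job ID such as "115.0".
# The HTCJob<cluster>_<proc>_ pattern is inferred from the docker ps output
# above; it is not taken from any HTCondor documentation.
htc_container_prefix() {
    # ${1%%.*} is the cluster ID (before the dot), ${1#*.} the proc ID (after it).
    printf 'HTCJob%s_%s_' "${1%%.*}" "${1#*.}"
}

# Force-remove any running container whose name contains that prefix.
# (Hypothetical helper; requires permission to talk to the Docker daemon.)
htc_kill_container() {
    prefix=$(htc_container_prefix "$1")
    ids=$(docker ps -q --filter "name=${prefix}")
    if [ -n "$ids" ]; then
        # $ids is left unquoted on purpose so each container ID is its own argument.
        docker rm -f $ids
    else
        echo "no running container matching ${prefix}" >&2
    fi
}

# Example (run on the execute node, as root):
#   htc_kill_container 115.0
```

This only cleans up after the fact; it does not explain why the graceful removal request never reaches the container.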

I'm using Docker 1.7.1 on SL7.

Many Thanks,
Andrew.