
[HTCondor-users] Docker cannot inspect exited container



Hi,

When running jobs with the docker universe, I always get the following error message:

12/06/21 15:26:24 (pid:3819363) Create_Process succeeded, pid=3819365
12/06/21 15:26:24 (pid:3819363) Suspending all jobs.
12/06/21 15:26:24 (pid:3819363) DockerProc::Suspend() container 'HTCJob50126_0_slot1_1_PID3819363'
12/06/21 15:26:24 (pid:3819363) Docker invocation '/usr/bin/docker pause HTCJob50126_0_slot1_1_PID3819363' failed, printing first few lines of output.
12/06/21 15:26:24 (pid:3819363) Failed to suspend container 'HTCJob50126_0_slot1_1_PID3819363'.
12/06/21 15:26:24 (pid:3819363) Process exited, pid=3819365, status=0
12/06/21 15:26:24 (pid:3819363) Output file: streaming to remote file /work/scratch/schock/condor_logs/loc_50126.0_out.log
12/06/21 15:26:24 (pid:3819363) Error file: streaming to remote file /work/scratch/schock/condor_logs/loc_50126.0_err.log
12/06/21 15:26:24 (pid:3819363) Runnning: /usr/bin/docker start -a HTCJob50126_0_slot1_1_PID3819363
12/06/21 15:26:25 (pid:3819363) unhandled job exit: pid=3819365, status=0
12/06/21 15:26:25 (pid:3819363) Process exited, pid=3819385, status=0
12/06/21 15:26:25 (pid:3819363) Failed to create classad from Docker output (0).  Printing up to the first 9 (nonblank) lines.
12/06/21 15:26:26 (pid:3819363) Failed to create classad from Docker output (0).  Printing up to the first 9 (nonblank) lines.
12/06/21 15:26:27 (pid:3819363) Failed to create classad from Docker output (0).  Printing up to the first 9 (nonblank) lines.
12/06/21 15:26:28 (pid:3819363) Failed to create classad from Docker output (0).  Printing up to the first 9 (nonblank) lines.
12/06/21 15:26:29 (pid:3819363) Failed to create classad from Docker output (0).  Printing up to the first 9 (nonblank) lines.
12/06/21 15:26:30 (pid:3819363) Failed to create classad from Docker output (0).  Printing up to the first 9 (nonblank) lines.
12/06/21 15:26:31 (pid:3819363) Failed to create classad from Docker output (0).  Printing up to the first 9 (nonblank) lines.
12/06/21 15:26:32 (pid:3819363) Failed to create classad from Docker output (0).  Printing up to the first 9 (nonblank) lines.
12/06/21 15:26:34 (pid:3819363) Failed to create classad from Docker output (0).  Printing up to the first 9 (nonblank) lines.
12/06/21 15:26:35 (pid:3819363) Failed to create classad from Docker output (0).  Printing up to the first 9 (nonblank) lines.
12/06/21 15:26:36 (pid:3819363) Failed to create classad from Docker output (0).  Printing up to the first 9 (nonblank) lines.
12/06/21 15:26:37 (pid:3819363) Failed to create classad from Docker output (0).  Printing up to the first 9 (nonblank) lines.
12/06/21 15:26:38 (pid:3819363) Failed to create classad from Docker output (0).  Printing up to the first 9 (nonblank) lines.
12/06/21 15:26:39 (pid:3819363) Failed to create classad from Docker output (0).  Printing up to the first 9 (nonblank) lines.
12/06/21 15:26:40 (pid:3819363) Failed to create classad from Docker output (0).  Printing up to the first 9 (nonblank) lines.
12/06/21 15:26:41 (pid:3819363) Failed to create classad from Docker output (0).  Printing up to the first 9 (nonblank) lines.
12/06/21 15:26:42 (pid:3819363) Failed to create classad from Docker output (0).  Printing up to the first 9 (nonblank) lines.
12/06/21 15:26:43 (pid:3819363) Failed to create classad from Docker output (0).  Printing up to the first 9 (nonblank) lines.
12/06/21 15:26:44 (pid:3819363) Failed to create classad from Docker output (0).  Printing up to the first 9 (nonblank) lines.
12/06/21 15:26:45 (pid:3819363) Failed to create classad from Docker output (0).  Printing up to the first 9 (nonblank) lines.
12/06/21 15:26:46 (pid:3819363) Failed to inspect (for removal) container 'HTCJob50126_0_slot1_1_PID3819363'.
12/06/21 15:26:46 (pid:3819363) ERROR "Cannot inspect exited container" at line 452 in file /var/lib/condor/execute/slot1/dir_75677/userdir/.tmpN40LkE/condor-9.0.8/src/condor_starter.V6.1/docker_proc.cpp
12/06/21 15:26:46 (pid:3819363) ShutdownFast all jobs.
12/06/21 15:26:46 (pid:3819363) DockerProc::ShutdownFast() container 'HTCJob50126_0_slot1_1_PID3819363'
12/06/21 15:26:46 (pid:3819363) Docker invocation '/usr/bin/docker kill --signal 9 HTCJob50126_0_slot1_1_PID3819363' failed, printing first few lines of output

However, the job (in this case a simple nvidia-smi) still seems to run. Does anybody have an idea how to fix this?
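
In case it helps anyone reading along, here is a rough sketch for summarizing a log excerpt like the one above: it counts repeated messages and pulls out the Docker commands reported as failed. It is not part of HTCondor. The function name `summarize` and both regular expressions are my own assumptions, based only on the line format shown in the excerpt.

```python
import re
from collections import Counter

# Assumed line layout, taken from the excerpt above:
#   "MM/DD/YY HH:MM:SS (pid:N) message"
LINE_RE = re.compile(r"^(\d\d/\d\d/\d\d \d\d:\d\d:\d\d) \(pid:(\d+)\) (.*)$")

# Assumed wording of the "command failed" message, also from the excerpt.
FAILED_CMD_RE = re.compile(r"Docker invocation '([^']+)' failed")


def summarize(log_text):
    """Return (Counter of messages, list of failed docker commands)."""
    counts = Counter()
    failed = []
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # skip anything that does not look like a log line
        message = m.group(3)
        counts[message] += 1
        cmd = FAILED_CMD_RE.search(message)
        if cmd:
            failed.append(cmd.group(1))
    return counts, failed


if __name__ == "__main__":
    # A few lines copied from the excerpt above.
    sample = """\
12/06/21 15:26:24 (pid:3819363) Docker invocation '/usr/bin/docker pause HTCJob50126_0_slot1_1_PID3819363' failed, printing first few lines of output.
12/06/21 15:26:25 (pid:3819363) Failed to create classad from Docker output (0).  Printing up to the first 9 (nonblank) lines.
12/06/21 15:26:26 (pid:3819363) Failed to create classad from Docker output (0).  Printing up to the first 9 (nonblank) lines.
"""
    counts, failed = summarize(sample)
    for message, n in counts.most_common(3):
        print(n, message)
    print(failed)
```

Running it over the full log makes it easier to see how often the inspect step was retried before the starter gave up, and which docker commands (pause, kill) were reported as failed.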

Best,
Justus