
[HTCondor-users] Endless running docker jobs



Hi all,

We are successfully running the docker universe on many of our resources. Sometimes the job in the docker container has finished, but HTCondor doesn't recognize this. Sometimes HTCondor loses the information about the PID and changes the executable (PROGRAM) from docker:./condor_exec.exe to the job ID. This results in an endlessly running docker job.

condor_who job running:
OWNER  CLIENT      SLOT  JOB        RUNTIME     PID    PROGRAM
user   submitnode  1_13  2925686.0  0+02:30:11  11868  docker:./condor_exec.exe

condor_who job finished:
OWNER  CLIENT      SLOT  JOB        RUNTIME     PID    PROGRAM
user   submitnode  1_13  2925686.0  0+02:30:11         2925686.0

As expected, docker ps -a shows no docker container when the job is finished.
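
To spot affected slots, the symptom could be checked for automatically. The following is only a rough sketch: it assumes the whitespace-separated condor_who column layout shown above, and find_stale_slots is a hypothetical helper name, not part of HTCondor. It flags every row whose PROGRAM column contains nothing but the job ID.

```python
import re

# A job ID as it appears in the JOB column, e.g. "2925686.0".
JOB_ID = re.compile(r"^\d+\.\d+$")


def find_stale_slots(condor_who_output):
    """Return (slot, job_id) pairs whose PROGRAM column is just the job ID.

    Assumption: columns are OWNER CLIENT SLOT JOB RUNTIME PID PROGRAM,
    separated by whitespace, and the PID column may be empty.
    """
    stale = []
    for line in condor_who_output.splitlines():
        fields = line.split()
        # Skip blank lines, short lines, and the header row.
        if len(fields) < 6 or fields[0] == "OWNER":
            continue
        slot, job_id = fields[2], fields[3]
        program = fields[-1]
        # In the broken state the PROGRAM column repeats the job ID.
        if JOB_ID.match(job_id) and program == job_id:
            stale.append((slot, job_id))
    return stale
```

The text passed in would be the captured output of condor_who on the execute node; a reported slot could then be cross-checked against docker ps -a before anything is cleaned up by hand.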

We run HTCondor 8.6.5 and docker 17.05.0-ce.

Is this a known issue and is there any solution?

Cheers,

Matthias