
Re: [HTCondor-users] A couple of questions regarding running jobs with docker containers (Q1)

On 3/26/21 5:36 AM, jcaballero.hep@xxxxxxxxx wrote:

I have a couple of questions regarding running jobs in Docker
containers. Here is the first one.

I am testing HTCondor 8.8.12 with Docker 18.03.0.

While printing some custom logs from the wrapper set in the DOCKER
config variable, I noticed that things do not always work.
Sometimes HTCondor decides to kill the container after a few seconds,
as can be seen here [*].
For example, the container started at "Thu Mar 25 22:55:57 2021" was
terminated at "Thu Mar 25 23:04:07 2021".

Note that I am running one job at a time on that host.


HTCondor will kill the container, just as it kills any running job, when asked to by a condor_rm, a preemption, or a similar event. My first guess is that this is what is happening here. The StartdLog should have more details.
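
If the StartdLog is long, a small filter can help narrow it down to the lines around the time the container was terminated. The following is only a rough sketch: the keyword list is my assumption about which words are likely to appear near an eviction, not an official or complete list of StartdLog messages, and the function name matching_lines is made up for this example.

```python
import re
import sys

# Assumed keywords that may appear near an eviction; adjust as needed.
# This is not an official list of StartdLog message strings.
KEYWORDS = ("preempt", "vacat", "kill", "evict")


def matching_lines(lines, keywords=KEYWORDS):
    """Return the lines that contain any keyword, case-insensitively."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_startd_log.py /path/to/StartdLog
    with open(sys.argv[1], errors="replace") as fh:
        for line in matching_lines(fh):
            print(line)
```

The path to the log on the execute machine can be looked up with "condor_config_val STARTD_LOG". Comparing the timestamps of the matching lines against the termination time reported by the wrapper should show whether the kill was requested from outside the job.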

In 8.9, we introduced Tickets of Execution in the job ad, which give more detail about why the job left the machine.