Re: [HTCondor-users] docker detection broken in centos 7
- Date: Thu, 31 Oct 2019 21:21:43 +0000
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] docker detection broken in centos 7
On 10/31/2019 1:23 PM, Dimitri Maziuk via HTCondor-users wrote:
> Todd, Brian,
> I found out that centos-provided docker rpms don't work with condor:
> docker socket is not writable by a dedicated group. So I replaced them
> with the upstream docker-ce.
Yep, the version of docker (1.13) in the centos repos is ancient and
buggy. Not only will it not work well out of the box with HTCondor, it
won't work well with many other things, and Docker.com recommends
against it. Why it is still in the centos repo is beyond me.
FWIW, in the HTCondor Manual on Setting Up Docker ( see
https://tinyurl.com/yxt8tzej ) it says "Acquire and install the
docker-engine community edition by following the installations
instructions from docker.com". We can do better... e.g. HTCondor should
give a nice error message if it detects a version of Docker older than
we support. We will try to add that in.
> On some other hosts "hasdocker" is false and StarterLog contains
>> 10/31/19 13:08:39 (pid:1798) '/usr/bin/docker info' did not exit successfully (code 256); the first line of output was 'WARNING: Error loading config file: /root/.docker/config.json: stat /root/.docker/config.json: permission denied'.
Strange! Especially since you tried 'docker info' as user 'condor' and
all was well (this is what I was going to suggest first, but you beat me
to it!). The interesting part of the error from docker may come after
the first line... I am guessing the part about config.json is a red herring.
On an execute machine where HasDocker is False, please try setting
STARTER_DEBUG = D_FULLDEBUG
in your condor_config and restart HTCondor. I believe this will have
the starter echo the full output (all the lines) from "docker info" into
the starter log instead of just the first line. This may give us a clue!
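In the meantime, here is a minimal Python sketch for capturing every line of the 'docker info' output by hand on an affected node, rather than only the first line that ends up in the StarterLog. It is not how the starter itself probes Docker; the helper names probe() and summarize() are made up for illustration, and the /usr/bin/docker path is taken from the log line quoted above. Ideally run it as the same user the starter runs as.

```python
import subprocess


def probe(cmd, timeout=30):
    """Run cmd and return (exit_code, output_lines), with stderr merged into stdout."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout.splitlines()


def summarize(exit_code, lines):
    """One-line summary: exit code, first output line, and total line count."""
    first = lines[0] if lines else ""
    return f"exit code {exit_code}; first line: {first!r}; {len(lines)} line(s) total"


if __name__ == "__main__":
    try:
        code, lines = probe(["/usr/bin/docker", "info"])
    except (OSError, subprocess.TimeoutExpired) as exc:
        print(f"could not run docker info: {exc}")
    else:
        print(summarize(code, lines))
        # Print everything, since the real error may follow the first line.
        for line in lines:
            print(line)
```

If the first line is only the config.json warning, the lines after it should show what actually made the command fail.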
Also, is HTCondor failing to detect Docker on a few machines? Many
machines? I.e., just wondering what percentage of your nodes this affects.
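One rough way to get that number is sketched below in Python. It assumes 'condor_status -af Machine HasDocker' prints one "machine value" pair per slot, with "true" where Docker was detected and something else (e.g. "undefined" or "false") where it was not; the function name docker_summary() is made up for illustration.

```python
import subprocess


def docker_summary(status_output):
    """Parse 'machine value' lines; return (machines_without_docker, total, percent)."""
    has_docker = {}
    for line in status_output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        machine, value = fields
        # A machine counts as having Docker if any of its slots reports true.
        has_docker[machine] = has_docker.get(machine, False) or value.lower() == "true"
    missing = sorted(m for m, ok in has_docker.items() if not ok)
    total = len(has_docker)
    percent = 100.0 * len(missing) / total if total else 0.0
    return missing, total, percent


if __name__ == "__main__":
    try:
        out = subprocess.run(
            ["condor_status", "-af", "Machine", "HasDocker"],
            stdout=subprocess.PIPE,
            text=True,
        ).stdout
    except OSError as exc:
        print(f"could not run condor_status: {exc}")
    else:
        missing, total, percent = docker_summary(out)
        print(f"{len(missing)} of {total} machines lack HasDocker ({percent:.1f}%)")
        for machine in missing:
            print(machine)
```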
> Any idea what's going wrong?
Hoping the additional info from D_FULLDEBUG above will give us a clue.
Some other random guesses:
Is this happening on machine startup? If so, maybe the HTCondor service
is probing Docker before Docker is fully up and running?
Did you fully remove the ancient CentOS Docker 1.x before installing
Docker 19.x from docker.com? Some googling reveals many tales of woe
from folks left with a half-old, half-new Docker setup.
Greg likely has better ideas than me...