
Re: [HTCondor-users] docker detection broken in centos 7



On 10/31/2019 1:23 PM, Dimitri Maziuk via HTCondor-users wrote:
> Todd, Brian,
> 
> I found out that centos-provided docker rpms don't work with condor:
> docker socket is not writable by a dedicated group. So I replaced them
> with the upstream docker-ce.
> 

Hi Dimitri,

Yep, the version of Docker (1.13) in the CentOS repos is ancient and
buggy.  Not only will it not work well out of the box with HTCondor, it
won't work well with many other things, and Docker.com recommends
against it.  Why it is still in the CentOS repo is beyond me.

FWIW, in the HTCondor Manual on Setting Up Docker ( see 
https://tinyurl.com/yxt8tzej ) it says "Acquire and install the 
docker-engine community edition by following the installations 
instructions from docker.com".  We can do better... e.g. HTCondor should 
give a nice error message if it detects a version of Docker older than 
we support. We will try to add that in.

> On some other hosts "hasdocker" is false and StarterLog contains
> 
>> 10/31/19 13:08:39 (pid:1798) '/usr/bin/docker info' did not exit successfully (code 256); the first line of output was 'WARNING: Error loading config file: /root/.docker/config.json: stat /root/.docker/config.json: permission denied'.

Strange!  Especially since you tried 'docker info' as user 'condor' and
all was well (this is what I was going to suggest first, but you beat me
to it!).

The interesting part of the error from docker may come after the first
line.  I am guessing the part about config.json is a red herring.

On an execute machine where HasDocker is False, please try setting

   STARTER_DEBUG = D_FULLDEBUG

in your condor_config and restart HTCondor.  I believe this will have 
the starter echo the full output (all the lines) from "docker info" into 
the starter log instead of just the first line.  This may give us a clue!
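
In the meantime, here is a rough Python sketch for capturing every line a
command prints, plus its exit code, so a leading WARNING line cannot hide
the real error.  The name run_probe is made up for illustration, and this
is not how the starter itself runs its probe; it is only a way to look at
the full output by hand.

```python
import subprocess


def run_probe(cmd):
    """Run cmd and return (exit_code, all_output_lines).

    stdout and stderr are merged so warnings and errors are both kept.
    run_probe is a hypothetical helper, not part of HTCondor.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except OSError as exc:
        # The binary is missing or not executable for this user.
        return 127, [str(exc)]
    return proc.returncode, proc.stdout.splitlines()
```

Calling run_probe(["/usr/bin/docker", "info"]) as the same user the
starter runs under, and printing every returned line, should show
whatever follows the config.json warning.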

Also, is HTCondor failing to detect Docker on a few machines?  Many
machines?  I.e., just wondering what percentage of your nodes this is
happening on...

> Any idea what's going wrong?
>

Hoping the additional output from D_FULLDEBUG will give us a clue.

Some other random guesses:

Is this happening on machine startup?  If so, maybe the HTCondor service
is probing Docker before Docker is fully up and running?
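
One way to test that guess is to poll the probe right after boot and see
how many tries it takes to succeed.  A minimal sketch, assuming a
hypothetical helper named wait_until_ok (the attempt count and delay are
arbitrary values, not HTCondor settings):

```python
import subprocess
import time


def wait_until_ok(cmd, attempts=10, delay=1.0):
    """Return the 1-based attempt on which cmd first exits 0, else None.

    wait_until_ok is a hypothetical helper for manual testing only.
    """
    for attempt in range(1, attempts + 1):
        try:
            result = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            if result.returncode == 0:
                return attempt
        except OSError:
            # Binary missing or not executable yet; count it as a failure.
            pass
        if attempt < attempts:
            time.sleep(delay)
    return None
```

If wait_until_ok(["/usr/bin/docker", "info"]) returns something larger
than 1 shortly after boot, that would point toward Docker not being ready
when it is first probed.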

Did you fully remove the ancient CentOS Docker 1.x before installing
Docker 19.x from docker.com?  Some googling reveals many tales of woe
from folks left with a half-old, half-new Docker setup.
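
A quick way to look for leftovers is to filter the installed package
names.  This sketch assumes the docker.com packages all have names
starting with "docker-ce" and that anything else starting with "docker"
came from the distro; both are assumptions, and the function name is
made up for illustration.

```python
def leftover_docker_packages(installed_names):
    """Return package names that look like Docker but are not docker-ce*.

    Assumption: upstream packages are named docker-ce*; any other name
    starting with "docker" is treated as a possible distro leftover.
    """
    leftovers = []
    for name in installed_names:
        name = name.strip()
        if name.startswith("docker") and not name.startswith("docker-ce"):
            leftovers.append(name)
    return sorted(leftovers)
```

The input can be the lines printed by rpm -qa --qf '%{NAME}\n'.  An empty
result does not prove the old install is fully gone (config files and the
old socket could remain), but a non-empty one is worth cleaning up.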

Greg likely has better ideas than me...

regards,
Todd