
Re: [HTCondor-users] Universe Docker: Cannot start container



I'm seeing the same thing and wondering if you ever solved this. I suspect it has to do with the account that condor_starter runs under, versus using root to run docker with this line in the local config file:

 

DOCKER = sudo /usr/bin/docker
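
One way to test this hypothesis is to run the Docker CLI as the account in question and compare the result with root. Below is a minimal Python sketch, not a definitive check. It assumes the relevant account is named `condor` (an assumption, adjust for your site), that the Docker binary is at /usr/bin/docker as in the config line above, and that passwordless `sudo -u` is allowed on the execute node. The helper names `docker_images_cmd` and `can_list_images` are hypothetical.

```python
import subprocess


def docker_images_cmd(user, docker="/usr/bin/docker"):
    """Build a command that lists local Docker images as `user`.

    `sudo -n` makes sudo fail instead of prompting for a password.
    """
    return ["sudo", "-n", "-u", user, docker, "images"]


def can_list_images(user, docker="/usr/bin/docker"):
    """Return True if `user` can run `docker images` successfully."""
    try:
        result = subprocess.run(
            docker_images_cmd(user, docker),
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        # sudo itself is not installed on this machine
        return False
    return result.returncode == 0
```

Calling `can_list_images("root")` and `can_list_images("condor")` on the execute node and comparing the two results would show whether the failure depends on the account. If only root succeeds, a permissions problem on the Docker socket is a plausible cause.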

 

 

-Sean

 

From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Matthias Schnepf
Sent: Friday, December 11, 2015 2:48 AM
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] Universe Docker: Cannot start container

 

Hello everybody

I want to start a job in the docker universe:

###### submit file ######
  universe                = docker
  docker_image            = debian
  executable              = /bin/cat
  arguments               = /etc/hosts
  should_transfer_files   = YES
  when_to_transfer_output = ON_EXIT
  output                  = out.$(Process)
  error                   = err.$(Process)
  log                     = log.$(Process)
  request_memory          = 100M

requirements = (Machine == "dockermachine")
queue 1
################

The log.0 file says:

000 (033.000.000) 12/11 11:13:18 Job submitted from host: <192.168.0.1:9618?addrs=192.168.0.1-9618&noUDP&sock=83868_5d5e_5>
...
001 (033.000.000) 12/11 11:13:27 Job executing on host: <192.168.0.2:9615?CCBID=192.168.0.1:9618%3faddrs%3d192.168.0.1-9618%26noUDP%26sock%3dcollector#1&addrs=192.168.0.2-9615&noUDP&sock=1933991_db28_5>
...
007 (033.000.000) 12/11 11:13:49 Shadow exception!
        Error from slot1@dockermachine: Cannot start container: invalid image name: debian
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
...
012 (033.000.000) 12/11 11:13:49 Job was held.
        Error from slot1@dockermachine: Cannot start container: invalid image name: debian
        Code 35 Subcode 0
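
When many jobs are affected, the hold reason and code can be pulled out of the job log with a small script. The following is a hand-rolled sketch that relies only on the line layout visible in the excerpt above (a `012` event header, an indented reason line, and an indented `Code N Subcode M` line). It is not an official parser, and the function name `parse_held_events` is hypothetical.

```python
import re

# Event header as seen in the excerpt, e.g. "012 (033.000.000) 12/11 11:13:49 ..."
EVENT_HEADER = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.\d+\)")
CODE_LINE = re.compile(r"^\s*Code (\d+) Subcode (\d+)")


def parse_held_events(log_text):
    """Collect 'Job was held' (event 012) entries from job log text."""
    events = []
    current = None
    for line in log_text.splitlines():
        header = EVENT_HEADER.match(line)
        if header:
            if header.group(1) == "012":
                current = {
                    "cluster": int(header.group(2)),
                    "proc": int(header.group(3)),
                    "reason": None,
                    "code": None,
                    "subcode": None,
                }
                events.append(current)
            else:
                current = None  # some other event type; skip its body
            continue
        if current is None:
            continue
        if line.strip() == "...":
            current = None  # "..." separates events in the log
            continue
        code = CODE_LINE.match(line)
        if code:
            current["code"] = int(code.group(1))
            current["subcode"] = int(code.group(2))
        elif line.strip() and current["reason"] is None:
            current["reason"] = line.strip()
    return events
```

Applied to the contents of log.0, this yields one dictionary per hold event, which makes it easy to group held jobs by reason across many log files.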



And the logfile from the host says:

##### /var/log/condor/StarterLog.slot1 #####
....
12/11/15 11:13:30 (pid:2173165) Starting a VANILLA universe job with ID: 33.0
12/11/15 11:13:30 (pid:2173165) Output file: /var/lib/condor/execute/dir_2173165/_condor_stdout
12/11/15 11:13:30 (pid:2173165) Error file: /var/lib/condor/execute/dir_2173165/_condor_stderr
12/11/15 11:13:30 (pid:2173165) lock_file returning ERROR, errno=9 (Bad file descriptor)
12/11/15 11:13:30 (pid:2173165) FileLock::obtain(1) failed - errno 9 (Bad file descriptor)
12/11/15 11:13:30 (pid:2173165) Found 2 entries in docker image cache.
12/11/15 11:13:30 (pid:2173165) lock_file returning ERROR, errno=9 (Bad file descriptor)
12/11/15 11:13:30 (pid:2173165) FileLock::obtain(2) failed - errno 9 (Bad file descriptor)
12/11/15 11:13:30 (pid:2173165) Process exited, pid=2173169, status=1
12/11/15 11:13:30 (pid:2173165) DockerProc::JobReaper()
12/11/15 11:13:30 (pid:2173165) Failed to create classad from Docker output (0).  Printing up to the first 9 (nonblank) lines.
12/11/15 11:13:30 (pid:2173165) Error: No such image or container: HTCJob33_0_slot1_PID2173165
12/11/15 11:13:31 (pid:2173165) Failed to create classad from Docker output (0).  Printing up to the first 9 (nonblank) lines.
......

#####

The Docker image debian has already been pulled on the host system. The directory /var/lib/condor is empty and is owned by condor.

Does anyone have an idea how to fix this problem?

Best regards
Matthias