
[HTCondor-users] vanilla jobs not starting under docker: condor 8.7.10




I am getting a strange error while starting simple vanilla jobs on workers that run
inside Docker containers. For reference, universe = docker jobs do work.

Note the line below: "Failed to unshare the mount namespace errno"

The StarterLog and my config are below.

I've spent a day looking at this and am losing hope.
Thanks,
Kris
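
My best guess at what that line means: unsharing a mount namespace requires
CAP_SYS_ADMIN, which the default Docker capability/seccomp profile does not grant,
so the call fails with EPERM, which would match the errno=1 further down. That is
just my reading, though, not something I've confirmed in the condor source. It
should be easy to check from a shell inside the worker container (assuming
util-linux's unshare tool is installed):

# run inside the worker container with the default Docker capability set
unshare -m /bin/true    # I expect this to fail with "Operation not permitted"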



root@condor-worker-43:/var/log/condor# cat StarterLog.slot1_1
05/08/19 19:09:38 (pid:166) ******************************************************
05/08/19 19:09:38 (pid:166) ** condor_starter (CONDOR_STARTER) STARTING UP
05/08/19 19:09:38 (pid:166) ** /usr/sbin/condor_starter
05/08/19 19:09:38 (pid:166) ** SubsystemInfo: name=STARTER type=STARTER(8) class=DAEMON(1)
05/08/19 19:09:38 (pid:166) ** Configuration: subsystem:STARTER local:<NONE> class:DAEMON
05/08/19 19:09:38 (pid:166) ** $CondorVersion: 8.7.10 Oct 31 2018 BuildID: Debian-8.7.10-1 Debian-8.7.10-1 $
05/08/19 19:09:38 (pid:166) ** $CondorPlatform: X86_64-Debian_9 $
05/08/19 19:09:38 (pid:166) ** PID = 166
05/08/19 19:09:38 (pid:166) ** Log last touched time unavailable (No such file or directory)
05/08/19 19:09:38 (pid:166) ******************************************************
05/08/19 19:09:38 (pid:166) Using config source: /etc/condor/condor_config
05/08/19 19:09:38 (pid:166) Using local config sources:
05/08/19 19:09:38 (pid:166)    /etc/condor/condor_config.local
05/08/19 19:09:38 (pid:166) config Macros = 79, Sorted = 78, StringBytes = 2184, TablesBytes = 2892
05/08/19 19:09:38 (pid:166) CLASSAD_CACHING is OFF
05/08/19 19:09:38 (pid:166) Daemon Log is logging: D_ALWAYS D_ERROR
05/08/19 19:09:38 (pid:166) SharedPortEndpoint: waiting for connections to named socket 113_5e5e_3
05/08/19 19:09:38 (pid:166) DaemonCore: command socket at <10.42.79.108:9886?addrs=10.42.79.108-9886&noUDP&sock=113_5e5e_3>
05/08/19 19:09:38 (pid:166) DaemonCore: private command socket at <10.42.79.108:9886?addrs=10.42.79.108-9886&noUDP&sock=113_5e5e_3>
05/08/19 19:09:38 (pid:166) Communicating with shadow <10.42.129.175:9886?addrs=10.42.129.175-9886&noUDP&sock=107_241d_1>
05/08/19 19:09:38 (pid:166) Submitting machine is "ip-10-42-129-175.us-west-2.compute.internal"
05/08/19 19:09:38 (pid:166) setting the orig job name in starter
05/08/19 19:09:38 (pid:166) setting the orig job iwd in starter
05/08/19 19:09:38 (pid:166) Chirp config summary: IO false, Updates false, Delayed updates true.
05/08/19 19:09:38 (pid:166) Initialized IO Proxy.
05/08/19 19:09:38 (pid:166) Done setting resource limits
05/08/19 19:09:39 (pid:166) File transfer completed successfully.
05/08/19 19:09:40 (pid:166) Job 1.0 set to execute immediately
05/08/19 19:09:40 (pid:166) Starting a VANILLA universe job with ID: 1.0
05/08/19 19:09:40 (pid:166) IWD: /var/lib/condor/execute/dir_166
05/08/19 19:09:40 (pid:166) Output file: /var/lib/condor/execute/dir_166/_condor_stdout
05/08/19 19:09:40 (pid:166) Error file: /var/lib/condor/execute/dir_166/_condor_stderr
05/08/19 19:09:40 (pid:166) Renice expr "0" evaluated to 0
05/08/19 19:09:40 (pid:166) About to exec /var/lib/condor/execute/dir_166/condor_exec.exe
05/08/19 19:09:40 (pid:166) Running job as user nobody
05/08/19 19:09:40 (pid:170) Failed to unshare the mount namespace errno
05/08/19 19:09:40 (pid:166) Warning: Create_Process: failed to read child process failure code
05/08/19 19:09:40 (pid:166) Create_Process(/var/lib/condor/execute/dir_166/condor_exec.exe): child failed with errno1 (Operation not permitted) before exec()
05/08/19 19:09:40 (pid:166) Create_Process(/var/lib/condor/execute/dir_166/condor_exec.exe,, ...) failed: (errno=1: 'Operation not permitted')
05/08/19 19:09:40 (pid:166) Failed to start job, exiting
05/08/19 19:09:40 (pid:166) ShutdownFast all jobs.
05/08/19 19:09:40 (pid:166) Failed to open '.update.ad' to read update ad: No such file or directory (2).
05/08/19 19:09:40 (pid:166) condor_read() failed: recv(fd=8) returned -1, errno = 104 Connection reset by peer, reading 5 bytes from <10.42.129.175:33495>.
05/08/19 19:09:40 (pid:166) IO: Failed to read packet header
05/08/19 19:09:40 (pid:166) Lost connection to shadow, waiting 2400 secs for reconnect
05/08/19 19:09:40 (pid:166) All jobs have exited... starter exiting
05/08/19 19:09:40 (pid:166) **** condor_starter (condor_STARTER) pid 166 EXITING WITH STATUS 0
root@condor-worker-43:/var/log/condor# apt-cache search libcgroup
libcgroup-dev - control and monitor control groups (development)
libcgroup1 - control and monitor control groups (library)
root@condor-worker-43:/var/log/condor# apt-cache policy libcgroup1




CONDOR_HOST = master
#CONDOR_HOST = master
COLLECTOR_NAME = GRID
COLLECTOR_HOST = $(CONDOR_HOST):9886?sock=collector
DAEMON_LIST = MASTER,STARTD,SHARED_PORT
# DAEMON_LIST = MASTER, SCHEDD, STARTD
# DAEMON_LIST = MASTER, SCHEDD
## When something goes wrong with condor at your site, who should get
## the email?

CONDOR_ADMIN          = admins@xxxxxxxx
#UID_DOMAIN           = viqi.org
#TRUST_UID_DOMAIN     = True
#SOFT_UID_DOMAIN      = TRUE
#FILESYSTEM_DOMAIN    = viqi.org
## Do you want to use NFS for file access instead of remote system calls
ALLOW_READ  = $(ALLOW_READ), 172.*, 10.*,
ALLOW_WRITE = $(ALLOW_WRITE), 172.*, 10.*,
ALLOW_NEGOTIATOR      = 172.*, 10.*,
#ALLOW_ADMINISTRATOR  = 172.*, 10.*,
#ALLOW_CONFIG         = 172.*,10.*,
#ALLOW_DAEMON         = 172.*,10.*,

# Use CCB with shared port so hosts outside the private network can talk to the workers
USE_SHARED_PORT = True
SHARED_PORT_ARGS = -p 9886
UPDATE_COLLECTOR_WITH_TCP = True
CCB_ADDRESS = $(COLLECTOR_HOST)
PRIVATE_NETWORK_NAME = VIQI
BIND_ALL_INTERFACES = True

SEC_DEFAULT_NEGOTIATION = NEVER
SEC_DEFAULT_AUTHENTICATION = NEVER
DISCARD_SESSION_KEYRING_ON_STARTUP = false
BASE_CGROUP =

#PER_JOB_NAMESPACES=False
#USE_PID_NAMESPACES=False
#USE_PROCD = false

# Slots for multi-cpu machines
NUM_SLOTS = 1
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = 100%
SLOT_TYPE_1_PARTITIONABLE = true

START = True
PREEMPT = False
SUSPEND = False
KILL = False
WANT_SUSPEND = False
WANT_VACATE = False
CONTINUE = True
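
For completeness, my current working theory for a fix is that the container the
worker runs in needs CAP_SYS_ADMIN (or --privileged) so the starter's unshare() can
succeed. Something along these lines, where condor-worker just stands in for
whatever image and run options we actually use; I have not verified this yet:

# start the execute-node container with the SYS_ADMIN capability
# (condor-worker is a placeholder for the real image/run options)
docker run --cap-add SYS_ADMIN condor-worker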


--
Kris Kvilekval, Ph.D.
ViQi Inc
(805)-699-6081