
Re: [HTCondor-users] jobs fail to start after update from 8.6.13 to 8.8.1

The default value of the configuration knob MOUNT_UNDER_SCRATCH changed between 8.6 and 8.8.
The new default value is:

MOUNT_UNDER_SCRATCH = /tmp,/var/tmp

Since your execute directory is under /tmp, the attempt to mount /tmp into the job sandbox is recursive,
which causes problems. I'm surprised it doesn't fail earlier.

You can fix this either by adding this to your configuration


or this


Or you can fix it by moving your execute directory so that it is no longer under /tmp.
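
To check whether a given execute directory collides with the MOUNT_UNDER_SCRATCH list, a small script along these lines may help. It is only a sketch: the function name conflicting_mounts is hypothetical, the comparison is purely lexical (symlinks are not resolved), and it does not read the HTCondor configuration itself, so both values have to be passed in by hand.

```python
from pathlib import PurePosixPath


def conflicting_mounts(execute_dir, mount_under_scratch):
    """Return the MOUNT_UNDER_SCRATCH entries that contain execute_dir.

    execute_dir: value of EXECUTE, e.g. "/tmp/condor/execute"
    mount_under_scratch: comma-separated list, e.g. "/tmp,/var/tmp"
    """
    execute = PurePosixPath(execute_dir)
    conflicts = []
    for entry in mount_under_scratch.split(","):
        entry = entry.strip()
        if not entry:
            continue  # skip empty items such as a trailing comma
        mount = PurePosixPath(entry)
        # Conflict if EXECUTE is the mount point itself or lies below it.
        if execute == mount or mount in execute.parents:
            conflicts.append(entry)
    return conflicts


if __name__ == "__main__":
    # Values taken from this thread: EXECUTE and the new 8.8 default.
    print(conflicting_mounts("/tmp/condor/execute", "/tmp,/var/tmp"))
```

With the values from this thread the script reports /tmp as the conflicting entry. An execute directory outside /tmp and /var/tmp yields an empty list.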


-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Laurent Wandrebeck
Sent: Thursday, March 21, 2019 4:35 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] jobs fail to start after update from 8.6.13 to 8.8.1

Hi there,

We've been happily running HTCondor for quite a while on CentOS 7.
After the update to 8.8.1, jobs now fail to start. It is a simple setup: one
master and some execute nodes.

Everything seems to be related to EXECUTE, which is
/tmp/condor/execute, defined as x /tmp/condor/execute in

on an execute node:
03/21/19 10:14:08 (pid:22633) Job 4443.944 set to execute immediately
03/21/19 10:14:08 (pid:22633) Starting a VANILLA universe job with ID: 4443.944
03/21/19 10:14:08 (pid:22633) Current mount, /tmp, is shared.
03/21/19 10:14:08 (pid:22633) Current mount, /, is shared.
03/21/19 10:14:08 (pid:22633) IWD: /tmp/condor/execute/dir_22633
03/21/19 10:14:08 (pid:22633) Renice expr "0" evaluated to 0
03/21/19 10:14:08 (pid:22633) About to exec /tmp/condor/execute/dir_22633/condor_exec.exe 
03/21/19 10:14:08 (pid:22633) Running job as user low
03/21/19 10:14:08 (pid:22633) Warning: Create_Process: failed to read child process failure code
03/21/19 10:14:08 (pid:22633) Create_Process(/tmp/condor/execute/dir_22633/condor_exec.exe,, ...) failed: (errno=2: 'No such file or directory')
03/21/19 10:14:08 (pid:22633) Failed to start job, exiting
03/21/19 10:14:08 (pid:22633) ShutdownFast all jobs.
03/21/19 10:14:08 (pid:22633) condor_read() failed: recv(fd=13) returned -1, errno = 104 Connection reset by peer, reading 5 bytes from <>.
03/21/19 10:14:08 (pid:22633) IO: Failed to read packet header
03/21/19 10:14:08 (pid:22633) Lost connection to shadow, waiting 2400 secs for reconnect
03/21/19 10:14:08 (pid:22633) All jobs have exited... starter exiting
03/21/19 10:14:08 (pid:22633) **** condor_starter (condor_STARTER) pid 22633 EXITING WITH STATUS 0

Any ideas? (SELinux is not the culprit.)
Laurent Wandrebeck
HYGEOS, Earth Observation Department / Observation de la Terre
165 Avenue de Bretagne
59000 Lille, France
tel: +33 3 20 08 24 98
