Re: [HTCondor-users] jobs fail to start after update from 8.6.13 to 8.8.1
- Date: Thu, 21 Mar 2019 16:13:59 +0000
- From: John M Knoeller <johnkn@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] jobs fail to start after update from 8.6.13 to 8.8.1
The default value of the configuration knob MOUNT_UNDER_SCRATCH changed between 8.6 and 8.8.
The new default value is
MOUNT_UNDER_SCRATCH = /tmp,/var/tmp
Since your execute directory is under /tmp, the attempt to mount /tmp into the job sandbox is recursive,
which causes problems. I'm surprised it doesn't fail earlier.
You can fix this by adding either this to your configuration:
MOUNT_UNDER_SCRATCH = /var/tmp
or this:
MOUNT_UNDER_SCRATCH =
Or you can fix it by moving your execute directory so that it is no longer under /tmp.
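
If you want to check other nodes for the same conflict, here is a minimal Python sketch. It assumes MOUNT_UNDER_SCRATCH is set to a plain comma-separated list of directories, as in the values above, and that you pass in the EXECUTE and MOUNT_UNDER_SCRATCH values yourself (for example, as reported by condor_config_val). The function name conflicting_mounts is made up for illustration and is not part of HTCondor.

```python
from pathlib import PurePosixPath


def conflicting_mounts(execute_dir, mount_under_scratch):
    """Return the MOUNT_UNDER_SCRATCH entries that contain execute_dir.

    Hypothetical helper for illustration only; it is not part of HTCondor.
    Assumes mount_under_scratch is a plain comma-separated list.
    """
    execute = PurePosixPath(execute_dir)
    # An empty value means nothing is mounted under scratch.
    entries = [e.strip() for e in mount_under_scratch.split(",") if e.strip()]
    conflicts = []
    for entry in entries:
        mount = PurePosixPath(entry)
        # The execute directory is at or below the mount point if the mount
        # point is the execute directory itself or one of its ancestors.
        if mount == execute or mount in execute.parents:
            conflicts.append(entry)
    return conflicts


if __name__ == "__main__":
    # Values from this thread: the 8.8 default, then the suggested override.
    print(conflicting_mounts("/tmp/condor/execute", "/tmp,/var/tmp"))
    print(conflicting_mounts("/tmp/condor/execute", "/var/tmp"))
```

With the values from this thread, the first call reports /tmp as the conflicting entry and the second reports none. The comparison is done on path components, so a directory such as /tmpfoo would not be mistaken for something under /tmp.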
-tj
-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Laurent Wandrebeck
Sent: Thursday, March 21, 2019 4:35 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] jobs fail to start after update from 8.6.13 to 8.8.1
Hi there,
We've been happily running HTCondor for quite a while on CentOS 7.
After updating to 8.8.1, jobs now fail to start. It is a simple setup: one
master and some execute nodes.
Everything seems to be related to EXECUTE, which is
/tmp/condor/execute, defined as x /tmp/condor/execute in
etc/tmpfiles.d/condor.conf.
On an execute node:
03/21/19 10:14:08 (pid:22633) Job 4443.944 set to execute immediately
03/21/19 10:14:08 (pid:22633) Starting a VANILLA universe job with ID: 4443.944
03/21/19 10:14:08 (pid:22633) Current mount, /tmp, is shared.
03/21/19 10:14:08 (pid:22633) Current mount, /, is shared.
03/21/19 10:14:08 (pid:22633) IWD: /tmp/condor/execute/dir_22633
03/21/19 10:14:08 (pid:22633) Renice expr "0" evaluated to 0
03/21/19 10:14:08 (pid:22633) About to exec /tmp/condor/execute/dir_22633/condor_exec.exe
03/21/19 10:14:08 (pid:22633) Running job as user low
03/21/19 10:14:08 (pid:22633) Warning: Create_Process: failed to read child process failure code
03/21/19 10:14:08 (pid:22633) Create_Process(/tmp/condor/execute/dir_22633/condor_exec.exe,, ...) failed: (errno=2: 'No such file or directory')
03/21/19 10:14:08 (pid:22633) Failed to start job, exiting
03/21/19 10:14:08 (pid:22633) ShutdownFast all jobs.
03/21/19 10:14:08 (pid:22633) condor_read() failed: recv(fd=13) returned -1, errno = 104 Connection reset by peer, reading 5 bytes from <10.1.71.91:18752>.
03/21/19 10:14:08 (pid:22633) IO: Failed to read packet header
03/21/19 10:14:08 (pid:22633) Lost connection to shadow, waiting 2400 secs for reconnect
03/21/19 10:14:08 (pid:22633) All jobs have exited... starter exiting
03/21/19 10:14:08 (pid:22633) **** condor_starter (condor_STARTER) pid 22633 EXITING WITH STATUS 0
Any ideas? (SELinux is not the culprit.)
Thanks,
--
Laurent Wandrebeck
HYGEOS, Earth Observation Department / Observation de la Terre
Euratechnologies
165 Avenue de Bretagne
59000 Lille, France
tel: +33 3 20 08 24 98
https://www.hygeos.com