Hi Joan,
I think I've figured out why you're hitting this and we aren't locally. The cgroup API generates a notification whenever an OOM occurs *or* the cgroup is removed. Locally, we pre-create all the possible cgroups (older RHEL kernels would crash if the cgroups weren't pre-created; since fixed), which means condor never deletes the cgroup. Hence, we never got the removal notification, and this issue was missed in testing.
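For reference, the registration side of that API (the cgroup v1 memory controller's OOM notification) works roughly as sketched below. This is not the condor code; the helper name is made up, and the comments just restate the kernel's documented handshake:

```cpp
#include <cstdio>
#include <string>

// Sketch of the cgroup v1 OOM-notification handshake:
//   1. efd = eventfd(0, 0);
//   2. ofd = open("<cgroup>/memory.oom_control", O_RDONLY);
//   3. write the string "<efd> <ofd>" into "<cgroup>/cgroup.event_control".
// From then on, a read() on efd completes whenever the group hits OOM,
// and also when the cgroup is removed; the latter is the spurious
// wakeup discussed above.
std::string format_event_control_line(int event_fd, int oom_control_fd) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%d %d", event_fd, oom_control_fd);
    return std::string(buf);
}
```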
Your fix is acceptable, I think. A "more proper" approach would be to parse the memory.oom_control file to see whether under_oom is set to 1. (On further thought, I'd actually prefer your approach: parsing files when memory is actually tight is more likely to lead to deadlock.)
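For the record, a minimal sketch of that under_oom check, assuming cgroup v1's memory.oom_control format ("key value" lines including oom_kill_disable and under_oom). The function name is made up, and it deliberately parses an in-memory string rather than touching the filesystem:

```cpp
#include <sstream>
#include <string>

// Return the value of the "under_oom" flag from the text of a
// memory.oom_control file (cgroup v1), or -1 if the line is absent.
// Illustrative only; field names follow the kernel's documented format.
int parse_under_oom(const std::string& contents) {
    std::istringstream in(contents);
    std::string key;
    int value;
    while (in >> key >> value) {
        if (key == "under_oom") return value;
    }
    return -1;
}
```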
I'm just coming back to an avalanche of emails; did a bug report get filed for this?
Brian
Hi,
Just in case it can be useful for somebody, we have been able to solve (or work around) the problem with a little patch to the condor source:
diff -ur condor-8.0.0.orig/src/condor_starter.V6.1/vanilla_proc.cpp condor-8.0.0/src/condor_starter.V6.1/vanilla_proc.cpp
--- condor-8.0.0.orig/src/condor_starter.V6.1/vanilla_proc.cpp 2013-05-29 18:58:09.000000000 +0200
+++ condor-8.0.0/src/condor_starter.V6.1/vanilla_proc.cpp 2013-07-25 12:13:09.000000000 +0200
@@ -798,6 +798,18 @@
 int
 VanillaProc::outOfMemoryEvent(int /* fd */)
 {
+
+	/* If we have no jobs left, return and do nothing */
+	if (num_pids == 0) {
+		dprintf(D_FULLDEBUG, "Closing event FD pipe %d.\n", m_oom_efd);
+		daemonCore->Close_Pipe(m_oom_efd);
+		close(m_oom_fd);
+		m_oom_efd = -1;
+		m_oom_fd = -1;
+
+		return 0;
+	}
+
 	std::stringstream ss;
 	if (m_memory_limit >= 0) {
 		ss << "Job has gone over memory limit of " << m_memory_limit << " megabytes.";
I don't know if it is the best way to work around this problem, but at least it seems to work for us. We have forced a (true) OOM condition and it responded as it should, while jobs are no longer put on hold when they exit normally. I don't claim it's clean either; as I've said, it's more of a quick-and-dirty hack to get this feature (which is really interesting for us) running.
Regards,
Joan
On 24/07/13 17:24, Paolo Perfetti wrote:
Hi,
On 24/07/2013 13:07, Joan J. Piles wrote:
Hi all:
We are having some problems using cgroups for memory limiting. When jobs exit, the OOM-killer routines get called, placing the job on hold instead of letting it end normally. With a full starter log (and a really short job) debug we have:
I've been going crazy over the same problem for a week now.
My system is an updated Debian Wheezy with condor version
8.0.1-148801 (from research.cs.wisc.edu repository)
odino:~$ uname -a
Linux odino 3.2.0-4-amd64 #1 SMP Debian 3.2.46-1 x86_64 GNU/Linux
cgroups seem to be working properly:
odino:~$ condor_config_val BASE_CGROUP
htcondor
odino:~$ condor_config_val CGROUP_MEMORY_LIMIT_POLICY
soft
odino:~$ grep cgroup /etc/default/grub
GRUB_CMDLINE_LINUX="cgroup_enable=memory"
odino:~$ cat /etc/cgconfig.conf
mount {
    cpu = /cgroup/cpu;
    cpuset = /cgroup/cpuset;
    cpuacct = /cgroup/cpuacct;
    memory = /cgroup/memory;
    freezer = /cgroup/freezer;
    blkio = /cgroup/blkio;
}
group htcondor {
    cpu {}
    cpuset {}
    cpuacct {}
    memory {
        # Tested both memory.limit_in_bytes and memory.soft_limit_in_bytes
        #memory.limit_in_bytes = 16370672K;
        memory.soft_limit_in_bytes = 16370672K;
    }
    freezer {}
    blkio {}
}
odino:~$ mount | grep cgrou
cgroup on /cgroup/cpu type cgroup (rw,relatime,cpu)
cgroup on /cgroup/cpuset type cgroup (rw,relatime,cpuset)
cgroup on /cgroup/cpuacct type cgroup (rw,relatime,cpuacct)
cgroup on /cgroup/memory type cgroup (rw,relatime,memory)
cgroup on /cgroup/freezer type cgroup (rw,relatime,freezer)
cgroup on /cgroup/blkio type cgroup (rw,relatime,blkio)
Submit file is trivial:
universe = parallel
executable = /bin/sleep
arguments = 15
machine_count = 4
#request_cpu = 1
request_memory = 128
log = log
output = output
error = error
notification = never
should_transfer_files = always
when_to_transfer_output = on_exit
queue
Below is my StarterLog.
Any suggestions would be appreciated.
tnx, Paolo
07/24/13 16:56:09 Enumerating interfaces: lo 127.0.0.1 up
07/24/13 16:56:09 Enumerating interfaces: eth0 192.168.100.161 up
07/24/13 16:56:09 Enumerating interfaces: eth1 10.5.0.2 up
07/24/13 16:56:09 Initializing Directory: curr_dir =
/etc/condor/config.d
07/24/13 16:56:09
******************************************************
07/24/13 16:56:09 ** condor_starter (CONDOR_STARTER) STARTING UP
07/24/13 16:56:09 ** /usr/sbin/condor_starter
07/24/13 16:56:09 ** SubsystemInfo: name=STARTER type=STARTER(8)
class=DAEMON(1)
07/24/13 16:56:09 ** Configuration: subsystem:STARTER
local:<NONE> class:DAEMON
07/24/13 16:56:09 ** $CondorVersion: 8.0.1 Jul 15 2013 BuildID:
148801 $
07/24/13 16:56:09 ** $CondorPlatform: x86_64_Debian7 $
07/24/13 16:56:09 ** PID = 31181
07/24/13 16:56:09 ** Log last touched 7/24 16:37:26
07/24/13 16:56:09
******************************************************
07/24/13 16:56:09 Using config source: /etc/condor/condor_config
07/24/13 16:56:09 Using local config sources:
07/24/13 16:56:09 /etc/condor/config.d/00-asgard-common
07/24/13 16:56:09 /etc/condor/config.d/10-asgard-execute
07/24/13 16:56:09 /etc/condor/condor_config.local
07/24/13 16:56:09 Running as root. Enabling specialized core dump
routines
07/24/13 16:56:09 Not using shared port because
USE_SHARED_PORT=false
07/24/13 16:56:09 DaemonCore: command socket at
<192.168.100.161:35626>
07/24/13 16:56:09 DaemonCore: private command socket at
<192.168.100.161:35626>
07/24/13 16:56:09 Setting maximum accepts per cycle 8.
07/24/13 16:56:09 Will use UDP to update collector
odino.bo.ingv.it <192.168.100.160:9618>
07/24/13 16:56:09 Not using shared port because
USE_SHARED_PORT=false
07/24/13 16:56:09 Entering JICShadow::receiveMachineAd
07/24/13 16:56:09 Communicating with shadow
<192.168.100.160:36378?noUDP>
07/24/13 16:56:09 Shadow version: $CondorVersion: 8.0.1 Jul 15
2013 BuildID: 148801 $
07/24/13 16:56:09 Submitting machine is "odino.bo.ingv.it"
07/24/13 16:56:09 Instantiating a StarterHookMgr
07/24/13 16:56:09 Job does not define HookKeyword, not invoking
any job hooks.
07/24/13 16:56:09 setting the orig job name in starter
07/24/13 16:56:09 setting the orig job iwd in starter
07/24/13 16:56:09 ShouldTransferFiles is "YES", transfering files
07/24/13 16:56:09 Submit UidDomain: "bo.ingv.it"
07/24/13 16:56:09 Local UidDomain: "bo.ingv.it"
07/24/13 16:56:09 Initialized user_priv as "username"
07/24/13 16:56:09 Done moving to directory
"/var/lib/condor/execute/dir_31181"
07/24/13 16:56:09 Job has WantIOProxy=true
07/24/13 16:56:09 Initialized IO Proxy.
07/24/13 16:56:09 LocalUserLog::initFromJobAd: path_attr =
StarterUserLog
07/24/13 16:56:09 LocalUserLog::initFromJobAd: xml_attr =
StarterUserLogUseXML
07/24/13 16:56:09 No StarterUserLog found in job ClassAd
07/24/13 16:56:09 Starter will not write a local UserLog
07/24/13 16:56:09 Done setting resource limits
07/24/13 16:56:09 Changing the executable name
07/24/13 16:56:09 entering FileTransfer::Init
07/24/13 16:56:09 entering FileTransfer::SimpleInit
07/24/13 16:56:09 FILETRANSFER: protocol "http" handled by
"/usr/lib/condor/libexec/curl_plugin"
07/24/13 16:56:09 FILETRANSFER: protocol "ftp" handled by
"/usr/lib/condor/libexec/curl_plugin"
07/24/13 16:56:09 FILETRANSFER: protocol "file" handled by
"/usr/lib/condor/libexec/curl_plugin"
07/24/13 16:56:09 FILETRANSFER: protocol "data" handled by
"/usr/lib/condor/libexec/data_plugin"
07/24/13 16:56:09 Initializing Directory: curr_dir =
/var/lib/condor/execute/dir_31181
07/24/13 16:56:09 TransferIntermediate="(none)"
07/24/13 16:56:09 entering FileTransfer::DownloadFiles
07/24/13 16:56:09 entering FileTransfer::Download
07/24/13 16:56:09 FileTransfer: created download transfer process
with id 31184
07/24/13 16:56:09 entering FileTransfer::DownloadThread
07/24/13 16:56:09 entering FileTransfer::DoDownload sync=1
07/24/13 16:56:09 DaemonCore: No more children processes to reap.
07/24/13 16:56:09 DaemonCore: in SendAliveToParent()
07/24/13 16:56:09 REMAP: begin with rules:
07/24/13 16:56:09 REMAP: 0: condor_exec.exe
07/24/13 16:56:09 REMAP: res is 0 -> !
07/24/13 16:56:09 Sending GoAhead for 192.168.100.160 to send
/var/lib/condor/execute/dir_31181/condor_exec.exe and all further
files.
07/24/13 16:56:09 Completed DC_CHILDALIVE to daemon at
<192.168.100.161:53285>
07/24/13 16:56:09 DaemonCore: Leaving SendAliveToParent() -
success
07/24/13 16:56:09 Received GoAhead from peer to receive
/var/lib/condor/execute/dir_31181/condor_exec.exe.
07/24/13 16:56:09 get_file(): going to write to filename
/var/lib/condor/execute/dir_31181/condor_exec.exe
07/24/13 16:56:09 get_file: Receiving 31136 bytes
07/24/13 16:56:09 get_file: wrote 31136 bytes to file
07/24/13 16:56:09 ReliSock::get_file_with_permissions(): going to
set permissions 755
07/24/13 16:56:09 DaemonCore: No more children processes to reap.
07/24/13 16:56:09 File transfer completed successfully.
07/24/13 16:56:09 Initializing Directory: curr_dir =
/var/lib/condor/execute/dir_31181
07/24/13 16:56:10 Calling client FileTransfer handler function.
07/24/13 16:56:10 HOOK_PREPARE_JOB not configured.
07/24/13 16:56:10 Job 90.0 set to execute immediately
07/24/13 16:56:10 Starting a PARALLEL universe job with ID: 90.0
07/24/13 16:56:10 In OsProc::OsProc()
07/24/13 16:56:10 Main job KillSignal: 15 (SIGTERM)
07/24/13 16:56:10 Main job RmKillSignal: 15 (SIGTERM)
07/24/13 16:56:10 Main job HoldKillSignal: 15 (SIGTERM)
07/24/13 16:56:10 Constructor of ParallelProc::ParallelProc
07/24/13 16:56:10 in ParallelProc::StartJob()
07/24/13 16:56:10 Found Node = 0 in job ad
07/24/13 16:56:10 ParallelProc::addEnvVars()
07/24/13 16:56:10 No Path in ad, $PATH in env
07/24/13 16:56:10 before: /bin:/sbin:/usr/bin:/usr/sbin
07/24/13 16:56:10 New env:
PATH=/usr/bin:/bin:/sbin:/usr/bin:/usr/sbin _CONDOR_PROCNO=0
CONDOR_CONFIG=/etc/condor/condor_config _CONDOR_NPROCS=4
_CONDOR_REMOTE_SPOOL_DIR=/var/lib/condor/spool/90/0/cluster90.proc0.subproc0
07/24/13 16:56:10 in VanillaProc::StartJob()
07/24/13 16:56:10 Requesting cgroup
htcondor/condor_var_lib_condor_execute_slot1_1@xxxxxxxxxxxxxxx for
job.
07/24/13 16:56:10 Value of RequestedChroot is unset.
07/24/13 16:56:10 PID namespace option: false
07/24/13 16:56:10 in OsProc::StartJob()
07/24/13 16:56:10 IWD: /var/lib/condor/execute/dir_31181
07/24/13 16:56:10 Input file: /dev/null
07/24/13 16:56:10 Output file:
/var/lib/condor/execute/dir_31181/_condor_stdout
07/24/13 16:56:10 Error file:
/var/lib/condor/execute/dir_31181/_condor_stderr
07/24/13 16:56:10 About to exec
/var/lib/condor/execute/dir_31181/condor_exec.exe 15
07/24/13 16:56:10 Env = TEMP=/var/lib/condor/execute/dir_31181
_CONDOR_SCRATCH_DIR=/var/lib/condor/execute/dir_31181
_CONDOR_SLOT=slot1_1 TMPDIR=/var/lib/condor/execute/dir_31181
_CONDOR_PROCNO=0 _CONDOR_JOB_PIDS=
TMP=/var/lib/condor/execute/dir_31181
_CONDOR_REMOTE_SPOOL_DIR=/var/lib/condor/spool/90/0/cluster90.proc0.subproc0
_CONDOR_JOB_AD=/var/lib/condor/execute/dir_31181/.job.ad
_CONDOR_JOB_IWD=/var/lib/condor/execute/dir_31181
CONDOR_CONFIG=/etc/condor/condor_config
PATH=/usr/bin:/bin:/sbin:/usr/bin:/usr/sbin
_CONDOR_MACHINE_AD=/var/lib/condor/execute/dir_31181/.machine.ad
_CONDOR_NPROCS=4
07/24/13 16:56:10 Setting job's virtual memory rlimit to
17179869184 megabytes
07/24/13 16:56:10 ENFORCE_CPU_AFFINITY not true, not setting
affinity
07/24/13 16:56:10 Running job as user username
07/24/13 16:56:10 track_family_via_cgroup: Tracking PID 31185 via
cgroup
htcondor/condor_var_lib_condor_execute_slot1_1@xxxxxxxxxxxxxxx.
07/24/13 16:56:10 About to tell ProcD to track family with root
31185 via cgroup
htcondor/condor_var_lib_condor_execute_slot1_1@xxxxxxxxxxxxxxx
07/24/13 16:56:10 Create_Process succeeded, pid=31185
07/24/13 16:56:10 Initializing cgroup library.
07/24/13 16:56:18 Initializing Directory: curr_dir =
/var/lib/condor/execute/dir_31181
07/24/13 16:56:18 In ParallelProc::PublishUpdateAd()
07/24/13 16:56:18 In VanillaProc::PublishUpdateAd()
07/24/13 16:56:18 Inside OsProc::PublishUpdateAd()
07/24/13 16:56:18 Inside UserProc::PublishUpdateAd()
07/24/13 16:56:18 Entering JICShadow::updateShadow()
07/24/13 16:56:18 Initializing Directory: curr_dir =
/var/lib/condor/execute/dir_31181
07/24/13 16:56:18 In ParallelProc::PublishUpdateAd()
07/24/13 16:56:18 In VanillaProc::PublishUpdateAd()
07/24/13 16:56:18 Inside OsProc::PublishUpdateAd()
07/24/13 16:56:18 Inside UserProc::PublishUpdateAd()
07/24/13 16:56:18 Sent job ClassAd update to startd.
07/24/13 16:56:18 Leaving JICShadow::updateShadow(): success
07/24/13 16:56:25 DaemonCore: No more children processes to reap.
07/24/13 16:56:25 Process exited, pid=31185, status=0
07/24/13 16:56:25 Inside VanillaProc::JobReaper()
07/24/13 16:56:25 Inside OsProc::JobReaper()
07/24/13 16:56:25 Inside UserProc::JobReaper()
07/24/13 16:56:25 Reaper: all=1 handled=1 ShuttingDown=0
07/24/13 16:56:25 In ParallelProc::PublishUpdateAd()
07/24/13 16:56:25 In VanillaProc::PublishUpdateAd()
07/24/13 16:56:25 Inside OsProc::PublishUpdateAd()
07/24/13 16:56:25 Inside UserProc::PublishUpdateAd()
07/24/13 16:56:25 HOOK_JOB_EXIT not configured.
07/24/13 16:56:25 Initializing Directory: curr_dir =
/var/lib/condor/execute/dir_31181
07/24/13 16:56:25 In ParallelProc::PublishUpdateAd()
07/24/13 16:56:25 In VanillaProc::PublishUpdateAd()
07/24/13 16:56:25 Inside OsProc::PublishUpdateAd()
07/24/13 16:56:25 Inside UserProc::PublishUpdateAd()
07/24/13 16:56:25 Entering JICShadow::updateShadow()
07/24/13 16:56:25 Sent job ClassAd update to startd.
07/24/13 16:56:25 Leaving JICShadow::updateShadow(): success
07/24/13 16:56:25 Inside JICShadow::transferOutput(void)
07/24/13 16:56:25 JICShadow::transferOutput(void): Transferring...
07/24/13 16:56:25 Begin transfer of sandbox to shadow.
07/24/13 16:56:25 entering FileTransfer::UploadFiles
(final_transfer=1)
07/24/13 16:56:25 Initializing Directory: curr_dir =
/var/lib/condor/execute/dir_31181
07/24/13 16:56:25 Sending new file _condor_stdout,
time==1374677770, size==0
07/24/13 16:56:25 Skipping file in exception list: .job.ad
07/24/13 16:56:25 Sending new file _condor_stderr,
time==1374677770, size==0
07/24/13 16:56:25 Skipping file in exception list: .machine.ad
07/24/13 16:56:25 Skipping file chirp.config, t:
1374677769==1374677769, s: 54==54
07/24/13 16:56:25 Skipping file condor_exec.exe, t:
1374677769==1374677769, s: 31136==31136
07/24/13 16:56:25 FileTransfer::UploadFiles: sent
TransKey=1#51efeb09437ffa2dcc159bc
07/24/13 16:56:25 entering FileTransfer::Upload
07/24/13 16:56:25 entering FileTransfer::DoUpload
07/24/13 16:56:25 DoUpload: sending file _condor_stdout
07/24/13 16:56:25 FILETRANSFER: outgoing file_command is 1 for
_condor_stdout
07/24/13 16:56:25 Received GoAhead from peer to send
/var/lib/condor/execute/dir_31181/_condor_stdout.
07/24/13 16:56:25 Sending GoAhead for 192.168.100.160 to receive
/var/lib/condor/execute/dir_31181/_condor_stdout and all further
files.
07/24/13 16:56:25 ReliSock::put_file_with_permissions(): going to
send permissions 100644
07/24/13 16:56:25 put_file: going to send from filename
/var/lib/condor/execute/dir_31181/_condor_stdout
07/24/13 16:56:25 put_file: Found file size 0
07/24/13 16:56:25 put_file: sending 0 bytes
07/24/13 16:56:25 ReliSock: put_file: sent 0 bytes
07/24/13 16:56:25 DoUpload: sending file _condor_stderr
07/24/13 16:56:25 FILETRANSFER: outgoing file_command is 1 for
_condor_stderr
07/24/13 16:56:25 Received GoAhead from peer to send
/var/lib/condor/execute/dir_31181/_condor_stderr.
07/24/13 16:56:25 ReliSock::put_file_with_permissions(): going to
send permissions 100644
07/24/13 16:56:25 put_file: going to send from filename
/var/lib/condor/execute/dir_31181/_condor_stderr
07/24/13 16:56:25 put_file: Found file size 0
07/24/13 16:56:25 put_file: sending 0 bytes
07/24/13 16:56:25 ReliSock: put_file: sent 0 bytes
07/24/13 16:56:25 DoUpload: exiting at 3294
07/24/13 16:56:25 End transfer of sandbox to shadow.
07/24/13 16:56:25 Inside JICShadow::transferOutputMopUp(void)
07/24/13 16:56:25 Inside OsProc::JobExit()
07/24/13 16:56:25 Initializing Directory: curr_dir =
/var/lib/condor/execute/dir_31181
07/24/13 16:56:25 Notifying exit status=0 reason=100
07/24/13 16:56:25 Sent job ClassAd update to startd.
07/24/13 16:56:25 Hold all jobs
07/24/13 16:56:25 All jobs were removed due to OOM event.
07/24/13 16:56:25 Inside JICShadow::transferOutput(void)
07/24/13 16:56:25 Inside JICShadow::transferOutputMopUp(void)
07/24/13 16:56:25 Closing event FD pipe 65536.
07/24/13 16:56:25 ShutdownFast all jobs.
07/24/13 16:56:25 Got ShutdownFast when no jobs running.
07/24/13 16:56:25 Inside JICShadow::transferOutput(void)
07/24/13 16:56:25 Inside JICShadow::transferOutputMopUp(void)
07/24/13 16:56:25 Got SIGQUIT. Performing fast shutdown.
07/24/13 16:56:25 ShutdownFast all jobs.
07/24/13 16:56:25 Got ShutdownFast when no jobs running.
07/24/13 16:56:25 Inside JICShadow::transferOutput(void)
07/24/13 16:56:25 Inside JICShadow::transferOutputMopUp(void)
07/24/13 16:56:25 dirscat: dirpath = /
07/24/13 16:56:25 dirscat: subdir = /var/lib/condor/execute
07/24/13 16:56:25 Initializing Directory: curr_dir =
/var/lib/condor/execute/
07/24/13 16:56:25 Removing /var/lib/condor/execute/dir_31181
07/24/13 16:56:25 Attempting to remove
/var/lib/condor/execute/dir_31181 as SuperUser (root)
07/24/13 16:56:25 **** condor_starter (condor_STARTER) pid 31181
EXITING WITH STATUS 0
07/24/13
12:47:39 Initializing cgroup library.
07/24/13 12:47:44 DaemonCore: No more children processes to
reap.
07/24/13 12:47:44 Process exited, pid=32686, status=0
07/24/13 12:47:44 Inside VanillaProc::JobReaper()
07/24/13 12:47:44 Inside OsProc::JobReaper()
07/24/13 12:47:44 Inside UserProc::JobReaper()
07/24/13 12:47:44 Reaper: all=1 handled=1 ShuttingDown=0
07/24/13 12:47:44 In VanillaProc::PublishUpdateAd()
07/24/13 12:47:44 Inside OsProc::PublishUpdateAd()
07/24/13 12:47:44 Inside UserProc::PublishUpdateAd()
07/24/13 12:47:44 HOOK_JOB_EXIT not configured.
07/24/13 12:47:44 In VanillaProc::PublishUpdateAd()
07/24/13 12:47:44 Inside OsProc::PublishUpdateAd()
07/24/13 12:47:44 Inside UserProc::PublishUpdateAd()
07/24/13 12:47:44 Entering JICShadow::updateShadow()
07/24/13 12:47:44 Sent job ClassAd update to startd.
07/24/13 12:47:44 Leaving JICShadow::updateShadow(): success
07/24/13 12:47:44 Inside JICShadow::transferOutput(void)
07/24/13 12:47:44 JICShadow::transferOutput(void):
Transferring...
07/24/13 12:47:44 Inside JICShadow::transferOutputMopUp(void)
07/24/13 12:47:44 Inside OsProc::JobExit()
07/24/13 12:47:44 Notifying exit status=0 reason=100
07/24/13 12:47:44 Sent job ClassAd update to startd.
07/24/13 12:47:44 Hold all jobs
07/24/13 12:47:44 All jobs were removed due to OOM event.
07/24/13 12:47:44 Inside JICShadow::transferOutput(void)
07/24/13 12:47:44 Inside JICShadow::transferOutputMopUp(void)
07/24/13 12:47:44 Closing event FD pipe 0.
07/24/13 12:47:44 Close_Pipe on invalid pipe end: 0
07/24/13 12:47:44 ERROR "Close_Pipe error" at line 2104 in
file
/slots/01/dir_5373/userdir/src/condor_daemon_core.V6/daemon_core.cpp
07/24/13 12:47:44 ShutdownFast all jobs.
07/24/13 12:47:44 Got ShutdownFast when no jobs running.
07/24/13 12:47:44 Inside JICShadow::transferOutput(void)
07/24/13 12:47:44 Inside JICShadow::transferOutputMopUp(void)
It seems an event is fired to the OOM eventfd for some reason (the cgroup itself being destroyed, perhaps?). Has anybody else seen the same issue? Could it be a change in the kernel's cgroup interface?
Thanks,
Joan
--
--------------------------------------------------------------------------
Joan Josep Piles Contreras - Analista de sistemas
I3A - Instituto de Investigación en Ingeniería de Aragón
Tel: 876 55 51 47 (ext. 845147)
http://i3a.unizar.es -- jpiles@xxxxxxxxx
--------------------------------------------------------------------------
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to
htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/