
Re: [HTCondor-users] CGROUPS + OOM / HOLD on exit



Folks - 

You can specify how you wish to handle OOM events; see https://access.redhat.com/site/documentation//en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-memory.html .
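For example, a minimal sketch, assuming the cgroup v1 memory controller is mounted and an "htcondor" group exists (as in the setup quoted below; on a stock RHEL 6 box the cgconfig mount point is typically /cgroup/memory as well):

# Show the current OOM policy for the htcondor cgroup
# (each per-slot child cgroup has its own memory.oom_control too):
cat /cgroup/memory/htcondor/memory.oom_control
#   oom_kill_disable 0
#   under_oom 0

# Writing 1 disables the kernel OOM killer for this cgroup; tasks that hit
# the memory limit are paused instead of killed, until memory is freed or
# the limit is raised:
echo 1 > /cgroup/memory/htcondor/memory.oom_control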

You may also want to check any errata for your distribution, because cgroups has a planned change that we've been watching.

Cheers,
Tim


----- Original Message -----
> From: "Paolo Perfetti" <paolo.perfetti@xxxxxxxxxx>
> To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
> Sent: Wednesday, July 24, 2013 10:24:32 AM
> Subject: Re: [HTCondor-users] CGROUPS + OOM / HOLD on exit
> 
> Hi,
> 
> On 24/07/2013 13:07, Joan J. Piles wrote:
> > Hi all:
> >
> > We are having some problems using cgroups for memory limiting. When jobs
> > exit, the OOM-Killer routines get called, placing the job on hold
> > instead of letting it end normally. With a full starter log (and a
> > really short job) debug we have:
> 
> I've been going crazy over the same problem for a week now.
> My system is an up-to-date Debian Wheezy with Condor version 8.0.1-148801
> (from the research.cs.wisc.edu repository)
> odino:~$ uname  -a
> Linux odino 3.2.0-4-amd64 #1 SMP Debian 3.2.46-1 x86_64 GNU/Linux
> 
> 
> cgroups seems to be working properly:
> odino:~$ condor_config_val BASE_CGROUP
> htcondor
> odino:~$ condor_config_val CGROUP_MEMORY_LIMIT_POLICY
> soft
> odino:~$ grep cgroup /etc/default/grub
> GRUB_CMDLINE_LINUX="cgroup_enable=memory"
> odino:~$ cat /etc/cgconfig.conf
> mount {
>          cpu     = /cgroup/cpu;
>          cpuset  = /cgroup/cpuset;
>          cpuacct = /cgroup/cpuacct;
>          memory  = /cgroup/memory;
>          freezer = /cgroup/freezer;
>          blkio   = /cgroup/blkio;
> }
> 
> group htcondor {
>          cpu {}
>          cpuset {}
>          cpuacct {}
>          memory {
> # Tested both memory.limit_in_bytes and memory.soft_limit_in_bytes
> #memory.limit_in_bytes = 16370672K;
>            memory.soft_limit_in_bytes = 16370672K;
>          }
>          freezer {}
>          blkio {}
> }
> odino:~$ mount | grep cgrou
> cgroup on /cgroup/cpu type cgroup (rw,relatime,cpu)
> cgroup on /cgroup/cpuset type cgroup (rw,relatime,cpuset)
> cgroup on /cgroup/cpuacct type cgroup (rw,relatime,cpuacct)
> cgroup on /cgroup/memory type cgroup (rw,relatime,memory)
> cgroup on /cgroup/freezer type cgroup (rw,relatime,freezer)
> cgroup on /cgroup/blkio type cgroup (rw,relatime,blkio)
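> For what it's worth, once a job is running the per-slot cgroup that the
> starter creates under htcondor/ can be checked directly as well; just a
> sketch, where <slot_cgroup> is a placeholder for the name the starter
> prints in the StarterLog further down:
> odino:~$ ls /cgroup/memory/htcondor/
> odino:~$ cat /cgroup/memory/htcondor/<slot_cgroup>/memory.soft_limit_in_bytes
> odino:~$ cat /cgroup/memory/htcondor/<slot_cgroup>/memory.oom_control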
> 
> Submit file is trivial:
> universe = parallel
> executable = /bin/sleep
> arguments = 15
> machine_count = 4
> #request_cpu = 1
> request_memory = 128
> log = log
> output = output
> error  = error
> notification = never
> should_transfer_files = always
> when_to_transfer_output = on_exit
> queue
> 
> Below is my StarterLog.
> 
> Any suggestion would be appreciated.
> tnx, Paolo
> 
> 
> 07/24/13 16:56:09 Enumerating interfaces: lo 127.0.0.1 up
> 07/24/13 16:56:09 Enumerating interfaces: eth0 192.168.100.161 up
> 07/24/13 16:56:09 Enumerating interfaces: eth1 10.5.0.2 up
> 07/24/13 16:56:09 Initializing Directory: curr_dir = /etc/condor/config.d
> 07/24/13 16:56:09 ******************************************************
> 07/24/13 16:56:09 ** condor_starter (CONDOR_STARTER) STARTING UP
> 07/24/13 16:56:09 ** /usr/sbin/condor_starter
> 07/24/13 16:56:09 ** SubsystemInfo: name=STARTER type=STARTER(8)
> class=DAEMON(1)
> 07/24/13 16:56:09 ** Configuration: subsystem:STARTER local:<NONE>
> class:DAEMON
> 07/24/13 16:56:09 ** $CondorVersion: 8.0.1 Jul 15 2013 BuildID: 148801 $
> 07/24/13 16:56:09 ** $CondorPlatform: x86_64_Debian7 $
> 07/24/13 16:56:09 ** PID = 31181
> 07/24/13 16:56:09 ** Log last touched 7/24 16:37:26
> 07/24/13 16:56:09 ******************************************************
> 07/24/13 16:56:09 Using config source: /etc/condor/condor_config
> 07/24/13 16:56:09 Using local config sources:
> 07/24/13 16:56:09    /etc/condor/config.d/00-asgard-common
> 07/24/13 16:56:09    /etc/condor/config.d/10-asgard-execute
> 07/24/13 16:56:09    /etc/condor/condor_config.local
> 07/24/13 16:56:09 Running as root.  Enabling specialized core dump routines
> 07/24/13 16:56:09 Not using shared port because USE_SHARED_PORT=false
> 07/24/13 16:56:09 DaemonCore: command socket at <192.168.100.161:35626>
> 07/24/13 16:56:09 DaemonCore: private command socket at
> <192.168.100.161:35626>
> 07/24/13 16:56:09 Setting maximum accepts per cycle 8.
> 07/24/13 16:56:09 Will use UDP to update collector odino.bo.ingv.it
> <192.168.100.160:9618>
> 07/24/13 16:56:09 Not using shared port because USE_SHARED_PORT=false
> 07/24/13 16:56:09 Entering JICShadow::receiveMachineAd
> 07/24/13 16:56:09 Communicating with shadow <192.168.100.160:36378?noUDP>
> 07/24/13 16:56:09 Shadow version: $CondorVersion: 8.0.1 Jul 15 2013
> BuildID: 148801 $
> 07/24/13 16:56:09 Submitting machine is "odino.bo.ingv.it"
> 07/24/13 16:56:09 Instantiating a StarterHookMgr
> 07/24/13 16:56:09 Job does not define HookKeyword, not invoking any job
> hooks.
> 07/24/13 16:56:09 setting the orig job name in starter
> 07/24/13 16:56:09 setting the orig job iwd in starter
> 07/24/13 16:56:09 ShouldTransferFiles is "YES", transfering files
> 07/24/13 16:56:09 Submit UidDomain: "bo.ingv.it"
> 07/24/13 16:56:09  Local UidDomain: "bo.ingv.it"
> 07/24/13 16:56:09 Initialized user_priv as "username"
> 07/24/13 16:56:09 Done moving to directory
> "/var/lib/condor/execute/dir_31181"
> 07/24/13 16:56:09 Job has WantIOProxy=true
> 07/24/13 16:56:09 Initialized IO Proxy.
> 07/24/13 16:56:09 LocalUserLog::initFromJobAd: path_attr = StarterUserLog
> 07/24/13 16:56:09 LocalUserLog::initFromJobAd: xml_attr =
> StarterUserLogUseXML
> 07/24/13 16:56:09 No StarterUserLog found in job ClassAd
> 07/24/13 16:56:09 Starter will not write a local UserLog
> 07/24/13 16:56:09 Done setting resource limits
> 07/24/13 16:56:09 Changing the executable name
> 07/24/13 16:56:09 entering FileTransfer::Init
> 07/24/13 16:56:09 entering FileTransfer::SimpleInit
> 07/24/13 16:56:09 FILETRANSFER: protocol "http" handled by
> "/usr/lib/condor/libexec/curl_plugin"
> 07/24/13 16:56:09 FILETRANSFER: protocol "ftp" handled by
> "/usr/lib/condor/libexec/curl_plugin"
> 07/24/13 16:56:09 FILETRANSFER: protocol "file" handled by
> "/usr/lib/condor/libexec/curl_plugin"
> 07/24/13 16:56:09 FILETRANSFER: protocol "data" handled by
> "/usr/lib/condor/libexec/data_plugin"
> 07/24/13 16:56:09 Initializing Directory: curr_dir =
> /var/lib/condor/execute/dir_31181
> 07/24/13 16:56:09 TransferIntermediate="(none)"
> 07/24/13 16:56:09 entering FileTransfer::DownloadFiles
> 07/24/13 16:56:09 entering FileTransfer::Download
> 07/24/13 16:56:09 FileTransfer: created download transfer process with
> id 31184
> 07/24/13 16:56:09 entering FileTransfer::DownloadThread
> 07/24/13 16:56:09 entering FileTransfer::DoDownload sync=1
> 07/24/13 16:56:09 DaemonCore: No more children processes to reap.
> 07/24/13 16:56:09 DaemonCore: in SendAliveToParent()
> 07/24/13 16:56:09 REMAP: begin with rules:
> 07/24/13 16:56:09 REMAP: 0: condor_exec.exe
> 07/24/13 16:56:09 REMAP: res is 0 ->  !
> 07/24/13 16:56:09 Sending GoAhead for 192.168.100.160 to send
> /var/lib/condor/execute/dir_31181/condor_exec.exe and all further files.
> 07/24/13 16:56:09 Completed DC_CHILDALIVE to daemon at
> <192.168.100.161:53285>
> 07/24/13 16:56:09 DaemonCore: Leaving SendAliveToParent() - success
> 07/24/13 16:56:09 Received GoAhead from peer to receive
> /var/lib/condor/execute/dir_31181/condor_exec.exe.
> 07/24/13 16:56:09 get_file(): going to write to filename
> /var/lib/condor/execute/dir_31181/condor_exec.exe
> 07/24/13 16:56:09 get_file: Receiving 31136 bytes
> 07/24/13 16:56:09 get_file: wrote 31136 bytes to file
> 07/24/13 16:56:09 ReliSock::get_file_with_permissions(): going to set
> permissions 755
> 07/24/13 16:56:09 DaemonCore: No more children processes to reap.
> 07/24/13 16:56:09 File transfer completed successfully.
> 07/24/13 16:56:09 Initializing Directory: curr_dir =
> /var/lib/condor/execute/dir_31181
> 07/24/13 16:56:10 Calling client FileTransfer handler function.
> 07/24/13 16:56:10 HOOK_PREPARE_JOB not configured.
> 07/24/13 16:56:10 Job 90.0 set to execute immediately
> 07/24/13 16:56:10 Starting a PARALLEL universe job with ID: 90.0
> 07/24/13 16:56:10 In OsProc::OsProc()
> 07/24/13 16:56:10 Main job KillSignal: 15 (SIGTERM)
> 07/24/13 16:56:10 Main job RmKillSignal: 15 (SIGTERM)
> 07/24/13 16:56:10 Main job HoldKillSignal: 15 (SIGTERM)
> 07/24/13 16:56:10 Constructor of ParallelProc::ParallelProc
> 07/24/13 16:56:10 in ParallelProc::StartJob()
> 07/24/13 16:56:10 Found Node = 0 in job ad
> 07/24/13 16:56:10 ParallelProc::addEnvVars()
> 07/24/13 16:56:10 No Path in ad, $PATH in env
> 07/24/13 16:56:10 before: /bin:/sbin:/usr/bin:/usr/sbin
> 07/24/13 16:56:10 New env: PATH=/usr/bin:/bin:/sbin:/usr/bin:/usr/sbin
> _CONDOR_PROCNO=0 CONDOR_CONFIG=/etc/condor/condor_config
> _CONDOR_NPROCS=4
> _CONDOR_REMOTE_SPOOL_DIR=/var/lib/condor/spool/90/0/cluster90.proc0.subproc0
> 07/24/13 16:56:10 in VanillaProc::StartJob()
> 07/24/13 16:56:10 Requesting cgroup
> htcondor/condor_var_lib_condor_execute_slot1_1@xxxxxxxxxxxxxxx for job.
> 07/24/13 16:56:10 Value of RequestedChroot is unset.
> 07/24/13 16:56:10 PID namespace option: false
> 07/24/13 16:56:10 in OsProc::StartJob()
> 07/24/13 16:56:10 IWD: /var/lib/condor/execute/dir_31181
> 07/24/13 16:56:10 Input file: /dev/null
> 07/24/13 16:56:10 Output file:
> /var/lib/condor/execute/dir_31181/_condor_stdout
> 07/24/13 16:56:10 Error file:
> /var/lib/condor/execute/dir_31181/_condor_stderr
> 07/24/13 16:56:10 About to exec
> /var/lib/condor/execute/dir_31181/condor_exec.exe 15
> 07/24/13 16:56:10 Env = TEMP=/var/lib/condor/execute/dir_31181
> _CONDOR_SCRATCH_DIR=/var/lib/condor/execute/dir_31181
> _CONDOR_SLOT=slot1_1 TMPDIR=/var/lib/condor/execute/dir_31181
> _CONDOR_PROCNO=0 _CONDOR_JOB_PIDS= TMP=/var/lib/condor/execute/dir_31181
> _CONDOR_REMOTE_SPOOL_DIR=/var/lib/condor/spool/90/0/cluster90.proc0.subproc0
> _CONDOR_JOB_AD=/var/lib/condor/execute/dir_31181/.job.ad
> _CONDOR_JOB_IWD=/var/lib/condor/execute/dir_31181
> CONDOR_CONFIG=/etc/condor/condor_config
> PATH=/usr/bin:/bin:/sbin:/usr/bin:/usr/sbin
> _CONDOR_MACHINE_AD=/var/lib/condor/execute/dir_31181/.machine.ad
> _CONDOR_NPROCS=4
> 07/24/13 16:56:10 Setting job's virtual memory rlimit to 17179869184
> megabytes
> 07/24/13 16:56:10 ENFORCE_CPU_AFFINITY not true, not setting affinity
> 07/24/13 16:56:10 Running job as user username
> 07/24/13 16:56:10 track_family_via_cgroup: Tracking PID 31185 via cgroup
> htcondor/condor_var_lib_condor_execute_slot1_1@xxxxxxxxxxxxxxxx
> 07/24/13 16:56:10 About to tell ProcD to track family with root 31185
> via cgroup htcondor/condor_var_lib_condor_execute_slot1_1@xxxxxxxxxxxxxxx
> 07/24/13 16:56:10 Create_Process succeeded, pid=31185
> 07/24/13 16:56:10 Initializing cgroup library.
> 07/24/13 16:56:18 Initializing Directory: curr_dir =
> /var/lib/condor/execute/dir_31181
> 07/24/13 16:56:18 In ParallelProc::PublishUpdateAd()
> 07/24/13 16:56:18 In VanillaProc::PublishUpdateAd()
> 07/24/13 16:56:18 Inside OsProc::PublishUpdateAd()
> 07/24/13 16:56:18 Inside UserProc::PublishUpdateAd()
> 07/24/13 16:56:18 Entering JICShadow::updateShadow()
> 07/24/13 16:56:18 Initializing Directory: curr_dir =
> /var/lib/condor/execute/dir_31181
> 07/24/13 16:56:18 In ParallelProc::PublishUpdateAd()
> 07/24/13 16:56:18 In VanillaProc::PublishUpdateAd()
> 07/24/13 16:56:18 Inside OsProc::PublishUpdateAd()
> 07/24/13 16:56:18 Inside UserProc::PublishUpdateAd()
> 07/24/13 16:56:18 Sent job ClassAd update to startd.
> 07/24/13 16:56:18 Leaving JICShadow::updateShadow(): success
> 07/24/13 16:56:25 DaemonCore: No more children processes to reap.
> 07/24/13 16:56:25 Process exited, pid=31185, status=0
> 07/24/13 16:56:25 Inside VanillaProc::JobReaper()
> 07/24/13 16:56:25 Inside OsProc::JobReaper()
> 07/24/13 16:56:25 Inside UserProc::JobReaper()
> 07/24/13 16:56:25 Reaper: all=1 handled=1 ShuttingDown=0
> 07/24/13 16:56:25 In ParallelProc::PublishUpdateAd()
> 07/24/13 16:56:25 In VanillaProc::PublishUpdateAd()
> 07/24/13 16:56:25 Inside OsProc::PublishUpdateAd()
> 07/24/13 16:56:25 Inside UserProc::PublishUpdateAd()
> 07/24/13 16:56:25 HOOK_JOB_EXIT not configured.
> 07/24/13 16:56:25 Initializing Directory: curr_dir =
> /var/lib/condor/execute/dir_31181
> 07/24/13 16:56:25 In ParallelProc::PublishUpdateAd()
> 07/24/13 16:56:25 In VanillaProc::PublishUpdateAd()
> 07/24/13 16:56:25 Inside OsProc::PublishUpdateAd()
> 07/24/13 16:56:25 Inside UserProc::PublishUpdateAd()
> 07/24/13 16:56:25 Entering JICShadow::updateShadow()
> 07/24/13 16:56:25 Sent job ClassAd update to startd.
> 07/24/13 16:56:25 Leaving JICShadow::updateShadow(): success
> 07/24/13 16:56:25 Inside JICShadow::transferOutput(void)
> 07/24/13 16:56:25 JICShadow::transferOutput(void): Transferring...
> 07/24/13 16:56:25 Begin transfer of sandbox to shadow.
> 07/24/13 16:56:25 entering FileTransfer::UploadFiles (final_transfer=1)
> 07/24/13 16:56:25 Initializing Directory: curr_dir =
> /var/lib/condor/execute/dir_31181
> 07/24/13 16:56:25 Sending new file _condor_stdout, time==1374677770, size==0
> 07/24/13 16:56:25 Skipping file in exception list: .job.ad
> 07/24/13 16:56:25 Sending new file _condor_stderr, time==1374677770, size==0
> 07/24/13 16:56:25 Skipping file in exception list: .machine.ad
> 07/24/13 16:56:25 Skipping file chirp.config, t: 1374677769==1374677769,
> s: 54==54
> 07/24/13 16:56:25 Skipping file condor_exec.exe, t:
> 1374677769==1374677769, s: 31136==31136
> 07/24/13 16:56:25 FileTransfer::UploadFiles: sent
> TransKey=1#51efeb09437ffa2dcc159bc
> 07/24/13 16:56:25 entering FileTransfer::Upload
> 07/24/13 16:56:25 entering FileTransfer::DoUpload
> 07/24/13 16:56:25 DoUpload: sending file _condor_stdout
> 07/24/13 16:56:25 FILETRANSFER: outgoing file_command is 1 for
> _condor_stdout
> 07/24/13 16:56:25 Received GoAhead from peer to send
> /var/lib/condor/execute/dir_31181/_condor_stdout.
> 07/24/13 16:56:25 Sending GoAhead for 192.168.100.160 to receive
> /var/lib/condor/execute/dir_31181/_condor_stdout and all further files.
> 07/24/13 16:56:25 ReliSock::put_file_with_permissions(): going to send
> permissions 100644
> 07/24/13 16:56:25 put_file: going to send from filename
> /var/lib/condor/execute/dir_31181/_condor_stdout
> 07/24/13 16:56:25 put_file: Found file size 0
> 07/24/13 16:56:25 put_file: sending 0 bytes
> 07/24/13 16:56:25 ReliSock: put_file: sent 0 bytes
> 07/24/13 16:56:25 DoUpload: sending file _condor_stderr
> 07/24/13 16:56:25 FILETRANSFER: outgoing file_command is 1 for
> _condor_stderr
> 07/24/13 16:56:25 Received GoAhead from peer to send
> /var/lib/condor/execute/dir_31181/_condor_stderr.
> 07/24/13 16:56:25 ReliSock::put_file_with_permissions(): going to send
> permissions 100644
> 07/24/13 16:56:25 put_file: going to send from filename
> /var/lib/condor/execute/dir_31181/_condor_stderr
> 07/24/13 16:56:25 put_file: Found file size 0
> 07/24/13 16:56:25 put_file: sending 0 bytes
> 07/24/13 16:56:25 ReliSock: put_file: sent 0 bytes
> 07/24/13 16:56:25 DoUpload: exiting at 3294
> 07/24/13 16:56:25 End transfer of sandbox to shadow.
> 07/24/13 16:56:25 Inside JICShadow::transferOutputMopUp(void)
> 07/24/13 16:56:25 Inside OsProc::JobExit()
> 07/24/13 16:56:25 Initializing Directory: curr_dir =
> /var/lib/condor/execute/dir_31181
> 07/24/13 16:56:25 Notifying exit status=0 reason=100
> 07/24/13 16:56:25 Sent job ClassAd update to startd.
> 07/24/13 16:56:25 Hold all jobs
> 07/24/13 16:56:25 All jobs were removed due to OOM event.
> 07/24/13 16:56:25 Inside JICShadow::transferOutput(void)
> 07/24/13 16:56:25 Inside JICShadow::transferOutputMopUp(void)
> 07/24/13 16:56:25 Closing event FD pipe 65536.
> 07/24/13 16:56:25 ShutdownFast all jobs.
> 07/24/13 16:56:25 Got ShutdownFast when no jobs running.
> 07/24/13 16:56:25 Inside JICShadow::transferOutput(void)
> 07/24/13 16:56:25 Inside JICShadow::transferOutputMopUp(void)
> 07/24/13 16:56:25 Got SIGQUIT.  Performing fast shutdown.
> 07/24/13 16:56:25 ShutdownFast all jobs.
> 07/24/13 16:56:25 Got ShutdownFast when no jobs running.
> 07/24/13 16:56:25 Inside JICShadow::transferOutput(void)
> 07/24/13 16:56:25 Inside JICShadow::transferOutputMopUp(void)
> 07/24/13 16:56:25 dirscat: dirpath = /
> 07/24/13 16:56:25 dirscat: subdir = /var/lib/condor/execute
> 07/24/13 16:56:25 Initializing Directory: curr_dir =
> /var/lib/condor/execute/
> 07/24/13 16:56:25 Removing /var/lib/condor/execute/dir_31181
> 07/24/13 16:56:25 Attempting to remove /var/lib/condor/execute/dir_31181
> as SuperUser (root)
> 07/24/13 16:56:25 **** condor_starter (condor_STARTER) pid 31181 EXITING
> WITH STATUS 0
> 
> >> 07/24/13 12:47:39 Initializing cgroup library.
> >> 07/24/13 12:47:44 DaemonCore: No more children processes to reap.
> >> 07/24/13 12:47:44 Process exited, pid=32686, status=0
> >> 07/24/13 12:47:44 Inside VanillaProc::JobReaper()
> >> 07/24/13 12:47:44 Inside OsProc::JobReaper()
> >> 07/24/13 12:47:44 Inside UserProc::JobReaper()
> >> 07/24/13 12:47:44 Reaper: all=1 handled=1 ShuttingDown=0
> >> 07/24/13 12:47:44 In VanillaProc::PublishUpdateAd()
> >> 07/24/13 12:47:44 Inside OsProc::PublishUpdateAd()
> >> 07/24/13 12:47:44 Inside UserProc::PublishUpdateAd()
> >> 07/24/13 12:47:44 HOOK_JOB_EXIT not configured.
> >> 07/24/13 12:47:44 In VanillaProc::PublishUpdateAd()
> >> 07/24/13 12:47:44 Inside OsProc::PublishUpdateAd()
> >> 07/24/13 12:47:44 Inside UserProc::PublishUpdateAd()
> >> 07/24/13 12:47:44 Entering JICShadow::updateShadow()
> >> 07/24/13 12:47:44 Sent job ClassAd update to startd.
> >> 07/24/13 12:47:44 Leaving JICShadow::updateShadow(): success
> >> 07/24/13 12:47:44 Inside JICShadow::transferOutput(void)
> >> 07/24/13 12:47:44 JICShadow::transferOutput(void): Transferring...
> >> 07/24/13 12:47:44 Inside JICShadow::transferOutputMopUp(void)
> >> 07/24/13 12:47:44 Inside OsProc::JobExit()
> >> 07/24/13 12:47:44 Notifying exit status=0 reason=100
> >> 07/24/13 12:47:44 Sent job ClassAd update to startd.
> >> 07/24/13 12:47:44 Hold all jobs
> >> 07/24/13 12:47:44 All jobs were removed due to OOM event.
> >> 07/24/13 12:47:44 Inside JICShadow::transferOutput(void)
> >> 07/24/13 12:47:44 Inside JICShadow::transferOutputMopUp(void)
> >> 07/24/13 12:47:44 Closing event FD pipe 0.
> >> 07/24/13 12:47:44 Close_Pipe on invalid pipe end: 0
> >> 07/24/13 12:47:44 ERROR "Close_Pipe error" at line 2104 in file
> >> /slots/01/dir_5373/userdir/src/condor_daemon_core.V6/daemon_core.cpp
> >> 07/24/13 12:47:44 ShutdownFast all jobs.
> >> 07/24/13 12:47:44 Got ShutdownFast when no jobs running.
> >> 07/24/13 12:47:44 Inside JICShadow::transferOutput(void)
> >> 07/24/13 12:47:44 Inside JICShadow::transferOutputMopUp(void)
> >
> > It seems an event is fired for some reason to the OOM eventfd (the
> > cgroup itself being destroyed, perhaps?). Has anybody else seen the same
> > issue? Could it be a change in the kernel cgroups' interface?
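> > One way to test that theory, just a sketch (<slot_cgroup> is a placeholder
> > for the per-slot name the starter logs, and the mount point should be
> > adjusted to your own cgroup layout): keep dumping memory.oom_control while
> > the job finishes. If under_oom stays at 0 right up to the point the cgroup
> > disappears, the "OOM" notification would seem to come from the cgroup
> > being torn down rather than from a real out-of-memory condition.
> >
> > # sketch, cgroup v1 paths as in the /cgroup/memory layout quoted above
> > while sleep 1; do
> >   cat /cgroup/memory/htcondor/<slot_cgroup>/memory.oom_control || break
> > done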
> >
> > Thanks,
> >
> > Joan
> >
> > --
> > --------------------------------------------------------------------------
> > Joan Josep Piles Contreras -  Analista de sistemas
> > I3A - Instituto de Investigación en Ingeniería de Aragón
> > Tel: 876 55 51 47 (ext. 845147)
> > http://i3a.unizar.es  --jpiles@xxxxxxxxx
> > --------------------------------------------------------------------------
> >
> >
> >
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> 
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/
>