Re: [HTCondor-users] cgroup error
- Date: Tue, 25 Aug 2015 16:17:17 +0200
- From: Iain Steers <iain.steers@xxxxxxx>
- Subject: Re: [HTCondor-users] cgroup error
Hi Christoph,
We've been running SLC6 cgroups in our HTCondor pool for a couple of months now without issue.
Give me a shout and mail me your config if you'd like.
Cheers, Iain
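
In case it helps with Lincoln's question below: the "not mounted" warnings in the quoted procd log suggest that the cgroup hierarchies may not be mounted on the node at all. Here is a minimal Python sketch for checking that from the contents of /proc/mounts. It assumes cgroup v1 (as on SL6), where each mounted controller shows up as a mount option. The function names and the sample text are made up for illustration, and the controller list is simply the five names from the warnings.

```python
# Sketch only: report which cgroup v1 controllers are mounted,
# based on text in the format of /proc/mounts.

# Controller names taken from the procd warnings quoted in this thread.
CONTROLLERS = ("cpu", "cpuacct", "memory", "freezer", "blkio")


def mounted_controllers(mounts_text):
    """Return the set of known controllers that appear as cgroup mount options."""
    found = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[2] == "cgroup":
            found.update(opt for opt in fields[3].split(",") if opt in CONTROLLERS)
    return found


def missing_controllers(mounts_text):
    """Return the known controllers that are not mounted, in a fixed order."""
    mounted = mounted_controllers(mounts_text)
    return [name for name in CONTROLLERS if name not in mounted]


# Hypothetical sample: only the memory and cpu controllers are mounted.
SAMPLE = (
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
    "cgroup /cgroup/memory cgroup rw,relatime,memory 0 0\n"
    "cgroup /cgroup/cpu cgroup rw,relatime,cpu 0 0\n"
)

print("missing:", missing_controllers(SAMPLE))
```

On a worker node you would pass the contents of /proc/mounts instead of SAMPLE. An empty "missing" list would mean all five controllers are mounted.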
On Tue, Aug 25, 2015 at 08:53:08AM -0500, Lincoln Bryant wrote:
> Hi,
>
> Shot in the dark, but do you have the cgroups service running? What does "/etc/init.d/cgconfig status" report?
>
> Cheers,
> Lincoln
>
> > On Aug 25, 2015, at 8:48 AM, Beyer, Christoph <christoph.beyer@xxxxxxx> wrote:
> >
> >
> > Hi,
> >
> > I am using SL6 (kernel 2.6.32-504.8.1.el6.x86_64) and HTCondor 8.3.7 (Jul 23 2015, BuildID: 331383).
> >
> > I enabled cgroups as described in the manual and sent 'stress' jobs that use 10 GB of memory while requesting only 1 GB of memory in the submit file.
> >
> > The result is not what I would expect:
> >
> > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> > 20378 chbeyer 20 0 9.8g 1.0g 144 D 2.0 6.4 0:01.03 stress
> > 20400 chbeyer 20 0 9.8g 1.0g 144 D 2.0 6.4 0:00.77 stress
> > 20381 chbeyer 20 0 9.8g 1.0g 144 D 1.3 6.4 0:00.97 stress
> > 20384 chbeyer 20 0 9.8g 1.0g 144 D 1.3 6.3 0:00.97 stress
> > 20386 chbeyer 20 0 9.8g 1.0g 144 D 1.3 6.4 0:01.03 stress
> > 20388 chbeyer 20 0 9.8g 1.0g 144 D 1.3 6.3 0:00.86 stress
> > 20392 chbeyer 20 0 9.8g 1.0g 144 D 1.3 6.3 0:00.80 stress
> > 20398 chbeyer 20 0 9.8g 1.0g 144 D 1.3 6.3 0:00.71 stress
> >
> >
> > The procd log file shows some errors:
> >
> >
> > 08/25/15 15:40:26 : PROC_FAMILY_GET_USAGE
> > 08/25/15 15:40:26 : gathering usage data for family with root pid 20361
> > 08/25/15 15:40:26 : Unable to read cgroup htcondor/condor_var_lib_condor_execute_slot1_2@xxxxxxxxxxxxxxxx cpuacct stats (ProcFamily 20373): Cgroup invalid operation.
> > 08/25/15 15:40:26 : Internal cgroup error when retrieving CPU statistics: Cgroup invalid operation
> > 08/25/15 15:40:26 : Unable to read cgroup htcondor/condor_var_lib_condor_execute_slot1_2@xxxxxxxxxxxxxxxx memory stats (ProcFamily 20373): 50016 No such file or directory.
> > 08/25/15 15:40:26 : PROC_FAMILY_GET_USAGE
> > 08/25/15 15:40:26 : gathering usage data for family with root pid 20360
> > 08/25/15 15:40:26 : Unable to read cgroup htcondor/condor_var_lib_condor_execute_slot1_1@xxxxxxxxxxxxxxxx cpuacct stats (ProcFamily 20372): Cgroup invalid operation.
> > [ snip ]
> > 08/25/15 13:46:41 : PROC_FAMILY_TRACK_FAMILY_VIA_CGROUP
> > 08/25/15 13:46:41 : Setting cgroup to htcondor/condor_var_lib_condor_execute_slot1_3@xxxxxxxxxxxxxxxx for ProcFamily 15896.
> > 08/25/15 13:46:41 : Warning - cgroup controller cpuacct not mounted (but not required).
> > 08/25/15 13:46:41 : Warning - cgroup controller memory not mounted (but not required).
> > 08/25/15 13:46:41 : Warning - cgroup controller freezer not mounted (but not required).
> > 08/25/15 13:46:41 : Warning - cgroup controller blkio not mounted (but not required).
> > 08/25/15 13:46:41 : Warning - cgroup controller cpu not mounted (but not required).
> > 08/25/15 13:46:41 : Cannot attach pid 15896 to cgroup htcondor/condor_var_lib_condor_execute_slot1_3@xxxxxxxxxxxxxxxx for ProcFamily 15896: 50014 Cgroup not initialized
> >
> > I thought that jobs which exceed the memory limit by far would be killed and put on hold, but that seems to happen only from time to time (?)
> >
> > best regards
> > ~christoph
> >
> >
> > --
> > /* Christoph Beyer | Office: Building 2b / 23 *\
> > * DESY | Phone: 040-8998-2317 *
> > * - IT - | Fax: 040-8994-2317 *
> > \* 22603 Hamburg | http://www.desy.de */
> > _______________________________________________
> > HTCondor-users mailing list
> > To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> > subject: Unsubscribe
> > You can also unsubscribe by visiting
> > https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> >
> > The archives can be found at:
> > https://lists.cs.wisc.edu/archive/htcondor-users/
> >
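
For anyone hitting the same messages: the procd log quoted above mixes two kinds of lines, "not mounted" warnings and "Unable to read" errors. A rough Python sketch for tallying both per controller is below. The regular expressions are written against the log lines quoted in this thread only and are not a documented log format; the function name and SAMPLE are illustrative.

```python
import re
from collections import Counter

# Patterns modelled on the procd log lines quoted in this thread (an assumption,
# not an official format description).
NOT_MOUNTED = re.compile(r"Warning - cgroup controller (\w+) not mounted")
UNREADABLE = re.compile(r"Unable to read cgroup \S+ (\w+) stats")


def summarize(log_text):
    """Count 'not mounted' warnings and 'unable to read' errors per controller."""
    not_mounted = Counter()
    unreadable = Counter()
    for line in log_text.splitlines():
        match = NOT_MOUNTED.search(line)
        if match:
            not_mounted[match.group(1)] += 1
            continue
        match = UNREADABLE.search(line)
        if match:
            unreadable[match.group(1)] += 1
    return not_mounted, unreadable


# Four lines copied from the quoted log (host part obscured by the archive).
SAMPLE = """\
08/25/15 15:40:26 : Unable to read cgroup htcondor/condor_var_lib_condor_execute_slot1_2@xxxxxxxxxxxxxxxx cpuacct stats (ProcFamily 20373): Cgroup invalid operation.
08/25/15 15:40:26 : Unable to read cgroup htcondor/condor_var_lib_condor_execute_slot1_2@xxxxxxxxxxxxxxxx memory stats (ProcFamily 20373): 50016 No such file or directory.
08/25/15 13:46:41 : Warning - cgroup controller cpuacct not mounted (but not required).
08/25/15 13:46:41 : Warning - cgroup controller memory not mounted (but not required).
"""

warnings, errors = summarize(SAMPLE)
print("not mounted:", dict(warnings))
print("unreadable:", dict(errors))
```

If the same controllers show up in both tallies, that would be consistent with the controllers not being mounted, which is what the cgconfig question is getting at.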