
Re: [HTCondor-users] Condor: our main submit machine is running out of memory (our status page runs condor_q)



When the schedd forks a child process to answer a condor_q query, it does not double the memory in use. All of the memory pages are shared between the two processes until either one writes to them. Since the child should be short-lived, the amount of additional memory should be minimal.
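
To make the copy-on-write sharing concrete, here is a minimal sketch
(plain Python 3 on Linux, nothing HTCondor-specific; /proc/<pid>/smaps_rollup
needs kernel 4.14 or newer) that allocates a large buffer, forks, and
compares the child's RSS with its PSS, which charges shared pages
fractionally:

#!/usr/bin/env python3
# Illustration only: after fork() the child's RSS looks as large as the
# parent's, because RSS counts shared copy-on-write pages in full, while
# PSS (proportional set size) charges each shared page fractionally and
# shows how little extra physical memory is really in use.
import os
import time

def rss_pss_kb(pid):
    """Return (rss_kb, pss_kb) parsed from /proc/<pid>/smaps_rollup."""
    rss = pss = 0
    with open("/proc/%d/smaps_rollup" % pid) as f:
        for line in f:
            if line.startswith("Rss:"):
                rss = int(line.split()[1])
            elif line.startswith("Pss:"):
                pss = int(line.split()[1])
    return rss, pss

# Stand-in for a large schedd: 200 MB of touched, resident memory.
ballast = bytearray(200 * 1024 * 1024)

pid = os.fork()
if pid == 0:
    # Child: the 200 MB is still shared with the parent, so expect an
    # RSS of roughly 200 MB but a PSS of roughly 100 MB until one side
    # writes to the pages.
    rss, pss = rss_pss_kb(os.getpid())
    print("child  RSS=%d kB  PSS=%d kB" % (rss, pss))
    os._exit(0)

time.sleep(1)  # let the child print first
rss, pss = rss_pss_kb(os.getpid())
print("parent RSS=%d kB  PSS=%d kB" % (rss, pss))
os.waitpid(pid, 0)

The child's RSS reports the full buffer even though almost none of it is
private, which is also why summing the RSS column from ps over-counts
whenever these short-lived children are around.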

 -- Jaime

On Dec 3, 2012, at 10:47 AM, Ian Cottam <Ian.Cottam@xxxxxxxxxxxxxxxx> wrote:

> We are running 7.8.4.
> 
> The text below is from a colleague, but basically, when our main submit
> node is very busy (2000-3000 jobs), a condor_q causes condor_schedd to
> fork, and because the schedd is fairly large by then, that fork can run
> us out of memory.
> 
> We are buying more memory (it is cheap), but has anything in this area
> changed in 7.8?
> 
> Any thoughts?
> Many thanks
> -Ian
> 
> 
> 
> 
> "++++++++
> The additional condor_schedd processes have nothing to do with one
> scheduler being overloaded. They are created automatically, and
> instantly, whenever a condor_q command is run, and they appear to be
> copies of the running scheduler (i.e. they immediately claim/use the
> same amount of memory).
> 
> Running
> 
> ps axu | awk '{mem+=$6} END {print mem}'
> 
> on submitter to get an idea of how much memory the [2200] running
> processes require, the figure returned is around 12 GB; you will recall
> that our submitter only has 8 GB of memory.
> 
> Hence, simply to support the additional processes and the extra Condor
> nodes we are adding, submitter needs at least 16 GB, although I would
> suggest that, if the rack supports it, a minimum of 24 GB is probably
> better.
> +++++++++"
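
A caveat on the ps axu | awk total quoted above: summing the RSS column
counts every shared page once per process, so short-lived schedd forks
inflate the figure well beyond the physical memory actually in use. One
rough cross-check, offered only as a sketch (Python 3 on Linux, kernel
4.14+ for smaps_rollup, and it can only see processes you are allowed to
inspect), is to sum PSS instead, which charges shared pages fractionally:

#!/usr/bin/env python3
# Sum PSS over all readable processes. Shared pages are charged
# fractionally, so forked copies of the schedd are not counted twice
# the way they are when the ps RSS column is summed.
import glob

def pss_kb(proc_dir):
    try:
        with open(proc_dir + "/smaps_rollup") as f:
            for line in f:
                if line.startswith("Pss:"):
                    return int(line.split()[1])
    except OSError:
        pass  # process exited, or we lack permission; skip it
    return 0

total_kb = sum(pss_kb(d) for d in glob.glob("/proc/[0-9]*"))
print("total PSS: %.1f GB" % (total_kb / 1024.0 / 1024.0))
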
> 
> 
> 
> 
> 
> -- 
> Ian Cottam
> IT Services -- supporting research
> Faculty of Engineering and Physical Sciences
> The University of Manchester
> "The only strategy that is guaranteed to fail is not taking risks." Mark
> Zuckerberg