
Re: [HTCondor-users] Condor: our main submit machine is running out of memory (our status page runs condor_q)



When the schedd forks a child process to answer a condor_q query, it does not double the memory in use. All of the memory pages are shared between the two processes until either one writes to them. Since the child should be short-lived, the amount of additional memory should be minimal.
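
To make the copy-on-write sharing concrete, here is a minimal sketch
(plain Python 3 on Linux, nothing HTCondor-specific; /proc/<pid>/smaps_rollup
needs kernel 4.14 or newer) that allocates a large buffer, forks, and
compares the child's RSS with its PSS, which charges shared pages
fractionally:

#!/usr/bin/env python3
# Illustration only: after fork() the child's RSS looks as large as the
# parent's, because RSS counts shared copy-on-write pages in full, while
# PSS (proportional set size) charges each shared page fractionally and
# shows how little extra physical memory is really in use.
import os
import time

def rss_pss_kb(pid):
    """Return (rss_kb, pss_kb) parsed from /proc/<pid>/smaps_rollup."""
    rss = pss = 0
    with open("/proc/%d/smaps_rollup" % pid) as f:
        for line in f:
            if line.startswith("Rss:"):
                rss = int(line.split()[1])
            elif line.startswith("Pss:"):
                pss = int(line.split()[1])
    return rss, pss

# Stand-in for a large schedd: 200 MB of touched, resident memory.
ballast = bytearray(200 * 1024 * 1024)

pid = os.fork()
if pid == 0:
    # Child: the 200 MB is still shared with the parent, so expect an
    # RSS of roughly 200 MB but a PSS of roughly 100 MB until one side
    # writes to the pages.
    rss, pss = rss_pss_kb(os.getpid())
    print("child  RSS=%d kB  PSS=%d kB" % (rss, pss))
    os._exit(0)

time.sleep(1)  # let the child print first
rss, pss = rss_pss_kb(os.getpid())
print("parent RSS=%d kB  PSS=%d kB" % (rss, pss))
os.waitpid(pid, 0)

The child's RSS reports the full buffer even though almost none of it is
private, which is also why summing the RSS column from ps over-counts
whenever these short-lived children are around.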

 -- Jaime

On Dec 3, 2012, at 10:47 AM, Ian Cottam <Ian.Cottam@xxxxxxxxxxxxxxxx> wrote:

> We are running 7.8.4.
> 
> The text below is from a colleague, but basically, when our main submit
> node is very busy (2000-3000 jobs), a condor_q causes condor_schedd to
> fork, and because the schedd is fairly large by then, that fork can run
> us out of memory.
> 
> We are buying more memory (it is cheap), but has anything in this area
> changed in 7.8?
> 
> Any thoughts?
> Many thanks
> -Ian
> 
> 
> 
> 
> "++++++++
> The additional condor_schedd processes have nothing to do with one
> scheduler being overloaded. They are created automatically, and
> instantly, whenever a condor_q command is run, and they appear to be
> copies of the running scheduler (i.e. they immediately claim/use the
> same amount of memory).
> 
> Running
> 
> ps axu | awk '{mem+=$6} END {print mem}'
> 
> on submitter to get an idea of how much memory the [2200] running
> processes require, the figure returned is around 12 GB; you will recall
> that our submitter only has 8 GB of memory.
> 
> Hence, simply to support the additional processes and the extra Condor
> nodes we are adding, submitter needs at least 16 GB, although I would
> suggest that, if the rack supports it, a minimum of 24 GB is probably
> better.
> +++++++++"
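
A caveat on the ps axu | awk total quoted above: summing the RSS column
counts every shared page once per process, so short-lived schedd forks
inflate the figure well beyond the physical memory actually in use. One
rough cross-check, offered only as a sketch (Python 3 on Linux, kernel
4.14+ for smaps_rollup, and it can only see processes you are allowed to
inspect), is to sum PSS instead, which charges shared pages fractionally:

#!/usr/bin/env python3
# Sum PSS over all readable processes. Shared pages are charged
# fractionally, so forked copies of the schedd are not counted twice
# the way they are when the ps RSS column is summed.
import glob

def pss_kb(proc_dir):
    try:
        with open(proc_dir + "/smaps_rollup") as f:
            for line in f:
                if line.startswith("Pss:"):
                    return int(line.split()[1])
    except OSError:
        pass  # process exited, or we lack permission; skip it
    return 0

total_kb = sum(pss_kb(d) for d in glob.glob("/proc/[0-9]*"))
print("total PSS: %.1f GB" % (total_kb / 1024.0 / 1024.0))
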
> 
> 
> 
> 
> 
> -- 
> Ian Cottam
> IT Services -- supporting research
> Faculty of Engineering and Physical Sciences
> The University of Manchester
> "The only strategy that is guaranteed to fail is not taking risks." Mark
> Zuckerberg