
[HTCondor-users] Condor: our main submit machine is running out of memory (our status page runs condor_q)

We are running 7.8.4.

The text below is from a colleague. In short, when our main submit node is
very busy (2000-3000 jobs), running condor_q causes condor_schedd to fork.
Because the schedd is fairly massive by then, this can cause us to run out
of memory.

We are buying more memory (cheap), but has anything in this area changed
in 7.8.?

Any thoughts?
Many thanks

The additional condor_schedulers have nothing to do with one scheduler
being overloaded. They are created automatically and instantly whenever a
condor_q command is run. They appear to be copies of the running
scheduler (i.e. they immediately claim/use the same amount of memory).


Running

ps axu | awk '{mem+=$6} END {print mem}'

on submitter gives an idea of how much memory is required by the [2200]
running processes. The figure returned is around 12 GB; you will recall
that our submitter only has 8 GB of memory.

Hence, simply to support additional processes and more Condor nodes,
submitter needs at least 16 GB, although I would suggest that, if the rack
supports it, a minimum of 24 GB is probably better.
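
For anyone repeating the measurement above, here is a minimal sketch of the same
ps/awk sum, split into two helpers so the units are explicit. The function names
sum_rss_kib and kib_to_gib are made up for illustration. It assumes a Linux ps,
where column 6 of "ps aux" is RSS in KiB. Note that RSS counts shared pages once
per process, so a freshly forked child that still shares copy-on-write pages with
its parent is counted in full, and the total may overstate the physical memory
actually in use.

```shell
# Sum the RSS column (field 6 of `ps aux` output, KiB on Linux).
# NR > 1 skips the header row.
sum_rss_kib() {
    awk 'NR > 1 { kib += $6 } END { printf "%d\n", kib }'
}

# Convert a KiB total read from stdin to GiB, with one decimal place.
kib_to_gib() {
    awk '{ printf "%.1f\n", $1 / 1048576 }'
}
```

Usage on the submit node would be: ps aux | sum_rss_kib | kib_to_gib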

Ian Cottam
IT Services -- supporting research
Faculty of Engineering and Physical Sciences
The University of Manchester
"The only strategy that is guaranteed to fail is not taking risks." Mark