Re: [HTCondor-users] Condor: our main submit machine is running out of memory (our status page runs condor_q)

On 12/03/2012 10:47 AM, Ian Cottam wrote:
> We are running 7.8.4.
> The below is from a colleague, but basically when we are very busy on our
> main submit node
> (2000-3000 jobs) we see a problem when a condor_q occurs causing
> condor_schedd to fork, which, as it is fairly massive by then can cause us
> to run out of memory.

Not entirely dissimilar but probably unrelated: sometimes when our
submit node sends jobs out to OSG something causes condor_shadow to fork
in the fork-bomb fashion -- the machine even stops answering pings long
enough for nagios to notice. Adding more memory did not make it go away,
it just made it happen very rarely. I'm unable to reproduce it, of course.

Dimitri Maziuk
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
