On 12/03/2012 10:47 AM, Ian Cottam wrote:
> We are running 7.8.4.
>
> The below is from a colleague, but basically when we are very busy on our
> main submit node
> (2000-3000 jobs) we see a problem when a condor_q occurs causing
> condor_schedd to fork, which, as it is fairly massive by then can cause us
> to run out of memory.

Not entirely dissimilar but probably unrelated: sometimes when our submit
node sends jobs out to OSG, something causes condor_shadow to fork in
fork-bomb fashion -- the machine even stops answering pings long enough for
nagios to notice. Adding more memory did not make it go away, it just made
it happen very rarely. I'm unable to reproduce it, of course.

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu