
Re: [Condor-users] HELP: condor_schedd memory leak???




If you stop and restart Condor, the existing contents of your job queue are preserved. For vanilla universe jobs with a sufficiently long job lease (20 minutes by default), the schedd should be able to reconnect to running jobs after it restarts, so the restart itself does not interrupt them.
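As a rough illustration of the lease reasoning, the sketch below flags jobs whose lease would run out before the schedd comes back. It is not a Condor tool: the function name, the job IDs, and the dictionary input are all hypothetical, and the only figure taken from this thread is the 20-minute default lease.

```python
# Minimal sketch, not part of Condor: names and input format are assumptions.
DEFAULT_LEASE_SECONDS = 20 * 60  # the 20-minute default mentioned above


def jobs_at_risk(leases, downtime_seconds):
    """Return the IDs of jobs whose lease would expire during the outage.

    leases maps a job ID string to its lease in seconds, or to None when
    the job relies on the default lease.
    """
    at_risk = []
    for job_id, lease in leases.items():
        effective = DEFAULT_LEASE_SECONDS if lease is None else lease
        # The schedd can only reconnect if it is back before the lease ends.
        if effective <= downtime_seconds:
            at_risk.append(job_id)
    return sorted(at_risk)


if __name__ == "__main__":
    example = {"12.0": None, "12.1": 300, "13.0": 3600}
    print(jobs_at_risk(example, downtime_seconds=600))
```

With a planned outage of 10 minutes, only a job with a lease shorter than that would be flagged; jobs on the default lease would be expected to survive the restart.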

Is your schedd logging anything strange?  Is it responding to condor_q?

--Dan

Robert E. Parrott wrote:

Hi Folks,

I have somewhat of an emergency situation.

After a DoS attempt against ssh on our login node, followed by system unresponsiveness and then a reboot, the condor_schedd process now grows until it exhausts all of the physical memory on the host (6+ GB at present). This causes swapping issues etc. ... it ain't pretty.

Is there some approach I can take to try to resolve this issue without losing the previous queue of submitted jobs?

thanks in advance for the quick response,
rob


==========================
Robert E. Parrott, Ph.D. (Phys. '06)
Associate Director, Grid and
       Supercomputing Platforms
Project Manager, CrimsonGrid Initiative
Harvard University Sch. of Eng. and App. Sci.
Maxwell-Dworkin  211,
33 Oxford St.
Cambridge, MA 02138
(617)-495-5045
