
Re: [Condor-users] HELP: condor_schedd memory leak???



I have stopped and restarted the condor_schedd multiple times, and each time I get the same behavior. Essentially, it goes through its cycle, then the memory use skyrockets, and once it hits swap it brings the host to its knees. There is no error reported in the logs.
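
For anyone who wants to record the growth next to the schedd log timestamps, here is a minimal sketch. It assumes a Linux host with /proc mounted and a kernel that exposes /proc/<pid>/comm; the function names are made up for illustration and are not part of Condor.

```python
import os
import time


def parse_vm_rss_kb(status_text):
    """Return the VmRSS value (in kB) from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like: "VmRSS:\t  123456 kB"
            return int(line.split()[1])
    return None


def find_pids(name):
    """Return the PIDs whose /proc/<pid>/comm matches name exactly."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open("/proc/%s/comm" % entry) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            # The process exited between listdir() and open().
            continue
    return pids


def sample(name="condor_schedd"):
    """Return a list of (pid, rss_kb) pairs for every matching process."""
    results = []
    for pid in find_pids(name):
        try:
            with open("/proc/%d/status" % pid) as f:
                results.append((pid, parse_vm_rss_kb(f.read())))
        except OSError:
            continue
    return results


def watch(name="condor_schedd", interval=10, samples=60):
    """Print a timestamped RSS reading every `interval` seconds."""
    for _ in range(samples):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        for pid, rss_kb in sample(name):
            print("%s pid=%d rss_kb=%s" % (stamp, pid, rss_kb))
        time.sleep(interval)
```

Calling watch() right after restarting the schedd and lining its output up against the log should show which phase of the cycle the jump coincides with.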

It looks like behavior that would result from a corrupt data file or similar, since it happens in a consistent fashion.

BTW, the version is 7.0.1 on this node, but the compute nodes are 6.8.5 (I think).

rob


On Apr 7, 2008, at 12:41 PM, Dan Bradley wrote:


If you stop and restart condor, the existing contents of your job queue
are preserved.  For vanilla universe jobs with sufficient lease times
(20 minutes by default), the schedd should be able to reconnect to
running jobs after it restarts without the jobs being interrupted by the
restart.

Is your schedd logging anything strange? Is it responding to condor_q?

--Dan

Robert E. Parrott wrote:

Hi Folks,

I have somewhat of an emergency situation.

After a DoS attempt on ssh on our login node, and subsequent system
unresponsiveness and then a reboot, the condor_schedd process now grows
to exhaust all of the physical memory on the host (6 GB+ at present).
This causes swapping issues etc.  ... it ain't pretty.

Is there some approach I can take to try to resolve this issue,
without losing the previous queue of submitted jobs?

thanks in advance for the quick response,
rob


==========================
Robert E. Parrott, Ph.D. (Phys. '06)
Associate Director, Grid and
      Supercomputing Platforms
Project Manager, CrimsonGrid Initiative
Harvard University Sch. of Eng. and App. Sci.
Maxwell-Dworkin  211,
33 Oxford St.
Cambridge, MA 02138
(617)-495-5045

_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/



