Re: [Condor-users] Condor Daemons Fail to run on node
- Date: Thu, 02 Dec 2010 12:27:43 -0600
- From: Dimitri Maziuk <dmaziuk@xxxxxxxxxxxxx>
- Subject: Re: [Condor-users] Condor Daemons Fail to run on node
Ian Chesal wrote:
On Thu, Dec 2, 2010 at 1:13 PM, Xenia Fave <xfave2008@xxxxxxxxxx> wrote:
Do you mean just rebooting the one node or the entire cluster?
Just the one node where Condor won't start.
See the other email from James Burnash about fsck'ing the file system --
in order to do this you'll have to unmount it from *all* your machines.
That only matters if it's mounted on other machines; it looks like everyone has a local /scratch.
As I recall (haven't seen it in a while), this error can happen when the
disk develops too many bad sectors too fast. Then the filesystem gets
remounted read-only at a lower level than mtab, so mount still shows it
as "rw". If that is the case, smartctl and/or dmesg (or /var/log/messages)
should have something to say about it. Also, if this is the cause of the
problem, don't bother with fsck; replace the disk.
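
A rough way to check for the "mount says rw but the kernel disagrees" case
is to compare what the kernel reports in /proc/mounts with an actual write
attempt. The sketch below is only an illustration: the function names are
made up for this example, /scratch is taken from this thread, and it
assumes a Linux node where /proc/mounts is readable. It does not replace
looking at smartctl or dmesg output.

```python
import tempfile


def mount_options(mounts_text, mount_point):
    """Return the option list for mount_point from /proc/mounts-style text.

    Returns None if the mount point is not listed. If it is listed more
    than once, the last entry (the most recent mount) is used.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            found = fields[3].split(",")
    return found


def can_write(directory):
    """Try to create a temporary file in directory; False on any OSError."""
    try:
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        return False


def check(mount_point, mounts_path="/proc/mounts"):
    """Print the kernel's view of mount_point and the result of a write test."""
    with open(mounts_path) as f:
        opts = mount_options(f.read(), mount_point)
    print("kernel mount options:", opts)
    print("write test succeeded:", can_write(mount_point))
```

Calling check("/scratch") on the affected node prints both views. If the
options list starts with "ro" or the write test fails while mount still
reports "rw", that is consistent with the kernel having remounted the
filesystem read-only underneath mtab.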
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu