Re: [Condor-users] Condor 6.9.2 hung schedd
- Date: Thu, 14 Jun 2007 09:24:18 +0200
- From: Steffen Grunewald <steffen.grunewald@xxxxxxxxxx>
- Subject: Re: [Condor-users] Condor 6.9.2 hung schedd
On Wed, Jun 13, 2007 at 11:36:44AM -0500, Dan Bradley wrote:
> Steffen Grunewald wrote:
> >Question to Condor developers: where's the status of submitted jobs kept
> >over a restart of condor_schedd? It might be easier to make changes there...
> It is fairly easy to understand the format and to make manual changes,
> but be careful!
Hmmm, when would be the best time to make changes? Mine is about 9 MB
in size, and I'm worried that I'd miss some bits.
Certainly, it would be nicer if condor_schedd could handle this situation more
gracefully. I'm thinking of a timeout: if condor_schedd doesn't get the lock
within a configurable time (one minute?), it would simply write a notice to its
own log and skip the user log output. Would this be feasible?
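
To make the timeout idea concrete, here is a rough sketch in Python. It is
not Condor code and says nothing about how condor_schedd actually takes its
locks. The function name, the flock()-based locking, and the polling interval
are all my own assumptions for illustration. It polls for an exclusive lock
and gives up once the deadline passes, so the caller can log a notice and
carry on instead of blocking forever.

```python
import fcntl
import os
import time


def append_with_lock_timeout(path, text, timeout=60.0, poll=0.1):
    """Append text to path under an exclusive lock (hypothetical helper).

    Returns True if the text was written, False if the lock could not be
    obtained within `timeout` seconds. The caller decides what to do on
    False, e.g. write a notice to its own log and skip this event.
    """
    deadline = time.monotonic() + timeout
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        while True:
            try:
                # Non-blocking attempt: raises BlockingIOError if the lock
                # is currently held through another open file description.
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                break
            except BlockingIOError:
                if time.monotonic() >= deadline:
                    return False  # timed out, nothing was written
                time.sleep(poll)
        try:
            os.write(fd, text.encode())
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)
```

A caller would do something like
`if not append_with_lock_timeout(userlog, event): note_in_own_log(...)`,
where `note_in_own_log` is again just a placeholder. Whether a non-blocking
attempt behaves this way on a networked file system is a separate question
that this sketch does not address.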
> >And why doesn't 'condor_restart -sub schedd' work in this case?
> Hmm. It worked for me when I tried it, but I'm running a pre-release of
> 6.9.3. The usual problem people have is that their security
> configuration doesn't allow condor_restart to operate from the machine
> where they are running it, but the command-line tool does not know
> whether the operation was rejected or not, so there is no visible
> complaint to the user. If you look in the schedd log, you will see a
> message indicating that it rejected the command.
I didn't see any message in the log because the schedd was completely (!)
unresponsive. Yet the restart command returned without complaints.