Re: [Condor-users] Quill++ assistance

Right, the exponential backoff that condor_master uses when restarting a daemon can include some random jitter, which would build up over time.
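To illustrate how a small random delay on each restart can add up, here is a rough Python sketch. It is not condor_master's actual restart logic: the function name, the fixed 10-second base delay, and the 2-second jitter bound are all assumptions chosen for illustration, and only the 1 hr 25 min run time comes from this thread.

```python
import random


def simulate_restart_times(n_restarts, run_seconds=85 * 60, base_delay=10.0,
                           max_jitter=2.0, seed=None):
    """Return the cumulative time (in seconds) of each restart.

    Assumed model, for illustration only: the daemon runs for
    run_seconds, exits, and is restarted after base_delay plus a
    uniform random jitter in [0, max_jitter].
    """
    rng = random.Random(seed)
    elapsed = 0.0
    times = []
    for _ in range(n_restarts):
        elapsed += run_seconds                                # daemon runs, then exits
        elapsed += base_delay + rng.uniform(0.0, max_jitter)  # jittered restart delay
        times.append(elapsed)
    return times


# Two machines started at the same moment, each with its own jitter.
machine_a = simulate_restart_times(20, seed=1)
machine_b = simulate_restart_times(20, seed=2)

# The gap between the two machines' restart times follows a random walk,
# so it tends to grow as more restarts accumulate.
drift = [abs(a - b) for a, b in zip(machine_a, machine_b)]
print("drift after 1 restart:   %.2f s" % drift[0])
print("drift after 20 restarts: %.2f s" % drift[-1])
```

With max_jitter set to 0 the two machines stay in lockstep, so any drift in this model comes purely from the per-restart jitter.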

If you've got a chance, could you turn up QUILL_DEBUG to D_FULLDEBUG, and send me the QuillLog from a crashing machine, along with your sql.log file? 

Also, how much free disk space do you have? The dprintf_failure files should have some data inside them explaining why the dprintf failed, and running out of disk space is one reason a dprintf can fail.
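If it helps, a quick way to gather both pieces of information is a small Python script along these lines. This is only a sketch: the function name is made up, and it assumes the dprintf_failure files sit in the directory you pass in, which may not match your install.

```python
import os
import shutil


def report_disk_and_failure_files(log_dir):
    """Return (free_bytes, {filename: size_in_bytes}) for log_dir.

    free_bytes is the free space on the volume that holds log_dir.
    The dict lists every file whose name starts with "dprintf_failure."
    (an assumed naming pattern, based on dprintf_failure.QUILL).
    """
    usage = shutil.disk_usage(log_dir)
    failure_files = {}
    for name in sorted(os.listdir(log_dir)):
        if name.startswith("dprintf_failure."):
            path = os.path.join(log_dir, name)
            failure_files[name] = os.path.getsize(path)
    return usage.free, failure_files
```

You would call it with the path to your Condor log directory, for example `free, files = report_disk_and_failure_files(path_to_log_dir)`, and then look for a low free-space figure or for failure files that are 0 bytes.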


On Wed, Aug 25, 2010 at 9:04 PM, <Greg.Hitchen@xxxxxxxx> wrote:

That's correct, no other daemons are restarting, just condor_quill.

Interestingly, now that I have installed this version onto another
few PCs, the 1hr 25min is not EXACT. Two PCs that I "synched" yesterday
by restarting condor at the same time are now 2-3 minutes apart on
their condor_quill restarts. Maybe the condor_master restarting
condor_quill after 10secs isn't exact and the time diff gradually builds
up? I'll keep an eye on it.



-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Erik Paulson
Sent: Thursday, 26 August 2010 4:16 AM
To: Condor-Users Mail List
Subject: Re: [Condor-users] Quill++ assistance

And just to confirm, it's only Quill - none of the other daemons show
the same restart every hour and twenty-five minutes?


On Wed, Aug 25, 2010 at 1:12 AM,  <Greg.Hitchen@xxxxxxxx> wrote:
> Hi Erik
> The 1hr 25 mins is definitely not related (as far as I can tell) to virus
> scans/server activity/etc.
> I've checked all the scheduled types of activities that our PCs get
> installed with and nothing "fits".
> In addition I have installed 7.4.3 onto several PCs now and they all
> exhibit the 1hr 25min restart of condor_quill, and it always starts exactly
> 1 hr 25 mins after condor is started, i.e. anytime I do a condor net stop,
> condor net start on them then the first of the 1hr 25mins restarts begins
> 1 hr 25mins after this.
> There is a dprintf_failure.QUILL file created but it is empty and 0 bytes
> in size.
> No core file is created and condor_quill quite happily gets restarted by
> condor_master after 10 secs until the MasterLog again says it exits with
> error 44 after the next 1hr 25 mins.
> Nothing gets logged in the QuillLog.
> Cheers
> Greg
> ________________________________