
Re: [Condor-users] Quill++ assistance



Right, the restart process uses exponential backoff and can include some random jitter, which will build up over time.
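
To illustrate how a small random offset per restart can accumulate into minutes of drift between two machines, here is a rough Python sketch. It is not Condor's actual code. The delay formula (constant plus factor to the power n, capped at a ceiling) and the values 9, 2.0 and 3600 are assumptions modelled on the MASTER_BACKOFF_* settings; the `jitter` parameter and the 85-minute uptime are hypothetical values chosen to mirror what Greg reports.

```python
import random


def restart_delay(n, constant=9.0, factor=2.0, ceiling=3600.0,
                  jitter=0.0, rng=random):
    """Seconds to wait before restarting a daemon after n recent failures.

    The base delay grows exponentially with n and is capped at `ceiling`.
    `jitter` (hypothetical) adds a uniform random offset in [0, jitter].
    """
    base = min(constant + factor ** n, ceiling)
    return base + rng.uniform(0.0, jitter)


def restart_times(cycles, uptime=85 * 60, jitter=5.0, seed=0):
    """Simulated wall-clock times (seconds) at which the daemon comes back.

    Assumes the daemon always runs for `uptime` seconds before exiting, so
    the failure count is treated as reset to 0 on every cycle.
    """
    rng = random.Random(seed)
    t = 0.0
    times = []
    for _ in range(cycles):
        t += uptime                                    # daemon runs, then exits
        t += restart_delay(0, jitter=jitter, rng=rng)  # wait before restart
        times.append(t)
    return times


if __name__ == "__main__":
    # Two machines started at the same moment, each with its own jitter.
    a = restart_times(17, seed=1)
    b = restart_times(17, seed=2)
    print("drift after 17 restarts: %.1f seconds" % abs(a[-1] - b[-1]))
```

With `jitter=0.0` the two machines stay in lockstep; with any nonzero jitter the gap between them is free to wander from one restart to the next.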

If you've got a chance, could you turn up QUILL_DEBUG to D_FULLDEBUG, and send me the QuillLog from a crashing machine, along with your sql.log file? 
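
For reference, the debug setting would look something like the fragment below. Where it goes is an assumption: typically the machine's local Condor configuration file, whose location varies by install.

```
# Assumed to live in the local Condor config file on the crashing machine
QUILL_DEBUG = D_FULLDEBUG
```

The new level should be picked up after a condor_reconfig, or at the latest the next time condor_quill is restarted.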

Also, how much free disk space do you have? The dprintf_failure files should contain some data explaining why the dprintf failed, and running out of disk space is one reason a dprintf can fail.
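
If it helps, a quick way to collect both pieces of information is a small script along these lines. This is only a sketch: the function name is made up, and the log directory has to be supplied by you (running `condor_config_val LOG` should print it).

```python
import shutil
import sys
from pathlib import Path


def disk_and_dprintf_report(log_dir):
    """Return (free_bytes, {filename: size_in_bytes}) for a log directory.

    free_bytes is the free space on the filesystem holding log_dir; the
    dict lists every dprintf_failure.* file found there with its size.
    """
    log_dir = Path(log_dir)
    free_bytes = shutil.disk_usage(log_dir).free
    failures = {
        p.name: p.stat().st_size
        for p in sorted(log_dir.glob("dprintf_failure.*"))
    }
    return free_bytes, failures


if __name__ == "__main__":
    # Usage: python report.py <path to the Condor log directory>
    free_bytes, failures = disk_and_dprintf_report(sys.argv[1])
    print("free space: %.1f MB" % (free_bytes / 1e6))
    for name, size in failures.items():
        print("%s: %d bytes" % (name, size))
```

A 0-byte dprintf_failure file together with plenty of free space would point away from a full disk as the cause.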

-Erik

On Wed, Aug 25, 2010 at 9:04 PM, <Greg.Hitchen@xxxxxxxx> wrote:

That's correct, no other daemons are restarting, just condor_quill.

Interestingly, now that I have installed this version onto another
few PCs, the 1hr 25min is not EXACT. Two PCs that I "synched" yesterday
by restarting condor at the same time are now 2-3 minutes apart on
their condor_quill restarts. Maybe the condor_master restarting
condor_quill after 10secs isn't exact and the time diff gradually builds
up? I'll keep an eye on it.

Cheers

Greg


-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Erik Paulson
Sent: Thursday, 26 August 2010 4:16 AM
To: Condor-Users Mail List
Subject: Re: [Condor-users] Quill++ assistance

And just to confirm, it's only Quill - none of the other daemons show
the same restart every hour and twenty-five minutes?

-Erik


On Wed, Aug 25, 2010 at 1:12 AM,  <Greg.Hitchen@xxxxxxxx> wrote:
> Hi Erik
>
> The 1hr 25 mins is definitely not related (as far as I can tell) to
> virus scans/server activity/etc. I've checked all the scheduled types
> of activities that our PCs get installed with and nothing "fits".
>
> In addition I have installed 7.4.3 onto several PCs now and they all
> exhibit the 1hr 25min restart of condor_quill, and it always starts
> exactly 1 hr 25 mins after condor is started, i.e. any time I do a
> condor net stop, condor net start on them, the first of the 1hr 25min
> restarts begins 1 hr 25 mins after this.
>
> There is a dprintf_failure.QUILL file created but it is empty and
> 0 bytes in size. No core file is created and condor_quill quite
> happily gets restarted by condor_master after 10 secs until the
> MasterLog again says it exits with error 44 after the next
> 1hr 25 mins. Nothing gets logged in the QuillLog.
>
> Cheers
>
> Greg
> ________________________________