Re: [Condor-users] Quill++ assistance
- Date: Thu, 26 Aug 2010 10:04:28 +0800
- From: <Greg.Hitchen@xxxxxxxx>
- Subject: Re: [Condor-users] Quill++ assistance
That's correct, no other daemons are restarting, just condor_quill.
Interestingly, now that I have installed this version onto another
few PCs, the 1hr 25min is not EXACT. Two PCs that I "synched" yesterday
by restarting condor at the same time are now 2-3 minutes apart on
their condor_quill restarts. Maybe the condor_master restarting
condor_quill after 10secs isn't exact and the time diff gradually builds
up? I'll keep an eye on it.
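
One way to keep an eye on the drift is to pull the condor_quill exit times out of each PC's MasterLog and compare the gaps between them. The following is only a rough sketch: the line layout it expects ("M/D HH:MM:SS ... QUILL ... exited with status N") is an assumption about the MasterLog format, the sample lines are made up for illustration, and the names `quill_exit_times` and `intervals` are hypothetical helpers, not part of Condor.

```python
import re
from datetime import datetime, timedelta

# ASSUMPTION: MasterLog lines start with "M/D HH:MM:SS" (no year) and a
# condor_quill exit is reported as "... QUILL ... exited with status N".
EXIT_RE = re.compile(
    r"^(\d{1,2})/(\d{1,2}) (\d{2}):(\d{2}):(\d{2}) "
    r".*\bQUILL\b.*exited with status (\d+)"
)


def quill_exit_times(lines, year=2010):
    """Return a list of (timestamp, exit_status) for each QUILL exit line."""
    exits = []
    for line in lines:
        m = EXIT_RE.match(line)
        if m:
            month, day, hh, mm, ss, status = map(int, m.groups())
            exits.append((datetime(year, month, day, hh, mm, ss), status))
    return exits


def intervals(exits):
    """Return the time elapsed between each pair of consecutive exits."""
    times = [t for t, _ in exits]
    return [later - earlier for earlier, later in zip(times, times[1:])]


# Made-up sample lines (NOT copied from a real log), for illustration only.
sample = [
    "8/25 09:00:00 The QUILL (pid 1000) exited with status 44",
    "8/25 09:00:10 some unrelated log line",
    "8/25 10:25:12 The QUILL (pid 1001) exited with status 44",
    "8/25 11:50:25 The QUILL (pid 1002) exited with status 44",
]

nominal = timedelta(hours=1, minutes=25)
for gap in intervals(quill_exit_times(sample)):
    # Show each gap and how far it is from the nominal 1hr 25min.
    print(gap, "drift:", gap - nominal)
```

Running this against the MasterLog from two PCs that were restarted together should show whether each cycle is a constant few seconds over 1hr 25min (which would add up over a day) or whether the gap itself varies.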
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Erik Paulson
Sent: Thursday, 26 August 2010 4:16 AM
To: Condor-Users Mail List
Subject: Re: [Condor-users] Quill++ assistance
And just to confirm, it's only Quill - none of the other daemons show
the same restart every hour and twenty-five minutes?
On Wed, Aug 25, 2010 at 1:12 AM, <Greg.Hitchen@xxxxxxxx> wrote:
> Hi Erik
> The 1hr 25 mins is definitely not related (as far as I can tell) to virus
> scans/server activity/etc.
> I've checked all the scheduled types of activities that our PCs get installed
> with and nothing "fits".
> In addition I have installed 7.4.3 onto several PCs now and they all exhibit
> the 1hr 25min restart of condor_quill, and it always starts exactly 1hr 25mins
> after condor is started, i.e. any time I do a condor net stop, condor net
> start on them, the first of the 1hr 25min restarts begins 1hr 25mins after
> this.
> There is a dprintf_failure.QUILL file created but it is empty and 0 bytes in
> No core file is created and condor_quill quite happily gets restarted by
> condor_master after 10 secs, until the MasterLog again says it exits with
> error 44 after the next 1hr 25 mins.
> Nothing gets logged in the QuillLog.
> From: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Erik Paulson
> Sent: Tuesday, 24 August 2010 3:46 AM
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] Quill++ assistance
> Greg: The "exit 44" issue is odd - status 44 means that Condor couldn't log
> some piece of information (which is why you don't see anything in the logs
> :). While I wouldn't rule anything in Condor out, 1:25:00 is not a number
> that strikes me as special in any of the Condor code, so I'm not sure what
> would happen on the Condor side with that periodicity. Are there any file
> server/virus scans/etc sort of activity that might interfere with writes to
> files that happen at your site?
> Greg/Michael: the ACCESS_VIOLATION is happening in a strange spot. To answer
> your question, the Quill daemon should run continuously - however, if it is
> consistently crashing, the master will exponentially back off trying to run
> it until it only tries once an hour - so it may be likely that you'll see a
> core file with no Quill daemon running.
> If that's the case and it is consistently crashing, I would love to see your
> full QuillLog, along with your sql.log file. We should be able to play it
> back and see exactly why it's crashing.
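
The back-off behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the condor_master source: it assumes the delay is computed as constant + factor^n and capped at a ceiling, with the restart count reset once a daemon has stayed up longer than a recovery period. The default values (9, 2.0, 3600 and 300 seconds) are what the MASTER_BACKOFF_CONSTANT, MASTER_BACKOFF_FACTOR, MASTER_BACKOFF_CEILING and MASTER_RECOVER_FACTOR settings are understood to default to; check the manual for your version. The function names are hypothetical.

```python
def restart_delay(n_restarts, constant=9.0, factor=2.0, ceiling=3600.0):
    """Seconds to wait before the next restart (ASSUMED formula and defaults)."""
    return min(constant + factor ** n_restarts, ceiling)


def simulate(run_times, recover=300.0):
    """Return the restart delay applied after each run.

    run_times: how long (in seconds) the daemon stayed up before each exit.
    A run longer than `recover` seconds resets the restart count (assumed).
    """
    n = 0
    delays = []
    for run in run_times:
        if run > recover:
            n = 0  # the daemon ran long enough to count as recovered
        delays.append(restart_delay(n))
        n += 1
    return delays


# A daemon that stays up 1hr 25min (5100 s) each time: the delay never grows.
print(simulate([5100, 5100, 5100]))

# A daemon that dies within a second every time: the delay backs off.
print(simulate([1] * 5))

# After enough consecutive crashes the delay is capped at one hour.
print(restart_delay(20))
```

Under these assumptions, a daemon that runs for 1hr 25min between exits would be restarted after about 10 seconds every time, while one that crashes immediately would back off until it is retried only once an hour.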
> On Wed, Aug 11, 2010 at 8:48 PM, <Greg.Hitchen@xxxxxxxx> wrote:
>> Perhaps not much help Michael but we've had similar problems with 7.2.4 on
>> (see first attached email). It behaved somewhat better for 7.4.1 (see
>> second attached email) and at least ran, even though it restarted
>> condor_quill every 1hr 25mins, but a number of other problems/issues with
>> the 7.4 series have not allowed us to upgrade to that version yet.
>> From: condor-users-bounces@xxxxxxxxxxx
>> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Michael O'Donnell
>> Sent: Thursday, 12 August 2010 3:56 AM
>> To: Condor-Users Mail List
>> Subject: Re: [Condor-users] Quill++ assistance
>> I have these specified already and I do not see any issues. The QuillLog
>> file shows SQL statements and success at populating the tables.
>> However, I am finding a file on all machines other than the central manager
>> that has an access violation error. I am not sure if the condor_quill.exe
>> daemon is supposed to run continuously, but I do not see it running on any
>> machines other than the central manager.
>> The file that is showing up in the log directory on each machine is called
>> core.QUILL.WIN32. Its contents are (Does this mean anything to anyone else):
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> The archives can be found at: