I have not examined the intervals at which the Quill daemons in our pool
die, but I get hundreds of emails stating that the Quill daemon died and
was restarted on each machine. I have been trying to get Quill to work
with Windows as well, and I have been posting on this topic to this list.
I mentioned earlier that I have the Postgres database on the same server
as our CM. I was going to try installing Postgres on a different server,
but I have not gotten around to this yet. I am pretty sure this is not
the problem, but it is something for me to try. I have also noticed that
the Quill daemon on our CM does not seem to die, but the Quill daemons on
all worker nodes die on a regular basis. I have not determined why this
is the case; the only difference I know of is the OS. Our server runs
Windows Server 2008, and our worker nodes run 32/64-bit Windows XP and
Windows 7.
08/25/2010 08:07 PM
Re: [Condor-users] Quill++ assistance
That's correct, no other daemons are restarting, just condor_quill.
Interestingly, now that I have installed this version onto a few more
PCs, I can see the 1hr 25min interval is not EXACT. Two PCs that I
"synched" by restarting Condor at the same time are now 2-3 minutes apart
on their condor_quill restarts. Maybe the 10-second delay before
condor_master restarts condor_quill isn't exact, and the time difference
gradually builds up? I'll keep an eye on it.
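A back-of-the-envelope model of that drift hypothesis, as a minimal Python sketch. It assumes each cycle is a fixed uptime of about 1 hr 25 min plus a per-machine restart delay, and that the two machines' delays differ by a constant amount. The function names and the 10 s vs 12 s delays are hypothetical, chosen only for illustration: with those numbers, a 2 s per-cycle difference grows to 2 minutes after 60 cycles (roughly 3.5 days).

```python
# Minimal model of drifting restart times (illustrative numbers only).
# ASSUMPTION: every cycle = fixed uptime + a per-machine restart delay.

RUN_TIME_S = 85 * 60  # ~1 hr 25 min of uptime before condor_quill exits


def restart_times(cycles: int, restart_delay_s: int, run_time_s: int = RUN_TIME_S) -> list[int]:
    """Seconds after a common start at which condor_quill is restarted, per cycle."""
    times = []
    t = 0
    for _ in range(cycles):
        t += run_time_s + restart_delay_s  # run until exit, then wait before restart
        times.append(t)
    return times


def gap_after(cycles: int, delay_a_s: int, delay_b_s: int) -> int:
    """Difference between two machines' restart times after `cycles` cycles."""
    return abs(restart_times(cycles, delay_a_s)[-1] - restart_times(cycles, delay_b_s)[-1])


# Two PCs started together, with hypothetical 10 s and 12 s restart delays.
print(gap_after(60, 10, 12))  # 120
```

If the gap grows linearly with the number of cycles, a constant per-cycle difference is the likelier cause; independent random jitter would instead grow roughly with the square root of the cycle count.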
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx]
On Behalf Of Erik Paulson
Sent: Thursday, 26 August 2010 4:16 AM
To: Condor-Users Mail List
Subject: Re: [Condor-users] Quill++ assistance
And just to confirm, it's only Quill - none of the other daemons show
the same restart every hour and twenty-five minutes?
On Wed, Aug 25, 2010 at 1:12 AM, <Greg.Hitchen@xxxxxxxx> wrote:
> Hi Erik
> The 1hr 25 mins is definitely not related (as far as I can tell) to
> scans/server activity/etc.
> I've checked all the scheduled activities that our PCs are subject to,
> and nothing "fits".
> In addition, I have installed 7.4.3 onto several PCs now, and they all
> show the 1hr 25min restart of condor_quill. It always starts exactly
> 1 hr 25 mins after Condor started, i.e. any time I do a "net stop condor",
> "net start condor" on them, the first of the 1hr 25min restarts begins
> 1 hr 25 mins after this.
> There is a dprintf_failure.QUILL file created, but it is empty (0 bytes
> in size).
> No core file is created, and condor_quill quite happily gets restarted by
> condor_master after 10 secs, until the MasterLog again says it exited
> with error 44 after the next 1hr 25 mins.
> Nothing gets logged in the QuillLog.
> From: condor-users-bounces@xxxxxxxxxxx
> On Behalf Of Erik Paulson
> Sent: Tuesday, 24 August 2010 3:46 AM
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] Quill++ assistance
> Greg: The "exit 44" issue is odd; status 44 means that Condor couldn't log
> some piece of information (which is why you don't see anything in the
> QuillLog :). While I wouldn't rule anything in Condor out, 1:25:00 is not a
> number that strikes me as special in any of the Condor code, so I'm not sure
> what would happen on the Condor side with that periodicity. Is there any
> server/virus-scan sort of activity at your site that might interfere with
> files?
> Greg/Michael: the ACCESS_VIOLATION is happening in a strange spot.
> To answer your question, the Quill daemon should run continuously. However,
> if it is consistently crashing, the master will exponentially back off
> trying to restart it until it only tries once an hour, so you may well
> find a core file with no Quill daemon running.
> If that's the case and it is consistently crashing, I would love to see the
> full QuillLog, along with your sql.log file. We should be able to trace
> back and see exactly why it's crashing.
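To illustrate the back-off Erik describes, here is a minimal Python sketch of a capped exponential restart delay. The formula (constant + factor^restarts, capped at a ceiling), the defaults (9 s, 2.0, 3600 s) and the rule that the counter resets once a daemon stays up at least 300 s are assumptions modelled on condor_master's MASTER_BACKOFF_CONSTANT, MASTER_BACKOFF_FACTOR, MASTER_BACKOFF_CEILING and MASTER_RECOVER_FACTOR settings; check the manual for your Condor version before relying on them. Under those assumptions, a daemon that crashes immediately backs off to one attempt per hour, while one that runs for 1 hr 25 min before exiting always gets the minimal 10 s delay, which would be consistent with the 10 s restarts reported in this thread.

```python
# Minimal sketch of capped exponential restart back-off.
# ASSUMPTION: formula and defaults are modelled on condor_master's
# MASTER_BACKOFF_* / MASTER_RECOVER_FACTOR settings; verify for your version.

def backoff_delay(restarts: int, constant: float = 9, factor: float = 2.0,
                  ceiling: float = 3600) -> float:
    """Seconds to wait before restarting a daemon after `restarts` quick crashes."""
    return min(constant + factor ** restarts, ceiling)


def next_restart_count(restarts: int, uptime_s: float, recover_s: float = 300) -> int:
    """Reset the counter if the daemon stayed up at least `recover_s` seconds
    (assumed rule); otherwise count one more quick crash."""
    return 0 if uptime_s >= recover_s else restarts + 1


# A daemon that dies 5 s after every start: the delay grows until the ceiling.
count = 0
fast_crash_delays = []
for _ in range(14):
    count = next_restart_count(count, uptime_s=5)
    fast_crash_delays.append(backoff_delay(count))
print(fast_crash_delays)

# A daemon that runs ~1 hr 25 min before exiting: the counter resets each time.
count = next_restart_count(count, uptime_s=85 * 60)
print(backoff_delay(count))  # 10.0
```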
> On Wed, Aug 11, 2010 at 8:48 PM, <Greg.Hitchen@xxxxxxxx> wrote:
>> Perhaps not much help, Michael, but we've had similar problems
>> (see first attached email). It behaved somewhat better with 7.4.1
>> (see second attached email) and at least ran, even though it restarted
>> condor_quill every 1hr 25 mins, but a number of other problems/issues
>> with the 7.4 series have not allowed us to upgrade to that version yet.
>> From: condor-users-bounces@xxxxxxxxxxx
>> On Behalf Of Michael O'Donnell
>> Sent: Thursday, 12 August 2010 3:56 AM
>> To: Condor-Users Mail List
>> Subject: Re: [Condor-users] Quill++ assistance
>> I have these specified already and I do not see any issues. The log
>> file shows SQL statements and success at populating the tables.
>> However, I am finding a file on all machines other than the central
>> manager that has an access violation error. I am not sure if the
>> condor_quill.exe daemon is supposed to run continuously, but I do not
>> see it running on machines other than the central manager.
>> The file that is showing up in the log directory on each machine is
>> core.QUILL.WIN32. Its contents are (does this mean anything to you?):
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with
> subject: Unsubscribe
> You can also unsubscribe by visiting https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> The archives can be found at: