
Re: [Condor-users] condor_quill keep dying



Hi Greg,
Restarting the DB and the Condor daemons didn't fix the problem. condor_quill now dies every two minutes, and the fsize is increasing rapidly (fsize: 3040644).

Can you think of anything that would prevent condor_quill from dying?

Thanks,
Senthil

"/usr/local/condor-7.0.5/sbin/condor_quill" on "my.condor.host" died due to signal 25 (File size limit exceeded).
Condor will automatically restart this process in 17 seconds.

*** Last 20 line(s) of file /u/condor/log/QuillLog:
2/24 10:33:21 configuring tt options from config file
2/24 10:33:21 Using Polling Period = 10
2/24 10:33:21 Using logs 2/24 10:33:21 /u/condor/log/schedd_sql.log 2/24 10:33:21 /u/condor/log/sql.log 2/24 10:33:21
2/24 10:33:21 Using Job Queue File /u/condor/spool/job_queue.log
2/24 10:33:21 Using Database Type = Postgres
2/24 10:33:21 Using Database IpAddress = my.condor.host:5432
2/24 10:33:21 Using Database Name = DBNAME
2/24 10:33:21 Using Database User = DBUSER
2/24 10:33:21 ******** Start of Polling Job Queue Log ********
2/24 10:33:21 === Current Probing Information ===
2/24 10:33:21 fsize: 3040644		mtime: 1267025566
2/24 10:33:21 first log entry: 1036 CreationTimestamp 1174575013
2/24 10:33:21 JOB QUEUE POLLING RESULT: COMPRESSED
2/24 10:33:44 ********* End of Polling Job Queue Log *********
2/24 10:33:44 ******** Start of Polling Event Log ********
2/24 10:33:44 >>>>>>>> Fail: Polling Event Log <<<<<<<<
2/24 10:33:44 ******** Start of Polling XML Log ********
2/24 10:33:44 ********* End of Polling XML Log *********
2/24 10:33:44 ++++++++ Sending Quill ad to collector ++++++++
2/24 10:33:44 ++++++++ Sent Quill ad to collector ++++++++
*** End of file QuillLog




-----Original Message-----
From: Natarajan, Senthil 
Sent: Wednesday, February 24, 2010 10:14 AM
To: 'Greg Thain'
Subject: RE: [Condor-users] condor_quill keep dying

Hi Greg,
Thanks for the response. 

-rw-r--r--  1 condor users 2147483647 Feb 23 21:57 sql.log.copy

This makes sense: the file sql.log.copy is 2.1 GB, and since 21:57 yesterday condor_quill keeps dying and I keep getting email.
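
For reference, the size in the listing above is exactly the largest value a signed 32-bit integer can hold, which fits Greg's explanation that a 32-bit executable cannot grow the file any further. A minimal sketch of that arithmetic (the names MAX_OFF32 and at_32bit_limit are made up for illustration and are not part of Condor):

```python
# Largest file offset representable in a signed 32-bit integer.
MAX_OFF32 = 2**31 - 1

# Size of sql.log.copy as reported by ls in this thread.
observed_size = 2147483647


def at_32bit_limit(size_bytes: int) -> bool:
    """Return True if a file of this size cannot grow under a 32-bit offset."""
    return size_bytes >= MAX_OFF32


print(MAX_OFF32)
print(at_32bit_limit(observed_size))
```

This only shows that the file sits at the 32-bit boundary. It does not explain why the file grew that large.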

But the current file is sql.log, so why does condor_quill try to use sql.log.copy?

I will try restarting the database and all the Condor daemons to see whether that fixes the problem.

condor_quill is now dying about every 5 minutes.

How did sql.log (sql.log.copy) suddenly grow to 2 GB? I have never had this problem before.

Thanks,
Senthil

-----Original Message-----
From: Greg Thain [mailto:gthain@xxxxxxxxxxx] 
Sent: Wednesday, February 24, 2010 9:59 AM
To: Natarajan, Senthil
Subject: Re: [Condor-users] condor_quill keep dying

Natarajan, Senthil wrote:
> Hi,
>
> condor_quill keeps dying with this message approximately every 19 minutes.
>
>
>
> "/usr/local/condor-7.0.5/sbin/condor_quill" on "my.condor.host" died due to signal 25 (File size limit exceeded).
>
> Condor will automatically restart this process in 10 seconds.
>
>   
This probably means that this 32-bit executable can't append to a file 
that's 2 GB or more in size.  See if there are any very large log files 
on the local disk.

-Greg
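
One way to carry out the check suggested above is to walk the log directory and list any file that has reached the 32-bit size limit. This is a rough sketch, not a Condor tool: the function name find_large_files and the default threshold are assumptions, and /u/condor/log is simply the log directory that appears in the QuillLog output in this thread.

```python
import os

# Largest size a file can reach under a signed 32-bit offset.
LIMIT_32BIT = 2**31 - 1


def find_large_files(root, threshold=LIMIT_32BIT):
    """Return (path, size) pairs for files under root with size >= threshold."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.lstat(path).st_size
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
            if size >= threshold:
                hits.append((path, size))
    # Largest files first.
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Log directory taken from the QuillLog excerpt; adjust for your site.
    for path, size in find_large_files("/u/condor/log"):
        print(f"{size:>12d}  {path}")
```

Passing a lower threshold (for example 1024**3 for 1 GiB) would also show files that are approaching the limit but have not reached it yet.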