Re: [Condor-users] 7.5.3 Collector crashing on HTTP request.



On 08/02/2010 04:06 PM, Patrick Armstrong wrote:
Hi Everyone,

My collector seems to crash regularly while serving a SOAP request for its machine ads.

The crash always looks like this:

08/02/10 12:10:02 Housekeeper:  Done cleaning
08/02/10 12:10:23 Received HTTP POST connection from <127.0.0.1:48910>
08/02/10 12:10:23 About to serve HTTP request...
Stack dump for process 18796 at timestamp 1280776223 (3 frames)
condor_collector(dprintf_dump_stack+0xd0)[0x81858c4]
condor_collector(_Z18linux_sig_coredumpi+0x22)[0x817a0c4]
[0xf40420]
08/02/10 12:10:33 Setting maximum file descriptors to 30000.
08/02/10 12:10:33 ******************************************************
08/02/10 12:10:33 ** condor_collector (CONDOR_COLLECTOR) STARTING UP
08/02/10 12:10:33 ** /usr/sbin/condor_collector
08/02/10 12:10:33 ** SubsystemInfo: name=COLLECTOR type=COLLECTOR(3) class=DAEMON(1)
08/02/10 12:10:33 ** Configuration: subsystem:COLLECTOR local:<NONE>  class:DAEMON
08/02/10 12:10:33 ** $CondorVersion: 7.5.3 Jun 25 2010 BuildID: 250654 $
08/02/10 12:10:33 ** $CondorPlatform: I386-LINUX_RHEL5 $
08/02/10 12:10:33 ** PID = 14693
08/02/10 12:10:33 ** Log last touched 8/2 12:10:23
08/02/10 12:10:33 ******************************************************
08/02/10 12:10:33 Using config source: /etc/condor/condor_config
08/02/10 12:10:33 Using local config sources:
08/02/10 12:10:33    /etc/condor/condor_config.local
08/02/10 12:10:33 DaemonCore: command socket at <142.104.63.28:9618>


And, as you can see below, this happens quite often:

07/27/10 16:33:23 Sending obituary for "/usr/sbin/condor_collector"
07/27/10 18:04:43 Sending obituary for "/usr/sbin/condor_collector"
07/27/10 18:22:56 Sending obituary for "/usr/sbin/condor_collector"
07/30/10 15:13:28 Sending obituary for "/usr/sbin/condor_collector"
07/31/10 20:18:59 Sending obituary for "/usr/sbin/condor_collector"
07/31/10 21:04:39 Sending obituary for "/usr/sbin/condor_collector"
07/31/10 22:02:05 Sending obituary for "/usr/sbin/condor_collector"
07/31/10 22:33:26 Sending obituary for "/usr/sbin/condor_collector"
07/31/10 23:03:57 Sending obituary for "/usr/sbin/condor_collector"
07/31/10 23:19:05 Sending obituary for "/usr/sbin/condor_collector"
08/01/10 00:15:54 Sending obituary for "/usr/sbin/condor_collector"
08/01/10 01:58:55 Sending obituary for "/usr/sbin/condor_collector"
08/01/10 03:58:34 Sending obituary for "/usr/sbin/condor_collector"
08/01/10 04:03:54 Sending obituary for "/usr/sbin/condor_collector"
08/01/10 05:00:35 Sending obituary for "/usr/sbin/condor_collector"
08/01/10 06:24:52 Sending obituary for "/usr/sbin/condor_collector"
08/02/10 12:10:23 Sending obituary for "/usr/sbin/condor_collector"
08/02/10 12:25:38 Sending obituary for "/usr/sbin/condor_collector"
08/02/10 12:56:19 Sending obituary for "/usr/sbin/condor_collector"
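
To put a number on "quite often", obituary lines like the ones above can be tallied per day. The following is only a rough sketch: the function names are made up for illustration, and it assumes every line starts with an MM/DD/YY HH:MM:SS timestamp as in the excerpt. The sample lines are copied from the list above.

```python
from collections import Counter
from datetime import datetime

MARKER = "Sending obituary for"  # text seen in the log excerpt above


def obituary_times(lines, daemon="condor_collector"):
    """Return timestamps of obituary lines that mention `daemon`."""
    times = []
    for line in lines:
        line = line.strip()
        if MARKER in line and daemon in line:
            # Assumption: the first 17 characters are "MM/DD/YY HH:MM:SS".
            times.append(datetime.strptime(line[:17], "%m/%d/%y %H:%M:%S"))
    return times


def crashes_per_day(times):
    """Count obituaries per calendar day (ISO date string -> count)."""
    return Counter(t.date().isoformat() for t in times)


# Three lines taken from the excerpt above.
sample = [
    '07/27/10 16:33:23 Sending obituary for "/usr/sbin/condor_collector"',
    '07/27/10 18:04:43 Sending obituary for "/usr/sbin/condor_collector"',
    '07/30/10 15:13:28 Sending obituary for "/usr/sbin/condor_collector"',
]

times = obituary_times(sample)
per_day = crashes_per_day(times)
# Gaps between consecutive crashes, in seconds.
gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

Passing an open log file instead of `sample` should work the same way, since the function only iterates over lines.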


Any idea why this might be happening? I have a system with about 200 nodes.

--patrick

Do you know which call is causing this, or have a log with a deeper stack trace?

Best,


matt