
Re: [Condor-users] Condor Quill Problem - database not reachable



I should mention that the MasterLog on vserv03 contains a number of these:

   02/22 10:10:13 Started DaemonCore process "/opt/condor/sbin/condor_dbmsd", pid and pgroup = 13940
   02/22 10:10:13 The DBMSD (pid 13940) died due to signal 11 (Segmentation fault)
   02/22 10:10:13 restarting /opt/condor/sbin/condor_dbmsd in 3600 seconds
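A restart delay of 3600 seconds is consistent with condor_master's exponential restart backoff having reached its cap, which would mean the DBMSD has crashed and been restarted many times in a row. The Condor manual describes the delay as constant + factor^n, capped at a ceiling (the MASTER_BACKOFF_CONSTANT, MASTER_BACKOFF_FACTOR and MASTER_BACKOFF_CEILING settings). The sketch below models that, assuming defaults of 9 seconds, 2.0 and 3600 seconds. Both the formula and the defaults are assumptions here, so check the manual for your Condor version.

```python
# Sketch of condor_master's restart backoff: delay = constant + factor**n,
# capped at a ceiling. The three values below are ASSUMED defaults for
# MASTER_BACKOFF_CONSTANT, MASTER_BACKOFF_FACTOR and MASTER_BACKOFF_CEILING.
BACKOFF_CONSTANT = 9      # seconds (assumed)
BACKOFF_FACTOR = 2.0      # (assumed)
BACKOFF_CEILING = 3600    # seconds, one hour (assumed)


def restart_delay(num_restarts: int) -> float:
    """Seconds the master waits before restarting a daemon that has
    already been restarted `num_restarts` times in a row."""
    return min(BACKOFF_CONSTANT + BACKOFF_FACTOR ** num_restarts, BACKOFF_CEILING)


def first_restart_at_ceiling() -> int:
    """Smallest consecutive-restart count whose delay hits the ceiling."""
    n = 0
    while restart_delay(n) < BACKOFF_CEILING:
        n += 1
    return n
```

Under these assumptions the delay is still 2057 seconds after 11 consecutive restarts and reaches the 3600-second cap at 12.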

and the DbmsdLog reports these:

   02/22 10:10:13 main_init() called
   02/22 10:10:13 Using Database Type = Postgres
   02/22 10:10:13 Using Database IpAddress = vserv03:5432
   02/22 10:10:13 Using Database Name = quill_vserv03
   02/22 10:10:13 Using Database User = quillwriter
   02/22 10:10:13 Connection to database 'quill_vserv03' failed.
   02/22 10:10:13 FATAL:  connection limit exceeded for non-superusers
   02/22 10:10:13 Deallocating connection resources to database 'quill_vserv03'
   02/22 10:10:13 config: unable to connect to DB--- ERROR02/22 10:10:13 ERROR "config: unable to connect to DB
   " at line 133 in file ManagedDatabase.cpp
   Stack dump for process 13940 at timestamp 1298369413 (14 frames)
   condor_dbmsd(dprintf_dump_stack+0xb7)[0x5183f0]
   condor_dbmsd(_Z18linux_sig_coredumpi+0x2c)[0x50afc8]
   /lib64/libpthread.so.0[0x2b1dcd4abb10]
   condor_dbmsd(_ZN11DBMSManagerD1Ev+0xbd)[0x4dce1d]
   condor_dbmsd[0x4dbf16]
   /lib64/libc.so.6(exit+0xe5)[0x2b1dce4cf3a5]
   condor_dbmsd(__wrap_exit+0x28)[0x4f3330]
   condor_dbmsd[0x516911]
   condor_dbmsd(_ZN15ManagedDatabaseC1Ev+0x421)[0x4ddf35]
   condor_dbmsd(_ZN11DBMSManager4initEv+0x63)[0x4dc925]
   condor_dbmsd(_Z9main_initiPPc+0x2d)[0x4dbfe7]
   condor_dbmsd(main+0x18df)[0x50d26b]
   /lib64/libc.so.6(__libc_start_main+0xf4)[0x2b1dce4b9994]
   condor_dbmsd(__gxx_personality_v0+0x411)[0x4dbde9]
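The FATAL line is PostgreSQL's own error, not Condor's. The PostgreSQL documentation for superuser_reserved_connections states that once the number of active connections is at least max_connections minus superuser_reserved_connections, new connections are accepted only for superusers. quillwriter is presumably not a superuser, so its login is refused even though the server is up. Reading the stack trace bottom-up, the segfault appears to be a side effect of that refusal: the ManagedDatabase constructor gives up, exit() runs, and the crash happens inside the DBMSManager destructor. A minimal sketch of the slot rule follows. The function names and the example numbers are hypothetical. The live inputs can be read on vserv03 with `SHOW max_connections;`, `SHOW superuser_reserved_connections;` and `SELECT count(*) FROM pg_stat_activity;`.

```python
# Sketch of PostgreSQL's documented reserved-slot rule. Function names and
# the numbers used with them are hypothetical, not taken from vserv03.


def non_superuser_can_connect(active: int, max_connections: int,
                              reserved: int) -> bool:
    """True if a new non-superuser login (such as quillwriter) would be
    accepted. Once active >= max_connections - reserved, only superusers
    may connect."""
    if reserved >= max_connections:
        raise ValueError("superuser_reserved_connections must be < max_connections")
    return active < max_connections - reserved


def slots_left_for_non_superusers(active: int, max_connections: int,
                                  reserved: int) -> int:
    """How many more non-superuser connections fit before refusal."""
    return max(0, max_connections - reserved - active)
```

For example, with a hypothetical max_connections of 100 and 3 reserved slots, `non_superuser_can_connect(97, 100, 3)` is False. Whether the fix is a larger max_connections or finding what holds the existing connections depends on the live counts.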

Cheers,
Santanu

Santanu Das wrote:
Dear all,

Every time I try to use condor_history, I get this:

-- Quill: quill@xxxxxxxxxxxxxxxxxxxxxxxx : <vserv03:5432> : quill_vserv03
   -- Database at <vserv03:5432> not reachable
--Failing over to the history file at /home/condorr/spool/history instead --


And condor_q returns this:

-- Failed to fetch ads from db [quill_vserv03] at database server <vserv03:5432>
   -- Database not reachable or down.
           - Failing over to the quill daemon --
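Both tools report the database as "not reachable", but the DbmsdLog above shows the server answering and refusing the login. A plain TCP connect to vserv03:5432 helps separate "host down or port closed" from "server up but refusing connections". A minimal sketch, assuming Python 3 on a submit host; `tcp_port_open` is a hypothetical helper that only checks whether something accepts TCP connections on that port, not whether a PostgreSQL login would succeed.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This does not speak the PostgreSQL protocol: a server that accepts the
    TCP connection but then rejects the login (for example with
    "connection limit exceeded") still counts as open here."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refusal, timeout and name-resolution errors
        return False
```

Usage: `tcp_port_open("vserv03", 5432)`. If it returns True while condor_q still reports the database as unreachable, the problem is at the login level, as the DbmsdLog suggests, rather than in the network.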

On the box where the Quill database is running (vserv03), I see these in the log:

   02/22 09:48:36 *** Warning: Bad Log file; skipping malformed Attr List
   02/22 09:48:36 >>>>>>>> Fail: Polling Event Log <<<<<<<<
   02/22 09:48:36 ******** Start of Polling XML Log ********
   02/22 09:48:36 ********* End of Polling XML Log *********
   02/22 09:48:36 ++++++++ Sending Quill ad to collector ++++++++
   02/22 09:48:36 ++++++++ Sent Quill ad to collector ++++++++
   02/22 09:48:36 ******** Start of Polling Job Queue Log ********
   02/22 09:48:36 JOB QUEUE POLLING RESULT: NO CHANGE
   02/22 09:48:36 ********* End of Polling Job Queue Log *********
   02/22 09:48:36 ******** Start of Polling Event Log ********
   02/22 09:48:55 failed to create classad; bad expr = username = "group_camont.camoNEW Rejects
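The last log line shows a ClassAd expression whose string value is never closed, so the line appears to be cut off or corrupted. One way to find similar lines in the event log Quill is polling is a rough check for `Attr = value` lines with an unbalanced double quote. A minimal sketch, assuming the event log's ad lines look like the one above; the helper names are hypothetical, and this is a heuristic, not a ClassAd parser.

```python
import re
from typing import Iterable, List, Tuple

# Rough "Attr = value" matcher; an approximation, not the ClassAd grammar.
ASSIGNMENT = re.compile(r'^\s*([A-Za-z_][A-Za-z0-9_.]*)\s*=\s*(.*)$')


def unescaped_quote_count(text: str) -> int:
    """Count double quotes not preceded by a backslash escape."""
    count = 0
    escaped = False
    for ch in text:
        if escaped:
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == '"':
            count += 1
    return count


def find_unterminated_strings(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, line) for assignments whose value has an odd
    number of unescaped double quotes, i.e. a string that never closes."""
    suspects = []
    for lineno, line in enumerate(lines, start=1):
        m = ASSIGNMENT.match(line)
        if m and unescaped_quote_count(m.group(2)) % 2 == 1:
            suspects.append((lineno, line.rstrip("\n")))
    return suspects
```

Usage: open the event log (its path is whatever the local configuration sets) and pass the file object to `find_unterminated_strings`. Each hit is a candidate for the "malformed Attr List" warning.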

Any idea what's going wrong, or where I should start digging?

Cheers,
Santanu



_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/