
Re: [Condor-users] Assertion error in Quill



On 10/5/07, Steven Timm <timm@xxxxxxxx> wrote:
> On Fri, 5 Oct 2007, Matt Baker wrote:
>
> > I have recently upgraded to 6.9.4 on i386 RHEL3. Running condor_status
> > and condor_q seems to work (I have installed and configured postgres and
> > quill), but when I submit jobs and use condor_q, I see no jobs:
> >
> > -- Quill: quill@xxxxxxxxxxxx : <XXX.XXX.X.XXX:5432> : condor
> > ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
> >
> >
> > The MasterLog seems to be restarting the Quill daemon over and over
> > again, while the Quill log shows an ERROR:
> >
> > ...
> > 10/5 12:19:07 Using config source: /opt/condor/etc/condor_config
> > 10/5 12:19:07 Using local config sources:
> > 10/5 12:19:07    /opt/condor/etc/condor_config.local
> > 10/5 12:19:07 DaemonCore: Command Socket at <xxx.xxx.xxx.xxx:38522>
> > 10/5 12:19:07 main_init() called
> > 10/5 12:19:07 configuring tt options from config file
> > 10/5 12:19:07 Using Polling Period = 10
> > 10/5 12:19:07 Using logs 10/5 12:19:07 /export/home/condor/log/sql.log
> > 10/5 12:19:07
> > 10/5 12:19:07 ERROR "Assertion ERROR on (jobQueueDBUser)" at line 137 in
> > file dbms_utils.C
> >
> > I've followed the Postgres/quill instructions as far as I can tell.
> >
> > Is there anyplace I should be looking first to solve this problem?
> >
> > Thank you,
> > Matt Baker
> >
>
> There are a huge number of gaps and some stuff that is just
> plain wrong in section 3.11 (the quill section)
> of the condor 6.9.4 manual. Condor-support was able to walk
> me through it but it took about 7 e-mail interchanges and
> a week and a half.  To date the online documentation has not
> yet been fixed.  I saw the error that you are mentioning above
> during that time.  As I remember, I finally got it beat
> when I configured the postgres "quillcm" database
> to be owned correctly by the quillwriter user and got the correct
> password for the quillwriter user into the .pgpass file.
> Note that one must have the IP number there, not the host name.
>

Sort of - what you really have to have in the .pgpass file is a match
for what you put in QUILL_DB_IP_ADDR. Two confusing points:

1. The parameter says "QUILL_DB_IP_ADDR", but really, it will take a
hostname just fine. This parameter will almost certainly be renamed in
6.9.5 or 6.

2. The code that looks through .pgpass is doing a string match, and
not canonicalizing the hostnames it finds there. If you used an IP
address in QUILL_DB_IP_ADDR, you must use an IP address in .pgpass. If
you used a hostname, you must use the same hostname in .pgpass.
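To make point 2 concrete, here is a rough sketch in Python of what a literal string comparison against .pgpass looks like. This is not the Quill code, just an illustration of the behavior described above. The function names and the sample host, database name, and password are made up for the example; the only thing taken as given is the standard .pgpass line layout of hostname:port:database:username:password.

```python
def pgpass_hosts(pgpass_text):
    """Return the host field of every non-comment entry in .pgpass text."""
    hosts = []
    for line in pgpass_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Layout: hostname:port:database:username:password
        # (backslash-escaped colons are not handled in this sketch)
        hosts.append(line.split(":", 1)[0])
    return hosts


def host_matches(configured_host, pgpass_text):
    """True only if some entry's host is character-for-character equal.

    No DNS lookup or canonicalization is done, so a hostname never
    matches an IP address even when both name the same machine.
    """
    return configured_host in pgpass_hosts(pgpass_text)


# Hypothetical .pgpass contents, written with a hostname.
SAMPLE_PGPASS = """\
# hostname:port:database:username:password
db.example.org:5432:quill:quillwriter:not-a-real-password
"""

same_form = host_matches("db.example.org", SAMPLE_PGPASS)
other_form = host_matches("192.0.2.10", SAMPLE_PGPASS)
```

With the hostname in both places the lookup succeeds (same_form is True). With an IP address on one side and a hostname on the other it fails (other_form is False), even if the two refer to the same server.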

This has tripped up everyone outside of Wisconsin who has tried it.
Obviously, when you're batting .000, it's time for a change.

-Erik