
Re: [Condor-users] Assertion error in Quill



On Fri, 5 Oct 2007, Matt Baker wrote:

I have recently upgraded to 6.9.4 on i386 RHEL3. Running condor_status
and condor_q seems to work (I have installed and configured postgres and
quill), but when I submit jobs and use condor_q, I see no jobs:

-- Quill: quill@xxxxxxxxxxxx : <XXX.XXX.X.XXX:5432> : condor
ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD


The MasterLog shows the master restarting the Quill daemon over and over
again, while the QuillLog shows an ERROR:

...
10/5 12:19:07 Using config source: /opt/condor/etc/condor_config
10/5 12:19:07 Using local config sources:
10/5 12:19:07    /opt/condor/etc/condor_config.local
10/5 12:19:07 DaemonCore: Command Socket at <xxx.xxx.xxx.xxx:38522>
10/5 12:19:07 main_init() called
10/5 12:19:07 configuring tt options from config file
10/5 12:19:07 Using Polling Period = 10
10/5 12:19:07 Using logs 10/5 12:19:07 /export/home/condor/log/sql.log
10/5 12:19:07
10/5 12:19:07 ERROR "Assertion ERROR on (jobQueueDBUser)" at line 137 in
file dbms_utils.C

I've followed the Postgres/quill instructions as far as I can tell.

Is there anyplace I should be looking first to solve this problem?

Thank you,
Matt Baker


There are a huge number of gaps, and some things that are just
plain wrong, in section 3.11 (the Quill section)
of the Condor 6.9.4 manual. Condor-support was able to walk
me through it, but it took about 7 e-mail exchanges and
a week and a half. To date the online documentation has not
been fixed. I saw the error you mention above
during that time. As I remember, I finally got past it
when I made the postgres "quillcm" database
owned by the quillwriter user and put the correct
password for the quillwriter user into the .pgpass file.
Note that the .pgpass entry must use the IP address, not the host name.
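To make the .pgpass point concrete, here is a minimal shell sketch. It prints one entry in libpq's hostname:port:database:username:password format and refuses a host that is not a numeric IPv4 address. The function name make_pgpass_line is hypothetical, and the IP-only check simply encodes the advice above; the function does not write any file.

```shell
#!/bin/sh
# Hypothetical helper: print one .pgpass entry, requiring a numeric IPv4 host
# (per the advice above that the IP address, not the host name, must be used).
make_pgpass_line() {
    host=$1 port=$2 db=$3 user=$4 pass=$5
    # Reject anything that is not four dot-separated groups of 1-3 digits.
    if ! printf '%s\n' "$host" | grep -Eq '^[0-9]{1,3}(\.[0-9]{1,3}){3}$'; then
        echo "error: '$host' is not an IP address" >&2
        return 1
    fi
    # .pgpass format: hostname:port:database:username:password
    printf '%s:%s:%s:%s:%s\n' "$host" "$port" "$db" "$user" "$pass"
}
```

For example, make_pgpass_line 131.225.166.22 5432 quillcm quillwriter xxxxxxxx prints the same line that appears in the .pgpass file further down.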

Below is what I did to get it working, with passwords blanked out.
Note that PostgreSQL 8.2.5 is now available; it was not at the time.

The following steps were performed to install postgres and enable quill
under condor 6.9.4 on fgitb-cm.fnal.gov, which is the condor collector/negotiator
for the fgitb cluster. The PostgreSQL installation instructions are at:
http://www.postgresql.org/docs/8.2/interactive/installation.html
1) Download postgresql-8.2.4.tar.gz from www.postgresql.org

2) untar it using tar xvfz postgresql-8.2.4.tar.gz
cd postgresql-8.2.4

3) ./configure --prefix=/local/stage1/pgsql
(configure failed the first two times because it could not find the readline libraries, even though readline was present on the system. I finally
figured out that I needed to install the readline-devel and zlib-devel RPMs on the
system, and then we were good to go.)

4) make

5) create the /local/stage1/pgsql directory, plus a /local/pgsql symlink pointing to it

6) make install

7) cd /local/stage1
chown postgres:postgres pgsql
cd pgsql
mkdir data
chown postgres:postgres data


7) su - postgres
export PATH="/local/stage1/pgsql/bin:$PATH"

8) initdb -D /local/stage1/pgsql/data

9) cd /local/stage1/pgsql/data
10) vi pg_hba.conf

Add these 2 lines:

host all quillreader 131.225.0.0/16 md5
host all quillwriter 131.225.0.0/16 md5
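If you need the same two rules for a different network, a small sketch can print them. It assumes you want an identical md5 rule for both Quill roles; quill_hba_lines is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print the pg_hba.conf "host" lines that let the two
# Quill roles connect over TCP from a given network using md5 passwords.
# Columns: TYPE DATABASE USER CIDR-ADDRESS METHOD
quill_hba_lines() {
    cidr=$1
    for role in quillreader quillwriter; do
        printf 'host all %s %s md5\n' "$role" "$cidr"
    done
}
```

For example, as postgres: quill_hba_lines 131.225.0.0/16 >> /local/stage1/pgsql/data/pg_hba.conf. The server only rereads pg_hba.conf on a reload or restart, so new lines have no effect until then.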

11) vi postgresql.conf
Make changes as detailed in
http://www.cs.wisc.edu/condor/manual/v6.9.4/3_11Quill.html#SECTION004111000000000000000

12) as root, install the postgresql startup script in /etc/rc.d/init.d
(I copied this from fnpcsrv2) and enable it with /sbin/chkconfig postgres on
13) as postgres, create the directory /local/stage1/pgsql/data/pg_log
14) as root, run /etc/rc.d/init.d/postgres start

15) now do the database initialization:

as user postgres:

16) createdb test
psql test

This works.

Now create the condor-specific tables and users.
-bash-3.00$ createuser quillreader --no-createdb --no-adduser --pwprompt
Enter password for new role:
Enter it again:
Shall the new role be allowed to create more new roles? (y/n) n
CREATE ROLE
-bash-3.00$ createuser quillwriter --createdb --no-adduser --pwprompt
Enter password for new role:
Enter it again:
Shall the new role be allowed to create more new roles? (y/n) n
CREATE ROLE
-bash-3.00$

------------------

When creating the condor-specific database and tables, be sure to do it as user "quillwriter", and always give the -W option so that you are prompted for the password. Otherwise postgres will create the tables anyway, but the quillwriter and quillreader users will
not be able to access them.

-bash-3.00$ createdb --username=quillwriter quillcm -W
Password:
CREATE DATABASE
-bash-3.00$

createlang plpgsql quillcm

(This must be done for each database stored in postgres; e.g. on GP Grid
there are 2 databases, and on the ITB cluster there will be at least three.)
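Since createlang has to be repeated for every database on the server, a loop helps. This is a sketch, assuming createlang is on the PATH and is run as postgres as in the steps above; the function name enable_plpgsql, the DRY_RUN switch, and the second database name quillgk in the example are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: enable PL/pgSQL in every Quill database given as an
# argument. With DRY_RUN=1 it only prints the commands it would run.
enable_plpgsql() {
    for db in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "createlang plpgsql $db"
        else
            # Stop at the first database where createlang fails.
            createlang plpgsql "$db" || return 1
        fi
    done
}
```

For example, DRY_RUN=1 enable_plpgsql quillcm quillgk prints the two createlang commands without running them; drop DRY_RUN=1 to execute them.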
In condor 6.9.4 the SQL files are not in the RPM, but they were added later to
the .tar.gz files. Copy them from fcdfosgt1, where this has already been done:
** Note: the SQL files are now in the .tar.gz release on the condor
downloads page. **

cd /opt/condor
mkdir sql
cd sql
scp root@fcdfosgt1:/opt/condor/sql/* .

-bash-3.00$ psql --username=quillwriter --dbname=quillcm -W < /opt/condor/sql/common_createddl.sql
Password for user quillwriter:
CREATE TABLE
CREATE TABLE
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "files_pkey" for table "files"
CREATE TABLE
CREATE TABLE
CREATE SEQUENCE
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "machines_horizontal_pkey" for table "machines_horizontal"
CREATE TABLE
CREATE TABLE
CREATE SEQUENCE
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "runs_pkey" for table "runs"
CREATE TABLE
CREATE INDEX
CREATE TABLE
CREATE TABLE
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "l_jobstatus_pkey" for table "l_jobstatus"
CREATE TABLE
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
CREATE TABLE
CREATE TABLE
CREATE TABLE
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
CREATE TABLE
CREATE INDEX
CREATE TABLE
CREATE INDEX
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "daemons_horizontal_pkey" for table "daemons_horizontal"
CREATE TABLE
CREATE TABLE
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "submitters_horizontal_pkey" for table "submitters_horizontal"
CREATE TABLE
CREATE TABLE
CREATE TABLE
INSERT 0 1
-bash-3.00$
-bash-3.00$ psql --username=quillwriter --dbname=quillcm -W < /opt/condor/sql/pgsql_createddl.sql
Password for user quillwriter:
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "maintenance_events_pkey" for table "maintenance_events"
CREATE TABLE
CREATE TABLE
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "machines_vertical_pkey" for table "machines_vertical"
CREATE TABLE
CREATE TABLE
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "clusterads_horizontal_pkey" for table "clusterads_horizontal"
CREATE TABLE
CREATE INDEX
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "procads_horizontal_pkey" for table "procads_horizontal"
CREATE TABLE
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "jobs_horizontal_history_pkey" for table "jobs_horizontal_history"
CREATE TABLE
CREATE INDEX
CREATE INDEX
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "clusterads_vertical_pkey" for table "clusterads_vertical"
CREATE TABLE
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "procads_vertical_pkey" for table "procads_vertical"
CREATE TABLE
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "jobs_vertical_history_pkey" for table "jobs_vertical_history"
CREATE TABLE
CREATE TABLE
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "daemons_vertical_pkey" for table "daemons_vertical"
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE INDEX
CREATE VIEW
CREATE VIEW
CREATE VIEW
CREATE VIEW
CREATE VIEW
CREATE VIEW
CREATE VIEW
CREATE VIEW
CREATE TABLE
DELETE 0
INSERT 0 1
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "history_jobs_to_purge_pkey" for table "history_jobs_to_purge"
CREATE TABLE
INSERT 0 1
INSERT 0 1
CREATE FUNCTION
CREATE FUNCTION
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
CREATE TABLE
GRANT
DELETE 0
INSERT 0 1
-bash-3.00$
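Both psql runs above succeeded because no ERROR lines appear among the CREATE/INSERT/GRANT output; the NOTICE lines about implicit indexes are harmless. The sketch below, with the hypothetical name check_ddl_log, scans a saved copy of such output. It assumes the output was captured including stderr, since psql writes errors there, and it matches ERROR: anywhere in a line because psql may prefix script errors with a location such as psql:<stdin>:12:.

```shell
#!/bin/sh
# Hypothetical helper: fail if a saved psql transcript contains any ERROR
# lines. NOTICE lines (implicit indexes etc.) are expected and ignored.
check_ddl_log() {
    log=$1
    if grep -q 'ERROR:' "$log"; then
        echo "DDL reported errors:" >&2
        grep 'ERROR:' "$log" >&2
        return 1
    fi
    echo "no errors in $log"
}
```

For example: psql --username=quillwriter --dbname=quillcm -W < /opt/condor/sql/pgsql_createddl.sql 2>&1 | tee pgsql_ddl.log, then check_ddl_log pgsql_ddl.log. Alternatively, passing -v ON_ERROR_STOP=1 to psql makes it stop at the first error instead of continuing through the script.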

--------------------------------------------

Now we should be good to start quill and DBMSD

Condor settings needed in condor_config and condor_config.local:

DAEMON_LIST = MASTER, SCHEDD, STARTD, COLLECTOR, NEGOTIATOR, QUILL, DBMSD

##
##--------------------------------------------------------------------
## Quill Job Queue Mirroring Server
##--------------------------------------------------------------------
## Where is the Quill binary installed and what arguments should be passed?
QUILL = $(SBIN)/condor_quill
QUILL_ARGS = -f

# Where is the log file for the quill daemon?
QUILL_LOG = $(LOG)/QuillLog
MAX_QUILL_LOG = 50000000

# The identification and location of the quill daemon for local clients.

# If this is set to true, then the rest of the QUILL arguments must be defined
# for quill to function. If it is False or left undefined, then quill will not
# be consulted by either the scheduler or the tools, but in the case of a
# remote quill query where the local client has quill turned off, but the
# remote client has quill turned on, things will still function normally.
#S. Timm, disable quill while we are debugging.
QUILL_ENABLED = TRUE
# This will be the name of a quill daemon using this config file. This name
# should not conflict with any other quill name--or schedd name.
QUILL_NAME = quillcm@xxxxxxxxxxxxxxxxx

# The PostgreSQL server requires usernames that can manipulate tables. This will
# be the username associated with this instance of the quill daemon mirroring
# a schedd's job queue. Each quill daemon must have a unique username
# associated with it; otherwise multiple quill daemons will corrupt the data
# held under an identical user name.
QUILL_DB_NAME = quillcm
QUILL_DB_USER = quillwriter
# The required password for the DB user which quill will use to read
# information from the database about the queue.
QUILL_DB_QUERY_PASSWORD = xxxxxxxx

# The machine and port of the postgres server.
QUILL_DB_IP_ADDR = 131.225.166.22:5432

# Polling period, in seconds, for when quill reads transactions out of the
# schedd's job queue log file and puts them into the database.
QUILL_POLLING_PERIOD = 10

# Number of days that historical information about previous jobs will be kept.
# It defaults to 180 days
QUILL_HISTORY_DURATION = 120

# Number of hours between scans of QUILL_HISTORY_DURATION.
QUILL_HISTORY_CLEANING_INTERVAL = 24


# Allows or disallows a remote query to the quill daemon and database
# which is reading this log file. Defaults to true.
QUILL_IS_REMOTELY_QUERYABLE = TRUE
# Add debugging flags to here if you need to debug quill for some reason.
QUILL_DEBUG = D_FULLDEBUG
QUILL_SHOULD_REINDEX = True
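A quick text check that each Quill setting used above is actually assigned somewhere can save a restart cycle. This is a sketch only: the helper name missing_quill_keys is hypothetical, the key list is just the settings used in the configuration above (not an official list of required parameters), and it greps the files rather than evaluating the configuration the way Condor does; condor_config_val shows the value Condor actually uses.

```shell
#!/bin/sh
# Hypothetical helper: print each Quill setting from the list below that is
# not assigned in any of the given config files; return 1 if any is missing.
# Plain text search only: no macro expansion, no other config sources.
QUILL_KEYS="QUILL_ENABLED QUILL_NAME QUILL_DB_NAME QUILL_DB_USER QUILL_DB_IP_ADDR QUILL_DB_QUERY_PASSWORD"

missing_quill_keys() {
    status=0
    for key in $QUILL_KEYS; do
        # Match lines such as "QUILL_DB_USER = quillwriter" (leading blanks allowed).
        if ! grep -Eq "^[[:space:]]*$key[[:space:]]*=" "$@"; then
            echo "$key"
            status=1
        fi
    done
    return $status
}
```

For example, missing_quill_keys /opt/condor/etc/condor_config /opt/condor/etc/condor_config.local prints nothing and returns 0 when every listed key is present.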


--------------

We tried to start condor; quill and dbmsd failed because we had forgotten the .pgpass file.

The password files (.quillwritepassword and .pgpass, both in the spool directory) were set up like this:
[root@fgitb-cm spool]# vi .quillwritepassword
[root@fgitb-cm spool]# chown condor:condor .quillwritepassword
[root@fgitb-cm spool]# chmod 600 .quillwritepassword
[root@fgitb-cm spool]# more .pgpass
131.225.166.22:5432:quillcm:quillwriter:xxxxxxxx
[root@fgitb-cm spool]# chown condor:condor .pgpass
[root@fgitb-cm spool]# chmod 600 .pgpass
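libpq ignores a .pgpass file whose permissions allow any group or world access, so it is worth verifying the chmod 600 step. A minimal sketch, assuming GNU stat as found on RHEL; check_secret_file is a hypothetical name, and it insists on mode exactly 600, which is stricter than libpq requires.

```shell
#!/bin/sh
# Hypothetical helper: confirm a password file exists, is non-empty, and has
# mode 600 (owner read/write only). Uses GNU stat's -c '%a' format.
check_secret_file() {
    f=$1
    [ -s "$f" ] || { echo "$f: missing or empty" >&2; return 1; }
    mode=$(stat -c '%a' "$f")
    if [ "$mode" != 600 ]; then
        echo "$f: mode is $mode, expected 600" >&2
        return 1
    fi
}
```

For example, from the spool directory: check_secret_file .pgpass && check_secret_file .quillwritepassword.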


Quill and dbmsd start up OK,
but condor_history shows no entries.
Next step: add these variables:

QUILL_USE_SQL_LOG = false
SCHEDD.QUILL_USE_SQL_LOG = true

Now we see history building up in the database with the
condor_history command.
But there are two bugs:

condor_history -constraint
does not query the database; it only queries the history file. This is also true for the 6.8.x series.

condor_history -name quillgk@xxxxxxxxxxxxxxxxx

gives the error message
[root@fgitb-cm ~]# condor_history -name quillgk@xxxxxxxxxxxxxxxxx
Error: The quill daemon "quillgk@xxxxxxxxxxxxxxxxx" is not set up for database queries






--
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.