
Re: [Condor-users] How to shrink quill database?



More info...

condor_q doesn't seem to show jobs without quill running, so I've had to
reinstate quill & dbmsd.

>condor_config_val daemon_list -verbose
daemon_list: MASTER, STARTD, SCHEDD, DBMSD, COLLECTOR, NEGOTIATOR, QUILL
  Defined in '/opt/condor/etc/condor_config', line 991.

The master fires up & everything runs for about a minute, then the amount
of free space on /var drops to 0% and the MasterLog shows several
repetitions of:

8/18 16:38:09 The QUILL (pid 25157) exited with status 44
8/18 16:38:09 Sending obituary for "/opt/condor/sbin/condor_quill"
8/18 16:38:09 restarting /opt/condor/sbin/condor_quill in 10 seconds
8/18 16:38:19 Started DaemonCore process "/opt/condor/sbin/condor_quill",pid and pgroup = 25240

... and all condor daemons are gone from 'ps aux | grep condor'

Checking the database (using psql), I can see that it has gone from
3018125180 bytes to 3033608060 bytes in this minute. The df -h reports &
postgres table sizes pre- & post- quill start are attached.
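
As a rough check on those figures, the growth can be worked out from the two pg_database_size values quoted above. This is only a sketch in plain sh arithmetic: the variable names are made up for illustration, and the numbers are copied from the psql output.

```shell
#!/bin/sh
# Figures copied from the pg_database_size('quill') output in this message.
before=3018125180   # bytes, before quill & dbmsd were restarted
after=3033608060    # bytes, after the restart

# Growth in bytes, and in whole MiB (integer division rounds down).
growth=$((after - before))
growth_mib=$((growth / 1048576))

echo "growth_bytes=$growth"
echo "growth_mib=$growth_mib"
```

That comes to roughly 15 MB, which is a large share of the 24M that df -h showed as available on /var before the restart.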

I've got no idea why the database is growing so rapidly and would
*really* appreciate suggestions as condor is not running while this
problem exists.

Steve
(Had to roll back to v7.0.5 as MPI jobs wouldn't run in an otherwise
successful v7.6.2 upgrade)


-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Steven Platt
Sent: 18 August 2011 16:20
To: Condor-Users Mail List
Subject: [Condor-users] How to shrink quill database?

Hello,

Our quill database has ballooned recently to the point where it takes
over 75% of its partition. I've had to disable the quill & dbmsd daemons
to allow condor to continue to run.

Is there a way, either through Condor or Postgres, that we can reduce
this size? Vacuuming/emptying specific tables leaps to mind (a whole DB
vacuum exceeds our current settings), but I'm not sure if this is safe.
Also, I can't find a sql folder in the release directory to try
'psql -h hostname -d databasename < pgsql_dropddl.sql' mentioned on the wiki.

I don't mind losing it all & starting from scratch, but I'm a postgres
novice & don't want to cripple it completely.

Thanks

Steve
-----------------------------------------
************************************************************************
**
The information contained in the EMail and any attachments is
confidential and intended solely and for the attention and use of
the named addressee(s). It may not be disclosed to any other person
without the express authority of the HPA, or the intended
recipient, or both. If you are not the intended recipient, you must
not disclose, copy, distribute or retain this message or any part
of it. This footnote also confirms that this EMail has been swept
for computer viruses, but please re-sweep any attachments before
opening or saving. HTTP://www.HPA.org.uk
************************************************************************
**
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/


Thu Aug 18 16:02:22 BST 2011

[root@queen log]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              16G  6.9G  7.5G  48% /
/dev/sda5             9.2G  2.9G  6.0G  33% /state/partition1
/dev/sda2             3.8G  3.6G   24M 100% /var
tmpfs                1006M     0 1006M   0% /dev/shm
tmpfs                 245M  8.6M  237M   4% /var/lib/ganglia/rrds
158.119.147.20:/home/export
                      800G  527G  233G  70% /share/export

quill=#  SELECT pg_database_size('quill');
 pg_database_size 
------------------
       3018125180
(1 row)

quill=# SELECT relname, relpages FROM pg_class ORDER BY relpages DESC limit 10;
           relname            | relpages 
------------------------------+----------
 pg_toast_16857               |   166270
 jobs_horizontal_history      |    70703
 jobs_vertical_history_pkey   |    55845
 jobs_vertical_history        |    51525
 machines_vertical_history    |    20161
 pg_toast_16857_index         |     1245
 jobs_horizontal_history_pkey |      666
 transfers                    |      351
 jobs_hor_his_ix1             |      298
 jobs_hor_his_ix2             |      254

quill=# SELECT pg_size_pretty(pg_total_relation_size('error_sqllogs'));
 pg_size_pretty 
----------------
 1310 MB


Thu Aug 18 16:37:58 BST 2011 ( with quill & dbmsd started about a minute ago)

[root@queen log]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              16G  6.9G  7.5G  48% /
/dev/sda5             9.2G  2.9G  6.0G  33% /state/partition1
/dev/sda2             3.8G  3.6G     0 100% /var
tmpfs                1006M     0 1006M   0% /dev/shm
tmpfs                 245M  8.6M  237M   4% /var/lib/ganglia/rrds
158.119.147.20:/home/export
                      800G  527G  233G  70% /share/export
158.119.147.20:/home/f0
                      800G  527G  233G  70% /home/f0


quill=#  SELECT pg_database_size('quill');
 pg_database_size 
------------------
       3023130492

>>>>>>>> 1 minute later

quill=#  SELECT pg_database_size('quill');
 pg_database_size 
------------------
       3033608060
 
quill=# SELECT relname, relpages FROM pg_class ORDER BY relpages DESC limit 10;
           relname            | relpages 
------------------------------+----------
 pg_toast_16857               |   166861
 jobs_horizontal_history      |    70703
 jobs_vertical_history_pkey   |    55845
 jobs_vertical_history        |    51525
 machines_vertical_history    |    20161
 jobs_horizontal_history_pkey |      666
 transfers                    |      351
 jobs_hor_his_ix1             |      298
 jobs_hor_his_ix2             |      254
 runs                         |       91

quill=# SELECT pg_size_pretty(pg_total_relation_size('error_sqllogs'));
 pg_size_pretty 
----------------
 1324 MB
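
Comparing the two snapshots above gives a rough idea of where the growth is. The sketch below only subtracts figures copied from the psql output. The variable names are illustrative, and the 8192-byte page size is an assumption (the PostgreSQL default), not something shown in the output. relpages is an estimate that is refreshed by VACUUM and ANALYZE, so the page delta should be treated as approximate.

```shell
#!/bin/sh
# relpages for pg_toast_16857, copied from the two pg_class listings.
toast_before=166270
toast_after=166861
page_size=8192      # ASSUMPTION: default PostgreSQL block size

# pg_total_relation_size('error_sqllogs') in MB, as printed by pg_size_pretty.
err_before_mb=1310
err_after_mb=1324

toast_pages=$((toast_after - toast_before))
toast_bytes=$((toast_pages * page_size))
err_growth_mb=$((err_after_mb - err_before_mb))

echo "pg_toast_16857 grew by $toast_pages pages (~$toast_bytes bytes)"
echo "error_sqllogs grew by $err_growth_mb MB"
```

On these numbers error_sqllogs grew by about 14 MB between the snapshots, close to the growth of the whole database, while the other relations that appear in both listings show unchanged page counts. That makes error_sqllogs look like the first place to investigate. The number in a TOAST table's name is the OID of the table it belongs to, so running SELECT 16857::regclass; in psql should name the owner of pg_toast_16857.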