
pgtoolkit's People

Contributors

denisby, gitter-badger, grayhemp


pgtoolkit's Issues

pgcompact locks database while pg_dump is running

A few days ago I had an issue where pgcompact ran a little longer than usual, so it was still running when pg_dump started to dump the database. Because of this my application was blocked for more than 10 minutes. I'm not sure whether pgcompact is supposed to be safe to run alongside pg_dump, but it would be great if it were. Here is the part of the Postgres log that caused the lock:

2015-03-09 01:32:44 CET [6226]: [1037-1] ERROR:  canceling statement due to statement timeout
2015-03-09 01:32:44 CET [6226]: [1038-1] STATEMENT:  ALTER TABLE public.stats DROP CONSTRAINT stats_pkey;
2015-03-09 01:32:45 CET [6226]: [1039-1] ERROR:  canceling statement due to statement timeout
2015-03-09 01:32:45 CET [6226]: [1040-1] STATEMENT:  ALTER TABLE public.stats DROP CONSTRAINT stats_pkey;
2015-03-09 01:32:46 CET [6226]: [1041-1] ERROR:  canceling statement due to statement timeout
2015-03-09 01:32:46 CET [6226]: [1042-1] STATEMENT:  ALTER TABLE public.stats DROP CONSTRAINT stats_pkey;
2015-03-09 01:32:47 CET [6226]: [1043-1] ERROR:  canceling statement due to statement timeout
2015-03-09 01:32:47 CET [6226]: [1044-1] STATEMENT:  ALTER TABLE public.stats DROP CONSTRAINT stats_pkey;
2015-03-09 01:32:48 CET [6226]: [1045-1] LOG:  process 6226 still waiting for AccessExclusiveLock on relation 350629 of database 349582 after 1000.808 ms
2015-03-09 01:32:48 CET [6226]: [1046-1] STATEMENT:  DROP INDEX public.pgcompact_index_6192;
2015-03-09 01:46:45 CET [6226]: [1047-1] LOG:  process 6226 acquired AccessExclusiveLock on relation 350629 of database 349582 after 837082.864 ms
2015-03-09 01:46:45 CET [6226]: [1048-1] STATEMENT:  DROP INDEX public.pgcompact_index_6192;

This is on Postgres 9.1 and I'm sorry for not having more information. Also, I haven't checked in detail, but this issue could be related to issue #5.
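
The log suggests the DROP CONSTRAINT retries were protected by a statement timeout, while the DROP INDEX of the temporary index was not, and queued behind pg_dump's locks for about 14 minutes. A minimal sketch of the kind of guard that would avoid this, using the index name from the log above (the timeout value is an arbitrary assumption):

BEGIN;
SET LOCAL statement_timeout TO '1s'; -- give up instead of queueing behind pg_dump
DROP INDEX public.pgcompact_index_6192;
COMMIT;
-- on a timeout, roll back and retry later rather than blocking the application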

Cluster index & Update by field

After a concurrent reindex, the index loses its 'CLUSTER ON' attribute, which I have to rely on because pgcompactor for some reason picks an arbitrary column for Update By.

What is the correct way to use the utility?
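
If losing the cluster marker is the main problem, one workaround is to restore it manually after pgcompact finishes; a minimal SQL sketch with hypothetical table and index names:

-- re-attach the CLUSTER ON attribute that the index swap dropped
ALTER TABLE public.some_table CLUSTER ON some_table_pkey;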

"ERROR: canceling statement due to statement timeout" during VACUUM

I'm starting pgcompact with the command:

$ pgcompact --all --reindex -v info -U postgres

and it works through some tables, but then crashes with an error. After restarting, it crashes again with the same error, but ON ANOTHER TABLE. And I don't see that table in the output among the tables processed by the next call. I started to google the error, and all the recommendations are about setting the statement_timeout variable, which I have at its default:

#statement_timeout = 0          # in milliseconds, 0 is disabled

The output log I see (dates are different because I copied from different runs):

$ pgcompact --all --reindex -v info -U postgres
Sat Feb  8 20:35:47 2014 INFO Database connection method: psql.
Sat Feb  8 20:35:47 2014 postgres INFO Created environment.
Sat Feb  8 20:35:47 2014 postgres INFO Statictics calculation method: pgstattuple.
Sat Feb  8 20:35:47 2014 template1 INFO Created environment.
Sat Feb  8 20:35:47 2014 template1 NOTICE Statictics calculation method: approximation.
Sat Feb  8 20:35:47 2014 manufacture INFO Created environment.
Sat Feb  8 20:35:47 2014 manufacture INFO Statictics calculation method: pgstattuple.
Sat Feb  8 20:35:48 2014 manufacture, public.ads_ad INFO Vacuum initial: 0 pages left, duration 0.032 seconds.
......
Sat Feb  8 16:46:02 2014 manufacture, public.facebook_posts_post_like_users INFO Skipping reindex: public.facebook_posts_post_like_users_user_id, 11.40% space to compact from 20% minimum required.
Sat Feb  8 16:46:02 2014 manufacture, public.facebook_posts_post_like_users INFO Skipping reindex: public.facebook_posts_post_like_users_post_id, 19.20% space to compact from 20% minimum required.
Sat Feb  8 16:46:05 2014 manufacture, public.facebook_posts_post_like_users INFO Reindex: public.facebook_posts_post_like_users_pkey, initial size 7394 pages (58 MB), has been reduced by 30% (18 MB), duration 2.593 seconds, attempts 1.
Sat Feb  8 16:46:05 2014 manufacture, public.facebook_posts_post_like_users NOTICE Processing complete.
Sat Feb  8 16:46:05 2014 manufacture, public.facebook_posts_post_like_users NOTICE Processing results: 12485 pages left (35350 pages including toasts and indexes), size reduced by 0 bytes (18 MB including toasts and indexes) in total.
Sat Feb  8 16:46:06 2014 manufacture, public.facebook_page_activity_groupuser ERROR A database error occurred, exiting:
DatabaseError Can not executie command:
 VACUUM public.facebook_page_activity_groupuser
 ERROR:  canceling statement due to statement timeout

One time I received a different error, so it varies from time to time:

Sat Feb  8 20:57:36 2014 manufacture, public.posts_users_activity_postuseractivitystat ERROR A database error occurred, exiting:
DatabaseError Can not executie command: 
 CREATE INDEX CONCURRENTLY pgcompact_index_11386 ON posts_users_activity_postuseractivitystat USING btree (actions_count); -- manufacture
 ERROR:  canceling statement due to statement timeout

The non-default settings are:

SELECT name, current_setting(name) FROM pg_settings WHERE source = 'configuration file'

"archive_command";"cp %p /var/lib/postgresql/9.1/main/pg_xlog_archive/%f"
"archive_mode";"on"
"autovacuum";"on"
"checkpoint_completion_target";"0.9"
"checkpoint_segments";"128"
"constraint_exclusion";"on"
"default_statistics_target";"50"
"default_text_search_config";"pg_catalog.english"
"effective_cache_size";"44GB"
"external_pid_file";"/var/run/postgresql/9.1-main.pid"
"lc_messages";"en_US.UTF-8"
"lc_monetary";"en_US.UTF-8"
"lc_numeric";"en_US.UTF-8"
"lc_time";"en_US.UTF-8"
"listen_addresses";"*"
"log_autovacuum_min_duration";"0"
"log_checkpoints";"on"
"log_connections";"on"
"log_directory";"pg_log"
"log_disconnections";"on"
"log_line_prefix";"%t [%p]: [%l-1] "
"log_lock_waits";"on"
"log_min_duration_statement";"2s"
"log_temp_files";"0"
"logging_collector";"on"
"maintenance_work_mem";"6GB"
"max_connections";"80"
"max_stack_depth";"7MB"
"max_wal_senders";"5"
"port";"5432"
"shared_buffers";"15GB"
"ssl";"on"
"track_counts";"on"
"unix_socket_directory";"/var/run/postgresql"
"vacuum_cost_delay";"10ms"
"wal_buffers";"8MB"
"wal_keep_segments";"2000"
"wal_level";"hot_standby"
"work_mem";"1GB"

Any ideas how to solve this?
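
Since statement_timeout is commented out in postgresql.conf, the timeout in these errors is probably coming from a per-database or per-role override, or from the tool's own session settings. A hedged diagnostic sketch for tracking down the source:

-- what is statement_timeout in the current session, and where did it come from?
SELECT name, setting, source FROM pg_settings WHERE name = 'statement_timeout';

-- any ALTER DATABASE ... SET / ALTER ROLE ... SET overrides?
SELECT d.datname, r.rolname, s.setconfig
FROM pg_db_role_setting s
LEFT JOIN pg_database d ON d.oid = s.setdatabase
LEFT JOIN pg_roles r ON r.oid = s.setrole;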

Max execution time

Since pgcompact is used as a database maintenance tool, you usually have a limited window in which to do maintenance (e.g. when there is less DB activity), so it would be really nice to be able to define a maximum execution time that limits how long pgcompact can run. For example, --max-execution-time=3600 would limit execution to 1 hour.
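
Until such an option exists, a rough approximation is to wrap the run in coreutils timeout(1), which sends SIGTERM at the deadline. Note this kills the process mid-flight, so any temporary pgcompact_index_* objects left behind may need manual cleanup:

$ timeout 3600 pgcompact --all --reindex -v info -U postgres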

Compaction does not work on PostgreSQL 9.6

In the process of migrating to PG 9.6 we noticed strange behavior in pgcompact: it can't clean any pages on any bloated table. We are seeing logs like this:

Sat Nov 26 13:37:12 2016 oltpdb_dev, public.question_hist NOTICE Processing results: 301865 pages left (308484 pages including toasts and indexes), size reduced by 0 bytes (-288 kB including toasts and indexes) in total, approximately 37.33% (112682 pages) that is 880 MB more were expected to be compacted after this attempt.
Sat Nov 26 13:37:32 2016 oltpdb_dev, public.task_hist NOTICE Statistics: 339868 pages (422017 pages including toasts and indexes), approximately 21.21% (72098 pages) can be compacted reducing the size by 563 MB.
Sat Nov 26 13:37:41 2016 oltpdb_dev, public.task_hist NOTICE Vacuum routine: can not clean 13500 pages, 339868 pages left, duration 0.291 seconds.
Sat Nov 26 13:37:44 2016 oltpdb_dev, public.task_hist NOTICE Vacuum routine: can not clean 20100 pages, 339868 pages left, duration 0.297 seconds.
Sat Nov 26 13:37:46 2016 oltpdb_dev, public.task_hist NOTICE Vacuum routine: can not clean 26500 pages, 339868 pages left, duration 0.283 seconds.
Sat Nov 26 13:37:49 2016 oltpdb_dev, public.task_hist NOTICE Vacuum routine: can not clean 32800 pages, 339868 pages left, duration 0.279 seconds.
Sat Nov 26 13:37:52 2016 oltpdb_dev, public.task_hist NOTICE Vacuum routine: can not clean 39000 pages, 339868 pages left, duration 0.285 seconds.
Sat Nov 26 13:37:54 2016 oltpdb_dev, public.task_hist NOTICE Vacuum routine: can not clean 45100 pages, 339868 pages left, duration 0.300 seconds.
Sat Nov 26 13:37:57 2016 oltpdb_dev, public.task_hist NOTICE Vacuum routine: can not clean 51000 pages, 339868 pages left, duration 0.291 seconds.
Sat Nov 26 13:38:00 2016 oltpdb_dev, public.task_hist NOTICE Vacuum routine: can not clean 56800 pages, 339868 pages left, duration 0.280 seconds.
Sat Nov 26 13:38:02 2016 oltpdb_dev, public.task_hist NOTICE Vacuum routine: can not clean 62500 pages, 339868 pages left, duration 0.282 seconds.
Sat Nov 26 13:38:04 2016 oltpdb_dev, public.task_hist NOTICE Vacuum routine: can not clean 68100 pages, 339868 pages left, duration 0.289 seconds.
Sat Nov 26 13:38:07 2016 oltpdb_dev, public.task_hist NOTICE Vacuum final: can not clean 71900 pages, 339868 pages left, duration 0.802 seconds.
Sat Nov 26 13:38:32 2016 oltpdb_dev, public.task_hist WARNING Processing incomplete.
Sat Nov 26 13:38:32 2016 oltpdb_dev, public.task_hist NOTICE Processing results: 339868 pages left (422017 pages including toasts and indexes), size reduced by 0 bytes (-320 kB including toasts and indexes) in total, approximately 21.21% (72098 pages) that is 563 MB more were expected to be compacted after this attempt.

I've searched the PG release notes from 9.4 to 9.6 extensively but found nothing relevant. Is pgcompact conceptually broken as of 9.6, or are there some options?

P.S.: I've tried many configuration variations on 4 clusters with very different DB schemas, without success...
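
One way to check whether the reported bloat is real (rather than an artifact of the statistics method) is to query the pgstattuple extension directly, which is the same statistics source pgcompact prefers when it is available. A hedged diagnostic sketch against a table from the log above:

CREATE EXTENSION IF NOT EXISTS pgstattuple;
-- dead_tuple_percent and free_percent show how much space is actually reclaimable
SELECT table_len, dead_tuple_percent, free_space, free_percent
FROM pgstattuple('public.task_hist');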

Use of uninitialized value $to_page

Hi,

I get the following error for a few tables:

Mon Sep  8 09:31:12 2014 zabbix, partitions.history_p2014_06_16 INFO Vacuum initial: 20512 pages left, duration 1.794 seconds.
Mon Sep  8 09:31:12 2014 zabbix, partitions.history_p2014_06_16 NOTICE Statistics: 20512 pages (32901 pages including toasts and indexes), approximately 7.94% (1629 pages) can be compacted reducing the size by 13 MB.
Mon Sep  8 09:31:12 2014 zabbix, partitions.history_p2014_06_16 INFO Update by column: ns.
Mon Sep  8 09:31:12 2014 zabbix, partitions.history_p2014_06_16 INFO Set pages/round: 10.
Mon Sep  8 09:31:12 2014 zabbix, partitions.history_p2014_06_16 INFO Set pages/vacuum: 1000.
Mon Sep  8 09:31:12 2014 zabbix, partitions.history_p2014_06_16 WARNING Incorrect result of cleaning: column_ident ns, to_page 20511, pages_per_round 10, max_tupples_per_page 316.
Use of uninitialized value $to_page in subtraction (-) at /loader/0x25deef0/PgToolkit/Compactor/Table.pm line 456, <GEN8> line 8.
Mon Sep  8 09:31:13 2014 zabbix, partitions.history_p2014_06_16 INFO Cleaning in average: 196.0 pages/second (0.051 seconds per 10 pages).
Use of uninitialized value $arg_hash{"to_page"} in addition (+) at /loader/0x25deef0/PgToolkit/Compactor/Table.pm line 855, <GEN8> line 8.
Use of uninitialized value $arg_hash{"to_page"} in addition (+) at /loader/0x25deef0/PgToolkit/Compactor/Table.pm line 857, <GEN8> line 8.
Use of uninitialized value $arg_hash{"to_page"} in subtraction (-) at /loader/0x25deef0/PgToolkit/Compactor/Table.pm line 865, <GEN8> line 8.
Mon Sep  8 09:31:13 2014 zabbix, partitions.history_p2014_06_16 NOTICE Vacuum routine: can not clean 20511 pages, 20512 pages left, duration 0.139 seconds.
Use of uninitialized value $to_page in numeric ge (>=) at /loader/0x25deef0/PgToolkit/Compactor/Table.pm line 492, <GEN8> line 8.
Use of uninitialized value $to_page in numeric le (<=) at /loader/0x25deef0/PgToolkit/Compactor/Table.pm line 496, <GEN8> line 8.
Mon Sep  8 09:31:15 2014 zabbix, partitions.history_p2014_06_16 INFO Analyze final: duration 2.074 second.
Mon Sep  8 09:31:15 2014 zabbix, partitions.history_p2014_06_16 WARNING Processing incomplete.
Mon Sep  8 09:31:15 2014 zabbix, partitions.history_p2014_06_16 NOTICE Processing results: 20512 pages left (32901 pages including toasts and indexes), size reduced by 0 bytes (0 bytes including toasts and indexes) in total, approximately 7.90% (1620 pages) that is 13 MB more were expected to be compacted after this attempt.

Support AWS RDS

We are looking to use this tool for an AWS DB, but the highest-privilege RDS user available cannot perform certain commands / access certain tables (especially pg_toast):

foo_db, public.settings, pg_toast.pg_toast_17512 ERROR A database error occurred, exiting:
DatabaseError Can not executie command: 
 SELECT pg_try_advisory_lock(
    'pg_catalog.pg_class'::regclass::integer,
    'pg_toast.pg_toast_17512'::regclass::integer)::integer;

 ERROR:  permission denied for schema pg_toast
LINE 3:  'pg_toast.pg_toast_17512'::regclass::integer)::integer;

We initially explored removing the "stats" checks - we know which table is the issue, and we don't need pgcompact to give us running updates on progress. However, that logic seems pretty bound up in the process.

Can we run the core functionality of this tool on a table given an RDS root user?

EDIT:
pg_statistic, for example, cannot be queried.
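
For what it's worth, the built-in privilege functions make it easy to confirm exactly which of these objects the RDS master user can reach; a hedged diagnostic sketch:

-- can the current user enter the pg_toast schema at all?
SELECT has_schema_privilege(current_user, 'pg_toast', 'USAGE');

-- can it read the statistics catalog mentioned above?
SELECT has_table_privilege(current_user, 'pg_statistic', 'SELECT');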

Reindexing a unique constraint with INITIALLY DEFERRED resets the option

Hello

There is a table with a unique constraint created with the option INITIALLY DEFERRED. When the table is processed with the reindex option, the constraint loses its INITIALLY DEFERRED option.

The constraint was originally created on the basis of an existing unique index, using ADD CONSTRAINT ... USING INDEX.

pgcompactor (PgToolkit) v1.0rc1
Pg 9.2
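
For reference, PostgreSQL's ADD CONSTRAINT ... USING INDEX syntax accepts the deferrable options, so a reindex swap could in principle carry them over. A minimal sketch with hypothetical names, assuming the rebuilt unique index already exists:

-- promote the rebuilt index to a constraint while keeping the deferred semantics;
-- the index is renamed to the constraint name as part of this command
ALTER TABLE some_table
    ADD CONSTRAINT some_table_key UNIQUE USING INDEX pgcompact_index_123
    DEFERRABLE INITIALLY DEFERRED;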

Reindex queries seem to drop index in use

command run:
./pgcompact --dbname dbname --table tablename -u --reindex

output (non-relevant parts skipped):

--  Fri May 22 10:17:11 2015 dbname, tablename NOTICE Reindex queries: tablename_pkey, initial size 1921318 pages (15 GB).
CREATE UNIQUE INDEX CONCURRENTLY pgcompact_index_26822 ON schema_name.tablename USING btree (id); -- dbname
BEGIN; -- dbname
SET LOCAL statement_timeout TO 1000; -- dbname
ALTER TABLE schema_name.tablename DROP CONSTRAINT tablename_pkey; -- dbname
ALTER TABLE schema_name.tablename ADD CONSTRAINT tablename_pkey PRIMARY KEY USING INDEX pgcompact_index_26822;  -- dbname
END; -- dbname

--  Fri May 22 10:17:11 2015 dbname, tablename NOTICE Reindex queries: schema_nameindex_tablename_date_id_customer_subscription_id, initial size 2182219 pages (17 GB).
CREATE UNIQUE INDEX CONCURRENTLY pgcompact_index_26822 ON schema_name.tablename USING btree (date_id, customer_subscription_id); -- dbname
BEGIN; -- dbname
SET LOCAL statement_timeout TO 1000; -- dbname
ALTER INDEX schema_name.pgcompact_index_26822 RENAME TO pgcompact_swap_index_26822;
ALTER INDEX schema_name.index_tablename_date_id_customer_subscription_id RENAME TO pgcompact_index_26822;
ALTER INDEX schema_name.pgcompact_swap_index_26822 RENAME TO index_tablename_date_id_customer_subscription_id; -- dbname
END; -- dbname
DROP INDEX CONCURRENTLY schema_name.pgcompact_index_26822; -- dbname

What bothers me here is that we have
ALTER TABLE schema_name.tablename ADD CONSTRAINT tablename_pkey PRIMARY KEY USING INDEX pgcompact_index_26822; -- dbname
and on the next line
CREATE UNIQUE INDEX CONCURRENTLY pgcompact_index_26822 ON schema_name.tablename USING btree (date_id, customer_subscription_id); -- dbname

So we used an index as a PK and then try to create an index with the same name but a different set of columns. In other cases this isn't an issue, because the temporary index name is only used to swap the indices and is dropped later, but in the PK case this looks like an error in the script.
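
One detail that may explain why this works in practice: ADD CONSTRAINT ... USING INDEX renames the index to match the constraint name, so pgcompact_index_26822 is free again once the PK swap commits. A minimal sketch to verify that behavior, with hypothetical names:

CREATE TABLE t (id int);
CREATE UNIQUE INDEX tmp_idx ON t (id);
ALTER TABLE t ADD CONSTRAINT t_pkey PRIMARY KEY USING INDEX tmp_idx;
-- tmp_idx no longer exists under that name; only t_pkey remains
SELECT relname FROM pg_class WHERE relname IN ('tmp_idx', 't_pkey');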

Canceling statement due to user request

Today, on one of our Postgres 9.1 databases, pgcompact (1.0.2) exited with this error:

Thu Mar 12 00:24:02 2015 db1, public.message ERROR A database error occurred, exiting:
DatabaseError Can not executie command: 
 DROP INDEX public.message_is_ready_idx;
 ERROR:  canceling statement due to user request

Here is what was logged in the postgres log:

2015-03-12 00:23:59 CET [22772]: [202-1] LOG:  duration: 3405.028 ms  statement: CREATE INDEX CONCURRENTLY pgcompact_index_22738 ON message USING btree (is_ready) TABLESPACE index WHERE (is_ready = false);
2015-03-12 00:24:00 CET [22772]: [203-1] ERROR:  canceling statement due to statement timeout
2015-03-12 00:24:00 CET [22772]: [204-1] STATEMENT:  DROP INDEX public.message_is_ready_idx;
2015-03-12 00:24:01 CET [22772]: [205-1] ERROR:  canceling statement due to statement timeout
2015-03-12 00:24:01 CET [22772]: [206-1] STATEMENT:  DROP INDEX public.message_is_ready_idx;
2015-03-12 00:24:02 CET [22772]: [207-1] LOG:  duration: 1074.371 ms  statement: DROP INDEX public.message_is_ready_idx;
2015-03-12 00:24:02 CET [22772]: [208-1] ERROR:  canceling statement due to user request
2015-03-12 00:24:02 CET [22772]: [209-1] STATEMENT:  DROP INDEX public.message_is_ready_idx;

As you can see, the temp index was created, but pgcompact couldn't drop the old index due to a timeout (and after the second retry it cancelled the DROP query). The problem is that pgcompact exited after this and didn't clean up the temp index.

Is this normal? Shouldn't pgcompact continue working even if one index fails? I'd guess pgcompact should at least drop the temp index?
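
Until that is fixed, leftovers from an aborted run can be removed by hand; a hedged cleanup sketch using the index name from the log above:

-- find any temporary indexes an aborted pgcompact run left behind
SELECT schemaname, indexname FROM pg_indexes WHERE indexname LIKE 'pgcompact_index_%';

-- plain DROP INDEX here: DROP INDEX CONCURRENTLY only exists from 9.2 on
DROP INDEX public.pgcompact_index_22738;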

Rare deadlock on index renaming, probably due to OID overflow

It seems that when PostgreSQL needs to acquire several locks, it does so in OID order, which causes a deadlock if pgcompactor's index gets an OID lower than the existing one's.

This is what happened in real life:

… [20583]: […] from=…(…),user=…,db=… STATEMENT:
SELECT * FROM … FOR UPDATE
… [20583]: […] from=…(…),user=…,db=… ERROR:  deadlock detected
… [20583]: […] from=…(…),user=…,db=… DETAIL:
Process 20583 waits for AccessShareLock on relation 3879503128 of database 24314; blocked by process 12662.
Process 12662 waits for AccessExclusiveLock on relation 43045091 of database 24314; blocked by process 20583.
Process 12662: ALTER INDEX public.pgcompact_index_12644 RENAME TO …;

The only way I've managed to reproduce this is by using pg_resetwal to manually set the cluster's OID counter:

create table t (id serial primary key, date timestamp(0) without time zone);

Stop PostgreSQL, run pg_resetwal -o 4294967294 [data_directory], then start PostgreSQL again.

create index t_idx on t (date); -- should get OID 4294967295 (max uint32)
create index pgcompact_index on t (date);
-- [1]
begin;
alter index t_idx rename to pgcompact_temp_index;

[1] acquires AccessExclusiveLock on t_idx

-- [2]
begin;
select * from t where id = 1 for update;

[2] acquires AccessShareLock on pgcompact_index and waits for AccessShareLock on t_idx

-- [1]
alter index pgcompact_index rename to t_idx;

[1] now waits for AccessExclusiveLock on pgcompact_index: deadlock.

I think it would be safer to lock the entire table when swapping indexes.
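
A hedged sketch of that suggestion: take the table lock first inside the swap transaction, so both index locks are implied by a single acquisition and the two-lock window in which the deadlock forms disappears (names from the reproduction above; the short local timeout mirrors what pgcompact already sets around its DDL):

BEGIN;
SET LOCAL statement_timeout TO '1s';
-- a table-level lock covers both indexes, so no second lock acquisition
-- can interleave with a concurrent SELECT ... FOR UPDATE
LOCK TABLE t IN ACCESS EXCLUSIVE MODE;
ALTER INDEX t_idx RENAME TO pgcompact_temp_index;
ALTER INDEX pgcompact_index RENAME TO t_idx;
ALTER INDEX pgcompact_temp_index RENAME TO pgcompact_index;
COMMIT;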
