public inbox for [email protected]help / color / mirror / Atom feed
Re: how to know if the sql will run a seq scan 4+ messages / 2 participants [nested] [flat]
* Re: how to know if the sql will run a seq scan @ 2024-10-15 21:29 Adrian Klaver <[email protected]> 0 siblings, 1 reply; 4+ messages in thread From: Adrian Klaver @ 2024-10-15 21:29 UTC (permalink / raw) To: Vijaykumar Jain <[email protected]>; +Cc: pgsql-general On 10/15/24 13:50, Vijaykumar Jain wrote: > Sorry top posting, coz Gmail app on phone. > > Yeah, my point was for example we have a large table and we are > attaching a table as a partition. Now it will scan the whole table to > validate the constraint and that will create all sorts of problems. Now you have changed the problem description. To get a proper answer you will need to provide a more detailed description of what you are doing with the following information: 1) Postgres version. 2) Definition of 'large'. 3) The command/process being used to create the partition. 4) The actual constraint definition. 5) The table definition. > I understand the benefit of not valid constraint and then validating > constraint to reduce blocking. > But yeah monitoring locks for the statement should give me good enough > hint of what will happen. > > Thanks for your reply. It helps. > > > > On Wed, Oct 16, 2024, 1:54 AM Adrian Klaver <[email protected] > <mailto:[email protected]>> wrote: > > On 10/15/24 12:50, Vijaykumar Jain wrote: > > > > Hi, > > > > tl;dr > > I am trying to learn what sql can result in a full seq scan. > > > > Basically there is a lot of info on the internet of what ddl > change may > > take an access exclusive lock while running a seq scan and hold > for long. > > And for some cases we can make use of > > "not valid" constraint and then run a validate constraint as work > > arounds to avoid long exclusive locks etc. > > but how do we check the same. i mean for dmls there is a explain/ > > auto_explain. > > > > but for DDLs, how do we check the same. > > i tried to isolate my setup and use pg_stat_user_tables and > monitor the > > same, which helped, but it is not useful as it does not link me > to what > > process/command invoked the seq scan. > > > > am i clear in my question ? > > > > if yes, > > how do i log an alter table that may or may not do a seq scan, > that may > > or may not rewrite the table file on disk etc. > > its a useless question, i am just playing with it for building > > knowledge, no requirement as such. > > Look at the docs: > > https://www.postgresql.org/docs/current/sql-altertable.html > <https://www.postgresql.org/docs/current/sql-altertable.html; > > "Scanning a large table to verify a new foreign key or check constraint > can take a long time, and other updates to the table are locked out > until the ALTER TABLE ADD CONSTRAINT command is committed. The main > purpose of the NOT VALID constraint option is to reduce the impact of > adding a constraint on concurrent updates. With NOT VALID, the ADD > CONSTRAINT command does not scan the table and can be committed > immediately. After that, a VALIDATE CONSTRAINT command can be issued to > verify that existing rows satisfy the constraint. The validation step > does not need to lock out concurrent updates, since it knows that other > transactions will be enforcing the constraint for rows that they insert > or update; only pre-existing rows need to be checked. Hence, validation > acquires only a SHARE UPDATE EXCLUSIVE lock on the table being altered. > (If the constraint is a foreign key then a ROW SHARE lock is also > required on the table referenced by the constraint.) In addition to > improving concurrency, it can be useful to use NOT VALID and VALIDATE > CONSTRAINT in cases where the table is known to contain pre-existing > violations. Once the constraint is in place, no new violations can be > inserted, and the existing problems can be corrected at leisure until > VALIDATE CONSTRAINT finally succeeds." > > > > -- > > Thanks, > > Vijay > > > > Open to work > > Resume - Vijaykumar Jain <https://github.com/cabecada > <https://github.com/cabecada>; > > -- > Adrian Klaver > [email protected] <mailto:[email protected]> > -- Adrian Klaver [email protected] ^ permalink raw reply [nested|flat] 4+ messages in thread
* Re: how to know if the sql will run a seq scan @ 2024-10-16 07:02 Vijaykumar Jain <[email protected]> parent: Adrian Klaver <[email protected]> 0 siblings, 1 reply; 4+ messages in thread From: Vijaykumar Jain @ 2024-10-16 07:02 UTC (permalink / raw) To: Adrian Klaver <[email protected]>; +Cc: pgsql-general On Wed, 16 Oct 2024 at 02:59, Adrian Klaver <[email protected]> wrote: > On 10/15/24 13:50, Vijaykumar Jain wrote: > > Sorry top posting, coz Gmail app on phone. > > > > Yeah, my point was for example we have a large table and we are > > attaching a table as a partition. Now it will scan the whole table to > > validate the constraint and that will create all sorts of problems. > > Now you have changed the problem description. > > To get a proper answer you will need to provide a more detailed > description of what you are doing with the following information: > > 1) Postgres version. > > 2) Definition of 'large'. > > 3) The command/process being used to create the partition. > > 4) The actual constraint definition. > > 5) The table definition. > > /* postgres=# create table t(col1 int) partition by list(col1); CREATE TABLE postgres=# create table t1(col1 int) postgres-# ; CREATE TABLE postgres=# insert into t1 select 0 from generate_series(1, 100000) x; INSERT 0 100000 postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan, current_timestamp), seq_tup_read from pg_stat_user_tables where relname = 't1'; relname | seq_scan | last_seq_scan | age | seq_tup_read ---------+----------+---------------+-----+-------------- t1 | 0 | | | 0 (1 row) postgres=# alter table t1 add constraint col10 check (col1 = 0); ALTER TABLE postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan, current_timestamp), seq_tup_read from pg_stat_user_tables where relname = 't1'; relname | seq_scan | last_seq_scan | age | seq_tup_read ---------+----------+-------------------------------+------------------+-------------- t1 | 1 | 2024-10-16 06:46:28.641281+00 | -00:00:03.258432 | 100000 (1 row) postgres=# -- this results in a seq scan , which is ok, but then when i attach the partition it does a seq scan again postgres=# alter table t attach partition t1 for values in (0); ALTER TABLE postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan, current_timestamp), seq_tup_read from pg_stat_user_tables where relname = 't1'; relname | seq_scan | last_seq_scan | age | seq_tup_read ---------+----------+-------------------------------+------------------+-------------- t1 | 2 | 2024-10-16 06:46:59.512201+00 | -00:00:02.498771 | 200000 (1 row) postgres=# -- why , when there is a constraint that helps with the partition boundary/value postgres=# alter table t detach partition t1; ALTER TABLE postgres=# alter table t attach partition t1 for values in (0); ALTER TABLE postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan, current_timestamp), seq_tup_read from pg_stat_user_tables where relname = 't1'; relname | seq_scan | last_seq_scan | age | seq_tup_read ---------+----------+-------------------------------+------------------+-------------- t1 | 3 | 2024-10-16 06:54:28.780145+00 | -00:00:03.358524 | 300000 (1 row) -- despite there being a constraint, it does a full table scan to attach the partition. why ? note the tup read is full table of t1. */ above is one of the cases i found. my core question still was, how do i know which statement will cause a full table rewrite full table scan how do i get to know that. i know implictly i can use the above stat tables and pg_rel_filepath function etc to figure out the change in oid , update in seq count etc. but i want to pin point which statement made what change among 100 other statements in production. I mean is there a way that a certain alter table will do a table rewrite on disk and other alter table will not. access exclusive lock on tables does not help answer that question. if i am not clear, maybe ignore my question. i have some issues explaining things clearly, so i try to use demos. Thanks, Vijay Open to work Resume - Vijaykumar Jain <https://github.com/cabecada; ^ permalink raw reply [nested|flat] 4+ messages in thread
* Re: how to know if the sql will run a seq scan @ 2024-10-16 16:40 Adrian Klaver <[email protected]> parent: Vijaykumar Jain <[email protected]> 0 siblings, 1 reply; 4+ messages in thread From: Adrian Klaver @ 2024-10-16 16:40 UTC (permalink / raw) To: Vijaykumar Jain <[email protected]>; +Cc: pgsql-general On 10/16/24 00:02, Vijaykumar Jain wrote: > > > postgres=# create table t(col1 int) partition by list(col1); > CREATE TABLE > postgres=# create table t1(col1 int) > postgres-# ; > CREATE TABLE > postgres=# insert into t1 select 0 from generate_series(1, 100000) x; > INSERT 0 100000 > postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan, > current_timestamp), seq_tup_read from pg_stat_user_tables where relname > = 't1'; > relname | seq_scan | last_seq_scan | age | seq_tup_read > ---------+----------+---------------+-----+-------------- > t1 | 0 | | | 0 > (1 row) > > postgres=# alter table t1 add constraint col10 check (col1 = 0); > ALTER TABLE > postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan, > current_timestamp), seq_tup_read from pg_stat_user_tables where relname > = 't1'; > relname | seq_scan | last_seq_scan | age > | seq_tup_read > ---------+----------+-------------------------------+------------------+-------------- > t1 | 1 | 2024-10-16 06:46:28.641281+00 | -00:00:03.258432 > | 100000 > (1 row) > > postgres=# -- this results in a seq scan , which is ok, but then when i > attach the partition it does a seq scan again > postgres=# alter table t attach partition t1 for values in (0); > ALTER TABLE > postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan, > current_timestamp), seq_tup_read from pg_stat_user_tables where relname > = 't1'; > relname | seq_scan | last_seq_scan | age > | seq_tup_read > ---------+----------+-------------------------------+------------------+-------------- > t1 | 2 | 2024-10-16 06:46:59.512201+00 | -00:00:02.498771 > | 200000 > (1 row) > > postgres=# -- why , when there is a constraint that helps with the > partition boundary/value > > postgres=# alter table t detach partition t1; > ALTER TABLE > > postgres=# alter table t attach partition t1 for values in (0); > ALTER TABLE > postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan, > current_timestamp), seq_tup_read from pg_stat_user_tables where relname > = 't1'; > relname | seq_scan | last_seq_scan | age > | seq_tup_read > ---------+----------+-------------------------------+------------------+-------------- > t1 | 3 | 2024-10-16 06:54:28.780145+00 | -00:00:03.358524 > | 300000 > (1 row) > > -- despite there being a constraint, it does a full table scan to attach > the partition. why ? note the tup read is full table of t1. > > */ > > above is one of the cases i found. > my core question still was, how do i know which statement will cause a > full table rewrite > full table scan I don't have time now to create an example, but I can point you at: https://www.postgresql.org/docs/current/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE 5.12.2.2. Partition Maintenance "As an alternative to creating a new partition, it is sometimes more convenient to create a new table separate from the partition structure and attach it as a partition later. ... " Read the section starting above. > > how do i get to know that. i know implictly i can use the above stat > tables and pg_rel_filepath function etc to figure out the change in oid > , update in seq count etc. > but i want to pin point which statement made what change among 100 other > statements in production. > > I mean is there a way that a certain alter table will do a table rewrite > on disk and other alter table will not. > access exclusive lock on tables does not help answer that question. > > if i am not clear, maybe ignore my question. i have some issues > explaining things clearly, so i try to use demos. > > > > > > > > Thanks, > Vijay > > Open to work > Resume - Vijaykumar Jain <https://github.com/cabecada; -- Adrian Klaver [email protected] ^ permalink raw reply [nested|flat] 4+ messages in thread
* Re: how to know if the sql will run a seq scan @ 2024-10-17 05:39 Vijaykumar Jain <[email protected]> parent: Adrian Klaver <[email protected]> 0 siblings, 0 replies; 4+ messages in thread From: Vijaykumar Jain @ 2024-10-17 05:39 UTC (permalink / raw) To: Adrian Klaver <[email protected]>; +Cc: pgsql-general > > > > I don't have time now to create an example, but I can point you at: > > > https://www.postgresql.org/docs/current/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE > > 5.12.2.2. Partition Maintenance > > "As an alternative to creating a new partition, it is sometimes more > convenient to create a new table separate from the partition structure > and attach it as a partition later. ... " > > Read the section starting above. > Thanks Adrian, I tried similar things as in the doc, but I found my mistake. I casually did not put a constraint of not null on the partition column. as a result i think to invalidate any of that data, it does a scan despite the boundary constraint. which when i used not null for the partition column, the scans stopped. although my question was asking something else, but maybe i will not be able to put it clearly. at least this part got addressed well. thanks for your help and time. /* postgres=# drop table t; DROP TABLE postgres=# drop table t1; ERROR: table "t1" does not exist postgres=# create table t(col1 int) partition by list(col1); CREATE TABLE postgres=# create table t1 (LIKE t INCLUDING DEFAULTS INCLUDING CONSTRAINTS); CREATE TABLE postgres=# insert into t1 select 0 from generate_series(1, 100000) x; INSERT 0 100000 postgres=# alter table t1 add constraint col10 check (col1 = 0); ALTER TABLE postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan, current_timestamp), seq_tup_read from pg_stat_user_tables where relname = 't1'; relname | seq_scan | last_seq_scan | age | seq_tup_read ---------+----------+-------------------------------+------------------+-------------- t1 | 1 | 2024-10-17 05:33:40.080319+00 | -00:00:02.488752 | 100000 (1 row) postgres=# alter table t attach partition t1 for values in (0); -- note the scan count increased as col1 was not set to not null ALTER TABLE postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan, current_timestamp), seq_tup_read from pg_stat_user_tables where relname = 't1'; relname | seq_scan | last_seq_scan | age | seq_tup_read ---------+----------+-------------------------------+------------------+-------------- t1 | 2 | 2024-10-17 05:33:48.248001+00 | -00:00:01.510694 | 200000 (1 row) postgres=# drop table t; DROP TABLE postgres=# drop table t1; ERROR: table "t1" does not exist postgres=# create table t(col1 int not null) partition by list(col1); CREATE TABLE postgres=# create table t1 (LIKE t INCLUDING DEFAULTS INCLUDING CONSTRAINTS); CREATE TABLE postgres=# insert into t1 select 0 from generate_series(1, 100000) x; INSERT 0 100000 postgres=# alter table t1 add constraint col10 check (col1 = 0); ALTER TABLE postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan, current_timestamp), seq_tup_read from pg_stat_user_tables where relname = 't1'; relname | seq_scan | last_seq_scan | age | seq_tup_read ---------+----------+-------------------------------+-----------------+-------------- t1 | 1 | 2024-10-17 05:34:41.363401+00 | -00:00:03.19836 | 100000 (1 row) postgres=# alter table t attach partition t1 for values in (0); -- note no scan count bump as not null on column to avoid scanning to filter those values - ALTER TABLE postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan, current_timestamp), seq_tup_read from pg_stat_user_tables where relname = 't1'; relname | seq_scan | last_seq_scan | age | seq_tup_read ---------+----------+-------------------------------+------------------+-------------- t1 | 1 | 2024-10-17 05:34:41.363401+00 | -00:00:08.241812 | 100000 (1 row) */ ^ permalink raw reply [nested|flat] 4+ messages in thread
end of thread, other threads:[~2024-10-17 05:39 UTC | newest] Thread overview: 4+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2024-10-15 21:29 Re: how to know if the sql will run a seq scan Adrian Klaver <[email protected]> 2024-10-16 07:02 ` Vijaykumar Jain <[email protected]> 2024-10-16 16:40 ` Adrian Klaver <[email protected]> 2024-10-17 05:39 ` Vijaykumar Jain <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox