Re: how to know if the sql will run a seq scan

public inbox for [email protected]  
help / color / mirror / Atom feed

Re: how to know if the sql will run a seq scan
4+ messages / 2 participants
[nested] [flat]

* Re: how to know if the sql will run a seq scan
@ 2024-10-15 21:29  Adrian Klaver <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Adrian Klaver @ 2024-10-15 21:29 UTC (permalink / raw)
  To: Vijaykumar Jain <[email protected]>; +Cc: pgsql-general

On 10/15/24 13:50, Vijaykumar Jain wrote:
> Sorry top posting, coz Gmail app on phone.
> 
> Yeah, my point was for example we have a large table and we are 
> attaching a table as a partition. Now it will scan the whole table to 
> validate the constraint and that will create all sorts of problems.

Now you have changed the problem description.

To get a proper answer you will need to provide a more detailed 
description of what you are doing with the following information:

1) Postgres version.

2) Definition of 'large'.

3) The command/process being used to create the partition.

4) The actual constraint definition.

5) The table definition.

> I understand the benefit of not valid constraint and then validating 
> constraint to reduce blocking.
> But yeah monitoring locks for the statement should give me good enough 
> hint of what will happen.
> 
> Thanks for your reply. It helps.
> 
> 
> 
> On Wed, Oct 16, 2024, 1:54 AM Adrian Klaver <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>     On 10/15/24 12:50, Vijaykumar Jain wrote:
>      >
>      > Hi,
>      >
>      > tl;dr
>      > I am trying to learn what sql can result in a full seq scan.
>      >
>      > Basically there is a lot of info on the internet of what ddl
>     change may
>      > take an access exclusive lock while running a seq scan and hold
>     for long.
>      >   And for some cases we can make use of
>      > "not valid" constraint and then run a validate constraint as work
>      > arounds to avoid long exclusive locks etc.
>      > but how do we check the same. i mean for dmls there is a explain/
>      > auto_explain.
>      >
>      > but for DDLs, how do we check the same.
>      > i tried to isolate my setup and use pg_stat_user_tables and
>     monitor the
>      > same, which helped, but it is not useful as it does not link me
>     to what
>      > process/command invoked the seq scan.
>      >
>      > am i clear in my question ?
>      >
>      > if yes,
>      > how do i log an alter table that may or may not do a seq scan,
>     that may
>      > or may not rewrite the table file on disk etc.
>      > its a useless question, i am just playing with it for building
>      > knowledge, no requirement as such.
> 
>     Look at the docs:
> 
>     https://www.postgresql.org/docs/current/sql-altertable.html
>     <https://www.postgresql.org/docs/current/sql-altertable.html;
> 
>     "Scanning a large table to verify a new foreign key or check constraint
>     can take a long time, and other updates to the table are locked out
>     until the ALTER TABLE ADD CONSTRAINT command is committed. The main
>     purpose of the NOT VALID constraint option is to reduce the impact of
>     adding a constraint on concurrent updates. With NOT VALID, the ADD
>     CONSTRAINT command does not scan the table and can be committed
>     immediately. After that, a VALIDATE CONSTRAINT command can be issued to
>     verify that existing rows satisfy the constraint. The validation step
>     does not need to lock out concurrent updates, since it knows that other
>     transactions will be enforcing the constraint for rows that they insert
>     or update; only pre-existing rows need to be checked. Hence, validation
>     acquires only a SHARE UPDATE EXCLUSIVE lock on the table being altered.
>     (If the constraint is a foreign key then a ROW SHARE lock is also
>     required on the table referenced by the constraint.) In addition to
>     improving concurrency, it can be useful to use NOT VALID and VALIDATE
>     CONSTRAINT in cases where the table is known to contain pre-existing
>     violations. Once the constraint is in place, no new violations can be
>     inserted, and the existing problems can be corrected at leisure until
>     VALIDATE CONSTRAINT finally succeeds."
> 
> 
>      > --
>      > Thanks,
>      > Vijay
>      >
>      > Open to work
>      > Resume - Vijaykumar Jain <https://github.com/cabecada
>     <https://github.com/cabecada>;
> 
>     -- 
>     Adrian Klaver
>     [email protected] <mailto:[email protected]>
> 

-- 
Adrian Klaver
[email protected]







^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: how to know if the sql will run a seq scan
@ 2024-10-16 07:02  Vijaykumar Jain <[email protected]>
  parent: Adrian Klaver <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Vijaykumar Jain @ 2024-10-16 07:02 UTC (permalink / raw)
  To: Adrian Klaver <[email protected]>; +Cc: pgsql-general

On Wed, 16 Oct 2024 at 02:59, Adrian Klaver <[email protected]>
wrote:

> On 10/15/24 13:50, Vijaykumar Jain wrote:
> > Sorry top posting, coz Gmail app on phone.
> >
> > Yeah, my point was for example we have a large table and we are
> > attaching a table as a partition. Now it will scan the whole table to
> > validate the constraint and that will create all sorts of problems.
>
> Now you have changed the problem description.
>
> To get a proper answer you will need to provide a more detailed
> description of what you are doing with the following information:
>
> 1) Postgres version.
>
> 2) Definition of 'large'.
>
> 3) The command/process being used to create the partition.
>
> 4) The actual constraint definition.
>
> 5) The table definition.
>
>
/*
postgres=# create table t(col1 int) partition by list(col1);
CREATE TABLE
postgres=# create table t1(col1 int)
postgres-# ;
CREATE TABLE
postgres=# insert into t1 select 0 from generate_series(1, 100000) x;
INSERT 0 100000
postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan,
current_timestamp), seq_tup_read from pg_stat_user_tables where relname =
't1';
 relname | seq_scan | last_seq_scan | age | seq_tup_read
---------+----------+---------------+-----+--------------
 t1      |        0 |               |     |            0
(1 row)

postgres=# alter table t1 add constraint col10 check (col1 = 0);
ALTER TABLE
postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan,
current_timestamp), seq_tup_read from pg_stat_user_tables where relname =
't1';
 relname | seq_scan |         last_seq_scan         |       age        |
seq_tup_read
---------+----------+-------------------------------+------------------+--------------
 t1      |        1 | 2024-10-16 06:46:28.641281+00 | -00:00:03.258432 |
   100000
(1 row)

postgres=# -- this results in a seq scan , which is ok, but then when i
attach the partition it does a seq scan again
postgres=# alter table t attach partition t1 for values in (0);
                                                 ALTER TABLE
postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan,
current_timestamp), seq_tup_read from pg_stat_user_tables where relname =
't1';
 relname | seq_scan |         last_seq_scan         |       age        |
seq_tup_read
---------+----------+-------------------------------+------------------+--------------
 t1      |        2 | 2024-10-16 06:46:59.512201+00 | -00:00:02.498771 |
   200000
(1 row)

postgres=# -- why , when there is a constraint that helps with the
partition boundary/value

postgres=# alter table t detach partition t1;
ALTER TABLE

postgres=# alter table t attach partition t1 for values in (0);
ALTER TABLE
postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan,
current_timestamp), seq_tup_read from pg_stat_user_tables where relname =
't1';
 relname | seq_scan |         last_seq_scan         |       age        |
seq_tup_read
---------+----------+-------------------------------+------------------+--------------
 t1      |        3 | 2024-10-16 06:54:28.780145+00 | -00:00:03.358524 |
   300000
(1 row)

-- despite there being a constraint, it does a full table scan to attach
the partition. why ? note the tup read is full table of t1.

*/

above is one of the cases i found.
my core question still was, how do i know which statement will cause a
full table rewrite
full table scan

how do i get to know that. i know implictly i can use the above stat tables
and pg_rel_filepath function etc to figure out the change in oid , update
in seq count etc.
but i want to pin point which statement made what change among 100 other
statements in production.

I mean is there a way that a certain alter table will do a table rewrite on
disk and other alter table will not.
access exclusive lock on tables does not help answer that question.

if i am not clear, maybe ignore my question. i have some issues explaining
things clearly, so i try to use demos.







Thanks,
Vijay

Open to work
Resume - Vijaykumar Jain <https://github.com/cabecada;


^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: how to know if the sql will run a seq scan
@ 2024-10-16 16:40  Adrian Klaver <[email protected]>
  parent: Vijaykumar Jain <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Adrian Klaver @ 2024-10-16 16:40 UTC (permalink / raw)
  To: Vijaykumar Jain <[email protected]>; +Cc: pgsql-general

On 10/16/24 00:02, Vijaykumar Jain wrote:
> 
> 

> postgres=# create table t(col1 int) partition by list(col1);
> CREATE TABLE
> postgres=# create table t1(col1 int)
> postgres-# ;
> CREATE TABLE
> postgres=# insert into t1 select 0 from generate_series(1, 100000) x;
> INSERT 0 100000
> postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan, 
> current_timestamp), seq_tup_read from pg_stat_user_tables where relname 
> = 't1';
>   relname | seq_scan | last_seq_scan | age | seq_tup_read
> ---------+----------+---------------+-----+--------------
>   t1      |        0 |               |     |            0
> (1 row)
> 
> postgres=# alter table t1 add constraint col10 check (col1 = 0);
> ALTER TABLE
> postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan, 
> current_timestamp), seq_tup_read from pg_stat_user_tables where relname 
> = 't1';
>   relname | seq_scan |         last_seq_scan         |       age        
> | seq_tup_read
> ---------+----------+-------------------------------+------------------+--------------
>   t1      |        1 | 2024-10-16 06:46:28.641281+00 | -00:00:03.258432 
> |       100000
> (1 row)
> 
> postgres=# -- this results in a seq scan , which is ok, but then when i 
> attach the partition it does a seq scan again
> postgres=# alter table t attach partition t1 for values in (0);          
>                                                     ALTER TABLE
> postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan, 
> current_timestamp), seq_tup_read from pg_stat_user_tables where relname 
> = 't1';
>   relname | seq_scan |         last_seq_scan         |       age        
> | seq_tup_read
> ---------+----------+-------------------------------+------------------+--------------
>   t1      |        2 | 2024-10-16 06:46:59.512201+00 | -00:00:02.498771 
> |       200000
> (1 row)
> 
> postgres=# -- why , when there is a constraint that helps with the 
> partition boundary/value
> 
> postgres=# alter table t detach partition t1;
> ALTER TABLE
> 
> postgres=# alter table t attach partition t1 for values in (0);
> ALTER TABLE
> postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan, 
> current_timestamp), seq_tup_read from pg_stat_user_tables where relname 
> = 't1';
>   relname | seq_scan |         last_seq_scan         |       age        
> | seq_tup_read
> ---------+----------+-------------------------------+------------------+--------------
>   t1      |        3 | 2024-10-16 06:54:28.780145+00 | -00:00:03.358524 
> |       300000
> (1 row)
> 
> -- despite there being a constraint, it does a full table scan to attach 
> the partition. why ? note the tup read is full table of t1.
> 
> */
> 
> above is one of the cases i found.
> my core question still was, how do i know which statement will cause a
> full table rewrite
> full table scan

I don't have time now to create an example, but I can point you at:

https://www.postgresql.org/docs/current/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE

5.12.2.2. Partition Maintenance

"As an alternative to creating a new partition, it is sometimes more 
convenient to create a new table separate from the partition structure 
and attach it as a partition later. ... "

Read the section starting above.

> 
> how do i get to know that. i know implictly i can use the above stat 
> tables and pg_rel_filepath function etc to figure out the change in oid 
> , update in seq count etc.
> but i want to pin point which statement made what change among 100 other 
> statements in production.
> 
> I mean is there a way that a certain alter table will do a table rewrite 
> on disk and other alter table will not.
> access exclusive lock on tables does not help answer that question.
> 
> if i am not clear, maybe ignore my question. i have some issues 
> explaining things clearly, so i try to use demos.
> 
> 
> 
> 
> 
> 
> 
> Thanks,
> Vijay
> 
> Open to work
> Resume - Vijaykumar Jain <https://github.com/cabecada;

-- 
Adrian Klaver
[email protected]







^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: how to know if the sql will run a seq scan
@ 2024-10-17 05:39  Vijaykumar Jain <[email protected]>
  parent: Adrian Klaver <[email protected]>
  0 siblings, 0 replies; 4+ messages in thread

From: Vijaykumar Jain @ 2024-10-17 05:39 UTC (permalink / raw)
  To: Adrian Klaver <[email protected]>; +Cc: pgsql-general

>
>
>
> I don't have time now to create an example, but I can point you at:
>
>
> https://www.postgresql.org/docs/current/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE
>
> 5.12.2.2. Partition Maintenance
>
> "As an alternative to creating a new partition, it is sometimes more
> convenient to create a new table separate from the partition structure
> and attach it as a partition later. ... "
>
> Read the section starting above.
>

Thanks Adrian,
I tried similar things as in the doc, but I found my mistake.
I casually did not put a constraint of not null on the partition column. as
a result i think to invalidate any of that data, it does a scan despite the
boundary constraint.

which when i used not null for the partition column, the scans stopped.
although my question was asking something else, but maybe i will not be
able to put it clearly.
at least this part got addressed well. thanks for your help and time.



/*
postgres=# drop table t;
DROP TABLE
postgres=# drop table t1;
ERROR:  table "t1" does not exist
postgres=# create table t(col1 int) partition by list(col1);
                                                CREATE TABLE
postgres=# create table t1 (LIKE t INCLUDING DEFAULTS INCLUDING
CONSTRAINTS);                                               CREATE TABLE
postgres=# insert into t1 select 0 from generate_series(1, 100000) x;
INSERT 0 100000
postgres=# alter table t1 add constraint col10 check (col1 = 0);
                                                ALTER TABLE
postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan,
current_timestamp), seq_tup_read from pg_stat_user_tables where relname =
't1';
 relname | seq_scan |         last_seq_scan         |       age        |
seq_tup_read
---------+----------+-------------------------------+------------------+--------------
 t1      |        1 | 2024-10-17 05:33:40.080319+00 | -00:00:02.488752 |
   100000
(1 row)

postgres=# alter table t attach partition t1 for values in (0);          --
note the scan count increased as col1 was not set to not null
                                                   ALTER TABLE
postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan,
current_timestamp), seq_tup_read from pg_stat_user_tables where relname =
't1';
 relname | seq_scan |         last_seq_scan         |       age        |
seq_tup_read
---------+----------+-------------------------------+------------------+--------------
 t1      |        2 | 2024-10-17 05:33:48.248001+00 | -00:00:01.510694 |
   200000
(1 row)

postgres=# drop table t;
DROP TABLE
postgres=# drop table t1;
ERROR:  table "t1" does not exist
postgres=# create table t(col1 int not null) partition by list(col1);
                                                 CREATE TABLE
postgres=# create table t1 (LIKE t INCLUDING DEFAULTS INCLUDING
CONSTRAINTS);                                               CREATE TABLE
postgres=# insert into t1 select 0 from generate_series(1, 100000) x;
                                                 INSERT 0 100000
postgres=# alter table t1 add constraint col10 check (col1 = 0);

ALTER TABLE
postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan,
current_timestamp), seq_tup_read from pg_stat_user_tables where relname =
't1';
 relname | seq_scan |         last_seq_scan         |       age       |
seq_tup_read
---------+----------+-------------------------------+-----------------+--------------
 t1      |        1 | 2024-10-17 05:34:41.363401+00 | -00:00:03.19836 |
   100000
(1 row)

postgres=# alter table t attach partition t1 for values in (0);  -- note no
scan count bump as not null on column to avoid scanning to filter those
values                                            -
       ALTER TABLE
postgres=# select relname,seq_scan,last_seq_scan, age(last_seq_scan,
current_timestamp), seq_tup_read from pg_stat_user_tables where relname =
't1';
 relname | seq_scan |         last_seq_scan         |       age        |
seq_tup_read
---------+----------+-------------------------------+------------------+--------------
 t1      |        1 | 2024-10-17 05:34:41.363401+00 | -00:00:08.241812 |
   100000
(1 row)

*/


^ permalink  raw  reply  [nested|flat] 4+ messages in thread

end of thread, other threads:[~2024-10-17 05:39 UTC | newest]

Thread overview: 4+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-10-15 21:29 Re: how to know if the sql will run a seq scan Adrian Klaver <[email protected]>
2024-10-16 07:02 ` Vijaykumar Jain <[email protected]>
2024-10-16 16:40   ` Adrian Klaver <[email protected]>
2024-10-17 05:39     ` Vijaykumar Jain <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox