public inbox for [email protected]  
help / color / mirror / Atom feed
how to know if the sql will run a seq scan
3+ messages / 2 participants
[nested] [flat]

* how to know if the sql will run a seq scan
@ 2024-10-15 19:50  Vijaykumar Jain <[email protected]>
  0 siblings, 1 reply; 3+ messages in thread

From: Vijaykumar Jain @ 2024-10-15 19:50 UTC (permalink / raw)
  To: pgsql-general

Hi,

tl;dr
I am trying to learn what sql can result in a full seq scan.

Basically there is a lot of info on the internet of what ddl change may
take an access exclusive lock while running a seq scan and hold for long.
 And for some cases we can make use of
"not valid" constraint and then run a validate constraint as work arounds
to avoid long exclusive locks etc.
but how do we check the same. i mean for dmls there is a explain/
auto_explain.

but for DDLs, how do we check the same.
i tried to isolate my setup and use pg_stat_user_tables and monitor the
same, which helped, but it is not useful as it does not link me to what
process/command invoked the seq scan.

am i clear in my question ?

if yes,
how do i log an alter table that may or may not do a seq scan, that may or
may not rewrite the table file on disk etc.

its a useless question, i am just playing with it for building knowledge,
no requirement as such.


/*
postgres=# \d t
                 Table "public.t"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 col1   | integer |           |          |

postgres=# insert into t select 0 from generate_series(1, 1000000) x;
INSERT 0 1000000

-- this does a full seq scan as new constraint
postgres=# alter table t add constraint col1c check ( col1 < 2 );
ALTER TABLE
-- this will not since the table has valid constraint to make it think only
worry about changed data ?
postgres=# insert into t values (3);
ERROR:  new row for relation "t" violates check constraint "col1c"
DETAIL:  Failing row contains (3).

-- the below setup making use of not valid and validate constraint still
runs a seq scan but does not block writes
postgres=# alter table t add constraint col1c_not_neg check ( col1 > -1 )
not valid;                                        ALTER TABLE
postgres=#  select relname,seq_scan,last_seq_scan, age(last_seq_scan,
current_timestamp), seq_tup_read from pg_stat_user_tables where relname =
't';
-[ RECORD 1 ]-+------------------------------
relname       | t
seq_scan      | 7
last_seq_scan | 2024-10-15 19:34:46.837628+00
age           | -00:06:46.030264
seq_tup_read  | 4000000

postgres=# alter table t validate constraint col1c_not_neg;
ALTER TABLE
postgres=#  select relname,seq_scan,last_seq_scan, age(last_seq_scan,
current_timestamp), seq_tup_read from pg_stat_user_tables where relname =
't';
-[ RECORD 1 ]-+------------------------------
relname       | t
seq_scan      | 8
last_seq_scan | 2024-10-15 19:41:50.931282+00
age           | -00:00:01.85388
seq_tup_read  | 5000000

postgres=# -- now i dont want this seq scan, so i update the pg_constraint
(ok we dont do this but i want to trace seq scans)

postgres=# alter table t drop constraint col1c_not_neg;
ALTER TABLE
postgres=#  select relname,seq_scan,last_seq_scan, age(last_seq_scan,
current_timestamp), seq_tup_read from pg_stat_user_tables where relname =
't';
-[ RECORD 1 ]-+------------------------------
relname       | t
seq_scan      | 8
last_seq_scan | 2024-10-15 19:41:50.931282+00
age           | -00:00:21.980611
seq_tup_read  | 5000000

postgres=# alter table t add constraint col1c_not_neg check ( col1 > -1 )
not valid;                                        ALTER TABLE
postgres=# select oid from pg_constraint where conrelid =
't'::regclass::oid and convalidated = 'f';                        -[ RECORD
1 ]
oid | 16410

-- i save a seq scan in validate constraint because i know my data. (like
in attaching partitions etc) by updating the catalog directly

postgres=# update pg_constraint set convalidated = 't' where conrelid =
't'::regclass::oid and convalidated = 'f' and oid = 16410;
UPDATE 1

postgres=#  select relname,seq_scan,last_seq_scan, age(last_seq_scan,
current_timestamp), seq_tup_read from pg_stat_user_tables where relname =
't';
-[ RECORD 1 ]-+------------------------------
relname       | t
seq_scan      | 8
last_seq_scan | 2024-10-15 19:41:50.931282+00
age           | -00:05:14.066944
seq_tup_read  | 5000000

but how do i log this seq scan here for this sql.
*/

if this does not make sense,  pls ignore. not critical.
-- 
Thanks,
Vijay

Open to work
Resume - Vijaykumar Jain <https://github.com/cabecada;


^ permalink  raw  reply  [nested|flat] 3+ messages in thread

* Re: how to know if the sql will run a seq scan
@ 2024-10-15 20:24  Adrian Klaver <[email protected]>
  parent: Vijaykumar Jain <[email protected]>
  0 siblings, 1 reply; 3+ messages in thread

From: Adrian Klaver @ 2024-10-15 20:24 UTC (permalink / raw)
  To: Vijaykumar Jain <[email protected]>; pgsql-general

On 10/15/24 12:50, Vijaykumar Jain wrote:
> 
> Hi,
> 
> tl;dr
> I am trying to learn what sql can result in a full seq scan.
> 
> Basically there is a lot of info on the internet of what ddl change may 
> take an access exclusive lock while running a seq scan and hold for long.
>   And for some cases we can make use of
> "not valid" constraint and then run a validate constraint as work 
> arounds to avoid long exclusive locks etc.
> but how do we check the same. i mean for dmls there is a explain/ 
> auto_explain.
> 
> but for DDLs, how do we check the same.
> i tried to isolate my setup and use pg_stat_user_tables and monitor the 
> same, which helped, but it is not useful as it does not link me to what 
> process/command invoked the seq scan.
> 
> am i clear in my question ?
> 
> if yes,
> how do i log an alter table that may or may not do a seq scan, that may 
> or may not rewrite the table file on disk etc.
> its a useless question, i am just playing with it for building 
> knowledge, no requirement as such.

Look at the docs:

https://www.postgresql.org/docs/current/sql-altertable.html

"Scanning a large table to verify a new foreign key or check constraint 
can take a long time, and other updates to the table are locked out 
until the ALTER TABLE ADD CONSTRAINT command is committed. The main 
purpose of the NOT VALID constraint option is to reduce the impact of 
adding a constraint on concurrent updates. With NOT VALID, the ADD 
CONSTRAINT command does not scan the table and can be committed 
immediately. After that, a VALIDATE CONSTRAINT command can be issued to 
verify that existing rows satisfy the constraint. The validation step 
does not need to lock out concurrent updates, since it knows that other 
transactions will be enforcing the constraint for rows that they insert 
or update; only pre-existing rows need to be checked. Hence, validation 
acquires only a SHARE UPDATE EXCLUSIVE lock on the table being altered. 
(If the constraint is a foreign key then a ROW SHARE lock is also 
required on the table referenced by the constraint.) In addition to 
improving concurrency, it can be useful to use NOT VALID and VALIDATE 
CONSTRAINT in cases where the table is known to contain pre-existing 
violations. Once the constraint is in place, no new violations can be 
inserted, and the existing problems can be corrected at leisure until 
VALIDATE CONSTRAINT finally succeeds."


> -- 
> Thanks,
> Vijay
> 
> Open to work
> Resume - Vijaykumar Jain <https://github.com/cabecada;

-- 
Adrian Klaver
[email protected]







^ permalink  raw  reply  [nested|flat] 3+ messages in thread

* Re: how to know if the sql will run a seq scan
@ 2024-10-15 20:50  Vijaykumar Jain <[email protected]>
  parent: Adrian Klaver <[email protected]>
  0 siblings, 0 replies; 3+ messages in thread

From: Vijaykumar Jain @ 2024-10-15 20:50 UTC (permalink / raw)
  To: Adrian Klaver <[email protected]>; +Cc: pgsql-general

Sorry top posting, coz Gmail app on phone.

Yeah, my point was for example we have a large table and we are attaching a
table as a partition. Now it will scan the whole table to validate the
constraint and that will create all sorts of problems.
I understand the benefit of not valid constraint and then validating
constraint to reduce blocking.
But yeah monitoring locks for the statement should give me good enough hint
of what will happen.

Thanks for your reply. It helps.



On Wed, Oct 16, 2024, 1:54 AM Adrian Klaver <[email protected]>
wrote:

> On 10/15/24 12:50, Vijaykumar Jain wrote:
> >
> > Hi,
> >
> > tl;dr
> > I am trying to learn what sql can result in a full seq scan.
> >
> > Basically there is a lot of info on the internet of what ddl change may
> > take an access exclusive lock while running a seq scan and hold for long.
> >   And for some cases we can make use of
> > "not valid" constraint and then run a validate constraint as work
> > arounds to avoid long exclusive locks etc.
> > but how do we check the same. i mean for dmls there is a explain/
> > auto_explain.
> >
> > but for DDLs, how do we check the same.
> > i tried to isolate my setup and use pg_stat_user_tables and monitor the
> > same, which helped, but it is not useful as it does not link me to what
> > process/command invoked the seq scan.
> >
> > am i clear in my question ?
> >
> > if yes,
> > how do i log an alter table that may or may not do a seq scan, that may
> > or may not rewrite the table file on disk etc.
> > its a useless question, i am just playing with it for building
> > knowledge, no requirement as such.
>
> Look at the docs:
>
> https://www.postgresql.org/docs/current/sql-altertable.html
>
> "Scanning a large table to verify a new foreign key or check constraint
> can take a long time, and other updates to the table are locked out
> until the ALTER TABLE ADD CONSTRAINT command is committed. The main
> purpose of the NOT VALID constraint option is to reduce the impact of
> adding a constraint on concurrent updates. With NOT VALID, the ADD
> CONSTRAINT command does not scan the table and can be committed
> immediately. After that, a VALIDATE CONSTRAINT command can be issued to
> verify that existing rows satisfy the constraint. The validation step
> does not need to lock out concurrent updates, since it knows that other
> transactions will be enforcing the constraint for rows that they insert
> or update; only pre-existing rows need to be checked. Hence, validation
> acquires only a SHARE UPDATE EXCLUSIVE lock on the table being altered.
> (If the constraint is a foreign key then a ROW SHARE lock is also
> required on the table referenced by the constraint.) In addition to
> improving concurrency, it can be useful to use NOT VALID and VALIDATE
> CONSTRAINT in cases where the table is known to contain pre-existing
> violations. Once the constraint is in place, no new violations can be
> inserted, and the existing problems can be corrected at leisure until
> VALIDATE CONSTRAINT finally succeeds."
>
>
> > --
> > Thanks,
> > Vijay
> >
> > Open to work
> > Resume - Vijaykumar Jain <https://github.com/cabecada;
>
> --
> Adrian Klaver
> [email protected]
>
>


^ permalink  raw  reply  [nested|flat] 3+ messages in thread


end of thread, other threads:[~2024-10-15 20:50 UTC | newest]

Thread overview: 3+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-10-15 19:50 how to know if the sql will run a seq scan Vijaykumar Jain <[email protected]>
2024-10-15 20:24 ` Adrian Klaver <[email protected]>
2024-10-15 20:50   ` Vijaykumar Jain <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox