Re: analyze-in-stages post upgrade questions

public inbox for [email protected]  
help / color / mirror / Atom feed

Re: analyze-in-stages post upgrade questions
3+ messages / 3 participants
[nested] [flat]

* Re: analyze-in-stages post upgrade questions
@ 2025-06-27 15:31 Adrian Klaver <[email protected]>
  2025-06-27 21:39 ` Re: analyze-in-stages post upgrade questions Laurenz Albe <[email protected]>
  0 siblings, 1 reply; 3+ messages in thread

From: Adrian Klaver @ 2025-06-27 15:31 UTC (permalink / raw)
  To: Zechman, Derek S <[email protected]>; [email protected] <[email protected]>

On 6/27/25 06:35, Zechman, Derek S wrote:
> We recently performed an upgrade from pg14 (14.18) to pg16 (16.9) and 
> performed the analyze-in-stages post upgrade.  It has been noticed that 
> some plans changed to use hash joins instead of nested loops.  Further 
> investigation found it was because the parent table of partitioned 
> tables did not have stats.  After running an ANALYZE on the parent 
> tables we got similar plan an execution times as before.
> 
> I have two questions
> 
> 1 - Why does analyze-in-stages not analyze the parent tables?
> 
> 2 – What happens if we do not run analyze-in-stages post upgrade and 
> just run an analyze?

It is spelled out in the docs:

https://www.postgresql.org/docs/current/pgupgrade.html

Emphasis added

"Using vacuumdb --all --analyze-only can efficiently generate such 
statistics, and the use of --jobs can speed it up. Option 
--analyze-in-stages can be used to generate **minimal statistics** 
quickly. If vacuum_cost_delay is set to a non-zero value, this can be 
overridden to speed up statistics generation using PGOPTIONS, e.g., 
PGOPTIONS='-c vacuum_cost_delay=0' vacuumdb ...."

and from here:

https://www.postgresql.org/docs/current/app-vacuumdb.html

"--analyze-in-stages

     Only calculate statistics for use by the optimizer (no vacuum), 
like --analyze-only. Run three stages of analyze; the first stage uses 
the lowest possible statistics target (see default_statistics_target) to 
produce usable statistics faster, and subsequent stages build the full 
statistics.

     This option is only useful to analyze a database that currently has 
no statistics or has wholly incorrect ones, such as if it is newly 
populated from a restored dump or by pg_upgrade. Be aware that running 
with this option in a database with existing statistics may cause the 
query optimizer choices to become transiently worse due to the low 
statistics targets of the early stages.
"

> 
> Thanks,
> 
> Sean
> 

-- 
Adrian Klaver
[email protected]







^ permalink  raw  reply  [nested|flat] 3+ messages in thread

* Re: analyze-in-stages post upgrade questions
  2025-06-27 15:31 Re: analyze-in-stages post upgrade questions Adrian Klaver <[email protected]>
@ 2025-06-27 21:39 ` Laurenz Albe <[email protected]>
  2025-06-28 01:23   ` RE: analyze-in-stages post upgrade questions Zechman, Derek S <[email protected]>
  0 siblings, 1 reply; 3+ messages in thread

From: Laurenz Albe @ 2025-06-27 21:39 UTC (permalink / raw)
  To: Adrian Klaver <[email protected]>; Zechman, Derek S <[email protected]>; [email protected] <[email protected]>

On Fri, 2025-06-27 at 08:31 -0700, Adrian Klaver wrote:
> On 6/27/25 06:35, Zechman, Derek S wrote:
> > We recently performed an upgrade from pg14 (14.18) to pg16 (16.9) and 
> > performed the analyze-in-stages post upgrade.  It has been noticed that 
> > some plans changed to use hash joins instead of nested loops.  Further 
> > investigation found it was because the parent table of partitioned 
> > tables did not have stats.  After running an ANALYZE on the parent 
> > tables we got similar plan an execution times as before.
> > 
> > I have two questions
> > 
> > 1 - Why does analyze-in-stages not analyze the parent tables?
> > 
> > 2 – What happens if we do not run analyze-in-stages post upgrade and 
> > just run an analyze?
> 
> It is spelled out in the docs:
> 
> https://www.postgresql.org/docs/current/pgupgrade.html
> 
> Emphasis added
> 
> "Using vacuumdb --all --analyze-only can efficiently generate such 
> statistics, and the use of --jobs can speed it up. Option 
> --analyze-in-stages can be used to generate **minimal statistics** 
> quickly. If vacuum_cost_delay is set to a non-zero value, this can be 
> overridden to speed up statistics generation using PGOPTIONS, e.g., 
> PGOPTIONS='-c vacuum_cost_delay=0' vacuumdb ...."
> 
> and from here:
> 
> https://www.postgresql.org/docs/current/app-vacuumdb.html
> 
> "--analyze-in-stages
> 
>      Only calculate statistics for use by the optimizer (no vacuum), 
> like --analyze-only. Run three stages of analyze; the first stage uses 
> the lowest possible statistics target (see default_statistics_target) to 
> produce usable statistics faster, and subsequent stages build the full 
> statistics.
> 
>      This option is only useful to analyze a database that currently has 
> no statistics or has wholly incorrect ones, such as if it is newly 
> populated from a restored dump or by pg_upgrade. Be aware that running 
> with this option in a database with existing statistics may cause the 
> query optimizer choices to become transiently worse due to the low 
> statistics targets of the early stages.

Well, that wouldn't explain why it doesn't work on partitioned tables.
I am under the impression that it should.

Derek, can cou share the pg_stats entries for the partitioned table?

Yours,
Laurenz Albe






^ permalink  raw  reply  [nested|flat] 3+ messages in thread

* RE: analyze-in-stages post upgrade questions
  2025-06-27 15:31 Re: analyze-in-stages post upgrade questions Adrian Klaver <[email protected]>
  2025-06-27 21:39 ` Re: analyze-in-stages post upgrade questions Laurenz Albe <[email protected]>
@ 2025-06-28 01:23   ` Zechman, Derek S <[email protected]>
  0 siblings, 0 replies; 3+ messages in thread

From: Zechman, Derek S @ 2025-06-28 01:23 UTC (permalink / raw)
  To: Laurenz Albe <[email protected]>; Adrian Klaver <[email protected]>; [email protected] <[email protected]>

> > We recently performed an upgrade from pg14 (14.18) to pg16 (16.9) and

> > performed the analyze-in-stages post upgrade.  It has been noticed that

> > some plans changed to use hash joins instead of nested loops.  Further

> > investigation found it was because the parent table of partitioned

> > tables did not have stats.  After running an ANALYZE on the parent

> > tables we got similar plan an execution times as before.

> >

> > I have two questions

> >

> > 1 - Why does analyze-in-stages not analyze the parent tables?

> >

> > 2 – What happens if we do not run analyze-in-stages post upgrade and

> > just run an analyze?

>

> It is spelled out in the docs:

>

> https://urldefense.com/v3/__https://www.postgresql.org/docs/current/pgupgrade.html__;!!Lf_9VycLqA!jU...;

>

> Emphasis added

>

> "Using vacuumdb --all --analyze-only can efficiently generate such

> statistics, and the use of --jobs can speed it up. Option

> --analyze-in-stages can be used to generate **minimal statistics**

> quickly. If vacuum_cost_delay is set to a non-zero value, this can be

> overridden to speed up statistics generation using PGOPTIONS, e.g.,

> PGOPTIONS='-c vacuum_cost_delay=0' vacuumdb ...."

>

> and from here:

>

> https://urldefense.com/v3/__https://www.postgresql.org/docs/current/app-vacuumdb.html__;!!Lf_9VycLqA...;

>

> "--analyze-in-stages

>

>      Only calculate statistics for use by the optimizer (no vacuum),

> like --analyze-only. Run three stages of analyze; the first stage uses

> the lowest possible statistics target (see default_statistics_target) to

> produce usable statistics faster, and subsequent stages build the full

> statistics.

>

>      This option is only useful to analyze a database that currently has

> no statistics or has wholly incorrect ones, such as if it is newly

> populated from a restored dump or by pg_upgrade. Be aware that running

> with this option in a database with existing statistics may cause the

> query optimizer choices to become transiently worse due to the low

> statistics targets of the early stages.



Well, that wouldn't explain why it doesn't work on partitioned tables.

I am under the impression that it should.



Derek, can cou share the pg_stats entries for the partitioned table?



Yours,

Laurenz Albe





There are no entries in pg_stats for the parent table until after I manually run an analyze on it – Example below



=> select relname, reltuples, relkind from pg_class where relname ~ '^chapter_[0-9]+$' or relname='chapter' order by 1;

   relname   | reltuples | relkind

-------------+-----------+---------

 chapter     |        -1 | p

 chapter_1   |         4 | r

 chapter_10  |         4 | r

 chapter_100 |        30 | r

 chapter_101 |        15 | r

 chapter_102 |        15 | r

…

=> select count(*) from pg_stats where tablename='chapter';

 count

-------

     0

(1 row)



=> analyze chapter;

ANALYZE

=> select relname, reltuples, relkind from pg_class where relkind ='p' and relname='chapter';

 relname | reltuples | relkind

---------+-----------+---------

 chapter |      7589 | p

(1 row)



=> select count(*) from pg_stats where tablename='chapter';

 count

-------

    49

(1 row)



toy_epc_stg_1_db=>


^ permalink  raw  reply  [nested|flat] 3+ messages in thread

end of thread, other threads:[~2025-06-28 01:23 UTC | newest]

Thread overview: 3+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-06-27 15:31 Re: analyze-in-stages post upgrade questions Adrian Klaver <[email protected]>
2025-06-27 21:39 ` Laurenz Albe <[email protected]>
2025-06-28 01:23   ` Zechman, Derek S <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox