public inbox for [email protected]
help / color / mirror / Atom feedRe: analyze-in-stages post upgrade questions
3+ messages / 3 participants
[nested] [flat]
* Re: analyze-in-stages post upgrade questions
@ 2025-06-27 15:31 Adrian Klaver <[email protected]>
2025-06-27 21:39 ` Re: analyze-in-stages post upgrade questions Laurenz Albe <[email protected]>
0 siblings, 1 reply; 3+ messages in thread
From: Adrian Klaver @ 2025-06-27 15:31 UTC (permalink / raw)
To: Zechman, Derek S <[email protected]>; [email protected] <[email protected]>
On 6/27/25 06:35, Zechman, Derek S wrote:
> We recently performed an upgrade from pg14 (14.18) to pg16 (16.9) and
> performed the analyze-in-stages post upgrade. It has been noticed that
> some plans changed to use hash joins instead of nested loops. Further
> investigation found it was because the parent table of partitioned
> tables did not have stats. After running an ANALYZE on the parent
> tables we got similar plan an execution times as before.
>
> I have two questions
>
> 1 - Why does analyze-in-stages not analyze the parent tables?
>
> 2 – What happens if we do not run analyze-in-stages post upgrade and
> just run an analyze?
It is spelled out in the docs:
https://www.postgresql.org/docs/current/pgupgrade.html
Emphasis added
"Using vacuumdb --all --analyze-only can efficiently generate such
statistics, and the use of --jobs can speed it up. Option
--analyze-in-stages can be used to generate **minimal statistics**
quickly. If vacuum_cost_delay is set to a non-zero value, this can be
overridden to speed up statistics generation using PGOPTIONS, e.g.,
PGOPTIONS='-c vacuum_cost_delay=0' vacuumdb ...."
and from here:
https://www.postgresql.org/docs/current/app-vacuumdb.html
"--analyze-in-stages
Only calculate statistics for use by the optimizer (no vacuum),
like --analyze-only. Run three stages of analyze; the first stage uses
the lowest possible statistics target (see default_statistics_target) to
produce usable statistics faster, and subsequent stages build the full
statistics.
This option is only useful to analyze a database that currently has
no statistics or has wholly incorrect ones, such as if it is newly
populated from a restored dump or by pg_upgrade. Be aware that running
with this option in a database with existing statistics may cause the
query optimizer choices to become transiently worse due to the low
statistics targets of the early stages.
"
>
> Thanks,
>
> Sean
>
--
Adrian Klaver
[email protected]
^ permalink raw reply [nested|flat] 3+ messages in thread
* Re: analyze-in-stages post upgrade questions
2025-06-27 15:31 Re: analyze-in-stages post upgrade questions Adrian Klaver <[email protected]>
@ 2025-06-27 21:39 ` Laurenz Albe <[email protected]>
2025-06-28 01:23 ` RE: analyze-in-stages post upgrade questions Zechman, Derek S <[email protected]>
0 siblings, 1 reply; 3+ messages in thread
From: Laurenz Albe @ 2025-06-27 21:39 UTC (permalink / raw)
To: Adrian Klaver <[email protected]>; Zechman, Derek S <[email protected]>; [email protected] <[email protected]>
On Fri, 2025-06-27 at 08:31 -0700, Adrian Klaver wrote:
> On 6/27/25 06:35, Zechman, Derek S wrote:
> > We recently performed an upgrade from pg14 (14.18) to pg16 (16.9) and
> > performed the analyze-in-stages post upgrade. It has been noticed that
> > some plans changed to use hash joins instead of nested loops. Further
> > investigation found it was because the parent table of partitioned
> > tables did not have stats. After running an ANALYZE on the parent
> > tables we got similar plan an execution times as before.
> >
> > I have two questions
> >
> > 1 - Why does analyze-in-stages not analyze the parent tables?
> >
> > 2 – What happens if we do not run analyze-in-stages post upgrade and
> > just run an analyze?
>
> It is spelled out in the docs:
>
> https://www.postgresql.org/docs/current/pgupgrade.html
>
> Emphasis added
>
> "Using vacuumdb --all --analyze-only can efficiently generate such
> statistics, and the use of --jobs can speed it up. Option
> --analyze-in-stages can be used to generate **minimal statistics**
> quickly. If vacuum_cost_delay is set to a non-zero value, this can be
> overridden to speed up statistics generation using PGOPTIONS, e.g.,
> PGOPTIONS='-c vacuum_cost_delay=0' vacuumdb ...."
>
> and from here:
>
> https://www.postgresql.org/docs/current/app-vacuumdb.html
>
> "--analyze-in-stages
>
> Only calculate statistics for use by the optimizer (no vacuum),
> like --analyze-only. Run three stages of analyze; the first stage uses
> the lowest possible statistics target (see default_statistics_target) to
> produce usable statistics faster, and subsequent stages build the full
> statistics.
>
> This option is only useful to analyze a database that currently has
> no statistics or has wholly incorrect ones, such as if it is newly
> populated from a restored dump or by pg_upgrade. Be aware that running
> with this option in a database with existing statistics may cause the
> query optimizer choices to become transiently worse due to the low
> statistics targets of the early stages.
Well, that wouldn't explain why it doesn't work on partitioned tables.
I am under the impression that it should.
Derek, can cou share the pg_stats entries for the partitioned table?
Yours,
Laurenz Albe
^ permalink raw reply [nested|flat] 3+ messages in thread
* RE: analyze-in-stages post upgrade questions
2025-06-27 15:31 Re: analyze-in-stages post upgrade questions Adrian Klaver <[email protected]>
2025-06-27 21:39 ` Re: analyze-in-stages post upgrade questions Laurenz Albe <[email protected]>
@ 2025-06-28 01:23 ` Zechman, Derek S <[email protected]>
0 siblings, 0 replies; 3+ messages in thread
From: Zechman, Derek S @ 2025-06-28 01:23 UTC (permalink / raw)
To: Laurenz Albe <[email protected]>; Adrian Klaver <[email protected]>; [email protected] <[email protected]>
> > We recently performed an upgrade from pg14 (14.18) to pg16 (16.9) and
> > performed the analyze-in-stages post upgrade. It has been noticed that
> > some plans changed to use hash joins instead of nested loops. Further
> > investigation found it was because the parent table of partitioned
> > tables did not have stats. After running an ANALYZE on the parent
> > tables we got similar plan an execution times as before.
> >
> > I have two questions
> >
> > 1 - Why does analyze-in-stages not analyze the parent tables?
> >
> > 2 – What happens if we do not run analyze-in-stages post upgrade and
> > just run an analyze?
>
> It is spelled out in the docs:
>
> https://urldefense.com/v3/__https://www.postgresql.org/docs/current/pgupgrade.html__;!!Lf_9VycLqA!jU...;
>
> Emphasis added
>
> "Using vacuumdb --all --analyze-only can efficiently generate such
> statistics, and the use of --jobs can speed it up. Option
> --analyze-in-stages can be used to generate **minimal statistics**
> quickly. If vacuum_cost_delay is set to a non-zero value, this can be
> overridden to speed up statistics generation using PGOPTIONS, e.g.,
> PGOPTIONS='-c vacuum_cost_delay=0' vacuumdb ...."
>
> and from here:
>
> https://urldefense.com/v3/__https://www.postgresql.org/docs/current/app-vacuumdb.html__;!!Lf_9VycLqA...;
>
> "--analyze-in-stages
>
> Only calculate statistics for use by the optimizer (no vacuum),
> like --analyze-only. Run three stages of analyze; the first stage uses
> the lowest possible statistics target (see default_statistics_target) to
> produce usable statistics faster, and subsequent stages build the full
> statistics.
>
> This option is only useful to analyze a database that currently has
> no statistics or has wholly incorrect ones, such as if it is newly
> populated from a restored dump or by pg_upgrade. Be aware that running
> with this option in a database with existing statistics may cause the
> query optimizer choices to become transiently worse due to the low
> statistics targets of the early stages.
Well, that wouldn't explain why it doesn't work on partitioned tables.
I am under the impression that it should.
Derek, can cou share the pg_stats entries for the partitioned table?
Yours,
Laurenz Albe
There are no entries in pg_stats for the parent table until after I manually run an analyze on it – Example below
=> select relname, reltuples, relkind from pg_class where relname ~ '^chapter_[0-9]+$' or relname='chapter' order by 1;
relname | reltuples | relkind
-------------+-----------+---------
chapter | -1 | p
chapter_1 | 4 | r
chapter_10 | 4 | r
chapter_100 | 30 | r
chapter_101 | 15 | r
chapter_102 | 15 | r
…
=> select count(*) from pg_stats where tablename='chapter';
count
-------
0
(1 row)
=> analyze chapter;
ANALYZE
=> select relname, reltuples, relkind from pg_class where relkind ='p' and relname='chapter';
relname | reltuples | relkind
---------+-----------+---------
chapter | 7589 | p
(1 row)
=> select count(*) from pg_stats where tablename='chapter';
count
-------
49
(1 row)
toy_epc_stg_1_db=>
^ permalink raw reply [nested|flat] 3+ messages in thread
end of thread, other threads:[~2025-06-28 01:23 UTC | newest]
Thread overview: 3+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-06-27 15:31 Re: analyze-in-stages post upgrade questions Adrian Klaver <[email protected]>
2025-06-27 21:39 ` Laurenz Albe <[email protected]>
2025-06-28 01:23 ` Zechman, Derek S <[email protected]>
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox