Re: ANALYZE on partitioned tables vs on individual partitions

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Michael Harris <[email protected]>
To: David Rowley <[email protected]>
Cc: [email protected]
Cc: Christophe Pettus <[email protected]>
Subject: Re: ANALYZE on partitioned tables vs on individual partitions
Date: Wed, 7 Aug 2024 17:20:19 +1000
Message-ID: <CADofcAUMS2waLhJDdiz3Zv2RRPT8N_+cq0ZOtf9LjV51F0ht2Q@mail.gmail.com> (raw)
In-Reply-To: <CAApHDvrjTL3qo70C1snX7yLQwPpWoANL6sigcD6OTWqy7Y3+Nw@mail.gmail.com>
References: <CADofcAXVbD0yGp_EaC9chmzsOoSai3jcfBCnyva3j0RRdRvMVA@mail.gmail.com>
	<[email protected]>
	<CAApHDvrjTL3qo70C1snX7yLQwPpWoANL6sigcD6OTWqy7Y3+Nw@mail.gmail.com>

Many thanks David for the comprehensive response.

> I think the complaint was about no autovacuum on the partitioned
> table, not the partitions.

Yes, exactly.

One other piece of information: these tables contain a lot of columns, of which
only 4 are normally used for WHERE clauses or joins. The table I was
experimenting
with has 150 columns, 156026832 rows and occupies 166GB.

I found that running an ANALYZE specifying only those 4 columns only took
5 minutes, compared to the 30 minutes for the whole table.

That was a bit of a surprise as I imagined actually reading the table would take
most of the time and would be the same regardless of the number of columns
being analyzed, but I guess that is wrong.

Regards, Mike


On Wed, 7 Aug 2024 at 15:23, David Rowley <[email protected]> wrote:
>
> On Wed, 7 Aug 2024 at 16:44, Christophe Pettus <[email protected]> wrote:
> > Child partitions should be autovacuumed and autoanalyzed just like any other table; they are not prohibited from autovacuum in any way by default.  It's probably a good idea to investigate why they are not being picked up by autovacuum.  If they are created by a bulk load process, it's not a bad idea to do a VACUUM ANALYZE on them once the bulk load is complete.
>
> I think the complaint was about no autovacuum on the partitioned
> table, not the partitions.  This is expected as we don't track the
> counters (in particular n_mod_since_analyze) shown in
> pg_stat_all_tables at the partitioned table level, so the trigger
> points that normally cause autovacuum to analyze or vacuum a table
> just won't be triggered for a partitioned table.  For VACUUM, that's
> fine as, as you mentioned, no rows are stored. But for analyze, that
> does present a problem.
>
> To name the aspects of planning that rely on statistics of the
> partitioned table, basically anything above the Append or MergeAppend
> which joins the partitioned results together. So that doesn't include
> the scans of each partition and any quals that are pushed down to the
> scan level as those are able to use the partition level statistics.
> However, it does include things like joins, group by, distinct as
> those require n_distinct estimates for the partitioned table. It's not
> all bad though as the row estimates for each individual partition will
> be totalled up through the Append / MergeAppend simply by adding up
> the row estimates for each Append / MergeAppend child plan. So, it's
> really only an estimation problem for any node that comes after a join
> node or a group by node as the output rows for those nodes will depend
> on a good n_distinct estimate for the partitioned table.
>
> Partition-wise joins and aggregates do change things a bit as those
> features do permit moving those operations below the Append / Merge
> Append, in which case the statistics for the individual partition can
> be used.
>
> You could consider manually setting the n_distinct_inherited estimates
> for the columns that you join on or group by in the partitioned table.
> You might find that you're able to choose a suitable value for that if
> you review the documentation for that setting. In particular, please
> review what is mentioned about using negative numbers for that
> setting. You may be able to choose a value that scales correctly with
> the row estimate that doesn't get outdated as you add more rows to the
> partitions. You'll need to determine that based on the data you're
> storing.
>
> David

view thread (3+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected]
  Subject: Re: ANALYZE on partitioned tables vs on individual partitions
  In-Reply-To: <CADofcAUMS2waLhJDdiz3Zv2RRPT8N_+cq0ZOtf9LjV51F0ht2Q@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox