public inbox for [email protected]  
help / color / mirror / Atom feed
From: David Rowley <[email protected]>
To: Michael Harris <[email protected]>
Cc: [email protected]
Cc: Christophe Pettus <[email protected]>
Subject: Re: ANALYZE on partitioned tables vs on individual partitions
Date: Wed, 7 Aug 2024 20:09:39 +1200
Message-ID: <CAApHDvosdmXz-=nyUF+ceUgTrLy=FcCct47vhMtTHDsRpBtaJA@mail.gmail.com> (raw)
In-Reply-To: <CADofcAUMS2waLhJDdiz3Zv2RRPT8N_+cq0ZOtf9LjV51F0ht2Q@mail.gmail.com>
References: <CADofcAXVbD0yGp_EaC9chmzsOoSai3jcfBCnyva3j0RRdRvMVA@mail.gmail.com>
	<[email protected]>
	<CAApHDvrjTL3qo70C1snX7yLQwPpWoANL6sigcD6OTWqy7Y3+Nw@mail.gmail.com>
	<CADofcAUMS2waLhJDdiz3Zv2RRPT8N_+cq0ZOtf9LjV51F0ht2Q@mail.gmail.com>

On Wed, 7 Aug 2024 at 19:20, Michael Harris <[email protected]> wrote:
> I found that running an ANALYZE specifying only those 4 columns only took
> 5 minutes, compared to the 30 minutes for the whole table.
>
> That was a bit of a surprise as I imagined actually reading the table would take
> most of the time and would be the same regardless of the number of columns
> being analyzed, but I guess that is wrong.

ANALYZE does do sampling of the data in the table.  It would only read
all of the rows for fairly small tables.  The docs in [1] mention
this:

"For large tables, ANALYZE takes a random sample of the table
contents, rather than examining every row. This allows even very large
tables to be analyzed in a small amount of time."

I think the reason it's taking so long is not because of it performing
ANALYZE on the partitioned table which results in gathering statistics
for the partitioned table which means proportionately (based on the
size of the partition) sampling rows from each partition, it's more
likely due to the fact that each partition is also analysed and the
statistics for each of those is updated. There is no "ANALYZE ONLY"
command similar to "FROM ONLY" in SELECT queries.

You could probably do some sampling of the pg_stat_progress_analyze
view to figure out what's taking the most time. If you find that the
majority of the time is spent analysing the partitions and not the
partitioned table then maybe we should expand ANALYZE to add the ONLY
option...

David

[1] https://www.postgresql.org/docs/current/sql-analyze.html
[2] https://www.postgresql.org/docs/current/progress-reporting.html






view thread (3+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected]
  Subject: Re: ANALYZE on partitioned tables vs on individual partitions
  In-Reply-To: <CAApHDvosdmXz-=nyUF+ceUgTrLy=FcCct47vhMtTHDsRpBtaJA@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox