public inbox for [email protected]  
help / color / mirror / Atom feed
From: Tom Lane <[email protected]>
To: [email protected]
Subject: pgsql: Correctly calculate "MCV frequency" for a unique column.
Date: Sun, 01 Mar 2026 17:57:01 +0000
Message-ID: <[email protected]> (raw)

Correctly calculate "MCV frequency" for a unique column.

In commit bd3e3e9e5, I over-hastily used 1 / rel->rows as the assumed
frequency of entries in a column that ANALYZE has found to be unique.
However, rel->rows is the number of table rows that are estimated to
pass the query's restriction conditions, so that we got a too-large
result if the query has selective restrictions.  What I should have
used is 1 / rel->tuples, since that is the estimated total number of
table rows.  The pre-existing code path that digs a frequency out of
the histogram produces a frequency relative to the whole table, so
surely this new alternative code path must do so as well.  Any
correction needed on the basis of selectivity must be done by the
user of the mcv_freq value.

Fixing this causes all the regression test plans changed by bd3e3e9e5
to revert to what they had been, except for the first change in
join.out.  As I correctly argued in bd3e3e9e5, in that test case we
have no stats and should not risk a hash join.  Evidently I was less
correct to argue that the other changes were improvements.

Reported-by: Joel Jacobson <[email protected]>
Diagnosed-by: Tender Wang <[email protected]>
Author: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/[email protected]

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/d80b0225010fd407c784bbecde116a28198b6eab

Modified Files
--------------
src/backend/utils/adt/selfuncs.c             |   9 +-
src/test/regress/expected/join.out           |  16 +-
src/test/regress/expected/partition_join.out | 432 ++++++++++++++-------------
3 files changed, 231 insertions(+), 226 deletions(-)



reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected]
  Subject: Re: pgsql: Correctly calculate "MCV frequency" for a unique column.
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox