From: Alexander Korotkov <[email protected]>
To: [email protected]
Subject: pgsql: Use extended stats for precise estimation of bucket size in hash
Date: Mon, 10 Mar 2025 11:47:56 +0000
Message-ID: <[email protected]>
Use extended stats for precise estimation of bucket size in hash join
Columns in a real-life table often have functional dependencies, so
PostgreSQL can underestimate (or, much more rarely, overestimate) the
number of distinct values over a set of columns when dealing with a
multi-clause JOIN. For a hash join, this can lead to a small number of
predicted hash buckets and, as a result, to picking a non-optimal merge
join.
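As a rough illustration of the effect described above, the following
sketch compares per-column and multicolumn distinct counts. The data, the
column layout, and the simplified "bucket fraction is about 1/ndistinct"
model are assumptions made for this example; they are not taken from the
patch, and the real estimator accounts for more factors.

```python
# Illustrative data only: three columns that each repeat often,
# while the combination of all three is unique for every row.
rows = [(i % 49, i % 51, i % 73) for i in range(1, 10001)]

# Distinct values seen in each column on its own.
per_column = [len({r[c] for r in rows}) for c in range(3)]

# Distinct values of the three columns taken together.
combined = len(set(rows))

# Simplified model (an assumption): one hash bucket holds roughly
# 1/ndistinct of the inner rows. Without multicolumn statistics the
# best available figure comes from the most selective single clause.
per_clause_fraction = min(1.0 / n for n in per_column)
multicolumn_fraction = 1.0 / combined

print(per_column, combined)
print(per_clause_fraction, multicolumn_fraction)
```

In this toy data set the single-column figures suggest far larger buckets
than the multicolumn figure does, which is the kind of gap that can make a
hash join look more expensive than it is.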
To improve the situation, we introduce one additional stage of bucket size
estimation: when there are two or more join clauses, the estimator looks up
extended statistics and uses them for multicolumn estimation. Clauses are
grouped into lists, each containing expressions referencing the same
relation. The result of the multicolumn estimation made over such a list is
combined with the others according to the caller's logic. Clauses that are
not estimated are returned to the caller for further estimation.
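The grouping and fallback steps described above can be sketched roughly
as follows. This is a minimal model, not the code from the commit: the
function name, the (relation, column) clause representation, and the
dictionary standing in for extended statistics are all hypothetical, and
the sketch only accepts a statistic that covers exactly the grouped
columns.

```python
from collections import defaultdict


def estimate_multicolumn(clauses, ext_ndistinct):
    """Hypothetical sketch of the extra estimation stage.

    clauses:       list of (relation, column) pairs, one per join clause.
    ext_ndistinct: dict mapping (relation, frozenset(columns)) to a
                   multicolumn ndistinct value (stand-in for extended stats).
    Returns (estimates, remaining), where estimates maps a relation to its
    multicolumn ndistinct and remaining lists the clauses left for the
    caller to estimate one by one.
    """
    # Group clause columns by the relation they reference.
    by_rel = defaultdict(list)
    for rel, col in clauses:
        by_rel[rel].append(col)

    estimates = {}
    remaining = []
    for rel, cols in by_rel.items():
        key = (rel, frozenset(cols))
        if len(cols) >= 2 and key in ext_ndistinct:
            # Two or more clauses and a matching statistic: use it.
            estimates[rel] = ext_ndistinct[key]
        else:
            # Nothing usable: hand these clauses back to the caller.
            remaining.extend((rel, c) for c in cols)
    return estimates, remaining


# Usage with made-up relation names and a made-up statistic value.
stats = {("inner_rel", frozenset({"x", "y"})): 2500}
clauses = [("inner_rel", "x"), ("inner_rel", "y"), ("other_rel", "k")]
estimates, remaining = estimate_multicolumn(clauses, stats)
print(estimates, remaining)
```

Returning the unestimated clauses rather than dropping them lets the caller
fall back to its existing per-clause logic for them.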
Discussion: https://postgr.es/m/52257607-57f6-850d-399a-ec33a654457b%40postgrespro.ru
Author: Andrei Lepikhov <[email protected]>
Reviewed-by: Andy Fan <[email protected]>
Reviewed-by: Tomas Vondra <[email protected]>
Reviewed-by: Alena Rybakina <[email protected]>
Reviewed-by: Alexander Korotkov <[email protected]>
Branch
------
master
Details
-------
https://git.postgresql.org/pg/commitdiff/6bb6a62f3cc45624c601d5270673a17447734629
Modified Files
--------------
src/backend/optimizer/path/costsize.c | 12 ++-
src/backend/utils/adt/selfuncs.c | 175 ++++++++++++++++++++++++++++++++
src/include/utils/selfuncs.h | 4 +
src/test/regress/expected/stats_ext.out | 45 ++++++++
src/test/regress/sql/stats_ext.sql | 29 ++++++
5 files changed, 264 insertions(+), 1 deletion(-)