public inbox for [email protected]
help / color / mirror / Atom feedFrom: Corey Huinker <[email protected]>
To: Tomas Vondra <[email protected]>
Cc: Alexandra Wang <[email protected]>
Cc: Tom Lane <[email protected]>
Cc: jian he <[email protected]>
Cc: [email protected]
Cc: Andrei Lepikhov <[email protected]>
Cc: [email protected]
Cc: Jeff Davis <[email protected]>
Subject: Re: Is there value in having optimizer stats for joins/foreignkeys?
Date: Wed, 27 May 2026 18:02:20 -0400
Message-ID: <CADkLM=dabcDYS1hqbZbeVsqeTH_AyD0tPvsvc77HOk=mMsVhOg@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <CADkLM=cUwMftPLFq0iD6-qKRyNiRM2HZGYVp6=0noxA8GfuEtA@mail.gmail.com>
<[email protected]>
<[email protected]>
<[email protected]>
<CAK98qZ2ySno00SApwj3X_MgN4iuBzKaXKXOY+U3jaFqTxPS8Tg@mail.gmail.com>
<CAK98qZ0Vr4D3usesj_qpNFAH1=7NKX=bR9bNRNM7tdnk5KWHDw@mail.gmail.com>
<CAK98qZ2TveWY34VBw8LmQUEq2GUrPLhsaYYORgDAA5VZybyS5w@mail.gmail.com>
<CAK98qZ3TGXKcudcnyEhGuMUiru36m=Vm=Zm5Gyug0h+VyxmD0w@mail.gmail.com>
<CACJufxFOe4rx=J+0+=_g+K=bnhxs1OUdMJCgQ+tV0KdkQ_2aeQ@mail.gmail.com>
<CAK98qZ2mMJ3Nvx9U1S9N9Put1bb=iOgy8vFdp=rfjfta4wJ2AQ@mail.gmail.com>
<CAK98qZ3ASXEBuftvd=SxPwLOXgo=2-88SyZyLxXf3k4HfDKKsw@mail.gmail.com>
<CAK98qZ397f5xEkNgqEoeZTbXBnr7h665wZMvGJOnok+ch+XWSA@mail.gmail.com>
<[email protected]>
<[email protected]>
<[email protected]>
<CAK98qZ0_4Kodyemk0Tdmew=YG8jeHQ=wzOydQws4s99A1X2h5g@mail.gmail.com>
<[email protected]>
>
> Also, maybe this is one more argument against the "optimizer view" idea
> Corey mentioned in Vancouver last week? Because surely we'll want to
> include the join statistics in export/import, and for the view approach
> we'd need to invent a fair amount of code to do that. Maybe?
>
We definitely want join statistics in the export/import.
To my mind, having stats on certain views would be extremely simple for
export/import, we'd simply have one more relkind that makes it into the
system views pg_stats and pg_stats_ext, and the statistics for that new
relation would plug into pg_statistic with no catalog change to
pg_statistic whatsoever. Additionally, allowing certain kinds of views (or
a relation relkind that's functionally equivalent to a view) to have
statistics makes it easy to define extended statistics on those views, with
no catalog change to pg_statistic_ext.
Having something in pg_class to anchor existing per-attribute and
multi-attribute stats off of seems like a big win to me. We'd get all
pre-existing statistics types for free, statistics kinds provided by
extensions for free, extended stats for free, import and export for free.
Well, not free, we just have to remove the exclusion that views (or
whatever relkind we create) can't have stats.
Having said all that, the statistics import code reads from
pg_statistic_ext but obviously never modifies it, it's all about
constructing the pg_statistic_ext_data row, and it need to only concern
itself with importing to the current version, so internal catalog changes
to pg_statistic_ext wouldn't be that big of a deal.
If, however, we want to stick with the notion that all non-relation,
not-attribute stats are extended stats, then we have to remodel a bit:
- as Tom and Tomas mentioned, we'd probably want to dispense with the
negative attnums and stxexprs, as those are already a bit of a pain to deal
with. That might be worth of a patch even without join statistics.
- we'd want a way to express stats for all the individual
columns/expressions defined in the join stats object, i.e. "what is the MCV
of B.name for rows joined by A on A.b_id"?
- we'd want some way of excluding the non-interesting combinations of
columns. If we had a stat on A.qty, A.price, B.cust_name, C.sale_date,
D.item_name, then we'd have 5-factorial MCV combinations in addition to the
5 single-column stats, and the (A.qty, A.price) combinations can already be
covered by a non-join extended stat.
It's those things that make optimizer views so attractive to me - we
already have a way to store individual column stats (pg_statistic) and shut
them off when not interesting (pg_attribute.attstattarget), and extended
statistics are a great way to describe which combinations of columns are
interesting to us.
Typing all this has me thinking that there may be a third way:
- statistics objects become pg_class objects, stxkeys and stxexprs become
pg_attribute rows
- pg_statisic_ext keeps (stxrelid, stxstattarget, stxkind), adds stxid
(pointing to new pg_class object) sort of like pg_index
- pg_statistic_ext_data remains as-is
and we'd set per-column stats collection like ALTER STATISTICS foo ALTER
COLUMN.
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: Is there value in having optimizer stats for joins/foreignkeys?
In-Reply-To: <CADkLM=dabcDYS1hqbZbeVsqeTH_AyD0tPvsvc77HOk=mMsVhOg@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox