public inbox for [email protected]  
From: Tomas Vondra <[email protected]>
To: Andrei Lepikhov <[email protected]>
To: Alexandra Wang <[email protected]>
To: Corey Huinker <[email protected]>
Cc: Tom Lane <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Jeff Davis <[email protected]>
Subject: Re: Is there value in having optimizer stats for joins/foreignkeys?
Date: Sun, 1 Feb 2026 17:39:38 +0100
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <CADkLM=cUwMftPLFq0iD6-qKRyNiRM2HZGYVp6=0noxA8GfuEtA@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<CADkLM=fEi_GeeS3zyg6B5WgswyPe0wNXHfKQOxjy8A5fXHD7=A@mail.gmail.com>
	<CAK98qZ2mW=geT9NKe5vC68-sB9EJe_887uV=MCFt6y9AhyTp7A@mail.gmail.com>
	<CAK98qZ0LwJbUoiZjjFXitojHy4UskkjYDiSd_JZfGE9LbfZm9w@mail.gmail.com>
	<[email protected]>

On 1/31/26 12:18, Andrei Lepikhov wrote:
> On 29/1/26 06:04, Alexandra Wang wrote:
>> Hi hackers,
>>
>> As promised in my previous email, I'm sharing a proof-of-concept patch
>> exploring join statistics for correlated columns across relations.
>> This is a POC at this point, but I hope the performance numbers below
>> give a better idea of both the potential usefulness of join statistics
>> and the complexity of implementing them.
> I wonder why you chose the JOIN operator only?
> 
> It seems to me that any relational operator produces relational output
> that can be treated as a table. The extended statistics code may be
> adapted to such relations.
> I think it may be a VIEW that you can declare (manually or
> automatically) and allow Postgres to build statistics on this 'virtual'
> table. So, the main focus may shift to the question: how to provably
> match a query subtree to a specific statistic.
> 

Because for each "supported" operator we need to know two things:

(1) how to sample it efficiently

(2) how to apply it in selectivity estimation

We can't add support for everything at once, and for some cases we may
not even know answers to (1) and/or (2).
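
To make (1) and (2) concrete for the one case where both have a fairly
obvious answer, here is a minimal sketch in Python for a foreign-key
join. It is only an illustration under stated assumptions, not what the
POC patch does: the function names, the toy tables and the in-memory
dict standing in for an index lookup are all hypothetical. Because each
referencing row matches exactly one referenced row, sampling the
referencing side and looking up the match gives a uniform sample of the
join result without ever executing the join.

```python
import random


def sample_fk_join(fact_rows, dim_by_key, fk_of, sample_size, rng):
    """(1) Sample the join: pick referencing rows uniformly, then look up
    the single matching referenced row (hypothetical index lookup)."""
    picked = rng.sample(fact_rows, min(sample_size, len(fact_rows)))
    return [(f, dim_by_key[fk_of(f)]) for f in picked]


def selectivity(joined_sample, predicate):
    """(2) Apply the sample: fraction of sampled join rows that pass."""
    if not joined_sample:
        return 0.0
    hits = sum(1 for f, d in joined_sample if predicate(f, d))
    return hits / len(joined_sample)


# Toy data (assumption): currency on the referencing side is fully
# determined by region on the referenced side.
dim_by_key = {
    k: {"id": k, "region": "EU" if k < 50 else "US"} for k in range(100)
}
fact_rows = [
    {"dim_id": i % 100, "currency": "EUR" if i % 100 < 50 else "USD"}
    for i in range(10_000)
]

rng = random.Random(0)
sample = sample_fk_join(fact_rows, dim_by_key, lambda f: f["dim_id"], 1000, rng)

# Estimate for: currency = 'EUR' AND region = 'EU' over the join.
joint = selectivity(
    sample, lambda f, d: f["currency"] == "EUR" and d["region"] == "EU"
)

# What multiplying per-column selectivities (independence) would give.
independent = selectivity(sample, lambda f, d: f["currency"] == "EUR") * \
    selectivity(sample, lambda f, d: d["region"] == "EU")

print(f"joint={joint:.3f} independent={independent:.3f}")
```

With the correlated columns the joint estimate stays near the true
fraction, while the independence product roughly squares it. For an
arbitrary operator there is no such shortcut for the sampling step, and
that is exactly the part we'd have to figure out case by case.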

We can't just store an opaque VIEW and build the stats by executing it
and sampling the results. The whole premise of extended stats is that
people define them to fix incorrect estimates. But with incorrect
estimates the plan chosen for the VIEW itself may be terrible, and
executing it may not even complete.


regards

-- 
Tomas Vondra






