Re: Implement missing join selectivity estimation for range types

From: Tom Lane <[email protected]>
To: Schoemans Maxime <[email protected]>
Cc: Damir Belyalov <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: SAKR Mahmoud <[email protected]>
Cc: Diogo Repas <[email protected]>
Cc: LUO Zhicheng <[email protected]>
Cc: Tomas Vondra <[email protected]>
Cc: Andrey Lepikhov <[email protected]>
Subject: Re: Implement missing join selectivity estimation for range types
Date: Tue, 14 Nov 2023 14:46:21 -0500
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
References: <CAB4o4asMq3k6HN9WfDsssQ5DDVfAziB4TpiFJ8RBJgZTVuwC7g@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<CAB4o4aud47V_iRyWtA8+ZAmdXDjCF165R73AeCjx2RL0nzQzHA@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<CAB4o4asvPN=NT7KvS9zVQjZbdsiRW5t8aQctEkW7mxc4hbBxVQ@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<CALH1Lgurg1y1DTeFOXOkpP=+X7saVGCh8gSjDLoCBOcFFWhz-A@mail.gmail.com>
	<[email protected]>

Schoemans Maxime <[email protected]> writes:
> You can find attached a new version of the patch that can be applied on 
> the current master branch of postgres.

I took a brief look through this very interesting work.  I concur
with Tomas that it feels a little odd that range join selectivity
would become smarter than scalar inequality join selectivity, and
that we really ought to prioritize applying these methods to that
case.  Still, that's a poor reason to not take the patch.

I also agree with the upthread criticism that having two identical
functions in different source files will be a maintenance nightmare.
Don't do it.  When and if there's a reason for the behavior to
diverge between the range and multirange cases, it'd likely be
better to handle that by passing in a flag to say what to do.

But my real unhappiness with the patch as-submitted is the test cases,
which require rowcount estimates to be reproduced exactly.  We know
very well that ANALYZE estimates are not perfectly stable and tend to
vary across platforms.  As a quick check I tried the patch within a
32-bit VM, and it passed, which surprised me a bit ... but it would
surprise me a lot if we got these same numbers on every machine in
the buildfarm.  We need a more forgiving test method.  Usually the
approach is to set up a test case where the improved accuracy of
the estimate changes the planner's choice of plan compared to what
you got before, since that will normally not be too prone to change
from variations of a percent or two in the estimates.  Another idea
could be something like

	SELECT (estimate/actual BETWEEN 0.9 AND 1.1) AS ok FROM ...

which just gives a true/false output instead of an exact number.

			regards, tom lane
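
The tolerance check suggested above (report only whether estimate/actual falls between 0.9 and 1.1) could be sketched roughly as follows. This is a minimal illustration in Python rather than SQL, so that it runs without a server. The function names, the sample plan line, and the choice to treat a zero actual row count as a failure are all assumptions made for the example; none of them come from the patch or from the PostgreSQL regression suite.

```python
import re

# Matches the "rows=N" fields in a text-format EXPLAIN ANALYZE plan line.
# The optional fractional part tolerates outputs that print decimals.
_ROWS = re.compile(r"rows=(\d+(?:\.\d+)?)")


def parse_rows(explain_line):
    """Return (estimated, actual) row counts from one plan line.

    Assumes the estimated rows= field appears before the actual one,
    as in the usual "(cost=... rows=E ...) (actual ... rows=A ...)" layout.
    """
    found = _ROWS.findall(explain_line)
    if len(found) < 2:
        raise ValueError("expected both estimated and actual rows= fields")
    return float(found[0]), float(found[1])


def estimate_ok(explain_line, low=0.9, high=1.1):
    """True if estimated/actual lies within [low, high]."""
    estimated, actual = parse_rows(explain_line)
    if actual == 0:
        # Ratio is undefined; this sketch simply reports failure.
        return False
    return low <= estimated / actual <= high


# Hypothetical top plan line, invented for illustration only.
line = ("Hash Join  (cost=30.50..85.75 rows=1000 width=8) "
        "(actual time=0.412..1.937 rows=1023 loops=1)")
print(estimate_ok(line))
```

Because the check reduces to a boolean, a shift of a percent or two in the ANALYZE statistics leaves the output unchanged, which is the property the exact-rowcount tests lack.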
