public inbox for [email protected]  
help / color / mirror / Atom feed
From: Tom Lane <[email protected]>
To: Justin Pryzby <[email protected]>
Cc: [email protected]
Subject: Re: Re: join estimate of subqueries with range conditions and constraint exclusion
Date: Mon, 05 Jun 2017 17:02:32 -0400
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>
	<[email protected]>
List-Unsubscribe:  <mailto:[email protected]?body=unsub%20pgsql-performance>

Justin Pryzby <[email protected]> writes:
> I dug into this some more;  I can mitigate the issue with this change:

> diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
> index 6a4f7b1..962a5b4 100644
> --- a/src/backend/utils/adt/selfuncs.c
> +++ b/src/backend/utils/adt/selfuncs.c
> @@ -2279,6 +2279,22 @@ eqjoinsel_inner(Oid operator,
 
>         nd1 = get_variable_numdistinct(vardata1, &isdefault1);
>         nd2 = get_variable_numdistinct(vardata2, &isdefault2);
> +       elog(DEBUG4, "nd %lf %lf", nd1 ,nd2);
> +       if (nd1>vardata1->rel->rows) nd1=vardata1->rel->rows;
> +       if (nd2>vardata1->rel->rows) nd2=vardata2->rel->rows;
> +
> +       elog(DEBUG4, "nd %lf %lf", nd1 ,nd2);
> +       elog(DEBUG4, "rows %lf %lf", vardata1->rel->rows ,vardata2->rel->rows);
> +       elog(DEBUG4, "tuples %lf %lf", vardata1->rel->tuples ,vardata2->rel->tuples);

I don't like this change too much.  I agree that intuitively you would
not expect the number of distinct values to exceed the possibly-restricted
number of rows from the input relation, but I think this falls foul of
the problem mentioned in eqjoinsel_semi's comments, namely that it's
effectively double-counting the restriction selectivity.  It happens to
improve matters in the test case you show, but it's not exactly producing
a good estimate even so; and the fact that the change is in the right
direction seems like mostly an artifact of particular ndistinct and
rowcount values.  I note for instance that this patch would do nothing
at all for the toy example you posted upthread, because nd1/nd2 are
already equal to the rows estimates in that case.

The core reason why you get good results for

	select * from a join b using (x) where x = constant

is that there's a great deal of intelligence in the planner about
transitive equality deductions and what to do with partially-redundant
equality clauses.  The reason you don't get similarly good results for

	select * from a join b using (x) where x < constant

is that there is no comparable machinery for inequalities.  Maybe there
should be, but it'd be a fair bit of work to create, and we'd have to
keep one eye firmly fixed on whether it slows planning down even in cases
where no benefit ensues.  In the meantime, I'm not sure that there are
any quick-hack ways of materially improving the situation :-(

			regards, tom lane


-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected]
  Subject: Re: Re: join estimate of subqueries with range conditions and constraint exclusion
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox