MIME-Version: 1.0
In-Reply-To: <32209.1507841455@sss.pgh.pa.us>
References: 
 <CA+CSw_vitKcNDw+6zgSttoHRu+ZrKT3a3C5YzT+JyPuTRyLC_w@mail.gmail.com>
 <32209.1507841455@sss.pgh.pa.us>
From: Ants Aasma <ants.aasma@eesti.ee>
Date: Fri, 13 Oct 2017 05:32:03 +0300
Message-ID: 
 <CA+CSw_vW-qzVJM3z3XE_LY0V=fHAH6s0BNwZw+wvmgL78+NzCg@mail.gmail.com>
Subject: Re: Rowcount estimation changes based on from clause order
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: "pgsql-performance@postgresql.org" <pgsql-performance@postgresql.org>
Content-Type: text/plain; charset="UTF-8"
Precedence: bulk
Sender: pgsql-performance-owner@postgresql.org

On Thu, Oct 12, 2017 at 11:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Ants Aasma <ants.aasma@eesti.ee> writes:
>> I stumbled upon a severe row count underestimation that confusingly
>> went away when two inner joins in the from clause were reordered.
>
> Hm, looks more like an overestimate in this example, but anyway ...
>
>> Does anybody have any idea what is going on here?
>
> set_joinrel_size_estimates says
>
>  * Since there is more than one way to make a joinrel for more than two
>  * base relations, the results we get here could depend on which component
>  * rel pair is provided.  In theory we should get the same answers no matter
>  * which pair is provided; in practice, since the selectivity estimation
>  * routines don't handle all cases equally well, we might not.  But there's
>  * not much to be done about it.
>
> In this example I think the core of the issue is actually not so much
> bad selectivity estimates as rowcount roundoff error.
>
> If we first consider joining "small" with "big", we get an estimate of
> 2000 rows (which is dead on for what would happen if we just joined
> those).  Then we estimate the final result size as the join of that to
> "lookup".  The selectivity number for that step is somewhat hogwash but
> happens to yield a result that's not awful (8 rows).
>
> In the other case we first estimate the size of the join of "small" with
> the "lookup" subquery, and we get a rounded-off estimate of one row,
> whereas without the roundoff it would have been probably about 0.01.
> When that's joined to "big", we are computing one row times 1 million rows
> times a selectivity estimate that's about right for the "small.id =
> big.small_id" clause; but because the roundoff already inflated the first
> join's size so much, you end up with an inflated final result.
>
> This suggests that there might be some value in considering the
> sub-relations from largest to smallest, so that roundoff error
> in the earlier estimates is less likely to contaminate the final
> answer.  Not sure how expensive it would be to do that or what
> sort of instability it might introduce into plan choices.
>
> Whether that's got anything directly to do with your original problem is
> hard to say.  Joins to subqueries, which we normally lack any stats for,
> tend to produce pretty bogus selectivity numbers in themselves; so the
> original problem might've been more of that nature.

Thanks for pointing me in the correct direction. The original issue
was that values from lookup joined to ref_id and the subset filter in
the small table were almost perfectly correlated, which caused the
underestimate. In the second case this was hidden by the intermediate
clamping to 1, accidentally resulting in a more correct estimate.

I actually think that it might be better to consider relations from
smallest to largest. The reasoning being - a join cannot produce a
fraction of a row, it will either produce 0 or 1, and we should
probably plan for the case when it does return something.

Going even further, and I haven't looked at how feasible this is, but
I have run into several cases lately where cardinality underestimates
clamping to 1 result in catastrophically bad plans. Like a stack of
nested loops with unparameterized GroupAggregates and HashAggregates
as inner sides bad. It seems to me that row estimates should clamp to
something slightly larger than 1 unless it's provably going to be 1.

Regards,
Ants Aasma


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance