public inbox for [email protected]  
help / color / mirror / Atom feed
From: Richard Guo <[email protected]>
To: wenhui qiu <[email protected]>
Cc: Pg Hackers <[email protected]>
Subject: Re: Propagate stadistinct through GROUP BY/DISTINCT in subqueries and CTEs
Date: Tue, 14 Apr 2026 15:11:06 +0900
Message-ID: <CAMbWs4-Nib006vz1y9i5xE-0of9c-27oGyVS88uZgmOmPqST5w@mail.gmail.com> (raw)
In-Reply-To: <CAGjGUAK7cmV3u6YE3OBQCuhDh6hrhwY=meaEch0VDT3Kn32WDw@mail.gmail.com>
References: <CAMbWs49rWYrecgreDhKsfx3VSDW=qo35s+iAmgGu=wpARrM8_g@mail.gmail.com>
	<CAGjGUAK7cmV3u6YE3OBQCuhDh6hrhwY=meaEch0VDT3Kn32WDw@mail.gmail.com>

On Mon, Apr 13, 2026 at 12:27 PM wenhui qiu <[email protected]> wrote:

> Thanks so much for working on this! While looking at the negative stadistinct conversion, I was wondering if we might run into a potential edge case with multi-level nested subqueries. What do you think?
>
> /* Convert negative stadistinct to absolute count */
>
>     if (stats->stadistinct < 0)
>     {
> -       RelOptInfo *baserel = find_base_rel(subroot, var->varno);
> +       RelOptInfo *baserel = vardata->rel;
>
> -       if (baserel->tuples > 0)
> +       if (baserel && baserel->tuples > 0)
>         {
>             stats->stadistinct = (float4)
>                 clamp_row_est(-stats->stadistinct * baserel->tuples);
>         }
>     }

I don't think your proposed change would work.  vardata->rel is the
CTE/subquery scan rel in the outer query, and its tuples count is the
CTE's output row count, not the base table's.  Using it would be
equivalent to not converting at all, since get_variable_numdistinct()
already computes -stadistinct * vardata->rel->tuples.  What we need
here is the base table's rel in the subroot, which gives us the
correct rowcount for interpreting the negative fraction.

- Richard





view thread (4+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: Propagate stadistinct through GROUP BY/DISTINCT in subqueries and CTEs
  In-Reply-To: <CAMbWs4-Nib006vz1y9i5xE-0of9c-27oGyVS88uZgmOmPqST5w@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox