public inbox for [email protected]  
help / color / mirror / Atom feed
From: wenhui qiu <[email protected]>
To: Richard Guo <[email protected]>
Cc: Pg Hackers <[email protected]>
Subject: Re: Propagate stadistinct through GROUP BY/DISTINCT in subqueries and CTEs
Date: Tue, 14 Apr 2026 15:58:51 +0800
Message-ID: <CAGjGUALqgfapfAy+i7mYSq9+dOhg_4sf6C4bUhbM1+-uQqyoPw@mail.gmail.com> (raw)
In-Reply-To: <CAMbWs4-Nib006vz1y9i5xE-0of9c-27oGyVS88uZgmOmPqST5w@mail.gmail.com>
References: <CAMbWs49rWYrecgreDhKsfx3VSDW=qo35s+iAmgGu=wpARrM8_g@mail.gmail.com>
	<CAGjGUAK7cmV3u6YE3OBQCuhDh6hrhwY=meaEch0VDT3Kn32WDw@mail.gmail.com>
	<CAMbWs4-Nib006vz1y9i5xE-0of9c-27oGyVS88uZgmOmPqST5w@mail.gmail.com>

Hi Richard
> I don't think your proposed change would work.  vardata->rel is the
> CTE/subquery scan rel in the outer query, and its tuples count is the
> CTE's output row count, not the base table's.  Using it would be
> equivalent to not converting at all, since get_variable_numdistinct()
> already computes -stadistinct * vardata->rel->tuples.  What we need
> here is the base table's rel in the subroot, which gives us the
> correct rowcount for interpreting the negative fraction.
Thank you for your explanation. The path LGTM


Thanks

On Tue, Apr 14, 2026 at 2:11 PM Richard Guo <[email protected]> wrote:

> On Mon, Apr 13, 2026 at 12:27 PM wenhui qiu <[email protected]> wrote:
>
> > Thanks so much for working on this! While looking at the negative
> stadistinct conversion, I was wondering if we might run into a potential
> edge case with multi-level nested subqueries. What do you think?
> >
> > /* Convert negative stadistinct to absolute count */
> >
> >     if (stats->stadistinct < 0)
> >     {
> > -       RelOptInfo *baserel = find_base_rel(subroot, var->varno);
> > +       RelOptInfo *baserel = vardata->rel;
> >
> > -       if (baserel->tuples > 0)
> > +       if (baserel && baserel->tuples > 0)
> >         {
> >             stats->stadistinct = (float4)
> >                 clamp_row_est(-stats->stadistinct * baserel->tuples);
> >         }
> >     }
>
> I don't think your proposed change would work.  vardata->rel is the
> CTE/subquery scan rel in the outer query, and its tuples count is the
> CTE's output row count, not the base table's.  Using it would be
> equivalent to not converting at all, since get_variable_numdistinct()
> already computes -stadistinct * vardata->rel->tuples.  What we need
> here is the base table's rel in the subroot, which gives us the
> correct rowcount for interpreting the negative fraction.
>
> - Richard
>


view thread (4+ messages)

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: Propagate stadistinct through GROUP BY/DISTINCT in subqueries and CTEs
  In-Reply-To: <CAGjGUALqgfapfAy+i7mYSq9+dOhg_4sf6C4bUhbM1+-uQqyoPw@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox