Propagate stadistinct through GROUP BY/DISTINCT in subqueries and CTEs

public inbox for [email protected]  
help / color / mirror / Atom feed

Propagate stadistinct through GROUP BY/DISTINCT in subqueries and CTEs
4+ messages / 2 participants
[nested] [flat]

* Propagate stadistinct through GROUP BY/DISTINCT in subqueries and CTEs
@ 2026-04-13 01:18 Richard Guo <[email protected]>
  2026-04-13 03:27 ` Re: Propagate stadistinct through GROUP BY/DISTINCT in subqueries and CTEs wenhui qiu <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Richard Guo @ 2026-04-13 01:18 UTC (permalink / raw)
  To: Pg Hackers <[email protected]>

I complained in [1] that some TPC-DS queries suffer from very poor
cardinality estimates on CTE scan filters, to the point that simply
disabling nestloop makes some queries run hundreds of times faster.
Here's a simple reproduction:

create table t (a int, b int, c int);
insert into t select i%2, i, i from generate_series(1,1000) i;
analyze t;

explain analyze
with cte as (select a, b, avg(c) as avg from t group by a, b)
select * from cte t1, cte t2
where t1.a = 1 and t2.a = 1 and t1.avg = t2.avg;

Column 'a' has only 2 distinct values, so the filter a=1 on the
1000-row CTE output should estimate ~500 rows (assuming these values
are equally common).  Instead, the CTE scan estimates 5 rows (1000 *
1/200) because examine_simple_variable returns early when the subquery
has GROUP BY, and selectivity estimation falls back on
1/DEFAULT_NUM_DISTINCT.

   ->  CTE Scan on cte t1  (cost=0.00..22.50 rows=5 width=40)
                   (actual time=4.874..5.053 rows=500.00 loops=1)

As a result, this query ends up with a Nested Loop plan, and the
Execution Time is 192.907 ms.

For DISTINCT or GROUP BY key columns that are simple Vars, I think we
can propagate stadistinct from the base table, because the set of
distinct values is preserved after grouping.  MCV frequencies,
histograms, and correlation data are not valid since GROUP BY and
DISTINCT change the frequency distribution, but with stadistinct
alone, callers like var_eq_const() can use a 1/ndistinct estimate
rather than 1/DEFAULT_NUM_DISTINCT.

Attached is a patch to do this.  With the patch, the example above
estimates 500 rows ...

   ->  CTE Scan on cte t1  (cost=0.00..22.50 rows=500 width=40)
                   (actual time=3.785..4.143 rows=500.00 loops=1)

... and chooses a Hash Join, with an Execution Time of 8.238 ms (~20x
faster).

I tested this patch on TPC-DS query 31:

-- on master:
 Planning Time: 5.207 ms
 Execution Time: 1536140.258 ms

-- on patched:
 Planning Time: 5.140 ms
 Execution Time: 1149.482 ms

Over 1300x faster.

Does this approach make sense?  Any thoughts?

[1] https://postgr.es/m/CAMbWs4-QU_nkFqFZLdzWRsEsVE8aLWx4qBBVq7g4rXw+cvYDMg@mail.gmail.com

- Richard


Attachments:

  [application/octet-stream] v1-0001-Propagate-stadistinct-through-GROUP-BY-DISTINCT-i.patch (13.1K, 2-v1-0001-Propagate-stadistinct-through-GROUP-BY-DISTINCT-i.patch)
  download | inline diff:
From 6aa78da684a048e99a0366b918cf9de2572d9f5c Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Sun, 12 Apr 2026 15:12:20 +0900
Subject: [PATCH v1] Propagate stadistinct through GROUP BY/DISTINCT in
 subqueries and CTEs

Previously, examine_simple_variable() would return early when a
subquery or CTE used GROUP BY or DISTINCT.  It could detect uniqueness
for single-column cases, but for multi-column GROUP BY or DISTINCT,
selectivity estimation fell back on 1/DEFAULT_NUM_DISTINCT (1/200).
This produced wildly inaccurate estimates for filters and joins on
such columns, often leading the planner to choose nested loop joins
where hash joins would be far better.  This was a significant factor
in poor TPC-DS benchmark performance.

For DISTINCT or GROUP BY key columns that are simple Vars, we now
recurse into the subquery to obtain the base table's stadistinct,
which remains valid after grouping (the set of distinct values is
preserved).  However, MCV frequencies, histograms, and correlation
data are not valid because GROUP BY and DISTINCT change the frequency
distribution of key columns.  So we strip all stats slots from the
copied stats tuple, causing callers like var_eq_const() to use the
1/ndistinct estimate instead.  If stadistinct is stored as a negative
value (a fraction of the base table's row count), we convert it to an
absolute count so it is not misinterpreted relative to the subquery's
output row count.

Non-key columns (e.g., aggregate outputs) continue to get no stats,
same as before.
---
 src/backend/utils/adt/selfuncs.c   | 106 +++++++++++++++++++++++++----
 src/test/regress/expected/with.out |  94 +++++++++++++++++++++++++
 src/test/regress/sql/with.sql      |  35 ++++++++++
 3 files changed, 221 insertions(+), 14 deletions(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index f2b58ebfe1e..adf638a380c 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -247,6 +247,8 @@ static bool contain_placeholder_walker(Node *node, void *context);
 static Node *strip_all_phvs_mutator(Node *node, void *context);
 static void examine_simple_variable(PlannerInfo *root, Var *var,
 									VariableStatData *vardata);
+static void adjust_statstuple_for_grouping(PlannerInfo *subroot, Var *var,
+										   VariableStatData *vardata);
 static void examine_indexcol_variable(PlannerInfo *root, IndexOptInfo *index,
 									  int indexcol, VariableStatData *vardata);
 static bool get_variable_range(PlannerInfo *root, VariableStatData *vardata,
@@ -6101,6 +6103,7 @@ examine_simple_variable(PlannerInfo *root, Var *var,
 		Query	   *subquery;
 		List	   *subtlist;
 		TargetEntry *ste;
+		bool		have_grouping = false;
 
 		/*
 		 * Punt if it's a whole-row var rather than a plain column reference.
@@ -6191,9 +6194,12 @@ examine_simple_variable(PlannerInfo *root, Var *var,
 		 * Punt if subquery uses set operations or grouping sets, as these
 		 * will mash underlying columns' stats beyond recognition.  (Set ops
 		 * are particularly nasty; if we forged ahead, we would return stats
-		 * relevant to only the leftmost subselect...)	DISTINCT is also
-		 * problematic, but we check that later because there is a possibility
-		 * of learning something even with it.
+		 * relevant to only the leftmost subselect...)	DISTINCT and GROUP BY
+		 * are also problematic, but we check those later because there is a
+		 * possibility of learning something even with them: we can detect
+		 * uniqueness for single-column cases, and for key columns that are
+		 * simple Vars, we can obtain a useful stadistinct from the underlying
+		 * base table.
 		 */
 		if (subquery->setOperations ||
 			subquery->groupingSets)
@@ -6211,28 +6217,42 @@ examine_simple_variable(PlannerInfo *root, Var *var,
 		var = (Var *) ste->expr;
 
 		/*
-		 * If subquery uses DISTINCT, we can't make use of any stats for the
+		 * If subquery uses DISTINCT, we can't make full use of stats for the
 		 * variable ... but, if it's the only DISTINCT column, we are entitled
 		 * to consider it unique.  We do the test this way so that it works
 		 * for cases involving DISTINCT ON.
+		 *
+		 * If the target is a DISTINCT key that is a simple Var, we can still
+		 * obtain a useful stadistinct from the base table, though MCV and
+		 * histogram data must be stripped since DISTINCT changes the
+		 * frequency distribution.  We set have_grouping and fall through to
+		 * the simple-Var recursion below.  Non-key columns cannot go further.
 		 */
 		if (subquery->distinctClause)
 		{
-			if (list_length(subquery->distinctClause) == 1 &&
-				targetIsInSortList(ste, InvalidOid, subquery->distinctClause))
-				vardata->isunique = true;
-			/* cannot go further */
-			return;
+			if (targetIsInSortList(ste, InvalidOid, subquery->distinctClause))
+			{
+				have_grouping = true;
+
+				if (list_length(subquery->distinctClause) == 1)
+					vardata->isunique = true;
+			}
+			else
+				return;
 		}
 
 		/* The same idea as with DISTINCT clause works for a GROUP-BY too */
 		if (subquery->groupClause)
 		{
-			if (list_length(subquery->groupClause) == 1 &&
-				targetIsInSortList(ste, InvalidOid, subquery->groupClause))
-				vardata->isunique = true;
-			/* cannot go further */
-			return;
+			if (targetIsInSortList(ste, InvalidOid, subquery->groupClause))
+			{
+				have_grouping = true;
+
+				if (list_length(subquery->groupClause) == 1)
+					vardata->isunique = true;
+			}
+			else if (!have_grouping)
+				return;
 		}
 
 		/*
@@ -6263,6 +6283,14 @@ examine_simple_variable(PlannerInfo *root, Var *var,
 			 * joined to other tables in a way that creates duplicates.
 			 */
 			examine_simple_variable(subroot, var, vardata);
+
+			/*
+			 * If the subquery uses DISTINCT or GROUP BY and we got here
+			 * because the target is a key column, strip MCV/histogram slots
+			 * from the stats tuple, keeping only stadistinct.
+			 */
+			if (have_grouping)
+				adjust_statstuple_for_grouping(subroot, var, vardata);
 		}
 	}
 	else
@@ -6276,6 +6304,56 @@ examine_simple_variable(PlannerInfo *root, Var *var,
 	}
 }
 
+/*
+ * adjust_statstuple_for_grouping
+ *		Adjust a stats tuple for use in a grouped or distinct context.
+ *
+ * This is used when the stats tuple was obtained by recursing into a subquery,
+ * but the subquery's output invalidates frequency-related statistics (e.g. due
+ * to GROUP BY or DISTINCT).  The set of distinct values is preserved by such
+ * operations, so stadistinct remains valid, but MCV frequencies, histograms,
+ * and correlation data are not.  Zeroing all stats slots causes callers (e.g.
+ * var_eq_const) to fall through to the 1/ndistinct estimate instead.
+ *
+ * If stadistinct is negative (a fraction of the base table's row count), we
+ * convert it to an absolute count, since it would otherwise be misinterpreted
+ * relative to the subquery output's row count.
+ */
+static void
+adjust_statstuple_for_grouping(PlannerInfo *subroot, Var *var,
+							   VariableStatData *vardata)
+{
+	HeapTuple	copy;
+	Form_pg_statistic stats;
+
+	if (!HeapTupleIsValid(vardata->statsTuple))
+		return;
+
+	copy = heap_copytuple(vardata->statsTuple);
+	stats = (Form_pg_statistic) GETSTRUCT(copy);
+
+	/* Convert negative stadistinct to absolute count */
+	if (stats->stadistinct < 0)
+	{
+		RelOptInfo *baserel = find_base_rel(subroot, var->varno);
+
+		if (baserel->tuples > 0)
+		{
+			stats->stadistinct = (float4)
+				clamp_row_est(-stats->stadistinct * baserel->tuples);
+		}
+	}
+
+	/* Zero out all stats slots */
+	for (int k = 0; k < STATISTIC_NUM_SLOTS; k++)
+		(&stats->stakind1)[k] = 0;
+
+	/* Replace original with our modified copy */
+	vardata->freefunc(vardata->statsTuple);
+	vardata->statsTuple = copy;
+	vardata->freefunc = heap_freetuple;
+}
+
 /*
  * all_rows_selectable
  *		Test whether the user has permission to select all rows from a given
diff --git a/src/test/regress/expected/with.out b/src/test/regress/expected/with.out
index 77ded01b046..addb24896be 100644
--- a/src/test/regress/expected/with.out
+++ b/src/test/regress/expected/with.out
@@ -3760,3 +3760,97 @@ select * from with_test;
 (1 row)
 
 drop table with_test;
+--
+-- Test selectivity estimates for GROUP BY and DISTINCT subqueries/CTEs.
+--
+-- When a subquery or CTE has GROUP BY or DISTINCT, the planner should use
+-- stadistinct from the base table for key columns, rather than falling back
+-- to DEFAULT_NUM_DISTINCT.  With correct estimates, the planner picks Hash
+-- Join; with the old default estimates it would pick Nested Loop.
+--
+-- Subquery with GROUP BY (OFFSET 0 prevents qual pushdown)
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+  (SELECT two, thousand, avg(ten) AS avg FROM tenk1 GROUP BY two, thousand OFFSET 0) t1,
+  (SELECT two, thousand, avg(ten) AS avg FROM tenk1 GROUP BY two, thousand OFFSET 0) t2
+WHERE t1.two = 0 AND t2.two = 0 AND t1.avg = t2.avg;
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Hash Join
+   Hash Cond: (t1.avg = t2.avg)
+   ->  Subquery Scan on t1
+         Filter: (t1.two = 0)
+         ->  HashAggregate
+               Group Key: tenk1.two, tenk1.thousand
+               ->  Seq Scan on tenk1
+   ->  Hash
+         ->  Subquery Scan on t2
+               Filter: (t2.two = 0)
+               ->  HashAggregate
+                     Group Key: tenk1_1.two, tenk1_1.thousand
+                     ->  Seq Scan on tenk1 tenk1_1
+(13 rows)
+
+-- Subquery with DISTINCT (OFFSET 0 prevents qual pushdown)
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+  (SELECT DISTINCT two, thousand FROM tenk1 OFFSET 0) t1,
+  (SELECT DISTINCT two, thousand FROM tenk1 OFFSET 0) t2
+WHERE t1.two = 0 AND t2.two = 0 AND t1.thousand = t2.thousand;
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Hash Join
+   Hash Cond: (t1.thousand = t2.thousand)
+   ->  Subquery Scan on t1
+         Filter: (t1.two = 0)
+         ->  HashAggregate
+               Group Key: tenk1.two, tenk1.thousand
+               ->  Seq Scan on tenk1
+   ->  Hash
+         ->  Subquery Scan on t2
+               Filter: (t2.two = 0)
+               ->  HashAggregate
+                     Group Key: tenk1_1.two, tenk1_1.thousand
+                     ->  Seq Scan on tenk1 tenk1_1
+(13 rows)
+
+-- CTE with GROUP BY
+EXPLAIN (COSTS OFF)
+WITH cte AS (SELECT two, thousand, avg(ten) AS avg FROM tenk1 GROUP BY two, thousand)
+SELECT * FROM cte t1, cte t2
+WHERE t1.two = 0 AND t2.two = 0 AND t1.avg = t2.avg;
+                   QUERY PLAN                   
+------------------------------------------------
+ Hash Join
+   Hash Cond: (t1.avg = t2.avg)
+   CTE cte
+     ->  HashAggregate
+           Group Key: tenk1.two, tenk1.thousand
+           ->  Seq Scan on tenk1
+   ->  CTE Scan on cte t1
+         Filter: (two = 0)
+   ->  Hash
+         ->  CTE Scan on cte t2
+               Filter: (two = 0)
+(11 rows)
+
+-- CTE with DISTINCT
+EXPLAIN (COSTS OFF)
+WITH cte AS (SELECT DISTINCT two, thousand FROM tenk1)
+SELECT * FROM cte t1, cte t2
+WHERE t1.two = 0 AND t2.two = 0 AND t1.thousand = t2.thousand;
+                   QUERY PLAN                   
+------------------------------------------------
+ Hash Join
+   Hash Cond: (t1.thousand = t2.thousand)
+   CTE cte
+     ->  HashAggregate
+           Group Key: tenk1.two, tenk1.thousand
+           ->  Seq Scan on tenk1
+   ->  CTE Scan on cte t1
+         Filter: (two = 0)
+   ->  Hash
+         ->  CTE Scan on cte t2
+               Filter: (two = 0)
+(11 rows)
+
diff --git a/src/test/regress/sql/with.sql b/src/test/regress/sql/with.sql
index 2255befda89..0cb26312e21 100644
--- a/src/test/regress/sql/with.sql
+++ b/src/test/regress/sql/with.sql
@@ -1765,3 +1765,38 @@ create temp table with_test (i int);
 with with_test as (select 42) insert into with_test select * from with_test;
 select * from with_test;
 drop table with_test;
+
+--
+-- Test selectivity estimates for GROUP BY and DISTINCT subqueries/CTEs.
+--
+-- When a subquery or CTE has GROUP BY or DISTINCT, the planner should use
+-- stadistinct from the base table for key columns, rather than falling back
+-- to DEFAULT_NUM_DISTINCT.  With correct estimates, the planner picks Hash
+-- Join; with the old default estimates it would pick Nested Loop.
+--
+
+-- Subquery with GROUP BY (OFFSET 0 prevents qual pushdown)
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+  (SELECT two, thousand, avg(ten) AS avg FROM tenk1 GROUP BY two, thousand OFFSET 0) t1,
+  (SELECT two, thousand, avg(ten) AS avg FROM tenk1 GROUP BY two, thousand OFFSET 0) t2
+WHERE t1.two = 0 AND t2.two = 0 AND t1.avg = t2.avg;
+
+-- Subquery with DISTINCT (OFFSET 0 prevents qual pushdown)
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+  (SELECT DISTINCT two, thousand FROM tenk1 OFFSET 0) t1,
+  (SELECT DISTINCT two, thousand FROM tenk1 OFFSET 0) t2
+WHERE t1.two = 0 AND t2.two = 0 AND t1.thousand = t2.thousand;
+
+-- CTE with GROUP BY
+EXPLAIN (COSTS OFF)
+WITH cte AS (SELECT two, thousand, avg(ten) AS avg FROM tenk1 GROUP BY two, thousand)
+SELECT * FROM cte t1, cte t2
+WHERE t1.two = 0 AND t2.two = 0 AND t1.avg = t2.avg;
+
+-- CTE with DISTINCT
+EXPLAIN (COSTS OFF)
+WITH cte AS (SELECT DISTINCT two, thousand FROM tenk1)
+SELECT * FROM cte t1, cte t2
+WHERE t1.two = 0 AND t2.two = 0 AND t1.thousand = t2.thousand;
-- 
2.39.5 (Apple Git-154)



^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: Propagate stadistinct through GROUP BY/DISTINCT in subqueries and CTEs
  2026-04-13 01:18 Propagate stadistinct through GROUP BY/DISTINCT in subqueries and CTEs Richard Guo <[email protected]>
@ 2026-04-13 03:27 ` wenhui qiu <[email protected]>
  2026-04-14 06:11   ` Re: Propagate stadistinct through GROUP BY/DISTINCT in subqueries and CTEs Richard Guo <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: wenhui qiu @ 2026-04-13 03:27 UTC (permalink / raw)
  To: Richard Guo <[email protected]>; +Cc: Pg Hackers <[email protected]>

HI Richard

> + /* Convert negative stadistinct to absolute count */
> + if (stats->stadistinct < 0)
> + {
> + RelOptInfo *baserel = find_base_rel(subroot, var->varno);
> +
> + if (baserel->tuples > 0)
> + {
> + stats->stadistinct = (float4)
> + clamp_row_est(-stats->stadistinct * baserel->tuples);
> + }
> + }

Thanks so much for working on this! While looking at the negative
stadistinct conversion, I was wondering if we might run into a potential
edge case with multi-level nested subqueries. What do you think?

/* Convert negative stadistinct to absolute count */

    if (stats->stadistinct < 0)
    {
-       RelOptInfo *baserel = find_base_rel(subroot, var->varno);
+       RelOptInfo *baserel = vardata->rel;

-       if (baserel->tuples > 0)
+       if (baserel && baserel->tuples > 0)
        {
            stats->stadistinct = (float4)
                clamp_row_est(-stats->stadistinct * baserel->tuples);
        }
    }

Thanks

On Mon, Apr 13, 2026 at 9:19 AM Richard Guo <[email protected]> wrote:

> I complained in [1] that some TPC-DS queries suffer from very poor
> cardinality estimates on CTE scan filters, to the point that simply
> disabling nestloop makes some queries run hundreds of times faster.
> Here's a simple reproduction:
>
> create table t (a int, b int, c int);
> insert into t select i%2, i, i from generate_series(1,1000) i;
> analyze t;
>
> explain analyze
> with cte as (select a, b, avg(c) as avg from t group by a, b)
> select * from cte t1, cte t2
> where t1.a = 1 and t2.a = 1 and t1.avg = t2.avg;
>
> Column 'a' has only 2 distinct values, so the filter a=1 on the
> 1000-row CTE output should estimate ~500 rows (assuming these values
> are equally common).  Instead, the CTE scan estimates 5 rows (1000 *
> 1/200) because examine_simple_variable returns early when the subquery
> has GROUP BY, and selectivity estimation falls back on
> 1/DEFAULT_NUM_DISTINCT.
>
>    ->  CTE Scan on cte t1  (cost=0.00..22.50 rows=5 width=40)
>                    (actual time=4.874..5.053 rows=500.00 loops=1)
>
> As a result, this query ends up with a Nested Loop plan, and the
> Execution Time is 192.907 ms.
>
> For DISTINCT or GROUP BY key columns that are simple Vars, I think we
> can propagate stadistinct from the base table, because the set of
> distinct values is preserved after grouping.  MCV frequencies,
> histograms, and correlation data are not valid since GROUP BY and
> DISTINCT change the frequency distribution, but with stadistinct
> alone, callers like var_eq_const() can use a 1/ndistinct estimate
> rather than 1/DEFAULT_NUM_DISTINCT.
>
> Attached is a patch to do this.  With the patch, the example above
> estimates 500 rows ...
>
>    ->  CTE Scan on cte t1  (cost=0.00..22.50 rows=500 width=40)
>                    (actual time=3.785..4.143 rows=500.00 loops=1)
>
> ... and chooses a Hash Join, with an Execution Time of 8.238 ms (~20x
> faster).
>
> I tested this patch on TPC-DS query 31:
>
> -- on master:
>  Planning Time: 5.207 ms
>  Execution Time: 1536140.258 ms
>
> -- on patched:
>  Planning Time: 5.140 ms
>  Execution Time: 1149.482 ms
>
> Over 1300x faster.
>
> Does this approach make sense?  Any thoughts?
>
> [1]
> https://postgr.es/m/CAMbWs4-QU_nkFqFZLdzWRsEsVE8aLWx4qBBVq7g4rXw+cvYDMg@mail.gmail.com
>
> - Richard
>


^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: Propagate stadistinct through GROUP BY/DISTINCT in subqueries and CTEs
  2026-04-13 01:18 Propagate stadistinct through GROUP BY/DISTINCT in subqueries and CTEs Richard Guo <[email protected]>
  2026-04-13 03:27 ` Re: Propagate stadistinct through GROUP BY/DISTINCT in subqueries and CTEs wenhui qiu <[email protected]>
@ 2026-04-14 06:11   ` Richard Guo <[email protected]>
  2026-04-14 07:58     ` Re: Propagate stadistinct through GROUP BY/DISTINCT in subqueries and CTEs wenhui qiu <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Richard Guo @ 2026-04-14 06:11 UTC (permalink / raw)
  To: wenhui qiu <[email protected]>; +Cc: Pg Hackers <[email protected]>

On Mon, Apr 13, 2026 at 12:27 PM wenhui qiu <[email protected]> wrote:

> Thanks so much for working on this! While looking at the negative stadistinct conversion, I was wondering if we might run into a potential edge case with multi-level nested subqueries. What do you think?
>
> /* Convert negative stadistinct to absolute count */
>
>     if (stats->stadistinct < 0)
>     {
> -       RelOptInfo *baserel = find_base_rel(subroot, var->varno);
> +       RelOptInfo *baserel = vardata->rel;
>
> -       if (baserel->tuples > 0)
> +       if (baserel && baserel->tuples > 0)
>         {
>             stats->stadistinct = (float4)
>                 clamp_row_est(-stats->stadistinct * baserel->tuples);
>         }
>     }

I don't think your proposed change would work.  vardata->rel is the
CTE/subquery scan rel in the outer query, and its tuples count is the
CTE's output row count, not the base table's.  Using it would be
equivalent to not converting at all, since get_variable_numdistinct()
already computes -stadistinct * vardata->rel->tuples.  What we need
here is the base table's rel in the subroot, which gives us the
correct rowcount for interpreting the negative fraction.

- Richard





^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: Propagate stadistinct through GROUP BY/DISTINCT in subqueries and CTEs
  2026-04-13 01:18 Propagate stadistinct through GROUP BY/DISTINCT in subqueries and CTEs Richard Guo <[email protected]>
  2026-04-13 03:27 ` Re: Propagate stadistinct through GROUP BY/DISTINCT in subqueries and CTEs wenhui qiu <[email protected]>
  2026-04-14 06:11   ` Re: Propagate stadistinct through GROUP BY/DISTINCT in subqueries and CTEs Richard Guo <[email protected]>
@ 2026-04-14 07:58     ` wenhui qiu <[email protected]>
  0 siblings, 0 replies; 4+ messages in thread

From: wenhui qiu @ 2026-04-14 07:58 UTC (permalink / raw)
  To: Richard Guo <[email protected]>; +Cc: Pg Hackers <[email protected]>

Hi Richard
> I don't think your proposed change would work.  vardata->rel is the
> CTE/subquery scan rel in the outer query, and its tuples count is the
> CTE's output row count, not the base table's.  Using it would be
> equivalent to not converting at all, since get_variable_numdistinct()
> already computes -stadistinct * vardata->rel->tuples.  What we need
> here is the base table's rel in the subroot, which gives us the
> correct rowcount for interpreting the negative fraction.
Thank you for your explanation. The path LGTM


Thanks

On Tue, Apr 14, 2026 at 2:11 PM Richard Guo <[email protected]> wrote:

> On Mon, Apr 13, 2026 at 12:27 PM wenhui qiu <[email protected]> wrote:
>
> > Thanks so much for working on this! While looking at the negative
> stadistinct conversion, I was wondering if we might run into a potential
> edge case with multi-level nested subqueries. What do you think?
> >
> > /* Convert negative stadistinct to absolute count */
> >
> >     if (stats->stadistinct < 0)
> >     {
> > -       RelOptInfo *baserel = find_base_rel(subroot, var->varno);
> > +       RelOptInfo *baserel = vardata->rel;
> >
> > -       if (baserel->tuples > 0)
> > +       if (baserel && baserel->tuples > 0)
> >         {
> >             stats->stadistinct = (float4)
> >                 clamp_row_est(-stats->stadistinct * baserel->tuples);
> >         }
> >     }
>
> I don't think your proposed change would work.  vardata->rel is the
> CTE/subquery scan rel in the outer query, and its tuples count is the
> CTE's output row count, not the base table's.  Using it would be
> equivalent to not converting at all, since get_variable_numdistinct()
> already computes -stadistinct * vardata->rel->tuples.  What we need
> here is the base table's rel in the subroot, which gives us the
> correct rowcount for interpreting the negative fraction.
>
> - Richard
>


^ permalink  raw  reply  [nested|flat] 4+ messages in thread

end of thread, other threads:[~2026-04-14 07:58 UTC | newest]

Thread overview: 4+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-04-13 01:18 Propagate stadistinct through GROUP BY/DISTINCT in subqueries and CTEs Richard Guo <[email protected]>
2026-04-13 03:27 ` wenhui qiu <[email protected]>
2026-04-14 06:11   ` Richard Guo <[email protected]>
2026-04-14 07:58     ` wenhui qiu <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox