public inbox for [email protected]  
help / color / mirror / Atom feed
Re: Unfortunate pushing down of expressions below sort
4+ messages / 4 participants
[nested] [flat]

* Re: Unfortunate pushing down of expressions below sort
@ 2026-02-06 14:37 Chengpeng Yan <[email protected]>
  2026-02-07 12:46 ` Re: Unfortunate pushing down of expressions below sort Chengpeng Yan <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Chengpeng Yan @ 2026-02-06 14:37 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; Andres Freund <[email protected]>; +Cc: pgsql-hackers; David Rowley <[email protected]>

Hi,

I took a closer look at make_sort_input_target() in
src/backend/optimizer/plan/planner.c.

The current heuristics only defer targetlist expressions past a Sort
when they are: 
* volatile, or 
* set-returning (due to SRF synchronization constraints), or 
* considered “expensive” (cost > 10 * cpu_operator_cost) and there is a
LIMIT.

Functions like repeat() and acldefault() have the default procost = 1,
so they don’t meet the “expensive” threshold and therefore remain below
the Sort, which is what leads to the tuple width inflation seen in these
examples.

Grouping behaves differently: make_group_input_target() unconditionally
flattens non-grouping expressions to Vars, which is why repeat() ends up
above the Aggregate node there.

Tom mentioned the md5(widecol) counterexample, where evaluating the
expression before the sort can actually reduce memory usage. The key
distinction seems to be whether the expression depends solely on sort
keys.

The approach I’m experimenting with is to defer an expression only when
all the Vars it depends on are sort keys. That gives the desired
behavior in both cases: 
* repeat(i, 1000) ORDER BY i: i is the sort key, so we defer and keep
the sort tuples narrow. 
* md5(widecol) ORDER BY id: widecol is not a sort key, so we keep the
expression below the sort and avoid carrying the wide column.

This seems to address the cases discussed in this thread and should be
low-risk for the common case.

I’m working on a patch along these lines; any thoughts?

--
Best regards,
Chengpeng Yan



^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: Unfortunate pushing down of expressions below sort
  2026-02-06 14:37 Re: Unfortunate pushing down of expressions below sort Chengpeng Yan <[email protected]>
@ 2026-02-07 12:46 ` Chengpeng Yan <[email protected]>
  2026-04-07 20:16   ` Re: Unfortunate pushing down of expressions below sort Tom Lane <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Chengpeng Yan @ 2026-02-07 12:46 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; Andres Freund <[email protected]>; +Cc: pgsql-hackers; David Rowley <[email protected]>

Hi,

Following up on the discussion below, I now have a patch.

The patch extends make_sort_input_target() with a conservative rule:
defer additional non-sort targetlist expressions past Sort only when
doing so does not require carrying any additional Vars/PlaceHolderVars
through Sort. This way, Sort input width never increases.

This still allows cases like repeat(i::text, ...) ORDER BY i to be
projected above Sort, while avoiding the md5(widecol) counterexample
mentioned earlier, since such expressions are not deferred when they
would force a wide non-sort column through Sort.

One limitation remains: this can't help queries like
'SELECT repeat(i::text, ...) FROM t ORDER BY othercols;'
where the output expression depends on Vars that are not sort keys. In
that case we still have to carry i through the Sort to be able to
compute the final targetlist, so the patch cannot avoid inflating Sort's
input width (and may still evaluate repeat() before Sort, depending on
the existing projection placement rules).

The existing volatile/SRF/expensive behavior is unchanged (expensive
exprs are still postponed once a post-sort projection is needed).

I also added regression coverage (including an md5(widecol)-style case).
Some existing EXPLAIN outputs (e.g. join/groupingsets) now show a Result
above Sort, which is expected from postponing additional non-sort
outputs.

Patch attached. Comments welcome.

--
Best regards,
Chengpeng Yan


Attachments:

  [application/octet-stream] v1-0001-planner-postpone-some-non-sort-output-expressions.patch (19.8K, 3-v1-0001-planner-postpone-some-non-sort-output-expressions.patch)
  download | inline diff:
From ad0c5c23a6b07fe5376b0a3c042423dcf3e679d5 Mon Sep 17 00:00:00 2001
From: Chengpeng Yan <[email protected]>
Date: Sat, 7 Feb 2026 20:21:27 +0800
Subject: [PATCH v1] planner: postpone some non-sort output expressions past
 Sort

Allow make_sort_input_target() to postpone additional non-sort
targetlist expressions when doing so doesn't require carrying any
additional Vars or PlaceHolderVars through the Sort.  This keeps the
sort input no wider, and can avoid evaluating output expressions for
rows that are never returned under LIMIT.
---
 src/backend/optimizer/plan/planner.c       | 121 ++++++++++++++++++++-
 src/test/regress/expected/groupingsets.out |  99 +++++++++--------
 src/test/regress/expected/join.out         |   9 +-
 src/test/regress/expected/limit.out        |  51 +++++++++
 src/test/regress/expected/tuplesort.out    |  42 +++++++
 src/test/regress/sql/limit.sql             |  26 +++++
 src/test/regress/sql/tuplesort.sql         |  20 ++++
 7 files changed, 315 insertions(+), 53 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 757bdc7b1de..49d0568d01c 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6444,6 +6444,11 @@ make_pathkeys_for_window(PlannerInfo *root, WindowClause *wc,
  * any volatile or set-returning expressions (since once we've put in a
  * projection at all, it won't cost any more to postpone more stuff).
  *
+ * We can also postpone some additional non-sort output expressions when doing
+ * so would not require carrying any additional Vars/PlaceHolderVars through
+ * the Sort.  This keeps the sort input no wider, and can avoid evaluating
+ * output expressions for rows that are never returned under LIMIT.
+ *
  * Another issue that could potentially be considered here is that
  * evaluating tlist expressions could result in data that's either wider
  * or narrower than the input Vars, thus changing the volume of data that
@@ -6484,12 +6489,14 @@ make_sort_input_target(PlannerInfo *root,
 	PathTarget *input_target;
 	int			ncols;
 	bool	   *col_is_srf;
+	bool	   *col_is_expensive;
 	bool	   *postpone_col;
 	bool		have_srf;
 	bool		have_volatile;
 	bool		have_expensive;
 	bool		have_srf_sortcols;
 	bool		postpone_srfs;
+	bool		have_safe_postpone;
 	List	   *postponable_cols;
 	List	   *postponable_vars;
 	int			i;
@@ -6503,8 +6510,10 @@ make_sort_input_target(PlannerInfo *root,
 	/* Inspect tlist and collect per-column information */
 	ncols = list_length(final_target->exprs);
 	col_is_srf = (bool *) palloc0(ncols * sizeof(bool));
+	col_is_expensive = (bool *) palloc0(ncols * sizeof(bool));
 	postpone_col = (bool *) palloc0(ncols * sizeof(bool));
 	have_srf = have_volatile = have_expensive = have_srf_sortcols = false;
+	have_safe_postpone = false;
 
 	i = 0;
 	foreach(lc, final_target->exprs)
@@ -6556,7 +6565,7 @@ make_sort_input_target(PlannerInfo *root,
 				 */
 				if (cost.per_tuple > 10 * cpu_operator_cost)
 				{
-					postpone_col[i] = true;
+					col_is_expensive[i] = true;
 					have_expensive = true;
 				}
 			}
@@ -6578,12 +6587,120 @@ make_sort_input_target(PlannerInfo *root,
 	 */
 	postpone_srfs = (have_srf && !have_srf_sortcols);
 
+	/*
+	 * Keep the historical expensive-expression policy: once we're adding a
+	 * post-sort projection for any reason, postpone all expensive
+	 * expressions.
+	 */
+	if (postpone_srfs || have_volatile ||
+		(have_expensive && (parse->limitCount || root->tuple_fraction > 0)))
+	{
+		i = 0;
+		foreach(lc, final_target->exprs)
+		{
+			if (col_is_expensive[i])
+				postpone_col[i] = true;
+			i++;
+		}
+	}
+
+	/*
+	 * We can postpone some additional non-sort output expressions if doing so
+	 * doesn't require carrying any extra Vars/PlaceHolderVars through the
+	 * Sort.
+	 */
+	{
+		List	   *required_exprs = NIL;
+		List	   *base_postponable_cols = NIL;
+		List	   *base_postponable_vars;
+		List	   *required_before_sort;
+		int			j;
+
+		/*
+		 * Build the set of expressions that will already be carried through
+		 * the Sort: non-postponed columns, plus Vars/PHVs etc needed for
+		 * already-postponed columns.
+		 */
+		j = 0;
+		foreach(lc, final_target->exprs)
+		{
+			Expr	   *expr = (Expr *) lfirst(lc);
+
+			if (postpone_col[j] || (postpone_srfs && col_is_srf[j]))
+				base_postponable_cols = lappend(base_postponable_cols, expr);
+			else
+				required_exprs = lappend(required_exprs, expr);
+
+			j++;
+		}
+
+		base_postponable_vars = pull_var_clause((Node *) base_postponable_cols,
+												PVC_INCLUDE_AGGREGATES |
+												PVC_INCLUDE_WINDOWFUNCS |
+												PVC_INCLUDE_PLACEHOLDERS);
+		required_before_sort = list_union(required_exprs, base_postponable_vars);
+
+		/*
+		 * Mark any safe-to-postpone columns.  We ignore simple Vars/Aggrefs/
+		 * WindowFuncs/PHVs because postponing them would not avoid any work.
+		 */
+		j = 0;
+		foreach(lc, final_target->exprs)
+		{
+			Expr	   *expr = (Expr *) lfirst(lc);
+			List	   *expr_vars;
+			ListCell   *lc2;
+			bool		safe = true;
+
+			if (postpone_col[j] || (postpone_srfs && col_is_srf[j]) ||
+				col_is_srf[j] ||
+				get_pathtarget_sortgroupref(final_target, j) != 0 ||
+				IsA(expr, Var) ||
+				IsA(expr, Aggref) ||
+				IsA(expr, WindowFunc) ||
+				IsA(expr, PlaceHolderVar))
+			{
+				j++;
+				continue;
+			}
+
+			expr_vars = pull_var_clause((Node *) expr,
+										PVC_INCLUDE_AGGREGATES |
+										PVC_INCLUDE_WINDOWFUNCS |
+										PVC_INCLUDE_PLACEHOLDERS);
+			foreach(lc2, expr_vars)
+			{
+				if (!list_member(required_before_sort, lfirst(lc2)))
+				{
+					safe = false;
+					break;
+				}
+			}
+
+			if (safe)
+			{
+				postpone_col[j] = true;
+				have_safe_postpone = true;
+			}
+
+			list_free(expr_vars);
+			j++;
+		}
+
+		/* clean up cruft */
+		list_free(required_before_sort);
+		list_free(base_postponable_vars);
+		list_free(base_postponable_cols);
+		list_free(required_exprs);
+	}
+
 	/*
 	 * If we don't need a post-sort projection, just return final_target.
 	 */
 	if (!(postpone_srfs || have_volatile ||
 		  (have_expensive &&
-		   (parse->limitCount || root->tuple_fraction > 0))))
+		   (parse->limitCount || root->tuple_fraction > 0)) ||
+		  have_safe_postpone))
 		return final_target;
 
 	/*
diff --git a/src/test/regress/expected/groupingsets.out b/src/test/regress/expected/groupingsets.out
index 921017489c0..5d4759521a7 100644
--- a/src/test/regress/expected/groupingsets.out
+++ b/src/test/regress/expected/groupingsets.out
@@ -972,19 +972,20 @@ select v.c, (select count(*) from gstest2 group by () having v.c)
 explain (costs off)
   select v.c, (select count(*) from gstest2 group by () having v.c)
     from (values (false),(true)) v(c) order by v.c;
-                        QUERY PLAN                         
------------------------------------------------------------
- Sort
-   Sort Key: "*VALUES*".column1
-   ->  Values Scan on "*VALUES*"
-         SubPlan expr_1
-           ->  Aggregate
-                 Group Key: ()
-                 Filter: "*VALUES*".column1
-                 ->  Result
-                       One-Time Filter: "*VALUES*".column1
-                       ->  Seq Scan on gstest2
-(10 rows)
+                     QUERY PLAN                      
+-----------------------------------------------------
+ Result
+   ->  Sort
+         Sort Key: "*VALUES*".column1
+         ->  Values Scan on "*VALUES*"
+   SubPlan expr_1
+     ->  Aggregate
+           Group Key: ()
+           Filter: "*VALUES*".column1
+           ->  Result
+                 One-Time Filter: "*VALUES*".column1
+                 ->  Seq Scan on gstest2
+(11 rows)
 
 -- test pushdown of non-degenerate HAVING clause that does not reference any
 -- columns that are nullable by grouping sets
@@ -2398,24 +2399,26 @@ order by case when grouping((select t1.v from gstest5 t2 where id = t1.id)) = 0
               then (select t1.v from gstest5 t2 where id = t1.id)
               else null end
          nulls first;
-                                                                           QUERY PLAN                                                                            
------------------------------------------------------------------------------------------------------------------------------------------------------------------
- Sort
+                                                                              QUERY PLAN                                                                               
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Result
    Output: (GROUPING((SubPlan expr_1))), ((SubPlan expr_3)), (CASE WHEN (GROUPING((SubPlan expr_2)) = 0) THEN ((SubPlan expr_3)) ELSE NULL::integer END), t1.v
-   Sort Key: (CASE WHEN (GROUPING((SubPlan expr_2)) = 0) THEN ((SubPlan expr_3)) ELSE NULL::integer END) NULLS FIRST
-   ->  HashAggregate
-         Output: GROUPING((SubPlan expr_1)), ((SubPlan expr_3)), CASE WHEN (GROUPING((SubPlan expr_2)) = 0) THEN ((SubPlan expr_3)) ELSE NULL::integer END, t1.v
-         Hash Key: t1.v
-         Hash Key: (SubPlan expr_3)
-         ->  Seq Scan on pg_temp.gstest5 t1
-               Output: (SubPlan expr_3), t1.v, t1.id
-               SubPlan expr_3
-                 ->  Bitmap Heap Scan on pg_temp.gstest5 t2
-                       Output: t1.v
-                       Recheck Cond: (t2.id = t1.id)
-                       ->  Bitmap Index Scan on gstest5_pkey
-                             Index Cond: (t2.id = t1.id)
-(15 rows)
+   ->  Sort
+         Output: ((SubPlan expr_3)), (CASE WHEN (GROUPING((SubPlan expr_2)) = 0) THEN ((SubPlan expr_3)) ELSE NULL::integer END), t1.v, (GROUPING((SubPlan expr_1)))
+         Sort Key: (CASE WHEN (GROUPING((SubPlan expr_2)) = 0) THEN ((SubPlan expr_3)) ELSE NULL::integer END) NULLS FIRST
+         ->  HashAggregate
+               Output: ((SubPlan expr_3)), CASE WHEN (GROUPING((SubPlan expr_2)) = 0) THEN ((SubPlan expr_3)) ELSE NULL::integer END, t1.v, GROUPING((SubPlan expr_1))
+               Hash Key: t1.v
+               Hash Key: (SubPlan expr_3)
+               ->  Seq Scan on pg_temp.gstest5 t1
+                     Output: (SubPlan expr_3), t1.v, t1.id
+                     SubPlan expr_3
+                       ->  Bitmap Heap Scan on pg_temp.gstest5 t2
+                             Output: t1.v
+                             Recheck Cond: (t2.id = t1.id)
+                             ->  Bitmap Index Scan on gstest5_pkey
+                                   Index Cond: (t2.id = t1.id)
+(17 rows)
 
 select grouping((select t1.v from gstest5 t2 where id = t1.id)),
        (select t1.v from gstest5 t2 where id = t1.id) as s
@@ -2448,24 +2451,26 @@ select grouping((select t1.v from gstest5 t2 where id = t1.id)),
 from gstest5 t1
 group by grouping sets(v, s)
 order by o nulls first;
-                                                                           QUERY PLAN                                                                            
------------------------------------------------------------------------------------------------------------------------------------------------------------------
- Sort
+                                                                              QUERY PLAN                                                                               
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Result
    Output: (GROUPING((SubPlan expr_1))), ((SubPlan expr_3)), (CASE WHEN (GROUPING((SubPlan expr_2)) = 0) THEN ((SubPlan expr_3)) ELSE NULL::integer END), t1.v
-   Sort Key: (CASE WHEN (GROUPING((SubPlan expr_2)) = 0) THEN ((SubPlan expr_3)) ELSE NULL::integer END) NULLS FIRST
-   ->  HashAggregate
-         Output: GROUPING((SubPlan expr_1)), ((SubPlan expr_3)), CASE WHEN (GROUPING((SubPlan expr_2)) = 0) THEN ((SubPlan expr_3)) ELSE NULL::integer END, t1.v
-         Hash Key: t1.v
-         Hash Key: (SubPlan expr_3)
-         ->  Seq Scan on pg_temp.gstest5 t1
-               Output: (SubPlan expr_3), t1.v, t1.id
-               SubPlan expr_3
-                 ->  Bitmap Heap Scan on pg_temp.gstest5 t2
-                       Output: t1.v
-                       Recheck Cond: (t2.id = t1.id)
-                       ->  Bitmap Index Scan on gstest5_pkey
-                             Index Cond: (t2.id = t1.id)
-(15 rows)
+   ->  Sort
+         Output: ((SubPlan expr_3)), (CASE WHEN (GROUPING((SubPlan expr_2)) = 0) THEN ((SubPlan expr_3)) ELSE NULL::integer END), t1.v, (GROUPING((SubPlan expr_1)))
+         Sort Key: (CASE WHEN (GROUPING((SubPlan expr_2)) = 0) THEN ((SubPlan expr_3)) ELSE NULL::integer END) NULLS FIRST
+         ->  HashAggregate
+               Output: ((SubPlan expr_3)), CASE WHEN (GROUPING((SubPlan expr_2)) = 0) THEN ((SubPlan expr_3)) ELSE NULL::integer END, t1.v, GROUPING((SubPlan expr_1))
+               Hash Key: t1.v
+               Hash Key: (SubPlan expr_3)
+               ->  Seq Scan on pg_temp.gstest5 t1
+                     Output: (SubPlan expr_3), t1.v, t1.id
+                     SubPlan expr_3
+                       ->  Bitmap Heap Scan on pg_temp.gstest5 t2
+                             Output: t1.v
+                             Recheck Cond: (t2.id = t1.id)
+                             ->  Bitmap Index Scan on gstest5_pkey
+                                   Index Cond: (t2.id = t1.id)
+(17 rows)
 
 select grouping((select t1.v from gstest5 t2 where id = t1.id)),
        (select t1.v from gstest5 t2 where id = t1.id) as s,
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index d05a0ca0373..d1cc951aae3 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2810,12 +2810,13 @@ select count(*) from
    ->  Merge Left Join
          Merge Cond: (x.thousand = y.unique2)
          Join Filter: ((x.twothousand = y.hundred) AND (x.fivethous = y.unique2))
-         ->  Sort
-               Sort Key: x.thousand, x.twothousand, x.fivethous
-               ->  Seq Scan on tenk1 x
+         ->  Result
+               ->  Sort
+                     Sort Key: x.thousand, x.twothousand, x.fivethous
+                     ->  Seq Scan on tenk1 x
          ->  Materialize
                ->  Index Scan using tenk1_unique2 on tenk1 y
-(9 rows)
+(10 rows)
 
 select count(*) from
   (select * from tenk1 x order by x.thousand, x.twothousand, x.fivethous) x
diff --git a/src/test/regress/expected/limit.out b/src/test/regress/expected/limit.out
index e3bcc680653..7e29f09c68f 100644
--- a/src/test/regress/expected/limit.out
+++ b/src/test/regress/expected/limit.out
@@ -555,6 +555,57 @@ select sum(tenthous) as s1, sum(tenthous) + random()*0 as s2
  45020 | 45020
 (3 rows)
 
+--
+-- Postpone non-sort output expressions past Sort under LIMIT, when doing so
+-- doesn't require carrying additional columns through the Sort.
+--
+explain (verbose, costs off)
+select repeat(g.i::text, 100)
+  from generate_series(1, 100) g(i)
+  order by g.i
+  limit 10;
+                           QUERY PLAN                            
+-----------------------------------------------------------------
+ Limit
+   Output: (repeat((i)::text, 100)), i
+   ->  Result
+         Output: repeat((i)::text, 100), i
+         ->  Sort
+               Output: i
+               Sort Key: g.i
+               ->  Function Scan on pg_catalog.generate_series g
+                     Output: i
+                     Function Call: generate_series(1, 100)
+(10 rows)
+
+--
+-- Don't postpone if that would require carrying a wide column through
+-- the Sort.  Use MATERIALIZED to prevent inlining.
+--
+explain (verbose, costs off)
+with s(i, wide) as materialized (
+  select i, repeat(i::text, 100) as wide
+    from generate_series(1, 100) g(i)
+)
+select md5(wide)
+  from s
+  order by i
+  limit 10;
+                      QUERY PLAN                       
+-------------------------------------------------------
+ Limit
+   Output: (md5(s.wide)), s.i
+   CTE s
+     ->  Function Scan on pg_catalog.generate_series g
+           Output: g.i, repeat((g.i)::text, 100)
+           Function Call: generate_series(1, 100)
+   ->  Sort
+         Output: (md5(s.wide)), s.i
+         Sort Key: s.i
+         ->  CTE Scan on s
+               Output: md5(s.wide), s.i
+(11 rows)
+
 --
 -- FETCH FIRST
 -- Check the WITH TIES clause
diff --git a/src/test/regress/expected/tuplesort.out b/src/test/regress/expected/tuplesort.out
index 6dd97e7427a..1c15c6b21fa 100644
--- a/src/test/regress/expected/tuplesort.out
+++ b/src/test/regress/expected/tuplesort.out
@@ -356,6 +356,48 @@ ORDER BY v.a DESC;
  aaaaaaaaaa | 1
 (2 rows)
 
+----
+-- Test postponing non-sort output expressions past Sort.
+----
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT repeat(g.i::text, 100)
+  FROM generate_series(1, 100) g(i)
+  ORDER BY g.i;
+                        QUERY PLAN                         
+-----------------------------------------------------------
+ Result
+   Output: repeat((i)::text, 100), i
+   ->  Sort
+         Output: i
+         Sort Key: g.i
+         ->  Function Scan on pg_catalog.generate_series g
+               Output: i
+               Function Call: generate_series(1, 100)
+(8 rows)
+
+-- Don't postpone if that would require carrying a wide column through
+-- the sort.  Use MATERIALIZED to prevent inlining.
+EXPLAIN (VERBOSE, COSTS OFF)
+WITH s(i, wide) AS MATERIALIZED (
+  SELECT i, repeat(i::text, 100) AS wide
+    FROM generate_series(1, 100) g(i)
+)
+SELECT md5(wide)
+  FROM s
+  ORDER BY i;
+                      QUERY PLAN                       
+-------------------------------------------------------
+ Sort
+   Output: (md5(s.wide)), s.i
+   Sort Key: s.i
+   CTE s
+     ->  Function Scan on pg_catalog.generate_series g
+           Output: g.i, repeat((g.i)::text, 100)
+           Function Call: generate_series(1, 100)
+   ->  CTE Scan on s
+         Output: md5(s.wide), s.i
+(9 rows)
+
 ----
 -- test forward and backward scans for in-memory and disk based tuplesort
 ----
diff --git a/src/test/regress/sql/limit.sql b/src/test/regress/sql/limit.sql
index 603910fe6d1..41323fbda1e 100644
--- a/src/test/regress/sql/limit.sql
+++ b/src/test/regress/sql/limit.sql
@@ -152,6 +152,32 @@ select sum(tenthous) as s1, sum(tenthous) + random()*0 as s2
 select sum(tenthous) as s1, sum(tenthous) + random()*0 as s2
   from tenk1 group by thousand order by thousand limit 3;
 
+--
+-- Postpone non-sort output expressions past Sort under LIMIT, when doing so
+-- doesn't require carrying additional columns through the Sort.
+--
+
+explain (verbose, costs off)
+select repeat(g.i::text, 100)
+  from generate_series(1, 100) g(i)
+  order by g.i
+  limit 10;
+
+--
+-- Don't postpone if that would require carrying a wide column through
+-- the Sort.  Use MATERIALIZED to prevent inlining.
+--
+
+explain (verbose, costs off)
+with s(i, wide) as materialized (
+  select i, repeat(i::text, 100) as wide
+    from generate_series(1, 100) g(i)
+)
+select md5(wide)
+  from s
+  order by i
+  limit 10;
+
 --
 -- FETCH FIRST
 -- Check the WITH TIES clause
diff --git a/src/test/regress/sql/tuplesort.sql b/src/test/regress/sql/tuplesort.sql
index 8476e594e6c..d63ec576c73 100644
--- a/src/test/regress/sql/tuplesort.sql
+++ b/src/test/regress/sql/tuplesort.sql
@@ -155,6 +155,26 @@ SELECT LEFT(a,10),b FROM
     (VALUES(REPEAT('a', 512 * 1024),1),(REPEAT('b', 512 * 1024),2)) v(a,b)
 ORDER BY v.a DESC;
 
+----
+-- Test postponing non-sort output expressions past Sort.
+----
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT repeat(g.i::text, 100)
+  FROM generate_series(1, 100) g(i)
+  ORDER BY g.i;
+
+-- Don't postpone if that would require carrying a wide column through
+-- the sort.  Use MATERIALIZED to prevent inlining.
+EXPLAIN (VERBOSE, COSTS OFF)
+WITH s(i, wide) AS MATERIALIZED (
+  SELECT i, repeat(i::text, 100) AS wide
+    FROM generate_series(1, 100) g(i)
+)
+SELECT md5(wide)
+  FROM s
+  ORDER BY i;
+
 ----
 -- test forward and backward scans for in-memory and disk based tuplesort
 ----
-- 
2.50.1 (Apple Git-155)



^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: Unfortunate pushing down of expressions below sort
  2026-02-06 14:37 Re: Unfortunate pushing down of expressions below sort Chengpeng Yan <[email protected]>
  2026-02-07 12:46 ` Re: Unfortunate pushing down of expressions below sort Chengpeng Yan <[email protected]>
@ 2026-04-07 20:16   ` Tom Lane <[email protected]>
  2026-04-07 21:01     ` Re: Unfortunate pushing down of expressions below sort Andres Freund <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Tom Lane @ 2026-04-07 20:16 UTC (permalink / raw)
  To: Chengpeng Yan <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers; David Rowley <[email protected]>

Chengpeng Yan <[email protected]> writes:
> Following up on the discussion below, I now have a patch.

> The patch extends make_sort_input_target() with a conservative rule:
> defer additional non-sort targetlist expressions past Sort only when
> doing so does not require carrying any additional Vars/PlaceHolderVars
> through Sort. This way, Sort input width never increases.

I spent some time thinking about this.

One thing I think we need to keep in mind is that if we don't postpone
an expression past Sort, and the user doesn't like that, she can
easily rewrite the query to force it; as indeed Andres demonstrated
at the start of this thread.  But overriding an unwanted planner
decision to postpone is harder.  I think you can do it with

SELECT * FROM (SELECT x,y,f(z) FROM ... OFFSET 0) ORDER BY whatever;

but if you forget the OFFSET-0 optimization fence you may find
f(z) getting evaluated after the sort anyway.  And the fence might
foreclose some other optimization you did want.

Also, make_sort_input_target() has gone basically unchanged since
2016, without that many complaints.  So I think we need to be pretty
conservative about adding postponement choices that aren't forced by
semantic requirements.

The rule stated above seems pretty conservative, but either it's not
conservative enough or you didn't implement it right, because the
regression test changes show the v2 patch is very willing to create
Result nodes where there were none before, even when there's no LIMIT
and thus no reason to think we can save any expression evaluations.
That extra plan node has nonzero cost that I don't think you're
accounting for.  It'll still be a win if enough data volume is removed
from the Sort step, but I don't see any consideration of how much
we're actually saving before deciding to add the projection step.

So I think we need some sort of gating rule, whereby we only postpone
these expressions if (a) there was already a reason to add a
projection or (b) we can make some cost-based or at least heuristic
estimate that says we'll cut the sort data volume significantly.
Maybe (b) needs to interact with the existing heuristic about
postponing expensive expressions, not sure.

Independently of that, I don't especially like the changes in
make_sort_input_target().  They seem rather inelegant and expensive
(and underdocumented), as well as duplicative of other work already
being done in the function.  It may be time to tackle the unfinished
work mentioned in the existing comments about avoiding redundant
cost/width calculations ...

			regards, tom lane





^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: Unfortunate pushing down of expressions below sort
  2026-02-06 14:37 Re: Unfortunate pushing down of expressions below sort Chengpeng Yan <[email protected]>
  2026-02-07 12:46 ` Re: Unfortunate pushing down of expressions below sort Chengpeng Yan <[email protected]>
  2026-04-07 20:16   ` Re: Unfortunate pushing down of expressions below sort Tom Lane <[email protected]>
@ 2026-04-07 21:01     ` Andres Freund <[email protected]>
  0 siblings, 0 replies; 4+ messages in thread

From: Andres Freund @ 2026-04-07 21:01 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Chengpeng Yan <[email protected]>; pgsql-hackers; David Rowley <[email protected]>

Hi,

On 2026-04-07 16:16:54 -0400, Tom Lane wrote:
> That extra plan node has nonzero cost that I don't think you're
> accounting for.  It'll still be a win if enough data volume is removed
> from the Sort step, but I don't see any consideration of how much
> we're actually saving before deciding to add the projection step.

A different way (compared to the heuristics you were subsequently talking
about) to deal with that could be to add projection support to sort's output,
I guess. That would have a considerably smaller cost than a separate node.

It'd add a tiny bit of cost (an if checking for projection) for the rather
common of a sort without a projection.  Although that'd not be too hard to get
rid of by generating specialized ExecSort() variants for the different cases,
like now done for ExecSeqScan, if it turned out to matter.


However it'd probably would still not be guaranteed faster than evaluating
below the sort, due to
a) startup cost of the projection machinery
b) potentially needing to deform during the expression's inputs during the
   projection in the sort (or above)

I don't know if there are realistic cases where b) matters?  You'd have to
have nodes above the sort that would require the projection to happen at the
sort (or an intermediary level) but then filter out most rows to avoid needing
to deform anyway?  In all the realistic cases I can think of the expression
evaluation should then be pulled up above that filtering out of rows?

I don't know if a) is really ever significant enough to matter compared to the
cost of sorting. It sure shows up in a sequential scan, but that has an order
of magnitude or three lower per tuple cost.


Are there cases where something like Chengpeng's logic would still trigger a
result node being injected, if sort had projection capability?


> So I think we need some sort of gating rule, whereby we only postpone
> these expressions if (a) there was already a reason to add a
> projection or (b) we can make some cost-based or at least heuristic
> estimate that says we'll cut the sort data volume significantly.

This reminds me: The heuristics around the cost of expression evaluation seem
like they could be improved a fair bit by taking into account the cost of
having to deform the input columns.  There's a huge difference between

  func1(col1), func2(col1)
and
  func1(col1), func2(col2)
and
  func1(col30)

Unless func* are particularly expensive, the cost will be increasingly
dominated by tuple deforming.

I think this is one thing that sometimes contributes to us wrongly choosing
sequential scans with a qual over an index scans that needs to evaluate far
fewer rows, because we just take the operator costs into account, not the
tuple deforming it requires.

Greetings,

Andres Freund





^ permalink  raw  reply  [nested|flat] 4+ messages in thread


end of thread, other threads:[~2026-04-07 21:01 UTC | newest]

Thread overview: 4+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-02-06 14:37 Re: Unfortunate pushing down of expressions below sort Chengpeng Yan <[email protected]>
2026-02-07 12:46 ` Chengpeng Yan <[email protected]>
2026-04-07 20:16   ` Tom Lane <[email protected]>
2026-04-07 21:01     ` Andres Freund <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox