public inbox for [email protected]help / color / mirror / Atom feed
Re: Unfortunate pushing down of expressions below sort 4+ messages / 4 participants [nested] [flat]
* Re: Unfortunate pushing down of expressions below sort @ 2026-02-06 14:37 Chengpeng Yan <[email protected]> 2026-02-07 12:46 ` Re: Unfortunate pushing down of expressions below sort Chengpeng Yan <[email protected]> 0 siblings, 1 reply; 4+ messages in thread From: Chengpeng Yan @ 2026-02-06 14:37 UTC (permalink / raw) To: Tom Lane <[email protected]>; Andres Freund <[email protected]>; +Cc: pgsql-hackers; David Rowley <[email protected]> Hi, I took a closer look at make_sort_input_target() in src/backend/optimizer/plan/planner.c. The current heuristics only defer targetlist expressions past a Sort when they are: * volatile, or * set-returning (due to SRF synchronization constraints), or * considered “expensive” (cost > 10 * cpu_operator_cost) and there is a LIMIT. Functions like repeat() and acldefault() have the default procost = 1, so they don’t meet the “expensive” threshold and therefore remain below the Sort, which is what leads to the tuple width inflation seen in these examples. Grouping behaves differently: make_group_input_target() unconditionally flattens non-grouping expressions to Vars, which is why repeat() ends up above the Aggregate node there. Tom mentioned the md5(widecol) counterexample, where evaluating the expression before the sort can actually reduce memory usage. The key distinction seems to be whether the expression depends solely on sort keys. The approach I’m experimenting with is to defer an expression only when all the Vars it depends on are sort keys. That gives the desired behavior in both cases: * repeat(i, 1000) ORDER BY i: i is the sort key, so we defer and keep the sort tuples narrow. * md5(widecol) ORDER BY id: widecol is not a sort key, so we keep the expression below the sort and avoid carrying the wide column. This seems to address the cases discussed in this thread and should be low-risk for the common case. I’m working on a patch along these lines; any thoughts? -- Best regards, Chengpeng Yan ^ permalink raw reply [nested|flat] 4+ messages in thread
* Re: Unfortunate pushing down of expressions below sort 2026-02-06 14:37 Re: Unfortunate pushing down of expressions below sort Chengpeng Yan <[email protected]> @ 2026-02-07 12:46 ` Chengpeng Yan <[email protected]> 2026-04-07 20:16 ` Re: Unfortunate pushing down of expressions below sort Tom Lane <[email protected]> 0 siblings, 1 reply; 4+ messages in thread From: Chengpeng Yan @ 2026-02-07 12:46 UTC (permalink / raw) To: Tom Lane <[email protected]>; Andres Freund <[email protected]>; +Cc: pgsql-hackers; David Rowley <[email protected]> Hi, Following up on the discussion below, I now have a patch. The patch extends make_sort_input_target() with a conservative rule: defer additional non-sort targetlist expressions past Sort only when doing so does not require carrying any additional Vars/PlaceHolderVars through Sort. This way, Sort input width never increases. This still allows cases like repeat(i::text, ...) ORDER BY i to be projected above Sort, while avoiding the md5(widecol) counterexample mentioned earlier, since such expressions are not deferred when they would force a wide non-sort column through Sort. One limitation remains: this can't help queries like 'SELECT repeat(i::text, ...) FROM t ORDER BY othercols;' where the output expression depends on Vars that are not sort keys. In that case we still have to carry i through the Sort to be able to compute the final targetlist, so the patch cannot avoid inflating Sort's input width (and may still evaluate repeat() before Sort, depending on the existing projection placement rules). The existing volatile/SRF/expensive behavior is unchanged (expensive exprs are still postponed once a post-sort projection is needed). I also added regression coverage (including an md5(widecol)-style case). Some existing EXPLAIN outputs (e.g. join/groupingsets) now show a Result above Sort, which is expected from postponing additional non-sort outputs. Patch attached. Comments welcome. -- Best regards, Chengpeng Yan Attachments: [application/octet-stream] v1-0001-planner-postpone-some-non-sort-output-expressions.patch (19.8K, 3-v1-0001-planner-postpone-some-non-sort-output-expressions.patch) download | inline diff: From ad0c5c23a6b07fe5376b0a3c042423dcf3e679d5 Mon Sep 17 00:00:00 2001 From: Chengpeng Yan <[email protected]> Date: Sat, 7 Feb 2026 20:21:27 +0800 Subject: [PATCH v1] planner: postpone some non-sort output expressions past Sort Allow make_sort_input_target() to postpone additional non-sort targetlist expressions when doing so doesn't require carrying any additional Vars or PlaceHolderVars through the Sort. This keeps the sort input no wider, and can avoid evaluating output expressions for rows that are never returned under LIMIT. --- src/backend/optimizer/plan/planner.c | 121 ++++++++++++++++++++- src/test/regress/expected/groupingsets.out | 99 +++++++++-------- src/test/regress/expected/join.out | 9 +- src/test/regress/expected/limit.out | 51 +++++++++ src/test/regress/expected/tuplesort.out | 42 +++++++ src/test/regress/sql/limit.sql | 26 +++++ src/test/regress/sql/tuplesort.sql | 20 ++++ 7 files changed, 315 insertions(+), 53 deletions(-) diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 757bdc7b1de..49d0568d01c 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -6444,6 +6444,11 @@ make_pathkeys_for_window(PlannerInfo *root, WindowClause *wc, * any volatile or set-returning expressions (since once we've put in a * projection at all, it won't cost any more to postpone more stuff). * + * We can also postpone some additional non-sort output expressions when doing + * so would not require carrying any additional Vars/PlaceHolderVars through + * the Sort. This keeps the sort input no wider, and can avoid evaluating + * output expressions for rows that are never returned under LIMIT. + * * Another issue that could potentially be considered here is that * evaluating tlist expressions could result in data that's either wider * or narrower than the input Vars, thus changing the volume of data that @@ -6484,12 +6489,14 @@ make_sort_input_target(PlannerInfo *root, PathTarget *input_target; int ncols; bool *col_is_srf; + bool *col_is_expensive; bool *postpone_col; bool have_srf; bool have_volatile; bool have_expensive; bool have_srf_sortcols; bool postpone_srfs; + bool have_safe_postpone; List *postponable_cols; List *postponable_vars; int i; @@ -6503,8 +6510,10 @@ make_sort_input_target(PlannerInfo *root, /* Inspect tlist and collect per-column information */ ncols = list_length(final_target->exprs); col_is_srf = (bool *) palloc0(ncols * sizeof(bool)); + col_is_expensive = (bool *) palloc0(ncols * sizeof(bool)); postpone_col = (bool *) palloc0(ncols * sizeof(bool)); have_srf = have_volatile = have_expensive = have_srf_sortcols = false; + have_safe_postpone = false; i = 0; foreach(lc, final_target->exprs) @@ -6556,7 +6565,7 @@ make_sort_input_target(PlannerInfo *root, */ if (cost.per_tuple > 10 * cpu_operator_cost) { - postpone_col[i] = true; + col_is_expensive[i] = true; have_expensive = true; } } @@ -6578,12 +6587,120 @@ make_sort_input_target(PlannerInfo *root, */ postpone_srfs = (have_srf && !have_srf_sortcols); + /* + * Keep the historical expensive-expression policy: once we're adding a + * post-sort projection for any reason, postpone all expensive + * expressions. + */ + if (postpone_srfs || have_volatile || + (have_expensive && (parse->limitCount || root->tuple_fraction > 0))) + { + i = 0; + foreach(lc, final_target->exprs) + { + if (col_is_expensive[i]) + postpone_col[i] = true; + i++; + } + } + + /* + * We can postpone some additional non-sort output expressions if doing so + * doesn't require carrying any extra Vars/PlaceHolderVars through the + * Sort. + */ + { + List *required_exprs = NIL; + List *base_postponable_cols = NIL; + List *base_postponable_vars; + List *required_before_sort; + int j; + + /* + * Build the set of expressions that will already be carried through + * the Sort: non-postponed columns, plus Vars/PHVs etc needed for + * already-postponed columns. + */ + j = 0; + foreach(lc, final_target->exprs) + { + Expr *expr = (Expr *) lfirst(lc); + + if (postpone_col[j] || (postpone_srfs && col_is_srf[j])) + base_postponable_cols = lappend(base_postponable_cols, expr); + else + required_exprs = lappend(required_exprs, expr); + + j++; + } + + base_postponable_vars = pull_var_clause((Node *) base_postponable_cols, + PVC_INCLUDE_AGGREGATES | + PVC_INCLUDE_WINDOWFUNCS | + PVC_INCLUDE_PLACEHOLDERS); + required_before_sort = list_union(required_exprs, base_postponable_vars); + + /* + * Mark any safe-to-postpone columns. We ignore simple Vars/Aggrefs/ + * WindowFuncs/PHVs because postponing them would not avoid any work. + */ + j = 0; + foreach(lc, final_target->exprs) + { + Expr *expr = (Expr *) lfirst(lc); + List *expr_vars; + ListCell *lc2; + bool safe = true; + + if (postpone_col[j] || (postpone_srfs && col_is_srf[j]) || + col_is_srf[j] || + get_pathtarget_sortgroupref(final_target, j) != 0 || + IsA(expr, Var) || + IsA(expr, Aggref) || + IsA(expr, WindowFunc) || + IsA(expr, PlaceHolderVar)) + { + j++; + continue; + } + + expr_vars = pull_var_clause((Node *) expr, + PVC_INCLUDE_AGGREGATES | + PVC_INCLUDE_WINDOWFUNCS | + PVC_INCLUDE_PLACEHOLDERS); + foreach(lc2, expr_vars) + { + if (!list_member(required_before_sort, lfirst(lc2))) + { + safe = false; + break; + } + } + + if (safe) + { + postpone_col[j] = true; + have_safe_postpone = true; + } + + list_free(expr_vars); + j++; + } + + /* clean up cruft */ + list_free(required_before_sort); + list_free(base_postponable_vars); + list_free(base_postponable_cols); + list_free(required_exprs); + } + /* * If we don't need a post-sort projection, just return final_target. */ if (!(postpone_srfs || have_volatile || (have_expensive && - (parse->limitCount || root->tuple_fraction > 0)))) + (parse->limitCount || root->tuple_fraction > 0)) || + have_safe_postpone)) return final_target; /* diff --git a/src/test/regress/expected/groupingsets.out b/src/test/regress/expected/groupingsets.out index 921017489c0..5d4759521a7 100644 --- a/src/test/regress/expected/groupingsets.out +++ b/src/test/regress/expected/groupingsets.out @@ -972,19 +972,20 @@ select v.c, (select count(*) from gstest2 group by () having v.c) explain (costs off) select v.c, (select count(*) from gstest2 group by () having v.c) from (values (false),(true)) v(c) order by v.c; - QUERY PLAN ------------------------------------------------------------ - Sort - Sort Key: "*VALUES*".column1 - -> Values Scan on "*VALUES*" - SubPlan expr_1 - -> Aggregate - Group Key: () - Filter: "*VALUES*".column1 - -> Result - One-Time Filter: "*VALUES*".column1 - -> Seq Scan on gstest2 -(10 rows) + QUERY PLAN +----------------------------------------------------- + Result + -> Sort + Sort Key: "*VALUES*".column1 + -> Values Scan on "*VALUES*" + SubPlan expr_1 + -> Aggregate + Group Key: () + Filter: "*VALUES*".column1 + -> Result + One-Time Filter: "*VALUES*".column1 + -> Seq Scan on gstest2 +(11 rows) -- test pushdown of non-degenerate HAVING clause that does not reference any -- columns that are nullable by grouping sets @@ -2398,24 +2399,26 @@ order by case when grouping((select t1.v from gstest5 t2 where id = t1.id)) = 0 then (select t1.v from gstest5 t2 where id = t1.id) else null end nulls first; - QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------ - Sort + QUERY PLAN +----------------------------------------------------------------------------------------------------------------------------------------------------------------------- + Result Output: (GROUPING((SubPlan expr_1))), ((SubPlan expr_3)), (CASE WHEN (GROUPING((SubPlan expr_2)) = 0) THEN ((SubPlan expr_3)) ELSE NULL::integer END), t1.v - Sort Key: (CASE WHEN (GROUPING((SubPlan expr_2)) = 0) THEN ((SubPlan expr_3)) ELSE NULL::integer END) NULLS FIRST - -> HashAggregate - Output: GROUPING((SubPlan expr_1)), ((SubPlan expr_3)), CASE WHEN (GROUPING((SubPlan expr_2)) = 0) THEN ((SubPlan expr_3)) ELSE NULL::integer END, t1.v - Hash Key: t1.v - Hash Key: (SubPlan expr_3) - -> Seq Scan on pg_temp.gstest5 t1 - Output: (SubPlan expr_3), t1.v, t1.id - SubPlan expr_3 - -> Bitmap Heap Scan on pg_temp.gstest5 t2 - Output: t1.v - Recheck Cond: (t2.id = t1.id) - -> Bitmap Index Scan on gstest5_pkey - Index Cond: (t2.id = t1.id) -(15 rows) + -> Sort + Output: ((SubPlan expr_3)), (CASE WHEN (GROUPING((SubPlan expr_2)) = 0) THEN ((SubPlan expr_3)) ELSE NULL::integer END), t1.v, (GROUPING((SubPlan expr_1))) + Sort Key: (CASE WHEN (GROUPING((SubPlan expr_2)) = 0) THEN ((SubPlan expr_3)) ELSE NULL::integer END) NULLS FIRST + -> HashAggregate + Output: ((SubPlan expr_3)), CASE WHEN (GROUPING((SubPlan expr_2)) = 0) THEN ((SubPlan expr_3)) ELSE NULL::integer END, t1.v, GROUPING((SubPlan expr_1)) + Hash Key: t1.v + Hash Key: (SubPlan expr_3) + -> Seq Scan on pg_temp.gstest5 t1 + Output: (SubPlan expr_3), t1.v, t1.id + SubPlan expr_3 + -> Bitmap Heap Scan on pg_temp.gstest5 t2 + Output: t1.v + Recheck Cond: (t2.id = t1.id) + -> Bitmap Index Scan on gstest5_pkey + Index Cond: (t2.id = t1.id) +(17 rows) select grouping((select t1.v from gstest5 t2 where id = t1.id)), (select t1.v from gstest5 t2 where id = t1.id) as s @@ -2448,24 +2451,26 @@ select grouping((select t1.v from gstest5 t2 where id = t1.id)), from gstest5 t1 group by grouping sets(v, s) order by o nulls first; - QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------ - Sort + QUERY PLAN +----------------------------------------------------------------------------------------------------------------------------------------------------------------------- + Result Output: (GROUPING((SubPlan expr_1))), ((SubPlan expr_3)), (CASE WHEN (GROUPING((SubPlan expr_2)) = 0) THEN ((SubPlan expr_3)) ELSE NULL::integer END), t1.v - Sort Key: (CASE WHEN (GROUPING((SubPlan expr_2)) = 0) THEN ((SubPlan expr_3)) ELSE NULL::integer END) NULLS FIRST - -> HashAggregate - Output: GROUPING((SubPlan expr_1)), ((SubPlan expr_3)), CASE WHEN (GROUPING((SubPlan expr_2)) = 0) THEN ((SubPlan expr_3)) ELSE NULL::integer END, t1.v - Hash Key: t1.v - Hash Key: (SubPlan expr_3) - -> Seq Scan on pg_temp.gstest5 t1 - Output: (SubPlan expr_3), t1.v, t1.id - SubPlan expr_3 - -> Bitmap Heap Scan on pg_temp.gstest5 t2 - Output: t1.v - Recheck Cond: (t2.id = t1.id) - -> Bitmap Index Scan on gstest5_pkey - Index Cond: (t2.id = t1.id) -(15 rows) + -> Sort + Output: ((SubPlan expr_3)), (CASE WHEN (GROUPING((SubPlan expr_2)) = 0) THEN ((SubPlan expr_3)) ELSE NULL::integer END), t1.v, (GROUPING((SubPlan expr_1))) + Sort Key: (CASE WHEN (GROUPING((SubPlan expr_2)) = 0) THEN ((SubPlan expr_3)) ELSE NULL::integer END) NULLS FIRST + -> HashAggregate + Output: ((SubPlan expr_3)), CASE WHEN (GROUPING((SubPlan expr_2)) = 0) THEN ((SubPlan expr_3)) ELSE NULL::integer END, t1.v, GROUPING((SubPlan expr_1)) + Hash Key: t1.v + Hash Key: (SubPlan expr_3) + -> Seq Scan on pg_temp.gstest5 t1 + Output: (SubPlan expr_3), t1.v, t1.id + SubPlan expr_3 + -> Bitmap Heap Scan on pg_temp.gstest5 t2 + Output: t1.v + Recheck Cond: (t2.id = t1.id) + -> Bitmap Index Scan on gstest5_pkey + Index Cond: (t2.id = t1.id) +(17 rows) select grouping((select t1.v from gstest5 t2 where id = t1.id)), (select t1.v from gstest5 t2 where id = t1.id) as s, diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out index d05a0ca0373..d1cc951aae3 100644 --- a/src/test/regress/expected/join.out +++ b/src/test/regress/expected/join.out @@ -2810,12 +2810,13 @@ select count(*) from -> Merge Left Join Merge Cond: (x.thousand = y.unique2) Join Filter: ((x.twothousand = y.hundred) AND (x.fivethous = y.unique2)) - -> Sort - Sort Key: x.thousand, x.twothousand, x.fivethous - -> Seq Scan on tenk1 x + -> Result + -> Sort + Sort Key: x.thousand, x.twothousand, x.fivethous + -> Seq Scan on tenk1 x -> Materialize -> Index Scan using tenk1_unique2 on tenk1 y -(9 rows) +(10 rows) select count(*) from (select * from tenk1 x order by x.thousand, x.twothousand, x.fivethous) x diff --git a/src/test/regress/expected/limit.out b/src/test/regress/expected/limit.out index e3bcc680653..7e29f09c68f 100644 --- a/src/test/regress/expected/limit.out +++ b/src/test/regress/expected/limit.out @@ -555,6 +555,57 @@ select sum(tenthous) as s1, sum(tenthous) + random()*0 as s2 45020 | 45020 (3 rows) +-- +-- Postpone non-sort output expressions past Sort under LIMIT, when doing so +-- doesn't require carrying additional columns through the Sort. +-- +explain (verbose, costs off) +select repeat(g.i::text, 100) + from generate_series(1, 100) g(i) + order by g.i + limit 10; + QUERY PLAN +----------------------------------------------------------------- + Limit + Output: (repeat((i)::text, 100)), i + -> Result + Output: repeat((i)::text, 100), i + -> Sort + Output: i + Sort Key: g.i + -> Function Scan on pg_catalog.generate_series g + Output: i + Function Call: generate_series(1, 100) +(10 rows) + +-- +-- Don't postpone if that would require carrying a wide column through +-- the Sort. Use MATERIALIZED to prevent inlining. +-- +explain (verbose, costs off) +with s(i, wide) as materialized ( + select i, repeat(i::text, 100) as wide + from generate_series(1, 100) g(i) +) +select md5(wide) + from s + order by i + limit 10; + QUERY PLAN +------------------------------------------------------- + Limit + Output: (md5(s.wide)), s.i + CTE s + -> Function Scan on pg_catalog.generate_series g + Output: g.i, repeat((g.i)::text, 100) + Function Call: generate_series(1, 100) + -> Sort + Output: (md5(s.wide)), s.i + Sort Key: s.i + -> CTE Scan on s + Output: md5(s.wide), s.i +(11 rows) + -- -- FETCH FIRST -- Check the WITH TIES clause diff --git a/src/test/regress/expected/tuplesort.out b/src/test/regress/expected/tuplesort.out index 6dd97e7427a..1c15c6b21fa 100644 --- a/src/test/regress/expected/tuplesort.out +++ b/src/test/regress/expected/tuplesort.out @@ -356,6 +356,48 @@ ORDER BY v.a DESC; aaaaaaaaaa | 1 (2 rows) +---- +-- Test postponing non-sort output expressions past Sort. +---- +EXPLAIN (VERBOSE, COSTS OFF) +SELECT repeat(g.i::text, 100) + FROM generate_series(1, 100) g(i) + ORDER BY g.i; + QUERY PLAN +----------------------------------------------------------- + Result + Output: repeat((i)::text, 100), i + -> Sort + Output: i + Sort Key: g.i + -> Function Scan on pg_catalog.generate_series g + Output: i + Function Call: generate_series(1, 100) +(8 rows) + +-- Don't postpone if that would require carrying a wide column through +-- the sort. Use MATERIALIZED to prevent inlining. +EXPLAIN (VERBOSE, COSTS OFF) +WITH s(i, wide) AS MATERIALIZED ( + SELECT i, repeat(i::text, 100) AS wide + FROM generate_series(1, 100) g(i) +) +SELECT md5(wide) + FROM s + ORDER BY i; + QUERY PLAN +------------------------------------------------------- + Sort + Output: (md5(s.wide)), s.i + Sort Key: s.i + CTE s + -> Function Scan on pg_catalog.generate_series g + Output: g.i, repeat((g.i)::text, 100) + Function Call: generate_series(1, 100) + -> CTE Scan on s + Output: md5(s.wide), s.i +(9 rows) + ---- -- test forward and backward scans for in-memory and disk based tuplesort ---- diff --git a/src/test/regress/sql/limit.sql b/src/test/regress/sql/limit.sql index 603910fe6d1..41323fbda1e 100644 --- a/src/test/regress/sql/limit.sql +++ b/src/test/regress/sql/limit.sql @@ -152,6 +152,32 @@ select sum(tenthous) as s1, sum(tenthous) + random()*0 as s2 select sum(tenthous) as s1, sum(tenthous) + random()*0 as s2 from tenk1 group by thousand order by thousand limit 3; +-- +-- Postpone non-sort output expressions past Sort under LIMIT, when doing so +-- doesn't require carrying additional columns through the Sort. +-- + +explain (verbose, costs off) +select repeat(g.i::text, 100) + from generate_series(1, 100) g(i) + order by g.i + limit 10; + +-- +-- Don't postpone if that would require carrying a wide column through +-- the Sort. Use MATERIALIZED to prevent inlining. +-- + +explain (verbose, costs off) +with s(i, wide) as materialized ( + select i, repeat(i::text, 100) as wide + from generate_series(1, 100) g(i) +) +select md5(wide) + from s + order by i + limit 10; + -- -- FETCH FIRST -- Check the WITH TIES clause diff --git a/src/test/regress/sql/tuplesort.sql b/src/test/regress/sql/tuplesort.sql index 8476e594e6c..d63ec576c73 100644 --- a/src/test/regress/sql/tuplesort.sql +++ b/src/test/regress/sql/tuplesort.sql @@ -155,6 +155,26 @@ SELECT LEFT(a,10),b FROM (VALUES(REPEAT('a', 512 * 1024),1),(REPEAT('b', 512 * 1024),2)) v(a,b) ORDER BY v.a DESC; +---- +-- Test postponing non-sort output expressions past Sort. +---- + +EXPLAIN (VERBOSE, COSTS OFF) +SELECT repeat(g.i::text, 100) + FROM generate_series(1, 100) g(i) + ORDER BY g.i; + +-- Don't postpone if that would require carrying a wide column through +-- the sort. Use MATERIALIZED to prevent inlining. +EXPLAIN (VERBOSE, COSTS OFF) +WITH s(i, wide) AS MATERIALIZED ( + SELECT i, repeat(i::text, 100) AS wide + FROM generate_series(1, 100) g(i) +) +SELECT md5(wide) + FROM s + ORDER BY i; + ---- -- test forward and backward scans for in-memory and disk based tuplesort ---- -- 2.50.1 (Apple Git-155) ^ permalink raw reply [nested|flat] 4+ messages in thread
* Re: Unfortunate pushing down of expressions below sort 2026-02-06 14:37 Re: Unfortunate pushing down of expressions below sort Chengpeng Yan <[email protected]> 2026-02-07 12:46 ` Re: Unfortunate pushing down of expressions below sort Chengpeng Yan <[email protected]> @ 2026-04-07 20:16 ` Tom Lane <[email protected]> 2026-04-07 21:01 ` Re: Unfortunate pushing down of expressions below sort Andres Freund <[email protected]> 0 siblings, 1 reply; 4+ messages in thread From: Tom Lane @ 2026-04-07 20:16 UTC (permalink / raw) To: Chengpeng Yan <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers; David Rowley <[email protected]> Chengpeng Yan <[email protected]> writes: > Following up on the discussion below, I now have a patch. > The patch extends make_sort_input_target() with a conservative rule: > defer additional non-sort targetlist expressions past Sort only when > doing so does not require carrying any additional Vars/PlaceHolderVars > through Sort. This way, Sort input width never increases. I spent some time thinking about this. One thing I think we need to keep in mind is that if we don't postpone an expression past Sort, and the user doesn't like that, she can easily rewrite the query to force it; as indeed Andres demonstrated at the start of this thread. But overriding an unwanted planner decision to postpone is harder. I think you can do it with SELECT * FROM (SELECT x,y,f(z) FROM ... OFFSET 0) ORDER BY whatever; but if you forget the OFFSET-0 optimization fence you may find f(z) getting evaluated after the sort anyway. And the fence might foreclose some other optimization you did want. Also, make_sort_input_target() has gone basically unchanged since 2016, without that many complaints. So I think we need to be pretty conservative about adding postponement choices that aren't forced by semantic requirements. The rule stated above seems pretty conservative, but either it's not conservative enough or you didn't implement it right, because the regression test changes show the v2 patch is very willing to create Result nodes where there were none before, even when there's no LIMIT and thus no reason to think we can save any expression evaluations. That extra plan node has nonzero cost that I don't think you're accounting for. It'll still be a win if enough data volume is removed from the Sort step, but I don't see any consideration of how much we're actually saving before deciding to add the projection step. So I think we need some sort of gating rule, whereby we only postpone these expressions if (a) there was already a reason to add a projection or (b) we can make some cost-based or at least heuristic estimate that says we'll cut the sort data volume significantly. Maybe (b) needs to interact with the existing heuristic about postponing expensive expressions, not sure. Independently of that, I don't especially like the changes in make_sort_input_target(). They seem rather inelegant and expensive (and underdocumented), as well as duplicative of other work already being done in the function. It may be time to tackle the unfinished work mentioned in the existing comments about avoiding redundant cost/width calculations ... regards, tom lane ^ permalink raw reply [nested|flat] 4+ messages in thread
* Re: Unfortunate pushing down of expressions below sort 2026-02-06 14:37 Re: Unfortunate pushing down of expressions below sort Chengpeng Yan <[email protected]> 2026-02-07 12:46 ` Re: Unfortunate pushing down of expressions below sort Chengpeng Yan <[email protected]> 2026-04-07 20:16 ` Re: Unfortunate pushing down of expressions below sort Tom Lane <[email protected]> @ 2026-04-07 21:01 ` Andres Freund <[email protected]> 0 siblings, 0 replies; 4+ messages in thread From: Andres Freund @ 2026-04-07 21:01 UTC (permalink / raw) To: Tom Lane <[email protected]>; +Cc: Chengpeng Yan <[email protected]>; pgsql-hackers; David Rowley <[email protected]> Hi, On 2026-04-07 16:16:54 -0400, Tom Lane wrote: > That extra plan node has nonzero cost that I don't think you're > accounting for. It'll still be a win if enough data volume is removed > from the Sort step, but I don't see any consideration of how much > we're actually saving before deciding to add the projection step. A different way (compared to the heuristics you were subsequently talking about) to deal with that could be to add projection support to sort's output, I guess. That would have a considerably smaller cost than a separate node. It'd add a tiny bit of cost (an if checking for projection) for the rather common of a sort without a projection. Although that'd not be too hard to get rid of by generating specialized ExecSort() variants for the different cases, like now done for ExecSeqScan, if it turned out to matter. However it'd probably would still not be guaranteed faster than evaluating below the sort, due to a) startup cost of the projection machinery b) potentially needing to deform during the expression's inputs during the projection in the sort (or above) I don't know if there are realistic cases where b) matters? You'd have to have nodes above the sort that would require the projection to happen at the sort (or an intermediary level) but then filter out most rows to avoid needing to deform anyway? In all the realistic cases I can think of the expression evaluation should then be pulled up above that filtering out of rows? I don't know if a) is really ever significant enough to matter compared to the cost of sorting. It sure shows up in a sequential scan, but that has an order of magnitude or three lower per tuple cost. Are there cases where something like Chengpeng's logic would still trigger a result node being injected, if sort had projection capability? > So I think we need some sort of gating rule, whereby we only postpone > these expressions if (a) there was already a reason to add a > projection or (b) we can make some cost-based or at least heuristic > estimate that says we'll cut the sort data volume significantly. This reminds me: The heuristics around the cost of expression evaluation seem like they could be improved a fair bit by taking into account the cost of having to deform the input columns. There's a huge difference between func1(col1), func2(col1) and func1(col1), func2(col2) and func1(col30) Unless func* are particularly expensive, the cost will be increasingly dominated by tuple deforming. I think this is one thing that sometimes contributes to us wrongly choosing sequential scans with a qual over an index scans that needs to evaluate far fewer rows, because we just take the operator costs into account, not the tuple deforming it requires. Greetings, Andres Freund ^ permalink raw reply [nested|flat] 4+ messages in thread
end of thread, other threads:[~2026-04-07 21:01 UTC | newest] Thread overview: 4+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2026-02-06 14:37 Re: Unfortunate pushing down of expressions below sort Chengpeng Yan <[email protected]> 2026-02-07 12:46 ` Chengpeng Yan <[email protected]> 2026-04-07 20:16 ` Tom Lane <[email protected]> 2026-04-07 21:01 ` Andres Freund <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox