public inbox for [email protected]  
help / color / mirror / Atom feed
Clean up remove_rel_from_query() after self-join elimination commit
7+ messages / 4 participants
[nested] [flat]

* Clean up remove_rel_from_query() after self-join elimination commit
@ 2026-04-06 08:11  Richard Guo <[email protected]>
  0 siblings, 2 replies; 7+ messages in thread

From: Richard Guo @ 2026-04-06 08:11 UTC (permalink / raw)
  To: Pg Hackers <[email protected]>

While working on reusing remove_rel_from_query() for inner-join
removal, I noticed that the function is in pretty rough shape.  The
self-join elimination (SJE) commit grafted self-join removal onto
remove_rel_from_query(), which was originally written for left-join
removal only.  This resulted in several issues:

1. Comments throughout remove_rel_from_query() still assumed only
left-join removal, making the code misleading.  For example:

  * This is relevant in case the outer join we're deleting is nested inside
  * other outer joins:

2. This was called even for left-join removal, with subst=-1.  This is
pointless and confusing.

      ChangeVarNodesExtended((Node *) phv->phexpr, relid, subst, 0,
                             replace_relid_callback);

3. phinfo->ph_lateral was adjusted for left-join removal, which is
also confusing ...

       phinfo->ph_lateral = adjust_relid_set(phinfo->ph_lateral, relid, subst);

       /*
        * ph_lateral might contain rels mentioned in ph_eval_at after the
        * replacement, remove them.
        */
       phinfo->ph_lateral = bms_difference(phinfo->ph_lateral,
phinfo->ph_eval_at);

... since you can find this Assert just above:

       Assert(sjinfo == NULL || !bms_is_member(relid, phinfo->ph_lateral));

4. The comment about attr_needed reconstruction was in
remove_rel_from_query(), but the actual rebuild is performed by
the callers.

5. The EquivalenceClass processing in remove_rel_from_query():

    /*
     * Likewise remove references from EquivalenceClasses.
     */
    foreach(l, root->eq_classes)
    {
        EquivalenceClass *ec = (EquivalenceClass *) lfirst(l);

        if (bms_is_member(relid, ec->ec_relids) ||
            (sjinfo == NULL || bms_is_member(sjinfo->ojrelid, ec->ec_relids)))
            remove_rel_from_eclass(ec, sjinfo, relid, subst);
    }

The condition always evaluates to true for self-join elimination
(i.e., sjinfo == NULL), meaning every EquivalenceClass gets adjusted.
But this is redundant because the caller remove_self_join_rel()
already handles ECs via update_eclasses().

6. In remove_self_join_rel(), I notice this code:

    /* At last, replace varno in root targetlist and HAVING clause */
    ChangeVarNodesExtended((Node *) root->processed_tlist, toRemove->relid,
                           toKeep->relid, 0, replace_relid_callback);
    ChangeVarNodesExtended((Node *) root->processed_groupClause,
                           toRemove->relid, toKeep->relid, 0,
                           replace_relid_callback);

The comment mentions "HAVING clause", but neither processed_tlist nor
processed_groupClause has anything to do with the HAVING clause.
Furthermore, processed_groupClause contains SortGroupClause nodes,
which have no Var nodes to rewrite, so calling ChangeVarNodesExtended
on it is pointless.

The attached patch is an attempt to fix all these issues.  It also
aims to leave the code better structured for adding new types of join
removal (such as inner-join removal) in the future.

Thoughts?

- Richard


Attachments:

  [application/octet-stream] v1-0001-Clean-up-remove_rel_from_query-after-self-join-el.patch (26.4K, 2-v1-0001-Clean-up-remove_rel_from_query-after-self-join-el.patch)
  download | inline diff:
From 1fb2952afd4bacc6b10647494caa513fe6e58f28 Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Fri, 3 Apr 2026 18:51:01 +0900
Subject: [PATCH v1] Clean up remove_rel_from_query() after self-join
 elimination commit

The self-join elimination (SJE) commit grafted self-join removal onto
remove_rel_from_query(), which was originally written for left-join
removal only.  This resulted in several issues:

- Comments throughout remove_rel_from_query() still assumed only
left-join removal, making the code misleading.

- ChangeVarNodesExtended was called on phv->phexpr with subst=-1
during left-join removal, which is pointless and confusing since
any surviving PHV shouldn't reference the removed rel.

- phinfo->ph_lateral was adjusted for left-join removal, which is
unnecessary since the removed relid cannot appear in ph_lateral
for outer joins.

- The comment about attr_needed reconstruction was in
remove_rel_from_query(), but the actual rebuild is performed by
the callers.

- EquivalenceClass processing in remove_rel_from_query() is redundant
for self-join removal, since the caller (remove_self_join_rel)
already handles ECs via update_eclasses().

- In remove_self_join_rel(), ChangeVarNodesExtended was called on
root->processed_groupClause, which contains SortGroupClause nodes
that have no Var nodes to rewrite.  The accompanying comment
incorrectly mentioned "HAVING clause".

This patch fixes all these issues, clarifying the separation between
left-join removal and self-join elimination code paths within
remove_rel_from_query().  The resulting code is also better structured
for adding new types of join removal (such as inner-join removal) in
the future.
---
 src/backend/optimizer/plan/analyzejoins.c | 452 +++++++++++-----------
 1 file changed, 230 insertions(+), 222 deletions(-)

diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 12e9ed0d0c7..3a6689a1629 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -57,11 +57,13 @@ bool		enable_self_join_elimination;
 static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
 static void remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
 										  SpecialJoinInfo *sjinfo);
+static void remove_rel_from_query(PlannerInfo *root, int relid,
+								  int subst, SpecialJoinInfo *sjinfo,
+								  Relids joinrelids);
 static void remove_rel_from_restrictinfo(RestrictInfo *rinfo,
 										 int relid, int ojrelid);
 static void remove_rel_from_eclass(EquivalenceClass *ec,
-								   SpecialJoinInfo *sjinfo,
-								   int relid, int subst);
+								   int relid, int ojrelid);
 static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
 static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
 static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
@@ -312,24 +314,150 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
 	return false;
 }
 
+/*
+ * Remove the target relid and references to the target join from the
+ * planner's data structures, having determined that there is no need
+ * to include them in the query.
+ *
+ * We are not terribly thorough here.  We only bother to update parts of
+ * the planner's data structures that will actually be consulted later.
+ */
+static void
+remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+							  SpecialJoinInfo *sjinfo)
+{
+	RelOptInfo *rel = find_base_rel(root, relid);
+	int			ojrelid = sjinfo->ojrelid;
+	Relids		joinrelids;
+	Relids		join_plus_commute;
+	List	   *joininfos;
+	ListCell   *l;
+
+	/* Compute the relid set for the join we are considering */
+	joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
+	Assert(ojrelid != 0);
+	joinrelids = bms_add_member(joinrelids, ojrelid);
+
+	remove_rel_from_query(root, relid, -1, sjinfo, joinrelids);
+
+	/*
+	 * Remove any joinquals referencing the rel from the joininfo lists.
+	 *
+	 * In some cases, a joinqual has to be put back after deleting its
+	 * reference to the target rel.  This can occur for pseudoconstant and
+	 * outerjoin-delayed quals, which can get marked as requiring the rel in
+	 * order to force them to be evaluated at or above the join.  We can't
+	 * just discard them, though.  Only quals that logically belonged to the
+	 * outer join being discarded should be removed from the query.
+	 *
+	 * We might encounter a qual that is a clone of a deletable qual with some
+	 * outer-join relids added (see deconstruct_distribute_oj_quals).  To
+	 * ensure we get rid of such clones as well, add the relids of all OJs
+	 * commutable with this one to the set we test against for
+	 * pushed-down-ness.
+	 */
+	join_plus_commute = bms_union(joinrelids,
+								  sjinfo->commute_above_r);
+	join_plus_commute = bms_add_members(join_plus_commute,
+										sjinfo->commute_below_l);
+
+	/*
+	 * We must make a copy of the rel's old joininfo list before starting the
+	 * loop, because otherwise remove_join_clause_from_rels would destroy the
+	 * list while we're scanning it.
+	 */
+	joininfos = list_copy(rel->joininfo);
+	foreach(l, joininfos)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+
+		remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+
+		if (RINFO_IS_PUSHED_DOWN(rinfo, join_plus_commute))
+		{
+			/*
+			 * There might be references to relid or ojrelid in the
+			 * RestrictInfo's relid sets, as a consequence of PHVs having had
+			 * ph_eval_at sets that include those.  We already checked above
+			 * that any such PHV is safe (and updated its ph_eval_at), so we
+			 * can just drop those references.
+			 */
+			remove_rel_from_restrictinfo(rinfo, relid, ojrelid);
+
+			/*
+			 * Cross-check that the clause itself does not reference the
+			 * target rel or join.
+			 */
+#ifdef USE_ASSERT_CHECKING
+			{
+				Relids		clause_varnos = pull_varnos(root,
+														(Node *) rinfo->clause);
+
+				Assert(!bms_is_member(relid, clause_varnos));
+				Assert(!bms_is_member(ojrelid, clause_varnos));
+			}
+#endif
+			/* Now throw it back into the joininfo lists */
+			distribute_restrictinfo_to_rels(root, rinfo);
+		}
+	}
+
+	/*
+	 * There may be references to the rel in root->fkey_list, but if so,
+	 * match_foreign_keys_to_quals() will get rid of them.
+	 */
+
+	/*
+	 * Now remove the rel from the baserel array to prevent it from being
+	 * referenced again.  (We can't do this earlier because
+	 * remove_join_clause_from_rels will touch it.)
+	 */
+	root->simple_rel_array[relid] = NULL;
+	root->simple_rte_array[relid] = NULL;
+
+	/* And nuke the RelOptInfo, just in case there's another access path */
+	pfree(rel);
+
+	/*
+	 * Now repeat construction of attr_needed bits coming from all other
+	 * sources.
+	 */
+	rebuild_placeholder_attr_needed(root);
+	rebuild_joinclause_attr_needed(root);
+	rebuild_eclass_attr_needed(root);
+	rebuild_lateral_attr_needed(root);
+}
 
 /*
- * Remove the target rel->relid and references to the target join from the
+ * Remove the target relid and references to the target join from the
  * planner's data structures, having determined that there is no need
- * to include them in the query. Optionally replace them with subst if subst
- * is non-negative.
+ * to include them in the query.  Optionally replace references to the
+ * removed relid with subst if this is a self-join removal.
+ *
+ * This function serves as the common infrastructure for left-join removal
+ * and self-join elimination.  It is intentionally scoped to update only the
+ * shared planner data structures that are universally affected by relation
+ * removal.  Each specific caller remains responsible for updating any
+ * remaining data structures required by its unique removal logic.
  *
- * This function updates only parts needed for both left-join removal and
- * self-join removal.
+ * The specific type of removal being performed is dictated by the combination
+ * of the sjinfo and subst parameters.  A non-NULL sjinfo indicates left-join
+ * removal.  When sjinfo is NULL, a positive subst value indicates self-join
+ * elimination (where references are replaced with subst).
  */
 static void
-remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
+remove_rel_from_query(PlannerInfo *root, int relid,
 					  int subst, SpecialJoinInfo *sjinfo,
 					  Relids joinrelids)
 {
-	int			relid = rel->relid;
+	int			ojrelid = sjinfo ? sjinfo->ojrelid : 0;
 	Index		rti;
 	ListCell   *l;
+	bool		is_outer_join = (sjinfo != NULL);
+	bool		is_self_join = (!is_outer_join && subst > 0);
+
+	Assert(is_outer_join || is_self_join);
+	Assert(!is_outer_join || ojrelid > 0);
 
 	/*
 	 * Update all_baserels and related relid sets.
@@ -337,21 +465,20 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
 	root->all_baserels = adjust_relid_set(root->all_baserels, relid, subst);
 	root->all_query_rels = adjust_relid_set(root->all_query_rels, relid, subst);
 
-	if (sjinfo != NULL)
+	if (is_outer_join)
 	{
-		root->outer_join_rels = bms_del_member(root->outer_join_rels,
-											   sjinfo->ojrelid);
-		root->all_query_rels = bms_del_member(root->all_query_rels,
-											  sjinfo->ojrelid);
+		root->outer_join_rels = bms_del_member(root->outer_join_rels, ojrelid);
+		root->all_query_rels = bms_del_member(root->all_query_rels, ojrelid);
 	}
 
 	/*
 	 * Likewise remove references from SpecialJoinInfo data structures.
 	 *
-	 * This is relevant in case the outer join we're deleting is nested inside
-	 * other outer joins: the upper joins' relid sets have to be adjusted. The
-	 * RHS of the target outer join will be made empty here, but that's OK
-	 * since caller will delete that SpecialJoinInfo entirely.
+	 * This is relevant in case the relation we're deleting is part of the
+	 * relid sets of special joins: those sets have to be adjusted.  If we are
+	 * removing an outer join, the RHS of the target outer join will be made
+	 * empty here, but that's OK since the caller will delete that
+	 * SpecialJoinInfo entirely.
 	 */
 	foreach(l, root->join_info_list)
 	{
@@ -367,39 +494,32 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
 		sjinf->min_righthand = bms_copy(sjinf->min_righthand);
 		sjinf->syn_lefthand = bms_copy(sjinf->syn_lefthand);
 		sjinf->syn_righthand = bms_copy(sjinf->syn_righthand);
-		/* Now remove relid from the sets: */
+
+		/* Now adjust relid bit in the sets: */
 		sjinf->min_lefthand = adjust_relid_set(sjinf->min_lefthand, relid, subst);
 		sjinf->min_righthand = adjust_relid_set(sjinf->min_righthand, relid, subst);
 		sjinf->syn_lefthand = adjust_relid_set(sjinf->syn_lefthand, relid, subst);
 		sjinf->syn_righthand = adjust_relid_set(sjinf->syn_righthand, relid, subst);
 
-		if (sjinfo != NULL)
+		if (is_outer_join)
 		{
-			Assert(subst <= 0);
-
-			/* Remove sjinfo->ojrelid bits from the sets: */
-			sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand,
-												 sjinfo->ojrelid);
-			sjinf->min_righthand = bms_del_member(sjinf->min_righthand,
-												  sjinfo->ojrelid);
-			sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand,
-												 sjinfo->ojrelid);
-			sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand,
-												  sjinfo->ojrelid);
+			/* Remove ojrelid bit from the sets: */
+			sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
+			sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
+			sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
+			sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
 			/* relid cannot appear in these fields, but ojrelid can: */
-			sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l,
-													sjinfo->ojrelid);
-			sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r,
-													sjinfo->ojrelid);
-			sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l,
-													sjinfo->ojrelid);
-			sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r,
-													sjinfo->ojrelid);
+			sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
+			sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
+			sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
+			sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
 		}
 		else
 		{
-			Assert(subst > 0);
-
+			/*
+			 * For self-join removal, replace relid references in
+			 * semi_rhs_exprs.
+			 */
 			ChangeVarNodesExtended((Node *) sjinf->semi_rhs_exprs, relid, subst,
 								   0, replace_relid_callback);
 		}
@@ -407,8 +527,8 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
 
 	/*
 	 * Likewise remove references from PlaceHolderVar data structures,
-	 * removing any no-longer-needed placeholders entirely.  We remove PHV
-	 * only for left-join removal.  With self-join elimination, PHVs already
+	 * removing any no-longer-needed placeholders entirely.  We only remove
+	 * PHVs for left-join removal.  With self-join elimination, PHVs already
 	 * get moved to the remaining relation, where they might still be needed.
 	 * It might also happen that we skip the removal of some PHVs that could
 	 * be removed.  However, the overhead of extra PHVs is small compared to
@@ -428,17 +548,13 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
 	{
 		PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
 
-		Assert(sjinfo == NULL || !bms_is_member(relid, phinfo->ph_lateral));
-		if (sjinfo != NULL &&
+		Assert(!is_outer_join || !bms_is_member(relid, phinfo->ph_lateral));
+
+		if (is_outer_join &&
 			bms_is_subset(phinfo->ph_needed, joinrelids) &&
 			bms_is_member(relid, phinfo->ph_eval_at) &&
-			!bms_is_member(sjinfo->ojrelid, phinfo->ph_eval_at))
+			!bms_is_member(ojrelid, phinfo->ph_eval_at))
 		{
-			/*
-			 * This code shouldn't be executed if one relation is substituted
-			 * with another: in this case, the placeholder may be employed in
-			 * a filter inside the scan node the SJE removes.
-			 */
 			root->placeholder_list = foreach_delete_current(root->placeholder_list,
 															l);
 			root->placeholder_array[phinfo->phid] = NULL;
@@ -448,33 +564,42 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
 			PlaceHolderVar *phv = phinfo->ph_var;
 
 			phinfo->ph_eval_at = adjust_relid_set(phinfo->ph_eval_at, relid, subst);
-			if (sjinfo != NULL)
-				phinfo->ph_eval_at = adjust_relid_set(phinfo->ph_eval_at,
-													  sjinfo->ojrelid, subst);
+			if (is_outer_join)
+				phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, ojrelid);
 			Assert(!bms_is_empty(phinfo->ph_eval_at));	/* checked previously */
+
 			/* Reduce ph_needed to contain only "relation 0"; see below */
 			if (bms_is_member(0, phinfo->ph_needed))
 				phinfo->ph_needed = bms_make_singleton(0);
 			else
 				phinfo->ph_needed = NULL;
 
-			phinfo->ph_lateral = adjust_relid_set(phinfo->ph_lateral, relid, subst);
+			phv->phrels = adjust_relid_set(phv->phrels, relid, subst);
+			if (is_outer_join)
+				phv->phrels = bms_del_member(phv->phrels, ojrelid);
+			Assert(!bms_is_empty(phv->phrels));
 
 			/*
-			 * ph_lateral might contain rels mentioned in ph_eval_at after the
-			 * replacement, remove them.
+			 * For self-join removal, update Var nodes within the PHV's
+			 * expression to reference the replacement relid, and adjust
+			 * ph_lateral for the relid substitution.  (For left-join removal,
+			 * we're removing rather than replacing, and any surviving PHV
+			 * shouldn't reference the removed rel in its expression.  Also,
+			 * relid can't appear in ph_lateral for outer joins.)
 			 */
-			phinfo->ph_lateral = bms_difference(phinfo->ph_lateral, phinfo->ph_eval_at);
-			/* ph_lateral might or might not be empty */
-
-			phv->phrels = adjust_relid_set(phv->phrels, relid, subst);
-			if (sjinfo != NULL)
-				phv->phrels = adjust_relid_set(phv->phrels,
-											   sjinfo->ojrelid, subst);
-			Assert(!bms_is_empty(phv->phrels));
+			if (is_self_join)
+			{
+				ChangeVarNodesExtended((Node *) phv->phexpr, relid, subst, 0,
+									   replace_relid_callback);
+				phinfo->ph_lateral = adjust_relid_set(phinfo->ph_lateral, relid, subst);
 
-			ChangeVarNodesExtended((Node *) phv->phexpr, relid, subst, 0,
-								   replace_relid_callback);
+				/*
+				 * ph_lateral might contain rels mentioned in ph_eval_at after
+				 * the replacement, remove them.
+				 */
+				phinfo->ph_lateral = bms_difference(phinfo->ph_lateral, phinfo->ph_eval_at);
+				/* ph_lateral might or might not be empty */
+			}
 
 			Assert(phv->phnullingrels == NULL); /* no need to adjust */
 		}
@@ -482,29 +607,40 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
 
 	/*
 	 * Likewise remove references from EquivalenceClasses.
+	 *
+	 * For self-join removal, the caller has already updated the
+	 * EquivalenceClasses, so we can skip this step.
 	 */
-	foreach(l, root->eq_classes)
+	if (is_outer_join)
 	{
-		EquivalenceClass *ec = (EquivalenceClass *) lfirst(l);
+		foreach(l, root->eq_classes)
+		{
+			EquivalenceClass *ec = (EquivalenceClass *) lfirst(l);
 
-		if (bms_is_member(relid, ec->ec_relids) ||
-			(sjinfo == NULL || bms_is_member(sjinfo->ojrelid, ec->ec_relids)))
-			remove_rel_from_eclass(ec, sjinfo, relid, subst);
+			if (bms_is_member(relid, ec->ec_relids) ||
+				bms_is_member(ojrelid, ec->ec_relids))
+				remove_rel_from_eclass(ec, relid, ojrelid);
+		}
 	}
 
 	/*
-	 * Finally, we must recompute per-Var attr_needed and per-PlaceHolderVar
-	 * ph_needed relid sets.  These have to be known accurately, else we may
-	 * fail to remove other now-removable outer joins.  And our removal of the
-	 * join clause(s) for this outer join may mean that Vars that were
-	 * formerly needed no longer are.  So we have to do this honestly by
-	 * repeating the construction of those relid sets.  We can cheat to one
-	 * small extent: we can avoid re-examining the targetlist and HAVING qual
-	 * by preserving "relation 0" bits from the existing relid sets.  This is
-	 * safe because we'd never remove such references.
+	 * Finally, we must prepare for the caller to recompute per-Var
+	 * attr_needed and per-PlaceHolderVar ph_needed relid sets.  These have to
+	 * be known accurately, else we may fail to remove other now-removable
+	 * joins.  Because the caller removes the join clause(s) associated with
+	 * the removed join, Vars that were formerly needed may no longer be.
 	 *
-	 * So, start by removing all other bits from attr_needed sets and
-	 * lateral_vars lists.  (We already did this above for ph_needed.)
+	 * The actual reconstruction of these relid sets is performed by the
+	 * specific caller.  Here, we simply clear out the existing attr_needed
+	 * sets (we already did this above for ph_needed) to ensure they are
+	 * rebuilt from scratch.  We can cheat to one small extent: we can avoid
+	 * re-examining the targetlist and HAVING qual by preserving "relation 0"
+	 * bits from the existing relid sets.  This is safe because we'd never
+	 * remove such references.
+	 *
+	 * Additionally, if we are performing self-join elimination, we must
+	 * replace references to the removed relid with subst within the
+	 * lateral_vars lists.
 	 */
 	for (rti = 1; rti < root->simple_rel_array_size; rti++)
 	{
@@ -527,126 +663,12 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
 				otherrel->attr_needed[attroff] = NULL;
 		}
 
-		if (subst > 0)
+		if (is_self_join)
 			ChangeVarNodesExtended((Node *) otherrel->lateral_vars, relid,
 								   subst, 0, replace_relid_callback);
 	}
 }
 
-/*
- * Remove the target relid and references to the target join from the
- * planner's data structures, having determined that there is no need
- * to include them in the query.
- *
- * We are not terribly thorough here.  We only bother to update parts of
- * the planner's data structures that will actually be consulted later.
- */
-static void
-remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
-							  SpecialJoinInfo *sjinfo)
-{
-	RelOptInfo *rel = find_base_rel(root, relid);
-	int			ojrelid = sjinfo->ojrelid;
-	Relids		joinrelids;
-	Relids		join_plus_commute;
-	List	   *joininfos;
-	ListCell   *l;
-
-	/* Compute the relid set for the join we are considering */
-	joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
-	Assert(ojrelid != 0);
-	joinrelids = bms_add_member(joinrelids, ojrelid);
-
-	remove_rel_from_query(root, rel, -1, sjinfo, joinrelids);
-
-	/*
-	 * Remove any joinquals referencing the rel from the joininfo lists.
-	 *
-	 * In some cases, a joinqual has to be put back after deleting its
-	 * reference to the target rel.  This can occur for pseudoconstant and
-	 * outerjoin-delayed quals, which can get marked as requiring the rel in
-	 * order to force them to be evaluated at or above the join.  We can't
-	 * just discard them, though.  Only quals that logically belonged to the
-	 * outer join being discarded should be removed from the query.
-	 *
-	 * We might encounter a qual that is a clone of a deletable qual with some
-	 * outer-join relids added (see deconstruct_distribute_oj_quals).  To
-	 * ensure we get rid of such clones as well, add the relids of all OJs
-	 * commutable with this one to the set we test against for
-	 * pushed-down-ness.
-	 */
-	join_plus_commute = bms_union(joinrelids,
-								  sjinfo->commute_above_r);
-	join_plus_commute = bms_add_members(join_plus_commute,
-										sjinfo->commute_below_l);
-
-	/*
-	 * We must make a copy of the rel's old joininfo list before starting the
-	 * loop, because otherwise remove_join_clause_from_rels would destroy the
-	 * list while we're scanning it.
-	 */
-	joininfos = list_copy(rel->joininfo);
-	foreach(l, joininfos)
-	{
-		RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
-
-		remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
-
-		if (RINFO_IS_PUSHED_DOWN(rinfo, join_plus_commute))
-		{
-			/*
-			 * There might be references to relid or ojrelid in the
-			 * RestrictInfo's relid sets, as a consequence of PHVs having had
-			 * ph_eval_at sets that include those.  We already checked above
-			 * that any such PHV is safe (and updated its ph_eval_at), so we
-			 * can just drop those references.
-			 */
-			remove_rel_from_restrictinfo(rinfo, relid, ojrelid);
-
-			/*
-			 * Cross-check that the clause itself does not reference the
-			 * target rel or join.
-			 */
-#ifdef USE_ASSERT_CHECKING
-			{
-				Relids		clause_varnos = pull_varnos(root,
-														(Node *) rinfo->clause);
-
-				Assert(!bms_is_member(relid, clause_varnos));
-				Assert(!bms_is_member(ojrelid, clause_varnos));
-			}
-#endif
-			/* Now throw it back into the joininfo lists */
-			distribute_restrictinfo_to_rels(root, rinfo);
-		}
-	}
-
-	/*
-	 * There may be references to the rel in root->fkey_list, but if so,
-	 * match_foreign_keys_to_quals() will get rid of them.
-	 */
-
-	/*
-	 * Now remove the rel from the baserel array to prevent it from being
-	 * referenced again.  (We can't do this earlier because
-	 * remove_join_clause_from_rels will touch it.)
-	 */
-	root->simple_rel_array[relid] = NULL;
-	root->simple_rte_array[relid] = NULL;
-
-	/* And nuke the RelOptInfo, just in case there's another access path */
-	pfree(rel);
-
-	/*
-	 * Now repeat construction of attr_needed bits coming from all other
-	 * sources.
-	 */
-	rebuild_placeholder_attr_needed(root);
-	rebuild_joinclause_attr_needed(root);
-	rebuild_eclass_attr_needed(root);
-	rebuild_lateral_attr_needed(root);
-}
-
 /*
  * Remove any references to relid or ojrelid from the RestrictInfo.
  *
@@ -707,8 +729,7 @@ remove_rel_from_restrictinfo(RestrictInfo *rinfo, int relid, int ojrelid)
 }
 
 /*
- * Remove any references to relid or sjinfo->ojrelid (if sjinfo != NULL)
- * from the EquivalenceClass.
+ * Remove any references to relid or ojrelid from the EquivalenceClass.
  *
  * Like remove_rel_from_restrictinfo, we don't worry about cleaning out
  * any nullingrel bits in contained Vars and PHVs.  (This might have to be
@@ -717,16 +738,13 @@ remove_rel_from_restrictinfo(RestrictInfo *rinfo, int relid, int ojrelid)
  * level(s).
  */
 static void
-remove_rel_from_eclass(EquivalenceClass *ec, SpecialJoinInfo *sjinfo,
-					   int relid, int subst)
+remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid)
 {
 	ListCell   *lc;
 
 	/* Fix up the EC's overall relids */
-	ec->ec_relids = adjust_relid_set(ec->ec_relids, relid, subst);
-	if (sjinfo != NULL)
-		ec->ec_relids = adjust_relid_set(ec->ec_relids,
-										 sjinfo->ojrelid, subst);
+	ec->ec_relids = bms_del_member(ec->ec_relids, relid);
+	ec->ec_relids = bms_del_member(ec->ec_relids, ojrelid);
 
 	/*
 	 * We don't expect any EC child members to exist at this point.  Ensure
@@ -745,14 +763,11 @@ remove_rel_from_eclass(EquivalenceClass *ec, SpecialJoinInfo *sjinfo,
 		EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc);
 
 		if (bms_is_member(relid, cur_em->em_relids) ||
-			(sjinfo != NULL && bms_is_member(sjinfo->ojrelid,
-											 cur_em->em_relids)))
+			bms_is_member(ojrelid, cur_em->em_relids))
 		{
 			Assert(!cur_em->em_is_const);
-			cur_em->em_relids = adjust_relid_set(cur_em->em_relids, relid, subst);
-			if (sjinfo != NULL)
-				cur_em->em_relids = adjust_relid_set(cur_em->em_relids,
-													 sjinfo->ojrelid, subst);
+			cur_em->em_relids = bms_del_member(cur_em->em_relids, relid);
+			cur_em->em_relids = bms_del_member(cur_em->em_relids, ojrelid);
 			if (bms_is_empty(cur_em->em_relids))
 				ec->ec_members = foreach_delete_current(ec->ec_members, lc);
 		}
@@ -763,11 +778,7 @@ remove_rel_from_eclass(EquivalenceClass *ec, SpecialJoinInfo *sjinfo,
 	{
 		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
 
-		if (sjinfo == NULL)
-			ChangeVarNodesExtended((Node *) rinfo, relid, subst, 0,
-								   replace_relid_callback);
-		else
-			remove_rel_from_restrictinfo(rinfo, relid, sjinfo->ojrelid);
+		remove_rel_from_restrictinfo(rinfo, relid, ojrelid);
 	}
 
 	/*
@@ -1958,14 +1969,11 @@ remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
 						   0, replace_relid_callback);
 
 	/* Replace links in the planner info */
-	remove_rel_from_query(root, toRemove, toKeep->relid, NULL, NULL);
+	remove_rel_from_query(root, toRemove->relid, toKeep->relid, NULL, NULL);
 
-	/* At last, replace varno in root targetlist and HAVING clause */
+	/* Replace varno in the fully-processed targetlist */
 	ChangeVarNodesExtended((Node *) root->processed_tlist, toRemove->relid,
 						   toKeep->relid, 0, replace_relid_callback);
-	ChangeVarNodesExtended((Node *) root->processed_groupClause,
-						   toRemove->relid, toKeep->relid, 0,
-						   replace_relid_callback);
 
 	adjust_relid_set(root->all_result_relids, toRemove->relid, toKeep->relid);
 	adjust_relid_set(root->leaf_result_relids, toRemove->relid, toKeep->relid);
-- 
2.39.5 (Apple Git-154)



^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: Clean up remove_rel_from_query() after self-join elimination commit
@ 2026-04-07 01:43  Tender Wang <[email protected]>
  parent: Richard Guo <[email protected]>
  1 sibling, 0 replies; 7+ messages in thread

From: Tender Wang @ 2026-04-07 01:43 UTC (permalink / raw)
  To: Richard Guo <[email protected]>; +Cc: Pg Hackers <[email protected]>

Richard Guo <[email protected]> 于2026年4月6日周一 16:11写道:
>
> While working on reusing remove_rel_from_query() for inner-join
> removal, I noticed that the function is in pretty rough shape.  The
> self-join elimination (SJE) commit grafted self-join removal onto
> remove_rel_from_query(), which was originally written for left-join
> removal only.

Yes, I noticed this a few days ago when I tried to remove a redundant
filter added during SJE.

>This resulted in several issues:
> 1. Comments throughout remove_rel_from_query() still assumed only
> left-join removal, making the code misleading.  For example:
>
>   * This is relevant in case the outer join we're deleting is nested inside
>   * other outer joins:
>

The current logic of remove_rel_from_query() is not easy to read. For example,
It distinguishes between left-join removal and SJE based on whether
sjinfo is NULL.

> 2. This was called even for left-join removal, with subst=-1.  This is
> pointless and confusing.
>
>       ChangeVarNodesExtended((Node *) phv->phexpr, relid, subst, 0,
>                              replace_relid_callback);
>
Yes, it is.
> 3. phinfo->ph_lateral was adjusted for left-join removal, which is
> also confusing ...
>
>        phinfo->ph_lateral = adjust_relid_set(phinfo->ph_lateral, relid, subst);
>
>        /*
>         * ph_lateral might contain rels mentioned in ph_eval_at after the
>         * replacement, remove them.
>         */
>        phinfo->ph_lateral = bms_difference(phinfo->ph_lateral,
> phinfo->ph_eval_at);
>
> ... since you can find this Assert just above:
>
>        Assert(sjinfo == NULL || !bms_is_member(relid, phinfo->ph_lateral));
>
> 4. The comment about attr_needed reconstruction was in
> remove_rel_from_query(), but the actual rebuild is performed by
> the callers.
>
> 5. The EquivalenceClass processing in remove_rel_from_query():
>
>     /*
>      * Likewise remove references from EquivalenceClasses.
>      */
>     foreach(l, root->eq_classes)
>     {
>         EquivalenceClass *ec = (EquivalenceClass *) lfirst(l);
>
>         if (bms_is_member(relid, ec->ec_relids) ||
>             (sjinfo == NULL || bms_is_member(sjinfo->ojrelid, ec->ec_relids)))
>             remove_rel_from_eclass(ec, sjinfo, relid, subst);
>     }
>
> The condition always evaluates to true for self-join elimination
> (i.e., sjinfo == NULL), meaning every EquivalenceClass gets adjusted.
> But this is redundant because the caller remove_self_join_rel()
> already handles ECs via update_eclasses().
>
> 6. In remove_self_join_rel(), I notice this code:
>
>     /* At last, replace varno in root targetlist and HAVING clause */
>     ChangeVarNodesExtended((Node *) root->processed_tlist, toRemove->relid,
>                            toKeep->relid, 0, replace_relid_callback);
>     ChangeVarNodesExtended((Node *) root->processed_groupClause,
>                            toRemove->relid, toKeep->relid, 0,
>                            replace_relid_callback);
>
> The comment mentions "HAVING clause", but neither processed_tlist nor
> processed_groupClause has anything to do with the HAVING clause.
> Furthermore, processed_groupClause contains SortGroupClause nodes,
> which have no Var nodes to rewrite, so calling ChangeVarNodesExtended
> on it is pointless.

I didn't find as many as you listed here. I agree that we should do
something for
current logic.
>
> The attached patch is an attempt to fix all these issues.  It also
> aims to leave the code better structured for adding new types of join
> removal (such as inner-join removal) in the future.

I looked through the attached patch. LGTM.



-- 
Thanks,
Tender Wang





^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: Clean up remove_rel_from_query() after self-join elimination commit
@ 2026-04-07 08:22  Andrei Lepikhov <[email protected]>
  parent: Richard Guo <[email protected]>
  1 sibling, 1 reply; 7+ messages in thread

From: Andrei Lepikhov @ 2026-04-07 08:22 UTC (permalink / raw)
  To: Richard Guo <[email protected]>; Pg Hackers <[email protected]>

On 06/04/2026 10:11, Richard Guo wrote:
> Thoughts?
Thanks for your efforts!

The main goal of the SJE feature was to find common ground within the 
community - to come up with a solution that everyone could get behind on 
whether to allow optimisations that address redundant queries and reduce 
query tree complexity in early planning stages. So, some code roughness 
was ok.

I looked through your code, though maybe not as deeply as this part of 
the planner deserves. Overall, it looks good, and I didn’t spot any 
obvious problems, but I do have a few comments. We added some 
‘redundant’ checks with future optimisations in mind, so others can rely 
on these functions to remove tails from query structures or to error if 
something was left behind.

You explicitly write:

‘Each specific caller remains responsible for updating any remaining 
data structures required by its unique removal logic’

that differs from the initial idea. At the same time, by the end of SJE 
development, I wasn’t so sure we could invent a universal approach to 
guarantee the cleanup of the query tree - mostly because of the inherent 
volatility of the PlannerInfo structure and because we had not agreed to 
make the parse tree a read-only structure.

So, I’m fine with the changes in this patch.

-- 
regards, Andrei Lepikhov,
pgEdge





^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: Clean up remove_rel_from_query() after self-join elimination commit
@ 2026-04-07 09:57  wenhui qiu <[email protected]>
  parent: Andrei Lepikhov <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: wenhui qiu @ 2026-04-07 09:57 UTC (permalink / raw)
  To: Richard Guo <[email protected]>; +Cc: Pg Hackers <[email protected]>

HI Richard
> While working on reusing remove_rel_from_query() for inner-join
> removal, I noticed that the function is in pretty rough shape.  The
> self-join elimination (SJE) commit grafted self-join removal onto
> remove_rel_from_query(), which was originally written for left-join
> removal only.

Thanks for working on this.

I see there are already asserts here for the mode split and for ojrelid
validity. What I had in mind was slightly narrower: making the joinrelids
contract explicit as well.As far as I can tell, the existing asserts would
still allow an outer-join call with joinrelids == NULL, even though
joinrelids is later used in the outer-join-specific PHV handling.
I wonder if something like

```c
Assert(!is_outer_join || joinrelids != NULL);
Assert(!is_self_join || joinrelids == NULL);



Thanks

On Tue, Apr 7, 2026 at 4:22 PM Andrei Lepikhov <[email protected]> wrote:

> On 06/04/2026 10:11, Richard Guo wrote:
> > Thoughts?
> Thanks for your efforts!
>
> The main goal of the SJE feature was to find common ground within the
> community - to come up with a solution that everyone could get behind on
> whether to allow optimisations that address redundant queries and reduce
> query tree complexity in early planning stages. So, some code roughness
> was ok.
>
> I looked through your code, though maybe not as deeply as this part of
> the planner deserves. Overall, it looks good, and I didn’t spot any
> obvious problems, but I do have a few comments. We added some
> ‘redundant’ checks with future optimisations in mind, so others can rely
> on these functions to remove tails from query structures or to error if
> something was left behind.
>
> You explicitly write:
>
> ‘Each specific caller remains responsible for updating any remaining
> data structures required by its unique removal logic’
>
> that differs from the initial idea. At the same time, by the end of SJE
> development, I wasn’t so sure we could invent a universal approach to
> guarantee the cleanup of the query tree - mostly because of the inherent
> volatility of the PlannerInfo structure and because we had not agreed to
> make the parse tree a read-only structure.
>
> So, I’m fine with the changes in this patch.
>
> --
> regards, Andrei Lepikhov,
> pgEdge
>
>
>


^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: Clean up remove_rel_from_query() after self-join elimination commit
@ 2026-04-18 10:17  Richard Guo <[email protected]>
  parent: wenhui qiu <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: Richard Guo @ 2026-04-18 10:17 UTC (permalink / raw)
  To: wenhui qiu <[email protected]>; +Cc: Pg Hackers <[email protected]>

On Tue, Apr 7, 2026 at 6:57 PM wenhui qiu <[email protected]> wrote:
> Assert(!is_outer_join || joinrelids != NULL);

Worth asserting.  If a caller sets sjinfo but passes NULL for
joinrelids, this would silently over-delete PHVs.

> Assert(!is_self_join || joinrelids == NULL);

I prefer to not add this one.  It's not defending any invariant.

- Richard





^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: Clean up remove_rel_from_query() after self-join elimination commit
@ 2026-04-20 00:49  wenhui qiu <[email protected]>
  parent: Richard Guo <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: wenhui qiu @ 2026-04-20 00:49 UTC (permalink / raw)
  To: Richard Guo <[email protected]>; +Cc: Pg Hackers <[email protected]>

Hi Richard
> Assert(!is_outer_join || joinrelids != NULL);

> Worth asserting.  If a caller sets sjinfo but passes NULL for
> joinrelids, this would silently over-delete PHVs.
> Assert(!is_self_join || joinrelids == NULL);

LGTM with the added assertion. Thanks again for all the heavy lifting
you're doing on the Postgres optimizer

> I prefer to not add this one.  It's not defending any invariant.
Thank you for your explanation ,


Thanks

On Sat, Apr 18, 2026 at 6:17 PM Richard Guo <[email protected]> wrote:

> On Tue, Apr 7, 2026 at 6:57 PM wenhui qiu <[email protected]> wrote:
> > Assert(!is_outer_join || joinrelids != NULL);
>
> Worth asserting.  If a caller sets sjinfo but passes NULL for
> joinrelids, this would silently over-delete PHVs.
>
> > Assert(!is_self_join || joinrelids == NULL);
>
> I prefer to not add this one.  It's not defending any invariant.
>
> - Richard
>


^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: Clean up remove_rel_from_query() after self-join elimination commit
@ 2026-04-20 08:17  Richard Guo <[email protected]>
  parent: wenhui qiu <[email protected]>
  0 siblings, 0 replies; 7+ messages in thread

From: Richard Guo @ 2026-04-20 08:17 UTC (permalink / raw)
  To: wenhui qiu <[email protected]>; +Cc: Pg Hackers <[email protected]>

On Mon, Apr 20, 2026 at 9:49 AM wenhui qiu <[email protected]> wrote:
> > Assert(!is_outer_join || joinrelids != NULL);
>
> > Worth asserting.  If a caller sets sjinfo but passes NULL for
> > joinrelids, this would silently over-delete PHVs.
> > Assert(!is_self_join || joinrelids == NULL);
>
> LGTM with the added assertion.

Thanks.  Committed.

- Richard





^ permalink  raw  reply  [nested|flat] 7+ messages in thread


end of thread, other threads:[~2026-04-20 08:17 UTC | newest]

Thread overview: 7+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-04-06 08:11 Clean up remove_rel_from_query() after self-join elimination commit Richard Guo <[email protected]>
2026-04-07 01:43 ` Tender Wang <[email protected]>
2026-04-07 08:22 ` Andrei Lepikhov <[email protected]>
2026-04-07 09:57   ` wenhui qiu <[email protected]>
2026-04-18 10:17     ` Richard Guo <[email protected]>
2026-04-20 00:49       ` wenhui qiu <[email protected]>
2026-04-20 08:17         ` Richard Guo <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox