Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wLPGm-001lCp-2B for pgsql-hackers@arkaria.postgresql.org; Fri, 08 May 2026 17:45:48 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wLPGl-00AdfX-1I for pgsql-hackers@arkaria.postgresql.org; Fri, 08 May 2026 17:45:47 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wLPGl-00AdfP-0C for pgsql-hackers@lists.postgresql.org; Fri, 08 May 2026 17:45:47 +0000 Received: from mail-ej1-x631.google.com ([2a00:1450:4864:20::631]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1wLPGj-00000001GVz-00BB for pgsql-hackers@lists.postgresql.org; Fri, 08 May 2026 17:45:46 +0000 Received: by mail-ej1-x631.google.com with SMTP id a640c23a62f3a-bca0ff2f8dfso245625566b.1 for ; Fri, 08 May 2026 10:45:44 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1778262343; cv=none; d=google.com; s=arc-20240605; b=KdE+8rNwvQ3bYHsS+e/6QlNYYn/PEZasWHl0n8m9JaKi7LcEyoq1HWCqrc/jpxJ0Da DATWHy4NDkjvKTLPmy6eTaKH6+tRQ6aTvnEp42XrzxUaFekRjJ9Qc8/9SYow/MRulrgF 79Cw+7WnGf5lpNwzZToRHP0G6sdZ9WhfjJUTLDoBI7OCqTBQf6+IaguFJJRSgwWHLltI LofpkzHC7DQplvQUIXuGKcZUHHvhIGqjaGhgEn5t/IafSmlVE6KIBvFCCZd92w94Ikni Dh0+qV8EV9gJNuU3QLDGmvFRrElYqfQQYtwO7CIYOT9XXPXYWh0HuvYnRuqebsGScE7k EmMw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:dkim-signature; bh=n0HFkf2BfUi01LQgVZIG29NmxC22B2DcjCYjd3ufu44=; fh=cAwZj7RoCdNRgSCZpfX7fDVlcaCBSosxOsGx4VMy5aQ=; b=YL3bZeAn5jfhQ6MVKbzyGKQRhOkmQKpAFx0HA3+epg+4DzqPkBpPLSsEEpsLILTxCD PNbefNHjg4Jlu3YLP9Qk6anem3+efq+vIQff9OT8t7FR/w1ghNx7ev5m+qbjwGXxhIrd H+6NddUmAbz5JNm1LpTcb7Rh/qDh9NLO6OMGs9NHrsPL0bn1SMsR9kp1SqcBfEiAJG5L +2Fq9uyLfJ94yc21mtfnv3fV4x88h9L7SA6CH+XAg02ioSkXgFlGZJ9XWa5qN1/uFsy0 qfrqk5piOUTT8lONjtiX31dCXChToSSsrKTQkZmrluI6l4MFj3SzwBjPdkZ4rzf/IjJj gXjg==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1778262343; x=1778867143; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=n0HFkf2BfUi01LQgVZIG29NmxC22B2DcjCYjd3ufu44=; b=C/Fft0STtx5zFQFJxuGflSTe6tfYVZX5IEMUM7kBwsCk1NCMX606hLP9+DXOTUvokP a/a+huWfohD2aSLYb3NGAOXe96aYe3qYxHE2FCFvt4c+v0FHmprSsvq0DQkx6zWmj6MT wH8tIFONP+gkLTGt6owHjnIUVIraigle6fuCZHjKp45Mc1iXEuGnyGLLGlNC4FAA3kwP 4iVzGqW+Mix27OLbkZeMoM7WOf119mnruG/aPl30u1ZNfj+jffOHu5/4QqelbdOLeCX9 vZx2gDqk9PC5WpJV1LHKK+Gb3jz4eMSePVm4DNMpTJrcc08W5ZrfciMJTvrpG570ZRU/ JQgw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1778262343; x=1778867143; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=n0HFkf2BfUi01LQgVZIG29NmxC22B2DcjCYjd3ufu44=; b=OMDU0etg4EUyovIUK1hJsK+Y+yPY2XgHpAULaViMjLwWBXZpkdJCMKriOA5EMf1wbh G1hiWoMqKbrS6BX7VvmOeRcSjAcv65LaZ7HxBihfCNae+pCi8rBqCKN//GwR7xqXBq7h Jem67tXgBQ5kKuRRrFQSbnOz3jNfa3hOKQN3ubLsMM7wGCo7+38pxoYZLsXrzh89hz0o 0Qt45KmGKfocCX+v78+1wDeMtE8/ICuLAMlmD6+dqmo4+ef5TmRpKRez8pe/hXvtQCb/ GMvY1pz5p0OxHknCzCya6EcXLdLZzEtgPPSMbUnY6k3LzNmZWgUNansuwu5Ml1L3IP1A QP/w== X-Forwarded-Encrypted: i=1; AFNElJ/pJXPzqHTgYq0ZQgmXiTIwGXSHVXi5t3ToTD2/Nvbd6UiWM2Sdh43m8vQNc7dejqJEyqgYerlFea8CBEZE@lists.postgresql.org X-Gm-Message-State: AOJu0YwcBmd9U5u8vMew62fIXcwRwr56y+8PZM16pb7jrCkcdO8VOyPH y6YZaRT9GlyUmFU5g2CF2eAy0aQ7UujYsnbVUrmjIkJE+Ry89YQSO8CsyJF/OYPhP9oi9ERVHCH VX9dv35jO89by3FxPpqgQ50ELEQc0h5g= X-Gm-Gg: AeBDieviRJx8BMDhagTgHsMBP0XxyQNLBGvA6U/P9l/zw79COhHgSPpJR+l6MvU0WeU ADgguIERh9P4fcxsxY5w6+iWfCJbyjfCXsAKNe3jnUz1PnC4EL5r3wajH4aVOLed1wlZe5+wREK XQgAfx+Ru4nlx2EnlMRQLY/8DMj99CJZNW9nCcRF0l3r74usAvSgfWbMQe5bt78UE77tD+gknpB X9ptJz9MJjcN0Em9tZh0LdjXrVrc8Fo1sQz+GxIudamiUferGEhlc9eCDkci229zDlqh9WUX26W ojo= X-Received: by 2002:a17:907:1b0e:b0:bc1:ac27:dfde with SMTP id a640c23a62f3a-bc56b41d3d3mr860483866b.18.1778262343309; Fri, 08 May 2026 10:45:43 -0700 (PDT) MIME-Version: 1.0 References: <0b1f670d-b39d-4966-bf32-f0d502ebc564@postgrespro.ru> <46bc4eaf-58c5-42ab-8041-d3380a0768de@postgrespro.ru> <9b041978-06e3-4a50-8a5d-dacbb054f23e@tantorlabs.com> <8d7cbbf6-ddc0-4696-8af2-a68d740e14f2@postgrespro.ru> <6078f7e1-4b29-4b31-bd28-b84149e404e4@postgrespro.ru> <975a3736-a8b5-49b3-8009-4d4e86867aa1@postgrespro.ru> <0adbf75a-2059-4bb7-b878-c33892dbc1f0@tantorlabs.com> <300d7bb0-345c-4c0b-a0ef-4de573fcc94b@postgrespro.ru> <32446f52-1b18-499a-a77b-d6b66ad0f4cf@tantorlabs.com> <87a6fcbf-929a-408a-9471-28be3080254a@postgrespro.ru> <4e9d279c-83c1-42d3-b74d-7366ecd7f085@tantorlabs.com> <0c498546-8af0-4aec-98fa-ad4f373cf711@yandex.ru> <0686b354-57dc-46e9-b756-21661c656c10@yandex.ru> In-Reply-To: <0686b354-57dc-46e9-b756-21661c656c10@yandex.ru> From: Peter Petrov Date: Fri, 8 May 2026 20:45:31 +0300 X-Gm-Features: AVHnY4IpsJ1GuYseiIHrXjp1Y5xkHuQuVHvJzrrMmCRZ4H8Dg2KdkxosHy8Lp7c Message-ID: Subject: Re: pull-up subquery if JOIN-ON contains refs to upper-query To: Alena Rybakina Cc: Ilia Evdokimov , PostgreSQL Hackers , Ranier Vilela , David Rowley Content-Type: multipart/alternative; boundary="0000000000004a5d05065151f761" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000004a5d05065151f761 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hello Alena! Sorry for my long silence, if you are interested in my thoughts about your patch then here they are: 1) First of all, let's read the comment below. /* * Separate out the WHERE clause. (We could theoretically also remove * top-level plain JOIN/ON clauses, but it's probably not worth the * trouble.) */ We need to separate two things: the jointree and the WHERE clause, so it's possible to do something like this whereClause =3D subselect->jointree->quals; subselect->jointree->quals =3D NULL; jointree =3D subselect->jointree; subselect->jointree =3D NULL; if (contain_vars_of_level((Node *) subselect, 1)) return NULL; /* Do our checks in the jointree and stop if we can't do pullup */ /* Return the jointree back */ subselect->jointree =3D jointree; I think it's more clear than in the patch right now. 2) We don't need to use get_relids_in_jointree() and nullable_above since the tree's traversing is from the top to the bottom and we know that in LEFT JOIN and FULL JOIN LHS is nullable in RIGHT JOIN and FULL JOIN RHS is nullable So we can use a boolean variables like this rarg_is_nullable =3D (is_nullable_side || j->jointype =3D=3D JOIN_FULL || j->jointype =3D=3D JOIN_LEFT); larg_is_nullable =3D (is_nullable_side || j->jointype =3D=3D JOIN_FULL || j->jointype =3D=3D JOIN_RIGHT); And then work with them. I suspect it will be much easier to follow that in the patch right now. I fear that you don't check FULL JOINS here. 3) To be honest, we just work with the top jointree, we don't descend to subqueries, therefore, I am not sure that the mutator is a good name here. AFAICS, mutators were designed to modify something including some parts of the subqueries but it's not the case here. It's a simple jointree traversal, we don't need HoistJoinQualsContext as well. I think, we need three things here: Node *node - the node in the jointree we are working with bool is_nullable_side - are we on the nullable side of some outer join List **exprs - the list in which we collect JoinExpr and FromExpr with outer references. I propose just traverse the jointree, make our checks and then collect JoinExpr and FromExpr for the further processing if everything is good. 4) After checking the WHERE clause and the jointree we can traverse our list, make a new whereClause by appending quals with outer references and then replace quals in JoinExprs to constant true. Something like make_and_qual() 5) I have also noticed that you are using canonicalize_qual() I suspect we don't need it either since it will be called later. And here is the stack subquery_planner pull_up_sublinks(root) preprocess_qual_conditions(root, (Node *) parse->jointree) j->quals =3D preprocess_expression(root, j->quals, EXPRKIND_QUAL= ) expr =3D (Node *) canonicalize_qual((Expr *) expr, false) find_duplicate_ors(qual, is_check) So we will flatten all nested BoolExpr with the type BOOL_AND. 6) I have noticed the new output from one regression test. The previous output was Merge Anti Join Merge Cond: (t1.c1 =3D t2.c2) -> Sort Sort Key: t1.c1 -> Seq Scan on tt4x t1 -> Sort and the new output is the Hash Anti Join Hash Anti Join Hash Cond: (t1.c1 =3D t2.c2) -> Seq Scan on tt4x t1 -> Hash This is very suspicious, why does this patch cause such changes? 7) There is a SubLink which won't be pulled up, you can see it below explain (COSTS OFF) SELECT * FROM tenk1 A WHERE EXISTS (SELECT 1 FROM tenk2 B, (SELECT f.hundred FROM (SELECT A.hundred) AS f ) AS v WHERE B.hundred =3D v.hundred ); And the reason is the following. (SELECT A.hundred) is the subquery and won't be processed until the function pull_up_sublinks() finishes its job. But it leads us to an interesting observation. Maybe we could process it somewhere in the SS_process_sublinks(). Perhaps, f and v subqueries could be pulled up and then the EXISTS sublink could be replaced with a SEMI JOIN. I don't know whether we should solve this problem right now. I have never seen such queries in real workloads but it doesn't mean that they don't exist. In short, I think your patch can be simplified. =D0=BF=D1=82, 8 =D0=BC=D0=B0=D1=8F 2026=E2=80=AF=D0=B3. =D0=B2 20:37, Alena= Rybakina : > > Feel free to review my patch! > > ----------- > Best regards, > Yandex Cloud > Alena Rybakina > --0000000000004a5d05065151f761 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable

Hello Alena!

Sorry for my long silence, if you are interested in my thoughts about your patch then here they are:

1)=C2=A0 First of all, let's read the comment below.

/*
=C2=A0* Separate out the WHERE clause.=C2=A0 (We could theoretically = also remove
=C2=A0* top-level plain JOIN/ON clauses, but it's probably not wo= rth the
=C2=A0* trouble.)
=C2=A0*/

We need to separate two things: the jointree and the WHERE clause, so it's possible to do something like this

whereClause =3D subselect->jointree->quals;
subselect->jointree->quals =3D NULL;

jointree =3D subselect->jointree;
subselect->jointree =3D NULL;

if (contain_vars_of_level((Node *) subselect, 1))
=C2=A0 =C2=A0 return NULL;

/* Do our checks in the jointree and stop if we can't do pullup */

/* Return the jointree back */
subselect->jointree =3D jointree;

I think it's more clear than in the patch right now.

2) We don't need to use get_relids_in_jointree() and nullable_above since the tree's traversing is from the top to the bottom and we know that

=C2=A0 =C2=A0 =C2=A0in LEFT JOIN and FULL JOIN LHS is nullable
=C2=A0 =C2=A0 =C2=A0in RIGHT JOIN and FULL JOIN RHS=C2=A0 is nullable=

So we can use a boolean variables like this

rarg_is_nullable =3D (is_nullable_side ||
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 j->jointype =3D=3D JOI= N_FULL ||
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 j->jointype =3D=3D JOI= N_LEFT);

larg_is_nullable =3D (is_nullable_side ||
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0j->jointype =3D=3D JOIN= _FULL ||
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0j->jointype =3D=3D JOIN= _RIGHT);

And then work with them. I suspect it will be much easier to follow that in the patch right now.

I fear that you don't check FULL JOINS here.

3) To be honest, we just work with the top jointree, we don't descend to subqueries, therefore, I am not sure that the mutator is a good name here.

=C2=A0 =C2=A0 AFAICS, mutators were designed to modify something incl= uding some parts of the subqueries but it's not the case here.

=C2=A0 =C2=A0 It's a simple jointree traversal, we don't need HoistJoinQualsContext as well.

=C2=A0 =C2=A0 I think, we need three things here:

=C2=A0 =C2=A0 Node *node - the node in the jointree we are working w= ith

=C2=A0 =C2=A0 bool is_nullable_side - are we on the nullable side of= some outer join

=C2=A0 =C2=A0 List **exprs - the list in which we collect JoinExpr an= d FromExpr with outer references.

=C2=A0 =C2=A0 I propose just traverse the jointree, make our checks a= nd then collect JoinExpr and FromExpr for the further processing if everything is good.=C2=A0 =C2=A0 =C2=A0

4) After checking the WHERE clause and the jointree we can traverse our list, make a new whereClause by appending quals with outer references and then replace quals in JoinExprs to constant true.

=C2=A0 =C2=A0 Something like make_and_qual()

5) I have also noticed that you are using canonicalize_qual()

=C2=A0 =C2=A0 I suspect we don't need it either since it will be = called later. And here is the stack

=C2=A0 =C2=A0 subquery_planner
=C2=A0 =C2=A0 =C2=A0 =C2=A0pull_up_sublinks(root)
=C2=A0 =C2=A0 =C2=A0 =C2=A0preprocess_qual_conditions(root, (Node *) parse->jointree)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0j->quals =3D preprocess_e= xpression(root, j->quals, EXPRKIND_QUAL)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 expr =3D (Node *) ca= nonicalize_qual((Expr *) expr, false)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0find_du= plicate_ors(qual, is_check)=C2=A0=C2=A0
=C2=A0
=C2=A0 =C2=A0So we will flatten all nested BoolExpr with the type BOO= L_AND.

6) I have noticed the new output from one regression test. The previous output was

=C2=A0 =C2=A0 Merge Anti Join
=C2=A0 =C2=A0 =C2=A0 Merge Cond: (t1.c1 =3D t2.c2)
=C2=A0 =C2=A0 =C2=A0 =C2=A0->=C2=A0 Sort
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Sort Key: t1.c= 1
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0->=C2=A0 Se= q Scan on tt4x t1
=C2=A0 =C2=A0 =C2=A0 =C2=A0->=C2=A0 Sort

=C2=A0 =C2=A0and the new output is the Hash Anti Join

=C2=A0 =C2=A0Hash Anti Join
=C2=A0 =C2=A0 =C2=A0 =C2=A0Hash Cond: (t1.c1 =3D t2.c2)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 ->=C2=A0 Seq Scan on tt4x t1
=C2=A0 =C2=A0 =C2=A0 =C2=A0 ->=C2=A0 Hash
=C2=A0 =C2=A0
=C2=A0 This is very suspicious, why does this patch cause such change= s?

7) There is a SubLink which won't be pulled up, you can see it below

explain (COSTS OFF)
SELECT *
=C2=A0 FROM tenk1 A
=C2=A0WHERE EXISTS (SELECT 1
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0FROM tenk2 B, (SELECT f.hundred
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0FROM (SELECT A.hundred) AS f
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0) AS v=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 WHERE B.hundred =3D v.hundred
=C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0);

And the reason is the following. (= SELECT A.hundred) is the subquery and won't be processed until the function pull_up_sublinks() finishes its job.

But it leads us to an interesting observation. Maybe we could process it somewhere in the SS_process_sublinks().

Perhaps, f and v subqueries could be pulled up and then the EXISTS sublink could be replaced with a SEMI JOIN.

I don't know whether we should solve this problem right now. I have never seen such queries in real workloads but it doesn't mea= n that they don't exist.

In short, I think your patch can be simplified.


=D0=BF=D1=82, 8 =D0=BC=D0=B0=D1=8F 2026=E2=80=AF=D0=B3. =D0=B2 20= :37, Alena Rybakina <lena.ri= backina@yandex.ru>:

Feel free to review my patch!

-----------
Best regards,
Yandex Cloud
Alena Rybakina
--0000000000004a5d05065151f761--