public inbox for [email protected]  
help / color / mirror / Atom feed
From: Alena Rybakina <[email protected]>
To: Peter Petrov <[email protected]>
Cc: Ilia Evdokimov <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Ranier Vilela <[email protected]>
Cc: David Rowley <[email protected]>
Subject: Re: pull-up subquery if JOIN-ON contains refs to upper-query
Date: Wed, 13 May 2026 16:51:00 +0300
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAJ-KsAw0zS_OEcEraRdgD0wJF5-BHW9nWy286cyQryzo_HTD2Q@mail.gmail.com>
References: <[email protected]>
	<CAEudQAoD707uh5Pjpg5NMyF-QO=fzajA+BmtcoqQAeXN1C+TkQ@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<CAJ-KsAw0zS_OEcEraRdgD0wJF5-BHW9nWy286cyQryzo_HTD2Q@mail.gmail.com>

Hi! you are discussing it out of thread - I warn you just in case. Sorry 
for not replying on your email, I have noticed it now only.

On 08.05.2026 20:45, Peter Petrov wrote:
>
> Hello Alena!
>
> Sorry for my long silence, if you are interested in my thoughts about 
> your patch then here they are:
>
> 1)  First of all, let's read the comment below.
>
> /*
>  * Separate out the WHERE clause.  (We could theoretically also remove
>  * top-level plain JOIN/ON clauses, but it's probably not worth the
>  * trouble.)
>  */
>
> We need to separate two things: the jointree and the WHERE clause, so 
> it's possible to do something like this
>
> whereClause = subselect->jointree->quals;
> subselect->jointree->quals = NULL;
>
> jointree = subselect->jointree;
> subselect->jointree = NULL;
>
> if (contain_vars_of_level((Node *) subselect, 1))
>     return NULL;
>
> /* Do our checks in the jointree and stop if we can't do pullup */
>
> /* Return the jointree back */
> subselect->jointree = jointree;
>
> I think it's more clear than in the patch right now.
>
I need to learn it and analyze. I'll answer on this a bit later.
>
> 2) We don't need to use get_relids_in_jointree() and nullable_above 
> since the tree's traversing is from the top to the bottom and we know that
>
>      in LEFT JOIN and FULL JOIN LHS is nullable
>      in RIGHT JOIN and FULL JOIN RHS  is nullable
>
> So we can use a boolean variables like this
>
> rarg_is_nullable = (is_nullable_side ||
>                                   j->jointype == JOIN_FULL ||
>                                   j->jointype == JOIN_LEFT);
>
> larg_is_nullable = (is_nullable_side ||
>                                  j->jointype == JOIN_FULL ||
>                                  j->jointype == JOIN_RIGHT);
>
> And then work with them. I suspect it will be much easier to follow 
> that in the patch right now.
>
I think it doesn't enough - your null object can be located on the 
subquery upper on the tree - I the reference on this and the standard 
check for nullable side can miss it. I caught similar case before and 
added it in regression tests.
>
>
> I fear that you don't check FULL JOINS here.
>
maybe yes - I need to check it again. in my opinion, we don't need to 
provide optimization for this at all for full joins - it is rare case.
>
> 3) To be honest, we just work with the top jointree, we don't descend 
> to subqueries, therefore, I am not sure that the mutator is a good 
> name here.
>
>     AFAICS, mutators were designed to modify something including some 
> parts of the subqueries but it's not the case here.
>
>     It's a simple jointree traversal, we don't need 
> HoistJoinQualsContext as well.
>
>     I think, we need three things here:
>
>     Node *node - the node in the jointree we are working with
>
>     bool is_nullable_side - are we on the nullable side of some outer join
>
>     List **exprs - the list in which we collect JoinExpr and FromExpr 
> with outer references.
>
>     I propose just traverse the jointree, make our checks and then 
> collect JoinExpr and FromExpr for the further processing if everything 
> is good.
>
I think we should save mutator because it provides some transformations 
in subquery. But I'll check it again just in case to consider it from 
your points of view and ask you some particular questions to understand 
it clearly.
>
> 4) After checking the WHERE clause and the jointree we can traverse 
> our list, make a new whereClause by appending quals with outer 
> references and then replace quals in JoinExprs to constant true.
>
>     Something like make_and_qual()
>
> 5) I have also noticed that you are using canonicalize_qual()
>
>     I suspect we don't need it either since it will be called later. 
> And here is the stack
>
>     subquery_planner
>        pull_up_sublinks(root)
>        preprocess_qual_conditions(root, (Node *) parse->jointree)
>            j->quals = preprocess_expression(root, j->quals, EXPRKIND_QUAL)
>               expr = (Node *) canonicalize_qual((Expr *) expr, false)
>                  find_duplicate_ors(qual, is_check)
>
>    So we will flatten all nested BoolExpr with the type BOOL_AND.
>
> 6) I have noticed the new output from one regression test. The 
> previous output was
>
>     Merge Anti Join
>       Merge Cond: (t1.c1 = t2.c2)
>        ->  Sort
>                Sort Key: t1.c1
>                ->  Seq Scan on tt4x t1
>        ->  Sort
>
>    and the new output is the Hash Anti Join
>
>    Hash Anti Join
>        Hash Cond: (t1.c1 = t2.c2)
>         ->  Seq Scan on tt4x t1
>         ->  Hash
>
>   This is very suspicious, why does this patch cause such changes?
>
> 7) There is a SubLink which won't be pulled up, you can see it below
>
> explain (COSTS OFF)
> SELECT *
>   FROM tenk1 A
>  WHERE EXISTS (SELECT 1
>                                  FROM tenk2 B, (SELECT f.hundred
>    FROM (SELECT A.hundred) AS f
>                                                            ) AS v
>                               WHERE B.hundred = v.hundred
>                              );
>
> And the reason is the following. (SELECT A.hundred) is the subquery 
> and won't be processed until the function pull_up_sublinks() finishes 
> its job.
>
> But it leads us to an interesting observation. Maybe we could process 
> it somewhere in the SS_process_sublinks().
>
> Perhaps, f and v subqueries could be pulled up and then the EXISTS 
> sublink could be replaced with a SEMI JOIN.
>
> I don't know whether we should solve this problem right now. I have 
> never seen such queries in real workloads but it doesn't mean that 
> they don't exist.
>
> In short, I think your patch can be simplified.
>
>
I need to check it again and need some time to look at this.
Thank you for your feedback and opinion. I'll fix it and provide new 
polished version and happy to hear your further review!

-- 
-----------
Best regards,
Alena Rybakina
YandexCloud


view thread (22+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: pull-up subquery if JOIN-ON contains refs to upper-query
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox