public inbox for [email protected]  
help / color / mirror / Atom feed
From: Petr Petrov <[email protected]>
To: Alena Rybakina <[email protected]>
Cc: Ilia Evdokimov <[email protected]>
Cc: David Rowley <[email protected]>
Cc: Ranier Vilela <[email protected]>
Cc: solaimurugan vellaipandiyan <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Subject: Re: pull-up subquery if JOIN-ON contains refs to upper-query
Date: Fri, 8 May 2026 20:27:48 +0300
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAHEL7KS2euWWoXYUL_NnaFc5j7QggobmfYFWDm-eTJFYONdRQA@mail.gmail.com>
References: <[email protected]>
	<CAEudQAoD707uh5Pjpg5NMyF-QO=fzajA+BmtcoqQAeXN1C+TkQ@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<CAHEL7KS2euWWoXYUL_NnaFc5j7QggobmfYFWDm-eTJFYONdRQA@mail.gmail.com>

Hello Alena!

Sorry for my long silence, if you are interested in my thoughts about 
your patch then here they are:

1)  First of all, let's read the comment below.

/*
  * Separate out the WHERE clause.  (We could theoretically also remove
  * top-level plain JOIN/ON clauses, but it's probably not worth the
  * trouble.)
  */

We need to separate two things: the jointree and the WHERE clause, so 
it's possible to do something like this

whereClause = subselect->jointree->quals;
subselect->jointree->quals = NULL;

jointree = subselect->jointree;
subselect->jointree = NULL;

if (contain_vars_of_level((Node *) subselect, 1))
     return NULL;

/* Do our checks in the jointree and stop if we can't do pullup */

/* Return the jointree back */
subselect->jointree = jointree;

I think it's more clear than in the patch right now.

2) We don't need to use get_relids_in_jointree() and nullable_above 
since the tree's traversing is from the top to the bottom and we know that

      in LEFT JOIN and FULL JOIN LHS is nullable
      in RIGHT JOIN and FULL JOIN RHS  is nullable

So we can use a boolean variables like this

rarg_is_nullable = (is_nullable_side ||
                                   j->jointype == JOIN_FULL ||
                                   j->jointype == JOIN_LEFT);

larg_is_nullable = (is_nullable_side ||
                                  j->jointype == JOIN_FULL ||
                                  j->jointype == JOIN_RIGHT);

And then work with them. I suspect it will be much easier to follow that 
in the patch right now.

I fear that you don't check FULL JOINS here.

3) To be honest, we just work with the top jointree, we don't descend to 
subqueries, therefore, I am not sure that the mutator is a good name here.

     AFAICS, mutators were designed to modify something including some 
parts of the subqueries but it's not the case here.

     It's a simple jointree traversal, we don't need 
HoistJoinQualsContext as well.

     I think, we need three things here:

     Node *node - the node in the jointree we are working with

     bool is_nullable_side - are we on the nullable side of some outer join

     List **exprs - the list in which we collect JoinExpr and FromExpr 
with outer references.

     I propose just traverse the jointree, make our checks and then 
collect JoinExpr and FromExpr for the further processing if everything 
is good.

4) After checking the WHERE clause and the jointree we can traverse our 
list, make a new whereClause by appending quals with outer references 
and then replace quals in JoinExprs to constant true.

     Something like make_and_qual()

5) I have also noticed that you are using canonicalize_qual()

     I suspect we don't need it either since it will be called later. 
And here is the stack

     subquery_planner
        pull_up_sublinks(root)
        preprocess_qual_conditions(root, (Node *) parse->jointree)
            j->quals = preprocess_expression(root, j->quals, EXPRKIND_QUAL)
               expr = (Node *) canonicalize_qual((Expr *) expr, false)
                  find_duplicate_ors(qual, is_check)

    So we will flatten all nested BoolExpr with the type BOOL_AND. Let's 
allow Postgres do its job.

6) I have noticed the new output from one regression test. The previous 
output was

     Merge Anti Join
       Merge Cond: (t1.c1 = t2.c2)
        ->  Sort
                Sort Key: t1.c1
                ->  Seq Scan on tt4x t1
        ->  Sort

    and the new output is the Hash Anti Join

    Hash Anti Join
        Hash Cond: (t1.c1 = t2.c2)
         ->  Seq Scan on tt4x t1
         ->  Hash

   This is very suspicious, why does this patch cause such changes?

7) There is a SubLink which won't be pulled up, you can see it below

explain (COSTS OFF)
SELECT *
   FROM tenk1 A
  WHERE EXISTS (SELECT 1
                                 FROM tenk2 B, (SELECT f.hundred
  FROM (SELECT A.hundred) AS f
                                                            ) AS v
                              WHERE B.hundred = v.hundred
               );

And the reason is the following. (SELECT A.hundred) is the subquery and 
won't be processed until the function  pull_up_sublinks() finishes its job.

But it lead us to an interesting observation. Maybe we could process it 
somewhere in the SS_process_sublinks().

Perhaps, f and v subqueries could be pulled up and then the EXISTS 
sublink could be replaced with a SEMI JOIN.

I don't know whether we should solve this problem right now. I have 
never seen such queries in real workloads but it doesn't mean that they 
are not exists.

In short, I think your patch can be simplified.

On 6/26/25 18:40, solaimurugan vellaipandiyan wrote:
> In all tested cases, query results remained correct and I did not
> observe incorrect transformations during my testing.
>
> Overall the patch behavior looks good from my side and the planner now
> behaves consistently for these EXISTS pull-up scenarios.
>
> Regards,
> Solaimurugan V

Sincerely yours.

Peter Petrov.


view thread (22+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: pull-up subquery if JOIN-ON contains refs to upper-query
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox