Hello Alena!
Sorry for my long silence, if you are interested in my thoughts
about your patch then here they are:
1) First of all, let's read the comment below.
/*
* Separate out the WHERE clause. (We could theoretically also
remove
* top-level plain JOIN/ON clauses, but it's probably not worth
the
* trouble.)
*/
We need to separate two things: the jointree and the WHERE clause,
so it's possible to do something like this
whereClause = subselect->jointree->quals;
subselect->jointree->quals = NULL;
jointree = subselect->jointree;
subselect->jointree = NULL;
if (contain_vars_of_level((Node *) subselect, 1))
return NULL;
/* Do our checks in the jointree and stop if we can't do pullup */
/* Return the jointree back */
subselect->jointree = jointree;
I think it's more clear than in the patch right now.
2) We don't need to use get_relids_in_jointree() and nullable_above since the tree's traversing is from the top to the bottom and we know that
in LEFT JOIN and FULL JOIN LHS is nullable
in RIGHT JOIN and FULL JOIN RHS is nullable
So we can use a boolean variables like this
rarg_is_nullable = (is_nullable_side ||
j->jointype == JOIN_FULL ||
j->jointype == JOIN_LEFT);
larg_is_nullable = (is_nullable_side ||
j->jointype == JOIN_FULL ||
j->jointype == JOIN_RIGHT);
And then work with them. I suspect it will be much easier to
follow that in the patch right now.
I fear that you don't check FULL JOINS here.
3) To be honest, we just work with the top jointree, we don't
descend to subqueries, therefore, I am not sure that the mutator
is a good name here.
AFAICS, mutators were designed to modify something including
some parts of the subqueries but it's not the case here.
It's a simple jointree traversal, we don't need
HoistJoinQualsContext as well.
I think, we need three things here:
Node *node - the node in the jointree we are working with
bool is_nullable_side - are we on the nullable side of some
outer join
List **exprs - the list in which we collect JoinExpr and
FromExpr with outer references.
I propose just traverse the jointree, make our checks and then
collect JoinExpr and FromExpr for the further processing if
everything is good.
4) After checking the WHERE clause and the jointree we can
traverse our list, make a new whereClause by appending quals with
outer references and then replace quals in JoinExprs to constant
true.
Something like make_and_qual()
5) I have also noticed that you are using canonicalize_qual()
I suspect we don't need it either since it will be called
later. And here is the stack
subquery_planner
pull_up_sublinks(root)
preprocess_qual_conditions(root, (Node *)
parse->jointree)
j->quals = preprocess_expression(root, j->quals,
EXPRKIND_QUAL)
expr = (Node *) canonicalize_qual((Expr *) expr,
false)
find_duplicate_ors(qual, is_check)
So we will flatten all nested BoolExpr with the type BOOL_AND.
6) I have noticed the new output from one regression test. The
previous output was
Merge Anti Join
Merge Cond: (t1.c1 = t2.c2)
-> Sort
Sort Key: t1.c1
-> Seq Scan on tt4x t1
-> Sort
and the new output is the Hash Anti Join
Hash Anti Join
Hash Cond: (t1.c1 = t2.c2)
-> Seq Scan on tt4x t1
-> Hash
This is very suspicious, why does this patch cause such changes?
7) There is a SubLink which won't be pulled up, you can see it
below
explain (COSTS OFF)
SELECT *
FROM tenk1 A
WHERE EXISTS (SELECT 1
FROM tenk2 B, (SELECT f.hundred
FROM (SELECT A.hundred) AS f
) AS v
WHERE B.hundred = v.hundred
);
Feel free to review my patch!
-----------
Best regards,
Yandex Cloud
Alena Rybakina