public inbox for [email protected]help / color / mirror / Atom feed
Re: Convert NOT IN sublinks to anti-joins when safe 12+ messages / 4 participants [nested] [flat]
* Re: Convert NOT IN sublinks to anti-joins when safe @ 2026-02-04 14:59 ` David Geier <[email protected]> 2026-02-05 06:09 ` Re: Convert NOT IN sublinks to anti-joins when safe Richard Guo <[email protected]> 1 sibling, 1 reply; 12+ messages in thread From: David Geier @ 2026-02-04 14:59 UTC (permalink / raw) To: Richard Guo <[email protected]>; Pg Hackers <[email protected]> Hi! On 04.02.2026 10:47, Richard Guo wrote: > On Tue, Feb 3, 2026 at 4:12 PM Richard Guo <[email protected]> wrote: >> This topic has been discussed several times in the past. Due to the >> semantic mismatch regarding NULL handling, NOT IN is not ordinarily >> equivalent to an anti-join. However, if we can prove that neither the >> outer expressions nor the subquery outputs can yield NULL values, it >> should be safe to convert NOT IN to an anti-join. I used to play around with query rewrites of the same form, turning NOT IN into NOT EXISTS. As you rightfully pointed out, the straight forward rewrite only works if neither the outer expression nor the sub-query cannot yield NULLs, limiting the optimization considerably. Both cases can be fixed by amending the basic form of the rewrite. Basic form: SELECT t1.c1 FROM t1 WHERE t1.c1 NOT IN (SELECT t2.c1 FROM t2) => SELECT t1.c1 FROM t1 WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1) If the sub-select can yield NULLs, the rewrite can be fixed by adding an OR t2.c1 IS NULL clause, such as: SELECT t1.c1 FROM t1 WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1 OR t2.c1 IS NULL) which is equivalent to the following SQL which avoids the OR: SELECT t1.c1 FROM t1 WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1) AND NOT EXISTS (SELECT 1 FROM t2 WHERE t2.c1 IS NULL) If the outer expression can yield NULLs, the rewrite can be fixed by adding a t1.c1 IS NOT NULL clause, such as: SELECT t1.c1 FROM T1 WHERE t1.c1 IS NOT NULL AND NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1) Both fixes can of course be combined yielding: SELECT t1.c1 FROM T1 WHERE t1.c1 IS NOT NULL AND NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1) AND NOT EXISTS (SELECT 1 FROM t2 WHERE t2.c1 IS NULL) What's our today's take on doing more involved transformations inside the planner to support such cases? It would greatly open up the scope of the optimization. > I've noticed a loose end in the v1 patch. > > The semantic gap between NOT IN and anti-join actually exists whenever > the operator returns NULL. For NOT IN, if (A op B) returns NULL, then > NOT (NULL) evaluates to NULL (effectively false), and the row is > discarded. In contrast, for an anti-join, if (A op B) returns NULL, > it implies no match was found, and the anti-join logic dictates that > the row should be kept. > > To guarantee that (A op B) never returns NULL, the current patch > verifies that both A and B are non-nullable. However, this is not > sufficient. The "op" might be an operator that returns NULL on > non-null inputs. > > On the other hand, if "op" does not return NULL on NULL inputs, like > IS DISTINCT FROM, we technically would not even need to require that A > and B are non-nullable. > > Is there a convenient way to verify that an operator never returns > NULL on non-null inputs? Would it be sufficient to insist that the > operator belongs to btree opclass (assuming that the strict ordering > requirements of btree imply this safety)? There's lots of code that e.g. calls FunctionCall2[Coll]() on operators, where that function raises an ERROR in case the returned value is NULL. Hence, I would say it's safe to make that assumption. Otherwise, you could restrict the check to only builtin operators. For those we know they have the expected semantics. -- David Geier ^ permalink raw reply [nested|flat] 12+ messages in thread
* Re: Convert NOT IN sublinks to anti-joins when safe 2026-02-04 14:59 ` Re: Convert NOT IN sublinks to anti-joins when safe David Geier <[email protected]> @ 2026-02-05 06:09 ` Richard Guo <[email protected]> 2026-02-05 07:29 ` Re: Convert NOT IN sublinks to anti-joins when safe wenhui qiu <[email protected]> 2026-03-04 09:33 ` Re: Convert NOT IN sublinks to anti-joins when safe Richard Guo <[email protected]> 0 siblings, 2 replies; 12+ messages in thread From: Richard Guo @ 2026-02-05 06:09 UTC (permalink / raw) To: David Geier <[email protected]>; +Cc: Pg Hackers <[email protected]> On Wed, Feb 4, 2026 at 11:59 PM David Geier <[email protected]> wrote: > If the sub-select can yield NULLs, the rewrite can be fixed by adding an > OR t2.c1 IS NULL clause, such as: > > SELECT t1.c1 FROM t1 WHERE > NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1 OR t2.c1 IS NULL) I'm not sure if this rewrite results in a better plan. The OR clause would force a nested loop join, which could be much slower than a hashed-subplan plan. > If the outer expression can yield NULLs, the rewrite can be fixed by > adding a t1.c1 IS NOT NULL clause, such as: > > SELECT t1.c1 FROM T1 WHERE > t1.c1 IS NOT NULL AND > NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1) This rewrite doesn't seem correct to me. If t2 is empty, you would incorrectly lose the NULL rows from t1 in the final result. > What's our today's take on doing more involved transformations inside > the planner to support such cases? It would greatly open up the scope of > the optimization. As mentioned in my initial email, the goal of this patch is not to handle every possible case, but rather only to handle the basic form where both sides of NOT IN are provably non-nullable. This keeps the code complexity to a minimum, and I believe this would cover the most common use cases in real world. - Richard ^ permalink raw reply [nested|flat] 12+ messages in thread
* Re: Convert NOT IN sublinks to anti-joins when safe 2026-02-04 14:59 ` Re: Convert NOT IN sublinks to anti-joins when safe David Geier <[email protected]> 2026-02-05 06:09 ` Re: Convert NOT IN sublinks to anti-joins when safe Richard Guo <[email protected]> @ 2026-02-05 07:29 ` wenhui qiu <[email protected]> 1 sibling, 0 replies; 12+ messages in thread From: wenhui qiu @ 2026-02-05 07:29 UTC (permalink / raw) To: Richard Guo <[email protected]>; +Cc: David Geier <[email protected]>; Pg Hackers <[email protected]> HI Richard > As mentioned in my initial email, the goal of this patch is not to > handle every possible case, but rather only to handle the basic form > where both sides of NOT IN are provably non-nullable. This keeps the > code complexity to a minimum, and I believe this would cover the most > common use cases in real world. Agree +1 ,The current path already covers common scenarios and is no less comprehensive than other databases.I'm already quite pleased that it can be merged. Having tested a certain widely used open-source database, I found it unable to process the following query: `SELECT * FROM join1 WHERE id NOT IN (SELECT id FROM join2 WHERE id IS NOT NULL);` Note that join2 allows null values for id. Thanks On Thu, Feb 5, 2026 at 2:09 PM Richard Guo <[email protected]> wrote: > On Wed, Feb 4, 2026 at 11:59 PM David Geier <[email protected]> wrote: > > If the sub-select can yield NULLs, the rewrite can be fixed by adding an > > OR t2.c1 IS NULL clause, such as: > > > > SELECT t1.c1 FROM t1 WHERE > > NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1 OR t2.c1 IS NULL) > > I'm not sure if this rewrite results in a better plan. The OR clause > would force a nested loop join, which could be much slower than a > hashed-subplan plan. > > > If the outer expression can yield NULLs, the rewrite can be fixed by > > adding a t1.c1 IS NOT NULL clause, such as: > > > > SELECT t1.c1 FROM T1 WHERE > > t1.c1 IS NOT NULL AND > > NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1) > > This rewrite doesn't seem correct to me. If t2 is empty, you would > incorrectly lose the NULL rows from t1 in the final result. > > > What's our today's take on doing more involved transformations inside > > the planner to support such cases? It would greatly open up the scope of > > the optimization. > > As mentioned in my initial email, the goal of this patch is not to > handle every possible case, but rather only to handle the basic form > where both sides of NOT IN are provably non-nullable. This keeps the > code complexity to a minimum, and I believe this would cover the most > common use cases in real world. > > - Richard > > > ^ permalink raw reply [nested|flat] 12+ messages in thread
* Re: Convert NOT IN sublinks to anti-joins when safe 2026-02-04 14:59 ` Re: Convert NOT IN sublinks to anti-joins when safe David Geier <[email protected]> 2026-02-05 06:09 ` Re: Convert NOT IN sublinks to anti-joins when safe Richard Guo <[email protected]> @ 2026-03-04 09:33 ` Richard Guo <[email protected]> 1 sibling, 0 replies; 12+ messages in thread From: Richard Guo @ 2026-03-04 09:33 UTC (permalink / raw) To: David Geier <[email protected]>; +Cc: Pg Hackers <[email protected]> On Mon, Mar 2, 2026 at 9:50 PM David Geier <[email protected]> wrote: > The very last rewrite combines both cases. The rewritten query then > looks like: > > SELECT t1.c1 FROM T1 WHERE > t1.c1 IS NOT NULL AND > NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1) AND > NOT EXISTS (SELECT 1 FROM t2 WHERE t2.c1 IS NULL) I'm still not convinced this rewrite is correct. As I mentioned earlier, it breaks down if t2 is empty while t1 contains NULL rows. For example: CREATE TABLE t1 (c1 int); CREATE TABLE t2 (c1 int); INSERT INTO t1 VALUES (1), (NULL); SELECT t1.c1 FROM t1 WHERE t1.c1 NOT IN (SELECT t2.c1 FROM t2); c1 ---- 1 (2 rows) SELECT t1.c1 FROM T1 WHERE t1.c1 IS NOT NULL AND NOT EXISTS (SELECT 1 FROM t2 WHERE t1.c1 = t2.c1) AND NOT EXISTS (SELECT 1 FROM t2 WHERE t2.c1 IS NULL); c1 ---- 1 (1 row) > Seems reasonable to start with the non-NULL variant, though there are > certainly cases where there's no PK / unique index on the relevant columns. Yeah. I don't know how to optimize nullable NOT IN clauses. It seems quite difficult to handle safely purely via query transformations. Maybe we can explore adding a dedicated Null-Aware Anti-Join execution node, much like Oracle's approach. But that is definitely beyond the scope of this current patch. - Richard ^ permalink raw reply [nested|flat] 12+ messages in thread
* Re: Convert NOT IN sublinks to anti-joins when safe @ 2026-03-04 09:52 ` Richard Guo <[email protected]> 2026-03-05 01:57 ` Re: Convert NOT IN sublinks to anti-joins when safe Japin Li <[email protected]> 1 sibling, 1 reply; 12+ messages in thread From: Richard Guo @ 2026-03-04 09:52 UTC (permalink / raw) To: Pg Hackers <[email protected]> On Sat, Feb 14, 2026 at 4:37 PM Richard Guo <[email protected]> wrote: > On Thu, Feb 5, 2026 at 3:51 PM Richard Guo <[email protected]> wrote: > > Attached is the updated patch, which adds the check requiring the > > operator to be a member of a btree or hash opfamily. > Attached is another updated patch rebased on current master, with the > addition of support for RowCompareExpr to handle multi-column ordering > operations; otherwise unchanged. Attached is another updated patch rebased on current master, with some minor cosmetic adjustments; nothing essential has changed. - Richard Attachments: [application/octet-stream] v4-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patch (61.4K, 2-v4-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patch) download | inline diff: From cf0763dee9149b9f6f7d5cd3770b2b6bdf46d8a9 Mon Sep 17 00:00:00 2001 From: Richard Guo <[email protected]> Date: Fri, 30 Jan 2026 12:29:42 +0900 Subject: [PATCH v4] Convert NOT IN sublinks to anti-joins when safe The planner has historically been unable to convert "x NOT IN (SELECT y ...)" sublinks into anti-joins. This is because standard SQL semantics for NOT IN require that if the comparison "x = y" returns NULL, the "NOT IN" expression evaluates to NULL (effectively false), causing the row to be discarded. In contrast, an anti-join preserves the row if no match is found. Due to this semantic mismatch regarding NULL handling, the conversion was previously considered unsafe. However, if we can prove that neither side of the comparison can yield NULL values, and further that the operator itself cannot return NULL for non-null inputs, the behavior of NOT IN and anti-join becomes identical. Enabling this conversion allows the planner to treat the sublink as a first-class relation rather than an opaque SubPlan filter. This unlocks global join ordering optimization and permits the selection of the most efficient join algorithm based on cost, often yielding significant performance improvements for large datasets. This patch verifies that neither side of the comparison can be NULL and that the operator is safe regarding NULL results before performing the conversion. To verify operator safety, we require that the operator be a member of a B-tree or Hash operator family. This serves as a proxy for standard boolean behavior, ensuring the operator does not return NULL on valid non-null inputs, as doing so would break index integrity. For operand non-nullability, this patch makes use of several existing mechanisms. It leverages the outer-join-aware-Var infrastructure to verify that a Var does not come from the nullable side of an outer join, and consults the NOT-NULL-attnums hash table to efficiently verify schema-level NOT NULL constraints. Additionally, it employs find_nonnullable_vars to identify Vars forced non-nullable by qual clauses, and expr_is_nonnullable to deduce non-nullability for other expression types. --- src/backend/optimizer/plan/initsplan.c | 4 +- src/backend/optimizer/plan/subselect.c | 166 +++++++- src/backend/optimizer/prep/prepjointree.c | 72 +++- src/backend/optimizer/util/clauses.c | 384 ++++++++++++++++--- src/backend/utils/adt/int8.c | 2 +- src/backend/utils/cache/lsyscache.c | 68 ++++ src/include/optimizer/clauses.h | 1 + src/include/optimizer/optimizer.h | 13 +- src/include/optimizer/subselect.h | 1 + src/include/utils/lsyscache.h | 2 + src/test/regress/expected/subselect.out | 439 ++++++++++++++++++++++ src/test/regress/sql/subselect.sql | 185 +++++++++ src/tools/pgindent/typedefs.list | 1 + 13 files changed, 1272 insertions(+), 66 deletions(-) diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c index 97ea95a4eb8..9aaf1c4e5ca 100644 --- a/src/backend/optimizer/plan/initsplan.c +++ b/src/backend/optimizer/plan/initsplan.c @@ -3447,7 +3447,7 @@ restriction_is_always_true(PlannerInfo *root, if (nulltest->argisrow) return false; - return expr_is_nonnullable(root, nulltest->arg, true); + return expr_is_nonnullable(root, nulltest->arg, NOTNULL_SOURCE_RELOPT); } /* If it's an OR, check its sub-clauses */ @@ -3512,7 +3512,7 @@ restriction_is_always_false(PlannerInfo *root, if (nulltest->argisrow) return false; - return expr_is_nonnullable(root, nulltest->arg, true); + return expr_is_nonnullable(root, nulltest->arg, NOTNULL_SOURCE_RELOPT); } /* If it's an OR, check its sub-clauses */ diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c index d7f3cedf3d5..299b3354f6d 100644 --- a/src/backend/optimizer/plan/subselect.c +++ b/src/backend/optimizer/plan/subselect.c @@ -91,6 +91,7 @@ static bool contain_outer_selfref(Node *node); static bool contain_outer_selfref_walker(Node *node, Index *depth); static void inline_cte(PlannerInfo *root, CommonTableExpr *cte); static bool inline_cte_walker(Node *node, inline_cte_walker_context *context); +static bool sublink_testexpr_is_not_nullable(PlannerInfo *root, SubLink *sublink); static bool simplify_EXISTS_query(PlannerInfo *root, Query *query); static Query *convert_EXISTS_to_ANY(PlannerInfo *root, Query *subselect, Node **testexpr, List **paramIds); @@ -1306,11 +1307,14 @@ convert_VALUES_to_ANY(PlannerInfo *root, Node *testexpr, Query *values) * If so, form a JoinExpr and return it. Return NULL if the SubLink cannot * be converted to a join. * - * The only non-obvious input parameter is available_rels: this is the set - * of query rels that can safely be referenced in the sublink expression. - * (We must restrict this to avoid changing the semantics when a sublink - * is present in an outer join's ON qual.) The conversion must fail if - * the converted qual would reference any but these parent-query relids. + * If under_not is true, the caller actually found NOT (ANY SubLink), so + * that what we must try to build is an ANTI not SEMI join. + * + * available_rels is the set of query rels that can safely be referenced + * in the sublink expression. (We must restrict this to avoid changing + * the semantics when a sublink is present in an outer join's ON qual.) + * The conversion must fail if the converted qual would reference any but + * these parent-query relids. * * On success, the returned JoinExpr has larg = NULL and rarg = the jointree * item representing the pulled-up subquery. The caller must set larg to @@ -1333,7 +1337,7 @@ convert_VALUES_to_ANY(PlannerInfo *root, Node *testexpr, Query *values) */ JoinExpr * convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, - Relids available_rels) + bool under_not, Relids available_rels) { JoinExpr *result; Query *parse = root->parse; @@ -1351,6 +1355,19 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, Assert(sublink->subLinkType == ANY_SUBLINK); + /* + * Per SQL spec, NOT IN is not ordinarily equivalent to an anti-join, so + * that by default we have to fail when under_not. However, if we can + * prove that neither the outer query's expressions nor the sub-select's + * output columns can be NULL, and further that the operator itself cannot + * return NULL for non-null inputs, then the logic is identical and it's + * safe to convert NOT IN to an anti-join. + */ + if (under_not && + (!sublink_testexpr_is_not_nullable(root, sublink) || + !query_outputs_are_not_nullable(subselect))) + return NULL; + /* * If the sub-select contains any Vars of the parent query, we treat it as * LATERAL. (Vars from higher levels don't matter here.) @@ -1428,7 +1445,7 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, * And finally, build the JoinExpr node. */ result = makeNode(JoinExpr); - result->jointype = JOIN_SEMI; + result->jointype = under_not ? JOIN_ANTI : JOIN_SEMI; result->isNatural = false; result->larg = NULL; /* caller must fill this in */ result->rarg = (Node *) rtr; @@ -1441,12 +1458,141 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, return result; } +/* + * sublink_testexpr_is_not_nullable: verify that testexpr of an ANY_SUBLINK + * guarantees a non-null result, assuming the inner side is also non-null. + * + * To ensure the expression never returns NULL, we require both that the outer + * expressions are provably non-nullable and that the operator itself is safe. + * We validate operator safety by checking for membership in a standard index + * operator family (B-tree or Hash); this acts as a proxy for standard boolean + * behavior, ensuring the operator does not produce NULL results from non-null + * inputs. + * + * We handle the three standard parser representations for ANY sublinks: a + * single OpExpr for single-column comparisons, a BoolExpr containing a list of + * OpExprs for multi-column equality or inequality checks (where equality + * becomes an AND and inequality becomes an OR), and a RowCompareExpr for + * multi-column ordering checks. In all cases, we validate the operators and + * the outer expressions. + * + * It is acceptable for this check not to be exhaustive. We can err on the + * side of conservatism: if we're not sure, it's okay to return FALSE. + */ +static bool +sublink_testexpr_is_not_nullable(PlannerInfo *root, SubLink *sublink) +{ + Node *testexpr = sublink->testexpr; + List *outer_exprs = NIL; + ListCell *lc; + + /* Punt if sublink is not in the expected format */ + if (sublink->subLinkType != ANY_SUBLINK || testexpr == NULL) + return false; + + if (IsA(testexpr, OpExpr)) + { + /* single-column comparison */ + OpExpr *opexpr = (OpExpr *) testexpr; + + /* standard ANY structure should be op(outer_var, param) */ + if (list_length(opexpr->args) != 2) + return false; + + /* + * We rely on membership in a B-tree or Hash operator family as a + * guarantee that the operator acts as a proper boolean comparison and + * does not yield NULL for valid non-null inputs. + */ + if (!op_is_safe_index_member(opexpr->opno)) + return false; + + outer_exprs = lappend(outer_exprs, linitial(opexpr->args)); + } + else if (is_andclause(testexpr) || is_orclause(testexpr)) + { + /* multi-column equality or inequality checks */ + BoolExpr *bexpr = (BoolExpr *) testexpr; + + foreach(lc, bexpr->args) + { + OpExpr *opexpr = (OpExpr *) lfirst(lc); + + if (!IsA(opexpr, OpExpr)) + return false; + + /* standard ANY structure should be op(outer_var, param) */ + if (list_length(opexpr->args) != 2) + return false; + + /* verify operator safety; see comment above */ + if (!op_is_safe_index_member(opexpr->opno)) + return false; + + outer_exprs = lappend(outer_exprs, linitial(opexpr->args)); + } + } + else if (IsA(testexpr, RowCompareExpr)) + { + /* multi-column ordering checks */ + RowCompareExpr *rcexpr = (RowCompareExpr *) testexpr; + + foreach(lc, rcexpr->opnos) + { + Oid opno = lfirst_oid(lc); + + /* verify operator safety; see comment above */ + if (!op_is_safe_index_member(opno)) + return false; + } + + outer_exprs = list_concat(outer_exprs, rcexpr->largs); + } + else + { + /* Punt if other node types */ + return false; + } + + /* + * Since the query hasn't yet been through expression preprocessing, we + * must apply flatten_join_alias_vars to the outer expressions to avoid + * being fooled by join aliases. + * + * We do not need to apply flatten_group_exprs though, since grouping Vars + * cannot appear in jointree quals. + */ + outer_exprs = (List *) + flatten_join_alias_vars(root, root->parse, (Node *) outer_exprs); + + /* Check that every outer expression is non-nullable */ + foreach(lc, outer_exprs) + { + Expr *expr = (Expr *) lfirst(lc); + + /* + * We have already collected relation-level not-null constraints for + * the outer query, so we can consult the global hash table for + * nullability information. + */ + if (!expr_is_nonnullable(root, expr, NOTNULL_SOURCE_HASHTABLE)) + return false; + + /* + * Note: It is possible to further prove non-nullability by examining + * the qual clauses available at or below the jointree node where this + * NOT IN clause is evaluated, but for the moment it doesn't seem + * worth the extra complication. + */ + } + + return true; +} + /* * convert_EXISTS_sublink_to_join: try to convert an EXISTS SubLink to a join * - * The API of this function is identical to convert_ANY_sublink_to_join's, - * except that we also support the case where the caller has found NOT EXISTS, - * so we need an additional input parameter "under_not". + * The API of this function is identical to convert_ANY_sublink_to_join's. */ JoinExpr * convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink, diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c index c90f4b32733..b2beb0a0d68 100644 --- a/src/backend/optimizer/prep/prepjointree.c +++ b/src/backend/optimizer/prep/prepjointree.c @@ -852,14 +852,15 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node, if ((saop = convert_VALUES_to_ANY(root, sublink->testexpr, (Query *) sublink->subselect)) != NULL) - + { /* * The VALUES sequence was simplified. Nothing more to do * here. */ return (Node *) saop; + } - if ((j = convert_ANY_sublink_to_join(root, sublink, + if ((j = convert_ANY_sublink_to_join(root, sublink, false, available_rels1)) != NULL) { /* Yes; insert the new join node into the join tree */ @@ -885,7 +886,7 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node, return NULL; } if (available_rels2 != NULL && - (j = convert_ANY_sublink_to_join(root, sublink, + (j = convert_ANY_sublink_to_join(root, sublink, false, available_rels2)) != NULL) { /* Yes; insert the new join node into the join tree */ @@ -970,14 +971,68 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node, } if (is_notclause(node)) { - /* If the immediate argument of NOT is EXISTS, try to convert */ + /* If the immediate argument of NOT is ANY or EXISTS, try to convert */ SubLink *sublink = (SubLink *) get_notclausearg((Expr *) node); JoinExpr *j; Relids child_rels; if (sublink && IsA(sublink, SubLink)) { - if (sublink->subLinkType == EXISTS_SUBLINK) + if (sublink->subLinkType == ANY_SUBLINK) + { + if ((j = convert_ANY_sublink_to_join(root, sublink, true, + available_rels1)) != NULL) + { + /* Yes; insert the new join node into the join tree */ + j->larg = *jtlink1; + *jtlink1 = (Node *) j; + /* Recursively process pulled-up jointree nodes */ + j->rarg = pull_up_sublinks_jointree_recurse(root, + j->rarg, + &child_rels); + + /* + * Now recursively process the pulled-up quals. Because + * we are underneath a NOT, we can't pull up sublinks that + * reference the left-hand stuff, but it's still okay to + * pull up sublinks referencing j->rarg. + */ + j->quals = pull_up_sublinks_qual_recurse(root, + j->quals, + &j->rarg, + child_rels, + NULL, NULL); + /* Return NULL representing constant TRUE */ + return NULL; + } + if (available_rels2 != NULL && + (j = convert_ANY_sublink_to_join(root, sublink, true, + available_rels2)) != NULL) + { + /* Yes; insert the new join node into the join tree */ + j->larg = *jtlink2; + *jtlink2 = (Node *) j; + /* Recursively process pulled-up jointree nodes */ + j->rarg = pull_up_sublinks_jointree_recurse(root, + j->rarg, + &child_rels); + + /* + * Now recursively process the pulled-up quals. Because + * we are underneath a NOT, we can't pull up sublinks that + * reference the left-hand stuff, but it's still okay to + * pull up sublinks referencing j->rarg. + */ + j->quals = pull_up_sublinks_qual_recurse(root, + j->quals, + &j->rarg, + child_rels, + NULL, NULL); + /* Return NULL representing constant TRUE */ + return NULL; + } + } + else if (sublink->subLinkType == EXISTS_SUBLINK) { if ((j = convert_EXISTS_sublink_to_join(root, sublink, true, available_rels1)) != NULL) @@ -3706,6 +3761,13 @@ has_notnull_forced_var(PlannerInfo *root, List *forced_null_vars, rte = rt_fetch(varno, root->parse->rtable); + /* We can only reason about ordinary relations */ + if (rte->rtekind != RTE_RELATION) + { + bms_free(forcednullattnums); + continue; + } + /* * We must skip inheritance parent tables, as some child tables may * have a NOT NULL constraint for a column while others may not. This diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c index a41d81734cf..c47c9da4a9b 100644 --- a/src/backend/optimizer/util/clauses.c +++ b/src/backend/optimizer/util/clauses.c @@ -21,6 +21,7 @@ #include "access/htup_details.h" #include "catalog/pg_class.h" +#include "catalog/pg_inherits.h" #include "catalog/pg_language.h" #include "catalog/pg_operator.h" #include "catalog/pg_proc.h" @@ -112,6 +113,7 @@ static bool contain_context_dependent_node_walker(Node *node, int *flags); static bool contain_leaked_vars_walker(Node *node, void *context); static Relids find_nonnullable_rels_walker(Node *node, bool top_level); static List *find_nonnullable_vars_walker(Node *node, bool top_level); +static void find_subquery_safe_quals(Node *jtnode, List **safe_quals); static bool is_strict_saop(ScalarArrayOpExpr *expr, bool falseOK); static bool convert_saop_to_hashed_saop_walker(Node *node, void *context); static Node *eval_const_expressions_mutator(Node *node, @@ -1433,6 +1435,10 @@ contain_leaked_vars_walker(Node *node, void *context) context); } +/***************************************************************************** + * Nullability analysis + *****************************************************************************/ + /* * find_nonnullable_rels * Determine which base rels are forced nonnullable by given clause. @@ -1701,7 +1707,7 @@ find_nonnullable_rels_walker(Node *node, bool top_level) * but here we assume that the input is a Boolean expression, and wish to * see if NULL inputs will provably cause a FALSE-or-NULL result. We expect * the expression to have been AND/OR flattened and converted to implicit-AND - * format. + * format (but the results are still good if it wasn't AND/OR flattened). * * Attnos of the identified Vars are returned in a multibitmapset (a List of * Bitmapsets). List indexes correspond to relids (varnos), while the per-rel @@ -2021,6 +2027,230 @@ find_forced_null_var(Node *node) return NULL; } +/* + * query_outputs_are_not_nullable + * Returns TRUE if the output values of the Query are certainly not NULL. + * All output columns must return non-NULL to answer TRUE. + * + * The reason this takes a Query, and not just an individual tlist expression, + * is so that we can make use of the query's WHERE/ON clauses to prove it does + * not return nulls. + * + * In current usage, the passed sub-Query hasn't yet been through any planner + * processing. This means that applying find_nonnullable_vars() to its WHERE + * clauses isn't really ideal: for lack of const-simplification, we might be + * unable to prove not-nullness in some cases where we could have proved it + * afterwards. However, we should not get any false positive results. + * + * Like the other forms of nullability analysis above, we can err on the + * side of conservatism: if we're not sure, it's okay to return FALSE. + */ +bool +query_outputs_are_not_nullable(Query *query) +{ + PlannerInfo subroot; + List *safe_quals = NIL; + List *nonnullable_vars = NIL; + bool computed_nonnullable_vars = false; + ListCell *tl; + + /* + * If the query contains set operations, punt. The set ops themselves + * couldn't introduce nulls that weren't in their inputs, but the tlist + * present in the top-level query is just dummy and won't give us useful + * info. We could get an answer by recursing to examine each leaf query, + * but for the moment it doesn't seem worth the extra complication. + */ + if (query->setOperations) + return false; + + /* + * If the query contains grouping sets, punt. Grouping sets can introduce + * NULL values, and we currently lack the PlannerInfo needed to flatten + * grouping Vars in the query's outputs. + */ + if (query->groupingSets) + return false; + + /* + * We need a PlannerInfo to pass to expr_is_nonnullable. Fortunately, we + * can cons up an entirely dummy one, because only the "parse" link in the + * struct is used by expr_is_nonnullable. + */ + MemSet(&subroot, 0, sizeof(subroot)); + subroot.parse = query; + + /* + * Examine each targetlist entry to prove that it can't produce NULL. + */ + foreach(tl, query->targetList) + { + TargetEntry *tle = (TargetEntry *) lfirst(tl); + Expr *expr = tle->expr; + + /* Resjunk columns can be ignored: they don't produce output values */ + if (tle->resjunk) + continue; + + /* + * Look through binary relabelings, since we know those don't + * introduce nulls. + */ + while (expr && IsA(expr, RelabelType)) + expr = ((RelabelType *) expr)->arg; + + if (expr == NULL) /* paranoia */ + return false; + + /* + * Since the subquery hasn't yet been through expression + * preprocessing, we must apply flatten_join_alias_vars to the given + * expression, and to the quals found by find_subquery_safe_quals, to + * avoid being fooled by join aliases. + * + * We won't be dealing with arbitrary expressions, so it's safe to use + * NULL as the root, so long as adjust_standard_join_alias_expression + * can handle everything the parser would make as a join alias + * expression. + */ + expr = (Expr *) flatten_join_alias_vars(NULL, query, (Node *) expr); + + /* + * Similarly, we must apply flatten_group_exprs to the given + * expression to avoid being fooled by grouping Vars. We do not need + * to apply flatten_group_exprs to the quals found by + * find_subquery_safe_quals though, as grouping Vars cannot appear in + * jointree quals. + * + * We have verified that the query does not contain grouping sets, so + * it's safe to use NULL as the root here. + */ + expr = (Expr *) flatten_group_exprs(NULL, query, (Node *) expr); + + /* + * Check to see if the expr cannot be NULL. Since we're on a raw + * parse tree, we need to look up the not-null constraints from the + * system catalogs. + */ + if (expr_is_nonnullable(&subroot, expr, NOTNULL_SOURCE_SYSCACHE)) + continue; + + if (IsA(expr, Var)) + { + Var *var = (Var *) expr; + + /* + * For a plain Var, even if that didn't work, we can conclude that + * the Var is not nullable if find_nonnullable_vars can find a + * "var IS NOT NULL" or similarly strict condition among the quals + * on non-outerjoined-rels. Compute the list of Vars having such + * quals if we didn't already. + */ + if (!computed_nonnullable_vars) + { + find_subquery_safe_quals((Node *) query->jointree, &safe_quals); + safe_quals = (List *) + flatten_join_alias_vars(NULL, query, (Node *) safe_quals); + nonnullable_vars = find_nonnullable_vars((Node *) safe_quals); + computed_nonnullable_vars = true; + } + + if (!mbms_is_member(var->varno, + var->varattno - FirstLowInvalidHeapAttributeNumber, + nonnullable_vars)) + return false; /* we failed to prove the Var non-null */ + } + else + { + /* Punt otherwise */ + return false; + } + } + + return true; +} + +/* + * find_subquery_safe_quals + * Traverse jointree to locate quals on non-outerjoined-rels. + * + * We locate all WHERE and JOIN/ON quals that constrain the rels that are not + * below the nullable side of any outer join, and add them to the *safe_quals + * list (forming a list with implicit-AND semantics). These quals can be used + * to prove non-nullability of the subquery's outputs. + * + * Top-level caller must initialize *safe_quals to NIL. + */ +static void +find_subquery_safe_quals(Node *jtnode, List **safe_quals) +{ + if (jtnode == NULL) + return; + if (IsA(jtnode, RangeTblRef)) + { + /* Leaf node: nothing to do */ + return; + } + else if (IsA(jtnode, FromExpr)) + { + FromExpr *f = (FromExpr *) jtnode; + ListCell *lc; + + /* All elements of the FROM list are allowable */ + foreach(lc, f->fromlist) + find_subquery_safe_quals((Node *) lfirst(lc), safe_quals); + /* ... and its WHERE quals are too */ + if (f->quals) + *safe_quals = lappend(*safe_quals, f->quals); + } + else if (IsA(jtnode, JoinExpr)) + { + JoinExpr *j = (JoinExpr *) jtnode; + + switch (j->jointype) + { + case JOIN_INNER: + /* visit both children */ + find_subquery_safe_quals(j->larg, safe_quals); + find_subquery_safe_quals(j->rarg, safe_quals); + /* and grab the ON quals too */ + if (j->quals) + *safe_quals = lappend(*safe_quals, j->quals); + break; + + case JOIN_LEFT: + case JOIN_SEMI: + case JOIN_ANTI: + + /* + * Only the left input is possibly non-nullable; furthermore, + * the quals of this join don't constrain the left input. + * Note: we probably can't see SEMI or ANTI joins at this + * point, but if we do, we can treat them like LEFT joins. + */ + find_subquery_safe_quals(j->larg, safe_quals); + break; + + case JOIN_RIGHT: + /* Reverse of the above case */ + find_subquery_safe_quals(j->rarg, safe_quals); + break; + + case JOIN_FULL: + /* Neither side is non-nullable, so stop descending */ + break; + + default: + elog(ERROR, "unrecognized join type: %d", + (int) j->jointype); + break; + } + } + else + elog(ERROR, "unrecognized node type: %d", + (int) nodeTag(jtnode)); +} + /* * Can we treat a ScalarArrayOpExpr as strict? * @@ -2739,7 +2969,8 @@ eval_const_expressions_mutator(Node *node, if (!has_nullable_nonconst && !expr_is_nonnullable(context->root, - (Expr *) lfirst(arg), false)) + (Expr *) lfirst(arg), + NOTNULL_SOURCE_HASHTABLE)) has_nullable_nonconst = true; } } @@ -3418,7 +3649,8 @@ eval_const_expressions_mutator(Node *node, newargs = lappend(newargs, e); break; } - if (expr_is_nonnullable(context->root, (Expr *) e, false)) + if (expr_is_nonnullable(context->root, (Expr *) e, + NOTNULL_SOURCE_HASHTABLE)) { if (newargs == NIL) return e; /* first expr */ @@ -3612,7 +3844,7 @@ eval_const_expressions_mutator(Node *node, */ if (relem && expr_is_nonnullable(context->root, (Expr *) relem, - false)) + NOTNULL_SOURCE_HASHTABLE)) { if (ntest->nulltesttype == IS_NULL) return makeBoolConst(false, false); @@ -3664,7 +3896,8 @@ eval_const_expressions_mutator(Node *node, return makeBoolConst(result, false); } if (!ntest->argisrow && arg && - expr_is_nonnullable(context->root, (Expr *) arg, false)) + expr_is_nonnullable(context->root, (Expr *) arg, + NOTNULL_SOURCE_HASHTABLE)) { bool result; @@ -3749,7 +3982,9 @@ eval_const_expressions_mutator(Node *node, return makeBoolConst(result, false); } - if (arg && expr_is_nonnullable(context->root, (Expr *) arg, false)) + if (arg && + expr_is_nonnullable(context->root, (Expr *) arg, + NOTNULL_SOURCE_HASHTABLE)) { /* * If arg is proven non-nullable, simplify to boolean @@ -4384,14 +4619,11 @@ simplify_aggref(Aggref *aggref, eval_const_expressions_context *context) * If the Var is defined NOT NULL and meanwhile is not nulled by any outer * joins or grouping sets, then we can know that it cannot be NULL. * - * use_rel_info indicates whether the corresponding RelOptInfo is available for - * use. + * "source" specifies where we should look for NOT NULL proofs. */ bool -var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info) +var_is_nonnullable(PlannerInfo *root, Var *var, NotNullSource source) { - Bitmapset *notnullattnums = NULL; - Assert(IsA(var, Var)); /* skip upper-level Vars */ @@ -4406,35 +4638,89 @@ var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info) if (var->varattno < 0) return true; - /* - * Check if the Var is defined as NOT NULL. We retrieve the column NOT - * NULL constraint information from the corresponding RelOptInfo if it is - * available; otherwise, we search the hash table for this information. - */ - if (use_rel_info) - { - RelOptInfo *rel = find_base_rel(root, var->varno); + /* we don't trust whole-row Vars */ + if (var->varattno == 0) + return false; - notnullattnums = rel->notnullattnums; - } - else + /* Check if the Var is defined as NOT NULL. */ + switch (source) { - RangeTblEntry *rte = planner_rt_fetch(var->varno, root); + case NOTNULL_SOURCE_RELOPT: + { + /* + * We retrieve the column NOT NULL constraint information from + * the corresponding RelOptInfo. + */ + RelOptInfo *rel; + Bitmapset *notnullattnums; - /* - * We must skip inheritance parent tables, as some child tables may - * have a NOT NULL constraint for a column while others may not. This - * cannot happen with partitioned tables, though. - */ - if (rte->inh && rte->relkind != RELKIND_PARTITIONED_TABLE) - return false; + rel = find_base_rel(root, var->varno); + notnullattnums = rel->notnullattnums; - notnullattnums = find_relation_notnullatts(root, rte->relid); - } + return bms_is_member(var->varattno, notnullattnums); + } + case NOTNULL_SOURCE_HASHTABLE: + { + /* + * We retrieve the column NOT NULL constraint information from + * the hash table. + */ + RangeTblEntry *rte; + Bitmapset *notnullattnums; - if (var->varattno > 0 && - bms_is_member(var->varattno, notnullattnums)) - return true; + rte = planner_rt_fetch(var->varno, root); + + /* We can only reason about ordinary relations */ + if (rte->rtekind != RTE_RELATION) + return false; + + /* + * We must skip inheritance parent tables, as some child + * tables may have a NOT NULL constraint for a column while + * others may not. This cannot happen with partitioned + * tables, though. + */ + if (rte->inh && rte->relkind != RELKIND_PARTITIONED_TABLE) + return false; + + notnullattnums = find_relation_notnullatts(root, rte->relid); + + return bms_is_member(var->varattno, notnullattnums); + } + case NOTNULL_SOURCE_SYSCACHE: + { + /* + * We look up the "attnotnull" field in the attribute + * relation. + */ + RangeTblEntry *rte; + + rte = planner_rt_fetch(var->varno, root); + + /* We can only reason about ordinary relations */ + if (rte->rtekind != RTE_RELATION) + return false; + + /* + * We must skip inheritance parent tables, as some child + * tables may have a NOT NULL constraint for a column while + * others may not. This cannot happen with partitioned + * tables, though. + * + * Note that we need to check if the relation actually has any + * children, as we might not have done that yet. + */ + if (rte->inh && has_subclass(rte->relid) && + rte->relkind != RELKIND_PARTITIONED_TABLE) + return false; + + return get_attnotnull(rte->relid, var->varattno); + } + default: + elog(ERROR, "unrecognized NotNullSource: %d", + (int) source); + break; + } return false; } @@ -4444,16 +4730,22 @@ var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info) * * Returns true iff the given 'expr' cannot produce SQL NULLs. * - * If 'use_rel_info' is true, nullability of Vars is checked via the - * corresponding RelOptInfo for the given Var. Some callers require - * nullability information before RelOptInfos are generated. These should - * pass 'use_rel_info' as false. + * source: specifies where we should look for NOT NULL proofs for Vars. + * - NOTNULL_SOURCE_RELOPT: Used when RelOptInfos have been generated. We + * retrieve nullability information directly from the RelOptInfo corresponding + * to the Var. + * - NOTNULL_SOURCE_HASHTABLE: Used when RelOptInfos are not yet available, + * but we have already collected relation-level not-null constraints into the + * global hash table. + * - NOTNULL_SOURCE_SYSCACHE: Used for raw parse trees where neither + * RelOptInfos nor the hash table are available. In this case, we have to + * look up the 'attnotnull' field directly in the system catalogs. * * For now, we support only a limited set of expression types. Support for * additional node types can be added in the future. */ bool -expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) +expr_is_nonnullable(PlannerInfo *root, Expr *expr, NotNullSource source) { /* since this function recurses, it could be driven to stack overflow */ check_stack_depth(); @@ -4463,7 +4755,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) case T_Var: { if (root) - return var_is_nonnullable(root, (Var *) expr, use_rel_info); + return var_is_nonnullable(root, (Var *) expr, source); } break; case T_Const: @@ -4480,7 +4772,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) foreach_ptr(Expr, arg, coalesceexpr->args) { - if (expr_is_nonnullable(root, arg, use_rel_info)) + if (expr_is_nonnullable(root, arg, source)) return true; } } @@ -4495,7 +4787,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) foreach_ptr(Expr, arg, minmaxexpr->args) { - if (expr_is_nonnullable(root, arg, use_rel_info)) + if (expr_is_nonnullable(root, arg, source)) return true; } } @@ -4511,13 +4803,13 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) /* The default result must be present and non-nullable */ if (caseexpr->defresult == NULL || - !expr_is_nonnullable(root, caseexpr->defresult, use_rel_info)) + !expr_is_nonnullable(root, caseexpr->defresult, source)) return false; /* All branch results must be non-nullable */ foreach_ptr(CaseWhen, casewhen, caseexpr->args) { - if (!expr_is_nonnullable(root, casewhen->result, use_rel_info)) + if (!expr_is_nonnullable(root, casewhen->result, source)) return false; } @@ -4565,7 +4857,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) * non-nullable. */ return expr_is_nonnullable(root, ((RelabelType *) expr)->arg, - use_rel_info); + source); } default: break; diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c index 37d34685b93..6c8fb7b7275 100644 --- a/src/backend/utils/adt/int8.c +++ b/src/backend/utils/adt/int8.c @@ -834,7 +834,7 @@ int8inc_support(PG_FUNCTION_ARGS) PG_RETURN_POINTER(NULL); /* If the arg isn't NULLable, do the conversion */ - if (expr_is_nonnullable(req->root, arg, false)) + if (expr_is_nonnullable(req->root, arg, NOTNULL_SOURCE_HASHTABLE)) { Aggref *newagg; diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c index 1913b009d40..f10948483b9 100644 --- a/src/backend/utils/cache/lsyscache.c +++ b/src/backend/utils/cache/lsyscache.c @@ -858,6 +858,47 @@ comparison_ops_are_compatible(Oid opno1, Oid opno2) return result; } +/* + * op_is_safe_index_member + * Check if the operator is a member of a B-tree or Hash operator family. + * + * We use this check as a proxy for "null-safety": if an operator is trusted by + * the btree or hash opfamily, it implies that the operator adheres to standard + * boolean behavior, and would not return NULL when given valid non-null + * inputs, as doing so would break index integrity. + */ +bool +op_is_safe_index_member(Oid opno) +{ + bool result = false; + CatCList *catlist; + int i; + + /* + * Search pg_amop to see if the target operator is registered for any + * btree or hash opfamily. + */ + catlist = SearchSysCacheList1(AMOPOPID, ObjectIdGetDatum(opno)); + + for (i = 0; i < catlist->n_members; i++) + { + HeapTuple tuple = &catlist->members[i]->tuple; + Form_pg_amop aform = (Form_pg_amop) GETSTRUCT(tuple); + + /* Check if the AM is B-tree or Hash */ + if (aform->amopmethod == BTREE_AM_OID || + aform->amopmethod == HASH_AM_OID) + { + result = true; + break; + } + } + + ReleaseSysCacheList(catlist); + + return result; +} + /* ---------- AMPROC CACHES ---------- */ @@ -1071,6 +1112,33 @@ get_attoptions(Oid relid, int16 attnum) return result; } +/* + * get_attnotnull + * + * Given the relation id and the attribute number, + * return the "attnotnull" field from the attribute relation. + */ +bool +get_attnotnull(Oid relid, AttrNumber attnum) +{ + HeapTuple tp; + bool result = false; + + tp = SearchSysCache2(ATTNUM, + ObjectIdGetDatum(relid), + Int16GetDatum(attnum)); + + if (HeapTupleIsValid(tp)) + { + Form_pg_attribute att_tup = (Form_pg_attribute) GETSTRUCT(tp); + + result = att_tup->attnotnull; + ReleaseSysCache(tp); + } + + return result; +} + /* ---------- PG_CAST CACHE ---------- */ /* diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h index a64034e8a6d..853a28c0007 100644 --- a/src/include/optimizer/clauses.h +++ b/src/include/optimizer/clauses.h @@ -42,6 +42,7 @@ extern Relids find_nonnullable_rels(Node *clause); extern List *find_nonnullable_vars(Node *clause); extern List *find_forced_null_vars(Node *node); extern Var *find_forced_null_var(Node *node); +extern bool query_outputs_are_not_nullable(Query *query); extern bool is_pseudo_constant_clause(Node *clause); extern bool is_pseudo_constant_clause_relids(Node *clause, Relids relids); diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h index 3d27a019609..685485148ac 100644 --- a/src/include/optimizer/optimizer.h +++ b/src/include/optimizer/optimizer.h @@ -130,6 +130,14 @@ extern Expr *canonicalize_qual(Expr *qual, bool is_check); /* in util/clauses.c: */ +/* Enum to specify where var_is_nonnullable should look for NOT NULL proofs */ +typedef enum +{ + NOTNULL_SOURCE_RELOPT, /* Use RelOptInfo */ + NOTNULL_SOURCE_HASHTABLE, /* Use Global Hash Table */ + NOTNULL_SOURCE_SYSCACHE, /* Use System Catalog */ +} NotNullSource; + extern bool contain_mutable_functions(Node *clause); extern bool contain_mutable_functions_after_planning(Expr *expr); extern bool contain_volatile_functions(Node *clause); @@ -145,10 +153,11 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node); extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod, Oid result_collation); -extern bool var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info); +extern bool var_is_nonnullable(PlannerInfo *root, Var *var, + NotNullSource source); extern bool expr_is_nonnullable(PlannerInfo *root, Expr *expr, - bool use_rel_info); + NotNullSource source); extern List *expand_function_arguments(List *args, bool include_out_arguments, Oid result_type, diff --git a/src/include/optimizer/subselect.h b/src/include/optimizer/subselect.h index 8a5503eb973..4ecccf46bd3 100644 --- a/src/include/optimizer/subselect.h +++ b/src/include/optimizer/subselect.h @@ -22,6 +22,7 @@ extern ScalarArrayOpExpr *convert_VALUES_to_ANY(PlannerInfo *root, Query *values); extern JoinExpr *convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, + bool under_not, Relids available_rels); extern JoinExpr *convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink, diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h index 5655aca4c14..b9ad84ecd41 100644 --- a/src/include/utils/lsyscache.h +++ b/src/include/utils/lsyscache.h @@ -89,6 +89,7 @@ extern bool get_op_hash_functions(Oid opno, extern List *get_op_index_interpretation(Oid opno); extern bool equality_ops_are_compatible(Oid opno1, Oid opno2); extern bool comparison_ops_are_compatible(Oid opno1, Oid opno2); +extern bool op_is_safe_index_member(Oid opno); extern Oid get_opfamily_proc(Oid opfamily, Oid lefttype, Oid righttype, int16 procnum); extern char *get_attname(Oid relid, AttrNumber attnum, bool missing_ok); @@ -98,6 +99,7 @@ extern Oid get_atttype(Oid relid, AttrNumber attnum); extern void get_atttypetypmodcoll(Oid relid, AttrNumber attnum, Oid *typid, int32 *typmod, Oid *collid); extern Datum get_attoptions(Oid relid, int16 attnum); +extern bool get_attnotnull(Oid relid, AttrNumber attnum); extern Oid get_cast_oid(Oid sourcetypeid, Oid targettypeid, bool missing_ok); extern char *get_collation_name(Oid colloid); extern bool get_collation_isdeterministic(Oid colloid); diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out index 2135d82884d..200236a0a69 100644 --- a/src/test/regress/expected/subselect.out +++ b/src/test/regress/expected/subselect.out @@ -3323,3 +3323,442 @@ SELECT ten FROM onek t WHERE 1.0::integer IN ((VALUES (1), (3))); Seq Scan on onek t (1 row) +-- +-- Check NOT IN performs an ANTI JOIN when both the outer query's expressions +-- and the sub-select's output columns are provably non-nullable, and the +-- operator itself cannot return NULL for non-null inputs. +-- +BEGIN; +CREATE TEMP TABLE not_null_tab (id int NOT NULL, val int NOT NULL); +CREATE TEMP TABLE null_tab (id int, val int); +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +----------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = not_null_tab_1.id) + -> Seq Scan on not_null_tab + -> Hash + -> Seq Scan on not_null_tab not_null_tab_1 +(5 rows) + +-- No ANTI JOIN: outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> Seq Scan on not_null_tab +(4 rows) + +-- No ANTI JOIN: inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> Seq Scan on null_tab +(4 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab WHERE id IS NOT NULL); + QUERY PLAN +---------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = null_tab.id) + -> Seq Scan on not_null_tab + -> Hash + -> Seq Scan on null_tab + Filter: (id IS NOT NULL) +(6 rows) + +-- No ANTI JOIN: outer side is nullable (we don't check outer query quals for now) +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id IS NOT NULL + AND id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +--------------------------------------------------------------------------------- + Seq Scan on null_tab + Filter: ((id IS NOT NULL) AND (NOT (ANY (id = (hashed SubPlan any_1).col1)))) + SubPlan any_1 + -> Seq Scan on not_null_tab +(4 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is defined NOT NULL +-- and is not nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + QUERY PLAN +----------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = t1.id) + -> Seq Scan on not_null_tab + -> Hash + -> Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(12 rows) + +-- No ANTI JOIN: inner side is defined NOT NULL but is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t2.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(11 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t2.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id + WHERE t2.id IS NOT NULL +); + QUERY PLAN +----------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = t2.id) + -> Seq Scan on not_null_tab + -> Hash + -> Merge Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(12 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + LEFT JOIN null_tab t2 ON t1.id = t2.id + WHERE t1.id IS NOT NULL +); + QUERY PLAN +---------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = t1.id) + -> Seq Scan on not_null_tab + -> Hash + -> Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on null_tab t1 + Filter: (id IS NOT NULL) + -> Sort + Sort Key: t2.id + -> Seq Scan on null_tab t2 +(13 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + INNER JOIN null_tab t2 ON t1.id = t2.id + LEFT JOIN null_tab t3 ON TRUE +); + QUERY PLAN +------------------------------------------------- + Merge Anti Join + Merge Cond: (not_null_tab.id = t1.id) + -> Sort + Sort Key: not_null_tab.id + -> Seq Scan on not_null_tab + -> Nested Loop Left Join + -> Merge Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on null_tab t2 + -> Materialize + -> Seq Scan on null_tab t3 +(16 rows) + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t1.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +---------------------------------------------------- + Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Hash Anti Join + Hash Cond: (t1.id = not_null_tab.id) + -> Seq Scan on not_null_tab t1 + -> Hash + -> Seq Scan on not_null_tab + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(12 rows) + +-- No ANTI JOIN: outer side is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t2.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +------------------------------------------------------------- + Merge Left Join + Merge Cond: (t1.id = t2.id) + Filter: (NOT (ANY (t2.id = (hashed SubPlan any_1).col1))) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 + SubPlan any_1 + -> Seq Scan on not_null_tab +(11 rows) + +-- No ANTI JOIN: sublink is in an outer join's ON qual and references the +-- non-nullable side +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t1.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +------------------------------------------------------------------ + Nested Loop Left Join + Join Filter: (NOT (ANY (t1.id = (hashed SubPlan any_1).col1))) + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Seq Scan on not_null_tab t2 + SubPlan any_1 + -> Seq Scan on not_null_tab +(7 rows) + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t2.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +---------------------------------------------------- + Nested Loop Left Join + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Hash Anti Join + Hash Cond: (t2.id = not_null_tab.id) + -> Seq Scan on not_null_tab t2 + -> Hash + -> Seq Scan on not_null_tab +(8 rows) + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT id, val FROM not_null_tab); + QUERY PLAN +--------------------------------------------------------------------------------------------------- + Merge Anti Join + Merge Cond: ((not_null_tab.id = not_null_tab_1.id) AND (not_null_tab.val = not_null_tab_1.val)) + -> Sort + Sort Key: not_null_tab.id, not_null_tab.val + -> Seq Scan on not_null_tab + -> Sort + Sort Key: not_null_tab_1.id, not_null_tab_1.val + -> Seq Scan on not_null_tab not_null_tab_1 +(8 rows) + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE NOT (id, val) > ANY (SELECT id, val FROM not_null_tab); + QUERY PLAN +------------------------------------------------------------------------------------------------------ + Nested Loop Anti Join + Join Filter: (ROW(not_null_tab.id, not_null_tab.val) > ROW(not_null_tab_1.id, not_null_tab_1.val)) + -> Seq Scan on not_null_tab + -> Materialize + -> Seq Scan on not_null_tab not_null_tab_1 +(5 rows) + +-- No ANTI JOIN: one column of the outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1, null_tab t2 +WHERE (t1.id, t2.id) NOT IN (SELECT id, val FROM not_null_tab); + QUERY PLAN +-------------------------------------------------------------------------------------------------------------- + Nested Loop + Join Filter: (NOT (ANY ((t1.id = (hashed SubPlan any_1).col1) AND (t2.id = (hashed SubPlan any_1).col2)))) + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Seq Scan on null_tab t2 + SubPlan any_1 + -> Seq Scan on not_null_tab +(7 rows) + +-- No ANTI JOIN: one column of the inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT t1.id, t2.id FROM not_null_tab t1, null_tab t2); + QUERY PLAN +-------------------------------------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY ((id = (SubPlan any_1).col1) AND (val = (SubPlan any_1).col2)))) + SubPlan any_1 + -> Materialize + -> Nested Loop + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Seq Scan on null_tab t2 +(8 rows) + +-- ANTI JOIN: COALESCE(nullable, constant) is non-nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE COALESCE(id, -1) NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +----------------------------------------------------------------------- + Hash Anti Join + Hash Cond: (COALESCE(null_tab.id, '-1'::integer) = not_null_tab.id) + -> Seq Scan on null_tab + -> Hash + -> Seq Scan on not_null_tab +(5 rows) + +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT COALESCE(id, -1) FROM null_tab); + QUERY PLAN +----------------------------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = COALESCE(null_tab.id, '-1'::integer)) + -> Seq Scan on not_null_tab + -> Hash + -> Seq Scan on null_tab +(5 rows) + +-- ANTI JOIN: GROUP BY (without Grouping Sets) preserves the non-nullability of +-- the column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab GROUP BY id); + QUERY PLAN +----------------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = not_null_tab_1.id) + -> Seq Scan on not_null_tab + -> Hash + -> HashAggregate + Group Key: not_null_tab_1.id + -> Seq Scan on not_null_tab not_null_tab_1 +(7 rows) + +-- No ANTI JOIN: GROUP BY on a nullable column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab GROUP BY id); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> HashAggregate + Group Key: null_tab.id + -> Seq Scan on null_tab +(6 rows) + +-- No ANTI JOIN: Grouping Sets can introduce NULLs +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT id + FROM not_null_tab + GROUP BY GROUPING SETS ((id), (val)) +); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> HashAggregate + Hash Key: not_null_tab_1.id + Hash Key: not_null_tab_1.val + -> Seq Scan on not_null_tab not_null_tab_1 +(7 rows) + +-- create a custom "unsafe" equality operator +CREATE FUNCTION int4eq_unsafe(int4, int4) + RETURNS bool + AS 'int4eq' + LANGUAGE internal IMMUTABLE; +CREATE OPERATOR ?= ( + PROCEDURE = int4eq_unsafe, + LEFTARG = int4, + RIGHTARG = int4 +); +-- No ANTI JOIN: the operator is not safe +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE NOT id ?= ANY (SELECT id FROM not_null_tab); + QUERY PLAN +------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id ?= (SubPlan any_1).col1))) + SubPlan any_1 + -> Materialize + -> Seq Scan on not_null_tab not_null_tab_1 +(5 rows) + +ROLLBACK; diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql index cadc3293687..4cd016f4ac3 100644 --- a/src/test/regress/sql/subselect.sql +++ b/src/test/regress/sql/subselect.sql @@ -1448,3 +1448,188 @@ SELECT * FROM onek t1, lateral (SELECT * FROM onek t2 WHERE t2.ten IN (values (t -- VtA causes the whole expression to be evaluated as a constant EXPLAIN (COSTS OFF) SELECT ten FROM onek t WHERE 1.0::integer IN ((VALUES (1), (3))); + +-- +-- Check NOT IN performs an ANTI JOIN when both the outer query's expressions +-- and the sub-select's output columns are provably non-nullable, and the +-- operator itself cannot return NULL for non-null inputs. +-- + +BEGIN; + +CREATE TEMP TABLE not_null_tab (id int NOT NULL, val int NOT NULL); +CREATE TEMP TABLE null_tab (id int, val int); + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab WHERE id IS NOT NULL); + +-- No ANTI JOIN: outer side is nullable (we don't check outer query quals for now) +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id IS NOT NULL + AND id NOT IN (SELECT id FROM not_null_tab); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is defined NOT NULL +-- and is not nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + +-- No ANTI JOIN: inner side is defined NOT NULL but is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t2.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t2.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id + WHERE t2.id IS NOT NULL +); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + LEFT JOIN null_tab t2 ON t1.id = t2.id + WHERE t1.id IS NOT NULL +); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + INNER JOIN null_tab t2 ON t1.id = t2.id + LEFT JOIN null_tab t3 ON TRUE +); + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t1.id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: outer side is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t2.id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: sublink is in an outer join's ON qual and references the +-- non-nullable side +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t1.id NOT IN (SELECT id FROM not_null_tab); + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t2.id NOT IN (SELECT id FROM not_null_tab); + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT id, val FROM not_null_tab); + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE NOT (id, val) > ANY (SELECT id, val FROM not_null_tab); + +-- No ANTI JOIN: one column of the outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1, null_tab t2 +WHERE (t1.id, t2.id) NOT IN (SELECT id, val FROM not_null_tab); + +-- No ANTI JOIN: one column of the inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT t1.id, t2.id FROM not_null_tab t1, null_tab t2); + +-- ANTI JOIN: COALESCE(nullable, constant) is non-nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE COALESCE(id, -1) NOT IN (SELECT id FROM not_null_tab); + +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT COALESCE(id, -1) FROM null_tab); + +-- ANTI JOIN: GROUP BY (without Grouping Sets) preserves the non-nullability of +-- the column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab GROUP BY id); + +-- No ANTI JOIN: GROUP BY on a nullable column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab GROUP BY id); + +-- No ANTI JOIN: Grouping Sets can introduce NULLs +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT id + FROM not_null_tab + GROUP BY GROUPING SETS ((id), (val)) +); + +-- create a custom "unsafe" equality operator +CREATE FUNCTION int4eq_unsafe(int4, int4) + RETURNS bool + AS 'int4eq' + LANGUAGE internal IMMUTABLE; + +CREATE OPERATOR ?= ( + PROCEDURE = int4eq_unsafe, + LEFTARG = int4, + RIGHTARG = int4 +); + +-- No ANTI JOIN: the operator is not safe +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE NOT id ?= ANY (SELECT id FROM not_null_tab); + +ROLLBACK; diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index 77e3c04144e..e6f896d663b 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -1788,6 +1788,7 @@ Node NodeTag NonEmptyRange NoneCompressorState +NotNullSource Notification NotificationList NotifyStmt -- 2.39.5 (Apple Git-154) ^ permalink raw reply [nested|flat] 12+ messages in thread
* Re: Convert NOT IN sublinks to anti-joins when safe 2026-03-04 09:52 ` Re: Convert NOT IN sublinks to anti-joins when safe Richard Guo <[email protected]> @ 2026-03-05 01:57 ` Japin Li <[email protected]> 2026-03-05 03:40 ` Re: Convert NOT IN sublinks to anti-joins when safe wenhui qiu <[email protected]> 2026-03-08 08:16 ` Re: Convert NOT IN sublinks to anti-joins when safe Richard Guo <[email protected]> 0 siblings, 2 replies; 12+ messages in thread From: Japin Li @ 2026-03-05 01:57 UTC (permalink / raw) To: Richard Guo <[email protected]>; +Cc: Pg Hackers <[email protected]> Hi, Richard On Wed, 04 Mar 2026 at 18:52, Richard Guo <[email protected]> wrote: > On Sat, Feb 14, 2026 at 4:37 PM Richard Guo <[email protected]> wrote: >> On Thu, Feb 5, 2026 at 3:51 PM Richard Guo <[email protected]> wrote: >> > Attached is the updated patch, which adds the check requiring the >> > operator to be a member of a btree or hash opfamily. > >> Attached is another updated patch rebased on current master, with the >> addition of support for RowCompareExpr to handle multi-column ordering >> operations; otherwise unchanged. > > Attached is another updated patch rebased on current master, with some > minor cosmetic adjustments; nothing essential has changed. > Thank you for working on this! Just a quick note: I think `foreach_ptr` is more appropriate here than `foreach`. diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c index 299b3354f6d..0d31861da7f 100644 --- a/src/backend/optimizer/plan/subselect.c +++ b/src/backend/optimizer/plan/subselect.c @@ -1484,7 +1484,6 @@ sublink_testexpr_is_not_nullable(PlannerInfo *root, SubLink *sublink) { Node *testexpr = sublink->testexpr; List *outer_exprs = NIL; - ListCell *lc; /* Punt if sublink is not in the expected format */ if (sublink->subLinkType != ANY_SUBLINK || testexpr == NULL) @@ -1514,10 +1513,8 @@ sublink_testexpr_is_not_nullable(PlannerInfo *root, SubLink *sublink) /* multi-column equality or inequality checks */ BoolExpr *bexpr = (BoolExpr *) testexpr; - foreach(lc, bexpr->args) + foreach_ptr(OpExpr, opexpr, bexpr->args) { - OpExpr *opexpr = (OpExpr *) lfirst(lc); - if (!IsA(opexpr, OpExpr)) return false; @@ -1537,10 +1534,8 @@ sublink_testexpr_is_not_nullable(PlannerInfo *root, SubLink *sublink) /* multi-column ordering checks */ RowCompareExpr *rcexpr = (RowCompareExpr *) testexpr; - foreach(lc, rcexpr->opnos) + foreach_oid(opno, rcexpr->opnos) { - Oid opno = lfirst_oid(lc); - /* verify operator safety; see comment above */ if (!op_is_safe_index_member(opno)) return false; @@ -1566,10 +1561,8 @@ sublink_testexpr_is_not_nullable(PlannerInfo *root, SubLink *sublink) flatten_join_alias_vars(root, root->parse, (Node *) outer_exprs); /* Check that every outer expression is non-nullable */ - foreach(lc, outer_exprs) + foreach_ptr(Expr, expr, outer_exprs) { - Expr *expr = (Expr *) lfirst(lc); - /* * We have already collected relation-level not-null constraints for * the outer query, so we can consult the global hash table for diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c index c47c9da4a9b..3f3baf2149a 100644 --- a/src/backend/optimizer/util/clauses.c +++ b/src/backend/optimizer/util/clauses.c @@ -2052,7 +2052,6 @@ query_outputs_are_not_nullable(Query *query) List *safe_quals = NIL; List *nonnullable_vars = NIL; bool computed_nonnullable_vars = false; - ListCell *tl; /* * If the query contains set operations, punt. The set ops themselves @@ -2083,9 +2082,8 @@ query_outputs_are_not_nullable(Query *query) /* * Examine each targetlist entry to prove that it can't produce NULL. */ - foreach(tl, query->targetList) + foreach_ptr(TargetEntry, tle, query->targetList) { - TargetEntry *tle = (TargetEntry *) lfirst(tl); Expr *expr = tle->expr; /* Resjunk columns can be ignored: they don't produce output values */ @@ -2194,11 +2192,10 @@ find_subquery_safe_quals(Node *jtnode, List **safe_quals) else if (IsA(jtnode, FromExpr)) { FromExpr *f = (FromExpr *) jtnode; - ListCell *lc; /* All elements of the FROM list are allowable */ - foreach(lc, f->fromlist) - find_subquery_safe_quals((Node *) lfirst(lc), safe_quals); + foreach_ptr(Node, node, f->fromlist) + find_subquery_safe_quals(node, safe_quals); /* ... and its WHERE quals are too */ if (f->quals) *safe_quals = lappend(*safe_quals, f->quals); > - Richard > > [2. text/x-diff; v4-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patch]... -- Regards, Japin Li ChengDu WenWu Information Technology Co., Ltd. ^ permalink raw reply [nested|flat] 12+ messages in thread
* Re: Convert NOT IN sublinks to anti-joins when safe 2026-03-04 09:52 ` Re: Convert NOT IN sublinks to anti-joins when safe Richard Guo <[email protected]> 2026-03-05 01:57 ` Re: Convert NOT IN sublinks to anti-joins when safe Japin Li <[email protected]> @ 2026-03-05 03:40 ` wenhui qiu <[email protected]> 1 sibling, 0 replies; 12+ messages in thread From: wenhui qiu @ 2026-03-05 03:40 UTC (permalink / raw) To: Japin Li <[email protected]>; +Cc: Richard Guo <[email protected]>; Pg Hackers <[email protected]> HI Japin > Just a quick note: I think `foreach_ptr` is more appropriate here than `foreach`. Thank you review this path , I think using foreach_node should be safe ``` #define foreach_ptr(type, var, lst) foreach_internal(type, *, var, lst, lfirst) #define foreach_int(var, lst) foreach_internal(int, , var, lst, lfirst_int) #define foreach_oid(var, lst) foreach_internal(Oid, , var, lst, lfirst_oid) #define foreach_xid(var, lst) foreach_internal(TransactionId, , var, lst, lfirst_xid) /* * The internal implementation of the above macros. Do not use directly. * * This macro actually generates two loops in order to declare two variables of * different types. The outer loop only iterates once, so we expect optimizing * compilers will unroll it, thereby optimizing it away. */ #define foreach_internal(type, pointer, var, lst, func) \ for (type pointer var = 0, pointer var##__outerloop = (type pointer) 1; \ var##__outerloop; \ var##__outerloop = 0) \ for (ForEachState var##__state = {(lst), 0}; \ (var##__state.l != NIL && \ var##__state.i < var##__state.l->length && \ (var = (type pointer) func(&var##__state.l->elements[var##__state.i]), true)); \ var##__state.i++) /* * foreach_node - * The same as foreach_ptr, but asserts that the element is of the specified * node type. */ #define foreach_node(type, var, lst) \ for (type * var = 0, *var##__outerloop = (type *) 1; \ var##__outerloop; \ var##__outerloop = 0) \ for (ForEachState var##__state = {(lst), 0}; \ (var##__state.l != NIL && \ var##__state.i < var##__state.l->length && \ (var = lfirst_node(type, &var##__state.l->elements[var##__state.i]), true)); \ var##__state.i++) ``` On Thu, Mar 5, 2026 at 9:58 AM Japin Li <[email protected]> wrote: > > Hi, Richard > > On Wed, 04 Mar 2026 at 18:52, Richard Guo <[email protected]> wrote: > > On Sat, Feb 14, 2026 at 4:37 PM Richard Guo <[email protected]> > wrote: > >> On Thu, Feb 5, 2026 at 3:51 PM Richard Guo <[email protected]> > wrote: > >> > Attached is the updated patch, which adds the check requiring the > >> > operator to be a member of a btree or hash opfamily. > > > >> Attached is another updated patch rebased on current master, with the > >> addition of support for RowCompareExpr to handle multi-column ordering > >> operations; otherwise unchanged. > > > > Attached is another updated patch rebased on current master, with some > > minor cosmetic adjustments; nothing essential has changed. > > > > Thank you for working on this! > > Just a quick note: I think `foreach_ptr` is more appropriate here than > `foreach`. > > diff --git a/src/backend/optimizer/plan/subselect.c > b/src/backend/optimizer/plan/subselect.c > index 299b3354f6d..0d31861da7f 100644 > --- a/src/backend/optimizer/plan/subselect.c > +++ b/src/backend/optimizer/plan/subselect.c > @@ -1484,7 +1484,6 @@ sublink_testexpr_is_not_nullable(PlannerInfo *root, > SubLink *sublink) > { > Node *testexpr = sublink->testexpr; > List *outer_exprs = NIL; > - ListCell *lc; > > /* Punt if sublink is not in the expected format */ > if (sublink->subLinkType != ANY_SUBLINK || testexpr == NULL) > @@ -1514,10 +1513,8 @@ sublink_testexpr_is_not_nullable(PlannerInfo *root, > SubLink *sublink) > /* multi-column equality or inequality checks */ > BoolExpr *bexpr = (BoolExpr *) testexpr; > > - foreach(lc, bexpr->args) > + foreach_ptr(OpExpr, opexpr, bexpr->args) > { > - OpExpr *opexpr = (OpExpr *) lfirst(lc); > - > if (!IsA(opexpr, OpExpr)) > return false; > > @@ -1537,10 +1534,8 @@ sublink_testexpr_is_not_nullable(PlannerInfo *root, > SubLink *sublink) > /* multi-column ordering checks */ > RowCompareExpr *rcexpr = (RowCompareExpr *) testexpr; > > - foreach(lc, rcexpr->opnos) > + foreach_oid(opno, rcexpr->opnos) > { > - Oid opno = lfirst_oid(lc); > - > /* verify operator safety; see comment above */ > if (!op_is_safe_index_member(opno)) > return false; > @@ -1566,10 +1561,8 @@ sublink_testexpr_is_not_nullable(PlannerInfo *root, > SubLink *sublink) > flatten_join_alias_vars(root, root->parse, (Node *) > outer_exprs); > > /* Check that every outer expression is non-nullable */ > - foreach(lc, outer_exprs) > + foreach_ptr(Expr, expr, outer_exprs) > { > - Expr *expr = (Expr *) lfirst(lc); > - > /* > * We have already collected relation-level not-null > constraints for > * the outer query, so we can consult the global hash > table for > diff --git a/src/backend/optimizer/util/clauses.c > b/src/backend/optimizer/util/clauses.c > index c47c9da4a9b..3f3baf2149a 100644 > --- a/src/backend/optimizer/util/clauses.c > +++ b/src/backend/optimizer/util/clauses.c > @@ -2052,7 +2052,6 @@ query_outputs_are_not_nullable(Query *query) > List *safe_quals = NIL; > List *nonnullable_vars = NIL; > bool computed_nonnullable_vars = false; > - ListCell *tl; > > /* > * If the query contains set operations, punt. The set ops > themselves > @@ -2083,9 +2082,8 @@ query_outputs_are_not_nullable(Query *query) > /* > * Examine each targetlist entry to prove that it can't produce > NULL. > */ > - foreach(tl, query->targetList) > + foreach_ptr(TargetEntry, tle, query->targetList) > { > - TargetEntry *tle = (TargetEntry *) lfirst(tl); > Expr *expr = tle->expr; > > /* Resjunk columns can be ignored: they don't produce > output values */ > @@ -2194,11 +2192,10 @@ find_subquery_safe_quals(Node *jtnode, List > **safe_quals) > else if (IsA(jtnode, FromExpr)) > { > FromExpr *f = (FromExpr *) jtnode; > - ListCell *lc; > > /* All elements of the FROM list are allowable */ > - foreach(lc, f->fromlist) > - find_subquery_safe_quals((Node *) lfirst(lc), > safe_quals); > + foreach_ptr(Node, node, f->fromlist) > + find_subquery_safe_quals(node, safe_quals); > /* ... and its WHERE quals are too */ > if (f->quals) > *safe_quals = lappend(*safe_quals, f->quals); > > > > - Richard > > > > [2. text/x-diff; > v4-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patch]... > > -- > Regards, > Japin Li > ChengDu WenWu Information Technology Co., Ltd. > > > ^ permalink raw reply [nested|flat] 12+ messages in thread
* Re: Convert NOT IN sublinks to anti-joins when safe 2026-03-04 09:52 ` Re: Convert NOT IN sublinks to anti-joins when safe Richard Guo <[email protected]> 2026-03-05 01:57 ` Re: Convert NOT IN sublinks to anti-joins when safe Japin Li <[email protected]> @ 2026-03-08 08:16 ` Richard Guo <[email protected]> 2026-03-09 04:01 ` Re: Convert NOT IN sublinks to anti-joins when safe Richard Guo <[email protected]> 2026-03-09 05:14 ` Re: Convert NOT IN sublinks to anti-joins when safe Japin Li <[email protected]> 1 sibling, 2 replies; 12+ messages in thread From: Richard Guo @ 2026-03-08 08:16 UTC (permalink / raw) To: Japin Li <[email protected]>; +Cc: Pg Hackers <[email protected]> On Thu, Mar 5, 2026 at 10:57 AM Japin Li <[email protected]> wrote: > Just a quick note: I think `foreach_ptr` is more appropriate here than `foreach`. Agreed. For homogeneous lists, it is better to use foreach_node so we get the additional type-safety assertions in dev builds. For heterogeneous lists, we should use foreach_ptr. Attached is an updated version of the patch that replaces foreach with foreach_ptr or foreach_node accordingly. Nothing else has changed. I plan to do one more round of self-review on this. Unless there are any further thoughts or concerns, I'm hoping to commit this in a week or two. Please let me know if anyone spots anything else I might have missed. - Richard Attachments: [application/octet-stream] v5-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patch (61.6K, 2-v5-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patch) download | inline diff: From 8083043ad59b5f35540ef997fd0f68fdafc3bcc6 Mon Sep 17 00:00:00 2001 From: Richard Guo <[email protected]> Date: Fri, 30 Jan 2026 12:29:42 +0900 Subject: [PATCH v5] Convert NOT IN sublinks to anti-joins when safe The planner has historically been unable to convert "x NOT IN (SELECT y ...)" sublinks into anti-joins. This is because standard SQL semantics for NOT IN require that if the comparison "x = y" returns NULL, the "NOT IN" expression evaluates to NULL (effectively false), causing the row to be discarded. In contrast, an anti-join preserves the row if no match is found. Due to this semantic mismatch regarding NULL handling, the conversion was previously considered unsafe. However, if we can prove that neither side of the comparison can yield NULL values, and further that the operator itself cannot return NULL for non-null inputs, the behavior of NOT IN and anti-join becomes identical. Enabling this conversion allows the planner to treat the sublink as a first-class relation rather than an opaque SubPlan filter. This unlocks global join ordering optimization and permits the selection of the most efficient join algorithm based on cost, often yielding significant performance improvements for large datasets. This patch verifies that neither side of the comparison can be NULL and that the operator is safe regarding NULL results before performing the conversion. To verify operator safety, we require that the operator be a member of a B-tree or Hash operator family. This serves as a proxy for standard boolean behavior, ensuring the operator does not return NULL on valid non-null inputs, as doing so would break index integrity. For operand non-nullability, this patch makes use of several existing mechanisms. It leverages the outer-join-aware-Var infrastructure to verify that a Var does not come from the nullable side of an outer join, and consults the NOT-NULL-attnums hash table to efficiently verify schema-level NOT NULL constraints. Additionally, it employs find_nonnullable_vars to identify Vars forced non-nullable by qual clauses, and expr_is_nonnullable to deduce non-nullability for other expression types. The logic for verifying the non-nullability of the subquery outputs was adapted from prior work by David Rowley and Tom Lane. Author: Richard Guo <[email protected]> Reviewed-by: wenhui qiu <[email protected]> Reviewed-by: Zhang Mingli <[email protected]> Reviewed-by: Japin Li <[email protected]> Discussion: https://postgr.es/m/CAMbWs495eF=-fSa5CwJS6B-BaEi3ARp0UNb4Lt3EkgUGZJwkAQ@mail.gmail.com --- src/backend/optimizer/plan/initsplan.c | 4 +- src/backend/optimizer/plan/subselect.c | 159 +++++++- src/backend/optimizer/prep/prepjointree.c | 72 +++- src/backend/optimizer/util/clauses.c | 381 ++++++++++++++++--- src/backend/utils/adt/int8.c | 2 +- src/backend/utils/cache/lsyscache.c | 68 ++++ src/include/optimizer/clauses.h | 1 + src/include/optimizer/optimizer.h | 13 +- src/include/optimizer/subselect.h | 1 + src/include/utils/lsyscache.h | 2 + src/test/regress/expected/subselect.out | 439 ++++++++++++++++++++++ src/test/regress/sql/subselect.sql | 185 +++++++++ src/tools/pgindent/typedefs.list | 1 + 13 files changed, 1262 insertions(+), 66 deletions(-) diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c index 97ea95a4eb8..9aaf1c4e5ca 100644 --- a/src/backend/optimizer/plan/initsplan.c +++ b/src/backend/optimizer/plan/initsplan.c @@ -3447,7 +3447,7 @@ restriction_is_always_true(PlannerInfo *root, if (nulltest->argisrow) return false; - return expr_is_nonnullable(root, nulltest->arg, true); + return expr_is_nonnullable(root, nulltest->arg, NOTNULL_SOURCE_RELOPT); } /* If it's an OR, check its sub-clauses */ @@ -3512,7 +3512,7 @@ restriction_is_always_false(PlannerInfo *root, if (nulltest->argisrow) return false; - return expr_is_nonnullable(root, nulltest->arg, true); + return expr_is_nonnullable(root, nulltest->arg, NOTNULL_SOURCE_RELOPT); } /* If it's an OR, check its sub-clauses */ diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c index d7f3cedf3d5..0d31861da7f 100644 --- a/src/backend/optimizer/plan/subselect.c +++ b/src/backend/optimizer/plan/subselect.c @@ -91,6 +91,7 @@ static bool contain_outer_selfref(Node *node); static bool contain_outer_selfref_walker(Node *node, Index *depth); static void inline_cte(PlannerInfo *root, CommonTableExpr *cte); static bool inline_cte_walker(Node *node, inline_cte_walker_context *context); +static bool sublink_testexpr_is_not_nullable(PlannerInfo *root, SubLink *sublink); static bool simplify_EXISTS_query(PlannerInfo *root, Query *query); static Query *convert_EXISTS_to_ANY(PlannerInfo *root, Query *subselect, Node **testexpr, List **paramIds); @@ -1306,11 +1307,14 @@ convert_VALUES_to_ANY(PlannerInfo *root, Node *testexpr, Query *values) * If so, form a JoinExpr and return it. Return NULL if the SubLink cannot * be converted to a join. * - * The only non-obvious input parameter is available_rels: this is the set - * of query rels that can safely be referenced in the sublink expression. - * (We must restrict this to avoid changing the semantics when a sublink - * is present in an outer join's ON qual.) The conversion must fail if - * the converted qual would reference any but these parent-query relids. + * If under_not is true, the caller actually found NOT (ANY SubLink), so + * that what we must try to build is an ANTI not SEMI join. + * + * available_rels is the set of query rels that can safely be referenced + * in the sublink expression. (We must restrict this to avoid changing + * the semantics when a sublink is present in an outer join's ON qual.) + * The conversion must fail if the converted qual would reference any but + * these parent-query relids. * * On success, the returned JoinExpr has larg = NULL and rarg = the jointree * item representing the pulled-up subquery. The caller must set larg to @@ -1333,7 +1337,7 @@ convert_VALUES_to_ANY(PlannerInfo *root, Node *testexpr, Query *values) */ JoinExpr * convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, - Relids available_rels) + bool under_not, Relids available_rels) { JoinExpr *result; Query *parse = root->parse; @@ -1351,6 +1355,19 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, Assert(sublink->subLinkType == ANY_SUBLINK); + /* + * Per SQL spec, NOT IN is not ordinarily equivalent to an anti-join, so + * that by default we have to fail when under_not. However, if we can + * prove that neither the outer query's expressions nor the sub-select's + * output columns can be NULL, and further that the operator itself cannot + * return NULL for non-null inputs, then the logic is identical and it's + * safe to convert NOT IN to an anti-join. + */ + if (under_not && + (!sublink_testexpr_is_not_nullable(root, sublink) || + !query_outputs_are_not_nullable(subselect))) + return NULL; + /* * If the sub-select contains any Vars of the parent query, we treat it as * LATERAL. (Vars from higher levels don't matter here.) @@ -1428,7 +1445,7 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, * And finally, build the JoinExpr node. */ result = makeNode(JoinExpr); - result->jointype = JOIN_SEMI; + result->jointype = under_not ? JOIN_ANTI : JOIN_SEMI; result->isNatural = false; result->larg = NULL; /* caller must fill this in */ result->rarg = (Node *) rtr; @@ -1441,12 +1458,134 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, return result; } +/* + * sublink_testexpr_is_not_nullable: verify that testexpr of an ANY_SUBLINK + * guarantees a non-null result, assuming the inner side is also non-null. + * + * To ensure the expression never returns NULL, we require both that the outer + * expressions are provably non-nullable and that the operator itself is safe. + * We validate operator safety by checking for membership in a standard index + * operator family (B-tree or Hash); this acts as a proxy for standard boolean + * behavior, ensuring the operator does not produce NULL results from non-null + * inputs. + * + * We handle the three standard parser representations for ANY sublinks: a + * single OpExpr for single-column comparisons, a BoolExpr containing a list of + * OpExprs for multi-column equality or inequality checks (where equality + * becomes an AND and inequality becomes an OR), and a RowCompareExpr for + * multi-column ordering checks. In all cases, we validate the operators and + * the outer expressions. + * + * It is acceptable for this check not to be exhaustive. We can err on the + * side of conservatism: if we're not sure, it's okay to return FALSE. + */ +static bool +sublink_testexpr_is_not_nullable(PlannerInfo *root, SubLink *sublink) +{ + Node *testexpr = sublink->testexpr; + List *outer_exprs = NIL; + + /* Punt if sublink is not in the expected format */ + if (sublink->subLinkType != ANY_SUBLINK || testexpr == NULL) + return false; + + if (IsA(testexpr, OpExpr)) + { + /* single-column comparison */ + OpExpr *opexpr = (OpExpr *) testexpr; + + /* standard ANY structure should be op(outer_var, param) */ + if (list_length(opexpr->args) != 2) + return false; + + /* + * We rely on membership in a B-tree or Hash operator family as a + * guarantee that the operator acts as a proper boolean comparison and + * does not yield NULL for valid non-null inputs. + */ + if (!op_is_safe_index_member(opexpr->opno)) + return false; + + outer_exprs = lappend(outer_exprs, linitial(opexpr->args)); + } + else if (is_andclause(testexpr) || is_orclause(testexpr)) + { + /* multi-column equality or inequality checks */ + BoolExpr *bexpr = (BoolExpr *) testexpr; + + foreach_ptr(OpExpr, opexpr, bexpr->args) + { + if (!IsA(opexpr, OpExpr)) + return false; + + /* standard ANY structure should be op(outer_var, param) */ + if (list_length(opexpr->args) != 2) + return false; + + /* verify operator safety; see comment above */ + if (!op_is_safe_index_member(opexpr->opno)) + return false; + + outer_exprs = lappend(outer_exprs, linitial(opexpr->args)); + } + } + else if (IsA(testexpr, RowCompareExpr)) + { + /* multi-column ordering checks */ + RowCompareExpr *rcexpr = (RowCompareExpr *) testexpr; + + foreach_oid(opno, rcexpr->opnos) + { + /* verify operator safety; see comment above */ + if (!op_is_safe_index_member(opno)) + return false; + } + + outer_exprs = list_concat(outer_exprs, rcexpr->largs); + } + else + { + /* Punt if other node types */ + return false; + } + + /* + * Since the query hasn't yet been through expression preprocessing, we + * must apply flatten_join_alias_vars to the outer expressions to avoid + * being fooled by join aliases. + * + * We do not need to apply flatten_group_exprs though, since grouping Vars + * cannot appear in jointree quals. + */ + outer_exprs = (List *) + flatten_join_alias_vars(root, root->parse, (Node *) outer_exprs); + + /* Check that every outer expression is non-nullable */ + foreach_ptr(Expr, expr, outer_exprs) + { + /* + * We have already collected relation-level not-null constraints for + * the outer query, so we can consult the global hash table for + * nullability information. + */ + if (!expr_is_nonnullable(root, expr, NOTNULL_SOURCE_HASHTABLE)) + return false; + + /* + * Note: It is possible to further prove non-nullability by examining + * the qual clauses available at or below the jointree node where this + * NOT IN clause is evaluated, but for the moment it doesn't seem + * worth the extra complication. + */ + } + + return true; +} + /* * convert_EXISTS_sublink_to_join: try to convert an EXISTS SubLink to a join * - * The API of this function is identical to convert_ANY_sublink_to_join's, - * except that we also support the case where the caller has found NOT EXISTS, - * so we need an additional input parameter "under_not". + * The API of this function is identical to convert_ANY_sublink_to_join's. */ JoinExpr * convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink, diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c index c90f4b32733..b2beb0a0d68 100644 --- a/src/backend/optimizer/prep/prepjointree.c +++ b/src/backend/optimizer/prep/prepjointree.c @@ -852,14 +852,15 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node, if ((saop = convert_VALUES_to_ANY(root, sublink->testexpr, (Query *) sublink->subselect)) != NULL) - + { /* * The VALUES sequence was simplified. Nothing more to do * here. */ return (Node *) saop; + } - if ((j = convert_ANY_sublink_to_join(root, sublink, + if ((j = convert_ANY_sublink_to_join(root, sublink, false, available_rels1)) != NULL) { /* Yes; insert the new join node into the join tree */ @@ -885,7 +886,7 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node, return NULL; } if (available_rels2 != NULL && - (j = convert_ANY_sublink_to_join(root, sublink, + (j = convert_ANY_sublink_to_join(root, sublink, false, available_rels2)) != NULL) { /* Yes; insert the new join node into the join tree */ @@ -970,14 +971,68 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node, } if (is_notclause(node)) { - /* If the immediate argument of NOT is EXISTS, try to convert */ + /* If the immediate argument of NOT is ANY or EXISTS, try to convert */ SubLink *sublink = (SubLink *) get_notclausearg((Expr *) node); JoinExpr *j; Relids child_rels; if (sublink && IsA(sublink, SubLink)) { - if (sublink->subLinkType == EXISTS_SUBLINK) + if (sublink->subLinkType == ANY_SUBLINK) + { + if ((j = convert_ANY_sublink_to_join(root, sublink, true, + available_rels1)) != NULL) + { + /* Yes; insert the new join node into the join tree */ + j->larg = *jtlink1; + *jtlink1 = (Node *) j; + /* Recursively process pulled-up jointree nodes */ + j->rarg = pull_up_sublinks_jointree_recurse(root, + j->rarg, + &child_rels); + + /* + * Now recursively process the pulled-up quals. Because + * we are underneath a NOT, we can't pull up sublinks that + * reference the left-hand stuff, but it's still okay to + * pull up sublinks referencing j->rarg. + */ + j->quals = pull_up_sublinks_qual_recurse(root, + j->quals, + &j->rarg, + child_rels, + NULL, NULL); + /* Return NULL representing constant TRUE */ + return NULL; + } + if (available_rels2 != NULL && + (j = convert_ANY_sublink_to_join(root, sublink, true, + available_rels2)) != NULL) + { + /* Yes; insert the new join node into the join tree */ + j->larg = *jtlink2; + *jtlink2 = (Node *) j; + /* Recursively process pulled-up jointree nodes */ + j->rarg = pull_up_sublinks_jointree_recurse(root, + j->rarg, + &child_rels); + + /* + * Now recursively process the pulled-up quals. Because + * we are underneath a NOT, we can't pull up sublinks that + * reference the left-hand stuff, but it's still okay to + * pull up sublinks referencing j->rarg. + */ + j->quals = pull_up_sublinks_qual_recurse(root, + j->quals, + &j->rarg, + child_rels, + NULL, NULL); + /* Return NULL representing constant TRUE */ + return NULL; + } + } + else if (sublink->subLinkType == EXISTS_SUBLINK) { if ((j = convert_EXISTS_sublink_to_join(root, sublink, true, available_rels1)) != NULL) @@ -3706,6 +3761,13 @@ has_notnull_forced_var(PlannerInfo *root, List *forced_null_vars, rte = rt_fetch(varno, root->parse->rtable); + /* We can only reason about ordinary relations */ + if (rte->rtekind != RTE_RELATION) + { + bms_free(forcednullattnums); + continue; + } + /* * We must skip inheritance parent tables, as some child tables may * have a NOT NULL constraint for a column while others may not. This diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c index a41d81734cf..9852ba5762b 100644 --- a/src/backend/optimizer/util/clauses.c +++ b/src/backend/optimizer/util/clauses.c @@ -21,6 +21,7 @@ #include "access/htup_details.h" #include "catalog/pg_class.h" +#include "catalog/pg_inherits.h" #include "catalog/pg_language.h" #include "catalog/pg_operator.h" #include "catalog/pg_proc.h" @@ -112,6 +113,7 @@ static bool contain_context_dependent_node_walker(Node *node, int *flags); static bool contain_leaked_vars_walker(Node *node, void *context); static Relids find_nonnullable_rels_walker(Node *node, bool top_level); static List *find_nonnullable_vars_walker(Node *node, bool top_level); +static void find_subquery_safe_quals(Node *jtnode, List **safe_quals); static bool is_strict_saop(ScalarArrayOpExpr *expr, bool falseOK); static bool convert_saop_to_hashed_saop_walker(Node *node, void *context); static Node *eval_const_expressions_mutator(Node *node, @@ -1433,6 +1435,10 @@ contain_leaked_vars_walker(Node *node, void *context) context); } +/***************************************************************************** + * Nullability analysis + *****************************************************************************/ + /* * find_nonnullable_rels * Determine which base rels are forced nonnullable by given clause. @@ -1701,7 +1707,7 @@ find_nonnullable_rels_walker(Node *node, bool top_level) * but here we assume that the input is a Boolean expression, and wish to * see if NULL inputs will provably cause a FALSE-or-NULL result. We expect * the expression to have been AND/OR flattened and converted to implicit-AND - * format. + * format (but the results are still good if it wasn't AND/OR flattened). * * Attnos of the identified Vars are returned in a multibitmapset (a List of * Bitmapsets). List indexes correspond to relids (varnos), while the per-rel @@ -2021,6 +2027,227 @@ find_forced_null_var(Node *node) return NULL; } +/* + * query_outputs_are_not_nullable + * Returns TRUE if the output values of the Query are certainly not NULL. + * All output columns must return non-NULL to answer TRUE. + * + * The reason this takes a Query, and not just an individual tlist expression, + * is so that we can make use of the query's WHERE/ON clauses to prove it does + * not return nulls. + * + * In current usage, the passed sub-Query hasn't yet been through any planner + * processing. This means that applying find_nonnullable_vars() to its WHERE + * clauses isn't really ideal: for lack of const-simplification, we might be + * unable to prove not-nullness in some cases where we could have proved it + * afterwards. However, we should not get any false positive results. + * + * Like the other forms of nullability analysis above, we can err on the + * side of conservatism: if we're not sure, it's okay to return FALSE. + */ +bool +query_outputs_are_not_nullable(Query *query) +{ + PlannerInfo subroot; + List *safe_quals = NIL; + List *nonnullable_vars = NIL; + bool computed_nonnullable_vars = false; + + /* + * If the query contains set operations, punt. The set ops themselves + * couldn't introduce nulls that weren't in their inputs, but the tlist + * present in the top-level query is just dummy and won't give us useful + * info. We could get an answer by recursing to examine each leaf query, + * but for the moment it doesn't seem worth the extra complication. + */ + if (query->setOperations) + return false; + + /* + * If the query contains grouping sets, punt. Grouping sets can introduce + * NULL values, and we currently lack the PlannerInfo needed to flatten + * grouping Vars in the query's outputs. + */ + if (query->groupingSets) + return false; + + /* + * We need a PlannerInfo to pass to expr_is_nonnullable. Fortunately, we + * can cons up an entirely dummy one, because only the "parse" link in the + * struct is used by expr_is_nonnullable. + */ + MemSet(&subroot, 0, sizeof(subroot)); + subroot.parse = query; + + /* + * Examine each targetlist entry to prove that it can't produce NULL. + */ + foreach_node(TargetEntry, tle, query->targetList) + { + Expr *expr = tle->expr; + + /* Resjunk columns can be ignored: they don't produce output values */ + if (tle->resjunk) + continue; + + /* + * Look through binary relabelings, since we know those don't + * introduce nulls. + */ + while (expr && IsA(expr, RelabelType)) + expr = ((RelabelType *) expr)->arg; + + if (expr == NULL) /* paranoia */ + return false; + + /* + * Since the subquery hasn't yet been through expression + * preprocessing, we must apply flatten_join_alias_vars to the given + * expression, and to the quals found by find_subquery_safe_quals, to + * avoid being fooled by join aliases. + * + * We won't be dealing with arbitrary expressions, so it's safe to use + * NULL as the root, so long as adjust_standard_join_alias_expression + * can handle everything the parser would make as a join alias + * expression. + */ + expr = (Expr *) flatten_join_alias_vars(NULL, query, (Node *) expr); + + /* + * Similarly, we must apply flatten_group_exprs to the given + * expression to avoid being fooled by grouping Vars. We do not need + * to apply flatten_group_exprs to the quals found by + * find_subquery_safe_quals though, as grouping Vars cannot appear in + * jointree quals. + * + * We have verified that the query does not contain grouping sets, so + * it's safe to use NULL as the root here. + */ + expr = (Expr *) flatten_group_exprs(NULL, query, (Node *) expr); + + /* + * Check to see if the expr cannot be NULL. Since we're on a raw + * parse tree, we need to look up the not-null constraints from the + * system catalogs. + */ + if (expr_is_nonnullable(&subroot, expr, NOTNULL_SOURCE_SYSCACHE)) + continue; + + if (IsA(expr, Var)) + { + Var *var = (Var *) expr; + + /* + * For a plain Var, even if that didn't work, we can conclude that + * the Var is not nullable if find_nonnullable_vars can find a + * "var IS NOT NULL" or similarly strict condition among the quals + * on non-outerjoined-rels. Compute the list of Vars having such + * quals if we didn't already. + */ + if (!computed_nonnullable_vars) + { + find_subquery_safe_quals((Node *) query->jointree, &safe_quals); + safe_quals = (List *) + flatten_join_alias_vars(NULL, query, (Node *) safe_quals); + nonnullable_vars = find_nonnullable_vars((Node *) safe_quals); + computed_nonnullable_vars = true; + } + + if (!mbms_is_member(var->varno, + var->varattno - FirstLowInvalidHeapAttributeNumber, + nonnullable_vars)) + return false; /* we failed to prove the Var non-null */ + } + else + { + /* Punt otherwise */ + return false; + } + } + + return true; +} + +/* + * find_subquery_safe_quals + * Traverse jointree to locate quals on non-outerjoined-rels. + * + * We locate all WHERE and JOIN/ON quals that constrain the rels that are not + * below the nullable side of any outer join, and add them to the *safe_quals + * list (forming a list with implicit-AND semantics). These quals can be used + * to prove non-nullability of the subquery's outputs. + * + * Top-level caller must initialize *safe_quals to NIL. + */ +static void +find_subquery_safe_quals(Node *jtnode, List **safe_quals) +{ + if (jtnode == NULL) + return; + if (IsA(jtnode, RangeTblRef)) + { + /* Leaf node: nothing to do */ + return; + } + else if (IsA(jtnode, FromExpr)) + { + FromExpr *f = (FromExpr *) jtnode; + + /* All elements of the FROM list are allowable */ + foreach_ptr(Node, child_node, f->fromlist) + find_subquery_safe_quals(child_node, safe_quals); + /* ... and its WHERE quals are too */ + if (f->quals) + *safe_quals = lappend(*safe_quals, f->quals); + } + else if (IsA(jtnode, JoinExpr)) + { + JoinExpr *j = (JoinExpr *) jtnode; + + switch (j->jointype) + { + case JOIN_INNER: + /* visit both children */ + find_subquery_safe_quals(j->larg, safe_quals); + find_subquery_safe_quals(j->rarg, safe_quals); + /* and grab the ON quals too */ + if (j->quals) + *safe_quals = lappend(*safe_quals, j->quals); + break; + + case JOIN_LEFT: + case JOIN_SEMI: + case JOIN_ANTI: + + /* + * Only the left input is possibly non-nullable; furthermore, + * the quals of this join don't constrain the left input. + * Note: we probably can't see SEMI or ANTI joins at this + * point, but if we do, we can treat them like LEFT joins. + */ + find_subquery_safe_quals(j->larg, safe_quals); + break; + + case JOIN_RIGHT: + /* Reverse of the above case */ + find_subquery_safe_quals(j->rarg, safe_quals); + break; + + case JOIN_FULL: + /* Neither side is non-nullable, so stop descending */ + break; + + default: + elog(ERROR, "unrecognized join type: %d", + (int) j->jointype); + break; + } + } + else + elog(ERROR, "unrecognized node type: %d", + (int) nodeTag(jtnode)); +} + /* * Can we treat a ScalarArrayOpExpr as strict? * @@ -2739,7 +2966,8 @@ eval_const_expressions_mutator(Node *node, if (!has_nullable_nonconst && !expr_is_nonnullable(context->root, - (Expr *) lfirst(arg), false)) + (Expr *) lfirst(arg), + NOTNULL_SOURCE_HASHTABLE)) has_nullable_nonconst = true; } } @@ -3418,7 +3646,8 @@ eval_const_expressions_mutator(Node *node, newargs = lappend(newargs, e); break; } - if (expr_is_nonnullable(context->root, (Expr *) e, false)) + if (expr_is_nonnullable(context->root, (Expr *) e, + NOTNULL_SOURCE_HASHTABLE)) { if (newargs == NIL) return e; /* first expr */ @@ -3612,7 +3841,7 @@ eval_const_expressions_mutator(Node *node, */ if (relem && expr_is_nonnullable(context->root, (Expr *) relem, - false)) + NOTNULL_SOURCE_HASHTABLE)) { if (ntest->nulltesttype == IS_NULL) return makeBoolConst(false, false); @@ -3664,7 +3893,8 @@ eval_const_expressions_mutator(Node *node, return makeBoolConst(result, false); } if (!ntest->argisrow && arg && - expr_is_nonnullable(context->root, (Expr *) arg, false)) + expr_is_nonnullable(context->root, (Expr *) arg, + NOTNULL_SOURCE_HASHTABLE)) { bool result; @@ -3749,7 +3979,9 @@ eval_const_expressions_mutator(Node *node, return makeBoolConst(result, false); } - if (arg && expr_is_nonnullable(context->root, (Expr *) arg, false)) + if (arg && + expr_is_nonnullable(context->root, (Expr *) arg, + NOTNULL_SOURCE_HASHTABLE)) { /* * If arg is proven non-nullable, simplify to boolean @@ -4384,14 +4616,11 @@ simplify_aggref(Aggref *aggref, eval_const_expressions_context *context) * If the Var is defined NOT NULL and meanwhile is not nulled by any outer * joins or grouping sets, then we can know that it cannot be NULL. * - * use_rel_info indicates whether the corresponding RelOptInfo is available for - * use. + * "source" specifies where we should look for NOT NULL proofs. */ bool -var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info) +var_is_nonnullable(PlannerInfo *root, Var *var, NotNullSource source) { - Bitmapset *notnullattnums = NULL; - Assert(IsA(var, Var)); /* skip upper-level Vars */ @@ -4406,35 +4635,89 @@ var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info) if (var->varattno < 0) return true; - /* - * Check if the Var is defined as NOT NULL. We retrieve the column NOT - * NULL constraint information from the corresponding RelOptInfo if it is - * available; otherwise, we search the hash table for this information. - */ - if (use_rel_info) - { - RelOptInfo *rel = find_base_rel(root, var->varno); + /* we don't trust whole-row Vars */ + if (var->varattno == 0) + return false; - notnullattnums = rel->notnullattnums; - } - else + /* Check if the Var is defined as NOT NULL. */ + switch (source) { - RangeTblEntry *rte = planner_rt_fetch(var->varno, root); + case NOTNULL_SOURCE_RELOPT: + { + /* + * We retrieve the column NOT NULL constraint information from + * the corresponding RelOptInfo. + */ + RelOptInfo *rel; + Bitmapset *notnullattnums; - /* - * We must skip inheritance parent tables, as some child tables may - * have a NOT NULL constraint for a column while others may not. This - * cannot happen with partitioned tables, though. - */ - if (rte->inh && rte->relkind != RELKIND_PARTITIONED_TABLE) - return false; + rel = find_base_rel(root, var->varno); + notnullattnums = rel->notnullattnums; - notnullattnums = find_relation_notnullatts(root, rte->relid); - } + return bms_is_member(var->varattno, notnullattnums); + } + case NOTNULL_SOURCE_HASHTABLE: + { + /* + * We retrieve the column NOT NULL constraint information from + * the hash table. + */ + RangeTblEntry *rte; + Bitmapset *notnullattnums; - if (var->varattno > 0 && - bms_is_member(var->varattno, notnullattnums)) - return true; + rte = planner_rt_fetch(var->varno, root); + + /* We can only reason about ordinary relations */ + if (rte->rtekind != RTE_RELATION) + return false; + + /* + * We must skip inheritance parent tables, as some child + * tables may have a NOT NULL constraint for a column while + * others may not. This cannot happen with partitioned + * tables, though. + */ + if (rte->inh && rte->relkind != RELKIND_PARTITIONED_TABLE) + return false; + + notnullattnums = find_relation_notnullatts(root, rte->relid); + + return bms_is_member(var->varattno, notnullattnums); + } + case NOTNULL_SOURCE_SYSCACHE: + { + /* + * We look up the "attnotnull" field in the attribute + * relation. + */ + RangeTblEntry *rte; + + rte = planner_rt_fetch(var->varno, root); + + /* We can only reason about ordinary relations */ + if (rte->rtekind != RTE_RELATION) + return false; + + /* + * We must skip inheritance parent tables, as some child + * tables may have a NOT NULL constraint for a column while + * others may not. This cannot happen with partitioned + * tables, though. + * + * Note that we need to check if the relation actually has any + * children, as we might not have done that yet. + */ + if (rte->inh && has_subclass(rte->relid) && + rte->relkind != RELKIND_PARTITIONED_TABLE) + return false; + + return get_attnotnull(rte->relid, var->varattno); + } + default: + elog(ERROR, "unrecognized NotNullSource: %d", + (int) source); + break; + } return false; } @@ -4444,16 +4727,22 @@ var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info) * * Returns true iff the given 'expr' cannot produce SQL NULLs. * - * If 'use_rel_info' is true, nullability of Vars is checked via the - * corresponding RelOptInfo for the given Var. Some callers require - * nullability information before RelOptInfos are generated. These should - * pass 'use_rel_info' as false. + * source: specifies where we should look for NOT NULL proofs for Vars. + * - NOTNULL_SOURCE_RELOPT: Used when RelOptInfos have been generated. We + * retrieve nullability information directly from the RelOptInfo corresponding + * to the Var. + * - NOTNULL_SOURCE_HASHTABLE: Used when RelOptInfos are not yet available, + * but we have already collected relation-level not-null constraints into the + * global hash table. + * - NOTNULL_SOURCE_SYSCACHE: Used for raw parse trees where neither + * RelOptInfos nor the hash table are available. In this case, we have to + * look up the 'attnotnull' field directly in the system catalogs. * * For now, we support only a limited set of expression types. Support for * additional node types can be added in the future. */ bool -expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) +expr_is_nonnullable(PlannerInfo *root, Expr *expr, NotNullSource source) { /* since this function recurses, it could be driven to stack overflow */ check_stack_depth(); @@ -4463,7 +4752,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) case T_Var: { if (root) - return var_is_nonnullable(root, (Var *) expr, use_rel_info); + return var_is_nonnullable(root, (Var *) expr, source); } break; case T_Const: @@ -4480,7 +4769,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) foreach_ptr(Expr, arg, coalesceexpr->args) { - if (expr_is_nonnullable(root, arg, use_rel_info)) + if (expr_is_nonnullable(root, arg, source)) return true; } } @@ -4495,7 +4784,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) foreach_ptr(Expr, arg, minmaxexpr->args) { - if (expr_is_nonnullable(root, arg, use_rel_info)) + if (expr_is_nonnullable(root, arg, source)) return true; } } @@ -4511,13 +4800,13 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) /* The default result must be present and non-nullable */ if (caseexpr->defresult == NULL || - !expr_is_nonnullable(root, caseexpr->defresult, use_rel_info)) + !expr_is_nonnullable(root, caseexpr->defresult, source)) return false; /* All branch results must be non-nullable */ foreach_ptr(CaseWhen, casewhen, caseexpr->args) { - if (!expr_is_nonnullable(root, casewhen->result, use_rel_info)) + if (!expr_is_nonnullable(root, casewhen->result, source)) return false; } @@ -4565,7 +4854,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) * non-nullable. */ return expr_is_nonnullable(root, ((RelabelType *) expr)->arg, - use_rel_info); + source); } default: break; diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c index 37d34685b93..6c8fb7b7275 100644 --- a/src/backend/utils/adt/int8.c +++ b/src/backend/utils/adt/int8.c @@ -834,7 +834,7 @@ int8inc_support(PG_FUNCTION_ARGS) PG_RETURN_POINTER(NULL); /* If the arg isn't NULLable, do the conversion */ - if (expr_is_nonnullable(req->root, arg, false)) + if (expr_is_nonnullable(req->root, arg, NOTNULL_SOURCE_HASHTABLE)) { Aggref *newagg; diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c index 1913b009d40..f10948483b9 100644 --- a/src/backend/utils/cache/lsyscache.c +++ b/src/backend/utils/cache/lsyscache.c @@ -858,6 +858,47 @@ comparison_ops_are_compatible(Oid opno1, Oid opno2) return result; } +/* + * op_is_safe_index_member + * Check if the operator is a member of a B-tree or Hash operator family. + * + * We use this check as a proxy for "null-safety": if an operator is trusted by + * the btree or hash opfamily, it implies that the operator adheres to standard + * boolean behavior, and would not return NULL when given valid non-null + * inputs, as doing so would break index integrity. + */ +bool +op_is_safe_index_member(Oid opno) +{ + bool result = false; + CatCList *catlist; + int i; + + /* + * Search pg_amop to see if the target operator is registered for any + * btree or hash opfamily. + */ + catlist = SearchSysCacheList1(AMOPOPID, ObjectIdGetDatum(opno)); + + for (i = 0; i < catlist->n_members; i++) + { + HeapTuple tuple = &catlist->members[i]->tuple; + Form_pg_amop aform = (Form_pg_amop) GETSTRUCT(tuple); + + /* Check if the AM is B-tree or Hash */ + if (aform->amopmethod == BTREE_AM_OID || + aform->amopmethod == HASH_AM_OID) + { + result = true; + break; + } + } + + ReleaseSysCacheList(catlist); + + return result; +} + /* ---------- AMPROC CACHES ---------- */ @@ -1071,6 +1112,33 @@ get_attoptions(Oid relid, int16 attnum) return result; } +/* + * get_attnotnull + * + * Given the relation id and the attribute number, + * return the "attnotnull" field from the attribute relation. + */ +bool +get_attnotnull(Oid relid, AttrNumber attnum) +{ + HeapTuple tp; + bool result = false; + + tp = SearchSysCache2(ATTNUM, + ObjectIdGetDatum(relid), + Int16GetDatum(attnum)); + + if (HeapTupleIsValid(tp)) + { + Form_pg_attribute att_tup = (Form_pg_attribute) GETSTRUCT(tp); + + result = att_tup->attnotnull; + ReleaseSysCache(tp); + } + + return result; +} + /* ---------- PG_CAST CACHE ---------- */ /* diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h index a64034e8a6d..853a28c0007 100644 --- a/src/include/optimizer/clauses.h +++ b/src/include/optimizer/clauses.h @@ -42,6 +42,7 @@ extern Relids find_nonnullable_rels(Node *clause); extern List *find_nonnullable_vars(Node *clause); extern List *find_forced_null_vars(Node *node); extern Var *find_forced_null_var(Node *node); +extern bool query_outputs_are_not_nullable(Query *query); extern bool is_pseudo_constant_clause(Node *clause); extern bool is_pseudo_constant_clause_relids(Node *clause, Relids relids); diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h index b562ca380a8..e8b409afb7f 100644 --- a/src/include/optimizer/optimizer.h +++ b/src/include/optimizer/optimizer.h @@ -130,6 +130,14 @@ extern Expr *canonicalize_qual(Expr *qual, bool is_check); /* in util/clauses.c: */ +/* Enum to specify where var_is_nonnullable should look for NOT NULL proofs */ +typedef enum +{ + NOTNULL_SOURCE_RELOPT, /* Use RelOptInfo */ + NOTNULL_SOURCE_HASHTABLE, /* Use Global Hash Table */ + NOTNULL_SOURCE_SYSCACHE, /* Use System Catalog */ +} NotNullSource; + extern bool contain_mutable_functions(Node *clause); extern bool contain_mutable_functions_after_planning(Expr *expr); extern bool contain_volatile_functions(Node *clause); @@ -145,10 +153,11 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node); extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod, Oid result_collation); -extern bool var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info); +extern bool var_is_nonnullable(PlannerInfo *root, Var *var, + NotNullSource source); extern bool expr_is_nonnullable(PlannerInfo *root, Expr *expr, - bool use_rel_info); + NotNullSource source); extern List *expand_function_arguments(List *args, bool include_out_arguments, Oid result_type, diff --git a/src/include/optimizer/subselect.h b/src/include/optimizer/subselect.h index 8a5503eb973..4ecccf46bd3 100644 --- a/src/include/optimizer/subselect.h +++ b/src/include/optimizer/subselect.h @@ -22,6 +22,7 @@ extern ScalarArrayOpExpr *convert_VALUES_to_ANY(PlannerInfo *root, Query *values); extern JoinExpr *convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, + bool under_not, Relids available_rels); extern JoinExpr *convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink, diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h index 5655aca4c14..b9ad84ecd41 100644 --- a/src/include/utils/lsyscache.h +++ b/src/include/utils/lsyscache.h @@ -89,6 +89,7 @@ extern bool get_op_hash_functions(Oid opno, extern List *get_op_index_interpretation(Oid opno); extern bool equality_ops_are_compatible(Oid opno1, Oid opno2); extern bool comparison_ops_are_compatible(Oid opno1, Oid opno2); +extern bool op_is_safe_index_member(Oid opno); extern Oid get_opfamily_proc(Oid opfamily, Oid lefttype, Oid righttype, int16 procnum); extern char *get_attname(Oid relid, AttrNumber attnum, bool missing_ok); @@ -98,6 +99,7 @@ extern Oid get_atttype(Oid relid, AttrNumber attnum); extern void get_atttypetypmodcoll(Oid relid, AttrNumber attnum, Oid *typid, int32 *typmod, Oid *collid); extern Datum get_attoptions(Oid relid, int16 attnum); +extern bool get_attnotnull(Oid relid, AttrNumber attnum); extern Oid get_cast_oid(Oid sourcetypeid, Oid targettypeid, bool missing_ok); extern char *get_collation_name(Oid colloid); extern bool get_collation_isdeterministic(Oid colloid); diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out index 2135d82884d..200236a0a69 100644 --- a/src/test/regress/expected/subselect.out +++ b/src/test/regress/expected/subselect.out @@ -3323,3 +3323,442 @@ SELECT ten FROM onek t WHERE 1.0::integer IN ((VALUES (1), (3))); Seq Scan on onek t (1 row) +-- +-- Check NOT IN performs an ANTI JOIN when both the outer query's expressions +-- and the sub-select's output columns are provably non-nullable, and the +-- operator itself cannot return NULL for non-null inputs. +-- +BEGIN; +CREATE TEMP TABLE not_null_tab (id int NOT NULL, val int NOT NULL); +CREATE TEMP TABLE null_tab (id int, val int); +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +----------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = not_null_tab_1.id) + -> Seq Scan on not_null_tab + -> Hash + -> Seq Scan on not_null_tab not_null_tab_1 +(5 rows) + +-- No ANTI JOIN: outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> Seq Scan on not_null_tab +(4 rows) + +-- No ANTI JOIN: inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> Seq Scan on null_tab +(4 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab WHERE id IS NOT NULL); + QUERY PLAN +---------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = null_tab.id) + -> Seq Scan on not_null_tab + -> Hash + -> Seq Scan on null_tab + Filter: (id IS NOT NULL) +(6 rows) + +-- No ANTI JOIN: outer side is nullable (we don't check outer query quals for now) +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id IS NOT NULL + AND id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +--------------------------------------------------------------------------------- + Seq Scan on null_tab + Filter: ((id IS NOT NULL) AND (NOT (ANY (id = (hashed SubPlan any_1).col1)))) + SubPlan any_1 + -> Seq Scan on not_null_tab +(4 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is defined NOT NULL +-- and is not nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + QUERY PLAN +----------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = t1.id) + -> Seq Scan on not_null_tab + -> Hash + -> Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(12 rows) + +-- No ANTI JOIN: inner side is defined NOT NULL but is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t2.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(11 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t2.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id + WHERE t2.id IS NOT NULL +); + QUERY PLAN +----------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = t2.id) + -> Seq Scan on not_null_tab + -> Hash + -> Merge Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(12 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + LEFT JOIN null_tab t2 ON t1.id = t2.id + WHERE t1.id IS NOT NULL +); + QUERY PLAN +---------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = t1.id) + -> Seq Scan on not_null_tab + -> Hash + -> Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on null_tab t1 + Filter: (id IS NOT NULL) + -> Sort + Sort Key: t2.id + -> Seq Scan on null_tab t2 +(13 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + INNER JOIN null_tab t2 ON t1.id = t2.id + LEFT JOIN null_tab t3 ON TRUE +); + QUERY PLAN +------------------------------------------------- + Merge Anti Join + Merge Cond: (not_null_tab.id = t1.id) + -> Sort + Sort Key: not_null_tab.id + -> Seq Scan on not_null_tab + -> Nested Loop Left Join + -> Merge Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on null_tab t2 + -> Materialize + -> Seq Scan on null_tab t3 +(16 rows) + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t1.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +---------------------------------------------------- + Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Hash Anti Join + Hash Cond: (t1.id = not_null_tab.id) + -> Seq Scan on not_null_tab t1 + -> Hash + -> Seq Scan on not_null_tab + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(12 rows) + +-- No ANTI JOIN: outer side is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t2.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +------------------------------------------------------------- + Merge Left Join + Merge Cond: (t1.id = t2.id) + Filter: (NOT (ANY (t2.id = (hashed SubPlan any_1).col1))) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 + SubPlan any_1 + -> Seq Scan on not_null_tab +(11 rows) + +-- No ANTI JOIN: sublink is in an outer join's ON qual and references the +-- non-nullable side +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t1.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +------------------------------------------------------------------ + Nested Loop Left Join + Join Filter: (NOT (ANY (t1.id = (hashed SubPlan any_1).col1))) + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Seq Scan on not_null_tab t2 + SubPlan any_1 + -> Seq Scan on not_null_tab +(7 rows) + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t2.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +---------------------------------------------------- + Nested Loop Left Join + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Hash Anti Join + Hash Cond: (t2.id = not_null_tab.id) + -> Seq Scan on not_null_tab t2 + -> Hash + -> Seq Scan on not_null_tab +(8 rows) + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT id, val FROM not_null_tab); + QUERY PLAN +--------------------------------------------------------------------------------------------------- + Merge Anti Join + Merge Cond: ((not_null_tab.id = not_null_tab_1.id) AND (not_null_tab.val = not_null_tab_1.val)) + -> Sort + Sort Key: not_null_tab.id, not_null_tab.val + -> Seq Scan on not_null_tab + -> Sort + Sort Key: not_null_tab_1.id, not_null_tab_1.val + -> Seq Scan on not_null_tab not_null_tab_1 +(8 rows) + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE NOT (id, val) > ANY (SELECT id, val FROM not_null_tab); + QUERY PLAN +------------------------------------------------------------------------------------------------------ + Nested Loop Anti Join + Join Filter: (ROW(not_null_tab.id, not_null_tab.val) > ROW(not_null_tab_1.id, not_null_tab_1.val)) + -> Seq Scan on not_null_tab + -> Materialize + -> Seq Scan on not_null_tab not_null_tab_1 +(5 rows) + +-- No ANTI JOIN: one column of the outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1, null_tab t2 +WHERE (t1.id, t2.id) NOT IN (SELECT id, val FROM not_null_tab); + QUERY PLAN +-------------------------------------------------------------------------------------------------------------- + Nested Loop + Join Filter: (NOT (ANY ((t1.id = (hashed SubPlan any_1).col1) AND (t2.id = (hashed SubPlan any_1).col2)))) + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Seq Scan on null_tab t2 + SubPlan any_1 + -> Seq Scan on not_null_tab +(7 rows) + +-- No ANTI JOIN: one column of the inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT t1.id, t2.id FROM not_null_tab t1, null_tab t2); + QUERY PLAN +-------------------------------------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY ((id = (SubPlan any_1).col1) AND (val = (SubPlan any_1).col2)))) + SubPlan any_1 + -> Materialize + -> Nested Loop + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Seq Scan on null_tab t2 +(8 rows) + +-- ANTI JOIN: COALESCE(nullable, constant) is non-nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE COALESCE(id, -1) NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +----------------------------------------------------------------------- + Hash Anti Join + Hash Cond: (COALESCE(null_tab.id, '-1'::integer) = not_null_tab.id) + -> Seq Scan on null_tab + -> Hash + -> Seq Scan on not_null_tab +(5 rows) + +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT COALESCE(id, -1) FROM null_tab); + QUERY PLAN +----------------------------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = COALESCE(null_tab.id, '-1'::integer)) + -> Seq Scan on not_null_tab + -> Hash + -> Seq Scan on null_tab +(5 rows) + +-- ANTI JOIN: GROUP BY (without Grouping Sets) preserves the non-nullability of +-- the column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab GROUP BY id); + QUERY PLAN +----------------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = not_null_tab_1.id) + -> Seq Scan on not_null_tab + -> Hash + -> HashAggregate + Group Key: not_null_tab_1.id + -> Seq Scan on not_null_tab not_null_tab_1 +(7 rows) + +-- No ANTI JOIN: GROUP BY on a nullable column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab GROUP BY id); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> HashAggregate + Group Key: null_tab.id + -> Seq Scan on null_tab +(6 rows) + +-- No ANTI JOIN: Grouping Sets can introduce NULLs +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT id + FROM not_null_tab + GROUP BY GROUPING SETS ((id), (val)) +); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> HashAggregate + Hash Key: not_null_tab_1.id + Hash Key: not_null_tab_1.val + -> Seq Scan on not_null_tab not_null_tab_1 +(7 rows) + +-- create a custom "unsafe" equality operator +CREATE FUNCTION int4eq_unsafe(int4, int4) + RETURNS bool + AS 'int4eq' + LANGUAGE internal IMMUTABLE; +CREATE OPERATOR ?= ( + PROCEDURE = int4eq_unsafe, + LEFTARG = int4, + RIGHTARG = int4 +); +-- No ANTI JOIN: the operator is not safe +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE NOT id ?= ANY (SELECT id FROM not_null_tab); + QUERY PLAN +------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id ?= (SubPlan any_1).col1))) + SubPlan any_1 + -> Materialize + -> Seq Scan on not_null_tab not_null_tab_1 +(5 rows) + +ROLLBACK; diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql index cadc3293687..4cd016f4ac3 100644 --- a/src/test/regress/sql/subselect.sql +++ b/src/test/regress/sql/subselect.sql @@ -1448,3 +1448,188 @@ SELECT * FROM onek t1, lateral (SELECT * FROM onek t2 WHERE t2.ten IN (values (t -- VtA causes the whole expression to be evaluated as a constant EXPLAIN (COSTS OFF) SELECT ten FROM onek t WHERE 1.0::integer IN ((VALUES (1), (3))); + +-- +-- Check NOT IN performs an ANTI JOIN when both the outer query's expressions +-- and the sub-select's output columns are provably non-nullable, and the +-- operator itself cannot return NULL for non-null inputs. +-- + +BEGIN; + +CREATE TEMP TABLE not_null_tab (id int NOT NULL, val int NOT NULL); +CREATE TEMP TABLE null_tab (id int, val int); + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab WHERE id IS NOT NULL); + +-- No ANTI JOIN: outer side is nullable (we don't check outer query quals for now) +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id IS NOT NULL + AND id NOT IN (SELECT id FROM not_null_tab); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is defined NOT NULL +-- and is not nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + +-- No ANTI JOIN: inner side is defined NOT NULL but is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t2.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t2.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id + WHERE t2.id IS NOT NULL +); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + LEFT JOIN null_tab t2 ON t1.id = t2.id + WHERE t1.id IS NOT NULL +); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + INNER JOIN null_tab t2 ON t1.id = t2.id + LEFT JOIN null_tab t3 ON TRUE +); + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t1.id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: outer side is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t2.id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: sublink is in an outer join's ON qual and references the +-- non-nullable side +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t1.id NOT IN (SELECT id FROM not_null_tab); + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t2.id NOT IN (SELECT id FROM not_null_tab); + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT id, val FROM not_null_tab); + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE NOT (id, val) > ANY (SELECT id, val FROM not_null_tab); + +-- No ANTI JOIN: one column of the outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1, null_tab t2 +WHERE (t1.id, t2.id) NOT IN (SELECT id, val FROM not_null_tab); + +-- No ANTI JOIN: one column of the inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT t1.id, t2.id FROM not_null_tab t1, null_tab t2); + +-- ANTI JOIN: COALESCE(nullable, constant) is non-nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE COALESCE(id, -1) NOT IN (SELECT id FROM not_null_tab); + +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT COALESCE(id, -1) FROM null_tab); + +-- ANTI JOIN: GROUP BY (without Grouping Sets) preserves the non-nullability of +-- the column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab GROUP BY id); + +-- No ANTI JOIN: GROUP BY on a nullable column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab GROUP BY id); + +-- No ANTI JOIN: Grouping Sets can introduce NULLs +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT id + FROM not_null_tab + GROUP BY GROUPING SETS ((id), (val)) +); + +-- create a custom "unsafe" equality operator +CREATE FUNCTION int4eq_unsafe(int4, int4) + RETURNS bool + AS 'int4eq' + LANGUAGE internal IMMUTABLE; + +CREATE OPERATOR ?= ( + PROCEDURE = int4eq_unsafe, + LEFTARG = int4, + RIGHTARG = int4 +); + +-- No ANTI JOIN: the operator is not safe +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE NOT id ?= ANY (SELECT id FROM not_null_tab); + +ROLLBACK; diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index 3250564d4ff..8080dc16afa 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -1788,6 +1788,7 @@ Node NodeTag NonEmptyRange NoneCompressorState +NotNullSource Notification NotificationList NotifyStmt -- 2.39.5 (Apple Git-154) ^ permalink raw reply [nested|flat] 12+ messages in thread
* Re: Convert NOT IN sublinks to anti-joins when safe 2026-03-04 09:52 ` Re: Convert NOT IN sublinks to anti-joins when safe Richard Guo <[email protected]> 2026-03-05 01:57 ` Re: Convert NOT IN sublinks to anti-joins when safe Japin Li <[email protected]> 2026-03-08 08:16 ` Re: Convert NOT IN sublinks to anti-joins when safe Richard Guo <[email protected]> @ 2026-03-09 04:01 ` Richard Guo <[email protected]> 1 sibling, 0 replies; 12+ messages in thread From: Richard Guo @ 2026-03-09 04:01 UTC (permalink / raw) To: Japin Li <[email protected]>; +Cc: Pg Hackers <[email protected]> On Sun, Mar 8, 2026 at 5:16 PM Richard Guo <[email protected]> wrote: > Attached is an updated version of the patch that replaces foreach with > foreach_ptr or foreach_node accordingly. Nothing else has changed. > > I plan to do one more round of self-review on this. Unless there are > any further thoughts or concerns, I'm hoping to commit this in a week > or two. Please let me know if anyone spots anything else I might have > missed. I just finished a final self-review and noticed a subtle issue in query_outputs_are_not_nullable(). When preparing a target expression for the non-nullability check, the previous code flattened join aliases Vars before grouping Vars. However, because the parser processes FROM/JOIN clauses before the GROUP BY clause, a grouping Var can actually wrap a join alias Var, not the reverse. So we should flatten grouping Vars first. Attached is an updated patch that fixes that issue. I plan to commit this in a week or two, barring any objections. - Richard Attachments: [application/octet-stream] v6-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patch (61.6K, 2-v6-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patch) download | inline diff: From a13a2ccc5ac62f21f0d324cb74a37ed6418f0ffe Mon Sep 17 00:00:00 2001 From: Richard Guo <[email protected]> Date: Fri, 30 Jan 2026 12:29:42 +0900 Subject: [PATCH v6] Convert NOT IN sublinks to anti-joins when safe The planner has historically been unable to convert "x NOT IN (SELECT y ...)" sublinks into anti-joins. This is because standard SQL semantics for NOT IN require that if the comparison "x = y" returns NULL, the "NOT IN" expression evaluates to NULL (effectively false), causing the row to be discarded. In contrast, an anti-join preserves the row if no match is found. Due to this semantic mismatch regarding NULL handling, the conversion was previously considered unsafe. However, if we can prove that neither side of the comparison can yield NULL values, and further that the operator itself cannot return NULL for non-null inputs, the behavior of NOT IN and anti-join becomes identical. Enabling this conversion allows the planner to treat the sublink as a first-class relation rather than an opaque SubPlan filter. This unlocks global join ordering optimization and permits the selection of the most efficient join algorithm based on cost, often yielding significant performance improvements for large datasets. This patch verifies that neither side of the comparison can be NULL and that the operator is safe regarding NULL results before performing the conversion. To verify operator safety, we require that the operator be a member of a B-tree or Hash operator family. This serves as a proxy for standard boolean behavior, ensuring the operator does not return NULL on valid non-null inputs, as doing so would break index integrity. For operand non-nullability, this patch makes use of several existing mechanisms. It leverages the outer-join-aware-Var infrastructure to verify that a Var does not come from the nullable side of an outer join, and consults the NOT-NULL-attnums hash table to efficiently verify schema-level NOT NULL constraints. Additionally, it employs find_nonnullable_vars to identify Vars forced non-nullable by qual clauses, and expr_is_nonnullable to deduce non-nullability for other expression types. The logic for verifying the non-nullability of the subquery outputs was adapted from prior work by David Rowley and Tom Lane. Author: Richard Guo <[email protected]> Reviewed-by: wenhui qiu <[email protected]> Reviewed-by: Zhang Mingli <[email protected]> Reviewed-by: Japin Li <[email protected]> Discussion: https://postgr.es/m/CAMbWs495eF=-fSa5CwJS6B-BaEi3ARp0UNb4Lt3EkgUGZJwkAQ@mail.gmail.com --- src/backend/optimizer/plan/initsplan.c | 4 +- src/backend/optimizer/plan/subselect.c | 159 +++++++- src/backend/optimizer/prep/prepjointree.c | 72 +++- src/backend/optimizer/util/clauses.c | 383 ++++++++++++++++--- src/backend/utils/adt/int8.c | 2 +- src/backend/utils/cache/lsyscache.c | 68 ++++ src/include/optimizer/clauses.h | 1 + src/include/optimizer/optimizer.h | 13 +- src/include/optimizer/subselect.h | 1 + src/include/utils/lsyscache.h | 2 + src/test/regress/expected/subselect.out | 439 ++++++++++++++++++++++ src/test/regress/sql/subselect.sql | 185 +++++++++ src/tools/pgindent/typedefs.list | 1 + 13 files changed, 1264 insertions(+), 66 deletions(-) diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c index 97ea95a4eb8..9aaf1c4e5ca 100644 --- a/src/backend/optimizer/plan/initsplan.c +++ b/src/backend/optimizer/plan/initsplan.c @@ -3447,7 +3447,7 @@ restriction_is_always_true(PlannerInfo *root, if (nulltest->argisrow) return false; - return expr_is_nonnullable(root, nulltest->arg, true); + return expr_is_nonnullable(root, nulltest->arg, NOTNULL_SOURCE_RELOPT); } /* If it's an OR, check its sub-clauses */ @@ -3512,7 +3512,7 @@ restriction_is_always_false(PlannerInfo *root, if (nulltest->argisrow) return false; - return expr_is_nonnullable(root, nulltest->arg, true); + return expr_is_nonnullable(root, nulltest->arg, NOTNULL_SOURCE_RELOPT); } /* If it's an OR, check its sub-clauses */ diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c index d7f3cedf3d5..0d31861da7f 100644 --- a/src/backend/optimizer/plan/subselect.c +++ b/src/backend/optimizer/plan/subselect.c @@ -91,6 +91,7 @@ static bool contain_outer_selfref(Node *node); static bool contain_outer_selfref_walker(Node *node, Index *depth); static void inline_cte(PlannerInfo *root, CommonTableExpr *cte); static bool inline_cte_walker(Node *node, inline_cte_walker_context *context); +static bool sublink_testexpr_is_not_nullable(PlannerInfo *root, SubLink *sublink); static bool simplify_EXISTS_query(PlannerInfo *root, Query *query); static Query *convert_EXISTS_to_ANY(PlannerInfo *root, Query *subselect, Node **testexpr, List **paramIds); @@ -1306,11 +1307,14 @@ convert_VALUES_to_ANY(PlannerInfo *root, Node *testexpr, Query *values) * If so, form a JoinExpr and return it. Return NULL if the SubLink cannot * be converted to a join. * - * The only non-obvious input parameter is available_rels: this is the set - * of query rels that can safely be referenced in the sublink expression. - * (We must restrict this to avoid changing the semantics when a sublink - * is present in an outer join's ON qual.) The conversion must fail if - * the converted qual would reference any but these parent-query relids. + * If under_not is true, the caller actually found NOT (ANY SubLink), so + * that what we must try to build is an ANTI not SEMI join. + * + * available_rels is the set of query rels that can safely be referenced + * in the sublink expression. (We must restrict this to avoid changing + * the semantics when a sublink is present in an outer join's ON qual.) + * The conversion must fail if the converted qual would reference any but + * these parent-query relids. * * On success, the returned JoinExpr has larg = NULL and rarg = the jointree * item representing the pulled-up subquery. The caller must set larg to @@ -1333,7 +1337,7 @@ convert_VALUES_to_ANY(PlannerInfo *root, Node *testexpr, Query *values) */ JoinExpr * convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, - Relids available_rels) + bool under_not, Relids available_rels) { JoinExpr *result; Query *parse = root->parse; @@ -1351,6 +1355,19 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, Assert(sublink->subLinkType == ANY_SUBLINK); + /* + * Per SQL spec, NOT IN is not ordinarily equivalent to an anti-join, so + * that by default we have to fail when under_not. However, if we can + * prove that neither the outer query's expressions nor the sub-select's + * output columns can be NULL, and further that the operator itself cannot + * return NULL for non-null inputs, then the logic is identical and it's + * safe to convert NOT IN to an anti-join. + */ + if (under_not && + (!sublink_testexpr_is_not_nullable(root, sublink) || + !query_outputs_are_not_nullable(subselect))) + return NULL; + /* * If the sub-select contains any Vars of the parent query, we treat it as * LATERAL. (Vars from higher levels don't matter here.) @@ -1428,7 +1445,7 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, * And finally, build the JoinExpr node. */ result = makeNode(JoinExpr); - result->jointype = JOIN_SEMI; + result->jointype = under_not ? JOIN_ANTI : JOIN_SEMI; result->isNatural = false; result->larg = NULL; /* caller must fill this in */ result->rarg = (Node *) rtr; @@ -1441,12 +1458,134 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, return result; } +/* + * sublink_testexpr_is_not_nullable: verify that testexpr of an ANY_SUBLINK + * guarantees a non-null result, assuming the inner side is also non-null. + * + * To ensure the expression never returns NULL, we require both that the outer + * expressions are provably non-nullable and that the operator itself is safe. + * We validate operator safety by checking for membership in a standard index + * operator family (B-tree or Hash); this acts as a proxy for standard boolean + * behavior, ensuring the operator does not produce NULL results from non-null + * inputs. + * + * We handle the three standard parser representations for ANY sublinks: a + * single OpExpr for single-column comparisons, a BoolExpr containing a list of + * OpExprs for multi-column equality or inequality checks (where equality + * becomes an AND and inequality becomes an OR), and a RowCompareExpr for + * multi-column ordering checks. In all cases, we validate the operators and + * the outer expressions. + * + * It is acceptable for this check not to be exhaustive. We can err on the + * side of conservatism: if we're not sure, it's okay to return FALSE. + */ +static bool +sublink_testexpr_is_not_nullable(PlannerInfo *root, SubLink *sublink) +{ + Node *testexpr = sublink->testexpr; + List *outer_exprs = NIL; + + /* Punt if sublink is not in the expected format */ + if (sublink->subLinkType != ANY_SUBLINK || testexpr == NULL) + return false; + + if (IsA(testexpr, OpExpr)) + { + /* single-column comparison */ + OpExpr *opexpr = (OpExpr *) testexpr; + + /* standard ANY structure should be op(outer_var, param) */ + if (list_length(opexpr->args) != 2) + return false; + + /* + * We rely on membership in a B-tree or Hash operator family as a + * guarantee that the operator acts as a proper boolean comparison and + * does not yield NULL for valid non-null inputs. + */ + if (!op_is_safe_index_member(opexpr->opno)) + return false; + + outer_exprs = lappend(outer_exprs, linitial(opexpr->args)); + } + else if (is_andclause(testexpr) || is_orclause(testexpr)) + { + /* multi-column equality or inequality checks */ + BoolExpr *bexpr = (BoolExpr *) testexpr; + + foreach_ptr(OpExpr, opexpr, bexpr->args) + { + if (!IsA(opexpr, OpExpr)) + return false; + + /* standard ANY structure should be op(outer_var, param) */ + if (list_length(opexpr->args) != 2) + return false; + + /* verify operator safety; see comment above */ + if (!op_is_safe_index_member(opexpr->opno)) + return false; + + outer_exprs = lappend(outer_exprs, linitial(opexpr->args)); + } + } + else if (IsA(testexpr, RowCompareExpr)) + { + /* multi-column ordering checks */ + RowCompareExpr *rcexpr = (RowCompareExpr *) testexpr; + + foreach_oid(opno, rcexpr->opnos) + { + /* verify operator safety; see comment above */ + if (!op_is_safe_index_member(opno)) + return false; + } + + outer_exprs = list_concat(outer_exprs, rcexpr->largs); + } + else + { + /* Punt if other node types */ + return false; + } + + /* + * Since the query hasn't yet been through expression preprocessing, we + * must apply flatten_join_alias_vars to the outer expressions to avoid + * being fooled by join aliases. + * + * We do not need to apply flatten_group_exprs though, since grouping Vars + * cannot appear in jointree quals. + */ + outer_exprs = (List *) + flatten_join_alias_vars(root, root->parse, (Node *) outer_exprs); + + /* Check that every outer expression is non-nullable */ + foreach_ptr(Expr, expr, outer_exprs) + { + /* + * We have already collected relation-level not-null constraints for + * the outer query, so we can consult the global hash table for + * nullability information. + */ + if (!expr_is_nonnullable(root, expr, NOTNULL_SOURCE_HASHTABLE)) + return false; + + /* + * Note: It is possible to further prove non-nullability by examining + * the qual clauses available at or below the jointree node where this + * NOT IN clause is evaluated, but for the moment it doesn't seem + * worth the extra complication. + */ + } + + return true; +} + /* * convert_EXISTS_sublink_to_join: try to convert an EXISTS SubLink to a join * - * The API of this function is identical to convert_ANY_sublink_to_join's, - * except that we also support the case where the caller has found NOT EXISTS, - * so we need an additional input parameter "under_not". + * The API of this function is identical to convert_ANY_sublink_to_join's. */ JoinExpr * convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink, diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c index c90f4b32733..b2beb0a0d68 100644 --- a/src/backend/optimizer/prep/prepjointree.c +++ b/src/backend/optimizer/prep/prepjointree.c @@ -852,14 +852,15 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node, if ((saop = convert_VALUES_to_ANY(root, sublink->testexpr, (Query *) sublink->subselect)) != NULL) - + { /* * The VALUES sequence was simplified. Nothing more to do * here. */ return (Node *) saop; + } - if ((j = convert_ANY_sublink_to_join(root, sublink, + if ((j = convert_ANY_sublink_to_join(root, sublink, false, available_rels1)) != NULL) { /* Yes; insert the new join node into the join tree */ @@ -885,7 +886,7 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node, return NULL; } if (available_rels2 != NULL && - (j = convert_ANY_sublink_to_join(root, sublink, + (j = convert_ANY_sublink_to_join(root, sublink, false, available_rels2)) != NULL) { /* Yes; insert the new join node into the join tree */ @@ -970,14 +971,68 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node, } if (is_notclause(node)) { - /* If the immediate argument of NOT is EXISTS, try to convert */ + /* If the immediate argument of NOT is ANY or EXISTS, try to convert */ SubLink *sublink = (SubLink *) get_notclausearg((Expr *) node); JoinExpr *j; Relids child_rels; if (sublink && IsA(sublink, SubLink)) { - if (sublink->subLinkType == EXISTS_SUBLINK) + if (sublink->subLinkType == ANY_SUBLINK) + { + if ((j = convert_ANY_sublink_to_join(root, sublink, true, + available_rels1)) != NULL) + { + /* Yes; insert the new join node into the join tree */ + j->larg = *jtlink1; + *jtlink1 = (Node *) j; + /* Recursively process pulled-up jointree nodes */ + j->rarg = pull_up_sublinks_jointree_recurse(root, + j->rarg, + &child_rels); + + /* + * Now recursively process the pulled-up quals. Because + * we are underneath a NOT, we can't pull up sublinks that + * reference the left-hand stuff, but it's still okay to + * pull up sublinks referencing j->rarg. + */ + j->quals = pull_up_sublinks_qual_recurse(root, + j->quals, + &j->rarg, + child_rels, + NULL, NULL); + /* Return NULL representing constant TRUE */ + return NULL; + } + if (available_rels2 != NULL && + (j = convert_ANY_sublink_to_join(root, sublink, true, + available_rels2)) != NULL) + { + /* Yes; insert the new join node into the join tree */ + j->larg = *jtlink2; + *jtlink2 = (Node *) j; + /* Recursively process pulled-up jointree nodes */ + j->rarg = pull_up_sublinks_jointree_recurse(root, + j->rarg, + &child_rels); + + /* + * Now recursively process the pulled-up quals. Because + * we are underneath a NOT, we can't pull up sublinks that + * reference the left-hand stuff, but it's still okay to + * pull up sublinks referencing j->rarg. + */ + j->quals = pull_up_sublinks_qual_recurse(root, + j->quals, + &j->rarg, + child_rels, + NULL, NULL); + /* Return NULL representing constant TRUE */ + return NULL; + } + } + else if (sublink->subLinkType == EXISTS_SUBLINK) { if ((j = convert_EXISTS_sublink_to_join(root, sublink, true, available_rels1)) != NULL) @@ -3706,6 +3761,13 @@ has_notnull_forced_var(PlannerInfo *root, List *forced_null_vars, rte = rt_fetch(varno, root->parse->rtable); + /* We can only reason about ordinary relations */ + if (rte->rtekind != RTE_RELATION) + { + bms_free(forcednullattnums); + continue; + } + /* * We must skip inheritance parent tables, as some child tables may * have a NOT NULL constraint for a column while others may not. This diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c index a41d81734cf..8172e19254a 100644 --- a/src/backend/optimizer/util/clauses.c +++ b/src/backend/optimizer/util/clauses.c @@ -21,6 +21,7 @@ #include "access/htup_details.h" #include "catalog/pg_class.h" +#include "catalog/pg_inherits.h" #include "catalog/pg_language.h" #include "catalog/pg_operator.h" #include "catalog/pg_proc.h" @@ -112,6 +113,7 @@ static bool contain_context_dependent_node_walker(Node *node, int *flags); static bool contain_leaked_vars_walker(Node *node, void *context); static Relids find_nonnullable_rels_walker(Node *node, bool top_level); static List *find_nonnullable_vars_walker(Node *node, bool top_level); +static void find_subquery_safe_quals(Node *jtnode, List **safe_quals); static bool is_strict_saop(ScalarArrayOpExpr *expr, bool falseOK); static bool convert_saop_to_hashed_saop_walker(Node *node, void *context); static Node *eval_const_expressions_mutator(Node *node, @@ -1433,6 +1435,10 @@ contain_leaked_vars_walker(Node *node, void *context) context); } +/***************************************************************************** + * Nullability analysis + *****************************************************************************/ + /* * find_nonnullable_rels * Determine which base rels are forced nonnullable by given clause. @@ -1701,7 +1707,7 @@ find_nonnullable_rels_walker(Node *node, bool top_level) * but here we assume that the input is a Boolean expression, and wish to * see if NULL inputs will provably cause a FALSE-or-NULL result. We expect * the expression to have been AND/OR flattened and converted to implicit-AND - * format. + * format (but the results are still good if it wasn't AND/OR flattened). * * Attnos of the identified Vars are returned in a multibitmapset (a List of * Bitmapsets). List indexes correspond to relids (varnos), while the per-rel @@ -2021,6 +2027,229 @@ find_forced_null_var(Node *node) return NULL; } +/* + * query_outputs_are_not_nullable + * Returns TRUE if the output values of the Query are certainly not NULL. + * All output columns must return non-NULL to answer TRUE. + * + * The reason this takes a Query, and not just an individual tlist expression, + * is so that we can make use of the query's WHERE/ON clauses to prove it does + * not return nulls. + * + * In current usage, the passed sub-Query hasn't yet been through any planner + * processing. This means that applying find_nonnullable_vars() to its WHERE + * clauses isn't really ideal: for lack of const-simplification, we might be + * unable to prove not-nullness in some cases where we could have proved it + * afterwards. However, we should not get any false positive results. + * + * Like the other forms of nullability analysis above, we can err on the + * side of conservatism: if we're not sure, it's okay to return FALSE. + */ +bool +query_outputs_are_not_nullable(Query *query) +{ + PlannerInfo subroot; + List *safe_quals = NIL; + List *nonnullable_vars = NIL; + bool computed_nonnullable_vars = false; + + /* + * If the query contains set operations, punt. The set ops themselves + * couldn't introduce nulls that weren't in their inputs, but the tlist + * present in the top-level query is just dummy and won't give us useful + * info. We could get an answer by recursing to examine each leaf query, + * but for the moment it doesn't seem worth the extra complication. + */ + if (query->setOperations) + return false; + + /* + * If the query contains grouping sets, punt. Grouping sets can introduce + * NULL values, and we currently lack the PlannerInfo needed to flatten + * grouping Vars in the query's outputs. + */ + if (query->groupingSets) + return false; + + /* + * We need a PlannerInfo to pass to expr_is_nonnullable. Fortunately, we + * can cons up an entirely dummy one, because only the "parse" link in the + * struct is used by expr_is_nonnullable. + */ + MemSet(&subroot, 0, sizeof(subroot)); + subroot.parse = query; + + /* + * Examine each targetlist entry to prove that it can't produce NULL. + */ + foreach_node(TargetEntry, tle, query->targetList) + { + Expr *expr = tle->expr; + + /* Resjunk columns can be ignored: they don't produce output values */ + if (tle->resjunk) + continue; + + /* + * Look through binary relabelings, since we know those don't + * introduce nulls. + */ + while (expr && IsA(expr, RelabelType)) + expr = ((RelabelType *) expr)->arg; + + if (expr == NULL) /* paranoia */ + return false; + + /* + * Since the subquery hasn't yet been through expression + * preprocessing, we must explicitly flatten grouping Vars and join + * aliases in the given expression. Note that flatten_group_exprs + * must be applied before flatten_join_alias_vars, as grouping Vars + * can wrap join aliases. + * + * We must also apply flatten_join_alias_vars to the quals extracted + * by find_subquery_safe_quals. We do not need to apply + * flatten_group_exprs to these quals, though, because grouping Vars + * cannot appear in jointree quals. + */ + + /* + * We have verified that the query does not contain grouping sets, so + * it's safe to use NULL as the root here. + */ + expr = (Expr *) flatten_group_exprs(NULL, query, (Node *) expr); + + /* + * We won't be dealing with arbitrary expressions, so it's safe to use + * NULL as the root, so long as adjust_standard_join_alias_expression + * can handle everything the parser would make as a join alias + * expression. + */ + expr = (Expr *) flatten_join_alias_vars(NULL, query, (Node *) expr); + + /* + * Check to see if the expr cannot be NULL. Since we're on a raw + * parse tree, we need to look up the not-null constraints from the + * system catalogs. + */ + if (expr_is_nonnullable(&subroot, expr, NOTNULL_SOURCE_SYSCACHE)) + continue; + + if (IsA(expr, Var)) + { + Var *var = (Var *) expr; + + /* + * For a plain Var, even if that didn't work, we can conclude that + * the Var is not nullable if find_nonnullable_vars can find a + * "var IS NOT NULL" or similarly strict condition among the quals + * on non-outerjoined-rels. Compute the list of Vars having such + * quals if we didn't already. + */ + if (!computed_nonnullable_vars) + { + find_subquery_safe_quals((Node *) query->jointree, &safe_quals); + safe_quals = (List *) + flatten_join_alias_vars(NULL, query, (Node *) safe_quals); + nonnullable_vars = find_nonnullable_vars((Node *) safe_quals); + computed_nonnullable_vars = true; + } + + if (!mbms_is_member(var->varno, + var->varattno - FirstLowInvalidHeapAttributeNumber, + nonnullable_vars)) + return false; /* we failed to prove the Var non-null */ + } + else + { + /* Punt otherwise */ + return false; + } + } + + return true; +} + +/* + * find_subquery_safe_quals + * Traverse jointree to locate quals on non-outerjoined-rels. + * + * We locate all WHERE and JOIN/ON quals that constrain the rels that are not + * below the nullable side of any outer join, and add them to the *safe_quals + * list (forming a list with implicit-AND semantics). These quals can be used + * to prove non-nullability of the subquery's outputs. + * + * Top-level caller must initialize *safe_quals to NIL. + */ +static void +find_subquery_safe_quals(Node *jtnode, List **safe_quals) +{ + if (jtnode == NULL) + return; + if (IsA(jtnode, RangeTblRef)) + { + /* Leaf node: nothing to do */ + return; + } + else if (IsA(jtnode, FromExpr)) + { + FromExpr *f = (FromExpr *) jtnode; + + /* All elements of the FROM list are allowable */ + foreach_ptr(Node, child_node, f->fromlist) + find_subquery_safe_quals(child_node, safe_quals); + /* ... and its WHERE quals are too */ + if (f->quals) + *safe_quals = lappend(*safe_quals, f->quals); + } + else if (IsA(jtnode, JoinExpr)) + { + JoinExpr *j = (JoinExpr *) jtnode; + + switch (j->jointype) + { + case JOIN_INNER: + /* visit both children */ + find_subquery_safe_quals(j->larg, safe_quals); + find_subquery_safe_quals(j->rarg, safe_quals); + /* and grab the ON quals too */ + if (j->quals) + *safe_quals = lappend(*safe_quals, j->quals); + break; + + case JOIN_LEFT: + case JOIN_SEMI: + case JOIN_ANTI: + + /* + * Only the left input is possibly non-nullable; furthermore, + * the quals of this join don't constrain the left input. + * Note: we probably can't see SEMI or ANTI joins at this + * point, but if we do, we can treat them like LEFT joins. + */ + find_subquery_safe_quals(j->larg, safe_quals); + break; + + case JOIN_RIGHT: + /* Reverse of the above case */ + find_subquery_safe_quals(j->rarg, safe_quals); + break; + + case JOIN_FULL: + /* Neither side is non-nullable, so stop descending */ + break; + + default: + elog(ERROR, "unrecognized join type: %d", + (int) j->jointype); + break; + } + } + else + elog(ERROR, "unrecognized node type: %d", + (int) nodeTag(jtnode)); +} + /* * Can we treat a ScalarArrayOpExpr as strict? * @@ -2739,7 +2968,8 @@ eval_const_expressions_mutator(Node *node, if (!has_nullable_nonconst && !expr_is_nonnullable(context->root, - (Expr *) lfirst(arg), false)) + (Expr *) lfirst(arg), + NOTNULL_SOURCE_HASHTABLE)) has_nullable_nonconst = true; } } @@ -3418,7 +3648,8 @@ eval_const_expressions_mutator(Node *node, newargs = lappend(newargs, e); break; } - if (expr_is_nonnullable(context->root, (Expr *) e, false)) + if (expr_is_nonnullable(context->root, (Expr *) e, + NOTNULL_SOURCE_HASHTABLE)) { if (newargs == NIL) return e; /* first expr */ @@ -3612,7 +3843,7 @@ eval_const_expressions_mutator(Node *node, */ if (relem && expr_is_nonnullable(context->root, (Expr *) relem, - false)) + NOTNULL_SOURCE_HASHTABLE)) { if (ntest->nulltesttype == IS_NULL) return makeBoolConst(false, false); @@ -3664,7 +3895,8 @@ eval_const_expressions_mutator(Node *node, return makeBoolConst(result, false); } if (!ntest->argisrow && arg && - expr_is_nonnullable(context->root, (Expr *) arg, false)) + expr_is_nonnullable(context->root, (Expr *) arg, + NOTNULL_SOURCE_HASHTABLE)) { bool result; @@ -3749,7 +3981,9 @@ eval_const_expressions_mutator(Node *node, return makeBoolConst(result, false); } - if (arg && expr_is_nonnullable(context->root, (Expr *) arg, false)) + if (arg && + expr_is_nonnullable(context->root, (Expr *) arg, + NOTNULL_SOURCE_HASHTABLE)) { /* * If arg is proven non-nullable, simplify to boolean @@ -4384,14 +4618,11 @@ simplify_aggref(Aggref *aggref, eval_const_expressions_context *context) * If the Var is defined NOT NULL and meanwhile is not nulled by any outer * joins or grouping sets, then we can know that it cannot be NULL. * - * use_rel_info indicates whether the corresponding RelOptInfo is available for - * use. + * "source" specifies where we should look for NOT NULL proofs. */ bool -var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info) +var_is_nonnullable(PlannerInfo *root, Var *var, NotNullSource source) { - Bitmapset *notnullattnums = NULL; - Assert(IsA(var, Var)); /* skip upper-level Vars */ @@ -4406,35 +4637,89 @@ var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info) if (var->varattno < 0) return true; - /* - * Check if the Var is defined as NOT NULL. We retrieve the column NOT - * NULL constraint information from the corresponding RelOptInfo if it is - * available; otherwise, we search the hash table for this information. - */ - if (use_rel_info) - { - RelOptInfo *rel = find_base_rel(root, var->varno); + /* we don't trust whole-row Vars */ + if (var->varattno == 0) + return false; - notnullattnums = rel->notnullattnums; - } - else + /* Check if the Var is defined as NOT NULL. */ + switch (source) { - RangeTblEntry *rte = planner_rt_fetch(var->varno, root); + case NOTNULL_SOURCE_RELOPT: + { + /* + * We retrieve the column NOT NULL constraint information from + * the corresponding RelOptInfo. + */ + RelOptInfo *rel; + Bitmapset *notnullattnums; - /* - * We must skip inheritance parent tables, as some child tables may - * have a NOT NULL constraint for a column while others may not. This - * cannot happen with partitioned tables, though. - */ - if (rte->inh && rte->relkind != RELKIND_PARTITIONED_TABLE) - return false; + rel = find_base_rel(root, var->varno); + notnullattnums = rel->notnullattnums; - notnullattnums = find_relation_notnullatts(root, rte->relid); - } + return bms_is_member(var->varattno, notnullattnums); + } + case NOTNULL_SOURCE_HASHTABLE: + { + /* + * We retrieve the column NOT NULL constraint information from + * the hash table. + */ + RangeTblEntry *rte; + Bitmapset *notnullattnums; - if (var->varattno > 0 && - bms_is_member(var->varattno, notnullattnums)) - return true; + rte = planner_rt_fetch(var->varno, root); + + /* We can only reason about ordinary relations */ + if (rte->rtekind != RTE_RELATION) + return false; + + /* + * We must skip inheritance parent tables, as some child + * tables may have a NOT NULL constraint for a column while + * others may not. This cannot happen with partitioned + * tables, though. + */ + if (rte->inh && rte->relkind != RELKIND_PARTITIONED_TABLE) + return false; + + notnullattnums = find_relation_notnullatts(root, rte->relid); + + return bms_is_member(var->varattno, notnullattnums); + } + case NOTNULL_SOURCE_SYSCACHE: + { + /* + * We look up the "attnotnull" field in the attribute + * relation. + */ + RangeTblEntry *rte; + + rte = planner_rt_fetch(var->varno, root); + + /* We can only reason about ordinary relations */ + if (rte->rtekind != RTE_RELATION) + return false; + + /* + * We must skip inheritance parent tables, as some child + * tables may have a NOT NULL constraint for a column while + * others may not. This cannot happen with partitioned + * tables, though. + * + * Note that we need to check if the relation actually has any + * children, as we might not have done that yet. + */ + if (rte->inh && has_subclass(rte->relid) && + rte->relkind != RELKIND_PARTITIONED_TABLE) + return false; + + return get_attnotnull(rte->relid, var->varattno); + } + default: + elog(ERROR, "unrecognized NotNullSource: %d", + (int) source); + break; + } return false; } @@ -4444,16 +4729,22 @@ var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info) * * Returns true iff the given 'expr' cannot produce SQL NULLs. * - * If 'use_rel_info' is true, nullability of Vars is checked via the - * corresponding RelOptInfo for the given Var. Some callers require - * nullability information before RelOptInfos are generated. These should - * pass 'use_rel_info' as false. + * source: specifies where we should look for NOT NULL proofs for Vars. + * - NOTNULL_SOURCE_RELOPT: Used when RelOptInfos have been generated. We + * retrieve nullability information directly from the RelOptInfo corresponding + * to the Var. + * - NOTNULL_SOURCE_HASHTABLE: Used when RelOptInfos are not yet available, + * but we have already collected relation-level not-null constraints into the + * global hash table. + * - NOTNULL_SOURCE_SYSCACHE: Used for raw parse trees where neither + * RelOptInfos nor the hash table are available. In this case, we have to + * look up the 'attnotnull' field directly in the system catalogs. * * For now, we support only a limited set of expression types. Support for * additional node types can be added in the future. */ bool -expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) +expr_is_nonnullable(PlannerInfo *root, Expr *expr, NotNullSource source) { /* since this function recurses, it could be driven to stack overflow */ check_stack_depth(); @@ -4463,7 +4754,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) case T_Var: { if (root) - return var_is_nonnullable(root, (Var *) expr, use_rel_info); + return var_is_nonnullable(root, (Var *) expr, source); } break; case T_Const: @@ -4480,7 +4771,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) foreach_ptr(Expr, arg, coalesceexpr->args) { - if (expr_is_nonnullable(root, arg, use_rel_info)) + if (expr_is_nonnullable(root, arg, source)) return true; } } @@ -4495,7 +4786,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) foreach_ptr(Expr, arg, minmaxexpr->args) { - if (expr_is_nonnullable(root, arg, use_rel_info)) + if (expr_is_nonnullable(root, arg, source)) return true; } } @@ -4511,13 +4802,13 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) /* The default result must be present and non-nullable */ if (caseexpr->defresult == NULL || - !expr_is_nonnullable(root, caseexpr->defresult, use_rel_info)) + !expr_is_nonnullable(root, caseexpr->defresult, source)) return false; /* All branch results must be non-nullable */ foreach_ptr(CaseWhen, casewhen, caseexpr->args) { - if (!expr_is_nonnullable(root, casewhen->result, use_rel_info)) + if (!expr_is_nonnullable(root, casewhen->result, source)) return false; } @@ -4565,7 +4856,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) * non-nullable. */ return expr_is_nonnullable(root, ((RelabelType *) expr)->arg, - use_rel_info); + source); } default: break; diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c index 37d34685b93..6c8fb7b7275 100644 --- a/src/backend/utils/adt/int8.c +++ b/src/backend/utils/adt/int8.c @@ -834,7 +834,7 @@ int8inc_support(PG_FUNCTION_ARGS) PG_RETURN_POINTER(NULL); /* If the arg isn't NULLable, do the conversion */ - if (expr_is_nonnullable(req->root, arg, false)) + if (expr_is_nonnullable(req->root, arg, NOTNULL_SOURCE_HASHTABLE)) { Aggref *newagg; diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c index 1913b009d40..f10948483b9 100644 --- a/src/backend/utils/cache/lsyscache.c +++ b/src/backend/utils/cache/lsyscache.c @@ -858,6 +858,47 @@ comparison_ops_are_compatible(Oid opno1, Oid opno2) return result; } +/* + * op_is_safe_index_member + * Check if the operator is a member of a B-tree or Hash operator family. + * + * We use this check as a proxy for "null-safety": if an operator is trusted by + * the btree or hash opfamily, it implies that the operator adheres to standard + * boolean behavior, and would not return NULL when given valid non-null + * inputs, as doing so would break index integrity. + */ +bool +op_is_safe_index_member(Oid opno) +{ + bool result = false; + CatCList *catlist; + int i; + + /* + * Search pg_amop to see if the target operator is registered for any + * btree or hash opfamily. + */ + catlist = SearchSysCacheList1(AMOPOPID, ObjectIdGetDatum(opno)); + + for (i = 0; i < catlist->n_members; i++) + { + HeapTuple tuple = &catlist->members[i]->tuple; + Form_pg_amop aform = (Form_pg_amop) GETSTRUCT(tuple); + + /* Check if the AM is B-tree or Hash */ + if (aform->amopmethod == BTREE_AM_OID || + aform->amopmethod == HASH_AM_OID) + { + result = true; + break; + } + } + + ReleaseSysCacheList(catlist); + + return result; +} + /* ---------- AMPROC CACHES ---------- */ @@ -1071,6 +1112,33 @@ get_attoptions(Oid relid, int16 attnum) return result; } +/* + * get_attnotnull + * + * Given the relation id and the attribute number, + * return the "attnotnull" field from the attribute relation. + */ +bool +get_attnotnull(Oid relid, AttrNumber attnum) +{ + HeapTuple tp; + bool result = false; + + tp = SearchSysCache2(ATTNUM, + ObjectIdGetDatum(relid), + Int16GetDatum(attnum)); + + if (HeapTupleIsValid(tp)) + { + Form_pg_attribute att_tup = (Form_pg_attribute) GETSTRUCT(tp); + + result = att_tup->attnotnull; + ReleaseSysCache(tp); + } + + return result; +} + /* ---------- PG_CAST CACHE ---------- */ /* diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h index a64034e8a6d..853a28c0007 100644 --- a/src/include/optimizer/clauses.h +++ b/src/include/optimizer/clauses.h @@ -42,6 +42,7 @@ extern Relids find_nonnullable_rels(Node *clause); extern List *find_nonnullable_vars(Node *clause); extern List *find_forced_null_vars(Node *node); extern Var *find_forced_null_var(Node *node); +extern bool query_outputs_are_not_nullable(Query *query); extern bool is_pseudo_constant_clause(Node *clause); extern bool is_pseudo_constant_clause_relids(Node *clause, Relids relids); diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h index b562ca380a8..e8b409afb7f 100644 --- a/src/include/optimizer/optimizer.h +++ b/src/include/optimizer/optimizer.h @@ -130,6 +130,14 @@ extern Expr *canonicalize_qual(Expr *qual, bool is_check); /* in util/clauses.c: */ +/* Enum to specify where var_is_nonnullable should look for NOT NULL proofs */ +typedef enum +{ + NOTNULL_SOURCE_RELOPT, /* Use RelOptInfo */ + NOTNULL_SOURCE_HASHTABLE, /* Use Global Hash Table */ + NOTNULL_SOURCE_SYSCACHE, /* Use System Catalog */ +} NotNullSource; + extern bool contain_mutable_functions(Node *clause); extern bool contain_mutable_functions_after_planning(Expr *expr); extern bool contain_volatile_functions(Node *clause); @@ -145,10 +153,11 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node); extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod, Oid result_collation); -extern bool var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info); +extern bool var_is_nonnullable(PlannerInfo *root, Var *var, + NotNullSource source); extern bool expr_is_nonnullable(PlannerInfo *root, Expr *expr, - bool use_rel_info); + NotNullSource source); extern List *expand_function_arguments(List *args, bool include_out_arguments, Oid result_type, diff --git a/src/include/optimizer/subselect.h b/src/include/optimizer/subselect.h index 8a5503eb973..4ecccf46bd3 100644 --- a/src/include/optimizer/subselect.h +++ b/src/include/optimizer/subselect.h @@ -22,6 +22,7 @@ extern ScalarArrayOpExpr *convert_VALUES_to_ANY(PlannerInfo *root, Query *values); extern JoinExpr *convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, + bool under_not, Relids available_rels); extern JoinExpr *convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink, diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h index 5655aca4c14..b9ad84ecd41 100644 --- a/src/include/utils/lsyscache.h +++ b/src/include/utils/lsyscache.h @@ -89,6 +89,7 @@ extern bool get_op_hash_functions(Oid opno, extern List *get_op_index_interpretation(Oid opno); extern bool equality_ops_are_compatible(Oid opno1, Oid opno2); extern bool comparison_ops_are_compatible(Oid opno1, Oid opno2); +extern bool op_is_safe_index_member(Oid opno); extern Oid get_opfamily_proc(Oid opfamily, Oid lefttype, Oid righttype, int16 procnum); extern char *get_attname(Oid relid, AttrNumber attnum, bool missing_ok); @@ -98,6 +99,7 @@ extern Oid get_atttype(Oid relid, AttrNumber attnum); extern void get_atttypetypmodcoll(Oid relid, AttrNumber attnum, Oid *typid, int32 *typmod, Oid *collid); extern Datum get_attoptions(Oid relid, int16 attnum); +extern bool get_attnotnull(Oid relid, AttrNumber attnum); extern Oid get_cast_oid(Oid sourcetypeid, Oid targettypeid, bool missing_ok); extern char *get_collation_name(Oid colloid); extern bool get_collation_isdeterministic(Oid colloid); diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out index 2135d82884d..200236a0a69 100644 --- a/src/test/regress/expected/subselect.out +++ b/src/test/regress/expected/subselect.out @@ -3323,3 +3323,442 @@ SELECT ten FROM onek t WHERE 1.0::integer IN ((VALUES (1), (3))); Seq Scan on onek t (1 row) +-- +-- Check NOT IN performs an ANTI JOIN when both the outer query's expressions +-- and the sub-select's output columns are provably non-nullable, and the +-- operator itself cannot return NULL for non-null inputs. +-- +BEGIN; +CREATE TEMP TABLE not_null_tab (id int NOT NULL, val int NOT NULL); +CREATE TEMP TABLE null_tab (id int, val int); +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +----------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = not_null_tab_1.id) + -> Seq Scan on not_null_tab + -> Hash + -> Seq Scan on not_null_tab not_null_tab_1 +(5 rows) + +-- No ANTI JOIN: outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> Seq Scan on not_null_tab +(4 rows) + +-- No ANTI JOIN: inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> Seq Scan on null_tab +(4 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab WHERE id IS NOT NULL); + QUERY PLAN +---------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = null_tab.id) + -> Seq Scan on not_null_tab + -> Hash + -> Seq Scan on null_tab + Filter: (id IS NOT NULL) +(6 rows) + +-- No ANTI JOIN: outer side is nullable (we don't check outer query quals for now) +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id IS NOT NULL + AND id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +--------------------------------------------------------------------------------- + Seq Scan on null_tab + Filter: ((id IS NOT NULL) AND (NOT (ANY (id = (hashed SubPlan any_1).col1)))) + SubPlan any_1 + -> Seq Scan on not_null_tab +(4 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is defined NOT NULL +-- and is not nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + QUERY PLAN +----------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = t1.id) + -> Seq Scan on not_null_tab + -> Hash + -> Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(12 rows) + +-- No ANTI JOIN: inner side is defined NOT NULL but is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t2.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(11 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t2.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id + WHERE t2.id IS NOT NULL +); + QUERY PLAN +----------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = t2.id) + -> Seq Scan on not_null_tab + -> Hash + -> Merge Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(12 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + LEFT JOIN null_tab t2 ON t1.id = t2.id + WHERE t1.id IS NOT NULL +); + QUERY PLAN +---------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = t1.id) + -> Seq Scan on not_null_tab + -> Hash + -> Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on null_tab t1 + Filter: (id IS NOT NULL) + -> Sort + Sort Key: t2.id + -> Seq Scan on null_tab t2 +(13 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + INNER JOIN null_tab t2 ON t1.id = t2.id + LEFT JOIN null_tab t3 ON TRUE +); + QUERY PLAN +------------------------------------------------- + Merge Anti Join + Merge Cond: (not_null_tab.id = t1.id) + -> Sort + Sort Key: not_null_tab.id + -> Seq Scan on not_null_tab + -> Nested Loop Left Join + -> Merge Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on null_tab t2 + -> Materialize + -> Seq Scan on null_tab t3 +(16 rows) + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t1.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +---------------------------------------------------- + Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Hash Anti Join + Hash Cond: (t1.id = not_null_tab.id) + -> Seq Scan on not_null_tab t1 + -> Hash + -> Seq Scan on not_null_tab + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(12 rows) + +-- No ANTI JOIN: outer side is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t2.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +------------------------------------------------------------- + Merge Left Join + Merge Cond: (t1.id = t2.id) + Filter: (NOT (ANY (t2.id = (hashed SubPlan any_1).col1))) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 + SubPlan any_1 + -> Seq Scan on not_null_tab +(11 rows) + +-- No ANTI JOIN: sublink is in an outer join's ON qual and references the +-- non-nullable side +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t1.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +------------------------------------------------------------------ + Nested Loop Left Join + Join Filter: (NOT (ANY (t1.id = (hashed SubPlan any_1).col1))) + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Seq Scan on not_null_tab t2 + SubPlan any_1 + -> Seq Scan on not_null_tab +(7 rows) + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t2.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +---------------------------------------------------- + Nested Loop Left Join + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Hash Anti Join + Hash Cond: (t2.id = not_null_tab.id) + -> Seq Scan on not_null_tab t2 + -> Hash + -> Seq Scan on not_null_tab +(8 rows) + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT id, val FROM not_null_tab); + QUERY PLAN +--------------------------------------------------------------------------------------------------- + Merge Anti Join + Merge Cond: ((not_null_tab.id = not_null_tab_1.id) AND (not_null_tab.val = not_null_tab_1.val)) + -> Sort + Sort Key: not_null_tab.id, not_null_tab.val + -> Seq Scan on not_null_tab + -> Sort + Sort Key: not_null_tab_1.id, not_null_tab_1.val + -> Seq Scan on not_null_tab not_null_tab_1 +(8 rows) + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE NOT (id, val) > ANY (SELECT id, val FROM not_null_tab); + QUERY PLAN +------------------------------------------------------------------------------------------------------ + Nested Loop Anti Join + Join Filter: (ROW(not_null_tab.id, not_null_tab.val) > ROW(not_null_tab_1.id, not_null_tab_1.val)) + -> Seq Scan on not_null_tab + -> Materialize + -> Seq Scan on not_null_tab not_null_tab_1 +(5 rows) + +-- No ANTI JOIN: one column of the outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1, null_tab t2 +WHERE (t1.id, t2.id) NOT IN (SELECT id, val FROM not_null_tab); + QUERY PLAN +-------------------------------------------------------------------------------------------------------------- + Nested Loop + Join Filter: (NOT (ANY ((t1.id = (hashed SubPlan any_1).col1) AND (t2.id = (hashed SubPlan any_1).col2)))) + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Seq Scan on null_tab t2 + SubPlan any_1 + -> Seq Scan on not_null_tab +(7 rows) + +-- No ANTI JOIN: one column of the inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT t1.id, t2.id FROM not_null_tab t1, null_tab t2); + QUERY PLAN +-------------------------------------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY ((id = (SubPlan any_1).col1) AND (val = (SubPlan any_1).col2)))) + SubPlan any_1 + -> Materialize + -> Nested Loop + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Seq Scan on null_tab t2 +(8 rows) + +-- ANTI JOIN: COALESCE(nullable, constant) is non-nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE COALESCE(id, -1) NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +----------------------------------------------------------------------- + Hash Anti Join + Hash Cond: (COALESCE(null_tab.id, '-1'::integer) = not_null_tab.id) + -> Seq Scan on null_tab + -> Hash + -> Seq Scan on not_null_tab +(5 rows) + +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT COALESCE(id, -1) FROM null_tab); + QUERY PLAN +----------------------------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = COALESCE(null_tab.id, '-1'::integer)) + -> Seq Scan on not_null_tab + -> Hash + -> Seq Scan on null_tab +(5 rows) + +-- ANTI JOIN: GROUP BY (without Grouping Sets) preserves the non-nullability of +-- the column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab GROUP BY id); + QUERY PLAN +----------------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = not_null_tab_1.id) + -> Seq Scan on not_null_tab + -> Hash + -> HashAggregate + Group Key: not_null_tab_1.id + -> Seq Scan on not_null_tab not_null_tab_1 +(7 rows) + +-- No ANTI JOIN: GROUP BY on a nullable column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab GROUP BY id); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> HashAggregate + Group Key: null_tab.id + -> Seq Scan on null_tab +(6 rows) + +-- No ANTI JOIN: Grouping Sets can introduce NULLs +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT id + FROM not_null_tab + GROUP BY GROUPING SETS ((id), (val)) +); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> HashAggregate + Hash Key: not_null_tab_1.id + Hash Key: not_null_tab_1.val + -> Seq Scan on not_null_tab not_null_tab_1 +(7 rows) + +-- create a custom "unsafe" equality operator +CREATE FUNCTION int4eq_unsafe(int4, int4) + RETURNS bool + AS 'int4eq' + LANGUAGE internal IMMUTABLE; +CREATE OPERATOR ?= ( + PROCEDURE = int4eq_unsafe, + LEFTARG = int4, + RIGHTARG = int4 +); +-- No ANTI JOIN: the operator is not safe +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE NOT id ?= ANY (SELECT id FROM not_null_tab); + QUERY PLAN +------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id ?= (SubPlan any_1).col1))) + SubPlan any_1 + -> Materialize + -> Seq Scan on not_null_tab not_null_tab_1 +(5 rows) + +ROLLBACK; diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql index cadc3293687..4cd016f4ac3 100644 --- a/src/test/regress/sql/subselect.sql +++ b/src/test/regress/sql/subselect.sql @@ -1448,3 +1448,188 @@ SELECT * FROM onek t1, lateral (SELECT * FROM onek t2 WHERE t2.ten IN (values (t -- VtA causes the whole expression to be evaluated as a constant EXPLAIN (COSTS OFF) SELECT ten FROM onek t WHERE 1.0::integer IN ((VALUES (1), (3))); + +-- +-- Check NOT IN performs an ANTI JOIN when both the outer query's expressions +-- and the sub-select's output columns are provably non-nullable, and the +-- operator itself cannot return NULL for non-null inputs. +-- + +BEGIN; + +CREATE TEMP TABLE not_null_tab (id int NOT NULL, val int NOT NULL); +CREATE TEMP TABLE null_tab (id int, val int); + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab WHERE id IS NOT NULL); + +-- No ANTI JOIN: outer side is nullable (we don't check outer query quals for now) +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id IS NOT NULL + AND id NOT IN (SELECT id FROM not_null_tab); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is defined NOT NULL +-- and is not nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + +-- No ANTI JOIN: inner side is defined NOT NULL but is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t2.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t2.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id + WHERE t2.id IS NOT NULL +); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + LEFT JOIN null_tab t2 ON t1.id = t2.id + WHERE t1.id IS NOT NULL +); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + INNER JOIN null_tab t2 ON t1.id = t2.id + LEFT JOIN null_tab t3 ON TRUE +); + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t1.id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: outer side is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t2.id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: sublink is in an outer join's ON qual and references the +-- non-nullable side +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t1.id NOT IN (SELECT id FROM not_null_tab); + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t2.id NOT IN (SELECT id FROM not_null_tab); + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT id, val FROM not_null_tab); + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE NOT (id, val) > ANY (SELECT id, val FROM not_null_tab); + +-- No ANTI JOIN: one column of the outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1, null_tab t2 +WHERE (t1.id, t2.id) NOT IN (SELECT id, val FROM not_null_tab); + +-- No ANTI JOIN: one column of the inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT t1.id, t2.id FROM not_null_tab t1, null_tab t2); + +-- ANTI JOIN: COALESCE(nullable, constant) is non-nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE COALESCE(id, -1) NOT IN (SELECT id FROM not_null_tab); + +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT COALESCE(id, -1) FROM null_tab); + +-- ANTI JOIN: GROUP BY (without Grouping Sets) preserves the non-nullability of +-- the column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab GROUP BY id); + +-- No ANTI JOIN: GROUP BY on a nullable column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab GROUP BY id); + +-- No ANTI JOIN: Grouping Sets can introduce NULLs +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT id + FROM not_null_tab + GROUP BY GROUPING SETS ((id), (val)) +); + +-- create a custom "unsafe" equality operator +CREATE FUNCTION int4eq_unsafe(int4, int4) + RETURNS bool + AS 'int4eq' + LANGUAGE internal IMMUTABLE; + +CREATE OPERATOR ?= ( + PROCEDURE = int4eq_unsafe, + LEFTARG = int4, + RIGHTARG = int4 +); + +-- No ANTI JOIN: the operator is not safe +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE NOT id ?= ANY (SELECT id FROM not_null_tab); + +ROLLBACK; diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index 3250564d4ff..8080dc16afa 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -1788,6 +1788,7 @@ Node NodeTag NonEmptyRange NoneCompressorState +NotNullSource Notification NotificationList NotifyStmt -- 2.39.5 (Apple Git-154) ^ permalink raw reply [nested|flat] 12+ messages in thread
* Re: Convert NOT IN sublinks to anti-joins when safe 2026-03-04 09:52 ` Re: Convert NOT IN sublinks to anti-joins when safe Richard Guo <[email protected]> 2026-03-05 01:57 ` Re: Convert NOT IN sublinks to anti-joins when safe Japin Li <[email protected]> 2026-03-08 08:16 ` Re: Convert NOT IN sublinks to anti-joins when safe Richard Guo <[email protected]> @ 2026-03-09 05:14 ` Japin Li <[email protected]> 2026-03-09 07:38 ` Re: Convert NOT IN sublinks to anti-joins when safe wenhui qiu <[email protected]> 1 sibling, 1 reply; 12+ messages in thread From: Japin Li @ 2026-03-09 05:14 UTC (permalink / raw) To: Richard Guo <[email protected]>; +Cc: Pg Hackers <[email protected]> On Sun, 08 Mar 2026 at 17:16, Richard Guo <[email protected]> wrote: > On Thu, Mar 5, 2026 at 10:57 AM Japin Li <[email protected]> wrote: >> Just a quick note: I think `foreach_ptr` is more appropriate here than `foreach`. > > Agreed. > > For homogeneous lists, it is better to use foreach_node so we get the > additional type-safety assertions in dev builds. For heterogeneous > lists, we should use foreach_ptr. > > Attached is an updated version of the patch that replaces foreach with > foreach_ptr or foreach_node accordingly. Nothing else has changed. > > I plan to do one more round of self-review on this. Unless there are > any further thoughts or concerns, I'm hoping to commit this in a week > or two. Please let me know if anyone spots anything else I might have > missed. > Thanks for updating the patch. LGTM. -- Regards, Japin Li ChengDu WenWu Information Technology Co., Ltd. ^ permalink raw reply [nested|flat] 12+ messages in thread
* Re: Convert NOT IN sublinks to anti-joins when safe 2026-03-04 09:52 ` Re: Convert NOT IN sublinks to anti-joins when safe Richard Guo <[email protected]> 2026-03-05 01:57 ` Re: Convert NOT IN sublinks to anti-joins when safe Japin Li <[email protected]> 2026-03-08 08:16 ` Re: Convert NOT IN sublinks to anti-joins when safe Richard Guo <[email protected]> 2026-03-09 05:14 ` Re: Convert NOT IN sublinks to anti-joins when safe Japin Li <[email protected]> @ 2026-03-09 07:38 ` wenhui qiu <[email protected]> 2026-03-12 02:14 ` Re: Convert NOT IN sublinks to anti-joins when safe Richard Guo <[email protected]> 0 siblings, 1 reply; 12+ messages in thread From: wenhui qiu @ 2026-03-09 07:38 UTC (permalink / raw) To: Japin Li <[email protected]>; +Cc: Richard Guo <[email protected]>; Pg Hackers <[email protected]> HI > I just finished a final self-review and noticed a subtle issue in > query_outputs_are_not_nullable(). When preparing a target expression > for the non-nullability check, the previous code flattened join > aliases Vars before grouping Vars. However, because the parser > processes FROM/JOIN clauses before the GROUP BY clause, a grouping Var > can actually wrap a join alias Var, not the reverse. So we should > flatten grouping Vars first. > Attached is an updated patch that fixes that issue. I plan to commit > this in a week or two, barring any objections. Thanks for updating the patch. v6 path LGTM. Thanks On Mon, Mar 9, 2026 at 1:14 PM Japin Li <[email protected]> wrote: > On Sun, 08 Mar 2026 at 17:16, Richard Guo <[email protected]> wrote: > > On Thu, Mar 5, 2026 at 10:57 AM Japin Li <[email protected]> wrote: > >> Just a quick note: I think `foreach_ptr` is more appropriate here than > `foreach`. > > > > Agreed. > > > > For homogeneous lists, it is better to use foreach_node so we get the > > additional type-safety assertions in dev builds. For heterogeneous > > lists, we should use foreach_ptr. > > > > Attached is an updated version of the patch that replaces foreach with > > foreach_ptr or foreach_node accordingly. Nothing else has changed. > > > > I plan to do one more round of self-review on this. Unless there are > > any further thoughts or concerns, I'm hoping to commit this in a week > > or two. Please let me know if anyone spots anything else I might have > > missed. > > > > Thanks for updating the patch. LGTM. > > -- > Regards, > Japin Li > ChengDu WenWu Information Technology Co., Ltd. > > > ^ permalink raw reply [nested|flat] 12+ messages in thread
* Re: Convert NOT IN sublinks to anti-joins when safe 2026-03-04 09:52 ` Re: Convert NOT IN sublinks to anti-joins when safe Richard Guo <[email protected]> 2026-03-05 01:57 ` Re: Convert NOT IN sublinks to anti-joins when safe Japin Li <[email protected]> 2026-03-08 08:16 ` Re: Convert NOT IN sublinks to anti-joins when safe Richard Guo <[email protected]> 2026-03-09 05:14 ` Re: Convert NOT IN sublinks to anti-joins when safe Japin Li <[email protected]> 2026-03-09 07:38 ` Re: Convert NOT IN sublinks to anti-joins when safe wenhui qiu <[email protected]> @ 2026-03-12 02:14 ` Richard Guo <[email protected]> 0 siblings, 0 replies; 12+ messages in thread From: Richard Guo @ 2026-03-12 02:14 UTC (permalink / raw) To: wenhui qiu <[email protected]>; +Cc: Japin Li <[email protected]>; Pg Hackers <[email protected]> On Mon, Mar 9, 2026 at 4:38 PM wenhui qiu <[email protected]> wrote: > Thanks for updating the patch. v6 path LGTM. Thanks! Committed with a few minor cosmetic tweaks. - Richard ^ permalink raw reply [nested|flat] 12+ messages in thread
end of thread, other threads:[~2026-03-12 02:14 UTC | newest] Thread overview: 12+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2026-02-04 14:59 ` David Geier <[email protected]> 2026-02-05 06:09 ` Richard Guo <[email protected]> 2026-02-05 07:29 ` wenhui qiu <[email protected]> 2026-03-04 09:33 ` Richard Guo <[email protected]> 2026-03-04 09:52 ` Richard Guo <[email protected]> 2026-03-05 01:57 ` Japin Li <[email protected]> 2026-03-05 03:40 ` wenhui qiu <[email protected]> 2026-03-08 08:16 ` Richard Guo <[email protected]> 2026-03-09 04:01 ` Richard Guo <[email protected]> 2026-03-09 05:14 ` Japin Li <[email protected]> 2026-03-09 07:38 ` wenhui qiu <[email protected]> 2026-03-12 02:14 ` Richard Guo <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox