public inbox for [email protected]help / color / mirror / Atom feed
Convert NOT IN sublinks to anti-joins when safe 13+ messages / 3 participants [nested] [flat]
* Convert NOT IN sublinks to anti-joins when safe @ 2026-02-03 07:12 Richard Guo <[email protected]> 0 siblings, 2 replies; 13+ messages in thread From: Richard Guo @ 2026-02-03 07:12 UTC (permalink / raw) To: Pg Hackers <[email protected]> This topic has been discussed several times in the past. Due to the semantic mismatch regarding NULL handling, NOT IN is not ordinarily equivalent to an anti-join. However, if we can prove that neither the outer expressions nor the subquery outputs can yield NULL values, it should be safe to convert NOT IN to an anti-join. I believe we are now in a much better position to attempt this again. The planner has accumulated significant infrastructure that makes this proof straightforward and reliable. Specifically, we can now leverage the outer-join-aware-Var infrastructure to tell whether a Var comes from the nullable side of an outer join, and the not-null-attnums hash table to efficiently check whether a Var is defined NOT NULL. We also have the expr_is_nonnullable() function that is smart enough to deduce non-nullability for expressions more complex than simple Vars/Consts. Attached is a draft patch for this attempt (part of the code is adapted from an old patch [1] by David and Tom). This patch aims for a conservative implementation: the goal is not to handle every theoretical case, but to handle canonical query patterns with minimal code complexity. The patch primarily targets patterns like: SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM banned_users); ... and SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM banned_users WHERE user_id IS NOT NULL); This is a very typical syntax for exclusion. In well-modeled databases, join keys like id and user_id are very likely to be defined as NOT NULL. It seems to me that the ROI here is quite positive: the added code complexity is very low (thanks to the existing infrastructure), while the benefit is that users writing this typical pattern will finally get efficient anti-join plans without needing manual rewrites. (For the outer expressions, we could potentially also use outer query quals to prove non-nullability. This patch does not attempt to do so. Implementing this would require passing state down during the pull_up_sublinks recursion; and given that find_nonnullable_vars can fail to prove non-nullability in many cases due to the lack of const-simplification at this stage, I'm not sure whether it is worth the code complexity. Besides, I haven't fully convinced myself that doing this does not introduce correctness issues.) Any thoughts? [1] https://postgr.es/m/[email protected] - Richard Attachments: [application/octet-stream] v1-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patch (53.3K, 2-v1-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patch) download | inline diff: From ed2f764dcd08d23cf19dd363fd735ea1ff5c27f6 Mon Sep 17 00:00:00 2001 From: Richard Guo <[email protected]> Date: Fri, 30 Jan 2026 12:29:42 +0900 Subject: [PATCH v1] Convert NOT IN sublinks to anti-joins when safe The planner has historically been unable to convert "x NOT IN (SELECT y ...)" sublinks into anti-joins. This is because standard SQL semantics for NOT IN require that if either operand is NULL, the result is NULL (effectively false), causing the row to be discarded. In contrast, an anti-join returns true (row preserved) if no match is found. Due to this semantic mismatch regarding NULL handling, the conversion was previously considered unsafe. However, if we can prove that neither side of the comparison can yield NULL values, the behavior of NOT IN and anti-join becomes identical. Enabling this conversion allows the planner to treat the sublink as a first-class relation rather than an opaque SubPlan filter. This unlocks global join ordering optimization and permits the selection of the most efficient join algorithm based on cost, often yielding significant performance improvements for large datasets. This patch verifies that neither side of the comparison can be NULL and performs the conversion when safety is proven. For a Var to be guaranteed non-nullable, it must be defined as NOT NULL in the schema and not come from the nullable side of an outer join, or it must be forced non-nullable by proper qual clauses. This patch leverages the outer-join-aware-Var infrastructure to detect if a Var comes from the nullable side of an outer join, and consults the NOT-NULL-attnums hash table to efficiently verify schema definitions. Additionally, it employs find_nonnullable_vars to identify Vars forced non-nullable by given qual clauses and expr_is_nonnullable to deduce non-nullability for other expression types. --- src/backend/optimizer/plan/initsplan.c | 4 +- src/backend/optimizer/plan/subselect.c | 128 +++++++- src/backend/optimizer/prep/prepjointree.c | 65 +++- src/backend/optimizer/util/clauses.c | 372 ++++++++++++++++++--- src/backend/utils/adt/int8.c | 2 +- src/backend/utils/cache/lsyscache.c | 27 ++ src/include/optimizer/clauses.h | 1 + src/include/optimizer/optimizer.h | 13 +- src/include/optimizer/subselect.h | 1 + src/include/utils/lsyscache.h | 1 + src/test/regress/expected/subselect.out | 376 ++++++++++++++++++++++ src/test/regress/sql/subselect.sql | 151 +++++++++ 12 files changed, 1077 insertions(+), 64 deletions(-) diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c index 97ea95a4eb8..9aaf1c4e5ca 100644 --- a/src/backend/optimizer/plan/initsplan.c +++ b/src/backend/optimizer/plan/initsplan.c @@ -3447,7 +3447,7 @@ restriction_is_always_true(PlannerInfo *root, if (nulltest->argisrow) return false; - return expr_is_nonnullable(root, nulltest->arg, true); + return expr_is_nonnullable(root, nulltest->arg, NOTNULL_SOURCE_RELOPT); } /* If it's an OR, check its sub-clauses */ @@ -3512,7 +3512,7 @@ restriction_is_always_false(PlannerInfo *root, if (nulltest->argisrow) return false; - return expr_is_nonnullable(root, nulltest->arg, true); + return expr_is_nonnullable(root, nulltest->arg, NOTNULL_SOURCE_RELOPT); } /* If it's an OR, check its sub-clauses */ diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c index e9dc9d31f05..6f0c47602ff 100644 --- a/src/backend/optimizer/plan/subselect.c +++ b/src/backend/optimizer/plan/subselect.c @@ -91,6 +91,7 @@ static bool contain_outer_selfref(Node *node); static bool contain_outer_selfref_walker(Node *node, Index *depth); static void inline_cte(PlannerInfo *root, CommonTableExpr *cte); static bool inline_cte_walker(Node *node, inline_cte_walker_context *context); +static bool sublink_outerexprs_are_not_nullable(PlannerInfo *root, SubLink *sublink); static bool simplify_EXISTS_query(PlannerInfo *root, Query *query); static Query *convert_EXISTS_to_ANY(PlannerInfo *root, Query *subselect, Node **testexpr, List **paramIds); @@ -1306,11 +1307,14 @@ convert_VALUES_to_ANY(PlannerInfo *root, Node *testexpr, Query *values) * If so, form a JoinExpr and return it. Return NULL if the SubLink cannot * be converted to a join. * - * The only non-obvious input parameter is available_rels: this is the set - * of query rels that can safely be referenced in the sublink expression. - * (We must restrict this to avoid changing the semantics when a sublink - * is present in an outer join's ON qual.) The conversion must fail if - * the converted qual would reference any but these parent-query relids. + * If under_not is true, the caller actually found NOT (ANY SubLink), so + * that what we must try to build is an ANTI not SEMI join. + * + * available_rels is the set of query rels that can safely be referenced + * in the sublink expression. (We must restrict this to avoid changing + * the semantics when a sublink is present in an outer join's ON qual.) + * The conversion must fail if the converted qual would reference any but + * these parent-query relids. * * On success, the returned JoinExpr has larg = NULL and rarg = the jointree * item representing the pulled-up subquery. The caller must set larg to @@ -1333,7 +1337,7 @@ convert_VALUES_to_ANY(PlannerInfo *root, Node *testexpr, Query *values) */ JoinExpr * convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, - Relids available_rels) + bool under_not, Relids available_rels) { JoinExpr *result; Query *parse = root->parse; @@ -1351,6 +1355,18 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, Assert(sublink->subLinkType == ANY_SUBLINK); + /* + * Per SQL spec, NOT IN is not ordinarily equivalent to an anti-join, so + * that by default we have to fail when under_not. However, if we can + * prove that neither the outer query's expressions nor the sub-select's + * output columns can be NULL, then the logic is identical and it's safe + * to convert NOT IN to an anti-join. + */ + if (under_not && + (!sublink_outerexprs_are_not_nullable(root, sublink) || + !query_outputs_are_not_nullable(subselect))) + return NULL; + /* * If the sub-select contains any Vars of the parent query, we treat it as * LATERAL. (Vars from higher levels don't matter here.) @@ -1428,7 +1444,7 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, * And finally, build the JoinExpr node. */ result = makeNode(JoinExpr); - result->jointype = JOIN_SEMI; + result->jointype = under_not ? JOIN_ANTI : JOIN_SEMI; result->isNatural = false; result->larg = NULL; /* caller must fill this in */ result->rarg = (Node *) rtr; @@ -1441,12 +1457,104 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, return result; } +/* + * sublink_outerexprs_are_not_nullable: verify that the outer expressions + * involved in an ANY_SUBLINK testexpr are provably non-nullable. + * + * We handle the two standard parser representations for ANY sublinks: a single + * OpExpr for single-column comparisons, and a BoolExpr containing a list of + * OpExprs for multi-column comparisons. In both cases, we collect the outer + * expressions (which are the first arguments of the OpExprs) and verify their + * non-nullability. + * + * It is acceptable for this check not to be exhaustive. We can err on the + * side of conservatism: if we're not sure, it's okay to return FALSE. + */ +static bool +sublink_outerexprs_are_not_nullable(PlannerInfo *root, SubLink *sublink) +{ + Node *testexpr = sublink->testexpr; + List *outer_exprs = NIL; + ListCell *lc; + + /* Punt if sublink is not in the expected format */ + if (sublink->subLinkType != ANY_SUBLINK || testexpr == NULL) + return false; + + if (IsA(testexpr, OpExpr)) + { + /* single-column comparison */ + OpExpr *opexpr = (OpExpr *) testexpr; + + Assert(list_length(opexpr->args) == 2); + + outer_exprs = lappend(outer_exprs, linitial(opexpr->args)); + } + else if (IsA(testexpr, BoolExpr)) + { + /* multi-column comparison (AND clause of OpExprs) */ + BoolExpr *b = (BoolExpr *) testexpr; + + if (b->boolop != AND_EXPR) + return false; + + foreach(lc, b->args) + { + OpExpr *opexpr = (OpExpr *) lfirst(lc); + + if (!IsA(opexpr, OpExpr)) + return false; + + Assert(list_length(opexpr->args) == 2); + + outer_exprs = lappend(outer_exprs, linitial(opexpr->args)); + } + } + else + { + /* Punt if other node types */ + return false; + } + + /* + * Since the query hasn't yet been through expression preprocessing, we + * must apply flatten_join_alias_vars to the outer expressions to avoid + * being fooled by join aliases. + * + * We do not need to apply flatten_group_exprs though, since grouping Vars + * cannot appear in jointree quals. + */ + outer_exprs = (List *) + flatten_join_alias_vars(root, root->parse, (Node *) outer_exprs); + + /* Check that every outer expression is non-nullable */ + foreach(lc, outer_exprs) + { + Expr *expr = (Expr *) lfirst(lc); + + /* + * We have already collected relation-level not-null constraints for + * the outer query, so we can consult the global hash table for + * nullability information. + */ + if (!expr_is_nonnullable(root, expr, NOTNULL_SOURCE_HASHTABLE)) + return false; + + /* + * XXX It is possible to further prove non-nullability by examining + * the qual clauses available at or below the jointree node where this + * NOT IN clause is evaluated, but for the moment it doesn't seem + * worth the extra complication. + */ + } + + return true; +} + /* * convert_EXISTS_sublink_to_join: try to convert an EXISTS SubLink to a join * - * The API of this function is identical to convert_ANY_sublink_to_join's, - * except that we also support the case where the caller has found NOT EXISTS, - * so we need an additional input parameter "under_not". + * The API of this function is identical to convert_ANY_sublink_to_join's. */ JoinExpr * convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink, diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c index c80bfc88d82..4cbc167420d 100644 --- a/src/backend/optimizer/prep/prepjointree.c +++ b/src/backend/optimizer/prep/prepjointree.c @@ -848,14 +848,15 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node, if ((saop = convert_VALUES_to_ANY(root, sublink->testexpr, (Query *) sublink->subselect)) != NULL) - + { /* * The VALUES sequence was simplified. Nothing more to do * here. */ return (Node *) saop; + } - if ((j = convert_ANY_sublink_to_join(root, sublink, + if ((j = convert_ANY_sublink_to_join(root, sublink, false, available_rels1)) != NULL) { /* Yes; insert the new join node into the join tree */ @@ -881,7 +882,7 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node, return NULL; } if (available_rels2 != NULL && - (j = convert_ANY_sublink_to_join(root, sublink, + (j = convert_ANY_sublink_to_join(root, sublink, false, available_rels2)) != NULL) { /* Yes; insert the new join node into the join tree */ @@ -966,14 +967,68 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node, } if (is_notclause(node)) { - /* If the immediate argument of NOT is EXISTS, try to convert */ + /* If the immediate argument of NOT is ANY or EXISTS, try to convert */ SubLink *sublink = (SubLink *) get_notclausearg((Expr *) node); JoinExpr *j; Relids child_rels; if (sublink && IsA(sublink, SubLink)) { - if (sublink->subLinkType == EXISTS_SUBLINK) + if (sublink->subLinkType == ANY_SUBLINK) + { + if ((j = convert_ANY_sublink_to_join(root, sublink, true, + available_rels1)) != NULL) + { + /* Yes; insert the new join node into the join tree */ + j->larg = *jtlink1; + *jtlink1 = (Node *) j; + /* Recursively process pulled-up jointree nodes */ + j->rarg = pull_up_sublinks_jointree_recurse(root, + j->rarg, + &child_rels); + + /* + * Now recursively process the pulled-up quals. Because + * we are underneath a NOT, we can't pull up sublinks that + * reference the left-hand stuff, but it's still okay to + * pull up sublinks referencing j->rarg. + */ + j->quals = pull_up_sublinks_qual_recurse(root, + j->quals, + &j->rarg, + child_rels, + NULL, NULL); + /* Return NULL representing constant TRUE */ + return NULL; + } + if (available_rels2 != NULL && + (j = convert_ANY_sublink_to_join(root, sublink, true, + available_rels2)) != NULL) + { + /* Yes; insert the new join node into the join tree */ + j->larg = *jtlink2; + *jtlink2 = (Node *) j; + /* Recursively process pulled-up jointree nodes */ + j->rarg = pull_up_sublinks_jointree_recurse(root, + j->rarg, + &child_rels); + + /* + * Now recursively process the pulled-up quals. Because + * we are underneath a NOT, we can't pull up sublinks that + * reference the left-hand stuff, but it's still okay to + * pull up sublinks referencing j->rarg. + */ + j->quals = pull_up_sublinks_qual_recurse(root, + j->quals, + &j->rarg, + child_rels, + NULL, NULL); + /* Return NULL representing constant TRUE */ + return NULL; + } + } + else if (sublink->subLinkType == EXISTS_SUBLINK) { if ((j = convert_EXISTS_sublink_to_join(root, sublink, true, available_rels1)) != NULL) diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c index 32204776c45..c4c89956b52 100644 --- a/src/backend/optimizer/util/clauses.c +++ b/src/backend/optimizer/util/clauses.c @@ -21,6 +21,7 @@ #include "access/htup_details.h" #include "catalog/pg_class.h" +#include "catalog/pg_inherits.h" #include "catalog/pg_language.h" #include "catalog/pg_operator.h" #include "catalog/pg_proc.h" @@ -112,6 +113,7 @@ static bool contain_context_dependent_node_walker(Node *node, int *flags); static bool contain_leaked_vars_walker(Node *node, void *context); static Relids find_nonnullable_rels_walker(Node *node, bool top_level); static List *find_nonnullable_vars_walker(Node *node, bool top_level); +static void find_subquery_safe_quals(Node *jtnode, List **safe_quals); static bool is_strict_saop(ScalarArrayOpExpr *expr, bool falseOK); static bool convert_saop_to_hashed_saop_walker(Node *node, void *context); static Node *eval_const_expressions_mutator(Node *node, @@ -1433,6 +1435,10 @@ contain_leaked_vars_walker(Node *node, void *context) context); } +/***************************************************************************** + * Nullability analysis + *****************************************************************************/ + /* * find_nonnullable_rels * Determine which base rels are forced nonnullable by given clause. @@ -1701,7 +1707,7 @@ find_nonnullable_rels_walker(Node *node, bool top_level) * but here we assume that the input is a Boolean expression, and wish to * see if NULL inputs will provably cause a FALSE-or-NULL result. We expect * the expression to have been AND/OR flattened and converted to implicit-AND - * format. + * format (but the results are still good if it wasn't AND/OR flattened). * * Attnos of the identified Vars are returned in a multibitmapset (a List of * Bitmapsets). List indexes correspond to relids (varnos), while the per-rel @@ -2021,6 +2027,229 @@ find_forced_null_var(Node *node) return NULL; } +/* + * query_outputs_are_not_nullable + * Returns TRUE if the output values of the Query are certainly not NULL. + * All output columns must return non-NULL to answer TRUE. + * + * The reason this takes a Query, and not just an individual tlist expression, + * is so that we can make use of the query's WHERE/ON clauses to prove it does + * not return nulls. + * + * In current usage, the passed sub-Query hasn't yet been through any planner + * processing. This means that applying find_nonnullable_vars() to its WHERE + * clauses isn't really ideal: for lack of const-simplification, we might be + * unable to prove not-nullness in some cases where we could have proved it + * afterwards. However, we should not get any false positive results. + * + * Like the other forms of nullability analysis above, we can err on the + * side of conservatism: if we're not sure, it's okay to return FALSE. + */ +bool +query_outputs_are_not_nullable(Query *query) +{ + PlannerInfo subroot; + List *safe_quals = NIL; + List *nonnullable_vars = NIL; + bool computed_nonnullable_vars = false; + ListCell *tl; + + /* + * If the query contains set operations, punt. The set ops themselves + * couldn't introduce nulls that weren't in their inputs, but the tlist + * present in the top-level query is just dummy and won't give us useful + * info. We could get an answer by recursing to examine each leaf query, + * but for the moment it doesn't seem worth the extra complication. + */ + if (query->setOperations) + return false; + + /* + * If the query contains grouping sets, punt. Grouping sets can introduce + * NULL values, and we currently lack the PlannerInfo needed to flatten + * grouping Vars in the query's outputs. + */ + if (query->groupingSets) + return false; + + /* + * We need a PlannerInfo to pass to expr_is_nonnullable. Fortunately, we + * can cons up an entirely dummy one, because only the "parse" link in the + * struct is used by expr_is_nonnullable. + */ + MemSet(&subroot, 0, sizeof(subroot)); + subroot.parse = query; + + /* + * Examine each targetlist entry to prove that it can't produce NULL. + */ + foreach(tl, query->targetList) + { + TargetEntry *tle = (TargetEntry *) lfirst(tl); + Expr *expr = tle->expr; + + /* Resjunk columns can be ignored: they don't produce output values */ + if (tle->resjunk) + continue; + + /* + * Look through binary relabelings, since we know those don't + * introduce nulls. + */ + while (expr && IsA(expr, RelabelType)) + expr = ((RelabelType *) expr)->arg; + + if (expr == NULL) /* paranoia */ + return false; + + /* + * Since the subquery hasn't yet been through expression + * preprocessing, we must apply flatten_join_alias_vars to the given + * expression, and to the quals found by find_subquery_safe_quals, to + * avoid being fooled by join aliases. + * + * We won't be dealing with arbitrary expressions, so it's safe to use + * NULL as the root, so long as adjust_standard_join_alias_expression + * can handle everything the parser would make as a join alias + * expression. + */ + expr = (Expr *) flatten_join_alias_vars(NULL, query, (Node *) expr); + + /* + * Similarly, we must apply flatten_group_exprs to the given + * expression to avoid being fooled by grouping Vars. We do not need + * to apply flatten_group_exprs to the quals found by + * find_subquery_safe_quals though, as grouping Vars cannot appear in + * jointree quals. + * + * We have verified that the query does not contain grouping sets, so + * it's safe to use NULL as the root here. + */ + expr = (Expr *) flatten_group_exprs(NULL, query, (Node *) expr); + + /* + * Check to see if the expr cannot be NULL. Since we're on a raw + * parse tree, we need to look up the not-null constraints from the + * system catalogs. + */ + if (expr_is_nonnullable(&subroot, expr, NOTNULL_SOURCE_SYSCACHE)) + continue; + + if (IsA(expr, Var)) + { + Var *var = (Var *) expr; + + /* + * For a plain Var, even if that didn't work, we can conclude that + * the Var is not nullable if find_nonnullable_vars can find a + * "var IS NOT NULL" or similarly strict condition among the quals + * on non-outerjoined-rels. Compute the list of Vars having such + * quals if we didn't already. + */ + if (!computed_nonnullable_vars) + { + find_subquery_safe_quals((Node *) query->jointree, &safe_quals); + safe_quals = (List *) + flatten_join_alias_vars(NULL, query, (Node *) safe_quals); + nonnullable_vars = find_nonnullable_vars((Node *) safe_quals); + computed_nonnullable_vars = true; + } + + if (!mbms_is_member(var->varno, + var->varattno - FirstLowInvalidHeapAttributeNumber, + nonnullable_vars)) + return false; /* we failed to prove the Var non-null */ + } + else + { + /* Punt otherwise */ + return false; + } + } + + return true; +} + +/* + * find_subquery_safe_quals + * Traverse jointree to locate quals on non-outerjoined-rels. + * + * We locate all WHERE and JOIN/ON quals that constrain the rels that are not + * below the nullable side of any outer join, and add them to the *safe_quals + * list (forming a list with implicit-AND semantics). These quals can be used + * to prove non-nullability of the subquery's outputs. + * + * Top-level caller must initialize *safe_quals to NIL. + */ +static void +find_subquery_safe_quals(Node *jtnode, List **safe_quals) +{ + if (jtnode == NULL) + return; + if (IsA(jtnode, RangeTblRef)) + { + /* Leaf node: nothing to do */ + return; + } + else if (IsA(jtnode, FromExpr)) + { + FromExpr *f = (FromExpr *) jtnode; + ListCell *lc; + + /* All elements of the FROM list are allowable */ + foreach(lc, f->fromlist) + find_subquery_safe_quals((Node *) lfirst(lc), safe_quals); + /* ... and its WHERE quals are too */ + if (f->quals) + *safe_quals = lappend(*safe_quals, f->quals); + } + else if (IsA(jtnode, JoinExpr)) + { + JoinExpr *j = (JoinExpr *) jtnode; + + switch (j->jointype) + { + case JOIN_INNER: + /* visit both children */ + find_subquery_safe_quals(j->larg, safe_quals); + find_subquery_safe_quals(j->rarg, safe_quals); + /* and grab the ON quals too */ + if (j->quals) + *safe_quals = lappend(*safe_quals, j->quals); + break; + + case JOIN_LEFT: + case JOIN_SEMI: + case JOIN_ANTI: + + /* + * Only the left input is possibly non-nullable; furthermore, + * the quals of this join don't constrain the left input. + * Note: we probably can't see SEMI or ANTI joins at this + * point, but if we do, we can treat them like LEFT joins. + */ + find_subquery_safe_quals(j->larg, safe_quals); + break; + + case JOIN_RIGHT: + /* Reverse of the above case */ + find_subquery_safe_quals(j->rarg, safe_quals); + break; + + case JOIN_FULL: + /* Neither side is non-nullable, so stop descending */ + break; + + default: + elog(ERROR, "unrecognized join type: %d", + (int) j->jointype); + } + } + else + elog(ERROR, "unrecognized node type: %d", + (int) nodeTag(jtnode)); +} + /* * Can we treat a ScalarArrayOpExpr as strict? * @@ -3338,7 +3567,8 @@ eval_const_expressions_mutator(Node *node, newargs = lappend(newargs, e); break; } - if (expr_is_nonnullable(context->root, (Expr *) e, false)) + if (expr_is_nonnullable(context->root, (Expr *) e, + NOTNULL_SOURCE_HASHTABLE)) { if (newargs == NIL) return e; /* first expr */ @@ -3532,7 +3762,7 @@ eval_const_expressions_mutator(Node *node, */ if (relem && expr_is_nonnullable(context->root, (Expr *) relem, - false)) + NOTNULL_SOURCE_HASHTABLE)) { if (ntest->nulltesttype == IS_NULL) return makeBoolConst(false, false); @@ -3584,7 +3814,8 @@ eval_const_expressions_mutator(Node *node, return makeBoolConst(result, false); } if (!ntest->argisrow && arg && - expr_is_nonnullable(context->root, (Expr *) arg, false)) + expr_is_nonnullable(context->root, (Expr *) arg, + NOTNULL_SOURCE_HASHTABLE)) { bool result; @@ -4273,14 +4504,11 @@ simplify_aggref(Aggref *aggref, eval_const_expressions_context *context) * If the Var is defined NOT NULL and meanwhile is not nulled by any outer * joins or grouping sets, then we can know that it cannot be NULL. * - * use_rel_info indicates whether the corresponding RelOptInfo is available for - * use. + * "source" specifies where we should look for NOT NULL proofs. */ bool -var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info) +var_is_nonnullable(PlannerInfo *root, Var *var, NotNullSource source) { - Bitmapset *notnullattnums = NULL; - Assert(IsA(var, Var)); /* skip upper-level Vars */ @@ -4295,35 +4523,85 @@ var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info) if (var->varattno < 0) return true; - /* - * Check if the Var is defined as NOT NULL. We retrieve the column NOT - * NULL constraint information from the corresponding RelOptInfo if it is - * available; otherwise, we search the hash table for this information. - */ - if (use_rel_info) - { - RelOptInfo *rel = find_base_rel(root, var->varno); + /* we don't trust whole-row Vars */ + if (var->varattno == 0) + return false; - notnullattnums = rel->notnullattnums; - } - else + /* Check if the Var is defined as NOT NULL. */ + switch (source) { - RangeTblEntry *rte = planner_rt_fetch(var->varno, root); + case NOTNULL_SOURCE_RELOPT: + { + /* + * We retrieve the column NOT NULL constraint information from + * the corresponding RelOptInfo. + */ + RelOptInfo *rel; + Bitmapset *notnullattnums; - /* - * We must skip inheritance parent tables, as some child tables may - * have a NOT NULL constraint for a column while others may not. This - * cannot happen with partitioned tables, though. - */ - if (rte->inh && rte->relkind != RELKIND_PARTITIONED_TABLE) - return false; + rel = find_base_rel(root, var->varno); + notnullattnums = rel->notnullattnums; - notnullattnums = find_relation_notnullatts(root, rte->relid); - } + return bms_is_member(var->varattno, notnullattnums); + } + case NOTNULL_SOURCE_HASHTABLE: + { + /* + * We retrieve the column NOT NULL constraint information from + * the hash table. + */ + RangeTblEntry *rte; + Bitmapset *notnullattnums; - if (var->varattno > 0 && - bms_is_member(var->varattno, notnullattnums)) - return true; + rte = planner_rt_fetch(var->varno, root); + + /* + * We must skip inheritance parent tables, as some child + * tables may have a NOT NULL constraint for a column while + * others may not. This cannot happen with partitioned + * tables, though. + */ + if (rte->inh && rte->relkind != RELKIND_PARTITIONED_TABLE) + return false; + + notnullattnums = find_relation_notnullatts(root, rte->relid); + + return bms_is_member(var->varattno, notnullattnums); + } + case NOTNULL_SOURCE_SYSCACHE: + { + /* + * We look up the "attnotnull" field in the attribute + * relation. + */ + RangeTblEntry *rte; + + rte = planner_rt_fetch(var->varno, root); + + /* We can only reason about ordinary relations */ + if (rte->rtekind != RTE_RELATION) + return false; + + /* + * We must skip inheritance parent tables, as some child + * tables may have a NOT NULL constraint for a column while + * others may not. This cannot happen with partitioned + * tables, though. + * + * Note that we need to check if the relation actually has any + * children, as we might not have done that yet. + */ + if (rte->inh && has_subclass(rte->relid) && + rte->relkind != RELKIND_PARTITIONED_TABLE) + return false; + + return get_attnotnull(rte->relid, var->varattno); + } + default: + elog(ERROR, "unrecognized NotNullSource: %d", + (int) source); + break; + } return false; } @@ -4333,16 +4611,22 @@ var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info) * * Returns true iff the given 'expr' cannot produce SQL NULLs. * - * If 'use_rel_info' is true, nullability of Vars is checked via the - * corresponding RelOptInfo for the given Var. Some callers require - * nullability information before RelOptInfos are generated. These should - * pass 'use_rel_info' as false. + * source: specifies where we should look for NOT NULL proofs for Vars. + * - NOTNULL_SOURCE_RELOPT: Used when RelOptInfos have been generated. We + * retrieve nullability information directly from the RelOptInfo corresponding + * to the Var. + * - NOTNULL_SOURCE_HASHTABLE: Used when RelOptInfos are not yet available, + * but we have already collected relation-level not-null constraints into the + * global hash table. + * - NOTNULL_SOURCE_SYSCACHE: Used for raw parse trees where neither + * RelOptInfos nor the hash table are available. In this case, we have to + * look up the 'attnotnull' field directly in the system catalogs. * * For now, we support only a limited set of expression types. Support for * additional node types can be added in the future. */ bool -expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) +expr_is_nonnullable(PlannerInfo *root, Expr *expr, NotNullSource source) { /* since this function recurses, it could be driven to stack overflow */ check_stack_depth(); @@ -4352,7 +4636,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) case T_Var: { if (root) - return var_is_nonnullable(root, (Var *) expr, use_rel_info); + return var_is_nonnullable(root, (Var *) expr, source); } break; case T_Const: @@ -4369,7 +4653,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) foreach_ptr(Expr, arg, coalesceexpr->args) { - if (expr_is_nonnullable(root, arg, use_rel_info)) + if (expr_is_nonnullable(root, arg, source)) return true; } } @@ -4384,7 +4668,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) foreach_ptr(Expr, arg, minmaxexpr->args) { - if (expr_is_nonnullable(root, arg, use_rel_info)) + if (expr_is_nonnullable(root, arg, source)) return true; } } @@ -4400,13 +4684,13 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) /* The default result must be present and non-nullable */ if (caseexpr->defresult == NULL || - !expr_is_nonnullable(root, caseexpr->defresult, use_rel_info)) + !expr_is_nonnullable(root, caseexpr->defresult, source)) return false; /* All branch results must be non-nullable */ foreach_ptr(CaseWhen, casewhen, caseexpr->args) { - if (!expr_is_nonnullable(root, casewhen->result, use_rel_info)) + if (!expr_is_nonnullable(root, casewhen->result, source)) return false; } @@ -4454,7 +4738,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) * non-nullable. */ return expr_is_nonnullable(root, ((RelabelType *) expr)->arg, - use_rel_info); + source); } default: break; diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c index 37d34685b93..6c8fb7b7275 100644 --- a/src/backend/utils/adt/int8.c +++ b/src/backend/utils/adt/int8.c @@ -834,7 +834,7 @@ int8inc_support(PG_FUNCTION_ARGS) PG_RETURN_POINTER(NULL); /* If the arg isn't NULLable, do the conversion */ - if (expr_is_nonnullable(req->root, arg, false)) + if (expr_is_nonnullable(req->root, arg, NOTNULL_SOURCE_HASHTABLE)) { Aggref *newagg; diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c index b924a2d900b..e790531b0fe 100644 --- a/src/backend/utils/cache/lsyscache.c +++ b/src/backend/utils/cache/lsyscache.c @@ -1071,6 +1071,33 @@ get_attoptions(Oid relid, int16 attnum) return result; } +/* + * get_attnotnull + * + * Given the relation id and the attribute number, + * return the "attnotnull" field from the attribute relation. + */ +bool +get_attnotnull(Oid relid, AttrNumber attnum) +{ + HeapTuple tp; + bool result = false; + + tp = SearchSysCache2(ATTNUM, + ObjectIdGetDatum(relid), + Int16GetDatum(attnum)); + + if (HeapTupleIsValid(tp)) + { + Form_pg_attribute att_tup = (Form_pg_attribute) GETSTRUCT(tp); + + result = att_tup->attnotnull; + ReleaseSysCache(tp); + } + + return result; +} + /* ---------- PG_CAST CACHE ---------- */ /* diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h index a64034e8a6d..853a28c0007 100644 --- a/src/include/optimizer/clauses.h +++ b/src/include/optimizer/clauses.h @@ -42,6 +42,7 @@ extern Relids find_nonnullable_rels(Node *clause); extern List *find_nonnullable_vars(Node *clause); extern List *find_forced_null_vars(Node *node); extern Var *find_forced_null_var(Node *node); +extern bool query_outputs_are_not_nullable(Query *query); extern bool is_pseudo_constant_clause(Node *clause); extern bool is_pseudo_constant_clause_relids(Node *clause, Relids relids); diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h index 3d27a019609..8892c92ad37 100644 --- a/src/include/optimizer/optimizer.h +++ b/src/include/optimizer/optimizer.h @@ -130,6 +130,14 @@ extern Expr *canonicalize_qual(Expr *qual, bool is_check); /* in util/clauses.c: */ +/* Enum to specify where var_is_nonnullable should look for NOT NULL proofs */ +typedef enum +{ + NOTNULL_SOURCE_RELOPT, /* Use RelOptInfo */ + NOTNULL_SOURCE_HASHTABLE, /* Use Global Hash Table */ + NOTNULL_SOURCE_SYSCACHE, /* Use System Catalog */ +} NotNullSource; + extern bool contain_mutable_functions(Node *clause); extern bool contain_mutable_functions_after_planning(Expr *expr); extern bool contain_volatile_functions(Node *clause); @@ -145,10 +153,11 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node); extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod, Oid result_collation); -extern bool var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info); +extern bool var_is_nonnullable(PlannerInfo *root, Var *var, + NotNullSource source); extern bool expr_is_nonnullable(PlannerInfo *root, Expr *expr, - bool use_rel_info); + NotNullSource source); extern List *expand_function_arguments(List *args, bool include_out_arguments, Oid result_type, diff --git a/src/include/optimizer/subselect.h b/src/include/optimizer/subselect.h index 8a5503eb973..4ecccf46bd3 100644 --- a/src/include/optimizer/subselect.h +++ b/src/include/optimizer/subselect.h @@ -22,6 +22,7 @@ extern ScalarArrayOpExpr *convert_VALUES_to_ANY(PlannerInfo *root, Query *values); extern JoinExpr *convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, + bool under_not, Relids available_rels); extern JoinExpr *convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink, diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h index 5655aca4c14..36ab7987392 100644 --- a/src/include/utils/lsyscache.h +++ b/src/include/utils/lsyscache.h @@ -98,6 +98,7 @@ extern Oid get_atttype(Oid relid, AttrNumber attnum); extern void get_atttypetypmodcoll(Oid relid, AttrNumber attnum, Oid *typid, int32 *typmod, Oid *collid); extern Datum get_attoptions(Oid relid, int16 attnum); +extern bool get_attnotnull(Oid relid, AttrNumber attnum); extern Oid get_cast_oid(Oid sourcetypeid, Oid targettypeid, bool missing_ok); extern char *get_collation_name(Oid colloid); extern bool get_collation_isdeterministic(Oid colloid); diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out index 2135d82884d..123110a7fa2 100644 --- a/src/test/regress/expected/subselect.out +++ b/src/test/regress/expected/subselect.out @@ -3323,3 +3323,379 @@ SELECT ten FROM onek t WHERE 1.0::integer IN ((VALUES (1), (3))); Seq Scan on onek t (1 row) +-- +-- Check NOT IN performs an ANTI JOIN when both the outer query's expressions +-- and the sub-select's output columns are provably non-nullable. +-- +BEGIN; +CREATE TEMP TABLE not_null_tab (id int NOT NULL, val int NOT NULL); +CREATE TEMP TABLE null_tab (id int, val int); +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +----------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = not_null_tab_1.id) + -> Seq Scan on not_null_tab + -> Hash + -> Seq Scan on not_null_tab not_null_tab_1 +(5 rows) + +-- No ANTI JOIN: outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> Seq Scan on not_null_tab +(4 rows) + +-- No ANTI JOIN: inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> Seq Scan on null_tab +(4 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab WHERE id IS NOT NULL); + QUERY PLAN +---------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = null_tab.id) + -> Seq Scan on not_null_tab + -> Hash + -> Seq Scan on null_tab + Filter: (id IS NOT NULL) +(6 rows) + +-- No ANTI JOIN: outer side is nullable (we don't check outer query quals for now) +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id IS NOT NULL + AND id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +--------------------------------------------------------------------------------- + Seq Scan on null_tab + Filter: ((id IS NOT NULL) AND (NOT (ANY (id = (hashed SubPlan any_1).col1)))) + SubPlan any_1 + -> Seq Scan on not_null_tab +(4 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is defined NOT NULL +-- and is not nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + QUERY PLAN +----------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = t1.id) + -> Seq Scan on not_null_tab + -> Hash + -> Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(12 rows) + +-- No ANTI JOIN: inner side is defined NOT NULL but is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t2.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(11 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + LEFT JOIN null_tab t2 ON t1.id = t2.id + WHERE t1.id IS NOT NULL +); + QUERY PLAN +---------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = t1.id) + -> Seq Scan on not_null_tab + -> Hash + -> Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on null_tab t1 + Filter: (id IS NOT NULL) + -> Sort + Sort Key: t2.id + -> Seq Scan on null_tab t2 +(13 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + INNER JOIN null_tab t2 ON t1.id = t2.id + LEFT JOIN null_tab t3 ON TRUE +); + QUERY PLAN +------------------------------------------------- + Merge Anti Join + Merge Cond: (not_null_tab.id = t1.id) + -> Sort + Sort Key: not_null_tab.id + -> Seq Scan on not_null_tab + -> Nested Loop Left Join + -> Merge Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on null_tab t2 + -> Materialize + -> Seq Scan on null_tab t3 +(16 rows) + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t1.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +---------------------------------------------------- + Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Hash Anti Join + Hash Cond: (t1.id = not_null_tab.id) + -> Seq Scan on not_null_tab t1 + -> Hash + -> Seq Scan on not_null_tab + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(12 rows) + +-- No ANTI JOIN: outer side is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t2.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +------------------------------------------------------------- + Merge Left Join + Merge Cond: (t1.id = t2.id) + Filter: (NOT (ANY (t2.id = (hashed SubPlan any_1).col1))) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 + SubPlan any_1 + -> Seq Scan on not_null_tab +(11 rows) + +-- No ANTI JOIN: sublink is in an outer join's ON qual and references the +-- non-nullable side +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t1.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +------------------------------------------------------------------ + Nested Loop Left Join + Join Filter: (NOT (ANY (t1.id = (hashed SubPlan any_1).col1))) + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Seq Scan on not_null_tab t2 + SubPlan any_1 + -> Seq Scan on not_null_tab +(7 rows) + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t2.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +---------------------------------------------------- + Nested Loop Left Join + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Hash Anti Join + Hash Cond: (t2.id = not_null_tab.id) + -> Seq Scan on not_null_tab t2 + -> Hash + -> Seq Scan on not_null_tab +(8 rows) + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT id, val FROM not_null_tab); + QUERY PLAN +--------------------------------------------------------------------------------------------------- + Merge Anti Join + Merge Cond: ((not_null_tab.id = not_null_tab_1.id) AND (not_null_tab.val = not_null_tab_1.val)) + -> Sort + Sort Key: not_null_tab.id, not_null_tab.val + -> Seq Scan on not_null_tab + -> Sort + Sort Key: not_null_tab_1.id, not_null_tab_1.val + -> Seq Scan on not_null_tab not_null_tab_1 +(8 rows) + +-- No ANTI JOIN: one column of the outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1, null_tab t2 +WHERE (t1.id, t2.id) NOT IN (SELECT id, val FROM not_null_tab); + QUERY PLAN +-------------------------------------------------------------------------------------------------------------- + Nested Loop + Join Filter: (NOT (ANY ((t1.id = (hashed SubPlan any_1).col1) AND (t2.id = (hashed SubPlan any_1).col2)))) + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Seq Scan on null_tab t2 + SubPlan any_1 + -> Seq Scan on not_null_tab +(7 rows) + +-- No ANTI JOIN: one column of the inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT t1.id, t2.id FROM not_null_tab t1, null_tab t2); + QUERY PLAN +-------------------------------------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY ((id = (SubPlan any_1).col1) AND (val = (SubPlan any_1).col2)))) + SubPlan any_1 + -> Materialize + -> Nested Loop + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Seq Scan on null_tab t2 +(8 rows) + +-- ANTI JOIN: COALESCE(nullable, constant) is non-nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE COALESCE(id, -1) NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +----------------------------------------------------------------------- + Hash Anti Join + Hash Cond: (COALESCE(null_tab.id, '-1'::integer) = not_null_tab.id) + -> Seq Scan on null_tab + -> Hash + -> Seq Scan on not_null_tab +(5 rows) + +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT COALESCE(id, -1) FROM null_tab); + QUERY PLAN +----------------------------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = COALESCE(null_tab.id, '-1'::integer)) + -> Seq Scan on not_null_tab + -> Hash + -> Seq Scan on null_tab +(5 rows) + +-- ANTI JOIN: GROUP BY (without Grouping Sets) preserves the non-nullability of +-- the column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab GROUP BY id); + QUERY PLAN +----------------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = not_null_tab_1.id) + -> Seq Scan on not_null_tab + -> Hash + -> HashAggregate + Group Key: not_null_tab_1.id + -> Seq Scan on not_null_tab not_null_tab_1 +(7 rows) + +-- No ANTI JOIN: GROUP BY on a nullable column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab GROUP BY id); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> HashAggregate + Group Key: null_tab.id + -> Seq Scan on null_tab +(6 rows) + +-- No ANTI JOIN: Grouping Sets can introduce NULLs +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT id + FROM not_null_tab + GROUP BY GROUPING SETS ((id), (val)) +); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> HashAggregate + Hash Key: not_null_tab_1.id + Hash Key: not_null_tab_1.val + -> Seq Scan on not_null_tab not_null_tab_1 +(7 rows) + +ROLLBACK; diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql index cadc3293687..75e58cc9b5f 100644 --- a/src/test/regress/sql/subselect.sql +++ b/src/test/regress/sql/subselect.sql @@ -1448,3 +1448,154 @@ SELECT * FROM onek t1, lateral (SELECT * FROM onek t2 WHERE t2.ten IN (values (t -- VtA causes the whole expression to be evaluated as a constant EXPLAIN (COSTS OFF) SELECT ten FROM onek t WHERE 1.0::integer IN ((VALUES (1), (3))); + +-- +-- Check NOT IN performs an ANTI JOIN when both the outer query's expressions +-- and the sub-select's output columns are provably non-nullable. +-- + +BEGIN; + +CREATE TEMP TABLE not_null_tab (id int NOT NULL, val int NOT NULL); +CREATE TEMP TABLE null_tab (id int, val int); + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab WHERE id IS NOT NULL); + +-- No ANTI JOIN: outer side is nullable (we don't check outer query quals for now) +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id IS NOT NULL + AND id NOT IN (SELECT id FROM not_null_tab); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is defined NOT NULL +-- and is not nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + +-- No ANTI JOIN: inner side is defined NOT NULL but is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t2.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + LEFT JOIN null_tab t2 ON t1.id = t2.id + WHERE t1.id IS NOT NULL +); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + INNER JOIN null_tab t2 ON t1.id = t2.id + LEFT JOIN null_tab t3 ON TRUE +); + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t1.id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: outer side is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t2.id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: sublink is in an outer join's ON qual and references the +-- non-nullable side +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t1.id NOT IN (SELECT id FROM not_null_tab); + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t2.id NOT IN (SELECT id FROM not_null_tab); + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT id, val FROM not_null_tab); + +-- No ANTI JOIN: one column of the outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1, null_tab t2 +WHERE (t1.id, t2.id) NOT IN (SELECT id, val FROM not_null_tab); + +-- No ANTI JOIN: one column of the inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT t1.id, t2.id FROM not_null_tab t1, null_tab t2); + +-- ANTI JOIN: COALESCE(nullable, constant) is non-nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE COALESCE(id, -1) NOT IN (SELECT id FROM not_null_tab); + +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT COALESCE(id, -1) FROM null_tab); + +-- ANTI JOIN: GROUP BY (without Grouping Sets) preserves the non-nullability of +-- the column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab GROUP BY id); + +-- No ANTI JOIN: GROUP BY on a nullable column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab GROUP BY id); + +-- No ANTI JOIN: Grouping Sets can introduce NULLs +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT id + FROM not_null_tab + GROUP BY GROUPING SETS ((id), (val)) +); + +ROLLBACK; -- 2.39.5 (Apple Git-154) ^ permalink raw reply [nested|flat] 13+ messages in thread
* Re: Convert NOT IN sublinks to anti-joins when safe @ 2026-02-03 09:41 wenhui qiu <[email protected]> parent: Richard Guo <[email protected]> 1 sibling, 0 replies; 13+ messages in thread From: wenhui qiu @ 2026-02-03 09:41 UTC (permalink / raw) To: Richard Guo <[email protected]>; +Cc: Pg Hackers <[email protected]> Hi Richard > I believe we are now in a much better position to attempt this again. > The planner has accumulated significant infrastructure that makes this > proof straightforward and reliable. Specifically, we can now leverage > the outer-join-aware-Var infrastructure to tell whether a Var comes > from the nullable side of an outer join, and the not-null-attnums hash > table to efficiently check whether a Var is defined NOT NULL. We also > have the expr_is_nonnullable() function that is smart enough to deduce > non-nullability for expressions more complex than simple Vars/Consts. Thank you for working on this.Indeed, the benefits are substantial and highly necessary, as Oracle, SQL Server, and MySQL have all implemented varying degrees of support.I shall test this path in my spare time. Thanks , On Tue, Feb 3, 2026 at 3:13 PM Richard Guo <[email protected]> wrote: > This topic has been discussed several times in the past. Due to the > semantic mismatch regarding NULL handling, NOT IN is not ordinarily > equivalent to an anti-join. However, if we can prove that neither the > outer expressions nor the subquery outputs can yield NULL values, it > should be safe to convert NOT IN to an anti-join. > > I believe we are now in a much better position to attempt this again. > The planner has accumulated significant infrastructure that makes this > proof straightforward and reliable. Specifically, we can now leverage > the outer-join-aware-Var infrastructure to tell whether a Var comes > from the nullable side of an outer join, and the not-null-attnums hash > table to efficiently check whether a Var is defined NOT NULL. We also > have the expr_is_nonnullable() function that is smart enough to deduce > non-nullability for expressions more complex than simple Vars/Consts. > > Attached is a draft patch for this attempt (part of the code is > adapted from an old patch [1] by David and Tom). This patch aims for > a conservative implementation: the goal is not to handle every > theoretical case, but to handle canonical query patterns with minimal > code complexity. > > The patch primarily targets patterns like: > > SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM banned_users); > > ... and > > SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM > banned_users WHERE user_id IS NOT NULL); > > This is a very typical syntax for exclusion. In well-modeled > databases, join keys like id and user_id are very likely to be defined > as NOT NULL. > > It seems to me that the ROI here is quite positive: the added code > complexity is very low (thanks to the existing infrastructure), while > the benefit is that users writing this typical pattern will finally > get efficient anti-join plans without needing manual rewrites. > > (For the outer expressions, we could potentially also use outer query > quals to prove non-nullability. This patch does not attempt to do so. > Implementing this would require passing state down during the > pull_up_sublinks recursion; and given that find_nonnullable_vars can > fail to prove non-nullability in many cases due to the lack of > const-simplification at this stage, I'm not sure whether it is worth > the code complexity. Besides, I haven't fully convinced myself that > doing this does not introduce correctness issues.) > > Any thoughts? > > [1] https://postgr.es/m/[email protected] > > - Richard > ^ permalink raw reply [nested|flat] 13+ messages in thread
* Re: Convert NOT IN sublinks to anti-joins when safe @ 2026-02-04 09:47 Richard Guo <[email protected]> parent: Richard Guo <[email protected]> 1 sibling, 1 reply; 13+ messages in thread From: Richard Guo @ 2026-02-04 09:47 UTC (permalink / raw) To: Pg Hackers <[email protected]> On Tue, Feb 3, 2026 at 4:12 PM Richard Guo <[email protected]> wrote: > This topic has been discussed several times in the past. Due to the > semantic mismatch regarding NULL handling, NOT IN is not ordinarily > equivalent to an anti-join. However, if we can prove that neither the > outer expressions nor the subquery outputs can yield NULL values, it > should be safe to convert NOT IN to an anti-join. I've noticed a loose end in the v1 patch. The semantic gap between NOT IN and anti-join actually exists whenever the operator returns NULL. For NOT IN, if (A op B) returns NULL, then NOT (NULL) evaluates to NULL (effectively false), and the row is discarded. In contrast, for an anti-join, if (A op B) returns NULL, it implies no match was found, and the anti-join logic dictates that the row should be kept. To guarantee that (A op B) never returns NULL, the current patch verifies that both A and B are non-nullable. However, this is not sufficient. The "op" might be an operator that returns NULL on non-null inputs. On the other hand, if "op" does not return NULL on NULL inputs, like IS DISTINCT FROM, we technically would not even need to require that A and B are non-nullable. Is there a convenient way to verify that an operator never returns NULL on non-null inputs? Would it be sufficient to insist that the operator belongs to btree opclass (assuming that the strict ordering requirements of btree imply this safety)? And, is it worth checking if an operator never returns NULL even on NULL inputs? If we can identify such operators, we should be able to remove the requirement that both sides of NOT IN must be non-nullable. Is there a convenient way to check for such operators? - Richard ^ permalink raw reply [nested|flat] 13+ messages in thread
* Re: Convert NOT IN sublinks to anti-joins when safe @ 2026-02-05 06:51 Richard Guo <[email protected]> parent: Richard Guo <[email protected]> 0 siblings, 1 reply; 13+ messages in thread From: Richard Guo @ 2026-02-05 06:51 UTC (permalink / raw) To: Pg Hackers <[email protected]> On Wed, Feb 4, 2026 at 6:47 PM Richard Guo <[email protected]> wrote: > Is there a convenient way to verify that an operator never returns > NULL on non-null inputs? Would it be sufficient to insist that the > operator belongs to btree opclass (assuming that the strict ordering > requirements of btree imply this safety)? I think we can insist that the operator be a member of a btree or hash opfamily. Btree operators must adhere to strict total order, and hash operators must adhere to strict equality; if they return NULL for non-null inputs, the indexes themselves would be corrupt. I'm less confident about other access methods like gist or gin. Their semantics can be more flexible, and using such operators in a NOT IN clause is quite rare. Attached is the updated patch, which adds the check requiring the operator to be a member of a btree or hash opfamily. > And, is it worth checking if an operator never returns NULL even on > NULL inputs? If we can identify such operators, we should be able to > remove the requirement that both sides of NOT IN must be non-nullable. > Is there a convenient way to check for such operators? I don't know how to check for such operators, so I didn't do it in the patch. - Richard Attachments: [application/octet-stream] v2-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patch (57.8K, 2-v2-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patch) download | inline diff: From 65317f719f267be7d88eed191db5ed756f820410 Mon Sep 17 00:00:00 2001 From: Richard Guo <[email protected]> Date: Fri, 30 Jan 2026 12:29:42 +0900 Subject: [PATCH v2] Convert NOT IN sublinks to anti-joins when safe The planner has historically been unable to convert "x NOT IN (SELECT y ...)" sublinks into anti-joins. This is because standard SQL semantics for NOT IN require that if the comparison "x = y" returns NULL, the "NOT IN" expression evaluates to NULL (effectively false), causing the row to be discarded. In contrast, an anti-join preserves the row if no match is found. Due to this semantic mismatch regarding NULL handling, the conversion was previously considered unsafe. However, if we can prove that neither side of the comparison can yield NULL values, and further that the operator itself cannot return NULL for non-null inputs, the behavior of NOT IN and anti-join becomes identical. Enabling this conversion allows the planner to treat the sublink as a first-class relation rather than an opaque SubPlan filter. This unlocks global join ordering optimization and permits the selection of the most efficient join algorithm based on cost, often yielding significant performance improvements for large datasets. This patch verifies that neither side of the comparison can be NULL and that the operator is null-safe before performing the conversion. To verify operator safety, we require that the operator be a member of a B-tree or Hash operator family. This serves as a proxy for standard boolean behavior, ensuring the operator does not return NULL on valid non-null inputs, as doing so would break index integrity. For operand non-nullability, this patch makes use of several existing mechanisms. It leverages the outer-join-aware-Var infrastructure to verify that a Var does not come from the nullable side of an outer join, and consults the NOT-NULL-attnums hash table to efficiently verify schema-level NOT NULL constraints. Additionally, it employs find_nonnullable_vars to identify Vars forced non-nullable by qual clauses, and expr_is_nonnullable to deduce non-nullability for other expression types. --- src/backend/optimizer/plan/initsplan.c | 4 +- src/backend/optimizer/plan/subselect.c | 151 +++++++- src/backend/optimizer/prep/prepjointree.c | 65 +++- src/backend/optimizer/util/clauses.c | 373 +++++++++++++++++--- src/backend/utils/adt/int8.c | 2 +- src/backend/utils/cache/lsyscache.c | 68 ++++ src/include/optimizer/clauses.h | 1 + src/include/optimizer/optimizer.h | 13 +- src/include/optimizer/subselect.h | 1 + src/include/utils/lsyscache.h | 2 + src/test/regress/expected/subselect.out | 400 ++++++++++++++++++++++ src/test/regress/sql/subselect.sql | 169 +++++++++ src/tools/pgindent/typedefs.list | 1 + 13 files changed, 1186 insertions(+), 64 deletions(-) diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c index 97ea95a4eb8..9aaf1c4e5ca 100644 --- a/src/backend/optimizer/plan/initsplan.c +++ b/src/backend/optimizer/plan/initsplan.c @@ -3447,7 +3447,7 @@ restriction_is_always_true(PlannerInfo *root, if (nulltest->argisrow) return false; - return expr_is_nonnullable(root, nulltest->arg, true); + return expr_is_nonnullable(root, nulltest->arg, NOTNULL_SOURCE_RELOPT); } /* If it's an OR, check its sub-clauses */ @@ -3512,7 +3512,7 @@ restriction_is_always_false(PlannerInfo *root, if (nulltest->argisrow) return false; - return expr_is_nonnullable(root, nulltest->arg, true); + return expr_is_nonnullable(root, nulltest->arg, NOTNULL_SOURCE_RELOPT); } /* If it's an OR, check its sub-clauses */ diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c index e9dc9d31f05..ff2567eaa78 100644 --- a/src/backend/optimizer/plan/subselect.c +++ b/src/backend/optimizer/plan/subselect.c @@ -91,6 +91,7 @@ static bool contain_outer_selfref(Node *node); static bool contain_outer_selfref_walker(Node *node, Index *depth); static void inline_cte(PlannerInfo *root, CommonTableExpr *cte); static bool inline_cte_walker(Node *node, inline_cte_walker_context *context); +static bool sublink_testexpr_is_not_nullable(PlannerInfo *root, SubLink *sublink); static bool simplify_EXISTS_query(PlannerInfo *root, Query *query); static Query *convert_EXISTS_to_ANY(PlannerInfo *root, Query *subselect, Node **testexpr, List **paramIds); @@ -1306,11 +1307,14 @@ convert_VALUES_to_ANY(PlannerInfo *root, Node *testexpr, Query *values) * If so, form a JoinExpr and return it. Return NULL if the SubLink cannot * be converted to a join. * - * The only non-obvious input parameter is available_rels: this is the set - * of query rels that can safely be referenced in the sublink expression. - * (We must restrict this to avoid changing the semantics when a sublink - * is present in an outer join's ON qual.) The conversion must fail if - * the converted qual would reference any but these parent-query relids. + * If under_not is true, the caller actually found NOT (ANY SubLink), so + * that what we must try to build is an ANTI not SEMI join. + * + * available_rels is the set of query rels that can safely be referenced + * in the sublink expression. (We must restrict this to avoid changing + * the semantics when a sublink is present in an outer join's ON qual.) + * The conversion must fail if the converted qual would reference any but + * these parent-query relids. * * On success, the returned JoinExpr has larg = NULL and rarg = the jointree * item representing the pulled-up subquery. The caller must set larg to @@ -1333,7 +1337,7 @@ convert_VALUES_to_ANY(PlannerInfo *root, Node *testexpr, Query *values) */ JoinExpr * convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, - Relids available_rels) + bool under_not, Relids available_rels) { JoinExpr *result; Query *parse = root->parse; @@ -1351,6 +1355,19 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, Assert(sublink->subLinkType == ANY_SUBLINK); + /* + * Per SQL spec, NOT IN is not ordinarily equivalent to an anti-join, so + * that by default we have to fail when under_not. However, if we can + * prove that neither the outer query's expressions nor the sub-select's + * output columns can be NULL, and further that the operator itself cannot + * return NULL for non-null inputs, then the logic is identical and it's + * safe to convert NOT IN to an anti-join. + */ + if (under_not && + (!sublink_testexpr_is_not_nullable(root, sublink) || + !query_outputs_are_not_nullable(subselect))) + return NULL; + /* * If the sub-select contains any Vars of the parent query, we treat it as * LATERAL. (Vars from higher levels don't matter here.) @@ -1428,7 +1445,7 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, * And finally, build the JoinExpr node. */ result = makeNode(JoinExpr); - result->jointype = JOIN_SEMI; + result->jointype = under_not ? JOIN_ANTI : JOIN_SEMI; result->isNatural = false; result->larg = NULL; /* caller must fill this in */ result->rarg = (Node *) rtr; @@ -1441,12 +1458,126 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, return result; } +/* + * sublink_testexpr_is_not_nullable: verify that testexpr of an ANY_SUBLINK + * guarantees a non-null result, assuming the inner side is also non-null. + * + * To ensure the expression never returns NULL, we require both that the outer + * expressions are provably non-nullable and that the operator itself is safe. + * We validate operator safety by checking for membership in a standard index + * operator family (B-tree or Hash); this acts as a proxy for standard boolean + * behavior, ensuring the operator does not produce NULL results from non-null + * inputs. + * + * We handle the two standard parser representations for ANY sublinks: a single + * OpExpr for single-column comparisons, and a BoolExpr containing a list of + * OpExprs for multi-column comparisons. In both cases, we validate the + * operators and the outer expressions. + * + * It is acceptable for this check not to be exhaustive. We can err on the + * side of conservatism: if we're not sure, it's okay to return FALSE. + */ +static bool +sublink_testexpr_is_not_nullable(PlannerInfo *root, SubLink *sublink) +{ + Node *testexpr = sublink->testexpr; + List *outer_exprs = NIL; + ListCell *lc; + + /* Punt if sublink is not in the expected format */ + if (sublink->subLinkType != ANY_SUBLINK || testexpr == NULL) + return false; + + if (IsA(testexpr, OpExpr)) + { + /* single-column comparison */ + OpExpr *opexpr = (OpExpr *) testexpr; + + Assert(list_length(opexpr->args) == 2); + + /* + * We rely on membership in a B-tree or Hash operator family as a + * guarantee that the operator acts as a proper boolean comparison and + * does not yield NULL for valid non-null inputs. + */ + if (!op_is_safe_index_member(opexpr->opno)) + return false; + + outer_exprs = lappend(outer_exprs, linitial(opexpr->args)); + } + else if (IsA(testexpr, BoolExpr)) + { + /* multi-column comparison (AND clause of OpExprs) */ + BoolExpr *b = (BoolExpr *) testexpr; + + if (b->boolop != AND_EXPR) + return false; + + foreach(lc, b->args) + { + OpExpr *opexpr = (OpExpr *) lfirst(lc); + + if (!IsA(opexpr, OpExpr)) + return false; + + Assert(list_length(opexpr->args) == 2); + + /* + * We rely on membership in a B-tree or Hash operator family as a + * guarantee that the operator acts as a proper boolean comparison + * and does not yield NULL for valid non-null inputs. + */ + if (!op_is_safe_index_member(opexpr->opno)) + return false; + + outer_exprs = lappend(outer_exprs, linitial(opexpr->args)); + } + } + else + { + /* Punt if other node types */ + return false; + } + + /* + * Since the query hasn't yet been through expression preprocessing, we + * must apply flatten_join_alias_vars to the outer expressions to avoid + * being fooled by join aliases. + * + * We do not need to apply flatten_group_exprs though, since grouping Vars + * cannot appear in jointree quals. + */ + outer_exprs = (List *) + flatten_join_alias_vars(root, root->parse, (Node *) outer_exprs); + + /* Check that every outer expression is non-nullable */ + foreach(lc, outer_exprs) + { + Expr *expr = (Expr *) lfirst(lc); + + /* + * We have already collected relation-level not-null constraints for + * the outer query, so we can consult the global hash table for + * nullability information. + */ + if (!expr_is_nonnullable(root, expr, NOTNULL_SOURCE_HASHTABLE)) + return false; + + /* + * XXX It is possible to further prove non-nullability by examining + * the qual clauses available at or below the jointree node where this + * NOT IN clause is evaluated, but for the moment it doesn't seem + * worth the extra complication. + */ + } + + return true; +} + /* * convert_EXISTS_sublink_to_join: try to convert an EXISTS SubLink to a join * - * The API of this function is identical to convert_ANY_sublink_to_join's, - * except that we also support the case where the caller has found NOT EXISTS, - * so we need an additional input parameter "under_not". + * The API of this function is identical to convert_ANY_sublink_to_join's. */ JoinExpr * convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink, diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c index c80bfc88d82..4cbc167420d 100644 --- a/src/backend/optimizer/prep/prepjointree.c +++ b/src/backend/optimizer/prep/prepjointree.c @@ -848,14 +848,15 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node, if ((saop = convert_VALUES_to_ANY(root, sublink->testexpr, (Query *) sublink->subselect)) != NULL) - + { /* * The VALUES sequence was simplified. Nothing more to do * here. */ return (Node *) saop; + } - if ((j = convert_ANY_sublink_to_join(root, sublink, + if ((j = convert_ANY_sublink_to_join(root, sublink, false, available_rels1)) != NULL) { /* Yes; insert the new join node into the join tree */ @@ -881,7 +882,7 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node, return NULL; } if (available_rels2 != NULL && - (j = convert_ANY_sublink_to_join(root, sublink, + (j = convert_ANY_sublink_to_join(root, sublink, false, available_rels2)) != NULL) { /* Yes; insert the new join node into the join tree */ @@ -966,14 +967,68 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node, } if (is_notclause(node)) { - /* If the immediate argument of NOT is EXISTS, try to convert */ + /* If the immediate argument of NOT is ANY or EXISTS, try to convert */ SubLink *sublink = (SubLink *) get_notclausearg((Expr *) node); JoinExpr *j; Relids child_rels; if (sublink && IsA(sublink, SubLink)) { - if (sublink->subLinkType == EXISTS_SUBLINK) + if (sublink->subLinkType == ANY_SUBLINK) + { + if ((j = convert_ANY_sublink_to_join(root, sublink, true, + available_rels1)) != NULL) + { + /* Yes; insert the new join node into the join tree */ + j->larg = *jtlink1; + *jtlink1 = (Node *) j; + /* Recursively process pulled-up jointree nodes */ + j->rarg = pull_up_sublinks_jointree_recurse(root, + j->rarg, + &child_rels); + + /* + * Now recursively process the pulled-up quals. Because + * we are underneath a NOT, we can't pull up sublinks that + * reference the left-hand stuff, but it's still okay to + * pull up sublinks referencing j->rarg. + */ + j->quals = pull_up_sublinks_qual_recurse(root, + j->quals, + &j->rarg, + child_rels, + NULL, NULL); + /* Return NULL representing constant TRUE */ + return NULL; + } + if (available_rels2 != NULL && + (j = convert_ANY_sublink_to_join(root, sublink, true, + available_rels2)) != NULL) + { + /* Yes; insert the new join node into the join tree */ + j->larg = *jtlink2; + *jtlink2 = (Node *) j; + /* Recursively process pulled-up jointree nodes */ + j->rarg = pull_up_sublinks_jointree_recurse(root, + j->rarg, + &child_rels); + + /* + * Now recursively process the pulled-up quals. Because + * we are underneath a NOT, we can't pull up sublinks that + * reference the left-hand stuff, but it's still okay to + * pull up sublinks referencing j->rarg. + */ + j->quals = pull_up_sublinks_qual_recurse(root, + j->quals, + &j->rarg, + child_rels, + NULL, NULL); + /* Return NULL representing constant TRUE */ + return NULL; + } + } + else if (sublink->subLinkType == EXISTS_SUBLINK) { if ((j = convert_EXISTS_sublink_to_join(root, sublink, true, available_rels1)) != NULL) diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c index 32204776c45..ed34eda96a3 100644 --- a/src/backend/optimizer/util/clauses.c +++ b/src/backend/optimizer/util/clauses.c @@ -21,6 +21,7 @@ #include "access/htup_details.h" #include "catalog/pg_class.h" +#include "catalog/pg_inherits.h" #include "catalog/pg_language.h" #include "catalog/pg_operator.h" #include "catalog/pg_proc.h" @@ -112,6 +113,7 @@ static bool contain_context_dependent_node_walker(Node *node, int *flags); static bool contain_leaked_vars_walker(Node *node, void *context); static Relids find_nonnullable_rels_walker(Node *node, bool top_level); static List *find_nonnullable_vars_walker(Node *node, bool top_level); +static void find_subquery_safe_quals(Node *jtnode, List **safe_quals); static bool is_strict_saop(ScalarArrayOpExpr *expr, bool falseOK); static bool convert_saop_to_hashed_saop_walker(Node *node, void *context); static Node *eval_const_expressions_mutator(Node *node, @@ -1433,6 +1435,10 @@ contain_leaked_vars_walker(Node *node, void *context) context); } +/***************************************************************************** + * Nullability analysis + *****************************************************************************/ + /* * find_nonnullable_rels * Determine which base rels are forced nonnullable by given clause. @@ -1701,7 +1707,7 @@ find_nonnullable_rels_walker(Node *node, bool top_level) * but here we assume that the input is a Boolean expression, and wish to * see if NULL inputs will provably cause a FALSE-or-NULL result. We expect * the expression to have been AND/OR flattened and converted to implicit-AND - * format. + * format (but the results are still good if it wasn't AND/OR flattened). * * Attnos of the identified Vars are returned in a multibitmapset (a List of * Bitmapsets). List indexes correspond to relids (varnos), while the per-rel @@ -2021,6 +2027,230 @@ find_forced_null_var(Node *node) return NULL; } +/* + * query_outputs_are_not_nullable + * Returns TRUE if the output values of the Query are certainly not NULL. + * All output columns must return non-NULL to answer TRUE. + * + * The reason this takes a Query, and not just an individual tlist expression, + * is so that we can make use of the query's WHERE/ON clauses to prove it does + * not return nulls. + * + * In current usage, the passed sub-Query hasn't yet been through any planner + * processing. This means that applying find_nonnullable_vars() to its WHERE + * clauses isn't really ideal: for lack of const-simplification, we might be + * unable to prove not-nullness in some cases where we could have proved it + * afterwards. However, we should not get any false positive results. + * + * Like the other forms of nullability analysis above, we can err on the + * side of conservatism: if we're not sure, it's okay to return FALSE. + */ +bool +query_outputs_are_not_nullable(Query *query) +{ + PlannerInfo subroot; + List *safe_quals = NIL; + List *nonnullable_vars = NIL; + bool computed_nonnullable_vars = false; + ListCell *tl; + + /* + * If the query contains set operations, punt. The set ops themselves + * couldn't introduce nulls that weren't in their inputs, but the tlist + * present in the top-level query is just dummy and won't give us useful + * info. We could get an answer by recursing to examine each leaf query, + * but for the moment it doesn't seem worth the extra complication. + */ + if (query->setOperations) + return false; + + /* + * If the query contains grouping sets, punt. Grouping sets can introduce + * NULL values, and we currently lack the PlannerInfo needed to flatten + * grouping Vars in the query's outputs. + */ + if (query->groupingSets) + return false; + + /* + * We need a PlannerInfo to pass to expr_is_nonnullable. Fortunately, we + * can cons up an entirely dummy one, because only the "parse" link in the + * struct is used by expr_is_nonnullable. + */ + MemSet(&subroot, 0, sizeof(subroot)); + subroot.parse = query; + + /* + * Examine each targetlist entry to prove that it can't produce NULL. + */ + foreach(tl, query->targetList) + { + TargetEntry *tle = (TargetEntry *) lfirst(tl); + Expr *expr = tle->expr; + + /* Resjunk columns can be ignored: they don't produce output values */ + if (tle->resjunk) + continue; + + /* + * Look through binary relabelings, since we know those don't + * introduce nulls. + */ + while (expr && IsA(expr, RelabelType)) + expr = ((RelabelType *) expr)->arg; + + if (expr == NULL) /* paranoia */ + return false; + + /* + * Since the subquery hasn't yet been through expression + * preprocessing, we must apply flatten_join_alias_vars to the given + * expression, and to the quals found by find_subquery_safe_quals, to + * avoid being fooled by join aliases. + * + * We won't be dealing with arbitrary expressions, so it's safe to use + * NULL as the root, so long as adjust_standard_join_alias_expression + * can handle everything the parser would make as a join alias + * expression. + */ + expr = (Expr *) flatten_join_alias_vars(NULL, query, (Node *) expr); + + /* + * Similarly, we must apply flatten_group_exprs to the given + * expression to avoid being fooled by grouping Vars. We do not need + * to apply flatten_group_exprs to the quals found by + * find_subquery_safe_quals though, as grouping Vars cannot appear in + * jointree quals. + * + * We have verified that the query does not contain grouping sets, so + * it's safe to use NULL as the root here. + */ + expr = (Expr *) flatten_group_exprs(NULL, query, (Node *) expr); + + /* + * Check to see if the expr cannot be NULL. Since we're on a raw + * parse tree, we need to look up the not-null constraints from the + * system catalogs. + */ + if (expr_is_nonnullable(&subroot, expr, NOTNULL_SOURCE_SYSCACHE)) + continue; + + if (IsA(expr, Var)) + { + Var *var = (Var *) expr; + + /* + * For a plain Var, even if that didn't work, we can conclude that + * the Var is not nullable if find_nonnullable_vars can find a + * "var IS NOT NULL" or similarly strict condition among the quals + * on non-outerjoined-rels. Compute the list of Vars having such + * quals if we didn't already. + */ + if (!computed_nonnullable_vars) + { + find_subquery_safe_quals((Node *) query->jointree, &safe_quals); + safe_quals = (List *) + flatten_join_alias_vars(NULL, query, (Node *) safe_quals); + nonnullable_vars = find_nonnullable_vars((Node *) safe_quals); + computed_nonnullable_vars = true; + } + + if (!mbms_is_member(var->varno, + var->varattno - FirstLowInvalidHeapAttributeNumber, + nonnullable_vars)) + return false; /* we failed to prove the Var non-null */ + } + else + { + /* Punt otherwise */ + return false; + } + } + + return true; +} + +/* + * find_subquery_safe_quals + * Traverse jointree to locate quals on non-outerjoined-rels. + * + * We locate all WHERE and JOIN/ON quals that constrain the rels that are not + * below the nullable side of any outer join, and add them to the *safe_quals + * list (forming a list with implicit-AND semantics). These quals can be used + * to prove non-nullability of the subquery's outputs. + * + * Top-level caller must initialize *safe_quals to NIL. + */ +static void +find_subquery_safe_quals(Node *jtnode, List **safe_quals) +{ + if (jtnode == NULL) + return; + if (IsA(jtnode, RangeTblRef)) + { + /* Leaf node: nothing to do */ + return; + } + else if (IsA(jtnode, FromExpr)) + { + FromExpr *f = (FromExpr *) jtnode; + ListCell *lc; + + /* All elements of the FROM list are allowable */ + foreach(lc, f->fromlist) + find_subquery_safe_quals((Node *) lfirst(lc), safe_quals); + /* ... and its WHERE quals are too */ + if (f->quals) + *safe_quals = lappend(*safe_quals, f->quals); + } + else if (IsA(jtnode, JoinExpr)) + { + JoinExpr *j = (JoinExpr *) jtnode; + + switch (j->jointype) + { + case JOIN_INNER: + /* visit both children */ + find_subquery_safe_quals(j->larg, safe_quals); + find_subquery_safe_quals(j->rarg, safe_quals); + /* and grab the ON quals too */ + if (j->quals) + *safe_quals = lappend(*safe_quals, j->quals); + break; + + case JOIN_LEFT: + case JOIN_SEMI: + case JOIN_ANTI: + + /* + * Only the left input is possibly non-nullable; furthermore, + * the quals of this join don't constrain the left input. + * Note: we probably can't see SEMI or ANTI joins at this + * point, but if we do, we can treat them like LEFT joins. + */ + find_subquery_safe_quals(j->larg, safe_quals); + break; + + case JOIN_RIGHT: + /* Reverse of the above case */ + find_subquery_safe_quals(j->rarg, safe_quals); + break; + + case JOIN_FULL: + /* Neither side is non-nullable, so stop descending */ + break; + + default: + elog(ERROR, "unrecognized join type: %d", + (int) j->jointype); + break; + } + } + else + elog(ERROR, "unrecognized node type: %d", + (int) nodeTag(jtnode)); +} + /* * Can we treat a ScalarArrayOpExpr as strict? * @@ -3338,7 +3568,8 @@ eval_const_expressions_mutator(Node *node, newargs = lappend(newargs, e); break; } - if (expr_is_nonnullable(context->root, (Expr *) e, false)) + if (expr_is_nonnullable(context->root, (Expr *) e, + NOTNULL_SOURCE_HASHTABLE)) { if (newargs == NIL) return e; /* first expr */ @@ -3532,7 +3763,7 @@ eval_const_expressions_mutator(Node *node, */ if (relem && expr_is_nonnullable(context->root, (Expr *) relem, - false)) + NOTNULL_SOURCE_HASHTABLE)) { if (ntest->nulltesttype == IS_NULL) return makeBoolConst(false, false); @@ -3584,7 +3815,8 @@ eval_const_expressions_mutator(Node *node, return makeBoolConst(result, false); } if (!ntest->argisrow && arg && - expr_is_nonnullable(context->root, (Expr *) arg, false)) + expr_is_nonnullable(context->root, (Expr *) arg, + NOTNULL_SOURCE_HASHTABLE)) { bool result; @@ -4273,14 +4505,11 @@ simplify_aggref(Aggref *aggref, eval_const_expressions_context *context) * If the Var is defined NOT NULL and meanwhile is not nulled by any outer * joins or grouping sets, then we can know that it cannot be NULL. * - * use_rel_info indicates whether the corresponding RelOptInfo is available for - * use. + * "source" specifies where we should look for NOT NULL proofs. */ bool -var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info) +var_is_nonnullable(PlannerInfo *root, Var *var, NotNullSource source) { - Bitmapset *notnullattnums = NULL; - Assert(IsA(var, Var)); /* skip upper-level Vars */ @@ -4295,35 +4524,85 @@ var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info) if (var->varattno < 0) return true; - /* - * Check if the Var is defined as NOT NULL. We retrieve the column NOT - * NULL constraint information from the corresponding RelOptInfo if it is - * available; otherwise, we search the hash table for this information. - */ - if (use_rel_info) - { - RelOptInfo *rel = find_base_rel(root, var->varno); + /* we don't trust whole-row Vars */ + if (var->varattno == 0) + return false; - notnullattnums = rel->notnullattnums; - } - else + /* Check if the Var is defined as NOT NULL. */ + switch (source) { - RangeTblEntry *rte = planner_rt_fetch(var->varno, root); + case NOTNULL_SOURCE_RELOPT: + { + /* + * We retrieve the column NOT NULL constraint information from + * the corresponding RelOptInfo. + */ + RelOptInfo *rel; + Bitmapset *notnullattnums; - /* - * We must skip inheritance parent tables, as some child tables may - * have a NOT NULL constraint for a column while others may not. This - * cannot happen with partitioned tables, though. - */ - if (rte->inh && rte->relkind != RELKIND_PARTITIONED_TABLE) - return false; + rel = find_base_rel(root, var->varno); + notnullattnums = rel->notnullattnums; - notnullattnums = find_relation_notnullatts(root, rte->relid); - } + return bms_is_member(var->varattno, notnullattnums); + } + case NOTNULL_SOURCE_HASHTABLE: + { + /* + * We retrieve the column NOT NULL constraint information from + * the hash table. + */ + RangeTblEntry *rte; + Bitmapset *notnullattnums; - if (var->varattno > 0 && - bms_is_member(var->varattno, notnullattnums)) - return true; + rte = planner_rt_fetch(var->varno, root); + + /* + * We must skip inheritance parent tables, as some child + * tables may have a NOT NULL constraint for a column while + * others may not. This cannot happen with partitioned + * tables, though. + */ + if (rte->inh && rte->relkind != RELKIND_PARTITIONED_TABLE) + return false; + + notnullattnums = find_relation_notnullatts(root, rte->relid); + + return bms_is_member(var->varattno, notnullattnums); + } + case NOTNULL_SOURCE_SYSCACHE: + { + /* + * We look up the "attnotnull" field in the attribute + * relation. + */ + RangeTblEntry *rte; + + rte = planner_rt_fetch(var->varno, root); + + /* We can only reason about ordinary relations */ + if (rte->rtekind != RTE_RELATION) + return false; + + /* + * We must skip inheritance parent tables, as some child + * tables may have a NOT NULL constraint for a column while + * others may not. This cannot happen with partitioned + * tables, though. + * + * Note that we need to check if the relation actually has any + * children, as we might not have done that yet. + */ + if (rte->inh && has_subclass(rte->relid) && + rte->relkind != RELKIND_PARTITIONED_TABLE) + return false; + + return get_attnotnull(rte->relid, var->varattno); + } + default: + elog(ERROR, "unrecognized NotNullSource: %d", + (int) source); + break; + } return false; } @@ -4333,16 +4612,22 @@ var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info) * * Returns true iff the given 'expr' cannot produce SQL NULLs. * - * If 'use_rel_info' is true, nullability of Vars is checked via the - * corresponding RelOptInfo for the given Var. Some callers require - * nullability information before RelOptInfos are generated. These should - * pass 'use_rel_info' as false. + * source: specifies where we should look for NOT NULL proofs for Vars. + * - NOTNULL_SOURCE_RELOPT: Used when RelOptInfos have been generated. We + * retrieve nullability information directly from the RelOptInfo corresponding + * to the Var. + * - NOTNULL_SOURCE_HASHTABLE: Used when RelOptInfos are not yet available, + * but we have already collected relation-level not-null constraints into the + * global hash table. + * - NOTNULL_SOURCE_SYSCACHE: Used for raw parse trees where neither + * RelOptInfos nor the hash table are available. In this case, we have to + * look up the 'attnotnull' field directly in the system catalogs. * * For now, we support only a limited set of expression types. Support for * additional node types can be added in the future. */ bool -expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) +expr_is_nonnullable(PlannerInfo *root, Expr *expr, NotNullSource source) { /* since this function recurses, it could be driven to stack overflow */ check_stack_depth(); @@ -4352,7 +4637,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) case T_Var: { if (root) - return var_is_nonnullable(root, (Var *) expr, use_rel_info); + return var_is_nonnullable(root, (Var *) expr, source); } break; case T_Const: @@ -4369,7 +4654,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) foreach_ptr(Expr, arg, coalesceexpr->args) { - if (expr_is_nonnullable(root, arg, use_rel_info)) + if (expr_is_nonnullable(root, arg, source)) return true; } } @@ -4384,7 +4669,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) foreach_ptr(Expr, arg, minmaxexpr->args) { - if (expr_is_nonnullable(root, arg, use_rel_info)) + if (expr_is_nonnullable(root, arg, source)) return true; } } @@ -4400,13 +4685,13 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) /* The default result must be present and non-nullable */ if (caseexpr->defresult == NULL || - !expr_is_nonnullable(root, caseexpr->defresult, use_rel_info)) + !expr_is_nonnullable(root, caseexpr->defresult, source)) return false; /* All branch results must be non-nullable */ foreach_ptr(CaseWhen, casewhen, caseexpr->args) { - if (!expr_is_nonnullable(root, casewhen->result, use_rel_info)) + if (!expr_is_nonnullable(root, casewhen->result, source)) return false; } @@ -4454,7 +4739,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) * non-nullable. */ return expr_is_nonnullable(root, ((RelabelType *) expr)->arg, - use_rel_info); + source); } default: break; diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c index 37d34685b93..6c8fb7b7275 100644 --- a/src/backend/utils/adt/int8.c +++ b/src/backend/utils/adt/int8.c @@ -834,7 +834,7 @@ int8inc_support(PG_FUNCTION_ARGS) PG_RETURN_POINTER(NULL); /* If the arg isn't NULLable, do the conversion */ - if (expr_is_nonnullable(req->root, arg, false)) + if (expr_is_nonnullable(req->root, arg, NOTNULL_SOURCE_HASHTABLE)) { Aggref *newagg; diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c index b924a2d900b..deb9aa04b72 100644 --- a/src/backend/utils/cache/lsyscache.c +++ b/src/backend/utils/cache/lsyscache.c @@ -858,6 +858,47 @@ comparison_ops_are_compatible(Oid opno1, Oid opno2) return result; } +/* + * op_is_safe_index_member + * Check if the operator is a member of a B-tree or Hash operator family. + * + * We use this check as a proxy for "null-safety": if an operator is trusted by + * the btree or hash opfamily, it implies that the operator adheres to standard + * boolean behavior, and would not return NULL when given valid non-null + * inputs, as doing so would break index integrity. + */ +bool +op_is_safe_index_member(Oid opno) +{ + bool result = false; + CatCList *catlist; + int i; + + /* + * Search pg_amop to see if the target operator is registered for any + * btree or hash opfamily. + */ + catlist = SearchSysCacheList1(AMOPOPID, ObjectIdGetDatum(opno)); + + for (i = 0; i < catlist->n_members; i++) + { + HeapTuple tuple = &catlist->members[i]->tuple; + Form_pg_amop aform = (Form_pg_amop) GETSTRUCT(tuple); + + /* Check if the AM is B-tree or Hash */ + if (aform->amopmethod == BTREE_AM_OID || + aform->amopmethod == HASH_AM_OID) + { + result = true; + break; + } + } + + ReleaseSysCacheList(catlist); + + return result; +} + /* ---------- AMPROC CACHES ---------- */ @@ -1071,6 +1112,33 @@ get_attoptions(Oid relid, int16 attnum) return result; } +/* + * get_attnotnull + * + * Given the relation id and the attribute number, + * return the "attnotnull" field from the attribute relation. + */ +bool +get_attnotnull(Oid relid, AttrNumber attnum) +{ + HeapTuple tp; + bool result = false; + + tp = SearchSysCache2(ATTNUM, + ObjectIdGetDatum(relid), + Int16GetDatum(attnum)); + + if (HeapTupleIsValid(tp)) + { + Form_pg_attribute att_tup = (Form_pg_attribute) GETSTRUCT(tp); + + result = att_tup->attnotnull; + ReleaseSysCache(tp); + } + + return result; +} + /* ---------- PG_CAST CACHE ---------- */ /* diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h index a64034e8a6d..853a28c0007 100644 --- a/src/include/optimizer/clauses.h +++ b/src/include/optimizer/clauses.h @@ -42,6 +42,7 @@ extern Relids find_nonnullable_rels(Node *clause); extern List *find_nonnullable_vars(Node *clause); extern List *find_forced_null_vars(Node *node); extern Var *find_forced_null_var(Node *node); +extern bool query_outputs_are_not_nullable(Query *query); extern bool is_pseudo_constant_clause(Node *clause); extern bool is_pseudo_constant_clause_relids(Node *clause, Relids relids); diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h index 3d27a019609..685485148ac 100644 --- a/src/include/optimizer/optimizer.h +++ b/src/include/optimizer/optimizer.h @@ -130,6 +130,14 @@ extern Expr *canonicalize_qual(Expr *qual, bool is_check); /* in util/clauses.c: */ +/* Enum to specify where var_is_nonnullable should look for NOT NULL proofs */ +typedef enum +{ + NOTNULL_SOURCE_RELOPT, /* Use RelOptInfo */ + NOTNULL_SOURCE_HASHTABLE, /* Use Global Hash Table */ + NOTNULL_SOURCE_SYSCACHE, /* Use System Catalog */ +} NotNullSource; + extern bool contain_mutable_functions(Node *clause); extern bool contain_mutable_functions_after_planning(Expr *expr); extern bool contain_volatile_functions(Node *clause); @@ -145,10 +153,11 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node); extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod, Oid result_collation); -extern bool var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info); +extern bool var_is_nonnullable(PlannerInfo *root, Var *var, + NotNullSource source); extern bool expr_is_nonnullable(PlannerInfo *root, Expr *expr, - bool use_rel_info); + NotNullSource source); extern List *expand_function_arguments(List *args, bool include_out_arguments, Oid result_type, diff --git a/src/include/optimizer/subselect.h b/src/include/optimizer/subselect.h index 8a5503eb973..4ecccf46bd3 100644 --- a/src/include/optimizer/subselect.h +++ b/src/include/optimizer/subselect.h @@ -22,6 +22,7 @@ extern ScalarArrayOpExpr *convert_VALUES_to_ANY(PlannerInfo *root, Query *values); extern JoinExpr *convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, + bool under_not, Relids available_rels); extern JoinExpr *convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink, diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h index 5655aca4c14..b9ad84ecd41 100644 --- a/src/include/utils/lsyscache.h +++ b/src/include/utils/lsyscache.h @@ -89,6 +89,7 @@ extern bool get_op_hash_functions(Oid opno, extern List *get_op_index_interpretation(Oid opno); extern bool equality_ops_are_compatible(Oid opno1, Oid opno2); extern bool comparison_ops_are_compatible(Oid opno1, Oid opno2); +extern bool op_is_safe_index_member(Oid opno); extern Oid get_opfamily_proc(Oid opfamily, Oid lefttype, Oid righttype, int16 procnum); extern char *get_attname(Oid relid, AttrNumber attnum, bool missing_ok); @@ -98,6 +99,7 @@ extern Oid get_atttype(Oid relid, AttrNumber attnum); extern void get_atttypetypmodcoll(Oid relid, AttrNumber attnum, Oid *typid, int32 *typmod, Oid *collid); extern Datum get_attoptions(Oid relid, int16 attnum); +extern bool get_attnotnull(Oid relid, AttrNumber attnum); extern Oid get_cast_oid(Oid sourcetypeid, Oid targettypeid, bool missing_ok); extern char *get_collation_name(Oid colloid); extern bool get_collation_isdeterministic(Oid colloid); diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out index 2135d82884d..bd54cdf6ce1 100644 --- a/src/test/regress/expected/subselect.out +++ b/src/test/regress/expected/subselect.out @@ -3323,3 +3323,403 @@ SELECT ten FROM onek t WHERE 1.0::integer IN ((VALUES (1), (3))); Seq Scan on onek t (1 row) +-- +-- Check NOT IN performs an ANTI JOIN when both the outer query's expressions +-- and the sub-select's output columns are provably non-nullable, and the +-- operator itself cannot return NULL for non-null inputs. +-- +BEGIN; +CREATE TEMP TABLE not_null_tab (id int NOT NULL, val int NOT NULL); +CREATE TEMP TABLE null_tab (id int, val int); +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +----------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = not_null_tab_1.id) + -> Seq Scan on not_null_tab + -> Hash + -> Seq Scan on not_null_tab not_null_tab_1 +(5 rows) + +-- No ANTI JOIN: outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> Seq Scan on not_null_tab +(4 rows) + +-- No ANTI JOIN: inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> Seq Scan on null_tab +(4 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab WHERE id IS NOT NULL); + QUERY PLAN +---------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = null_tab.id) + -> Seq Scan on not_null_tab + -> Hash + -> Seq Scan on null_tab + Filter: (id IS NOT NULL) +(6 rows) + +-- No ANTI JOIN: outer side is nullable (we don't check outer query quals for now) +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id IS NOT NULL + AND id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +--------------------------------------------------------------------------------- + Seq Scan on null_tab + Filter: ((id IS NOT NULL) AND (NOT (ANY (id = (hashed SubPlan any_1).col1)))) + SubPlan any_1 + -> Seq Scan on not_null_tab +(4 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is defined NOT NULL +-- and is not nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + QUERY PLAN +----------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = t1.id) + -> Seq Scan on not_null_tab + -> Hash + -> Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(12 rows) + +-- No ANTI JOIN: inner side is defined NOT NULL but is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t2.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(11 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + LEFT JOIN null_tab t2 ON t1.id = t2.id + WHERE t1.id IS NOT NULL +); + QUERY PLAN +---------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = t1.id) + -> Seq Scan on not_null_tab + -> Hash + -> Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on null_tab t1 + Filter: (id IS NOT NULL) + -> Sort + Sort Key: t2.id + -> Seq Scan on null_tab t2 +(13 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + INNER JOIN null_tab t2 ON t1.id = t2.id + LEFT JOIN null_tab t3 ON TRUE +); + QUERY PLAN +------------------------------------------------- + Merge Anti Join + Merge Cond: (not_null_tab.id = t1.id) + -> Sort + Sort Key: not_null_tab.id + -> Seq Scan on not_null_tab + -> Nested Loop Left Join + -> Merge Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on null_tab t2 + -> Materialize + -> Seq Scan on null_tab t3 +(16 rows) + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t1.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +---------------------------------------------------- + Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Hash Anti Join + Hash Cond: (t1.id = not_null_tab.id) + -> Seq Scan on not_null_tab t1 + -> Hash + -> Seq Scan on not_null_tab + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(12 rows) + +-- No ANTI JOIN: outer side is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t2.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +------------------------------------------------------------- + Merge Left Join + Merge Cond: (t1.id = t2.id) + Filter: (NOT (ANY (t2.id = (hashed SubPlan any_1).col1))) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 + SubPlan any_1 + -> Seq Scan on not_null_tab +(11 rows) + +-- No ANTI JOIN: sublink is in an outer join's ON qual and references the +-- non-nullable side +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t1.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +------------------------------------------------------------------ + Nested Loop Left Join + Join Filter: (NOT (ANY (t1.id = (hashed SubPlan any_1).col1))) + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Seq Scan on not_null_tab t2 + SubPlan any_1 + -> Seq Scan on not_null_tab +(7 rows) + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t2.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +---------------------------------------------------- + Nested Loop Left Join + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Hash Anti Join + Hash Cond: (t2.id = not_null_tab.id) + -> Seq Scan on not_null_tab t2 + -> Hash + -> Seq Scan on not_null_tab +(8 rows) + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT id, val FROM not_null_tab); + QUERY PLAN +--------------------------------------------------------------------------------------------------- + Merge Anti Join + Merge Cond: ((not_null_tab.id = not_null_tab_1.id) AND (not_null_tab.val = not_null_tab_1.val)) + -> Sort + Sort Key: not_null_tab.id, not_null_tab.val + -> Seq Scan on not_null_tab + -> Sort + Sort Key: not_null_tab_1.id, not_null_tab_1.val + -> Seq Scan on not_null_tab not_null_tab_1 +(8 rows) + +-- No ANTI JOIN: one column of the outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1, null_tab t2 +WHERE (t1.id, t2.id) NOT IN (SELECT id, val FROM not_null_tab); + QUERY PLAN +-------------------------------------------------------------------------------------------------------------- + Nested Loop + Join Filter: (NOT (ANY ((t1.id = (hashed SubPlan any_1).col1) AND (t2.id = (hashed SubPlan any_1).col2)))) + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Seq Scan on null_tab t2 + SubPlan any_1 + -> Seq Scan on not_null_tab +(7 rows) + +-- No ANTI JOIN: one column of the inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT t1.id, t2.id FROM not_null_tab t1, null_tab t2); + QUERY PLAN +-------------------------------------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY ((id = (SubPlan any_1).col1) AND (val = (SubPlan any_1).col2)))) + SubPlan any_1 + -> Materialize + -> Nested Loop + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Seq Scan on null_tab t2 +(8 rows) + +-- ANTI JOIN: COALESCE(nullable, constant) is non-nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE COALESCE(id, -1) NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +----------------------------------------------------------------------- + Hash Anti Join + Hash Cond: (COALESCE(null_tab.id, '-1'::integer) = not_null_tab.id) + -> Seq Scan on null_tab + -> Hash + -> Seq Scan on not_null_tab +(5 rows) + +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT COALESCE(id, -1) FROM null_tab); + QUERY PLAN +----------------------------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = COALESCE(null_tab.id, '-1'::integer)) + -> Seq Scan on not_null_tab + -> Hash + -> Seq Scan on null_tab +(5 rows) + +-- ANTI JOIN: GROUP BY (without Grouping Sets) preserves the non-nullability of +-- the column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab GROUP BY id); + QUERY PLAN +----------------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = not_null_tab_1.id) + -> Seq Scan on not_null_tab + -> Hash + -> HashAggregate + Group Key: not_null_tab_1.id + -> Seq Scan on not_null_tab not_null_tab_1 +(7 rows) + +-- No ANTI JOIN: GROUP BY on a nullable column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab GROUP BY id); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> HashAggregate + Group Key: null_tab.id + -> Seq Scan on null_tab +(6 rows) + +-- No ANTI JOIN: Grouping Sets can introduce NULLs +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT id + FROM not_null_tab + GROUP BY GROUPING SETS ((id), (val)) +); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> HashAggregate + Hash Key: not_null_tab_1.id + Hash Key: not_null_tab_1.val + -> Seq Scan on not_null_tab not_null_tab_1 +(7 rows) + +-- create a custom "unsafe" equality operator +CREATE FUNCTION int4eq_unsafe(int4, int4) + RETURNS bool + AS 'int4eq' + LANGUAGE internal IMMUTABLE; +CREATE OPERATOR ?= ( + PROCEDURE = int4eq_unsafe, + LEFTARG = int4, + RIGHTARG = int4 +); +-- No ANTI JOIN: the operator is not safe +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE NOT id ?= ANY (SELECT id FROM not_null_tab); + QUERY PLAN +------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id ?= (SubPlan any_1).col1))) + SubPlan any_1 + -> Materialize + -> Seq Scan on not_null_tab not_null_tab_1 +(5 rows) + +ROLLBACK; diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql index cadc3293687..4eaa353760a 100644 --- a/src/test/regress/sql/subselect.sql +++ b/src/test/regress/sql/subselect.sql @@ -1448,3 +1448,172 @@ SELECT * FROM onek t1, lateral (SELECT * FROM onek t2 WHERE t2.ten IN (values (t -- VtA causes the whole expression to be evaluated as a constant EXPLAIN (COSTS OFF) SELECT ten FROM onek t WHERE 1.0::integer IN ((VALUES (1), (3))); + +-- +-- Check NOT IN performs an ANTI JOIN when both the outer query's expressions +-- and the sub-select's output columns are provably non-nullable, and the +-- operator itself cannot return NULL for non-null inputs. +-- + +BEGIN; + +CREATE TEMP TABLE not_null_tab (id int NOT NULL, val int NOT NULL); +CREATE TEMP TABLE null_tab (id int, val int); + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab WHERE id IS NOT NULL); + +-- No ANTI JOIN: outer side is nullable (we don't check outer query quals for now) +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id IS NOT NULL + AND id NOT IN (SELECT id FROM not_null_tab); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is defined NOT NULL +-- and is not nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + +-- No ANTI JOIN: inner side is defined NOT NULL but is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t2.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + LEFT JOIN null_tab t2 ON t1.id = t2.id + WHERE t1.id IS NOT NULL +); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + INNER JOIN null_tab t2 ON t1.id = t2.id + LEFT JOIN null_tab t3 ON TRUE +); + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t1.id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: outer side is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t2.id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: sublink is in an outer join's ON qual and references the +-- non-nullable side +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t1.id NOT IN (SELECT id FROM not_null_tab); + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t2.id NOT IN (SELECT id FROM not_null_tab); + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT id, val FROM not_null_tab); + +-- No ANTI JOIN: one column of the outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1, null_tab t2 +WHERE (t1.id, t2.id) NOT IN (SELECT id, val FROM not_null_tab); + +-- No ANTI JOIN: one column of the inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT t1.id, t2.id FROM not_null_tab t1, null_tab t2); + +-- ANTI JOIN: COALESCE(nullable, constant) is non-nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE COALESCE(id, -1) NOT IN (SELECT id FROM not_null_tab); + +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT COALESCE(id, -1) FROM null_tab); + +-- ANTI JOIN: GROUP BY (without Grouping Sets) preserves the non-nullability of +-- the column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab GROUP BY id); + +-- No ANTI JOIN: GROUP BY on a nullable column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab GROUP BY id); + +-- No ANTI JOIN: Grouping Sets can introduce NULLs +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT id + FROM not_null_tab + GROUP BY GROUPING SETS ((id), (val)) +); + +-- create a custom "unsafe" equality operator +CREATE FUNCTION int4eq_unsafe(int4, int4) + RETURNS bool + AS 'int4eq' + LANGUAGE internal IMMUTABLE; + +CREATE OPERATOR ?= ( + PROCEDURE = int4eq_unsafe, + LEFTARG = int4, + RIGHTARG = int4 +); + +-- No ANTI JOIN: the operator is not safe +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE NOT id ?= ANY (SELECT id FROM not_null_tab); + +ROLLBACK; diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index 9f5ee8fd482..8fecc01fb16 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -1784,6 +1784,7 @@ Node NodeTag NonEmptyRange NoneCompressorState +NotNullSource Notification NotificationList NotifyStmt -- 2.39.5 (Apple Git-154) ^ permalink raw reply [nested|flat] 13+ messages in thread
* Re: Convert NOT IN sublinks to anti-joins when safe @ 2026-02-14 07:37 Richard Guo <[email protected]> parent: Richard Guo <[email protected]> 0 siblings, 1 reply; 13+ messages in thread From: Richard Guo @ 2026-02-14 07:37 UTC (permalink / raw) To: Pg Hackers <[email protected]> On Thu, Feb 5, 2026 at 3:51 PM Richard Guo <[email protected]> wrote: > Attached is the updated patch, which adds the check requiring the > operator to be a member of a btree or hash opfamily. Attached is another updated patch rebased on current master, with the addition of support for RowCompareExpr to handle multi-column ordering operations; otherwise unchanged. - Richard Attachments: [application/octet-stream] v3-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patch (61.4K, 2-v3-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patch) download | inline diff: From 3899b811d3f904285fb5095cb34b3be6ab26b632 Mon Sep 17 00:00:00 2001 From: Richard Guo <[email protected]> Date: Fri, 30 Jan 2026 12:29:42 +0900 Subject: [PATCH v3] Convert NOT IN sublinks to anti-joins when safe The planner has historically been unable to convert "x NOT IN (SELECT y ...)" sublinks into anti-joins. This is because standard SQL semantics for NOT IN require that if the comparison "x = y" returns NULL, the "NOT IN" expression evaluates to NULL (effectively false), causing the row to be discarded. In contrast, an anti-join preserves the row if no match is found. Due to this semantic mismatch regarding NULL handling, the conversion was previously considered unsafe. However, if we can prove that neither side of the comparison can yield NULL values, and further that the operator itself cannot return NULL for non-null inputs, the behavior of NOT IN and anti-join becomes identical. Enabling this conversion allows the planner to treat the sublink as a first-class relation rather than an opaque SubPlan filter. This unlocks global join ordering optimization and permits the selection of the most efficient join algorithm based on cost, often yielding significant performance improvements for large datasets. This patch verifies that neither side of the comparison can be NULL and that the operator is safe regarding NULL results before performing the conversion. To verify operator safety, we require that the operator be a member of a B-tree or Hash operator family. This serves as a proxy for standard boolean behavior, ensuring the operator does not return NULL on valid non-null inputs, as doing so would break index integrity. For operand non-nullability, this patch makes use of several existing mechanisms. It leverages the outer-join-aware-Var infrastructure to verify that a Var does not come from the nullable side of an outer join, and consults the NOT-NULL-attnums hash table to efficiently verify schema-level NOT NULL constraints. Additionally, it employs find_nonnullable_vars to identify Vars forced non-nullable by qual clauses, and expr_is_nonnullable to deduce non-nullability for other expression types. --- src/backend/optimizer/plan/initsplan.c | 4 +- src/backend/optimizer/plan/subselect.c | 166 +++++++- src/backend/optimizer/prep/prepjointree.c | 72 +++- src/backend/optimizer/util/clauses.c | 384 ++++++++++++++++--- src/backend/utils/adt/int8.c | 2 +- src/backend/utils/cache/lsyscache.c | 68 ++++ src/include/optimizer/clauses.h | 1 + src/include/optimizer/optimizer.h | 13 +- src/include/optimizer/subselect.h | 1 + src/include/utils/lsyscache.h | 2 + src/test/regress/expected/subselect.out | 439 ++++++++++++++++++++++ src/test/regress/sql/subselect.sql | 185 +++++++++ src/tools/pgindent/typedefs.list | 1 + 13 files changed, 1272 insertions(+), 66 deletions(-) diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c index 97ea95a4eb8..9aaf1c4e5ca 100644 --- a/src/backend/optimizer/plan/initsplan.c +++ b/src/backend/optimizer/plan/initsplan.c @@ -3447,7 +3447,7 @@ restriction_is_always_true(PlannerInfo *root, if (nulltest->argisrow) return false; - return expr_is_nonnullable(root, nulltest->arg, true); + return expr_is_nonnullable(root, nulltest->arg, NOTNULL_SOURCE_RELOPT); } /* If it's an OR, check its sub-clauses */ @@ -3512,7 +3512,7 @@ restriction_is_always_false(PlannerInfo *root, if (nulltest->argisrow) return false; - return expr_is_nonnullable(root, nulltest->arg, true); + return expr_is_nonnullable(root, nulltest->arg, NOTNULL_SOURCE_RELOPT); } /* If it's an OR, check its sub-clauses */ diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c index e9dc9d31f05..21d9a14231c 100644 --- a/src/backend/optimizer/plan/subselect.c +++ b/src/backend/optimizer/plan/subselect.c @@ -91,6 +91,7 @@ static bool contain_outer_selfref(Node *node); static bool contain_outer_selfref_walker(Node *node, Index *depth); static void inline_cte(PlannerInfo *root, CommonTableExpr *cte); static bool inline_cte_walker(Node *node, inline_cte_walker_context *context); +static bool sublink_testexpr_is_not_nullable(PlannerInfo *root, SubLink *sublink); static bool simplify_EXISTS_query(PlannerInfo *root, Query *query); static Query *convert_EXISTS_to_ANY(PlannerInfo *root, Query *subselect, Node **testexpr, List **paramIds); @@ -1306,11 +1307,14 @@ convert_VALUES_to_ANY(PlannerInfo *root, Node *testexpr, Query *values) * If so, form a JoinExpr and return it. Return NULL if the SubLink cannot * be converted to a join. * - * The only non-obvious input parameter is available_rels: this is the set - * of query rels that can safely be referenced in the sublink expression. - * (We must restrict this to avoid changing the semantics when a sublink - * is present in an outer join's ON qual.) The conversion must fail if - * the converted qual would reference any but these parent-query relids. + * If under_not is true, the caller actually found NOT (ANY SubLink), so + * that what we must try to build is an ANTI not SEMI join. + * + * available_rels is the set of query rels that can safely be referenced + * in the sublink expression. (We must restrict this to avoid changing + * the semantics when a sublink is present in an outer join's ON qual.) + * The conversion must fail if the converted qual would reference any but + * these parent-query relids. * * On success, the returned JoinExpr has larg = NULL and rarg = the jointree * item representing the pulled-up subquery. The caller must set larg to @@ -1333,7 +1337,7 @@ convert_VALUES_to_ANY(PlannerInfo *root, Node *testexpr, Query *values) */ JoinExpr * convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, - Relids available_rels) + bool under_not, Relids available_rels) { JoinExpr *result; Query *parse = root->parse; @@ -1351,6 +1355,19 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, Assert(sublink->subLinkType == ANY_SUBLINK); + /* + * Per SQL spec, NOT IN is not ordinarily equivalent to an anti-join, so + * that by default we have to fail when under_not. However, if we can + * prove that neither the outer query's expressions nor the sub-select's + * output columns can be NULL, and further that the operator itself cannot + * return NULL for non-null inputs, then the logic is identical and it's + * safe to convert NOT IN to an anti-join. + */ + if (under_not && + (!sublink_testexpr_is_not_nullable(root, sublink) || + !query_outputs_are_not_nullable(subselect))) + return NULL; + /* * If the sub-select contains any Vars of the parent query, we treat it as * LATERAL. (Vars from higher levels don't matter here.) @@ -1428,7 +1445,7 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, * And finally, build the JoinExpr node. */ result = makeNode(JoinExpr); - result->jointype = JOIN_SEMI; + result->jointype = under_not ? JOIN_ANTI : JOIN_SEMI; result->isNatural = false; result->larg = NULL; /* caller must fill this in */ result->rarg = (Node *) rtr; @@ -1441,12 +1458,141 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, return result; } +/* + * sublink_testexpr_is_not_nullable: verify that testexpr of an ANY_SUBLINK + * guarantees a non-null result, assuming the inner side is also non-null. + * + * To ensure the expression never returns NULL, we require both that the outer + * expressions are provably non-nullable and that the operator itself is safe. + * We validate operator safety by checking for membership in a standard index + * operator family (B-tree or Hash); this acts as a proxy for standard boolean + * behavior, ensuring the operator does not produce NULL results from non-null + * inputs. + * + * We handle the three standard parser representations for ANY sublinks: a + * single OpExpr for single-column comparisons, a BoolExpr containing a list of + * OpExprs for multi-column equality or inequality checks (where equality + * becomes an AND and inequality becomes an OR), and a RowCompareExpr for + * multi-column ordering checks. In all cases, we validate the operators and + * the outer expressions. + * + * It is acceptable for this check not to be exhaustive. We can err on the + * side of conservatism: if we're not sure, it's okay to return FALSE. + */ +static bool +sublink_testexpr_is_not_nullable(PlannerInfo *root, SubLink *sublink) +{ + Node *testexpr = sublink->testexpr; + List *outer_exprs = NIL; + ListCell *lc; + + /* Punt if sublink is not in the expected format */ + if (sublink->subLinkType != ANY_SUBLINK || testexpr == NULL) + return false; + + if (IsA(testexpr, OpExpr)) + { + /* single-column comparison */ + OpExpr *opexpr = (OpExpr *) testexpr; + + /* standard ANY structure should be op(outer_var, param) */ + if (list_length(opexpr->args) != 2) + return false; + + /* + * We rely on membership in a B-tree or Hash operator family as a + * guarantee that the operator acts as a proper boolean comparison + * and does not yield NULL for valid non-null inputs. + */ + if (!op_is_safe_index_member(opexpr->opno)) + return false; + + outer_exprs = lappend(outer_exprs, linitial(opexpr->args)); + } + else if (is_andclause(testexpr) || is_orclause(testexpr)) + { + /* multi-column equality or inequality checks */ + BoolExpr *b = (BoolExpr *) testexpr; + + foreach(lc, b->args) + { + OpExpr *opexpr = (OpExpr *) lfirst(lc); + + if (!IsA(opexpr, OpExpr)) + return false; + + /* standard ANY structure should be op(outer_var, param) */ + if (list_length(opexpr->args) != 2) + return false; + + /* verify operator safety; see comment above */ + if (!op_is_safe_index_member(opexpr->opno)) + return false; + + outer_exprs = lappend(outer_exprs, linitial(opexpr->args)); + } + } + else if (IsA(testexpr, RowCompareExpr)) + { + /* multi-column ordering checks */ + RowCompareExpr *rcexpr = (RowCompareExpr *) testexpr; + + foreach(lc, rcexpr->opnos) + { + Oid opno = lfirst_oid(lc); + + /* verify operator safety; see comment above */ + if (!op_is_safe_index_member(opno)) + return false; + } + + outer_exprs = list_concat(outer_exprs, rcexpr->largs); + } + else + { + /* Punt if other node types */ + return false; + } + + /* + * Since the query hasn't yet been through expression preprocessing, we + * must apply flatten_join_alias_vars to the outer expressions to avoid + * being fooled by join aliases. + * + * We do not need to apply flatten_group_exprs though, since grouping Vars + * cannot appear in jointree quals. + */ + outer_exprs = (List *) + flatten_join_alias_vars(root, root->parse, (Node *) outer_exprs); + + /* Check that every outer expression is non-nullable */ + foreach(lc, outer_exprs) + { + Expr *expr = (Expr *) lfirst(lc); + + /* + * We have already collected relation-level not-null constraints for + * the outer query, so we can consult the global hash table for + * nullability information. + */ + if (!expr_is_nonnullable(root, expr, NOTNULL_SOURCE_HASHTABLE)) + return false; + + /* + * XXX It is possible to further prove non-nullability by examining + * the qual clauses available at or below the jointree node where this + * NOT IN clause is evaluated, but for the moment it doesn't seem + * worth the extra complication. + */ + } + + return true; +} + /* * convert_EXISTS_sublink_to_join: try to convert an EXISTS SubLink to a join * - * The API of this function is identical to convert_ANY_sublink_to_join's, - * except that we also support the case where the caller has found NOT EXISTS, - * so we need an additional input parameter "under_not". + * The API of this function is identical to convert_ANY_sublink_to_join's. */ JoinExpr * convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink, diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c index c90f4b32733..b2beb0a0d68 100644 --- a/src/backend/optimizer/prep/prepjointree.c +++ b/src/backend/optimizer/prep/prepjointree.c @@ -852,14 +852,15 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node, if ((saop = convert_VALUES_to_ANY(root, sublink->testexpr, (Query *) sublink->subselect)) != NULL) - + { /* * The VALUES sequence was simplified. Nothing more to do * here. */ return (Node *) saop; + } - if ((j = convert_ANY_sublink_to_join(root, sublink, + if ((j = convert_ANY_sublink_to_join(root, sublink, false, available_rels1)) != NULL) { /* Yes; insert the new join node into the join tree */ @@ -885,7 +886,7 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node, return NULL; } if (available_rels2 != NULL && - (j = convert_ANY_sublink_to_join(root, sublink, + (j = convert_ANY_sublink_to_join(root, sublink, false, available_rels2)) != NULL) { /* Yes; insert the new join node into the join tree */ @@ -970,14 +971,68 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node, } if (is_notclause(node)) { - /* If the immediate argument of NOT is EXISTS, try to convert */ + /* If the immediate argument of NOT is ANY or EXISTS, try to convert */ SubLink *sublink = (SubLink *) get_notclausearg((Expr *) node); JoinExpr *j; Relids child_rels; if (sublink && IsA(sublink, SubLink)) { - if (sublink->subLinkType == EXISTS_SUBLINK) + if (sublink->subLinkType == ANY_SUBLINK) + { + if ((j = convert_ANY_sublink_to_join(root, sublink, true, + available_rels1)) != NULL) + { + /* Yes; insert the new join node into the join tree */ + j->larg = *jtlink1; + *jtlink1 = (Node *) j; + /* Recursively process pulled-up jointree nodes */ + j->rarg = pull_up_sublinks_jointree_recurse(root, + j->rarg, + &child_rels); + + /* + * Now recursively process the pulled-up quals. Because + * we are underneath a NOT, we can't pull up sublinks that + * reference the left-hand stuff, but it's still okay to + * pull up sublinks referencing j->rarg. + */ + j->quals = pull_up_sublinks_qual_recurse(root, + j->quals, + &j->rarg, + child_rels, + NULL, NULL); + /* Return NULL representing constant TRUE */ + return NULL; + } + if (available_rels2 != NULL && + (j = convert_ANY_sublink_to_join(root, sublink, true, + available_rels2)) != NULL) + { + /* Yes; insert the new join node into the join tree */ + j->larg = *jtlink2; + *jtlink2 = (Node *) j; + /* Recursively process pulled-up jointree nodes */ + j->rarg = pull_up_sublinks_jointree_recurse(root, + j->rarg, + &child_rels); + + /* + * Now recursively process the pulled-up quals. Because + * we are underneath a NOT, we can't pull up sublinks that + * reference the left-hand stuff, but it's still okay to + * pull up sublinks referencing j->rarg. + */ + j->quals = pull_up_sublinks_qual_recurse(root, + j->quals, + &j->rarg, + child_rels, + NULL, NULL); + /* Return NULL representing constant TRUE */ + return NULL; + } + } + else if (sublink->subLinkType == EXISTS_SUBLINK) { if ((j = convert_EXISTS_sublink_to_join(root, sublink, true, available_rels1)) != NULL) @@ -3706,6 +3761,13 @@ has_notnull_forced_var(PlannerInfo *root, List *forced_null_vars, rte = rt_fetch(varno, root->parse->rtable); + /* We can only reason about ordinary relations */ + if (rte->rtekind != RTE_RELATION) + { + bms_free(forcednullattnums); + continue; + } + /* * We must skip inheritance parent tables, as some child tables may * have a NOT NULL constraint for a column while others may not. This diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c index 504a30d8836..0906b024088 100644 --- a/src/backend/optimizer/util/clauses.c +++ b/src/backend/optimizer/util/clauses.c @@ -21,6 +21,7 @@ #include "access/htup_details.h" #include "catalog/pg_class.h" +#include "catalog/pg_inherits.h" #include "catalog/pg_language.h" #include "catalog/pg_operator.h" #include "catalog/pg_proc.h" @@ -112,6 +113,7 @@ static bool contain_context_dependent_node_walker(Node *node, int *flags); static bool contain_leaked_vars_walker(Node *node, void *context); static Relids find_nonnullable_rels_walker(Node *node, bool top_level); static List *find_nonnullable_vars_walker(Node *node, bool top_level); +static void find_subquery_safe_quals(Node *jtnode, List **safe_quals); static bool is_strict_saop(ScalarArrayOpExpr *expr, bool falseOK); static bool convert_saop_to_hashed_saop_walker(Node *node, void *context); static Node *eval_const_expressions_mutator(Node *node, @@ -1433,6 +1435,10 @@ contain_leaked_vars_walker(Node *node, void *context) context); } +/***************************************************************************** + * Nullability analysis + *****************************************************************************/ + /* * find_nonnullable_rels * Determine which base rels are forced nonnullable by given clause. @@ -1701,7 +1707,7 @@ find_nonnullable_rels_walker(Node *node, bool top_level) * but here we assume that the input is a Boolean expression, and wish to * see if NULL inputs will provably cause a FALSE-or-NULL result. We expect * the expression to have been AND/OR flattened and converted to implicit-AND - * format. + * format (but the results are still good if it wasn't AND/OR flattened). * * Attnos of the identified Vars are returned in a multibitmapset (a List of * Bitmapsets). List indexes correspond to relids (varnos), while the per-rel @@ -2021,6 +2027,230 @@ find_forced_null_var(Node *node) return NULL; } +/* + * query_outputs_are_not_nullable + * Returns TRUE if the output values of the Query are certainly not NULL. + * All output columns must return non-NULL to answer TRUE. + * + * The reason this takes a Query, and not just an individual tlist expression, + * is so that we can make use of the query's WHERE/ON clauses to prove it does + * not return nulls. + * + * In current usage, the passed sub-Query hasn't yet been through any planner + * processing. This means that applying find_nonnullable_vars() to its WHERE + * clauses isn't really ideal: for lack of const-simplification, we might be + * unable to prove not-nullness in some cases where we could have proved it + * afterwards. However, we should not get any false positive results. + * + * Like the other forms of nullability analysis above, we can err on the + * side of conservatism: if we're not sure, it's okay to return FALSE. + */ +bool +query_outputs_are_not_nullable(Query *query) +{ + PlannerInfo subroot; + List *safe_quals = NIL; + List *nonnullable_vars = NIL; + bool computed_nonnullable_vars = false; + ListCell *tl; + + /* + * If the query contains set operations, punt. The set ops themselves + * couldn't introduce nulls that weren't in their inputs, but the tlist + * present in the top-level query is just dummy and won't give us useful + * info. We could get an answer by recursing to examine each leaf query, + * but for the moment it doesn't seem worth the extra complication. + */ + if (query->setOperations) + return false; + + /* + * If the query contains grouping sets, punt. Grouping sets can introduce + * NULL values, and we currently lack the PlannerInfo needed to flatten + * grouping Vars in the query's outputs. + */ + if (query->groupingSets) + return false; + + /* + * We need a PlannerInfo to pass to expr_is_nonnullable. Fortunately, we + * can cons up an entirely dummy one, because only the "parse" link in the + * struct is used by expr_is_nonnullable. + */ + MemSet(&subroot, 0, sizeof(subroot)); + subroot.parse = query; + + /* + * Examine each targetlist entry to prove that it can't produce NULL. + */ + foreach(tl, query->targetList) + { + TargetEntry *tle = (TargetEntry *) lfirst(tl); + Expr *expr = tle->expr; + + /* Resjunk columns can be ignored: they don't produce output values */ + if (tle->resjunk) + continue; + + /* + * Look through binary relabelings, since we know those don't + * introduce nulls. + */ + while (expr && IsA(expr, RelabelType)) + expr = ((RelabelType *) expr)->arg; + + if (expr == NULL) /* paranoia */ + return false; + + /* + * Since the subquery hasn't yet been through expression + * preprocessing, we must apply flatten_join_alias_vars to the given + * expression, and to the quals found by find_subquery_safe_quals, to + * avoid being fooled by join aliases. + * + * We won't be dealing with arbitrary expressions, so it's safe to use + * NULL as the root, so long as adjust_standard_join_alias_expression + * can handle everything the parser would make as a join alias + * expression. + */ + expr = (Expr *) flatten_join_alias_vars(NULL, query, (Node *) expr); + + /* + * Similarly, we must apply flatten_group_exprs to the given + * expression to avoid being fooled by grouping Vars. We do not need + * to apply flatten_group_exprs to the quals found by + * find_subquery_safe_quals though, as grouping Vars cannot appear in + * jointree quals. + * + * We have verified that the query does not contain grouping sets, so + * it's safe to use NULL as the root here. + */ + expr = (Expr *) flatten_group_exprs(NULL, query, (Node *) expr); + + /* + * Check to see if the expr cannot be NULL. Since we're on a raw + * parse tree, we need to look up the not-null constraints from the + * system catalogs. + */ + if (expr_is_nonnullable(&subroot, expr, NOTNULL_SOURCE_SYSCACHE)) + continue; + + if (IsA(expr, Var)) + { + Var *var = (Var *) expr; + + /* + * For a plain Var, even if that didn't work, we can conclude that + * the Var is not nullable if find_nonnullable_vars can find a + * "var IS NOT NULL" or similarly strict condition among the quals + * on non-outerjoined-rels. Compute the list of Vars having such + * quals if we didn't already. + */ + if (!computed_nonnullable_vars) + { + find_subquery_safe_quals((Node *) query->jointree, &safe_quals); + safe_quals = (List *) + flatten_join_alias_vars(NULL, query, (Node *) safe_quals); + nonnullable_vars = find_nonnullable_vars((Node *) safe_quals); + computed_nonnullable_vars = true; + } + + if (!mbms_is_member(var->varno, + var->varattno - FirstLowInvalidHeapAttributeNumber, + nonnullable_vars)) + return false; /* we failed to prove the Var non-null */ + } + else + { + /* Punt otherwise */ + return false; + } + } + + return true; +} + +/* + * find_subquery_safe_quals + * Traverse jointree to locate quals on non-outerjoined-rels. + * + * We locate all WHERE and JOIN/ON quals that constrain the rels that are not + * below the nullable side of any outer join, and add them to the *safe_quals + * list (forming a list with implicit-AND semantics). These quals can be used + * to prove non-nullability of the subquery's outputs. + * + * Top-level caller must initialize *safe_quals to NIL. + */ +static void +find_subquery_safe_quals(Node *jtnode, List **safe_quals) +{ + if (jtnode == NULL) + return; + if (IsA(jtnode, RangeTblRef)) + { + /* Leaf node: nothing to do */ + return; + } + else if (IsA(jtnode, FromExpr)) + { + FromExpr *f = (FromExpr *) jtnode; + ListCell *lc; + + /* All elements of the FROM list are allowable */ + foreach(lc, f->fromlist) + find_subquery_safe_quals((Node *) lfirst(lc), safe_quals); + /* ... and its WHERE quals are too */ + if (f->quals) + *safe_quals = lappend(*safe_quals, f->quals); + } + else if (IsA(jtnode, JoinExpr)) + { + JoinExpr *j = (JoinExpr *) jtnode; + + switch (j->jointype) + { + case JOIN_INNER: + /* visit both children */ + find_subquery_safe_quals(j->larg, safe_quals); + find_subquery_safe_quals(j->rarg, safe_quals); + /* and grab the ON quals too */ + if (j->quals) + *safe_quals = lappend(*safe_quals, j->quals); + break; + + case JOIN_LEFT: + case JOIN_SEMI: + case JOIN_ANTI: + + /* + * Only the left input is possibly non-nullable; furthermore, + * the quals of this join don't constrain the left input. + * Note: we probably can't see SEMI or ANTI joins at this + * point, but if we do, we can treat them like LEFT joins. + */ + find_subquery_safe_quals(j->larg, safe_quals); + break; + + case JOIN_RIGHT: + /* Reverse of the above case */ + find_subquery_safe_quals(j->rarg, safe_quals); + break; + + case JOIN_FULL: + /* Neither side is non-nullable, so stop descending */ + break; + + default: + elog(ERROR, "unrecognized join type: %d", + (int) j->jointype); + break; + } + } + else + elog(ERROR, "unrecognized node type: %d", + (int) nodeTag(jtnode)); +} + /* * Can we treat a ScalarArrayOpExpr as strict? * @@ -2739,7 +2969,8 @@ eval_const_expressions_mutator(Node *node, if (!has_nullable_nonconst && !expr_is_nonnullable(context->root, - (Expr *) lfirst(arg), false)) + (Expr *) lfirst(arg), + NOTNULL_SOURCE_HASHTABLE)) has_nullable_nonconst = true; } } @@ -3418,7 +3649,8 @@ eval_const_expressions_mutator(Node *node, newargs = lappend(newargs, e); break; } - if (expr_is_nonnullable(context->root, (Expr *) e, false)) + if (expr_is_nonnullable(context->root, (Expr *) e, + NOTNULL_SOURCE_HASHTABLE)) { if (newargs == NIL) return e; /* first expr */ @@ -3612,7 +3844,7 @@ eval_const_expressions_mutator(Node *node, */ if (relem && expr_is_nonnullable(context->root, (Expr *) relem, - false)) + NOTNULL_SOURCE_HASHTABLE)) { if (ntest->nulltesttype == IS_NULL) return makeBoolConst(false, false); @@ -3664,7 +3896,8 @@ eval_const_expressions_mutator(Node *node, return makeBoolConst(result, false); } if (!ntest->argisrow && arg && - expr_is_nonnullable(context->root, (Expr *) arg, false)) + expr_is_nonnullable(context->root, (Expr *) arg, + NOTNULL_SOURCE_HASHTABLE)) { bool result; @@ -3749,7 +3982,9 @@ eval_const_expressions_mutator(Node *node, return makeBoolConst(result, false); } - if (arg && expr_is_nonnullable(context->root, (Expr *) arg, false)) + if (arg && + expr_is_nonnullable(context->root, (Expr *) arg, + NOTNULL_SOURCE_HASHTABLE)) { /* * If arg is proven non-nullable, simplify to boolean @@ -4384,14 +4619,11 @@ simplify_aggref(Aggref *aggref, eval_const_expressions_context *context) * If the Var is defined NOT NULL and meanwhile is not nulled by any outer * joins or grouping sets, then we can know that it cannot be NULL. * - * use_rel_info indicates whether the corresponding RelOptInfo is available for - * use. + * "source" specifies where we should look for NOT NULL proofs. */ bool -var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info) +var_is_nonnullable(PlannerInfo *root, Var *var, NotNullSource source) { - Bitmapset *notnullattnums = NULL; - Assert(IsA(var, Var)); /* skip upper-level Vars */ @@ -4406,35 +4638,89 @@ var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info) if (var->varattno < 0) return true; - /* - * Check if the Var is defined as NOT NULL. We retrieve the column NOT - * NULL constraint information from the corresponding RelOptInfo if it is - * available; otherwise, we search the hash table for this information. - */ - if (use_rel_info) - { - RelOptInfo *rel = find_base_rel(root, var->varno); + /* we don't trust whole-row Vars */ + if (var->varattno == 0) + return false; - notnullattnums = rel->notnullattnums; - } - else + /* Check if the Var is defined as NOT NULL. */ + switch (source) { - RangeTblEntry *rte = planner_rt_fetch(var->varno, root); + case NOTNULL_SOURCE_RELOPT: + { + /* + * We retrieve the column NOT NULL constraint information from + * the corresponding RelOptInfo. + */ + RelOptInfo *rel; + Bitmapset *notnullattnums; - /* - * We must skip inheritance parent tables, as some child tables may - * have a NOT NULL constraint for a column while others may not. This - * cannot happen with partitioned tables, though. - */ - if (rte->inh && rte->relkind != RELKIND_PARTITIONED_TABLE) - return false; + rel = find_base_rel(root, var->varno); + notnullattnums = rel->notnullattnums; - notnullattnums = find_relation_notnullatts(root, rte->relid); - } + return bms_is_member(var->varattno, notnullattnums); + } + case NOTNULL_SOURCE_HASHTABLE: + { + /* + * We retrieve the column NOT NULL constraint information from + * the hash table. + */ + RangeTblEntry *rte; + Bitmapset *notnullattnums; - if (var->varattno > 0 && - bms_is_member(var->varattno, notnullattnums)) - return true; + rte = planner_rt_fetch(var->varno, root); + + /* We can only reason about ordinary relations */ + if (rte->rtekind != RTE_RELATION) + return false; + + /* + * We must skip inheritance parent tables, as some child + * tables may have a NOT NULL constraint for a column while + * others may not. This cannot happen with partitioned + * tables, though. + */ + if (rte->inh && rte->relkind != RELKIND_PARTITIONED_TABLE) + return false; + + notnullattnums = find_relation_notnullatts(root, rte->relid); + + return bms_is_member(var->varattno, notnullattnums); + } + case NOTNULL_SOURCE_SYSCACHE: + { + /* + * We look up the "attnotnull" field in the attribute + * relation. + */ + RangeTblEntry *rte; + + rte = planner_rt_fetch(var->varno, root); + + /* We can only reason about ordinary relations */ + if (rte->rtekind != RTE_RELATION) + return false; + + /* + * We must skip inheritance parent tables, as some child + * tables may have a NOT NULL constraint for a column while + * others may not. This cannot happen with partitioned + * tables, though. + * + * Note that we need to check if the relation actually has any + * children, as we might not have done that yet. + */ + if (rte->inh && has_subclass(rte->relid) && + rte->relkind != RELKIND_PARTITIONED_TABLE) + return false; + + return get_attnotnull(rte->relid, var->varattno); + } + default: + elog(ERROR, "unrecognized NotNullSource: %d", + (int) source); + break; + } return false; } @@ -4444,16 +4730,22 @@ var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info) * * Returns true iff the given 'expr' cannot produce SQL NULLs. * - * If 'use_rel_info' is true, nullability of Vars is checked via the - * corresponding RelOptInfo for the given Var. Some callers require - * nullability information before RelOptInfos are generated. These should - * pass 'use_rel_info' as false. + * source: specifies where we should look for NOT NULL proofs for Vars. + * - NOTNULL_SOURCE_RELOPT: Used when RelOptInfos have been generated. We + * retrieve nullability information directly from the RelOptInfo corresponding + * to the Var. + * - NOTNULL_SOURCE_HASHTABLE: Used when RelOptInfos are not yet available, + * but we have already collected relation-level not-null constraints into the + * global hash table. + * - NOTNULL_SOURCE_SYSCACHE: Used for raw parse trees where neither + * RelOptInfos nor the hash table are available. In this case, we have to + * look up the 'attnotnull' field directly in the system catalogs. * * For now, we support only a limited set of expression types. Support for * additional node types can be added in the future. */ bool -expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) +expr_is_nonnullable(PlannerInfo *root, Expr *expr, NotNullSource source) { /* since this function recurses, it could be driven to stack overflow */ check_stack_depth(); @@ -4463,7 +4755,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) case T_Var: { if (root) - return var_is_nonnullable(root, (Var *) expr, use_rel_info); + return var_is_nonnullable(root, (Var *) expr, source); } break; case T_Const: @@ -4480,7 +4772,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) foreach_ptr(Expr, arg, coalesceexpr->args) { - if (expr_is_nonnullable(root, arg, use_rel_info)) + if (expr_is_nonnullable(root, arg, source)) return true; } } @@ -4495,7 +4787,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) foreach_ptr(Expr, arg, minmaxexpr->args) { - if (expr_is_nonnullable(root, arg, use_rel_info)) + if (expr_is_nonnullable(root, arg, source)) return true; } } @@ -4511,13 +4803,13 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) /* The default result must be present and non-nullable */ if (caseexpr->defresult == NULL || - !expr_is_nonnullable(root, caseexpr->defresult, use_rel_info)) + !expr_is_nonnullable(root, caseexpr->defresult, source)) return false; /* All branch results must be non-nullable */ foreach_ptr(CaseWhen, casewhen, caseexpr->args) { - if (!expr_is_nonnullable(root, casewhen->result, use_rel_info)) + if (!expr_is_nonnullable(root, casewhen->result, source)) return false; } @@ -4565,7 +4857,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) * non-nullable. */ return expr_is_nonnullable(root, ((RelabelType *) expr)->arg, - use_rel_info); + source); } default: break; diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c index 37d34685b93..6c8fb7b7275 100644 --- a/src/backend/utils/adt/int8.c +++ b/src/backend/utils/adt/int8.c @@ -834,7 +834,7 @@ int8inc_support(PG_FUNCTION_ARGS) PG_RETURN_POINTER(NULL); /* If the arg isn't NULLable, do the conversion */ - if (expr_is_nonnullable(req->root, arg, false)) + if (expr_is_nonnullable(req->root, arg, NOTNULL_SOURCE_HASHTABLE)) { Aggref *newagg; diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c index b924a2d900b..deb9aa04b72 100644 --- a/src/backend/utils/cache/lsyscache.c +++ b/src/backend/utils/cache/lsyscache.c @@ -858,6 +858,47 @@ comparison_ops_are_compatible(Oid opno1, Oid opno2) return result; } +/* + * op_is_safe_index_member + * Check if the operator is a member of a B-tree or Hash operator family. + * + * We use this check as a proxy for "null-safety": if an operator is trusted by + * the btree or hash opfamily, it implies that the operator adheres to standard + * boolean behavior, and would not return NULL when given valid non-null + * inputs, as doing so would break index integrity. + */ +bool +op_is_safe_index_member(Oid opno) +{ + bool result = false; + CatCList *catlist; + int i; + + /* + * Search pg_amop to see if the target operator is registered for any + * btree or hash opfamily. + */ + catlist = SearchSysCacheList1(AMOPOPID, ObjectIdGetDatum(opno)); + + for (i = 0; i < catlist->n_members; i++) + { + HeapTuple tuple = &catlist->members[i]->tuple; + Form_pg_amop aform = (Form_pg_amop) GETSTRUCT(tuple); + + /* Check if the AM is B-tree or Hash */ + if (aform->amopmethod == BTREE_AM_OID || + aform->amopmethod == HASH_AM_OID) + { + result = true; + break; + } + } + + ReleaseSysCacheList(catlist); + + return result; +} + /* ---------- AMPROC CACHES ---------- */ @@ -1071,6 +1112,33 @@ get_attoptions(Oid relid, int16 attnum) return result; } +/* + * get_attnotnull + * + * Given the relation id and the attribute number, + * return the "attnotnull" field from the attribute relation. + */ +bool +get_attnotnull(Oid relid, AttrNumber attnum) +{ + HeapTuple tp; + bool result = false; + + tp = SearchSysCache2(ATTNUM, + ObjectIdGetDatum(relid), + Int16GetDatum(attnum)); + + if (HeapTupleIsValid(tp)) + { + Form_pg_attribute att_tup = (Form_pg_attribute) GETSTRUCT(tp); + + result = att_tup->attnotnull; + ReleaseSysCache(tp); + } + + return result; +} + /* ---------- PG_CAST CACHE ---------- */ /* diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h index a64034e8a6d..853a28c0007 100644 --- a/src/include/optimizer/clauses.h +++ b/src/include/optimizer/clauses.h @@ -42,6 +42,7 @@ extern Relids find_nonnullable_rels(Node *clause); extern List *find_nonnullable_vars(Node *clause); extern List *find_forced_null_vars(Node *node); extern Var *find_forced_null_var(Node *node); +extern bool query_outputs_are_not_nullable(Query *query); extern bool is_pseudo_constant_clause(Node *clause); extern bool is_pseudo_constant_clause_relids(Node *clause, Relids relids); diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h index 3d27a019609..685485148ac 100644 --- a/src/include/optimizer/optimizer.h +++ b/src/include/optimizer/optimizer.h @@ -130,6 +130,14 @@ extern Expr *canonicalize_qual(Expr *qual, bool is_check); /* in util/clauses.c: */ +/* Enum to specify where var_is_nonnullable should look for NOT NULL proofs */ +typedef enum +{ + NOTNULL_SOURCE_RELOPT, /* Use RelOptInfo */ + NOTNULL_SOURCE_HASHTABLE, /* Use Global Hash Table */ + NOTNULL_SOURCE_SYSCACHE, /* Use System Catalog */ +} NotNullSource; + extern bool contain_mutable_functions(Node *clause); extern bool contain_mutable_functions_after_planning(Expr *expr); extern bool contain_volatile_functions(Node *clause); @@ -145,10 +153,11 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node); extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod, Oid result_collation); -extern bool var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info); +extern bool var_is_nonnullable(PlannerInfo *root, Var *var, + NotNullSource source); extern bool expr_is_nonnullable(PlannerInfo *root, Expr *expr, - bool use_rel_info); + NotNullSource source); extern List *expand_function_arguments(List *args, bool include_out_arguments, Oid result_type, diff --git a/src/include/optimizer/subselect.h b/src/include/optimizer/subselect.h index 8a5503eb973..4ecccf46bd3 100644 --- a/src/include/optimizer/subselect.h +++ b/src/include/optimizer/subselect.h @@ -22,6 +22,7 @@ extern ScalarArrayOpExpr *convert_VALUES_to_ANY(PlannerInfo *root, Query *values); extern JoinExpr *convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, + bool under_not, Relids available_rels); extern JoinExpr *convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink, diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h index 5655aca4c14..b9ad84ecd41 100644 --- a/src/include/utils/lsyscache.h +++ b/src/include/utils/lsyscache.h @@ -89,6 +89,7 @@ extern bool get_op_hash_functions(Oid opno, extern List *get_op_index_interpretation(Oid opno); extern bool equality_ops_are_compatible(Oid opno1, Oid opno2); extern bool comparison_ops_are_compatible(Oid opno1, Oid opno2); +extern bool op_is_safe_index_member(Oid opno); extern Oid get_opfamily_proc(Oid opfamily, Oid lefttype, Oid righttype, int16 procnum); extern char *get_attname(Oid relid, AttrNumber attnum, bool missing_ok); @@ -98,6 +99,7 @@ extern Oid get_atttype(Oid relid, AttrNumber attnum); extern void get_atttypetypmodcoll(Oid relid, AttrNumber attnum, Oid *typid, int32 *typmod, Oid *collid); extern Datum get_attoptions(Oid relid, int16 attnum); +extern bool get_attnotnull(Oid relid, AttrNumber attnum); extern Oid get_cast_oid(Oid sourcetypeid, Oid targettypeid, bool missing_ok); extern char *get_collation_name(Oid colloid); extern bool get_collation_isdeterministic(Oid colloid); diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out index 2135d82884d..200236a0a69 100644 --- a/src/test/regress/expected/subselect.out +++ b/src/test/regress/expected/subselect.out @@ -3323,3 +3323,442 @@ SELECT ten FROM onek t WHERE 1.0::integer IN ((VALUES (1), (3))); Seq Scan on onek t (1 row) +-- +-- Check NOT IN performs an ANTI JOIN when both the outer query's expressions +-- and the sub-select's output columns are provably non-nullable, and the +-- operator itself cannot return NULL for non-null inputs. +-- +BEGIN; +CREATE TEMP TABLE not_null_tab (id int NOT NULL, val int NOT NULL); +CREATE TEMP TABLE null_tab (id int, val int); +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +----------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = not_null_tab_1.id) + -> Seq Scan on not_null_tab + -> Hash + -> Seq Scan on not_null_tab not_null_tab_1 +(5 rows) + +-- No ANTI JOIN: outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> Seq Scan on not_null_tab +(4 rows) + +-- No ANTI JOIN: inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> Seq Scan on null_tab +(4 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab WHERE id IS NOT NULL); + QUERY PLAN +---------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = null_tab.id) + -> Seq Scan on not_null_tab + -> Hash + -> Seq Scan on null_tab + Filter: (id IS NOT NULL) +(6 rows) + +-- No ANTI JOIN: outer side is nullable (we don't check outer query quals for now) +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id IS NOT NULL + AND id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +--------------------------------------------------------------------------------- + Seq Scan on null_tab + Filter: ((id IS NOT NULL) AND (NOT (ANY (id = (hashed SubPlan any_1).col1)))) + SubPlan any_1 + -> Seq Scan on not_null_tab +(4 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is defined NOT NULL +-- and is not nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + QUERY PLAN +----------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = t1.id) + -> Seq Scan on not_null_tab + -> Hash + -> Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(12 rows) + +-- No ANTI JOIN: inner side is defined NOT NULL but is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t2.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(11 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t2.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id + WHERE t2.id IS NOT NULL +); + QUERY PLAN +----------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = t2.id) + -> Seq Scan on not_null_tab + -> Hash + -> Merge Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(12 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + LEFT JOIN null_tab t2 ON t1.id = t2.id + WHERE t1.id IS NOT NULL +); + QUERY PLAN +---------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = t1.id) + -> Seq Scan on not_null_tab + -> Hash + -> Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on null_tab t1 + Filter: (id IS NOT NULL) + -> Sort + Sort Key: t2.id + -> Seq Scan on null_tab t2 +(13 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + INNER JOIN null_tab t2 ON t1.id = t2.id + LEFT JOIN null_tab t3 ON TRUE +); + QUERY PLAN +------------------------------------------------- + Merge Anti Join + Merge Cond: (not_null_tab.id = t1.id) + -> Sort + Sort Key: not_null_tab.id + -> Seq Scan on not_null_tab + -> Nested Loop Left Join + -> Merge Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on null_tab t2 + -> Materialize + -> Seq Scan on null_tab t3 +(16 rows) + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t1.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +---------------------------------------------------- + Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Hash Anti Join + Hash Cond: (t1.id = not_null_tab.id) + -> Seq Scan on not_null_tab t1 + -> Hash + -> Seq Scan on not_null_tab + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(12 rows) + +-- No ANTI JOIN: outer side is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t2.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +------------------------------------------------------------- + Merge Left Join + Merge Cond: (t1.id = t2.id) + Filter: (NOT (ANY (t2.id = (hashed SubPlan any_1).col1))) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 + SubPlan any_1 + -> Seq Scan on not_null_tab +(11 rows) + +-- No ANTI JOIN: sublink is in an outer join's ON qual and references the +-- non-nullable side +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t1.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +------------------------------------------------------------------ + Nested Loop Left Join + Join Filter: (NOT (ANY (t1.id = (hashed SubPlan any_1).col1))) + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Seq Scan on not_null_tab t2 + SubPlan any_1 + -> Seq Scan on not_null_tab +(7 rows) + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t2.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +---------------------------------------------------- + Nested Loop Left Join + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Hash Anti Join + Hash Cond: (t2.id = not_null_tab.id) + -> Seq Scan on not_null_tab t2 + -> Hash + -> Seq Scan on not_null_tab +(8 rows) + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT id, val FROM not_null_tab); + QUERY PLAN +--------------------------------------------------------------------------------------------------- + Merge Anti Join + Merge Cond: ((not_null_tab.id = not_null_tab_1.id) AND (not_null_tab.val = not_null_tab_1.val)) + -> Sort + Sort Key: not_null_tab.id, not_null_tab.val + -> Seq Scan on not_null_tab + -> Sort + Sort Key: not_null_tab_1.id, not_null_tab_1.val + -> Seq Scan on not_null_tab not_null_tab_1 +(8 rows) + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE NOT (id, val) > ANY (SELECT id, val FROM not_null_tab); + QUERY PLAN +------------------------------------------------------------------------------------------------------ + Nested Loop Anti Join + Join Filter: (ROW(not_null_tab.id, not_null_tab.val) > ROW(not_null_tab_1.id, not_null_tab_1.val)) + -> Seq Scan on not_null_tab + -> Materialize + -> Seq Scan on not_null_tab not_null_tab_1 +(5 rows) + +-- No ANTI JOIN: one column of the outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1, null_tab t2 +WHERE (t1.id, t2.id) NOT IN (SELECT id, val FROM not_null_tab); + QUERY PLAN +-------------------------------------------------------------------------------------------------------------- + Nested Loop + Join Filter: (NOT (ANY ((t1.id = (hashed SubPlan any_1).col1) AND (t2.id = (hashed SubPlan any_1).col2)))) + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Seq Scan on null_tab t2 + SubPlan any_1 + -> Seq Scan on not_null_tab +(7 rows) + +-- No ANTI JOIN: one column of the inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT t1.id, t2.id FROM not_null_tab t1, null_tab t2); + QUERY PLAN +-------------------------------------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY ((id = (SubPlan any_1).col1) AND (val = (SubPlan any_1).col2)))) + SubPlan any_1 + -> Materialize + -> Nested Loop + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Seq Scan on null_tab t2 +(8 rows) + +-- ANTI JOIN: COALESCE(nullable, constant) is non-nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE COALESCE(id, -1) NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +----------------------------------------------------------------------- + Hash Anti Join + Hash Cond: (COALESCE(null_tab.id, '-1'::integer) = not_null_tab.id) + -> Seq Scan on null_tab + -> Hash + -> Seq Scan on not_null_tab +(5 rows) + +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT COALESCE(id, -1) FROM null_tab); + QUERY PLAN +----------------------------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = COALESCE(null_tab.id, '-1'::integer)) + -> Seq Scan on not_null_tab + -> Hash + -> Seq Scan on null_tab +(5 rows) + +-- ANTI JOIN: GROUP BY (without Grouping Sets) preserves the non-nullability of +-- the column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab GROUP BY id); + QUERY PLAN +----------------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = not_null_tab_1.id) + -> Seq Scan on not_null_tab + -> Hash + -> HashAggregate + Group Key: not_null_tab_1.id + -> Seq Scan on not_null_tab not_null_tab_1 +(7 rows) + +-- No ANTI JOIN: GROUP BY on a nullable column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab GROUP BY id); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> HashAggregate + Group Key: null_tab.id + -> Seq Scan on null_tab +(6 rows) + +-- No ANTI JOIN: Grouping Sets can introduce NULLs +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT id + FROM not_null_tab + GROUP BY GROUPING SETS ((id), (val)) +); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> HashAggregate + Hash Key: not_null_tab_1.id + Hash Key: not_null_tab_1.val + -> Seq Scan on not_null_tab not_null_tab_1 +(7 rows) + +-- create a custom "unsafe" equality operator +CREATE FUNCTION int4eq_unsafe(int4, int4) + RETURNS bool + AS 'int4eq' + LANGUAGE internal IMMUTABLE; +CREATE OPERATOR ?= ( + PROCEDURE = int4eq_unsafe, + LEFTARG = int4, + RIGHTARG = int4 +); +-- No ANTI JOIN: the operator is not safe +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE NOT id ?= ANY (SELECT id FROM not_null_tab); + QUERY PLAN +------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id ?= (SubPlan any_1).col1))) + SubPlan any_1 + -> Materialize + -> Seq Scan on not_null_tab not_null_tab_1 +(5 rows) + +ROLLBACK; diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql index cadc3293687..4cd016f4ac3 100644 --- a/src/test/regress/sql/subselect.sql +++ b/src/test/regress/sql/subselect.sql @@ -1448,3 +1448,188 @@ SELECT * FROM onek t1, lateral (SELECT * FROM onek t2 WHERE t2.ten IN (values (t -- VtA causes the whole expression to be evaluated as a constant EXPLAIN (COSTS OFF) SELECT ten FROM onek t WHERE 1.0::integer IN ((VALUES (1), (3))); + +-- +-- Check NOT IN performs an ANTI JOIN when both the outer query's expressions +-- and the sub-select's output columns are provably non-nullable, and the +-- operator itself cannot return NULL for non-null inputs. +-- + +BEGIN; + +CREATE TEMP TABLE not_null_tab (id int NOT NULL, val int NOT NULL); +CREATE TEMP TABLE null_tab (id int, val int); + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab WHERE id IS NOT NULL); + +-- No ANTI JOIN: outer side is nullable (we don't check outer query quals for now) +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id IS NOT NULL + AND id NOT IN (SELECT id FROM not_null_tab); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is defined NOT NULL +-- and is not nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + +-- No ANTI JOIN: inner side is defined NOT NULL but is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t2.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t2.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id + WHERE t2.id IS NOT NULL +); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + LEFT JOIN null_tab t2 ON t1.id = t2.id + WHERE t1.id IS NOT NULL +); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + INNER JOIN null_tab t2 ON t1.id = t2.id + LEFT JOIN null_tab t3 ON TRUE +); + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t1.id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: outer side is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t2.id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: sublink is in an outer join's ON qual and references the +-- non-nullable side +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t1.id NOT IN (SELECT id FROM not_null_tab); + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t2.id NOT IN (SELECT id FROM not_null_tab); + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT id, val FROM not_null_tab); + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE NOT (id, val) > ANY (SELECT id, val FROM not_null_tab); + +-- No ANTI JOIN: one column of the outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1, null_tab t2 +WHERE (t1.id, t2.id) NOT IN (SELECT id, val FROM not_null_tab); + +-- No ANTI JOIN: one column of the inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT t1.id, t2.id FROM not_null_tab t1, null_tab t2); + +-- ANTI JOIN: COALESCE(nullable, constant) is non-nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE COALESCE(id, -1) NOT IN (SELECT id FROM not_null_tab); + +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT COALESCE(id, -1) FROM null_tab); + +-- ANTI JOIN: GROUP BY (without Grouping Sets) preserves the non-nullability of +-- the column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab GROUP BY id); + +-- No ANTI JOIN: GROUP BY on a nullable column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab GROUP BY id); + +-- No ANTI JOIN: Grouping Sets can introduce NULLs +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT id + FROM not_null_tab + GROUP BY GROUPING SETS ((id), (val)) +); + +-- create a custom "unsafe" equality operator +CREATE FUNCTION int4eq_unsafe(int4, int4) + RETURNS bool + AS 'int4eq' + LANGUAGE internal IMMUTABLE; + +CREATE OPERATOR ?= ( + PROCEDURE = int4eq_unsafe, + LEFTARG = int4, + RIGHTARG = int4 +); + +-- No ANTI JOIN: the operator is not safe +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE NOT id ?= ANY (SELECT id FROM not_null_tab); + +ROLLBACK; diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index 6e2d876a40f..84dc4cb5543 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -1787,6 +1787,7 @@ Node NodeTag NonEmptyRange NoneCompressorState +NotNullSource Notification NotificationList NotifyStmt -- 2.39.5 (Apple Git-154) ^ permalink raw reply [nested|flat] 13+ messages in thread
* Re: Convert NOT IN sublinks to anti-joins when safe @ 2026-03-04 09:52 Richard Guo <[email protected]> parent: Richard Guo <[email protected]> 0 siblings, 1 reply; 13+ messages in thread From: Richard Guo @ 2026-03-04 09:52 UTC (permalink / raw) To: Pg Hackers <[email protected]> On Sat, Feb 14, 2026 at 4:37 PM Richard Guo <[email protected]> wrote: > On Thu, Feb 5, 2026 at 3:51 PM Richard Guo <[email protected]> wrote: > > Attached is the updated patch, which adds the check requiring the > > operator to be a member of a btree or hash opfamily. > Attached is another updated patch rebased on current master, with the > addition of support for RowCompareExpr to handle multi-column ordering > operations; otherwise unchanged. Attached is another updated patch rebased on current master, with some minor cosmetic adjustments; nothing essential has changed. - Richard Attachments: [application/octet-stream] v4-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patch (61.4K, 2-v4-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patch) download | inline diff: From cf0763dee9149b9f6f7d5cd3770b2b6bdf46d8a9 Mon Sep 17 00:00:00 2001 From: Richard Guo <[email protected]> Date: Fri, 30 Jan 2026 12:29:42 +0900 Subject: [PATCH v4] Convert NOT IN sublinks to anti-joins when safe The planner has historically been unable to convert "x NOT IN (SELECT y ...)" sublinks into anti-joins. This is because standard SQL semantics for NOT IN require that if the comparison "x = y" returns NULL, the "NOT IN" expression evaluates to NULL (effectively false), causing the row to be discarded. In contrast, an anti-join preserves the row if no match is found. Due to this semantic mismatch regarding NULL handling, the conversion was previously considered unsafe. However, if we can prove that neither side of the comparison can yield NULL values, and further that the operator itself cannot return NULL for non-null inputs, the behavior of NOT IN and anti-join becomes identical. Enabling this conversion allows the planner to treat the sublink as a first-class relation rather than an opaque SubPlan filter. This unlocks global join ordering optimization and permits the selection of the most efficient join algorithm based on cost, often yielding significant performance improvements for large datasets. This patch verifies that neither side of the comparison can be NULL and that the operator is safe regarding NULL results before performing the conversion. To verify operator safety, we require that the operator be a member of a B-tree or Hash operator family. This serves as a proxy for standard boolean behavior, ensuring the operator does not return NULL on valid non-null inputs, as doing so would break index integrity. For operand non-nullability, this patch makes use of several existing mechanisms. It leverages the outer-join-aware-Var infrastructure to verify that a Var does not come from the nullable side of an outer join, and consults the NOT-NULL-attnums hash table to efficiently verify schema-level NOT NULL constraints. Additionally, it employs find_nonnullable_vars to identify Vars forced non-nullable by qual clauses, and expr_is_nonnullable to deduce non-nullability for other expression types. --- src/backend/optimizer/plan/initsplan.c | 4 +- src/backend/optimizer/plan/subselect.c | 166 +++++++- src/backend/optimizer/prep/prepjointree.c | 72 +++- src/backend/optimizer/util/clauses.c | 384 ++++++++++++++++--- src/backend/utils/adt/int8.c | 2 +- src/backend/utils/cache/lsyscache.c | 68 ++++ src/include/optimizer/clauses.h | 1 + src/include/optimizer/optimizer.h | 13 +- src/include/optimizer/subselect.h | 1 + src/include/utils/lsyscache.h | 2 + src/test/regress/expected/subselect.out | 439 ++++++++++++++++++++++ src/test/regress/sql/subselect.sql | 185 +++++++++ src/tools/pgindent/typedefs.list | 1 + 13 files changed, 1272 insertions(+), 66 deletions(-) diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c index 97ea95a4eb8..9aaf1c4e5ca 100644 --- a/src/backend/optimizer/plan/initsplan.c +++ b/src/backend/optimizer/plan/initsplan.c @@ -3447,7 +3447,7 @@ restriction_is_always_true(PlannerInfo *root, if (nulltest->argisrow) return false; - return expr_is_nonnullable(root, nulltest->arg, true); + return expr_is_nonnullable(root, nulltest->arg, NOTNULL_SOURCE_RELOPT); } /* If it's an OR, check its sub-clauses */ @@ -3512,7 +3512,7 @@ restriction_is_always_false(PlannerInfo *root, if (nulltest->argisrow) return false; - return expr_is_nonnullable(root, nulltest->arg, true); + return expr_is_nonnullable(root, nulltest->arg, NOTNULL_SOURCE_RELOPT); } /* If it's an OR, check its sub-clauses */ diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c index d7f3cedf3d5..299b3354f6d 100644 --- a/src/backend/optimizer/plan/subselect.c +++ b/src/backend/optimizer/plan/subselect.c @@ -91,6 +91,7 @@ static bool contain_outer_selfref(Node *node); static bool contain_outer_selfref_walker(Node *node, Index *depth); static void inline_cte(PlannerInfo *root, CommonTableExpr *cte); static bool inline_cte_walker(Node *node, inline_cte_walker_context *context); +static bool sublink_testexpr_is_not_nullable(PlannerInfo *root, SubLink *sublink); static bool simplify_EXISTS_query(PlannerInfo *root, Query *query); static Query *convert_EXISTS_to_ANY(PlannerInfo *root, Query *subselect, Node **testexpr, List **paramIds); @@ -1306,11 +1307,14 @@ convert_VALUES_to_ANY(PlannerInfo *root, Node *testexpr, Query *values) * If so, form a JoinExpr and return it. Return NULL if the SubLink cannot * be converted to a join. * - * The only non-obvious input parameter is available_rels: this is the set - * of query rels that can safely be referenced in the sublink expression. - * (We must restrict this to avoid changing the semantics when a sublink - * is present in an outer join's ON qual.) The conversion must fail if - * the converted qual would reference any but these parent-query relids. + * If under_not is true, the caller actually found NOT (ANY SubLink), so + * that what we must try to build is an ANTI not SEMI join. + * + * available_rels is the set of query rels that can safely be referenced + * in the sublink expression. (We must restrict this to avoid changing + * the semantics when a sublink is present in an outer join's ON qual.) + * The conversion must fail if the converted qual would reference any but + * these parent-query relids. * * On success, the returned JoinExpr has larg = NULL and rarg = the jointree * item representing the pulled-up subquery. The caller must set larg to @@ -1333,7 +1337,7 @@ convert_VALUES_to_ANY(PlannerInfo *root, Node *testexpr, Query *values) */ JoinExpr * convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, - Relids available_rels) + bool under_not, Relids available_rels) { JoinExpr *result; Query *parse = root->parse; @@ -1351,6 +1355,19 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, Assert(sublink->subLinkType == ANY_SUBLINK); + /* + * Per SQL spec, NOT IN is not ordinarily equivalent to an anti-join, so + * that by default we have to fail when under_not. However, if we can + * prove that neither the outer query's expressions nor the sub-select's + * output columns can be NULL, and further that the operator itself cannot + * return NULL for non-null inputs, then the logic is identical and it's + * safe to convert NOT IN to an anti-join. + */ + if (under_not && + (!sublink_testexpr_is_not_nullable(root, sublink) || + !query_outputs_are_not_nullable(subselect))) + return NULL; + /* * If the sub-select contains any Vars of the parent query, we treat it as * LATERAL. (Vars from higher levels don't matter here.) @@ -1428,7 +1445,7 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, * And finally, build the JoinExpr node. */ result = makeNode(JoinExpr); - result->jointype = JOIN_SEMI; + result->jointype = under_not ? JOIN_ANTI : JOIN_SEMI; result->isNatural = false; result->larg = NULL; /* caller must fill this in */ result->rarg = (Node *) rtr; @@ -1441,12 +1458,141 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, return result; } +/* + * sublink_testexpr_is_not_nullable: verify that testexpr of an ANY_SUBLINK + * guarantees a non-null result, assuming the inner side is also non-null. + * + * To ensure the expression never returns NULL, we require both that the outer + * expressions are provably non-nullable and that the operator itself is safe. + * We validate operator safety by checking for membership in a standard index + * operator family (B-tree or Hash); this acts as a proxy for standard boolean + * behavior, ensuring the operator does not produce NULL results from non-null + * inputs. + * + * We handle the three standard parser representations for ANY sublinks: a + * single OpExpr for single-column comparisons, a BoolExpr containing a list of + * OpExprs for multi-column equality or inequality checks (where equality + * becomes an AND and inequality becomes an OR), and a RowCompareExpr for + * multi-column ordering checks. In all cases, we validate the operators and + * the outer expressions. + * + * It is acceptable for this check not to be exhaustive. We can err on the + * side of conservatism: if we're not sure, it's okay to return FALSE. + */ +static bool +sublink_testexpr_is_not_nullable(PlannerInfo *root, SubLink *sublink) +{ + Node *testexpr = sublink->testexpr; + List *outer_exprs = NIL; + ListCell *lc; + + /* Punt if sublink is not in the expected format */ + if (sublink->subLinkType != ANY_SUBLINK || testexpr == NULL) + return false; + + if (IsA(testexpr, OpExpr)) + { + /* single-column comparison */ + OpExpr *opexpr = (OpExpr *) testexpr; + + /* standard ANY structure should be op(outer_var, param) */ + if (list_length(opexpr->args) != 2) + return false; + + /* + * We rely on membership in a B-tree or Hash operator family as a + * guarantee that the operator acts as a proper boolean comparison and + * does not yield NULL for valid non-null inputs. + */ + if (!op_is_safe_index_member(opexpr->opno)) + return false; + + outer_exprs = lappend(outer_exprs, linitial(opexpr->args)); + } + else if (is_andclause(testexpr) || is_orclause(testexpr)) + { + /* multi-column equality or inequality checks */ + BoolExpr *bexpr = (BoolExpr *) testexpr; + + foreach(lc, bexpr->args) + { + OpExpr *opexpr = (OpExpr *) lfirst(lc); + + if (!IsA(opexpr, OpExpr)) + return false; + + /* standard ANY structure should be op(outer_var, param) */ + if (list_length(opexpr->args) != 2) + return false; + + /* verify operator safety; see comment above */ + if (!op_is_safe_index_member(opexpr->opno)) + return false; + + outer_exprs = lappend(outer_exprs, linitial(opexpr->args)); + } + } + else if (IsA(testexpr, RowCompareExpr)) + { + /* multi-column ordering checks */ + RowCompareExpr *rcexpr = (RowCompareExpr *) testexpr; + + foreach(lc, rcexpr->opnos) + { + Oid opno = lfirst_oid(lc); + + /* verify operator safety; see comment above */ + if (!op_is_safe_index_member(opno)) + return false; + } + + outer_exprs = list_concat(outer_exprs, rcexpr->largs); + } + else + { + /* Punt if other node types */ + return false; + } + + /* + * Since the query hasn't yet been through expression preprocessing, we + * must apply flatten_join_alias_vars to the outer expressions to avoid + * being fooled by join aliases. + * + * We do not need to apply flatten_group_exprs though, since grouping Vars + * cannot appear in jointree quals. + */ + outer_exprs = (List *) + flatten_join_alias_vars(root, root->parse, (Node *) outer_exprs); + + /* Check that every outer expression is non-nullable */ + foreach(lc, outer_exprs) + { + Expr *expr = (Expr *) lfirst(lc); + + /* + * We have already collected relation-level not-null constraints for + * the outer query, so we can consult the global hash table for + * nullability information. + */ + if (!expr_is_nonnullable(root, expr, NOTNULL_SOURCE_HASHTABLE)) + return false; + + /* + * Note: It is possible to further prove non-nullability by examining + * the qual clauses available at or below the jointree node where this + * NOT IN clause is evaluated, but for the moment it doesn't seem + * worth the extra complication. + */ + } + + return true; +} + /* * convert_EXISTS_sublink_to_join: try to convert an EXISTS SubLink to a join * - * The API of this function is identical to convert_ANY_sublink_to_join's, - * except that we also support the case where the caller has found NOT EXISTS, - * so we need an additional input parameter "under_not". + * The API of this function is identical to convert_ANY_sublink_to_join's. */ JoinExpr * convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink, diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c index c90f4b32733..b2beb0a0d68 100644 --- a/src/backend/optimizer/prep/prepjointree.c +++ b/src/backend/optimizer/prep/prepjointree.c @@ -852,14 +852,15 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node, if ((saop = convert_VALUES_to_ANY(root, sublink->testexpr, (Query *) sublink->subselect)) != NULL) - + { /* * The VALUES sequence was simplified. Nothing more to do * here. */ return (Node *) saop; + } - if ((j = convert_ANY_sublink_to_join(root, sublink, + if ((j = convert_ANY_sublink_to_join(root, sublink, false, available_rels1)) != NULL) { /* Yes; insert the new join node into the join tree */ @@ -885,7 +886,7 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node, return NULL; } if (available_rels2 != NULL && - (j = convert_ANY_sublink_to_join(root, sublink, + (j = convert_ANY_sublink_to_join(root, sublink, false, available_rels2)) != NULL) { /* Yes; insert the new join node into the join tree */ @@ -970,14 +971,68 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node, } if (is_notclause(node)) { - /* If the immediate argument of NOT is EXISTS, try to convert */ + /* If the immediate argument of NOT is ANY or EXISTS, try to convert */ SubLink *sublink = (SubLink *) get_notclausearg((Expr *) node); JoinExpr *j; Relids child_rels; if (sublink && IsA(sublink, SubLink)) { - if (sublink->subLinkType == EXISTS_SUBLINK) + if (sublink->subLinkType == ANY_SUBLINK) + { + if ((j = convert_ANY_sublink_to_join(root, sublink, true, + available_rels1)) != NULL) + { + /* Yes; insert the new join node into the join tree */ + j->larg = *jtlink1; + *jtlink1 = (Node *) j; + /* Recursively process pulled-up jointree nodes */ + j->rarg = pull_up_sublinks_jointree_recurse(root, + j->rarg, + &child_rels); + + /* + * Now recursively process the pulled-up quals. Because + * we are underneath a NOT, we can't pull up sublinks that + * reference the left-hand stuff, but it's still okay to + * pull up sublinks referencing j->rarg. + */ + j->quals = pull_up_sublinks_qual_recurse(root, + j->quals, + &j->rarg, + child_rels, + NULL, NULL); + /* Return NULL representing constant TRUE */ + return NULL; + } + if (available_rels2 != NULL && + (j = convert_ANY_sublink_to_join(root, sublink, true, + available_rels2)) != NULL) + { + /* Yes; insert the new join node into the join tree */ + j->larg = *jtlink2; + *jtlink2 = (Node *) j; + /* Recursively process pulled-up jointree nodes */ + j->rarg = pull_up_sublinks_jointree_recurse(root, + j->rarg, + &child_rels); + + /* + * Now recursively process the pulled-up quals. Because + * we are underneath a NOT, we can't pull up sublinks that + * reference the left-hand stuff, but it's still okay to + * pull up sublinks referencing j->rarg. + */ + j->quals = pull_up_sublinks_qual_recurse(root, + j->quals, + &j->rarg, + child_rels, + NULL, NULL); + /* Return NULL representing constant TRUE */ + return NULL; + } + } + else if (sublink->subLinkType == EXISTS_SUBLINK) { if ((j = convert_EXISTS_sublink_to_join(root, sublink, true, available_rels1)) != NULL) @@ -3706,6 +3761,13 @@ has_notnull_forced_var(PlannerInfo *root, List *forced_null_vars, rte = rt_fetch(varno, root->parse->rtable); + /* We can only reason about ordinary relations */ + if (rte->rtekind != RTE_RELATION) + { + bms_free(forcednullattnums); + continue; + } + /* * We must skip inheritance parent tables, as some child tables may * have a NOT NULL constraint for a column while others may not. This diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c index a41d81734cf..c47c9da4a9b 100644 --- a/src/backend/optimizer/util/clauses.c +++ b/src/backend/optimizer/util/clauses.c @@ -21,6 +21,7 @@ #include "access/htup_details.h" #include "catalog/pg_class.h" +#include "catalog/pg_inherits.h" #include "catalog/pg_language.h" #include "catalog/pg_operator.h" #include "catalog/pg_proc.h" @@ -112,6 +113,7 @@ static bool contain_context_dependent_node_walker(Node *node, int *flags); static bool contain_leaked_vars_walker(Node *node, void *context); static Relids find_nonnullable_rels_walker(Node *node, bool top_level); static List *find_nonnullable_vars_walker(Node *node, bool top_level); +static void find_subquery_safe_quals(Node *jtnode, List **safe_quals); static bool is_strict_saop(ScalarArrayOpExpr *expr, bool falseOK); static bool convert_saop_to_hashed_saop_walker(Node *node, void *context); static Node *eval_const_expressions_mutator(Node *node, @@ -1433,6 +1435,10 @@ contain_leaked_vars_walker(Node *node, void *context) context); } +/***************************************************************************** + * Nullability analysis + *****************************************************************************/ + /* * find_nonnullable_rels * Determine which base rels are forced nonnullable by given clause. @@ -1701,7 +1707,7 @@ find_nonnullable_rels_walker(Node *node, bool top_level) * but here we assume that the input is a Boolean expression, and wish to * see if NULL inputs will provably cause a FALSE-or-NULL result. We expect * the expression to have been AND/OR flattened and converted to implicit-AND - * format. + * format (but the results are still good if it wasn't AND/OR flattened). * * Attnos of the identified Vars are returned in a multibitmapset (a List of * Bitmapsets). List indexes correspond to relids (varnos), while the per-rel @@ -2021,6 +2027,230 @@ find_forced_null_var(Node *node) return NULL; } +/* + * query_outputs_are_not_nullable + * Returns TRUE if the output values of the Query are certainly not NULL. + * All output columns must return non-NULL to answer TRUE. + * + * The reason this takes a Query, and not just an individual tlist expression, + * is so that we can make use of the query's WHERE/ON clauses to prove it does + * not return nulls. + * + * In current usage, the passed sub-Query hasn't yet been through any planner + * processing. This means that applying find_nonnullable_vars() to its WHERE + * clauses isn't really ideal: for lack of const-simplification, we might be + * unable to prove not-nullness in some cases where we could have proved it + * afterwards. However, we should not get any false positive results. + * + * Like the other forms of nullability analysis above, we can err on the + * side of conservatism: if we're not sure, it's okay to return FALSE. + */ +bool +query_outputs_are_not_nullable(Query *query) +{ + PlannerInfo subroot; + List *safe_quals = NIL; + List *nonnullable_vars = NIL; + bool computed_nonnullable_vars = false; + ListCell *tl; + + /* + * If the query contains set operations, punt. The set ops themselves + * couldn't introduce nulls that weren't in their inputs, but the tlist + * present in the top-level query is just dummy and won't give us useful + * info. We could get an answer by recursing to examine each leaf query, + * but for the moment it doesn't seem worth the extra complication. + */ + if (query->setOperations) + return false; + + /* + * If the query contains grouping sets, punt. Grouping sets can introduce + * NULL values, and we currently lack the PlannerInfo needed to flatten + * grouping Vars in the query's outputs. + */ + if (query->groupingSets) + return false; + + /* + * We need a PlannerInfo to pass to expr_is_nonnullable. Fortunately, we + * can cons up an entirely dummy one, because only the "parse" link in the + * struct is used by expr_is_nonnullable. + */ + MemSet(&subroot, 0, sizeof(subroot)); + subroot.parse = query; + + /* + * Examine each targetlist entry to prove that it can't produce NULL. + */ + foreach(tl, query->targetList) + { + TargetEntry *tle = (TargetEntry *) lfirst(tl); + Expr *expr = tle->expr; + + /* Resjunk columns can be ignored: they don't produce output values */ + if (tle->resjunk) + continue; + + /* + * Look through binary relabelings, since we know those don't + * introduce nulls. + */ + while (expr && IsA(expr, RelabelType)) + expr = ((RelabelType *) expr)->arg; + + if (expr == NULL) /* paranoia */ + return false; + + /* + * Since the subquery hasn't yet been through expression + * preprocessing, we must apply flatten_join_alias_vars to the given + * expression, and to the quals found by find_subquery_safe_quals, to + * avoid being fooled by join aliases. + * + * We won't be dealing with arbitrary expressions, so it's safe to use + * NULL as the root, so long as adjust_standard_join_alias_expression + * can handle everything the parser would make as a join alias + * expression. + */ + expr = (Expr *) flatten_join_alias_vars(NULL, query, (Node *) expr); + + /* + * Similarly, we must apply flatten_group_exprs to the given + * expression to avoid being fooled by grouping Vars. We do not need + * to apply flatten_group_exprs to the quals found by + * find_subquery_safe_quals though, as grouping Vars cannot appear in + * jointree quals. + * + * We have verified that the query does not contain grouping sets, so + * it's safe to use NULL as the root here. + */ + expr = (Expr *) flatten_group_exprs(NULL, query, (Node *) expr); + + /* + * Check to see if the expr cannot be NULL. Since we're on a raw + * parse tree, we need to look up the not-null constraints from the + * system catalogs. + */ + if (expr_is_nonnullable(&subroot, expr, NOTNULL_SOURCE_SYSCACHE)) + continue; + + if (IsA(expr, Var)) + { + Var *var = (Var *) expr; + + /* + * For a plain Var, even if that didn't work, we can conclude that + * the Var is not nullable if find_nonnullable_vars can find a + * "var IS NOT NULL" or similarly strict condition among the quals + * on non-outerjoined-rels. Compute the list of Vars having such + * quals if we didn't already. + */ + if (!computed_nonnullable_vars) + { + find_subquery_safe_quals((Node *) query->jointree, &safe_quals); + safe_quals = (List *) + flatten_join_alias_vars(NULL, query, (Node *) safe_quals); + nonnullable_vars = find_nonnullable_vars((Node *) safe_quals); + computed_nonnullable_vars = true; + } + + if (!mbms_is_member(var->varno, + var->varattno - FirstLowInvalidHeapAttributeNumber, + nonnullable_vars)) + return false; /* we failed to prove the Var non-null */ + } + else + { + /* Punt otherwise */ + return false; + } + } + + return true; +} + +/* + * find_subquery_safe_quals + * Traverse jointree to locate quals on non-outerjoined-rels. + * + * We locate all WHERE and JOIN/ON quals that constrain the rels that are not + * below the nullable side of any outer join, and add them to the *safe_quals + * list (forming a list with implicit-AND semantics). These quals can be used + * to prove non-nullability of the subquery's outputs. + * + * Top-level caller must initialize *safe_quals to NIL. + */ +static void +find_subquery_safe_quals(Node *jtnode, List **safe_quals) +{ + if (jtnode == NULL) + return; + if (IsA(jtnode, RangeTblRef)) + { + /* Leaf node: nothing to do */ + return; + } + else if (IsA(jtnode, FromExpr)) + { + FromExpr *f = (FromExpr *) jtnode; + ListCell *lc; + + /* All elements of the FROM list are allowable */ + foreach(lc, f->fromlist) + find_subquery_safe_quals((Node *) lfirst(lc), safe_quals); + /* ... and its WHERE quals are too */ + if (f->quals) + *safe_quals = lappend(*safe_quals, f->quals); + } + else if (IsA(jtnode, JoinExpr)) + { + JoinExpr *j = (JoinExpr *) jtnode; + + switch (j->jointype) + { + case JOIN_INNER: + /* visit both children */ + find_subquery_safe_quals(j->larg, safe_quals); + find_subquery_safe_quals(j->rarg, safe_quals); + /* and grab the ON quals too */ + if (j->quals) + *safe_quals = lappend(*safe_quals, j->quals); + break; + + case JOIN_LEFT: + case JOIN_SEMI: + case JOIN_ANTI: + + /* + * Only the left input is possibly non-nullable; furthermore, + * the quals of this join don't constrain the left input. + * Note: we probably can't see SEMI or ANTI joins at this + * point, but if we do, we can treat them like LEFT joins. + */ + find_subquery_safe_quals(j->larg, safe_quals); + break; + + case JOIN_RIGHT: + /* Reverse of the above case */ + find_subquery_safe_quals(j->rarg, safe_quals); + break; + + case JOIN_FULL: + /* Neither side is non-nullable, so stop descending */ + break; + + default: + elog(ERROR, "unrecognized join type: %d", + (int) j->jointype); + break; + } + } + else + elog(ERROR, "unrecognized node type: %d", + (int) nodeTag(jtnode)); +} + /* * Can we treat a ScalarArrayOpExpr as strict? * @@ -2739,7 +2969,8 @@ eval_const_expressions_mutator(Node *node, if (!has_nullable_nonconst && !expr_is_nonnullable(context->root, - (Expr *) lfirst(arg), false)) + (Expr *) lfirst(arg), + NOTNULL_SOURCE_HASHTABLE)) has_nullable_nonconst = true; } } @@ -3418,7 +3649,8 @@ eval_const_expressions_mutator(Node *node, newargs = lappend(newargs, e); break; } - if (expr_is_nonnullable(context->root, (Expr *) e, false)) + if (expr_is_nonnullable(context->root, (Expr *) e, + NOTNULL_SOURCE_HASHTABLE)) { if (newargs == NIL) return e; /* first expr */ @@ -3612,7 +3844,7 @@ eval_const_expressions_mutator(Node *node, */ if (relem && expr_is_nonnullable(context->root, (Expr *) relem, - false)) + NOTNULL_SOURCE_HASHTABLE)) { if (ntest->nulltesttype == IS_NULL) return makeBoolConst(false, false); @@ -3664,7 +3896,8 @@ eval_const_expressions_mutator(Node *node, return makeBoolConst(result, false); } if (!ntest->argisrow && arg && - expr_is_nonnullable(context->root, (Expr *) arg, false)) + expr_is_nonnullable(context->root, (Expr *) arg, + NOTNULL_SOURCE_HASHTABLE)) { bool result; @@ -3749,7 +3982,9 @@ eval_const_expressions_mutator(Node *node, return makeBoolConst(result, false); } - if (arg && expr_is_nonnullable(context->root, (Expr *) arg, false)) + if (arg && + expr_is_nonnullable(context->root, (Expr *) arg, + NOTNULL_SOURCE_HASHTABLE)) { /* * If arg is proven non-nullable, simplify to boolean @@ -4384,14 +4619,11 @@ simplify_aggref(Aggref *aggref, eval_const_expressions_context *context) * If the Var is defined NOT NULL and meanwhile is not nulled by any outer * joins or grouping sets, then we can know that it cannot be NULL. * - * use_rel_info indicates whether the corresponding RelOptInfo is available for - * use. + * "source" specifies where we should look for NOT NULL proofs. */ bool -var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info) +var_is_nonnullable(PlannerInfo *root, Var *var, NotNullSource source) { - Bitmapset *notnullattnums = NULL; - Assert(IsA(var, Var)); /* skip upper-level Vars */ @@ -4406,35 +4638,89 @@ var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info) if (var->varattno < 0) return true; - /* - * Check if the Var is defined as NOT NULL. We retrieve the column NOT - * NULL constraint information from the corresponding RelOptInfo if it is - * available; otherwise, we search the hash table for this information. - */ - if (use_rel_info) - { - RelOptInfo *rel = find_base_rel(root, var->varno); + /* we don't trust whole-row Vars */ + if (var->varattno == 0) + return false; - notnullattnums = rel->notnullattnums; - } - else + /* Check if the Var is defined as NOT NULL. */ + switch (source) { - RangeTblEntry *rte = planner_rt_fetch(var->varno, root); + case NOTNULL_SOURCE_RELOPT: + { + /* + * We retrieve the column NOT NULL constraint information from + * the corresponding RelOptInfo. + */ + RelOptInfo *rel; + Bitmapset *notnullattnums; - /* - * We must skip inheritance parent tables, as some child tables may - * have a NOT NULL constraint for a column while others may not. This - * cannot happen with partitioned tables, though. - */ - if (rte->inh && rte->relkind != RELKIND_PARTITIONED_TABLE) - return false; + rel = find_base_rel(root, var->varno); + notnullattnums = rel->notnullattnums; - notnullattnums = find_relation_notnullatts(root, rte->relid); - } + return bms_is_member(var->varattno, notnullattnums); + } + case NOTNULL_SOURCE_HASHTABLE: + { + /* + * We retrieve the column NOT NULL constraint information from + * the hash table. + */ + RangeTblEntry *rte; + Bitmapset *notnullattnums; - if (var->varattno > 0 && - bms_is_member(var->varattno, notnullattnums)) - return true; + rte = planner_rt_fetch(var->varno, root); + + /* We can only reason about ordinary relations */ + if (rte->rtekind != RTE_RELATION) + return false; + + /* + * We must skip inheritance parent tables, as some child + * tables may have a NOT NULL constraint for a column while + * others may not. This cannot happen with partitioned + * tables, though. + */ + if (rte->inh && rte->relkind != RELKIND_PARTITIONED_TABLE) + return false; + + notnullattnums = find_relation_notnullatts(root, rte->relid); + + return bms_is_member(var->varattno, notnullattnums); + } + case NOTNULL_SOURCE_SYSCACHE: + { + /* + * We look up the "attnotnull" field in the attribute + * relation. + */ + RangeTblEntry *rte; + + rte = planner_rt_fetch(var->varno, root); + + /* We can only reason about ordinary relations */ + if (rte->rtekind != RTE_RELATION) + return false; + + /* + * We must skip inheritance parent tables, as some child + * tables may have a NOT NULL constraint for a column while + * others may not. This cannot happen with partitioned + * tables, though. + * + * Note that we need to check if the relation actually has any + * children, as we might not have done that yet. + */ + if (rte->inh && has_subclass(rte->relid) && + rte->relkind != RELKIND_PARTITIONED_TABLE) + return false; + + return get_attnotnull(rte->relid, var->varattno); + } + default: + elog(ERROR, "unrecognized NotNullSource: %d", + (int) source); + break; + } return false; } @@ -4444,16 +4730,22 @@ var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info) * * Returns true iff the given 'expr' cannot produce SQL NULLs. * - * If 'use_rel_info' is true, nullability of Vars is checked via the - * corresponding RelOptInfo for the given Var. Some callers require - * nullability information before RelOptInfos are generated. These should - * pass 'use_rel_info' as false. + * source: specifies where we should look for NOT NULL proofs for Vars. + * - NOTNULL_SOURCE_RELOPT: Used when RelOptInfos have been generated. We + * retrieve nullability information directly from the RelOptInfo corresponding + * to the Var. + * - NOTNULL_SOURCE_HASHTABLE: Used when RelOptInfos are not yet available, + * but we have already collected relation-level not-null constraints into the + * global hash table. + * - NOTNULL_SOURCE_SYSCACHE: Used for raw parse trees where neither + * RelOptInfos nor the hash table are available. In this case, we have to + * look up the 'attnotnull' field directly in the system catalogs. * * For now, we support only a limited set of expression types. Support for * additional node types can be added in the future. */ bool -expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) +expr_is_nonnullable(PlannerInfo *root, Expr *expr, NotNullSource source) { /* since this function recurses, it could be driven to stack overflow */ check_stack_depth(); @@ -4463,7 +4755,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) case T_Var: { if (root) - return var_is_nonnullable(root, (Var *) expr, use_rel_info); + return var_is_nonnullable(root, (Var *) expr, source); } break; case T_Const: @@ -4480,7 +4772,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) foreach_ptr(Expr, arg, coalesceexpr->args) { - if (expr_is_nonnullable(root, arg, use_rel_info)) + if (expr_is_nonnullable(root, arg, source)) return true; } } @@ -4495,7 +4787,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) foreach_ptr(Expr, arg, minmaxexpr->args) { - if (expr_is_nonnullable(root, arg, use_rel_info)) + if (expr_is_nonnullable(root, arg, source)) return true; } } @@ -4511,13 +4803,13 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) /* The default result must be present and non-nullable */ if (caseexpr->defresult == NULL || - !expr_is_nonnullable(root, caseexpr->defresult, use_rel_info)) + !expr_is_nonnullable(root, caseexpr->defresult, source)) return false; /* All branch results must be non-nullable */ foreach_ptr(CaseWhen, casewhen, caseexpr->args) { - if (!expr_is_nonnullable(root, casewhen->result, use_rel_info)) + if (!expr_is_nonnullable(root, casewhen->result, source)) return false; } @@ -4565,7 +4857,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) * non-nullable. */ return expr_is_nonnullable(root, ((RelabelType *) expr)->arg, - use_rel_info); + source); } default: break; diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c index 37d34685b93..6c8fb7b7275 100644 --- a/src/backend/utils/adt/int8.c +++ b/src/backend/utils/adt/int8.c @@ -834,7 +834,7 @@ int8inc_support(PG_FUNCTION_ARGS) PG_RETURN_POINTER(NULL); /* If the arg isn't NULLable, do the conversion */ - if (expr_is_nonnullable(req->root, arg, false)) + if (expr_is_nonnullable(req->root, arg, NOTNULL_SOURCE_HASHTABLE)) { Aggref *newagg; diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c index 1913b009d40..f10948483b9 100644 --- a/src/backend/utils/cache/lsyscache.c +++ b/src/backend/utils/cache/lsyscache.c @@ -858,6 +858,47 @@ comparison_ops_are_compatible(Oid opno1, Oid opno2) return result; } +/* + * op_is_safe_index_member + * Check if the operator is a member of a B-tree or Hash operator family. + * + * We use this check as a proxy for "null-safety": if an operator is trusted by + * the btree or hash opfamily, it implies that the operator adheres to standard + * boolean behavior, and would not return NULL when given valid non-null + * inputs, as doing so would break index integrity. + */ +bool +op_is_safe_index_member(Oid opno) +{ + bool result = false; + CatCList *catlist; + int i; + + /* + * Search pg_amop to see if the target operator is registered for any + * btree or hash opfamily. + */ + catlist = SearchSysCacheList1(AMOPOPID, ObjectIdGetDatum(opno)); + + for (i = 0; i < catlist->n_members; i++) + { + HeapTuple tuple = &catlist->members[i]->tuple; + Form_pg_amop aform = (Form_pg_amop) GETSTRUCT(tuple); + + /* Check if the AM is B-tree or Hash */ + if (aform->amopmethod == BTREE_AM_OID || + aform->amopmethod == HASH_AM_OID) + { + result = true; + break; + } + } + + ReleaseSysCacheList(catlist); + + return result; +} + /* ---------- AMPROC CACHES ---------- */ @@ -1071,6 +1112,33 @@ get_attoptions(Oid relid, int16 attnum) return result; } +/* + * get_attnotnull + * + * Given the relation id and the attribute number, + * return the "attnotnull" field from the attribute relation. + */ +bool +get_attnotnull(Oid relid, AttrNumber attnum) +{ + HeapTuple tp; + bool result = false; + + tp = SearchSysCache2(ATTNUM, + ObjectIdGetDatum(relid), + Int16GetDatum(attnum)); + + if (HeapTupleIsValid(tp)) + { + Form_pg_attribute att_tup = (Form_pg_attribute) GETSTRUCT(tp); + + result = att_tup->attnotnull; + ReleaseSysCache(tp); + } + + return result; +} + /* ---------- PG_CAST CACHE ---------- */ /* diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h index a64034e8a6d..853a28c0007 100644 --- a/src/include/optimizer/clauses.h +++ b/src/include/optimizer/clauses.h @@ -42,6 +42,7 @@ extern Relids find_nonnullable_rels(Node *clause); extern List *find_nonnullable_vars(Node *clause); extern List *find_forced_null_vars(Node *node); extern Var *find_forced_null_var(Node *node); +extern bool query_outputs_are_not_nullable(Query *query); extern bool is_pseudo_constant_clause(Node *clause); extern bool is_pseudo_constant_clause_relids(Node *clause, Relids relids); diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h index 3d27a019609..685485148ac 100644 --- a/src/include/optimizer/optimizer.h +++ b/src/include/optimizer/optimizer.h @@ -130,6 +130,14 @@ extern Expr *canonicalize_qual(Expr *qual, bool is_check); /* in util/clauses.c: */ +/* Enum to specify where var_is_nonnullable should look for NOT NULL proofs */ +typedef enum +{ + NOTNULL_SOURCE_RELOPT, /* Use RelOptInfo */ + NOTNULL_SOURCE_HASHTABLE, /* Use Global Hash Table */ + NOTNULL_SOURCE_SYSCACHE, /* Use System Catalog */ +} NotNullSource; + extern bool contain_mutable_functions(Node *clause); extern bool contain_mutable_functions_after_planning(Expr *expr); extern bool contain_volatile_functions(Node *clause); @@ -145,10 +153,11 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node); extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod, Oid result_collation); -extern bool var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info); +extern bool var_is_nonnullable(PlannerInfo *root, Var *var, + NotNullSource source); extern bool expr_is_nonnullable(PlannerInfo *root, Expr *expr, - bool use_rel_info); + NotNullSource source); extern List *expand_function_arguments(List *args, bool include_out_arguments, Oid result_type, diff --git a/src/include/optimizer/subselect.h b/src/include/optimizer/subselect.h index 8a5503eb973..4ecccf46bd3 100644 --- a/src/include/optimizer/subselect.h +++ b/src/include/optimizer/subselect.h @@ -22,6 +22,7 @@ extern ScalarArrayOpExpr *convert_VALUES_to_ANY(PlannerInfo *root, Query *values); extern JoinExpr *convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, + bool under_not, Relids available_rels); extern JoinExpr *convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink, diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h index 5655aca4c14..b9ad84ecd41 100644 --- a/src/include/utils/lsyscache.h +++ b/src/include/utils/lsyscache.h @@ -89,6 +89,7 @@ extern bool get_op_hash_functions(Oid opno, extern List *get_op_index_interpretation(Oid opno); extern bool equality_ops_are_compatible(Oid opno1, Oid opno2); extern bool comparison_ops_are_compatible(Oid opno1, Oid opno2); +extern bool op_is_safe_index_member(Oid opno); extern Oid get_opfamily_proc(Oid opfamily, Oid lefttype, Oid righttype, int16 procnum); extern char *get_attname(Oid relid, AttrNumber attnum, bool missing_ok); @@ -98,6 +99,7 @@ extern Oid get_atttype(Oid relid, AttrNumber attnum); extern void get_atttypetypmodcoll(Oid relid, AttrNumber attnum, Oid *typid, int32 *typmod, Oid *collid); extern Datum get_attoptions(Oid relid, int16 attnum); +extern bool get_attnotnull(Oid relid, AttrNumber attnum); extern Oid get_cast_oid(Oid sourcetypeid, Oid targettypeid, bool missing_ok); extern char *get_collation_name(Oid colloid); extern bool get_collation_isdeterministic(Oid colloid); diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out index 2135d82884d..200236a0a69 100644 --- a/src/test/regress/expected/subselect.out +++ b/src/test/regress/expected/subselect.out @@ -3323,3 +3323,442 @@ SELECT ten FROM onek t WHERE 1.0::integer IN ((VALUES (1), (3))); Seq Scan on onek t (1 row) +-- +-- Check NOT IN performs an ANTI JOIN when both the outer query's expressions +-- and the sub-select's output columns are provably non-nullable, and the +-- operator itself cannot return NULL for non-null inputs. +-- +BEGIN; +CREATE TEMP TABLE not_null_tab (id int NOT NULL, val int NOT NULL); +CREATE TEMP TABLE null_tab (id int, val int); +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +----------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = not_null_tab_1.id) + -> Seq Scan on not_null_tab + -> Hash + -> Seq Scan on not_null_tab not_null_tab_1 +(5 rows) + +-- No ANTI JOIN: outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> Seq Scan on not_null_tab +(4 rows) + +-- No ANTI JOIN: inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> Seq Scan on null_tab +(4 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab WHERE id IS NOT NULL); + QUERY PLAN +---------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = null_tab.id) + -> Seq Scan on not_null_tab + -> Hash + -> Seq Scan on null_tab + Filter: (id IS NOT NULL) +(6 rows) + +-- No ANTI JOIN: outer side is nullable (we don't check outer query quals for now) +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id IS NOT NULL + AND id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +--------------------------------------------------------------------------------- + Seq Scan on null_tab + Filter: ((id IS NOT NULL) AND (NOT (ANY (id = (hashed SubPlan any_1).col1)))) + SubPlan any_1 + -> Seq Scan on not_null_tab +(4 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is defined NOT NULL +-- and is not nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + QUERY PLAN +----------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = t1.id) + -> Seq Scan on not_null_tab + -> Hash + -> Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(12 rows) + +-- No ANTI JOIN: inner side is defined NOT NULL but is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t2.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(11 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t2.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id + WHERE t2.id IS NOT NULL +); + QUERY PLAN +----------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = t2.id) + -> Seq Scan on not_null_tab + -> Hash + -> Merge Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(12 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + LEFT JOIN null_tab t2 ON t1.id = t2.id + WHERE t1.id IS NOT NULL +); + QUERY PLAN +---------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = t1.id) + -> Seq Scan on not_null_tab + -> Hash + -> Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on null_tab t1 + Filter: (id IS NOT NULL) + -> Sort + Sort Key: t2.id + -> Seq Scan on null_tab t2 +(13 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + INNER JOIN null_tab t2 ON t1.id = t2.id + LEFT JOIN null_tab t3 ON TRUE +); + QUERY PLAN +------------------------------------------------- + Merge Anti Join + Merge Cond: (not_null_tab.id = t1.id) + -> Sort + Sort Key: not_null_tab.id + -> Seq Scan on not_null_tab + -> Nested Loop Left Join + -> Merge Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on null_tab t2 + -> Materialize + -> Seq Scan on null_tab t3 +(16 rows) + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t1.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +---------------------------------------------------- + Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Hash Anti Join + Hash Cond: (t1.id = not_null_tab.id) + -> Seq Scan on not_null_tab t1 + -> Hash + -> Seq Scan on not_null_tab + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(12 rows) + +-- No ANTI JOIN: outer side is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t2.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +------------------------------------------------------------- + Merge Left Join + Merge Cond: (t1.id = t2.id) + Filter: (NOT (ANY (t2.id = (hashed SubPlan any_1).col1))) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 + SubPlan any_1 + -> Seq Scan on not_null_tab +(11 rows) + +-- No ANTI JOIN: sublink is in an outer join's ON qual and references the +-- non-nullable side +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t1.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +------------------------------------------------------------------ + Nested Loop Left Join + Join Filter: (NOT (ANY (t1.id = (hashed SubPlan any_1).col1))) + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Seq Scan on not_null_tab t2 + SubPlan any_1 + -> Seq Scan on not_null_tab +(7 rows) + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t2.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +---------------------------------------------------- + Nested Loop Left Join + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Hash Anti Join + Hash Cond: (t2.id = not_null_tab.id) + -> Seq Scan on not_null_tab t2 + -> Hash + -> Seq Scan on not_null_tab +(8 rows) + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT id, val FROM not_null_tab); + QUERY PLAN +--------------------------------------------------------------------------------------------------- + Merge Anti Join + Merge Cond: ((not_null_tab.id = not_null_tab_1.id) AND (not_null_tab.val = not_null_tab_1.val)) + -> Sort + Sort Key: not_null_tab.id, not_null_tab.val + -> Seq Scan on not_null_tab + -> Sort + Sort Key: not_null_tab_1.id, not_null_tab_1.val + -> Seq Scan on not_null_tab not_null_tab_1 +(8 rows) + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE NOT (id, val) > ANY (SELECT id, val FROM not_null_tab); + QUERY PLAN +------------------------------------------------------------------------------------------------------ + Nested Loop Anti Join + Join Filter: (ROW(not_null_tab.id, not_null_tab.val) > ROW(not_null_tab_1.id, not_null_tab_1.val)) + -> Seq Scan on not_null_tab + -> Materialize + -> Seq Scan on not_null_tab not_null_tab_1 +(5 rows) + +-- No ANTI JOIN: one column of the outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1, null_tab t2 +WHERE (t1.id, t2.id) NOT IN (SELECT id, val FROM not_null_tab); + QUERY PLAN +-------------------------------------------------------------------------------------------------------------- + Nested Loop + Join Filter: (NOT (ANY ((t1.id = (hashed SubPlan any_1).col1) AND (t2.id = (hashed SubPlan any_1).col2)))) + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Seq Scan on null_tab t2 + SubPlan any_1 + -> Seq Scan on not_null_tab +(7 rows) + +-- No ANTI JOIN: one column of the inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT t1.id, t2.id FROM not_null_tab t1, null_tab t2); + QUERY PLAN +-------------------------------------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY ((id = (SubPlan any_1).col1) AND (val = (SubPlan any_1).col2)))) + SubPlan any_1 + -> Materialize + -> Nested Loop + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Seq Scan on null_tab t2 +(8 rows) + +-- ANTI JOIN: COALESCE(nullable, constant) is non-nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE COALESCE(id, -1) NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +----------------------------------------------------------------------- + Hash Anti Join + Hash Cond: (COALESCE(null_tab.id, '-1'::integer) = not_null_tab.id) + -> Seq Scan on null_tab + -> Hash + -> Seq Scan on not_null_tab +(5 rows) + +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT COALESCE(id, -1) FROM null_tab); + QUERY PLAN +----------------------------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = COALESCE(null_tab.id, '-1'::integer)) + -> Seq Scan on not_null_tab + -> Hash + -> Seq Scan on null_tab +(5 rows) + +-- ANTI JOIN: GROUP BY (without Grouping Sets) preserves the non-nullability of +-- the column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab GROUP BY id); + QUERY PLAN +----------------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = not_null_tab_1.id) + -> Seq Scan on not_null_tab + -> Hash + -> HashAggregate + Group Key: not_null_tab_1.id + -> Seq Scan on not_null_tab not_null_tab_1 +(7 rows) + +-- No ANTI JOIN: GROUP BY on a nullable column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab GROUP BY id); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> HashAggregate + Group Key: null_tab.id + -> Seq Scan on null_tab +(6 rows) + +-- No ANTI JOIN: Grouping Sets can introduce NULLs +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT id + FROM not_null_tab + GROUP BY GROUPING SETS ((id), (val)) +); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> HashAggregate + Hash Key: not_null_tab_1.id + Hash Key: not_null_tab_1.val + -> Seq Scan on not_null_tab not_null_tab_1 +(7 rows) + +-- create a custom "unsafe" equality operator +CREATE FUNCTION int4eq_unsafe(int4, int4) + RETURNS bool + AS 'int4eq' + LANGUAGE internal IMMUTABLE; +CREATE OPERATOR ?= ( + PROCEDURE = int4eq_unsafe, + LEFTARG = int4, + RIGHTARG = int4 +); +-- No ANTI JOIN: the operator is not safe +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE NOT id ?= ANY (SELECT id FROM not_null_tab); + QUERY PLAN +------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id ?= (SubPlan any_1).col1))) + SubPlan any_1 + -> Materialize + -> Seq Scan on not_null_tab not_null_tab_1 +(5 rows) + +ROLLBACK; diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql index cadc3293687..4cd016f4ac3 100644 --- a/src/test/regress/sql/subselect.sql +++ b/src/test/regress/sql/subselect.sql @@ -1448,3 +1448,188 @@ SELECT * FROM onek t1, lateral (SELECT * FROM onek t2 WHERE t2.ten IN (values (t -- VtA causes the whole expression to be evaluated as a constant EXPLAIN (COSTS OFF) SELECT ten FROM onek t WHERE 1.0::integer IN ((VALUES (1), (3))); + +-- +-- Check NOT IN performs an ANTI JOIN when both the outer query's expressions +-- and the sub-select's output columns are provably non-nullable, and the +-- operator itself cannot return NULL for non-null inputs. +-- + +BEGIN; + +CREATE TEMP TABLE not_null_tab (id int NOT NULL, val int NOT NULL); +CREATE TEMP TABLE null_tab (id int, val int); + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab WHERE id IS NOT NULL); + +-- No ANTI JOIN: outer side is nullable (we don't check outer query quals for now) +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id IS NOT NULL + AND id NOT IN (SELECT id FROM not_null_tab); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is defined NOT NULL +-- and is not nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + +-- No ANTI JOIN: inner side is defined NOT NULL but is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t2.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t2.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id + WHERE t2.id IS NOT NULL +); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + LEFT JOIN null_tab t2 ON t1.id = t2.id + WHERE t1.id IS NOT NULL +); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + INNER JOIN null_tab t2 ON t1.id = t2.id + LEFT JOIN null_tab t3 ON TRUE +); + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t1.id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: outer side is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t2.id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: sublink is in an outer join's ON qual and references the +-- non-nullable side +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t1.id NOT IN (SELECT id FROM not_null_tab); + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t2.id NOT IN (SELECT id FROM not_null_tab); + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT id, val FROM not_null_tab); + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE NOT (id, val) > ANY (SELECT id, val FROM not_null_tab); + +-- No ANTI JOIN: one column of the outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1, null_tab t2 +WHERE (t1.id, t2.id) NOT IN (SELECT id, val FROM not_null_tab); + +-- No ANTI JOIN: one column of the inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT t1.id, t2.id FROM not_null_tab t1, null_tab t2); + +-- ANTI JOIN: COALESCE(nullable, constant) is non-nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE COALESCE(id, -1) NOT IN (SELECT id FROM not_null_tab); + +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT COALESCE(id, -1) FROM null_tab); + +-- ANTI JOIN: GROUP BY (without Grouping Sets) preserves the non-nullability of +-- the column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab GROUP BY id); + +-- No ANTI JOIN: GROUP BY on a nullable column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab GROUP BY id); + +-- No ANTI JOIN: Grouping Sets can introduce NULLs +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT id + FROM not_null_tab + GROUP BY GROUPING SETS ((id), (val)) +); + +-- create a custom "unsafe" equality operator +CREATE FUNCTION int4eq_unsafe(int4, int4) + RETURNS bool + AS 'int4eq' + LANGUAGE internal IMMUTABLE; + +CREATE OPERATOR ?= ( + PROCEDURE = int4eq_unsafe, + LEFTARG = int4, + RIGHTARG = int4 +); + +-- No ANTI JOIN: the operator is not safe +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE NOT id ?= ANY (SELECT id FROM not_null_tab); + +ROLLBACK; diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index 77e3c04144e..e6f896d663b 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -1788,6 +1788,7 @@ Node NodeTag NonEmptyRange NoneCompressorState +NotNullSource Notification NotificationList NotifyStmt -- 2.39.5 (Apple Git-154) ^ permalink raw reply [nested|flat] 13+ messages in thread
* Re: Convert NOT IN sublinks to anti-joins when safe @ 2026-03-05 01:57 Japin Li <[email protected]> parent: Richard Guo <[email protected]> 0 siblings, 2 replies; 13+ messages in thread From: Japin Li @ 2026-03-05 01:57 UTC (permalink / raw) To: Richard Guo <[email protected]>; +Cc: Pg Hackers <[email protected]> Hi, Richard On Wed, 04 Mar 2026 at 18:52, Richard Guo <[email protected]> wrote: > On Sat, Feb 14, 2026 at 4:37 PM Richard Guo <[email protected]> wrote: >> On Thu, Feb 5, 2026 at 3:51 PM Richard Guo <[email protected]> wrote: >> > Attached is the updated patch, which adds the check requiring the >> > operator to be a member of a btree or hash opfamily. > >> Attached is another updated patch rebased on current master, with the >> addition of support for RowCompareExpr to handle multi-column ordering >> operations; otherwise unchanged. > > Attached is another updated patch rebased on current master, with some > minor cosmetic adjustments; nothing essential has changed. > Thank you for working on this! Just a quick note: I think `foreach_ptr` is more appropriate here than `foreach`. diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c index 299b3354f6d..0d31861da7f 100644 --- a/src/backend/optimizer/plan/subselect.c +++ b/src/backend/optimizer/plan/subselect.c @@ -1484,7 +1484,6 @@ sublink_testexpr_is_not_nullable(PlannerInfo *root, SubLink *sublink) { Node *testexpr = sublink->testexpr; List *outer_exprs = NIL; - ListCell *lc; /* Punt if sublink is not in the expected format */ if (sublink->subLinkType != ANY_SUBLINK || testexpr == NULL) @@ -1514,10 +1513,8 @@ sublink_testexpr_is_not_nullable(PlannerInfo *root, SubLink *sublink) /* multi-column equality or inequality checks */ BoolExpr *bexpr = (BoolExpr *) testexpr; - foreach(lc, bexpr->args) + foreach_ptr(OpExpr, opexpr, bexpr->args) { - OpExpr *opexpr = (OpExpr *) lfirst(lc); - if (!IsA(opexpr, OpExpr)) return false; @@ -1537,10 +1534,8 @@ sublink_testexpr_is_not_nullable(PlannerInfo *root, SubLink *sublink) /* multi-column ordering checks */ RowCompareExpr *rcexpr = (RowCompareExpr *) testexpr; - foreach(lc, rcexpr->opnos) + foreach_oid(opno, rcexpr->opnos) { - Oid opno = lfirst_oid(lc); - /* verify operator safety; see comment above */ if (!op_is_safe_index_member(opno)) return false; @@ -1566,10 +1561,8 @@ sublink_testexpr_is_not_nullable(PlannerInfo *root, SubLink *sublink) flatten_join_alias_vars(root, root->parse, (Node *) outer_exprs); /* Check that every outer expression is non-nullable */ - foreach(lc, outer_exprs) + foreach_ptr(Expr, expr, outer_exprs) { - Expr *expr = (Expr *) lfirst(lc); - /* * We have already collected relation-level not-null constraints for * the outer query, so we can consult the global hash table for diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c index c47c9da4a9b..3f3baf2149a 100644 --- a/src/backend/optimizer/util/clauses.c +++ b/src/backend/optimizer/util/clauses.c @@ -2052,7 +2052,6 @@ query_outputs_are_not_nullable(Query *query) List *safe_quals = NIL; List *nonnullable_vars = NIL; bool computed_nonnullable_vars = false; - ListCell *tl; /* * If the query contains set operations, punt. The set ops themselves @@ -2083,9 +2082,8 @@ query_outputs_are_not_nullable(Query *query) /* * Examine each targetlist entry to prove that it can't produce NULL. */ - foreach(tl, query->targetList) + foreach_ptr(TargetEntry, tle, query->targetList) { - TargetEntry *tle = (TargetEntry *) lfirst(tl); Expr *expr = tle->expr; /* Resjunk columns can be ignored: they don't produce output values */ @@ -2194,11 +2192,10 @@ find_subquery_safe_quals(Node *jtnode, List **safe_quals) else if (IsA(jtnode, FromExpr)) { FromExpr *f = (FromExpr *) jtnode; - ListCell *lc; /* All elements of the FROM list are allowable */ - foreach(lc, f->fromlist) - find_subquery_safe_quals((Node *) lfirst(lc), safe_quals); + foreach_ptr(Node, node, f->fromlist) + find_subquery_safe_quals(node, safe_quals); /* ... and its WHERE quals are too */ if (f->quals) *safe_quals = lappend(*safe_quals, f->quals); > - Richard > > [2. text/x-diff; v4-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patch]... -- Regards, Japin Li ChengDu WenWu Information Technology Co., Ltd. ^ permalink raw reply [nested|flat] 13+ messages in thread
* Re: Convert NOT IN sublinks to anti-joins when safe @ 2026-03-05 03:40 wenhui qiu <[email protected]> parent: Japin Li <[email protected]> 1 sibling, 0 replies; 13+ messages in thread From: wenhui qiu @ 2026-03-05 03:40 UTC (permalink / raw) To: Japin Li <[email protected]>; +Cc: Richard Guo <[email protected]>; Pg Hackers <[email protected]> HI Japin > Just a quick note: I think `foreach_ptr` is more appropriate here than `foreach`. Thank you review this path , I think using foreach_node should be safe ``` #define foreach_ptr(type, var, lst) foreach_internal(type, *, var, lst, lfirst) #define foreach_int(var, lst) foreach_internal(int, , var, lst, lfirst_int) #define foreach_oid(var, lst) foreach_internal(Oid, , var, lst, lfirst_oid) #define foreach_xid(var, lst) foreach_internal(TransactionId, , var, lst, lfirst_xid) /* * The internal implementation of the above macros. Do not use directly. * * This macro actually generates two loops in order to declare two variables of * different types. The outer loop only iterates once, so we expect optimizing * compilers will unroll it, thereby optimizing it away. */ #define foreach_internal(type, pointer, var, lst, func) \ for (type pointer var = 0, pointer var##__outerloop = (type pointer) 1; \ var##__outerloop; \ var##__outerloop = 0) \ for (ForEachState var##__state = {(lst), 0}; \ (var##__state.l != NIL && \ var##__state.i < var##__state.l->length && \ (var = (type pointer) func(&var##__state.l->elements[var##__state.i]), true)); \ var##__state.i++) /* * foreach_node - * The same as foreach_ptr, but asserts that the element is of the specified * node type. */ #define foreach_node(type, var, lst) \ for (type * var = 0, *var##__outerloop = (type *) 1; \ var##__outerloop; \ var##__outerloop = 0) \ for (ForEachState var##__state = {(lst), 0}; \ (var##__state.l != NIL && \ var##__state.i < var##__state.l->length && \ (var = lfirst_node(type, &var##__state.l->elements[var##__state.i]), true)); \ var##__state.i++) ``` On Thu, Mar 5, 2026 at 9:58 AM Japin Li <[email protected]> wrote: > > Hi, Richard > > On Wed, 04 Mar 2026 at 18:52, Richard Guo <[email protected]> wrote: > > On Sat, Feb 14, 2026 at 4:37 PM Richard Guo <[email protected]> > wrote: > >> On Thu, Feb 5, 2026 at 3:51 PM Richard Guo <[email protected]> > wrote: > >> > Attached is the updated patch, which adds the check requiring the > >> > operator to be a member of a btree or hash opfamily. > > > >> Attached is another updated patch rebased on current master, with the > >> addition of support for RowCompareExpr to handle multi-column ordering > >> operations; otherwise unchanged. > > > > Attached is another updated patch rebased on current master, with some > > minor cosmetic adjustments; nothing essential has changed. > > > > Thank you for working on this! > > Just a quick note: I think `foreach_ptr` is more appropriate here than > `foreach`. > > diff --git a/src/backend/optimizer/plan/subselect.c > b/src/backend/optimizer/plan/subselect.c > index 299b3354f6d..0d31861da7f 100644 > --- a/src/backend/optimizer/plan/subselect.c > +++ b/src/backend/optimizer/plan/subselect.c > @@ -1484,7 +1484,6 @@ sublink_testexpr_is_not_nullable(PlannerInfo *root, > SubLink *sublink) > { > Node *testexpr = sublink->testexpr; > List *outer_exprs = NIL; > - ListCell *lc; > > /* Punt if sublink is not in the expected format */ > if (sublink->subLinkType != ANY_SUBLINK || testexpr == NULL) > @@ -1514,10 +1513,8 @@ sublink_testexpr_is_not_nullable(PlannerInfo *root, > SubLink *sublink) > /* multi-column equality or inequality checks */ > BoolExpr *bexpr = (BoolExpr *) testexpr; > > - foreach(lc, bexpr->args) > + foreach_ptr(OpExpr, opexpr, bexpr->args) > { > - OpExpr *opexpr = (OpExpr *) lfirst(lc); > - > if (!IsA(opexpr, OpExpr)) > return false; > > @@ -1537,10 +1534,8 @@ sublink_testexpr_is_not_nullable(PlannerInfo *root, > SubLink *sublink) > /* multi-column ordering checks */ > RowCompareExpr *rcexpr = (RowCompareExpr *) testexpr; > > - foreach(lc, rcexpr->opnos) > + foreach_oid(opno, rcexpr->opnos) > { > - Oid opno = lfirst_oid(lc); > - > /* verify operator safety; see comment above */ > if (!op_is_safe_index_member(opno)) > return false; > @@ -1566,10 +1561,8 @@ sublink_testexpr_is_not_nullable(PlannerInfo *root, > SubLink *sublink) > flatten_join_alias_vars(root, root->parse, (Node *) > outer_exprs); > > /* Check that every outer expression is non-nullable */ > - foreach(lc, outer_exprs) > + foreach_ptr(Expr, expr, outer_exprs) > { > - Expr *expr = (Expr *) lfirst(lc); > - > /* > * We have already collected relation-level not-null > constraints for > * the outer query, so we can consult the global hash > table for > diff --git a/src/backend/optimizer/util/clauses.c > b/src/backend/optimizer/util/clauses.c > index c47c9da4a9b..3f3baf2149a 100644 > --- a/src/backend/optimizer/util/clauses.c > +++ b/src/backend/optimizer/util/clauses.c > @@ -2052,7 +2052,6 @@ query_outputs_are_not_nullable(Query *query) > List *safe_quals = NIL; > List *nonnullable_vars = NIL; > bool computed_nonnullable_vars = false; > - ListCell *tl; > > /* > * If the query contains set operations, punt. The set ops > themselves > @@ -2083,9 +2082,8 @@ query_outputs_are_not_nullable(Query *query) > /* > * Examine each targetlist entry to prove that it can't produce > NULL. > */ > - foreach(tl, query->targetList) > + foreach_ptr(TargetEntry, tle, query->targetList) > { > - TargetEntry *tle = (TargetEntry *) lfirst(tl); > Expr *expr = tle->expr; > > /* Resjunk columns can be ignored: they don't produce > output values */ > @@ -2194,11 +2192,10 @@ find_subquery_safe_quals(Node *jtnode, List > **safe_quals) > else if (IsA(jtnode, FromExpr)) > { > FromExpr *f = (FromExpr *) jtnode; > - ListCell *lc; > > /* All elements of the FROM list are allowable */ > - foreach(lc, f->fromlist) > - find_subquery_safe_quals((Node *) lfirst(lc), > safe_quals); > + foreach_ptr(Node, node, f->fromlist) > + find_subquery_safe_quals(node, safe_quals); > /* ... and its WHERE quals are too */ > if (f->quals) > *safe_quals = lappend(*safe_quals, f->quals); > > > > - Richard > > > > [2. text/x-diff; > v4-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patch]... > > -- > Regards, > Japin Li > ChengDu WenWu Information Technology Co., Ltd. > > > ^ permalink raw reply [nested|flat] 13+ messages in thread
* Re: Convert NOT IN sublinks to anti-joins when safe @ 2026-03-08 08:16 Richard Guo <[email protected]> parent: Japin Li <[email protected]> 1 sibling, 2 replies; 13+ messages in thread From: Richard Guo @ 2026-03-08 08:16 UTC (permalink / raw) To: Japin Li <[email protected]>; +Cc: Pg Hackers <[email protected]> On Thu, Mar 5, 2026 at 10:57 AM Japin Li <[email protected]> wrote: > Just a quick note: I think `foreach_ptr` is more appropriate here than `foreach`. Agreed. For homogeneous lists, it is better to use foreach_node so we get the additional type-safety assertions in dev builds. For heterogeneous lists, we should use foreach_ptr. Attached is an updated version of the patch that replaces foreach with foreach_ptr or foreach_node accordingly. Nothing else has changed. I plan to do one more round of self-review on this. Unless there are any further thoughts or concerns, I'm hoping to commit this in a week or two. Please let me know if anyone spots anything else I might have missed. - Richard Attachments: [application/octet-stream] v5-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patch (61.6K, 2-v5-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patch) download | inline diff: From 8083043ad59b5f35540ef997fd0f68fdafc3bcc6 Mon Sep 17 00:00:00 2001 From: Richard Guo <[email protected]> Date: Fri, 30 Jan 2026 12:29:42 +0900 Subject: [PATCH v5] Convert NOT IN sublinks to anti-joins when safe The planner has historically been unable to convert "x NOT IN (SELECT y ...)" sublinks into anti-joins. This is because standard SQL semantics for NOT IN require that if the comparison "x = y" returns NULL, the "NOT IN" expression evaluates to NULL (effectively false), causing the row to be discarded. In contrast, an anti-join preserves the row if no match is found. Due to this semantic mismatch regarding NULL handling, the conversion was previously considered unsafe. However, if we can prove that neither side of the comparison can yield NULL values, and further that the operator itself cannot return NULL for non-null inputs, the behavior of NOT IN and anti-join becomes identical. Enabling this conversion allows the planner to treat the sublink as a first-class relation rather than an opaque SubPlan filter. This unlocks global join ordering optimization and permits the selection of the most efficient join algorithm based on cost, often yielding significant performance improvements for large datasets. This patch verifies that neither side of the comparison can be NULL and that the operator is safe regarding NULL results before performing the conversion. To verify operator safety, we require that the operator be a member of a B-tree or Hash operator family. This serves as a proxy for standard boolean behavior, ensuring the operator does not return NULL on valid non-null inputs, as doing so would break index integrity. For operand non-nullability, this patch makes use of several existing mechanisms. It leverages the outer-join-aware-Var infrastructure to verify that a Var does not come from the nullable side of an outer join, and consults the NOT-NULL-attnums hash table to efficiently verify schema-level NOT NULL constraints. Additionally, it employs find_nonnullable_vars to identify Vars forced non-nullable by qual clauses, and expr_is_nonnullable to deduce non-nullability for other expression types. The logic for verifying the non-nullability of the subquery outputs was adapted from prior work by David Rowley and Tom Lane. Author: Richard Guo <[email protected]> Reviewed-by: wenhui qiu <[email protected]> Reviewed-by: Zhang Mingli <[email protected]> Reviewed-by: Japin Li <[email protected]> Discussion: https://postgr.es/m/CAMbWs495eF=-fSa5CwJS6B-BaEi3ARp0UNb4Lt3EkgUGZJwkAQ@mail.gmail.com --- src/backend/optimizer/plan/initsplan.c | 4 +- src/backend/optimizer/plan/subselect.c | 159 +++++++- src/backend/optimizer/prep/prepjointree.c | 72 +++- src/backend/optimizer/util/clauses.c | 381 ++++++++++++++++--- src/backend/utils/adt/int8.c | 2 +- src/backend/utils/cache/lsyscache.c | 68 ++++ src/include/optimizer/clauses.h | 1 + src/include/optimizer/optimizer.h | 13 +- src/include/optimizer/subselect.h | 1 + src/include/utils/lsyscache.h | 2 + src/test/regress/expected/subselect.out | 439 ++++++++++++++++++++++ src/test/regress/sql/subselect.sql | 185 +++++++++ src/tools/pgindent/typedefs.list | 1 + 13 files changed, 1262 insertions(+), 66 deletions(-) diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c index 97ea95a4eb8..9aaf1c4e5ca 100644 --- a/src/backend/optimizer/plan/initsplan.c +++ b/src/backend/optimizer/plan/initsplan.c @@ -3447,7 +3447,7 @@ restriction_is_always_true(PlannerInfo *root, if (nulltest->argisrow) return false; - return expr_is_nonnullable(root, nulltest->arg, true); + return expr_is_nonnullable(root, nulltest->arg, NOTNULL_SOURCE_RELOPT); } /* If it's an OR, check its sub-clauses */ @@ -3512,7 +3512,7 @@ restriction_is_always_false(PlannerInfo *root, if (nulltest->argisrow) return false; - return expr_is_nonnullable(root, nulltest->arg, true); + return expr_is_nonnullable(root, nulltest->arg, NOTNULL_SOURCE_RELOPT); } /* If it's an OR, check its sub-clauses */ diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c index d7f3cedf3d5..0d31861da7f 100644 --- a/src/backend/optimizer/plan/subselect.c +++ b/src/backend/optimizer/plan/subselect.c @@ -91,6 +91,7 @@ static bool contain_outer_selfref(Node *node); static bool contain_outer_selfref_walker(Node *node, Index *depth); static void inline_cte(PlannerInfo *root, CommonTableExpr *cte); static bool inline_cte_walker(Node *node, inline_cte_walker_context *context); +static bool sublink_testexpr_is_not_nullable(PlannerInfo *root, SubLink *sublink); static bool simplify_EXISTS_query(PlannerInfo *root, Query *query); static Query *convert_EXISTS_to_ANY(PlannerInfo *root, Query *subselect, Node **testexpr, List **paramIds); @@ -1306,11 +1307,14 @@ convert_VALUES_to_ANY(PlannerInfo *root, Node *testexpr, Query *values) * If so, form a JoinExpr and return it. Return NULL if the SubLink cannot * be converted to a join. * - * The only non-obvious input parameter is available_rels: this is the set - * of query rels that can safely be referenced in the sublink expression. - * (We must restrict this to avoid changing the semantics when a sublink - * is present in an outer join's ON qual.) The conversion must fail if - * the converted qual would reference any but these parent-query relids. + * If under_not is true, the caller actually found NOT (ANY SubLink), so + * that what we must try to build is an ANTI not SEMI join. + * + * available_rels is the set of query rels that can safely be referenced + * in the sublink expression. (We must restrict this to avoid changing + * the semantics when a sublink is present in an outer join's ON qual.) + * The conversion must fail if the converted qual would reference any but + * these parent-query relids. * * On success, the returned JoinExpr has larg = NULL and rarg = the jointree * item representing the pulled-up subquery. The caller must set larg to @@ -1333,7 +1337,7 @@ convert_VALUES_to_ANY(PlannerInfo *root, Node *testexpr, Query *values) */ JoinExpr * convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, - Relids available_rels) + bool under_not, Relids available_rels) { JoinExpr *result; Query *parse = root->parse; @@ -1351,6 +1355,19 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, Assert(sublink->subLinkType == ANY_SUBLINK); + /* + * Per SQL spec, NOT IN is not ordinarily equivalent to an anti-join, so + * that by default we have to fail when under_not. However, if we can + * prove that neither the outer query's expressions nor the sub-select's + * output columns can be NULL, and further that the operator itself cannot + * return NULL for non-null inputs, then the logic is identical and it's + * safe to convert NOT IN to an anti-join. + */ + if (under_not && + (!sublink_testexpr_is_not_nullable(root, sublink) || + !query_outputs_are_not_nullable(subselect))) + return NULL; + /* * If the sub-select contains any Vars of the parent query, we treat it as * LATERAL. (Vars from higher levels don't matter here.) @@ -1428,7 +1445,7 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, * And finally, build the JoinExpr node. */ result = makeNode(JoinExpr); - result->jointype = JOIN_SEMI; + result->jointype = under_not ? JOIN_ANTI : JOIN_SEMI; result->isNatural = false; result->larg = NULL; /* caller must fill this in */ result->rarg = (Node *) rtr; @@ -1441,12 +1458,134 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, return result; } +/* + * sublink_testexpr_is_not_nullable: verify that testexpr of an ANY_SUBLINK + * guarantees a non-null result, assuming the inner side is also non-null. + * + * To ensure the expression never returns NULL, we require both that the outer + * expressions are provably non-nullable and that the operator itself is safe. + * We validate operator safety by checking for membership in a standard index + * operator family (B-tree or Hash); this acts as a proxy for standard boolean + * behavior, ensuring the operator does not produce NULL results from non-null + * inputs. + * + * We handle the three standard parser representations for ANY sublinks: a + * single OpExpr for single-column comparisons, a BoolExpr containing a list of + * OpExprs for multi-column equality or inequality checks (where equality + * becomes an AND and inequality becomes an OR), and a RowCompareExpr for + * multi-column ordering checks. In all cases, we validate the operators and + * the outer expressions. + * + * It is acceptable for this check not to be exhaustive. We can err on the + * side of conservatism: if we're not sure, it's okay to return FALSE. + */ +static bool +sublink_testexpr_is_not_nullable(PlannerInfo *root, SubLink *sublink) +{ + Node *testexpr = sublink->testexpr; + List *outer_exprs = NIL; + + /* Punt if sublink is not in the expected format */ + if (sublink->subLinkType != ANY_SUBLINK || testexpr == NULL) + return false; + + if (IsA(testexpr, OpExpr)) + { + /* single-column comparison */ + OpExpr *opexpr = (OpExpr *) testexpr; + + /* standard ANY structure should be op(outer_var, param) */ + if (list_length(opexpr->args) != 2) + return false; + + /* + * We rely on membership in a B-tree or Hash operator family as a + * guarantee that the operator acts as a proper boolean comparison and + * does not yield NULL for valid non-null inputs. + */ + if (!op_is_safe_index_member(opexpr->opno)) + return false; + + outer_exprs = lappend(outer_exprs, linitial(opexpr->args)); + } + else if (is_andclause(testexpr) || is_orclause(testexpr)) + { + /* multi-column equality or inequality checks */ + BoolExpr *bexpr = (BoolExpr *) testexpr; + + foreach_ptr(OpExpr, opexpr, bexpr->args) + { + if (!IsA(opexpr, OpExpr)) + return false; + + /* standard ANY structure should be op(outer_var, param) */ + if (list_length(opexpr->args) != 2) + return false; + + /* verify operator safety; see comment above */ + if (!op_is_safe_index_member(opexpr->opno)) + return false; + + outer_exprs = lappend(outer_exprs, linitial(opexpr->args)); + } + } + else if (IsA(testexpr, RowCompareExpr)) + { + /* multi-column ordering checks */ + RowCompareExpr *rcexpr = (RowCompareExpr *) testexpr; + + foreach_oid(opno, rcexpr->opnos) + { + /* verify operator safety; see comment above */ + if (!op_is_safe_index_member(opno)) + return false; + } + + outer_exprs = list_concat(outer_exprs, rcexpr->largs); + } + else + { + /* Punt if other node types */ + return false; + } + + /* + * Since the query hasn't yet been through expression preprocessing, we + * must apply flatten_join_alias_vars to the outer expressions to avoid + * being fooled by join aliases. + * + * We do not need to apply flatten_group_exprs though, since grouping Vars + * cannot appear in jointree quals. + */ + outer_exprs = (List *) + flatten_join_alias_vars(root, root->parse, (Node *) outer_exprs); + + /* Check that every outer expression is non-nullable */ + foreach_ptr(Expr, expr, outer_exprs) + { + /* + * We have already collected relation-level not-null constraints for + * the outer query, so we can consult the global hash table for + * nullability information. + */ + if (!expr_is_nonnullable(root, expr, NOTNULL_SOURCE_HASHTABLE)) + return false; + + /* + * Note: It is possible to further prove non-nullability by examining + * the qual clauses available at or below the jointree node where this + * NOT IN clause is evaluated, but for the moment it doesn't seem + * worth the extra complication. + */ + } + + return true; +} + /* * convert_EXISTS_sublink_to_join: try to convert an EXISTS SubLink to a join * - * The API of this function is identical to convert_ANY_sublink_to_join's, - * except that we also support the case where the caller has found NOT EXISTS, - * so we need an additional input parameter "under_not". + * The API of this function is identical to convert_ANY_sublink_to_join's. */ JoinExpr * convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink, diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c index c90f4b32733..b2beb0a0d68 100644 --- a/src/backend/optimizer/prep/prepjointree.c +++ b/src/backend/optimizer/prep/prepjointree.c @@ -852,14 +852,15 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node, if ((saop = convert_VALUES_to_ANY(root, sublink->testexpr, (Query *) sublink->subselect)) != NULL) - + { /* * The VALUES sequence was simplified. Nothing more to do * here. */ return (Node *) saop; + } - if ((j = convert_ANY_sublink_to_join(root, sublink, + if ((j = convert_ANY_sublink_to_join(root, sublink, false, available_rels1)) != NULL) { /* Yes; insert the new join node into the join tree */ @@ -885,7 +886,7 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node, return NULL; } if (available_rels2 != NULL && - (j = convert_ANY_sublink_to_join(root, sublink, + (j = convert_ANY_sublink_to_join(root, sublink, false, available_rels2)) != NULL) { /* Yes; insert the new join node into the join tree */ @@ -970,14 +971,68 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node, } if (is_notclause(node)) { - /* If the immediate argument of NOT is EXISTS, try to convert */ + /* If the immediate argument of NOT is ANY or EXISTS, try to convert */ SubLink *sublink = (SubLink *) get_notclausearg((Expr *) node); JoinExpr *j; Relids child_rels; if (sublink && IsA(sublink, SubLink)) { - if (sublink->subLinkType == EXISTS_SUBLINK) + if (sublink->subLinkType == ANY_SUBLINK) + { + if ((j = convert_ANY_sublink_to_join(root, sublink, true, + available_rels1)) != NULL) + { + /* Yes; insert the new join node into the join tree */ + j->larg = *jtlink1; + *jtlink1 = (Node *) j; + /* Recursively process pulled-up jointree nodes */ + j->rarg = pull_up_sublinks_jointree_recurse(root, + j->rarg, + &child_rels); + + /* + * Now recursively process the pulled-up quals. Because + * we are underneath a NOT, we can't pull up sublinks that + * reference the left-hand stuff, but it's still okay to + * pull up sublinks referencing j->rarg. + */ + j->quals = pull_up_sublinks_qual_recurse(root, + j->quals, + &j->rarg, + child_rels, + NULL, NULL); + /* Return NULL representing constant TRUE */ + return NULL; + } + if (available_rels2 != NULL && + (j = convert_ANY_sublink_to_join(root, sublink, true, + available_rels2)) != NULL) + { + /* Yes; insert the new join node into the join tree */ + j->larg = *jtlink2; + *jtlink2 = (Node *) j; + /* Recursively process pulled-up jointree nodes */ + j->rarg = pull_up_sublinks_jointree_recurse(root, + j->rarg, + &child_rels); + + /* + * Now recursively process the pulled-up quals. Because + * we are underneath a NOT, we can't pull up sublinks that + * reference the left-hand stuff, but it's still okay to + * pull up sublinks referencing j->rarg. + */ + j->quals = pull_up_sublinks_qual_recurse(root, + j->quals, + &j->rarg, + child_rels, + NULL, NULL); + /* Return NULL representing constant TRUE */ + return NULL; + } + } + else if (sublink->subLinkType == EXISTS_SUBLINK) { if ((j = convert_EXISTS_sublink_to_join(root, sublink, true, available_rels1)) != NULL) @@ -3706,6 +3761,13 @@ has_notnull_forced_var(PlannerInfo *root, List *forced_null_vars, rte = rt_fetch(varno, root->parse->rtable); + /* We can only reason about ordinary relations */ + if (rte->rtekind != RTE_RELATION) + { + bms_free(forcednullattnums); + continue; + } + /* * We must skip inheritance parent tables, as some child tables may * have a NOT NULL constraint for a column while others may not. This diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c index a41d81734cf..9852ba5762b 100644 --- a/src/backend/optimizer/util/clauses.c +++ b/src/backend/optimizer/util/clauses.c @@ -21,6 +21,7 @@ #include "access/htup_details.h" #include "catalog/pg_class.h" +#include "catalog/pg_inherits.h" #include "catalog/pg_language.h" #include "catalog/pg_operator.h" #include "catalog/pg_proc.h" @@ -112,6 +113,7 @@ static bool contain_context_dependent_node_walker(Node *node, int *flags); static bool contain_leaked_vars_walker(Node *node, void *context); static Relids find_nonnullable_rels_walker(Node *node, bool top_level); static List *find_nonnullable_vars_walker(Node *node, bool top_level); +static void find_subquery_safe_quals(Node *jtnode, List **safe_quals); static bool is_strict_saop(ScalarArrayOpExpr *expr, bool falseOK); static bool convert_saop_to_hashed_saop_walker(Node *node, void *context); static Node *eval_const_expressions_mutator(Node *node, @@ -1433,6 +1435,10 @@ contain_leaked_vars_walker(Node *node, void *context) context); } +/***************************************************************************** + * Nullability analysis + *****************************************************************************/ + /* * find_nonnullable_rels * Determine which base rels are forced nonnullable by given clause. @@ -1701,7 +1707,7 @@ find_nonnullable_rels_walker(Node *node, bool top_level) * but here we assume that the input is a Boolean expression, and wish to * see if NULL inputs will provably cause a FALSE-or-NULL result. We expect * the expression to have been AND/OR flattened and converted to implicit-AND - * format. + * format (but the results are still good if it wasn't AND/OR flattened). * * Attnos of the identified Vars are returned in a multibitmapset (a List of * Bitmapsets). List indexes correspond to relids (varnos), while the per-rel @@ -2021,6 +2027,227 @@ find_forced_null_var(Node *node) return NULL; } +/* + * query_outputs_are_not_nullable + * Returns TRUE if the output values of the Query are certainly not NULL. + * All output columns must return non-NULL to answer TRUE. + * + * The reason this takes a Query, and not just an individual tlist expression, + * is so that we can make use of the query's WHERE/ON clauses to prove it does + * not return nulls. + * + * In current usage, the passed sub-Query hasn't yet been through any planner + * processing. This means that applying find_nonnullable_vars() to its WHERE + * clauses isn't really ideal: for lack of const-simplification, we might be + * unable to prove not-nullness in some cases where we could have proved it + * afterwards. However, we should not get any false positive results. + * + * Like the other forms of nullability analysis above, we can err on the + * side of conservatism: if we're not sure, it's okay to return FALSE. + */ +bool +query_outputs_are_not_nullable(Query *query) +{ + PlannerInfo subroot; + List *safe_quals = NIL; + List *nonnullable_vars = NIL; + bool computed_nonnullable_vars = false; + + /* + * If the query contains set operations, punt. The set ops themselves + * couldn't introduce nulls that weren't in their inputs, but the tlist + * present in the top-level query is just dummy and won't give us useful + * info. We could get an answer by recursing to examine each leaf query, + * but for the moment it doesn't seem worth the extra complication. + */ + if (query->setOperations) + return false; + + /* + * If the query contains grouping sets, punt. Grouping sets can introduce + * NULL values, and we currently lack the PlannerInfo needed to flatten + * grouping Vars in the query's outputs. + */ + if (query->groupingSets) + return false; + + /* + * We need a PlannerInfo to pass to expr_is_nonnullable. Fortunately, we + * can cons up an entirely dummy one, because only the "parse" link in the + * struct is used by expr_is_nonnullable. + */ + MemSet(&subroot, 0, sizeof(subroot)); + subroot.parse = query; + + /* + * Examine each targetlist entry to prove that it can't produce NULL. + */ + foreach_node(TargetEntry, tle, query->targetList) + { + Expr *expr = tle->expr; + + /* Resjunk columns can be ignored: they don't produce output values */ + if (tle->resjunk) + continue; + + /* + * Look through binary relabelings, since we know those don't + * introduce nulls. + */ + while (expr && IsA(expr, RelabelType)) + expr = ((RelabelType *) expr)->arg; + + if (expr == NULL) /* paranoia */ + return false; + + /* + * Since the subquery hasn't yet been through expression + * preprocessing, we must apply flatten_join_alias_vars to the given + * expression, and to the quals found by find_subquery_safe_quals, to + * avoid being fooled by join aliases. + * + * We won't be dealing with arbitrary expressions, so it's safe to use + * NULL as the root, so long as adjust_standard_join_alias_expression + * can handle everything the parser would make as a join alias + * expression. + */ + expr = (Expr *) flatten_join_alias_vars(NULL, query, (Node *) expr); + + /* + * Similarly, we must apply flatten_group_exprs to the given + * expression to avoid being fooled by grouping Vars. We do not need + * to apply flatten_group_exprs to the quals found by + * find_subquery_safe_quals though, as grouping Vars cannot appear in + * jointree quals. + * + * We have verified that the query does not contain grouping sets, so + * it's safe to use NULL as the root here. + */ + expr = (Expr *) flatten_group_exprs(NULL, query, (Node *) expr); + + /* + * Check to see if the expr cannot be NULL. Since we're on a raw + * parse tree, we need to look up the not-null constraints from the + * system catalogs. + */ + if (expr_is_nonnullable(&subroot, expr, NOTNULL_SOURCE_SYSCACHE)) + continue; + + if (IsA(expr, Var)) + { + Var *var = (Var *) expr; + + /* + * For a plain Var, even if that didn't work, we can conclude that + * the Var is not nullable if find_nonnullable_vars can find a + * "var IS NOT NULL" or similarly strict condition among the quals + * on non-outerjoined-rels. Compute the list of Vars having such + * quals if we didn't already. + */ + if (!computed_nonnullable_vars) + { + find_subquery_safe_quals((Node *) query->jointree, &safe_quals); + safe_quals = (List *) + flatten_join_alias_vars(NULL, query, (Node *) safe_quals); + nonnullable_vars = find_nonnullable_vars((Node *) safe_quals); + computed_nonnullable_vars = true; + } + + if (!mbms_is_member(var->varno, + var->varattno - FirstLowInvalidHeapAttributeNumber, + nonnullable_vars)) + return false; /* we failed to prove the Var non-null */ + } + else + { + /* Punt otherwise */ + return false; + } + } + + return true; +} + +/* + * find_subquery_safe_quals + * Traverse jointree to locate quals on non-outerjoined-rels. + * + * We locate all WHERE and JOIN/ON quals that constrain the rels that are not + * below the nullable side of any outer join, and add them to the *safe_quals + * list (forming a list with implicit-AND semantics). These quals can be used + * to prove non-nullability of the subquery's outputs. + * + * Top-level caller must initialize *safe_quals to NIL. + */ +static void +find_subquery_safe_quals(Node *jtnode, List **safe_quals) +{ + if (jtnode == NULL) + return; + if (IsA(jtnode, RangeTblRef)) + { + /* Leaf node: nothing to do */ + return; + } + else if (IsA(jtnode, FromExpr)) + { + FromExpr *f = (FromExpr *) jtnode; + + /* All elements of the FROM list are allowable */ + foreach_ptr(Node, child_node, f->fromlist) + find_subquery_safe_quals(child_node, safe_quals); + /* ... and its WHERE quals are too */ + if (f->quals) + *safe_quals = lappend(*safe_quals, f->quals); + } + else if (IsA(jtnode, JoinExpr)) + { + JoinExpr *j = (JoinExpr *) jtnode; + + switch (j->jointype) + { + case JOIN_INNER: + /* visit both children */ + find_subquery_safe_quals(j->larg, safe_quals); + find_subquery_safe_quals(j->rarg, safe_quals); + /* and grab the ON quals too */ + if (j->quals) + *safe_quals = lappend(*safe_quals, j->quals); + break; + + case JOIN_LEFT: + case JOIN_SEMI: + case JOIN_ANTI: + + /* + * Only the left input is possibly non-nullable; furthermore, + * the quals of this join don't constrain the left input. + * Note: we probably can't see SEMI or ANTI joins at this + * point, but if we do, we can treat them like LEFT joins. + */ + find_subquery_safe_quals(j->larg, safe_quals); + break; + + case JOIN_RIGHT: + /* Reverse of the above case */ + find_subquery_safe_quals(j->rarg, safe_quals); + break; + + case JOIN_FULL: + /* Neither side is non-nullable, so stop descending */ + break; + + default: + elog(ERROR, "unrecognized join type: %d", + (int) j->jointype); + break; + } + } + else + elog(ERROR, "unrecognized node type: %d", + (int) nodeTag(jtnode)); +} + /* * Can we treat a ScalarArrayOpExpr as strict? * @@ -2739,7 +2966,8 @@ eval_const_expressions_mutator(Node *node, if (!has_nullable_nonconst && !expr_is_nonnullable(context->root, - (Expr *) lfirst(arg), false)) + (Expr *) lfirst(arg), + NOTNULL_SOURCE_HASHTABLE)) has_nullable_nonconst = true; } } @@ -3418,7 +3646,8 @@ eval_const_expressions_mutator(Node *node, newargs = lappend(newargs, e); break; } - if (expr_is_nonnullable(context->root, (Expr *) e, false)) + if (expr_is_nonnullable(context->root, (Expr *) e, + NOTNULL_SOURCE_HASHTABLE)) { if (newargs == NIL) return e; /* first expr */ @@ -3612,7 +3841,7 @@ eval_const_expressions_mutator(Node *node, */ if (relem && expr_is_nonnullable(context->root, (Expr *) relem, - false)) + NOTNULL_SOURCE_HASHTABLE)) { if (ntest->nulltesttype == IS_NULL) return makeBoolConst(false, false); @@ -3664,7 +3893,8 @@ eval_const_expressions_mutator(Node *node, return makeBoolConst(result, false); } if (!ntest->argisrow && arg && - expr_is_nonnullable(context->root, (Expr *) arg, false)) + expr_is_nonnullable(context->root, (Expr *) arg, + NOTNULL_SOURCE_HASHTABLE)) { bool result; @@ -3749,7 +3979,9 @@ eval_const_expressions_mutator(Node *node, return makeBoolConst(result, false); } - if (arg && expr_is_nonnullable(context->root, (Expr *) arg, false)) + if (arg && + expr_is_nonnullable(context->root, (Expr *) arg, + NOTNULL_SOURCE_HASHTABLE)) { /* * If arg is proven non-nullable, simplify to boolean @@ -4384,14 +4616,11 @@ simplify_aggref(Aggref *aggref, eval_const_expressions_context *context) * If the Var is defined NOT NULL and meanwhile is not nulled by any outer * joins or grouping sets, then we can know that it cannot be NULL. * - * use_rel_info indicates whether the corresponding RelOptInfo is available for - * use. + * "source" specifies where we should look for NOT NULL proofs. */ bool -var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info) +var_is_nonnullable(PlannerInfo *root, Var *var, NotNullSource source) { - Bitmapset *notnullattnums = NULL; - Assert(IsA(var, Var)); /* skip upper-level Vars */ @@ -4406,35 +4635,89 @@ var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info) if (var->varattno < 0) return true; - /* - * Check if the Var is defined as NOT NULL. We retrieve the column NOT - * NULL constraint information from the corresponding RelOptInfo if it is - * available; otherwise, we search the hash table for this information. - */ - if (use_rel_info) - { - RelOptInfo *rel = find_base_rel(root, var->varno); + /* we don't trust whole-row Vars */ + if (var->varattno == 0) + return false; - notnullattnums = rel->notnullattnums; - } - else + /* Check if the Var is defined as NOT NULL. */ + switch (source) { - RangeTblEntry *rte = planner_rt_fetch(var->varno, root); + case NOTNULL_SOURCE_RELOPT: + { + /* + * We retrieve the column NOT NULL constraint information from + * the corresponding RelOptInfo. + */ + RelOptInfo *rel; + Bitmapset *notnullattnums; - /* - * We must skip inheritance parent tables, as some child tables may - * have a NOT NULL constraint for a column while others may not. This - * cannot happen with partitioned tables, though. - */ - if (rte->inh && rte->relkind != RELKIND_PARTITIONED_TABLE) - return false; + rel = find_base_rel(root, var->varno); + notnullattnums = rel->notnullattnums; - notnullattnums = find_relation_notnullatts(root, rte->relid); - } + return bms_is_member(var->varattno, notnullattnums); + } + case NOTNULL_SOURCE_HASHTABLE: + { + /* + * We retrieve the column NOT NULL constraint information from + * the hash table. + */ + RangeTblEntry *rte; + Bitmapset *notnullattnums; - if (var->varattno > 0 && - bms_is_member(var->varattno, notnullattnums)) - return true; + rte = planner_rt_fetch(var->varno, root); + + /* We can only reason about ordinary relations */ + if (rte->rtekind != RTE_RELATION) + return false; + + /* + * We must skip inheritance parent tables, as some child + * tables may have a NOT NULL constraint for a column while + * others may not. This cannot happen with partitioned + * tables, though. + */ + if (rte->inh && rte->relkind != RELKIND_PARTITIONED_TABLE) + return false; + + notnullattnums = find_relation_notnullatts(root, rte->relid); + + return bms_is_member(var->varattno, notnullattnums); + } + case NOTNULL_SOURCE_SYSCACHE: + { + /* + * We look up the "attnotnull" field in the attribute + * relation. + */ + RangeTblEntry *rte; + + rte = planner_rt_fetch(var->varno, root); + + /* We can only reason about ordinary relations */ + if (rte->rtekind != RTE_RELATION) + return false; + + /* + * We must skip inheritance parent tables, as some child + * tables may have a NOT NULL constraint for a column while + * others may not. This cannot happen with partitioned + * tables, though. + * + * Note that we need to check if the relation actually has any + * children, as we might not have done that yet. + */ + if (rte->inh && has_subclass(rte->relid) && + rte->relkind != RELKIND_PARTITIONED_TABLE) + return false; + + return get_attnotnull(rte->relid, var->varattno); + } + default: + elog(ERROR, "unrecognized NotNullSource: %d", + (int) source); + break; + } return false; } @@ -4444,16 +4727,22 @@ var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info) * * Returns true iff the given 'expr' cannot produce SQL NULLs. * - * If 'use_rel_info' is true, nullability of Vars is checked via the - * corresponding RelOptInfo for the given Var. Some callers require - * nullability information before RelOptInfos are generated. These should - * pass 'use_rel_info' as false. + * source: specifies where we should look for NOT NULL proofs for Vars. + * - NOTNULL_SOURCE_RELOPT: Used when RelOptInfos have been generated. We + * retrieve nullability information directly from the RelOptInfo corresponding + * to the Var. + * - NOTNULL_SOURCE_HASHTABLE: Used when RelOptInfos are not yet available, + * but we have already collected relation-level not-null constraints into the + * global hash table. + * - NOTNULL_SOURCE_SYSCACHE: Used for raw parse trees where neither + * RelOptInfos nor the hash table are available. In this case, we have to + * look up the 'attnotnull' field directly in the system catalogs. * * For now, we support only a limited set of expression types. Support for * additional node types can be added in the future. */ bool -expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) +expr_is_nonnullable(PlannerInfo *root, Expr *expr, NotNullSource source) { /* since this function recurses, it could be driven to stack overflow */ check_stack_depth(); @@ -4463,7 +4752,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) case T_Var: { if (root) - return var_is_nonnullable(root, (Var *) expr, use_rel_info); + return var_is_nonnullable(root, (Var *) expr, source); } break; case T_Const: @@ -4480,7 +4769,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) foreach_ptr(Expr, arg, coalesceexpr->args) { - if (expr_is_nonnullable(root, arg, use_rel_info)) + if (expr_is_nonnullable(root, arg, source)) return true; } } @@ -4495,7 +4784,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) foreach_ptr(Expr, arg, minmaxexpr->args) { - if (expr_is_nonnullable(root, arg, use_rel_info)) + if (expr_is_nonnullable(root, arg, source)) return true; } } @@ -4511,13 +4800,13 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) /* The default result must be present and non-nullable */ if (caseexpr->defresult == NULL || - !expr_is_nonnullable(root, caseexpr->defresult, use_rel_info)) + !expr_is_nonnullable(root, caseexpr->defresult, source)) return false; /* All branch results must be non-nullable */ foreach_ptr(CaseWhen, casewhen, caseexpr->args) { - if (!expr_is_nonnullable(root, casewhen->result, use_rel_info)) + if (!expr_is_nonnullable(root, casewhen->result, source)) return false; } @@ -4565,7 +4854,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) * non-nullable. */ return expr_is_nonnullable(root, ((RelabelType *) expr)->arg, - use_rel_info); + source); } default: break; diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c index 37d34685b93..6c8fb7b7275 100644 --- a/src/backend/utils/adt/int8.c +++ b/src/backend/utils/adt/int8.c @@ -834,7 +834,7 @@ int8inc_support(PG_FUNCTION_ARGS) PG_RETURN_POINTER(NULL); /* If the arg isn't NULLable, do the conversion */ - if (expr_is_nonnullable(req->root, arg, false)) + if (expr_is_nonnullable(req->root, arg, NOTNULL_SOURCE_HASHTABLE)) { Aggref *newagg; diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c index 1913b009d40..f10948483b9 100644 --- a/src/backend/utils/cache/lsyscache.c +++ b/src/backend/utils/cache/lsyscache.c @@ -858,6 +858,47 @@ comparison_ops_are_compatible(Oid opno1, Oid opno2) return result; } +/* + * op_is_safe_index_member + * Check if the operator is a member of a B-tree or Hash operator family. + * + * We use this check as a proxy for "null-safety": if an operator is trusted by + * the btree or hash opfamily, it implies that the operator adheres to standard + * boolean behavior, and would not return NULL when given valid non-null + * inputs, as doing so would break index integrity. + */ +bool +op_is_safe_index_member(Oid opno) +{ + bool result = false; + CatCList *catlist; + int i; + + /* + * Search pg_amop to see if the target operator is registered for any + * btree or hash opfamily. + */ + catlist = SearchSysCacheList1(AMOPOPID, ObjectIdGetDatum(opno)); + + for (i = 0; i < catlist->n_members; i++) + { + HeapTuple tuple = &catlist->members[i]->tuple; + Form_pg_amop aform = (Form_pg_amop) GETSTRUCT(tuple); + + /* Check if the AM is B-tree or Hash */ + if (aform->amopmethod == BTREE_AM_OID || + aform->amopmethod == HASH_AM_OID) + { + result = true; + break; + } + } + + ReleaseSysCacheList(catlist); + + return result; +} + /* ---------- AMPROC CACHES ---------- */ @@ -1071,6 +1112,33 @@ get_attoptions(Oid relid, int16 attnum) return result; } +/* + * get_attnotnull + * + * Given the relation id and the attribute number, + * return the "attnotnull" field from the attribute relation. + */ +bool +get_attnotnull(Oid relid, AttrNumber attnum) +{ + HeapTuple tp; + bool result = false; + + tp = SearchSysCache2(ATTNUM, + ObjectIdGetDatum(relid), + Int16GetDatum(attnum)); + + if (HeapTupleIsValid(tp)) + { + Form_pg_attribute att_tup = (Form_pg_attribute) GETSTRUCT(tp); + + result = att_tup->attnotnull; + ReleaseSysCache(tp); + } + + return result; +} + /* ---------- PG_CAST CACHE ---------- */ /* diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h index a64034e8a6d..853a28c0007 100644 --- a/src/include/optimizer/clauses.h +++ b/src/include/optimizer/clauses.h @@ -42,6 +42,7 @@ extern Relids find_nonnullable_rels(Node *clause); extern List *find_nonnullable_vars(Node *clause); extern List *find_forced_null_vars(Node *node); extern Var *find_forced_null_var(Node *node); +extern bool query_outputs_are_not_nullable(Query *query); extern bool is_pseudo_constant_clause(Node *clause); extern bool is_pseudo_constant_clause_relids(Node *clause, Relids relids); diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h index b562ca380a8..e8b409afb7f 100644 --- a/src/include/optimizer/optimizer.h +++ b/src/include/optimizer/optimizer.h @@ -130,6 +130,14 @@ extern Expr *canonicalize_qual(Expr *qual, bool is_check); /* in util/clauses.c: */ +/* Enum to specify where var_is_nonnullable should look for NOT NULL proofs */ +typedef enum +{ + NOTNULL_SOURCE_RELOPT, /* Use RelOptInfo */ + NOTNULL_SOURCE_HASHTABLE, /* Use Global Hash Table */ + NOTNULL_SOURCE_SYSCACHE, /* Use System Catalog */ +} NotNullSource; + extern bool contain_mutable_functions(Node *clause); extern bool contain_mutable_functions_after_planning(Expr *expr); extern bool contain_volatile_functions(Node *clause); @@ -145,10 +153,11 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node); extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod, Oid result_collation); -extern bool var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info); +extern bool var_is_nonnullable(PlannerInfo *root, Var *var, + NotNullSource source); extern bool expr_is_nonnullable(PlannerInfo *root, Expr *expr, - bool use_rel_info); + NotNullSource source); extern List *expand_function_arguments(List *args, bool include_out_arguments, Oid result_type, diff --git a/src/include/optimizer/subselect.h b/src/include/optimizer/subselect.h index 8a5503eb973..4ecccf46bd3 100644 --- a/src/include/optimizer/subselect.h +++ b/src/include/optimizer/subselect.h @@ -22,6 +22,7 @@ extern ScalarArrayOpExpr *convert_VALUES_to_ANY(PlannerInfo *root, Query *values); extern JoinExpr *convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, + bool under_not, Relids available_rels); extern JoinExpr *convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink, diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h index 5655aca4c14..b9ad84ecd41 100644 --- a/src/include/utils/lsyscache.h +++ b/src/include/utils/lsyscache.h @@ -89,6 +89,7 @@ extern bool get_op_hash_functions(Oid opno, extern List *get_op_index_interpretation(Oid opno); extern bool equality_ops_are_compatible(Oid opno1, Oid opno2); extern bool comparison_ops_are_compatible(Oid opno1, Oid opno2); +extern bool op_is_safe_index_member(Oid opno); extern Oid get_opfamily_proc(Oid opfamily, Oid lefttype, Oid righttype, int16 procnum); extern char *get_attname(Oid relid, AttrNumber attnum, bool missing_ok); @@ -98,6 +99,7 @@ extern Oid get_atttype(Oid relid, AttrNumber attnum); extern void get_atttypetypmodcoll(Oid relid, AttrNumber attnum, Oid *typid, int32 *typmod, Oid *collid); extern Datum get_attoptions(Oid relid, int16 attnum); +extern bool get_attnotnull(Oid relid, AttrNumber attnum); extern Oid get_cast_oid(Oid sourcetypeid, Oid targettypeid, bool missing_ok); extern char *get_collation_name(Oid colloid); extern bool get_collation_isdeterministic(Oid colloid); diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out index 2135d82884d..200236a0a69 100644 --- a/src/test/regress/expected/subselect.out +++ b/src/test/regress/expected/subselect.out @@ -3323,3 +3323,442 @@ SELECT ten FROM onek t WHERE 1.0::integer IN ((VALUES (1), (3))); Seq Scan on onek t (1 row) +-- +-- Check NOT IN performs an ANTI JOIN when both the outer query's expressions +-- and the sub-select's output columns are provably non-nullable, and the +-- operator itself cannot return NULL for non-null inputs. +-- +BEGIN; +CREATE TEMP TABLE not_null_tab (id int NOT NULL, val int NOT NULL); +CREATE TEMP TABLE null_tab (id int, val int); +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +----------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = not_null_tab_1.id) + -> Seq Scan on not_null_tab + -> Hash + -> Seq Scan on not_null_tab not_null_tab_1 +(5 rows) + +-- No ANTI JOIN: outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> Seq Scan on not_null_tab +(4 rows) + +-- No ANTI JOIN: inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> Seq Scan on null_tab +(4 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab WHERE id IS NOT NULL); + QUERY PLAN +---------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = null_tab.id) + -> Seq Scan on not_null_tab + -> Hash + -> Seq Scan on null_tab + Filter: (id IS NOT NULL) +(6 rows) + +-- No ANTI JOIN: outer side is nullable (we don't check outer query quals for now) +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id IS NOT NULL + AND id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +--------------------------------------------------------------------------------- + Seq Scan on null_tab + Filter: ((id IS NOT NULL) AND (NOT (ANY (id = (hashed SubPlan any_1).col1)))) + SubPlan any_1 + -> Seq Scan on not_null_tab +(4 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is defined NOT NULL +-- and is not nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + QUERY PLAN +----------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = t1.id) + -> Seq Scan on not_null_tab + -> Hash + -> Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(12 rows) + +-- No ANTI JOIN: inner side is defined NOT NULL but is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t2.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(11 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t2.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id + WHERE t2.id IS NOT NULL +); + QUERY PLAN +----------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = t2.id) + -> Seq Scan on not_null_tab + -> Hash + -> Merge Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(12 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + LEFT JOIN null_tab t2 ON t1.id = t2.id + WHERE t1.id IS NOT NULL +); + QUERY PLAN +---------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = t1.id) + -> Seq Scan on not_null_tab + -> Hash + -> Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on null_tab t1 + Filter: (id IS NOT NULL) + -> Sort + Sort Key: t2.id + -> Seq Scan on null_tab t2 +(13 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + INNER JOIN null_tab t2 ON t1.id = t2.id + LEFT JOIN null_tab t3 ON TRUE +); + QUERY PLAN +------------------------------------------------- + Merge Anti Join + Merge Cond: (not_null_tab.id = t1.id) + -> Sort + Sort Key: not_null_tab.id + -> Seq Scan on not_null_tab + -> Nested Loop Left Join + -> Merge Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on null_tab t2 + -> Materialize + -> Seq Scan on null_tab t3 +(16 rows) + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t1.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +---------------------------------------------------- + Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Hash Anti Join + Hash Cond: (t1.id = not_null_tab.id) + -> Seq Scan on not_null_tab t1 + -> Hash + -> Seq Scan on not_null_tab + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(12 rows) + +-- No ANTI JOIN: outer side is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t2.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +------------------------------------------------------------- + Merge Left Join + Merge Cond: (t1.id = t2.id) + Filter: (NOT (ANY (t2.id = (hashed SubPlan any_1).col1))) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 + SubPlan any_1 + -> Seq Scan on not_null_tab +(11 rows) + +-- No ANTI JOIN: sublink is in an outer join's ON qual and references the +-- non-nullable side +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t1.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +------------------------------------------------------------------ + Nested Loop Left Join + Join Filter: (NOT (ANY (t1.id = (hashed SubPlan any_1).col1))) + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Seq Scan on not_null_tab t2 + SubPlan any_1 + -> Seq Scan on not_null_tab +(7 rows) + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t2.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +---------------------------------------------------- + Nested Loop Left Join + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Hash Anti Join + Hash Cond: (t2.id = not_null_tab.id) + -> Seq Scan on not_null_tab t2 + -> Hash + -> Seq Scan on not_null_tab +(8 rows) + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT id, val FROM not_null_tab); + QUERY PLAN +--------------------------------------------------------------------------------------------------- + Merge Anti Join + Merge Cond: ((not_null_tab.id = not_null_tab_1.id) AND (not_null_tab.val = not_null_tab_1.val)) + -> Sort + Sort Key: not_null_tab.id, not_null_tab.val + -> Seq Scan on not_null_tab + -> Sort + Sort Key: not_null_tab_1.id, not_null_tab_1.val + -> Seq Scan on not_null_tab not_null_tab_1 +(8 rows) + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE NOT (id, val) > ANY (SELECT id, val FROM not_null_tab); + QUERY PLAN +------------------------------------------------------------------------------------------------------ + Nested Loop Anti Join + Join Filter: (ROW(not_null_tab.id, not_null_tab.val) > ROW(not_null_tab_1.id, not_null_tab_1.val)) + -> Seq Scan on not_null_tab + -> Materialize + -> Seq Scan on not_null_tab not_null_tab_1 +(5 rows) + +-- No ANTI JOIN: one column of the outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1, null_tab t2 +WHERE (t1.id, t2.id) NOT IN (SELECT id, val FROM not_null_tab); + QUERY PLAN +-------------------------------------------------------------------------------------------------------------- + Nested Loop + Join Filter: (NOT (ANY ((t1.id = (hashed SubPlan any_1).col1) AND (t2.id = (hashed SubPlan any_1).col2)))) + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Seq Scan on null_tab t2 + SubPlan any_1 + -> Seq Scan on not_null_tab +(7 rows) + +-- No ANTI JOIN: one column of the inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT t1.id, t2.id FROM not_null_tab t1, null_tab t2); + QUERY PLAN +-------------------------------------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY ((id = (SubPlan any_1).col1) AND (val = (SubPlan any_1).col2)))) + SubPlan any_1 + -> Materialize + -> Nested Loop + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Seq Scan on null_tab t2 +(8 rows) + +-- ANTI JOIN: COALESCE(nullable, constant) is non-nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE COALESCE(id, -1) NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +----------------------------------------------------------------------- + Hash Anti Join + Hash Cond: (COALESCE(null_tab.id, '-1'::integer) = not_null_tab.id) + -> Seq Scan on null_tab + -> Hash + -> Seq Scan on not_null_tab +(5 rows) + +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT COALESCE(id, -1) FROM null_tab); + QUERY PLAN +----------------------------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = COALESCE(null_tab.id, '-1'::integer)) + -> Seq Scan on not_null_tab + -> Hash + -> Seq Scan on null_tab +(5 rows) + +-- ANTI JOIN: GROUP BY (without Grouping Sets) preserves the non-nullability of +-- the column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab GROUP BY id); + QUERY PLAN +----------------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = not_null_tab_1.id) + -> Seq Scan on not_null_tab + -> Hash + -> HashAggregate + Group Key: not_null_tab_1.id + -> Seq Scan on not_null_tab not_null_tab_1 +(7 rows) + +-- No ANTI JOIN: GROUP BY on a nullable column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab GROUP BY id); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> HashAggregate + Group Key: null_tab.id + -> Seq Scan on null_tab +(6 rows) + +-- No ANTI JOIN: Grouping Sets can introduce NULLs +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT id + FROM not_null_tab + GROUP BY GROUPING SETS ((id), (val)) +); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> HashAggregate + Hash Key: not_null_tab_1.id + Hash Key: not_null_tab_1.val + -> Seq Scan on not_null_tab not_null_tab_1 +(7 rows) + +-- create a custom "unsafe" equality operator +CREATE FUNCTION int4eq_unsafe(int4, int4) + RETURNS bool + AS 'int4eq' + LANGUAGE internal IMMUTABLE; +CREATE OPERATOR ?= ( + PROCEDURE = int4eq_unsafe, + LEFTARG = int4, + RIGHTARG = int4 +); +-- No ANTI JOIN: the operator is not safe +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE NOT id ?= ANY (SELECT id FROM not_null_tab); + QUERY PLAN +------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id ?= (SubPlan any_1).col1))) + SubPlan any_1 + -> Materialize + -> Seq Scan on not_null_tab not_null_tab_1 +(5 rows) + +ROLLBACK; diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql index cadc3293687..4cd016f4ac3 100644 --- a/src/test/regress/sql/subselect.sql +++ b/src/test/regress/sql/subselect.sql @@ -1448,3 +1448,188 @@ SELECT * FROM onek t1, lateral (SELECT * FROM onek t2 WHERE t2.ten IN (values (t -- VtA causes the whole expression to be evaluated as a constant EXPLAIN (COSTS OFF) SELECT ten FROM onek t WHERE 1.0::integer IN ((VALUES (1), (3))); + +-- +-- Check NOT IN performs an ANTI JOIN when both the outer query's expressions +-- and the sub-select's output columns are provably non-nullable, and the +-- operator itself cannot return NULL for non-null inputs. +-- + +BEGIN; + +CREATE TEMP TABLE not_null_tab (id int NOT NULL, val int NOT NULL); +CREATE TEMP TABLE null_tab (id int, val int); + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab WHERE id IS NOT NULL); + +-- No ANTI JOIN: outer side is nullable (we don't check outer query quals for now) +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id IS NOT NULL + AND id NOT IN (SELECT id FROM not_null_tab); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is defined NOT NULL +-- and is not nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + +-- No ANTI JOIN: inner side is defined NOT NULL but is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t2.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t2.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id + WHERE t2.id IS NOT NULL +); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + LEFT JOIN null_tab t2 ON t1.id = t2.id + WHERE t1.id IS NOT NULL +); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + INNER JOIN null_tab t2 ON t1.id = t2.id + LEFT JOIN null_tab t3 ON TRUE +); + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t1.id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: outer side is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t2.id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: sublink is in an outer join's ON qual and references the +-- non-nullable side +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t1.id NOT IN (SELECT id FROM not_null_tab); + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t2.id NOT IN (SELECT id FROM not_null_tab); + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT id, val FROM not_null_tab); + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE NOT (id, val) > ANY (SELECT id, val FROM not_null_tab); + +-- No ANTI JOIN: one column of the outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1, null_tab t2 +WHERE (t1.id, t2.id) NOT IN (SELECT id, val FROM not_null_tab); + +-- No ANTI JOIN: one column of the inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT t1.id, t2.id FROM not_null_tab t1, null_tab t2); + +-- ANTI JOIN: COALESCE(nullable, constant) is non-nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE COALESCE(id, -1) NOT IN (SELECT id FROM not_null_tab); + +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT COALESCE(id, -1) FROM null_tab); + +-- ANTI JOIN: GROUP BY (without Grouping Sets) preserves the non-nullability of +-- the column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab GROUP BY id); + +-- No ANTI JOIN: GROUP BY on a nullable column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab GROUP BY id); + +-- No ANTI JOIN: Grouping Sets can introduce NULLs +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT id + FROM not_null_tab + GROUP BY GROUPING SETS ((id), (val)) +); + +-- create a custom "unsafe" equality operator +CREATE FUNCTION int4eq_unsafe(int4, int4) + RETURNS bool + AS 'int4eq' + LANGUAGE internal IMMUTABLE; + +CREATE OPERATOR ?= ( + PROCEDURE = int4eq_unsafe, + LEFTARG = int4, + RIGHTARG = int4 +); + +-- No ANTI JOIN: the operator is not safe +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE NOT id ?= ANY (SELECT id FROM not_null_tab); + +ROLLBACK; diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index 3250564d4ff..8080dc16afa 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -1788,6 +1788,7 @@ Node NodeTag NonEmptyRange NoneCompressorState +NotNullSource Notification NotificationList NotifyStmt -- 2.39.5 (Apple Git-154) ^ permalink raw reply [nested|flat] 13+ messages in thread
* Re: Convert NOT IN sublinks to anti-joins when safe @ 2026-03-09 04:01 Richard Guo <[email protected]> parent: Richard Guo <[email protected]> 1 sibling, 0 replies; 13+ messages in thread From: Richard Guo @ 2026-03-09 04:01 UTC (permalink / raw) To: Japin Li <[email protected]>; +Cc: Pg Hackers <[email protected]> On Sun, Mar 8, 2026 at 5:16 PM Richard Guo <[email protected]> wrote: > Attached is an updated version of the patch that replaces foreach with > foreach_ptr or foreach_node accordingly. Nothing else has changed. > > I plan to do one more round of self-review on this. Unless there are > any further thoughts or concerns, I'm hoping to commit this in a week > or two. Please let me know if anyone spots anything else I might have > missed. I just finished a final self-review and noticed a subtle issue in query_outputs_are_not_nullable(). When preparing a target expression for the non-nullability check, the previous code flattened join aliases Vars before grouping Vars. However, because the parser processes FROM/JOIN clauses before the GROUP BY clause, a grouping Var can actually wrap a join alias Var, not the reverse. So we should flatten grouping Vars first. Attached is an updated patch that fixes that issue. I plan to commit this in a week or two, barring any objections. - Richard Attachments: [application/octet-stream] v6-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patch (61.6K, 2-v6-0001-Convert-NOT-IN-sublinks-to-anti-joins-when-safe.patch) download | inline diff: From a13a2ccc5ac62f21f0d324cb74a37ed6418f0ffe Mon Sep 17 00:00:00 2001 From: Richard Guo <[email protected]> Date: Fri, 30 Jan 2026 12:29:42 +0900 Subject: [PATCH v6] Convert NOT IN sublinks to anti-joins when safe The planner has historically been unable to convert "x NOT IN (SELECT y ...)" sublinks into anti-joins. This is because standard SQL semantics for NOT IN require that if the comparison "x = y" returns NULL, the "NOT IN" expression evaluates to NULL (effectively false), causing the row to be discarded. In contrast, an anti-join preserves the row if no match is found. Due to this semantic mismatch regarding NULL handling, the conversion was previously considered unsafe. However, if we can prove that neither side of the comparison can yield NULL values, and further that the operator itself cannot return NULL for non-null inputs, the behavior of NOT IN and anti-join becomes identical. Enabling this conversion allows the planner to treat the sublink as a first-class relation rather than an opaque SubPlan filter. This unlocks global join ordering optimization and permits the selection of the most efficient join algorithm based on cost, often yielding significant performance improvements for large datasets. This patch verifies that neither side of the comparison can be NULL and that the operator is safe regarding NULL results before performing the conversion. To verify operator safety, we require that the operator be a member of a B-tree or Hash operator family. This serves as a proxy for standard boolean behavior, ensuring the operator does not return NULL on valid non-null inputs, as doing so would break index integrity. For operand non-nullability, this patch makes use of several existing mechanisms. It leverages the outer-join-aware-Var infrastructure to verify that a Var does not come from the nullable side of an outer join, and consults the NOT-NULL-attnums hash table to efficiently verify schema-level NOT NULL constraints. Additionally, it employs find_nonnullable_vars to identify Vars forced non-nullable by qual clauses, and expr_is_nonnullable to deduce non-nullability for other expression types. The logic for verifying the non-nullability of the subquery outputs was adapted from prior work by David Rowley and Tom Lane. Author: Richard Guo <[email protected]> Reviewed-by: wenhui qiu <[email protected]> Reviewed-by: Zhang Mingli <[email protected]> Reviewed-by: Japin Li <[email protected]> Discussion: https://postgr.es/m/CAMbWs495eF=-fSa5CwJS6B-BaEi3ARp0UNb4Lt3EkgUGZJwkAQ@mail.gmail.com --- src/backend/optimizer/plan/initsplan.c | 4 +- src/backend/optimizer/plan/subselect.c | 159 +++++++- src/backend/optimizer/prep/prepjointree.c | 72 +++- src/backend/optimizer/util/clauses.c | 383 ++++++++++++++++--- src/backend/utils/adt/int8.c | 2 +- src/backend/utils/cache/lsyscache.c | 68 ++++ src/include/optimizer/clauses.h | 1 + src/include/optimizer/optimizer.h | 13 +- src/include/optimizer/subselect.h | 1 + src/include/utils/lsyscache.h | 2 + src/test/regress/expected/subselect.out | 439 ++++++++++++++++++++++ src/test/regress/sql/subselect.sql | 185 +++++++++ src/tools/pgindent/typedefs.list | 1 + 13 files changed, 1264 insertions(+), 66 deletions(-) diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c index 97ea95a4eb8..9aaf1c4e5ca 100644 --- a/src/backend/optimizer/plan/initsplan.c +++ b/src/backend/optimizer/plan/initsplan.c @@ -3447,7 +3447,7 @@ restriction_is_always_true(PlannerInfo *root, if (nulltest->argisrow) return false; - return expr_is_nonnullable(root, nulltest->arg, true); + return expr_is_nonnullable(root, nulltest->arg, NOTNULL_SOURCE_RELOPT); } /* If it's an OR, check its sub-clauses */ @@ -3512,7 +3512,7 @@ restriction_is_always_false(PlannerInfo *root, if (nulltest->argisrow) return false; - return expr_is_nonnullable(root, nulltest->arg, true); + return expr_is_nonnullable(root, nulltest->arg, NOTNULL_SOURCE_RELOPT); } /* If it's an OR, check its sub-clauses */ diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c index d7f3cedf3d5..0d31861da7f 100644 --- a/src/backend/optimizer/plan/subselect.c +++ b/src/backend/optimizer/plan/subselect.c @@ -91,6 +91,7 @@ static bool contain_outer_selfref(Node *node); static bool contain_outer_selfref_walker(Node *node, Index *depth); static void inline_cte(PlannerInfo *root, CommonTableExpr *cte); static bool inline_cte_walker(Node *node, inline_cte_walker_context *context); +static bool sublink_testexpr_is_not_nullable(PlannerInfo *root, SubLink *sublink); static bool simplify_EXISTS_query(PlannerInfo *root, Query *query); static Query *convert_EXISTS_to_ANY(PlannerInfo *root, Query *subselect, Node **testexpr, List **paramIds); @@ -1306,11 +1307,14 @@ convert_VALUES_to_ANY(PlannerInfo *root, Node *testexpr, Query *values) * If so, form a JoinExpr and return it. Return NULL if the SubLink cannot * be converted to a join. * - * The only non-obvious input parameter is available_rels: this is the set - * of query rels that can safely be referenced in the sublink expression. - * (We must restrict this to avoid changing the semantics when a sublink - * is present in an outer join's ON qual.) The conversion must fail if - * the converted qual would reference any but these parent-query relids. + * If under_not is true, the caller actually found NOT (ANY SubLink), so + * that what we must try to build is an ANTI not SEMI join. + * + * available_rels is the set of query rels that can safely be referenced + * in the sublink expression. (We must restrict this to avoid changing + * the semantics when a sublink is present in an outer join's ON qual.) + * The conversion must fail if the converted qual would reference any but + * these parent-query relids. * * On success, the returned JoinExpr has larg = NULL and rarg = the jointree * item representing the pulled-up subquery. The caller must set larg to @@ -1333,7 +1337,7 @@ convert_VALUES_to_ANY(PlannerInfo *root, Node *testexpr, Query *values) */ JoinExpr * convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, - Relids available_rels) + bool under_not, Relids available_rels) { JoinExpr *result; Query *parse = root->parse; @@ -1351,6 +1355,19 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, Assert(sublink->subLinkType == ANY_SUBLINK); + /* + * Per SQL spec, NOT IN is not ordinarily equivalent to an anti-join, so + * that by default we have to fail when under_not. However, if we can + * prove that neither the outer query's expressions nor the sub-select's + * output columns can be NULL, and further that the operator itself cannot + * return NULL for non-null inputs, then the logic is identical and it's + * safe to convert NOT IN to an anti-join. + */ + if (under_not && + (!sublink_testexpr_is_not_nullable(root, sublink) || + !query_outputs_are_not_nullable(subselect))) + return NULL; + /* * If the sub-select contains any Vars of the parent query, we treat it as * LATERAL. (Vars from higher levels don't matter here.) @@ -1428,7 +1445,7 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, * And finally, build the JoinExpr node. */ result = makeNode(JoinExpr); - result->jointype = JOIN_SEMI; + result->jointype = under_not ? JOIN_ANTI : JOIN_SEMI; result->isNatural = false; result->larg = NULL; /* caller must fill this in */ result->rarg = (Node *) rtr; @@ -1441,12 +1458,134 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, return result; } +/* + * sublink_testexpr_is_not_nullable: verify that testexpr of an ANY_SUBLINK + * guarantees a non-null result, assuming the inner side is also non-null. + * + * To ensure the expression never returns NULL, we require both that the outer + * expressions are provably non-nullable and that the operator itself is safe. + * We validate operator safety by checking for membership in a standard index + * operator family (B-tree or Hash); this acts as a proxy for standard boolean + * behavior, ensuring the operator does not produce NULL results from non-null + * inputs. + * + * We handle the three standard parser representations for ANY sublinks: a + * single OpExpr for single-column comparisons, a BoolExpr containing a list of + * OpExprs for multi-column equality or inequality checks (where equality + * becomes an AND and inequality becomes an OR), and a RowCompareExpr for + * multi-column ordering checks. In all cases, we validate the operators and + * the outer expressions. + * + * It is acceptable for this check not to be exhaustive. We can err on the + * side of conservatism: if we're not sure, it's okay to return FALSE. + */ +static bool +sublink_testexpr_is_not_nullable(PlannerInfo *root, SubLink *sublink) +{ + Node *testexpr = sublink->testexpr; + List *outer_exprs = NIL; + + /* Punt if sublink is not in the expected format */ + if (sublink->subLinkType != ANY_SUBLINK || testexpr == NULL) + return false; + + if (IsA(testexpr, OpExpr)) + { + /* single-column comparison */ + OpExpr *opexpr = (OpExpr *) testexpr; + + /* standard ANY structure should be op(outer_var, param) */ + if (list_length(opexpr->args) != 2) + return false; + + /* + * We rely on membership in a B-tree or Hash operator family as a + * guarantee that the operator acts as a proper boolean comparison and + * does not yield NULL for valid non-null inputs. + */ + if (!op_is_safe_index_member(opexpr->opno)) + return false; + + outer_exprs = lappend(outer_exprs, linitial(opexpr->args)); + } + else if (is_andclause(testexpr) || is_orclause(testexpr)) + { + /* multi-column equality or inequality checks */ + BoolExpr *bexpr = (BoolExpr *) testexpr; + + foreach_ptr(OpExpr, opexpr, bexpr->args) + { + if (!IsA(opexpr, OpExpr)) + return false; + + /* standard ANY structure should be op(outer_var, param) */ + if (list_length(opexpr->args) != 2) + return false; + + /* verify operator safety; see comment above */ + if (!op_is_safe_index_member(opexpr->opno)) + return false; + + outer_exprs = lappend(outer_exprs, linitial(opexpr->args)); + } + } + else if (IsA(testexpr, RowCompareExpr)) + { + /* multi-column ordering checks */ + RowCompareExpr *rcexpr = (RowCompareExpr *) testexpr; + + foreach_oid(opno, rcexpr->opnos) + { + /* verify operator safety; see comment above */ + if (!op_is_safe_index_member(opno)) + return false; + } + + outer_exprs = list_concat(outer_exprs, rcexpr->largs); + } + else + { + /* Punt if other node types */ + return false; + } + + /* + * Since the query hasn't yet been through expression preprocessing, we + * must apply flatten_join_alias_vars to the outer expressions to avoid + * being fooled by join aliases. + * + * We do not need to apply flatten_group_exprs though, since grouping Vars + * cannot appear in jointree quals. + */ + outer_exprs = (List *) + flatten_join_alias_vars(root, root->parse, (Node *) outer_exprs); + + /* Check that every outer expression is non-nullable */ + foreach_ptr(Expr, expr, outer_exprs) + { + /* + * We have already collected relation-level not-null constraints for + * the outer query, so we can consult the global hash table for + * nullability information. + */ + if (!expr_is_nonnullable(root, expr, NOTNULL_SOURCE_HASHTABLE)) + return false; + + /* + * Note: It is possible to further prove non-nullability by examining + * the qual clauses available at or below the jointree node where this + * NOT IN clause is evaluated, but for the moment it doesn't seem + * worth the extra complication. + */ + } + + return true; +} + /* * convert_EXISTS_sublink_to_join: try to convert an EXISTS SubLink to a join * - * The API of this function is identical to convert_ANY_sublink_to_join's, - * except that we also support the case where the caller has found NOT EXISTS, - * so we need an additional input parameter "under_not". + * The API of this function is identical to convert_ANY_sublink_to_join's. */ JoinExpr * convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink, diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c index c90f4b32733..b2beb0a0d68 100644 --- a/src/backend/optimizer/prep/prepjointree.c +++ b/src/backend/optimizer/prep/prepjointree.c @@ -852,14 +852,15 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node, if ((saop = convert_VALUES_to_ANY(root, sublink->testexpr, (Query *) sublink->subselect)) != NULL) - + { /* * The VALUES sequence was simplified. Nothing more to do * here. */ return (Node *) saop; + } - if ((j = convert_ANY_sublink_to_join(root, sublink, + if ((j = convert_ANY_sublink_to_join(root, sublink, false, available_rels1)) != NULL) { /* Yes; insert the new join node into the join tree */ @@ -885,7 +886,7 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node, return NULL; } if (available_rels2 != NULL && - (j = convert_ANY_sublink_to_join(root, sublink, + (j = convert_ANY_sublink_to_join(root, sublink, false, available_rels2)) != NULL) { /* Yes; insert the new join node into the join tree */ @@ -970,14 +971,68 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node, } if (is_notclause(node)) { - /* If the immediate argument of NOT is EXISTS, try to convert */ + /* If the immediate argument of NOT is ANY or EXISTS, try to convert */ SubLink *sublink = (SubLink *) get_notclausearg((Expr *) node); JoinExpr *j; Relids child_rels; if (sublink && IsA(sublink, SubLink)) { - if (sublink->subLinkType == EXISTS_SUBLINK) + if (sublink->subLinkType == ANY_SUBLINK) + { + if ((j = convert_ANY_sublink_to_join(root, sublink, true, + available_rels1)) != NULL) + { + /* Yes; insert the new join node into the join tree */ + j->larg = *jtlink1; + *jtlink1 = (Node *) j; + /* Recursively process pulled-up jointree nodes */ + j->rarg = pull_up_sublinks_jointree_recurse(root, + j->rarg, + &child_rels); + + /* + * Now recursively process the pulled-up quals. Because + * we are underneath a NOT, we can't pull up sublinks that + * reference the left-hand stuff, but it's still okay to + * pull up sublinks referencing j->rarg. + */ + j->quals = pull_up_sublinks_qual_recurse(root, + j->quals, + &j->rarg, + child_rels, + NULL, NULL); + /* Return NULL representing constant TRUE */ + return NULL; + } + if (available_rels2 != NULL && + (j = convert_ANY_sublink_to_join(root, sublink, true, + available_rels2)) != NULL) + { + /* Yes; insert the new join node into the join tree */ + j->larg = *jtlink2; + *jtlink2 = (Node *) j; + /* Recursively process pulled-up jointree nodes */ + j->rarg = pull_up_sublinks_jointree_recurse(root, + j->rarg, + &child_rels); + + /* + * Now recursively process the pulled-up quals. Because + * we are underneath a NOT, we can't pull up sublinks that + * reference the left-hand stuff, but it's still okay to + * pull up sublinks referencing j->rarg. + */ + j->quals = pull_up_sublinks_qual_recurse(root, + j->quals, + &j->rarg, + child_rels, + NULL, NULL); + /* Return NULL representing constant TRUE */ + return NULL; + } + } + else if (sublink->subLinkType == EXISTS_SUBLINK) { if ((j = convert_EXISTS_sublink_to_join(root, sublink, true, available_rels1)) != NULL) @@ -3706,6 +3761,13 @@ has_notnull_forced_var(PlannerInfo *root, List *forced_null_vars, rte = rt_fetch(varno, root->parse->rtable); + /* We can only reason about ordinary relations */ + if (rte->rtekind != RTE_RELATION) + { + bms_free(forcednullattnums); + continue; + } + /* * We must skip inheritance parent tables, as some child tables may * have a NOT NULL constraint for a column while others may not. This diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c index a41d81734cf..8172e19254a 100644 --- a/src/backend/optimizer/util/clauses.c +++ b/src/backend/optimizer/util/clauses.c @@ -21,6 +21,7 @@ #include "access/htup_details.h" #include "catalog/pg_class.h" +#include "catalog/pg_inherits.h" #include "catalog/pg_language.h" #include "catalog/pg_operator.h" #include "catalog/pg_proc.h" @@ -112,6 +113,7 @@ static bool contain_context_dependent_node_walker(Node *node, int *flags); static bool contain_leaked_vars_walker(Node *node, void *context); static Relids find_nonnullable_rels_walker(Node *node, bool top_level); static List *find_nonnullable_vars_walker(Node *node, bool top_level); +static void find_subquery_safe_quals(Node *jtnode, List **safe_quals); static bool is_strict_saop(ScalarArrayOpExpr *expr, bool falseOK); static bool convert_saop_to_hashed_saop_walker(Node *node, void *context); static Node *eval_const_expressions_mutator(Node *node, @@ -1433,6 +1435,10 @@ contain_leaked_vars_walker(Node *node, void *context) context); } +/***************************************************************************** + * Nullability analysis + *****************************************************************************/ + /* * find_nonnullable_rels * Determine which base rels are forced nonnullable by given clause. @@ -1701,7 +1707,7 @@ find_nonnullable_rels_walker(Node *node, bool top_level) * but here we assume that the input is a Boolean expression, and wish to * see if NULL inputs will provably cause a FALSE-or-NULL result. We expect * the expression to have been AND/OR flattened and converted to implicit-AND - * format. + * format (but the results are still good if it wasn't AND/OR flattened). * * Attnos of the identified Vars are returned in a multibitmapset (a List of * Bitmapsets). List indexes correspond to relids (varnos), while the per-rel @@ -2021,6 +2027,229 @@ find_forced_null_var(Node *node) return NULL; } +/* + * query_outputs_are_not_nullable + * Returns TRUE if the output values of the Query are certainly not NULL. + * All output columns must return non-NULL to answer TRUE. + * + * The reason this takes a Query, and not just an individual tlist expression, + * is so that we can make use of the query's WHERE/ON clauses to prove it does + * not return nulls. + * + * In current usage, the passed sub-Query hasn't yet been through any planner + * processing. This means that applying find_nonnullable_vars() to its WHERE + * clauses isn't really ideal: for lack of const-simplification, we might be + * unable to prove not-nullness in some cases where we could have proved it + * afterwards. However, we should not get any false positive results. + * + * Like the other forms of nullability analysis above, we can err on the + * side of conservatism: if we're not sure, it's okay to return FALSE. + */ +bool +query_outputs_are_not_nullable(Query *query) +{ + PlannerInfo subroot; + List *safe_quals = NIL; + List *nonnullable_vars = NIL; + bool computed_nonnullable_vars = false; + + /* + * If the query contains set operations, punt. The set ops themselves + * couldn't introduce nulls that weren't in their inputs, but the tlist + * present in the top-level query is just dummy and won't give us useful + * info. We could get an answer by recursing to examine each leaf query, + * but for the moment it doesn't seem worth the extra complication. + */ + if (query->setOperations) + return false; + + /* + * If the query contains grouping sets, punt. Grouping sets can introduce + * NULL values, and we currently lack the PlannerInfo needed to flatten + * grouping Vars in the query's outputs. + */ + if (query->groupingSets) + return false; + + /* + * We need a PlannerInfo to pass to expr_is_nonnullable. Fortunately, we + * can cons up an entirely dummy one, because only the "parse" link in the + * struct is used by expr_is_nonnullable. + */ + MemSet(&subroot, 0, sizeof(subroot)); + subroot.parse = query; + + /* + * Examine each targetlist entry to prove that it can't produce NULL. + */ + foreach_node(TargetEntry, tle, query->targetList) + { + Expr *expr = tle->expr; + + /* Resjunk columns can be ignored: they don't produce output values */ + if (tle->resjunk) + continue; + + /* + * Look through binary relabelings, since we know those don't + * introduce nulls. + */ + while (expr && IsA(expr, RelabelType)) + expr = ((RelabelType *) expr)->arg; + + if (expr == NULL) /* paranoia */ + return false; + + /* + * Since the subquery hasn't yet been through expression + * preprocessing, we must explicitly flatten grouping Vars and join + * aliases in the given expression. Note that flatten_group_exprs + * must be applied before flatten_join_alias_vars, as grouping Vars + * can wrap join aliases. + * + * We must also apply flatten_join_alias_vars to the quals extracted + * by find_subquery_safe_quals. We do not need to apply + * flatten_group_exprs to these quals, though, because grouping Vars + * cannot appear in jointree quals. + */ + + /* + * We have verified that the query does not contain grouping sets, so + * it's safe to use NULL as the root here. + */ + expr = (Expr *) flatten_group_exprs(NULL, query, (Node *) expr); + + /* + * We won't be dealing with arbitrary expressions, so it's safe to use + * NULL as the root, so long as adjust_standard_join_alias_expression + * can handle everything the parser would make as a join alias + * expression. + */ + expr = (Expr *) flatten_join_alias_vars(NULL, query, (Node *) expr); + + /* + * Check to see if the expr cannot be NULL. Since we're on a raw + * parse tree, we need to look up the not-null constraints from the + * system catalogs. + */ + if (expr_is_nonnullable(&subroot, expr, NOTNULL_SOURCE_SYSCACHE)) + continue; + + if (IsA(expr, Var)) + { + Var *var = (Var *) expr; + + /* + * For a plain Var, even if that didn't work, we can conclude that + * the Var is not nullable if find_nonnullable_vars can find a + * "var IS NOT NULL" or similarly strict condition among the quals + * on non-outerjoined-rels. Compute the list of Vars having such + * quals if we didn't already. + */ + if (!computed_nonnullable_vars) + { + find_subquery_safe_quals((Node *) query->jointree, &safe_quals); + safe_quals = (List *) + flatten_join_alias_vars(NULL, query, (Node *) safe_quals); + nonnullable_vars = find_nonnullable_vars((Node *) safe_quals); + computed_nonnullable_vars = true; + } + + if (!mbms_is_member(var->varno, + var->varattno - FirstLowInvalidHeapAttributeNumber, + nonnullable_vars)) + return false; /* we failed to prove the Var non-null */ + } + else + { + /* Punt otherwise */ + return false; + } + } + + return true; +} + +/* + * find_subquery_safe_quals + * Traverse jointree to locate quals on non-outerjoined-rels. + * + * We locate all WHERE and JOIN/ON quals that constrain the rels that are not + * below the nullable side of any outer join, and add them to the *safe_quals + * list (forming a list with implicit-AND semantics). These quals can be used + * to prove non-nullability of the subquery's outputs. + * + * Top-level caller must initialize *safe_quals to NIL. + */ +static void +find_subquery_safe_quals(Node *jtnode, List **safe_quals) +{ + if (jtnode == NULL) + return; + if (IsA(jtnode, RangeTblRef)) + { + /* Leaf node: nothing to do */ + return; + } + else if (IsA(jtnode, FromExpr)) + { + FromExpr *f = (FromExpr *) jtnode; + + /* All elements of the FROM list are allowable */ + foreach_ptr(Node, child_node, f->fromlist) + find_subquery_safe_quals(child_node, safe_quals); + /* ... and its WHERE quals are too */ + if (f->quals) + *safe_quals = lappend(*safe_quals, f->quals); + } + else if (IsA(jtnode, JoinExpr)) + { + JoinExpr *j = (JoinExpr *) jtnode; + + switch (j->jointype) + { + case JOIN_INNER: + /* visit both children */ + find_subquery_safe_quals(j->larg, safe_quals); + find_subquery_safe_quals(j->rarg, safe_quals); + /* and grab the ON quals too */ + if (j->quals) + *safe_quals = lappend(*safe_quals, j->quals); + break; + + case JOIN_LEFT: + case JOIN_SEMI: + case JOIN_ANTI: + + /* + * Only the left input is possibly non-nullable; furthermore, + * the quals of this join don't constrain the left input. + * Note: we probably can't see SEMI or ANTI joins at this + * point, but if we do, we can treat them like LEFT joins. + */ + find_subquery_safe_quals(j->larg, safe_quals); + break; + + case JOIN_RIGHT: + /* Reverse of the above case */ + find_subquery_safe_quals(j->rarg, safe_quals); + break; + + case JOIN_FULL: + /* Neither side is non-nullable, so stop descending */ + break; + + default: + elog(ERROR, "unrecognized join type: %d", + (int) j->jointype); + break; + } + } + else + elog(ERROR, "unrecognized node type: %d", + (int) nodeTag(jtnode)); +} + /* * Can we treat a ScalarArrayOpExpr as strict? * @@ -2739,7 +2968,8 @@ eval_const_expressions_mutator(Node *node, if (!has_nullable_nonconst && !expr_is_nonnullable(context->root, - (Expr *) lfirst(arg), false)) + (Expr *) lfirst(arg), + NOTNULL_SOURCE_HASHTABLE)) has_nullable_nonconst = true; } } @@ -3418,7 +3648,8 @@ eval_const_expressions_mutator(Node *node, newargs = lappend(newargs, e); break; } - if (expr_is_nonnullable(context->root, (Expr *) e, false)) + if (expr_is_nonnullable(context->root, (Expr *) e, + NOTNULL_SOURCE_HASHTABLE)) { if (newargs == NIL) return e; /* first expr */ @@ -3612,7 +3843,7 @@ eval_const_expressions_mutator(Node *node, */ if (relem && expr_is_nonnullable(context->root, (Expr *) relem, - false)) + NOTNULL_SOURCE_HASHTABLE)) { if (ntest->nulltesttype == IS_NULL) return makeBoolConst(false, false); @@ -3664,7 +3895,8 @@ eval_const_expressions_mutator(Node *node, return makeBoolConst(result, false); } if (!ntest->argisrow && arg && - expr_is_nonnullable(context->root, (Expr *) arg, false)) + expr_is_nonnullable(context->root, (Expr *) arg, + NOTNULL_SOURCE_HASHTABLE)) { bool result; @@ -3749,7 +3981,9 @@ eval_const_expressions_mutator(Node *node, return makeBoolConst(result, false); } - if (arg && expr_is_nonnullable(context->root, (Expr *) arg, false)) + if (arg && + expr_is_nonnullable(context->root, (Expr *) arg, + NOTNULL_SOURCE_HASHTABLE)) { /* * If arg is proven non-nullable, simplify to boolean @@ -4384,14 +4618,11 @@ simplify_aggref(Aggref *aggref, eval_const_expressions_context *context) * If the Var is defined NOT NULL and meanwhile is not nulled by any outer * joins or grouping sets, then we can know that it cannot be NULL. * - * use_rel_info indicates whether the corresponding RelOptInfo is available for - * use. + * "source" specifies where we should look for NOT NULL proofs. */ bool -var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info) +var_is_nonnullable(PlannerInfo *root, Var *var, NotNullSource source) { - Bitmapset *notnullattnums = NULL; - Assert(IsA(var, Var)); /* skip upper-level Vars */ @@ -4406,35 +4637,89 @@ var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info) if (var->varattno < 0) return true; - /* - * Check if the Var is defined as NOT NULL. We retrieve the column NOT - * NULL constraint information from the corresponding RelOptInfo if it is - * available; otherwise, we search the hash table for this information. - */ - if (use_rel_info) - { - RelOptInfo *rel = find_base_rel(root, var->varno); + /* we don't trust whole-row Vars */ + if (var->varattno == 0) + return false; - notnullattnums = rel->notnullattnums; - } - else + /* Check if the Var is defined as NOT NULL. */ + switch (source) { - RangeTblEntry *rte = planner_rt_fetch(var->varno, root); + case NOTNULL_SOURCE_RELOPT: + { + /* + * We retrieve the column NOT NULL constraint information from + * the corresponding RelOptInfo. + */ + RelOptInfo *rel; + Bitmapset *notnullattnums; - /* - * We must skip inheritance parent tables, as some child tables may - * have a NOT NULL constraint for a column while others may not. This - * cannot happen with partitioned tables, though. - */ - if (rte->inh && rte->relkind != RELKIND_PARTITIONED_TABLE) - return false; + rel = find_base_rel(root, var->varno); + notnullattnums = rel->notnullattnums; - notnullattnums = find_relation_notnullatts(root, rte->relid); - } + return bms_is_member(var->varattno, notnullattnums); + } + case NOTNULL_SOURCE_HASHTABLE: + { + /* + * We retrieve the column NOT NULL constraint information from + * the hash table. + */ + RangeTblEntry *rte; + Bitmapset *notnullattnums; - if (var->varattno > 0 && - bms_is_member(var->varattno, notnullattnums)) - return true; + rte = planner_rt_fetch(var->varno, root); + + /* We can only reason about ordinary relations */ + if (rte->rtekind != RTE_RELATION) + return false; + + /* + * We must skip inheritance parent tables, as some child + * tables may have a NOT NULL constraint for a column while + * others may not. This cannot happen with partitioned + * tables, though. + */ + if (rte->inh && rte->relkind != RELKIND_PARTITIONED_TABLE) + return false; + + notnullattnums = find_relation_notnullatts(root, rte->relid); + + return bms_is_member(var->varattno, notnullattnums); + } + case NOTNULL_SOURCE_SYSCACHE: + { + /* + * We look up the "attnotnull" field in the attribute + * relation. + */ + RangeTblEntry *rte; + + rte = planner_rt_fetch(var->varno, root); + + /* We can only reason about ordinary relations */ + if (rte->rtekind != RTE_RELATION) + return false; + + /* + * We must skip inheritance parent tables, as some child + * tables may have a NOT NULL constraint for a column while + * others may not. This cannot happen with partitioned + * tables, though. + * + * Note that we need to check if the relation actually has any + * children, as we might not have done that yet. + */ + if (rte->inh && has_subclass(rte->relid) && + rte->relkind != RELKIND_PARTITIONED_TABLE) + return false; + + return get_attnotnull(rte->relid, var->varattno); + } + default: + elog(ERROR, "unrecognized NotNullSource: %d", + (int) source); + break; + } return false; } @@ -4444,16 +4729,22 @@ var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info) * * Returns true iff the given 'expr' cannot produce SQL NULLs. * - * If 'use_rel_info' is true, nullability of Vars is checked via the - * corresponding RelOptInfo for the given Var. Some callers require - * nullability information before RelOptInfos are generated. These should - * pass 'use_rel_info' as false. + * source: specifies where we should look for NOT NULL proofs for Vars. + * - NOTNULL_SOURCE_RELOPT: Used when RelOptInfos have been generated. We + * retrieve nullability information directly from the RelOptInfo corresponding + * to the Var. + * - NOTNULL_SOURCE_HASHTABLE: Used when RelOptInfos are not yet available, + * but we have already collected relation-level not-null constraints into the + * global hash table. + * - NOTNULL_SOURCE_SYSCACHE: Used for raw parse trees where neither + * RelOptInfos nor the hash table are available. In this case, we have to + * look up the 'attnotnull' field directly in the system catalogs. * * For now, we support only a limited set of expression types. Support for * additional node types can be added in the future. */ bool -expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) +expr_is_nonnullable(PlannerInfo *root, Expr *expr, NotNullSource source) { /* since this function recurses, it could be driven to stack overflow */ check_stack_depth(); @@ -4463,7 +4754,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) case T_Var: { if (root) - return var_is_nonnullable(root, (Var *) expr, use_rel_info); + return var_is_nonnullable(root, (Var *) expr, source); } break; case T_Const: @@ -4480,7 +4771,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) foreach_ptr(Expr, arg, coalesceexpr->args) { - if (expr_is_nonnullable(root, arg, use_rel_info)) + if (expr_is_nonnullable(root, arg, source)) return true; } } @@ -4495,7 +4786,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) foreach_ptr(Expr, arg, minmaxexpr->args) { - if (expr_is_nonnullable(root, arg, use_rel_info)) + if (expr_is_nonnullable(root, arg, source)) return true; } } @@ -4511,13 +4802,13 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) /* The default result must be present and non-nullable */ if (caseexpr->defresult == NULL || - !expr_is_nonnullable(root, caseexpr->defresult, use_rel_info)) + !expr_is_nonnullable(root, caseexpr->defresult, source)) return false; /* All branch results must be non-nullable */ foreach_ptr(CaseWhen, casewhen, caseexpr->args) { - if (!expr_is_nonnullable(root, casewhen->result, use_rel_info)) + if (!expr_is_nonnullable(root, casewhen->result, source)) return false; } @@ -4565,7 +4856,7 @@ expr_is_nonnullable(PlannerInfo *root, Expr *expr, bool use_rel_info) * non-nullable. */ return expr_is_nonnullable(root, ((RelabelType *) expr)->arg, - use_rel_info); + source); } default: break; diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c index 37d34685b93..6c8fb7b7275 100644 --- a/src/backend/utils/adt/int8.c +++ b/src/backend/utils/adt/int8.c @@ -834,7 +834,7 @@ int8inc_support(PG_FUNCTION_ARGS) PG_RETURN_POINTER(NULL); /* If the arg isn't NULLable, do the conversion */ - if (expr_is_nonnullable(req->root, arg, false)) + if (expr_is_nonnullable(req->root, arg, NOTNULL_SOURCE_HASHTABLE)) { Aggref *newagg; diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c index 1913b009d40..f10948483b9 100644 --- a/src/backend/utils/cache/lsyscache.c +++ b/src/backend/utils/cache/lsyscache.c @@ -858,6 +858,47 @@ comparison_ops_are_compatible(Oid opno1, Oid opno2) return result; } +/* + * op_is_safe_index_member + * Check if the operator is a member of a B-tree or Hash operator family. + * + * We use this check as a proxy for "null-safety": if an operator is trusted by + * the btree or hash opfamily, it implies that the operator adheres to standard + * boolean behavior, and would not return NULL when given valid non-null + * inputs, as doing so would break index integrity. + */ +bool +op_is_safe_index_member(Oid opno) +{ + bool result = false; + CatCList *catlist; + int i; + + /* + * Search pg_amop to see if the target operator is registered for any + * btree or hash opfamily. + */ + catlist = SearchSysCacheList1(AMOPOPID, ObjectIdGetDatum(opno)); + + for (i = 0; i < catlist->n_members; i++) + { + HeapTuple tuple = &catlist->members[i]->tuple; + Form_pg_amop aform = (Form_pg_amop) GETSTRUCT(tuple); + + /* Check if the AM is B-tree or Hash */ + if (aform->amopmethod == BTREE_AM_OID || + aform->amopmethod == HASH_AM_OID) + { + result = true; + break; + } + } + + ReleaseSysCacheList(catlist); + + return result; +} + /* ---------- AMPROC CACHES ---------- */ @@ -1071,6 +1112,33 @@ get_attoptions(Oid relid, int16 attnum) return result; } +/* + * get_attnotnull + * + * Given the relation id and the attribute number, + * return the "attnotnull" field from the attribute relation. + */ +bool +get_attnotnull(Oid relid, AttrNumber attnum) +{ + HeapTuple tp; + bool result = false; + + tp = SearchSysCache2(ATTNUM, + ObjectIdGetDatum(relid), + Int16GetDatum(attnum)); + + if (HeapTupleIsValid(tp)) + { + Form_pg_attribute att_tup = (Form_pg_attribute) GETSTRUCT(tp); + + result = att_tup->attnotnull; + ReleaseSysCache(tp); + } + + return result; +} + /* ---------- PG_CAST CACHE ---------- */ /* diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h index a64034e8a6d..853a28c0007 100644 --- a/src/include/optimizer/clauses.h +++ b/src/include/optimizer/clauses.h @@ -42,6 +42,7 @@ extern Relids find_nonnullable_rels(Node *clause); extern List *find_nonnullable_vars(Node *clause); extern List *find_forced_null_vars(Node *node); extern Var *find_forced_null_var(Node *node); +extern bool query_outputs_are_not_nullable(Query *query); extern bool is_pseudo_constant_clause(Node *clause); extern bool is_pseudo_constant_clause_relids(Node *clause, Relids relids); diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h index b562ca380a8..e8b409afb7f 100644 --- a/src/include/optimizer/optimizer.h +++ b/src/include/optimizer/optimizer.h @@ -130,6 +130,14 @@ extern Expr *canonicalize_qual(Expr *qual, bool is_check); /* in util/clauses.c: */ +/* Enum to specify where var_is_nonnullable should look for NOT NULL proofs */ +typedef enum +{ + NOTNULL_SOURCE_RELOPT, /* Use RelOptInfo */ + NOTNULL_SOURCE_HASHTABLE, /* Use Global Hash Table */ + NOTNULL_SOURCE_SYSCACHE, /* Use System Catalog */ +} NotNullSource; + extern bool contain_mutable_functions(Node *clause); extern bool contain_mutable_functions_after_planning(Expr *expr); extern bool contain_volatile_functions(Node *clause); @@ -145,10 +153,11 @@ extern Node *estimate_expression_value(PlannerInfo *root, Node *node); extern Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod, Oid result_collation); -extern bool var_is_nonnullable(PlannerInfo *root, Var *var, bool use_rel_info); +extern bool var_is_nonnullable(PlannerInfo *root, Var *var, + NotNullSource source); extern bool expr_is_nonnullable(PlannerInfo *root, Expr *expr, - bool use_rel_info); + NotNullSource source); extern List *expand_function_arguments(List *args, bool include_out_arguments, Oid result_type, diff --git a/src/include/optimizer/subselect.h b/src/include/optimizer/subselect.h index 8a5503eb973..4ecccf46bd3 100644 --- a/src/include/optimizer/subselect.h +++ b/src/include/optimizer/subselect.h @@ -22,6 +22,7 @@ extern ScalarArrayOpExpr *convert_VALUES_to_ANY(PlannerInfo *root, Query *values); extern JoinExpr *convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, + bool under_not, Relids available_rels); extern JoinExpr *convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink, diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h index 5655aca4c14..b9ad84ecd41 100644 --- a/src/include/utils/lsyscache.h +++ b/src/include/utils/lsyscache.h @@ -89,6 +89,7 @@ extern bool get_op_hash_functions(Oid opno, extern List *get_op_index_interpretation(Oid opno); extern bool equality_ops_are_compatible(Oid opno1, Oid opno2); extern bool comparison_ops_are_compatible(Oid opno1, Oid opno2); +extern bool op_is_safe_index_member(Oid opno); extern Oid get_opfamily_proc(Oid opfamily, Oid lefttype, Oid righttype, int16 procnum); extern char *get_attname(Oid relid, AttrNumber attnum, bool missing_ok); @@ -98,6 +99,7 @@ extern Oid get_atttype(Oid relid, AttrNumber attnum); extern void get_atttypetypmodcoll(Oid relid, AttrNumber attnum, Oid *typid, int32 *typmod, Oid *collid); extern Datum get_attoptions(Oid relid, int16 attnum); +extern bool get_attnotnull(Oid relid, AttrNumber attnum); extern Oid get_cast_oid(Oid sourcetypeid, Oid targettypeid, bool missing_ok); extern char *get_collation_name(Oid colloid); extern bool get_collation_isdeterministic(Oid colloid); diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out index 2135d82884d..200236a0a69 100644 --- a/src/test/regress/expected/subselect.out +++ b/src/test/regress/expected/subselect.out @@ -3323,3 +3323,442 @@ SELECT ten FROM onek t WHERE 1.0::integer IN ((VALUES (1), (3))); Seq Scan on onek t (1 row) +-- +-- Check NOT IN performs an ANTI JOIN when both the outer query's expressions +-- and the sub-select's output columns are provably non-nullable, and the +-- operator itself cannot return NULL for non-null inputs. +-- +BEGIN; +CREATE TEMP TABLE not_null_tab (id int NOT NULL, val int NOT NULL); +CREATE TEMP TABLE null_tab (id int, val int); +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +----------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = not_null_tab_1.id) + -> Seq Scan on not_null_tab + -> Hash + -> Seq Scan on not_null_tab not_null_tab_1 +(5 rows) + +-- No ANTI JOIN: outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> Seq Scan on not_null_tab +(4 rows) + +-- No ANTI JOIN: inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> Seq Scan on null_tab +(4 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab WHERE id IS NOT NULL); + QUERY PLAN +---------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = null_tab.id) + -> Seq Scan on not_null_tab + -> Hash + -> Seq Scan on null_tab + Filter: (id IS NOT NULL) +(6 rows) + +-- No ANTI JOIN: outer side is nullable (we don't check outer query quals for now) +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id IS NOT NULL + AND id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +--------------------------------------------------------------------------------- + Seq Scan on null_tab + Filter: ((id IS NOT NULL) AND (NOT (ANY (id = (hashed SubPlan any_1).col1)))) + SubPlan any_1 + -> Seq Scan on not_null_tab +(4 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is defined NOT NULL +-- and is not nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + QUERY PLAN +----------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = t1.id) + -> Seq Scan on not_null_tab + -> Hash + -> Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(12 rows) + +-- No ANTI JOIN: inner side is defined NOT NULL but is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t2.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(11 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t2.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id + WHERE t2.id IS NOT NULL +); + QUERY PLAN +----------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = t2.id) + -> Seq Scan on not_null_tab + -> Hash + -> Merge Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(12 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + LEFT JOIN null_tab t2 ON t1.id = t2.id + WHERE t1.id IS NOT NULL +); + QUERY PLAN +---------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = t1.id) + -> Seq Scan on not_null_tab + -> Hash + -> Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on null_tab t1 + Filter: (id IS NOT NULL) + -> Sort + Sort Key: t2.id + -> Seq Scan on null_tab t2 +(13 rows) + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + INNER JOIN null_tab t2 ON t1.id = t2.id + LEFT JOIN null_tab t3 ON TRUE +); + QUERY PLAN +------------------------------------------------- + Merge Anti Join + Merge Cond: (not_null_tab.id = t1.id) + -> Sort + Sort Key: not_null_tab.id + -> Seq Scan on not_null_tab + -> Nested Loop Left Join + -> Merge Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Seq Scan on null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on null_tab t2 + -> Materialize + -> Seq Scan on null_tab t3 +(16 rows) + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t1.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +---------------------------------------------------- + Merge Left Join + Merge Cond: (t1.id = t2.id) + -> Sort + Sort Key: t1.id + -> Hash Anti Join + Hash Cond: (t1.id = not_null_tab.id) + -> Seq Scan on not_null_tab t1 + -> Hash + -> Seq Scan on not_null_tab + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 +(12 rows) + +-- No ANTI JOIN: outer side is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t2.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +------------------------------------------------------------- + Merge Left Join + Merge Cond: (t1.id = t2.id) + Filter: (NOT (ANY (t2.id = (hashed SubPlan any_1).col1))) + -> Sort + Sort Key: t1.id + -> Seq Scan on not_null_tab t1 + -> Sort + Sort Key: t2.id + -> Seq Scan on not_null_tab t2 + SubPlan any_1 + -> Seq Scan on not_null_tab +(11 rows) + +-- No ANTI JOIN: sublink is in an outer join's ON qual and references the +-- non-nullable side +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t1.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +------------------------------------------------------------------ + Nested Loop Left Join + Join Filter: (NOT (ANY (t1.id = (hashed SubPlan any_1).col1))) + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Seq Scan on not_null_tab t2 + SubPlan any_1 + -> Seq Scan on not_null_tab +(7 rows) + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t2.id NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +---------------------------------------------------- + Nested Loop Left Join + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Hash Anti Join + Hash Cond: (t2.id = not_null_tab.id) + -> Seq Scan on not_null_tab t2 + -> Hash + -> Seq Scan on not_null_tab +(8 rows) + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT id, val FROM not_null_tab); + QUERY PLAN +--------------------------------------------------------------------------------------------------- + Merge Anti Join + Merge Cond: ((not_null_tab.id = not_null_tab_1.id) AND (not_null_tab.val = not_null_tab_1.val)) + -> Sort + Sort Key: not_null_tab.id, not_null_tab.val + -> Seq Scan on not_null_tab + -> Sort + Sort Key: not_null_tab_1.id, not_null_tab_1.val + -> Seq Scan on not_null_tab not_null_tab_1 +(8 rows) + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE NOT (id, val) > ANY (SELECT id, val FROM not_null_tab); + QUERY PLAN +------------------------------------------------------------------------------------------------------ + Nested Loop Anti Join + Join Filter: (ROW(not_null_tab.id, not_null_tab.val) > ROW(not_null_tab_1.id, not_null_tab_1.val)) + -> Seq Scan on not_null_tab + -> Materialize + -> Seq Scan on not_null_tab not_null_tab_1 +(5 rows) + +-- No ANTI JOIN: one column of the outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1, null_tab t2 +WHERE (t1.id, t2.id) NOT IN (SELECT id, val FROM not_null_tab); + QUERY PLAN +-------------------------------------------------------------------------------------------------------------- + Nested Loop + Join Filter: (NOT (ANY ((t1.id = (hashed SubPlan any_1).col1) AND (t2.id = (hashed SubPlan any_1).col2)))) + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Seq Scan on null_tab t2 + SubPlan any_1 + -> Seq Scan on not_null_tab +(7 rows) + +-- No ANTI JOIN: one column of the inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT t1.id, t2.id FROM not_null_tab t1, null_tab t2); + QUERY PLAN +-------------------------------------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY ((id = (SubPlan any_1).col1) AND (val = (SubPlan any_1).col2)))) + SubPlan any_1 + -> Materialize + -> Nested Loop + -> Seq Scan on not_null_tab t1 + -> Materialize + -> Seq Scan on null_tab t2 +(8 rows) + +-- ANTI JOIN: COALESCE(nullable, constant) is non-nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE COALESCE(id, -1) NOT IN (SELECT id FROM not_null_tab); + QUERY PLAN +----------------------------------------------------------------------- + Hash Anti Join + Hash Cond: (COALESCE(null_tab.id, '-1'::integer) = not_null_tab.id) + -> Seq Scan on null_tab + -> Hash + -> Seq Scan on not_null_tab +(5 rows) + +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT COALESCE(id, -1) FROM null_tab); + QUERY PLAN +----------------------------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = COALESCE(null_tab.id, '-1'::integer)) + -> Seq Scan on not_null_tab + -> Hash + -> Seq Scan on null_tab +(5 rows) + +-- ANTI JOIN: GROUP BY (without Grouping Sets) preserves the non-nullability of +-- the column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab GROUP BY id); + QUERY PLAN +----------------------------------------------------------- + Hash Anti Join + Hash Cond: (not_null_tab.id = not_null_tab_1.id) + -> Seq Scan on not_null_tab + -> Hash + -> HashAggregate + Group Key: not_null_tab_1.id + -> Seq Scan on not_null_tab not_null_tab_1 +(7 rows) + +-- No ANTI JOIN: GROUP BY on a nullable column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab GROUP BY id); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> HashAggregate + Group Key: null_tab.id + -> Seq Scan on null_tab +(6 rows) + +-- No ANTI JOIN: Grouping Sets can introduce NULLs +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT id + FROM not_null_tab + GROUP BY GROUPING SETS ((id), (val)) +); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> HashAggregate + Hash Key: not_null_tab_1.id + Hash Key: not_null_tab_1.val + -> Seq Scan on not_null_tab not_null_tab_1 +(7 rows) + +-- create a custom "unsafe" equality operator +CREATE FUNCTION int4eq_unsafe(int4, int4) + RETURNS bool + AS 'int4eq' + LANGUAGE internal IMMUTABLE; +CREATE OPERATOR ?= ( + PROCEDURE = int4eq_unsafe, + LEFTARG = int4, + RIGHTARG = int4 +); +-- No ANTI JOIN: the operator is not safe +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE NOT id ?= ANY (SELECT id FROM not_null_tab); + QUERY PLAN +------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id ?= (SubPlan any_1).col1))) + SubPlan any_1 + -> Materialize + -> Seq Scan on not_null_tab not_null_tab_1 +(5 rows) + +ROLLBACK; diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql index cadc3293687..4cd016f4ac3 100644 --- a/src/test/regress/sql/subselect.sql +++ b/src/test/regress/sql/subselect.sql @@ -1448,3 +1448,188 @@ SELECT * FROM onek t1, lateral (SELECT * FROM onek t2 WHERE t2.ten IN (values (t -- VtA causes the whole expression to be evaluated as a constant EXPLAIN (COSTS OFF) SELECT ten FROM onek t WHERE 1.0::integer IN ((VALUES (1), (3))); + +-- +-- Check NOT IN performs an ANTI JOIN when both the outer query's expressions +-- and the sub-select's output columns are provably non-nullable, and the +-- operator itself cannot return NULL for non-null inputs. +-- + +BEGIN; + +CREATE TEMP TABLE not_null_tab (id int NOT NULL, val int NOT NULL); +CREATE TEMP TABLE null_tab (id int, val int); + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab WHERE id IS NOT NULL); + +-- No ANTI JOIN: outer side is nullable (we don't check outer query quals for now) +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE id IS NOT NULL + AND id NOT IN (SELECT id FROM not_null_tab); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is defined NOT NULL +-- and is not nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + +-- No ANTI JOIN: inner side is defined NOT NULL but is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t2.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id +); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t2.id + FROM not_null_tab t1 + LEFT JOIN not_null_tab t2 ON t1.id = t2.id + WHERE t2.id IS NOT NULL +); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + LEFT JOIN null_tab t2 ON t1.id = t2.id + WHERE t1.id IS NOT NULL +); + +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + INNER JOIN null_tab t2 ON t1.id = t2.id + LEFT JOIN null_tab t3 ON TRUE +); + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t1.id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: outer side is nulled by outer join +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 ON t1.id = t2.id +WHERE t2.id NOT IN (SELECT id FROM not_null_tab); + +-- No ANTI JOIN: sublink is in an outer join's ON qual and references the +-- non-nullable side +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t1.id NOT IN (SELECT id FROM not_null_tab); + +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t2.id NOT IN (SELECT id FROM not_null_tab); + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT id, val FROM not_null_tab); + +-- ANTI JOIN: both sides are defined NOT NULL +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE NOT (id, val) > ANY (SELECT id, val FROM not_null_tab); + +-- No ANTI JOIN: one column of the outer side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab t1, null_tab t2 +WHERE (t1.id, t2.id) NOT IN (SELECT id, val FROM not_null_tab); + +-- No ANTI JOIN: one column of the inner side is nullable +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE (id, val) NOT IN (SELECT t1.id, t2.id FROM not_null_tab t1, null_tab t2); + +-- ANTI JOIN: COALESCE(nullable, constant) is non-nullable +EXPLAIN (COSTS OFF) +SELECT * FROM null_tab +WHERE COALESCE(id, -1) NOT IN (SELECT id FROM not_null_tab); + +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT COALESCE(id, -1) FROM null_tab); + +-- ANTI JOIN: GROUP BY (without Grouping Sets) preserves the non-nullability of +-- the column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM not_null_tab GROUP BY id); + +-- No ANTI JOIN: GROUP BY on a nullable column +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM null_tab GROUP BY id); + +-- No ANTI JOIN: Grouping Sets can introduce NULLs +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT id + FROM not_null_tab + GROUP BY GROUPING SETS ((id), (val)) +); + +-- create a custom "unsafe" equality operator +CREATE FUNCTION int4eq_unsafe(int4, int4) + RETURNS bool + AS 'int4eq' + LANGUAGE internal IMMUTABLE; + +CREATE OPERATOR ?= ( + PROCEDURE = int4eq_unsafe, + LEFTARG = int4, + RIGHTARG = int4 +); + +-- No ANTI JOIN: the operator is not safe +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE NOT id ?= ANY (SELECT id FROM not_null_tab); + +ROLLBACK; diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index 3250564d4ff..8080dc16afa 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -1788,6 +1788,7 @@ Node NodeTag NonEmptyRange NoneCompressorState +NotNullSource Notification NotificationList NotifyStmt -- 2.39.5 (Apple Git-154) ^ permalink raw reply [nested|flat] 13+ messages in thread
* Re: Convert NOT IN sublinks to anti-joins when safe @ 2026-03-09 05:14 Japin Li <[email protected]> parent: Richard Guo <[email protected]> 1 sibling, 1 reply; 13+ messages in thread From: Japin Li @ 2026-03-09 05:14 UTC (permalink / raw) To: Richard Guo <[email protected]>; +Cc: Pg Hackers <[email protected]> On Sun, 08 Mar 2026 at 17:16, Richard Guo <[email protected]> wrote: > On Thu, Mar 5, 2026 at 10:57 AM Japin Li <[email protected]> wrote: >> Just a quick note: I think `foreach_ptr` is more appropriate here than `foreach`. > > Agreed. > > For homogeneous lists, it is better to use foreach_node so we get the > additional type-safety assertions in dev builds. For heterogeneous > lists, we should use foreach_ptr. > > Attached is an updated version of the patch that replaces foreach with > foreach_ptr or foreach_node accordingly. Nothing else has changed. > > I plan to do one more round of self-review on this. Unless there are > any further thoughts or concerns, I'm hoping to commit this in a week > or two. Please let me know if anyone spots anything else I might have > missed. > Thanks for updating the patch. LGTM. -- Regards, Japin Li ChengDu WenWu Information Technology Co., Ltd. ^ permalink raw reply [nested|flat] 13+ messages in thread
* Re: Convert NOT IN sublinks to anti-joins when safe @ 2026-03-09 07:38 wenhui qiu <[email protected]> parent: Japin Li <[email protected]> 0 siblings, 1 reply; 13+ messages in thread From: wenhui qiu @ 2026-03-09 07:38 UTC (permalink / raw) To: Japin Li <[email protected]>; +Cc: Richard Guo <[email protected]>; Pg Hackers <[email protected]> HI > I just finished a final self-review and noticed a subtle issue in > query_outputs_are_not_nullable(). When preparing a target expression > for the non-nullability check, the previous code flattened join > aliases Vars before grouping Vars. However, because the parser > processes FROM/JOIN clauses before the GROUP BY clause, a grouping Var > can actually wrap a join alias Var, not the reverse. So we should > flatten grouping Vars first. > Attached is an updated patch that fixes that issue. I plan to commit > this in a week or two, barring any objections. Thanks for updating the patch. v6 path LGTM. Thanks On Mon, Mar 9, 2026 at 1:14 PM Japin Li <[email protected]> wrote: > On Sun, 08 Mar 2026 at 17:16, Richard Guo <[email protected]> wrote: > > On Thu, Mar 5, 2026 at 10:57 AM Japin Li <[email protected]> wrote: > >> Just a quick note: I think `foreach_ptr` is more appropriate here than > `foreach`. > > > > Agreed. > > > > For homogeneous lists, it is better to use foreach_node so we get the > > additional type-safety assertions in dev builds. For heterogeneous > > lists, we should use foreach_ptr. > > > > Attached is an updated version of the patch that replaces foreach with > > foreach_ptr or foreach_node accordingly. Nothing else has changed. > > > > I plan to do one more round of self-review on this. Unless there are > > any further thoughts or concerns, I'm hoping to commit this in a week > > or two. Please let me know if anyone spots anything else I might have > > missed. > > > > Thanks for updating the patch. LGTM. > > -- > Regards, > Japin Li > ChengDu WenWu Information Technology Co., Ltd. > > > ^ permalink raw reply [nested|flat] 13+ messages in thread
* Re: Convert NOT IN sublinks to anti-joins when safe @ 2026-03-12 02:14 Richard Guo <[email protected]> parent: wenhui qiu <[email protected]> 0 siblings, 0 replies; 13+ messages in thread From: Richard Guo @ 2026-03-12 02:14 UTC (permalink / raw) To: wenhui qiu <[email protected]>; +Cc: Japin Li <[email protected]>; Pg Hackers <[email protected]> On Mon, Mar 9, 2026 at 4:38 PM wenhui qiu <[email protected]> wrote: > Thanks for updating the patch. v6 path LGTM. Thanks! Committed with a few minor cosmetic tweaks. - Richard ^ permalink raw reply [nested|flat] 13+ messages in thread
end of thread, other threads:[~2026-03-12 02:14 UTC | newest] Thread overview: 13+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2026-02-03 07:12 Convert NOT IN sublinks to anti-joins when safe Richard Guo <[email protected]> 2026-02-03 09:41 ` wenhui qiu <[email protected]> 2026-02-04 09:47 ` Richard Guo <[email protected]> 2026-02-05 06:51 ` Richard Guo <[email protected]> 2026-02-14 07:37 ` Richard Guo <[email protected]> 2026-03-04 09:52 ` Richard Guo <[email protected]> 2026-03-05 01:57 ` Japin Li <[email protected]> 2026-03-05 03:40 ` wenhui qiu <[email protected]> 2026-03-08 08:16 ` Richard Guo <[email protected]> 2026-03-09 04:01 ` Richard Guo <[email protected]> 2026-03-09 05:14 ` Japin Li <[email protected]> 2026-03-09 07:38 ` wenhui qiu <[email protected]> 2026-03-12 02:14 ` Richard Guo <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox