public inbox for [email protected]help / color / mirror / Atom feed
var_is_nonnullable() fails to handle invalid NOT NULL constraints 3+ messages / 2 participants [nested] [flat]
* var_is_nonnullable() fails to handle invalid NOT NULL constraints @ 2026-04-10 08:48 Richard Guo <[email protected]> 2026-04-12 09:33 ` Re: var_is_nonnullable() fails to handle invalid NOT NULL constraints SATYANARAYANA NARLAPURAM <[email protected]> 0 siblings, 1 reply; 3+ messages in thread From: Richard Guo @ 2026-04-10 08:48 UTC (permalink / raw) To: Pg Hackers <[email protected]> While fixing another bug in var_is_nonnullable(), I noticed $subject. The NOTNULL_SOURCE_SYSCACHE code path (newly added for the NOT IN to anti-join transformation) checks pg_attribute.attnotnull, which can be true even for invalid (NOT VALID) NOT NULL constraints. The consequence is that query_outputs_are_not_nullable() could wrongly conclude that a subquery's output is non-nullable, causing NOT IN to be incorrectly converted to an anti-join. The attached fix checks the attnullability field in the relation's tuple descriptor instead, which correctly distinguishes valid from invalid constraints. This is also consistent with what we do in get_relation_notnullatts(). It could be argued that the added table_open/table_close call is a performance concern, but I don't think so: 1. The relation is already locked by the rewriter, so table_open(rte->relid, NoLock) is just a relcache lookup. 2. This code path is only reached when converting NOT IN to an anti-join, and only after the outer side of the test expression has already been proved non-nullable. 3. It is only called for relation RTEs in the subquery. Thoughts? - Richard Attachments: [application/octet-stream] v1-0001-Fix-var_is_nonnullable-to-handle-invalid-NOT-NULL.patch (8.7K, 2-v1-0001-Fix-var_is_nonnullable-to-handle-invalid-NOT-NULL.patch) download | inline diff: From 5192aef920936133c7ffb7482465e9a1847a0164 Mon Sep 17 00:00:00 2001 From: Richard Guo <[email protected]> Date: Fri, 10 Apr 2026 16:51:18 +0900 Subject: [PATCH v1] Fix var_is_nonnullable() to handle invalid NOT NULL constraints The NOTNULL_SOURCE_SYSCACHE code path in var_is_nonnullable() used get_attnotnull() to check pg_attribute.attnotnull, which is true for both valid and invalid (NOT VALID) NOT NULL constraints. An invalid constraint does not guarantee the absence of NULLs, so this could lead to incorrect results. For example, query_outputs_are_not_nullable() could wrongly conclude that a subquery's output is non-nullable, causing NOT IN to be incorrectly converted to an anti-join. Fix by checking the attnullability field in the relation's tuple descriptor instead, which correctly distinguishes valid from invalid constraints, consistent with what the NOTNULL_SOURCE_HASHTABLE code path already does. While at it, rename NOTNULL_SOURCE_SYSCACHE to NOTNULL_SOURCE_CATALOG to reflect that this code path no longer uses a syscache lookup, and remove the now-unused get_attnotnull() function. --- src/backend/optimizer/util/clauses.c | 34 +++++++++++++++++++------ src/backend/utils/cache/lsyscache.c | 27 -------------------- src/include/optimizer/optimizer.h | 2 +- src/include/utils/lsyscache.h | 1 - src/test/regress/expected/subselect.out | 23 +++++++++++++++++ src/test/regress/sql/subselect.sql | 14 ++++++++++ 6 files changed, 64 insertions(+), 37 deletions(-) diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c index cd4e2e86c6d..598b4edacce 100644 --- a/src/backend/optimizer/util/clauses.c +++ b/src/backend/optimizer/util/clauses.c @@ -20,6 +20,7 @@ #include "postgres.h" #include "access/htup_details.h" +#include "access/table.h" #include "catalog/pg_class.h" #include "catalog/pg_inherits.h" #include "catalog/pg_language.h" @@ -59,6 +60,7 @@ #include "utils/jsonpath.h" #include "utils/lsyscache.h" #include "utils/memutils.h" +#include "utils/rel.h" #include "utils/syscache.h" #include "utils/typcache.h" @@ -2134,7 +2136,7 @@ query_outputs_are_not_nullable(Query *query) * parse tree, we need to look up the not-null constraints from the * system catalogs. */ - if (expr_is_nonnullable(&subroot, expr, NOTNULL_SOURCE_SYSCACHE)) + if (expr_is_nonnullable(&subroot, expr, NOTNULL_SOURCE_CATALOG)) continue; if (IsA(expr, Var)) @@ -4696,13 +4698,19 @@ var_is_nonnullable(PlannerInfo *root, Var *var, NotNullSource source) return bms_is_member(var->varattno, notnullattnums); } - case NOTNULL_SOURCE_SYSCACHE: + case NOTNULL_SOURCE_CATALOG: { /* - * We look up the "attnotnull" field in the attribute - * relation. + * We check the attnullability field in the tuple descriptor. + * This is necessary rather than checking the attnotnull field + * from the attribute relation, because attnotnull is also set + * for invalid (NOT VALID) NOT NULL constraints, which do not + * guarantee the absence of NULLs. */ RangeTblEntry *rte; + Relation rel; + CompactAttribute *attr; + bool result; rte = planner_rt_fetch(var->varno, root); @@ -4723,7 +4731,17 @@ var_is_nonnullable(PlannerInfo *root, Var *var, NotNullSource source) rte->relkind != RELKIND_PARTITIONED_TABLE) return false; - return get_attnotnull(rte->relid, var->varattno); + /* + * We need not lock the relation since it was already locked + * by the rewriter. + */ + rel = table_open(rte->relid, NoLock); + attr = TupleDescCompactAttr(RelationGetDescr(rel), + var->varattno - 1); + result = (attr->attnullability == ATTNULLABLE_VALID); + table_close(rel, NoLock); + + return result; } default: elog(ERROR, "unrecognized NotNullSource: %d", @@ -4746,9 +4764,9 @@ var_is_nonnullable(PlannerInfo *root, Var *var, NotNullSource source) * - NOTNULL_SOURCE_HASHTABLE: Used when RelOptInfos are not yet available, * but we have already collected relation-level not-null constraints into the * global hash table. - * - NOTNULL_SOURCE_SYSCACHE: Used for raw parse trees where neither - * RelOptInfos nor the hash table are available. In this case, we have to - * look up the 'attnotnull' field directly in the system catalogs. + * - NOTNULL_SOURCE_CATALOG: Used for raw parse trees where neither + * RelOptInfos nor the hash table are available. In this case, we check the + * column's attnullability in the tuple descriptor. * * For now, we support only a limited set of expression types. Support for * additional node types can be added in the future. diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c index 9b2b97f5cba..fa9fe9ffa26 100644 --- a/src/backend/utils/cache/lsyscache.c +++ b/src/backend/utils/cache/lsyscache.c @@ -1114,33 +1114,6 @@ get_attoptions(Oid relid, int16 attnum) return result; } -/* - * get_attnotnull - * - * Given the relation id and the attribute number, - * return the "attnotnull" field from the attribute relation. - */ -bool -get_attnotnull(Oid relid, AttrNumber attnum) -{ - HeapTuple tp; - bool result = false; - - tp = SearchSysCache2(ATTNUM, - ObjectIdGetDatum(relid), - Int16GetDatum(attnum)); - - if (HeapTupleIsValid(tp)) - { - Form_pg_attribute att_tup = (Form_pg_attribute) GETSTRUCT(tp); - - result = att_tup->attnotnull; - ReleaseSysCache(tp); - } - - return result; -} - /* ---------- PG_CAST CACHE ---------- */ /* diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h index e8b409afb7f..cb6241e2bdd 100644 --- a/src/include/optimizer/optimizer.h +++ b/src/include/optimizer/optimizer.h @@ -135,7 +135,7 @@ typedef enum { NOTNULL_SOURCE_RELOPT, /* Use RelOptInfo */ NOTNULL_SOURCE_HASHTABLE, /* Use Global Hash Table */ - NOTNULL_SOURCE_SYSCACHE, /* Use System Catalog */ + NOTNULL_SOURCE_CATALOG, /* Use System Catalog */ } NotNullSource; extern bool contain_mutable_functions(Node *clause); diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h index e57795fa01f..2e0258d877e 100644 --- a/src/include/utils/lsyscache.h +++ b/src/include/utils/lsyscache.h @@ -99,7 +99,6 @@ extern Oid get_atttype(Oid relid, AttrNumber attnum); extern void get_atttypetypmodcoll(Oid relid, AttrNumber attnum, Oid *typid, int32 *typmod, Oid *collid); extern Datum get_attoptions(Oid relid, int16 attnum); -extern bool get_attnotnull(Oid relid, AttrNumber attnum); extern Oid get_cast_oid(Oid sourcetypeid, Oid targettypeid, bool missing_ok); extern char *get_collation_name(Oid colloid); extern bool get_collation_isdeterministic(Oid colloid); diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out index 200236a0a69..a3778c23c34 100644 --- a/src/test/regress/expected/subselect.out +++ b/src/test/regress/expected/subselect.out @@ -3761,4 +3761,27 @@ WHERE NOT id ?= ANY (SELECT id FROM not_null_tab); -> Seq Scan on not_null_tab not_null_tab_1 (5 rows) +-- No ANTI JOIN: the inner side has an unvalidated NOT NULL constraint, so +-- the column might contain NULLs. +CREATE TEMP TABLE notnull_notvalid_tab (id int); +INSERT INTO notnull_notvalid_tab VALUES (NULL); +ALTER TABLE notnull_notvalid_tab ADD CONSTRAINT nn NOT NULL id NOT VALID; +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM notnull_notvalid_tab); + QUERY PLAN +---------------------------------------------------------- + Seq Scan on not_null_tab + Filter: (NOT (ANY (id = (hashed SubPlan any_1).col1))) + SubPlan any_1 + -> Seq Scan on notnull_notvalid_tab +(4 rows) + +-- NOT IN with NULL on inner side should return no rows +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM notnull_notvalid_tab); + id | val +----+----- +(0 rows) + ROLLBACK; diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql index 4cd016f4ac3..1a02c3f86c0 100644 --- a/src/test/regress/sql/subselect.sql +++ b/src/test/regress/sql/subselect.sql @@ -1632,4 +1632,18 @@ EXPLAIN (COSTS OFF) SELECT * FROM not_null_tab WHERE NOT id ?= ANY (SELECT id FROM not_null_tab); +-- No ANTI JOIN: the inner side has an unvalidated NOT NULL constraint, so +-- the column might contain NULLs. +CREATE TEMP TABLE notnull_notvalid_tab (id int); +INSERT INTO notnull_notvalid_tab VALUES (NULL); +ALTER TABLE notnull_notvalid_tab ADD CONSTRAINT nn NOT NULL id NOT VALID; + +EXPLAIN (COSTS OFF) +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM notnull_notvalid_tab); + +-- NOT IN with NULL on inner side should return no rows +SELECT * FROM not_null_tab +WHERE id NOT IN (SELECT id FROM notnull_notvalid_tab); + ROLLBACK; -- 2.39.5 (Apple Git-154) ^ permalink raw reply [nested|flat] 3+ messages in thread
* Re: var_is_nonnullable() fails to handle invalid NOT NULL constraints 2026-04-10 08:48 var_is_nonnullable() fails to handle invalid NOT NULL constraints Richard Guo <[email protected]> @ 2026-04-12 09:33 ` SATYANARAYANA NARLAPURAM <[email protected]> 2026-04-15 00:51 ` Re: var_is_nonnullable() fails to handle invalid NOT NULL constraints Richard Guo <[email protected]> 0 siblings, 1 reply; 3+ messages in thread From: SATYANARAYANA NARLAPURAM @ 2026-04-12 09:33 UTC (permalink / raw) To: Richard Guo <[email protected]>; +Cc: Pg Hackers <[email protected]> Hi RIchard, On Fri, Apr 10, 2026 at 1:48 AM Richard Guo <[email protected]> wrote: > While fixing another bug in var_is_nonnullable(), I noticed $subject. > The NOTNULL_SOURCE_SYSCACHE code path (newly added for the NOT IN to > anti-join transformation) checks pg_attribute.attnotnull, which can be > true even for invalid (NOT VALID) NOT NULL constraints. > > The consequence is that query_outputs_are_not_nullable() could wrongly > conclude that a subquery's output is non-nullable, causing NOT IN to > be incorrectly converted to an anti-join. > > The attached fix checks the attnullability field in the relation's > tuple descriptor instead, which correctly distinguishes valid from > invalid constraints. This is also consistent with what we do in > get_relation_notnullatts(). > I tested this patch against the current HEAD (155c03ee) and it looks good. Build & tests: Applies cleanly, compiles without warnings, all 247 regression tests pass including the new subselect test case. Reproduced the bug before the patch and verified it is fixed after the patch. > It could be argued that the added table_open/table_close call is a > performance concern, but I don't think so: > > 1. The relation is already locked by the rewriter, so > table_open(rte->relid, NoLock) is just a relcache lookup. > > 2. This code path is only reached when converting NOT IN to an > anti-join, and only after the outer side of the test expression has > already been proved non-nullable. > > 3. It is only called for relation RTEs in the subquery. > > Thoughts? > Looks like it needs to perform table_open/table_close multiple times depending upon the number of output columns? I don't see it as a major concern but let others comment. Thanks, Satya ^ permalink raw reply [nested|flat] 3+ messages in thread
* Re: var_is_nonnullable() fails to handle invalid NOT NULL constraints 2026-04-10 08:48 var_is_nonnullable() fails to handle invalid NOT NULL constraints Richard Guo <[email protected]> 2026-04-12 09:33 ` Re: var_is_nonnullable() fails to handle invalid NOT NULL constraints SATYANARAYANA NARLAPURAM <[email protected]> @ 2026-04-15 00:51 ` Richard Guo <[email protected]> 0 siblings, 0 replies; 3+ messages in thread From: Richard Guo @ 2026-04-15 00:51 UTC (permalink / raw) To: SATYANARAYANA NARLAPURAM <[email protected]>; +Cc: Pg Hackers <[email protected]> On Sun, Apr 12, 2026 at 6:33 PM SATYANARAYANA NARLAPURAM <[email protected]> wrote: > I tested this patch against the current HEAD (155c03ee) and it looks good. > Build & tests: Applies cleanly, compiles without warnings, all 247 regression tests > pass including the new subselect test case. Reproduced the bug before the patch > and verified it is fixed after the patch. Thanks for testing. > Looks like it needs to perform table_open/table_close multiple times depending upon > the number of output columns? I don't see it as a major concern but let others comment. I don't see it as a major concern either performance-wise. table_open here is just a hash lookup and refcount increment, and table_close is just a refcount decrement. And the call only happens in NOT IN cases, and only after sublink_testexpr_is_not_nullable has already returned true. So I've committed this patch. - Richard ^ permalink raw reply [nested|flat] 3+ messages in thread
end of thread, other threads:[~2026-04-15 00:51 UTC | newest] Thread overview: 3+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2026-04-10 08:48 var_is_nonnullable() fails to handle invalid NOT NULL constraints Richard Guo <[email protected]> 2026-04-12 09:33 ` SATYANARAYANA NARLAPURAM <[email protected]> 2026-04-15 00:51 ` Richard Guo <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox