public inbox for [email protected]
help / color / mirror / Atom feed[PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators
11+ messages / 4 participants
[nested] [flat]
* [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators
@ 2026-04-16 13:01 Chengpeng Yan <[email protected]>
2026-04-17 12:45 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators =?utf-8?B?Y2NhNTUwNw==?= <[email protected]>
2026-04-19 13:06 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators David Rowley <[email protected]>
0 siblings, 2 replies; 11+ messages in thread
From: Chengpeng Yan @ 2026-04-16 13:01 UTC (permalink / raw)
To: pgsql-hackers; +Cc: David Rowley <[email protected]>
Hi,
ExecEvalHashedScalarArrayOp() produces results that are not semantically
equivalent to ExecEvalScalarArrayOp() when the LHS is NULL and the
comparison function is non-strict.
With the attached setup, the following two queries disagree:
select (a in
(0::myint,1::myint,2::myint,3::myint,4::myint,
5::myint,6::myint,7::myint,8::myint,9::myint)) is null
from inttest where a is null;
This takes the hashed SAOP path and returns false.
select (a in
(0::myint,1::myint,2::myint,3::myint,4::myint,5::myint)) is null
from inttest where a is null;
This stays on the linear path and returns true.
This only occurs when:
* the planner selects the hashed SAOP path,
* the LHS evaluates to NULL at execution time, and
* the comparison function is non-strict.
The root cause is that ExecEvalHashedScalarArrayOp() only special-cases
NULL LHS for strict functions, and otherwise proceeds to probe the hash
table. This is incorrect for non-strict functions, and can also result
in probing with an undefined Datum.
The first attached patch fixes this by bypassing hash probing when the
LHS is NULL and the comparator is non-strict, falling back to a linear
evaluation consistent with ExecEvalScalarArrayOp(). For NOT IN, only
non-NULL results are inverted.
The second patch is a cleanup that factors out the common array scan and
boolean reduction logic shared by ExecEvalScalarArrayOp() and the new
fallback path. No functional change intended.
The patches include regression tests using a custom type with a
non-strict, hashable “=” operator. A standalone SQL reproducer is also
attached.
--
Best regards,
Chengpeng Yan
Attachments:
[application/octet-stream] v1-0002-Factor-out-common-ScalarArrayOp-array-reduction-l.patch (6.9K, 3-v1-0002-Factor-out-common-ScalarArrayOp-array-reduction-l.patch)
download | inline diff:
From a617677bbd71acf3413d59220b1180f175cafdaa Mon Sep 17 00:00:00 2001
From: Chengpeng Yan <[email protected]>
Date: Mon, 13 Apr 2026 12:31:05 +0800
Subject: [PATCH v1 2/2] Factor out common ScalarArrayOp array reduction logic
ExecEvalScalarArrayOp() and the NULL-lhs fallback in
ExecEvalHashedScalarArrayOp() both need the same array-element scan and
boolean reduction logic. Move that logic into a shared helper so the
semantics live in one place.
No functional change intended.
---
src/backend/executor/execExprInterp.c | 148 +++++++++++---------------
1 file changed, 65 insertions(+), 83 deletions(-)
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 5437a8cd4b7..8f3d72d19e4 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -154,6 +154,12 @@ static TupleDesc get_cached_rowtype(Oid type_id, int32 typmod,
bool *changed);
static void ExecEvalRowNullInt(ExprState *state, ExprEvalStep *op,
ExprContext *econtext, bool checkisnull);
+static void ExecEvalArrayCompareReduce(FunctionCallInfo fcinfo,
+ ArrayType *arr, int16 typlen,
+ bool typbyval, char typalign,
+ bool useOr,
+ bool invert_nonnull_result,
+ Datum *result, bool *resultnull);
/* fast-path evaluation functions */
static Datum ExecJustInnerVar(ExprState *state, ExprContext *econtext, bool *isnull);
@@ -4031,13 +4037,6 @@ ExecEvalScalarArrayOp(ExprState *state, ExprEvalStep *op)
int nitems;
Datum result;
bool resultnull;
- int16 typlen;
- bool typbyval;
- char typalign;
- uint8 typalignby;
- char *s;
- uint8 *bitmap;
- int bitmask;
/*
* If the array is NULL then we return NULL --- it's not very meaningful
@@ -4086,14 +4085,49 @@ ExecEvalScalarArrayOp(ExprState *state, ExprEvalStep *op)
op->d.scalararrayop.element_type = ARR_ELEMTYPE(arr);
}
- typlen = op->d.scalararrayop.typlen;
- typbyval = op->d.scalararrayop.typbyval;
- typalign = op->d.scalararrayop.typalign;
- typalignby = typalign_to_alignby(typalign);
+ ExecEvalArrayCompareReduce(fcinfo,
+ arr,
+ op->d.scalararrayop.typlen,
+ op->d.scalararrayop.typbyval,
+ op->d.scalararrayop.typalign,
+ useOr,
+ false,
+ &result,
+ &resultnull);
+
+ *op->resvalue = result;
+ *op->resnull = resultnull;
+}
+
+/*
+ * Shared helper for ExecEvalScalarArrayOp() and the hashed NULL-lhs fallback.
+ * Callers must handle the scalar-NULL strict fast path before invoking this.
+ *
+ * The linear path passes the expression's comparison function and uses useOr
+ * to select OR or AND reduction. The hashed NULL-lhs fallback instead passes
+ * the equality function used by hash probes, so NOT IN inverts non-NULL
+ * results after reduction.
+ */
+static void
+ExecEvalArrayCompareReduce(FunctionCallInfo fcinfo,
+ ArrayType *arr, int16 typlen,
+ bool typbyval, char typalign,
+ bool useOr,
+ bool invert_nonnull_result,
+ Datum *result, bool *resultnull)
+{
+ uint8 typalignby = typalign_to_alignby(typalign);
+ int nitems;
+ char *s;
+ uint8 *bitmap;
+ int bitmask;
+ bool strictfunc = fcinfo->flinfo->fn_strict;
+
+ nitems = ArrayGetNItems(ARR_NDIM(arr), ARR_DIMS(arr));
/* Initialize result appropriately depending on useOr */
- result = BoolGetDatum(!useOr);
- resultnull = false;
+ *result = BoolGetDatum(!useOr);
+ *resultnull = false;
/* Loop over the array elements */
s = (char *) ARR_DATA_PTR(arr);
@@ -4129,18 +4163,18 @@ ExecEvalScalarArrayOp(ExprState *state, ExprEvalStep *op)
else
{
fcinfo->isnull = false;
- thisresult = op->d.scalararrayop.fn_addr(fcinfo);
+ thisresult = fcinfo->flinfo->fn_addr(fcinfo);
}
/* Combine results per OR or AND semantics */
if (fcinfo->isnull)
- resultnull = true;
+ *resultnull = true;
else if (useOr)
{
if (DatumGetBool(thisresult))
{
- result = BoolGetDatum(true);
- resultnull = false;
+ *result = BoolGetDatum(true);
+ *resultnull = false;
break; /* needn't look at any more elements */
}
}
@@ -4148,8 +4182,8 @@ ExecEvalScalarArrayOp(ExprState *state, ExprEvalStep *op)
{
if (!DatumGetBool(thisresult))
{
- result = BoolGetDatum(false);
- resultnull = false;
+ *result = BoolGetDatum(false);
+ *resultnull = false;
break; /* needn't look at any more elements */
}
}
@@ -4166,8 +4200,9 @@ ExecEvalScalarArrayOp(ExprState *state, ExprEvalStep *op)
}
}
- *op->resvalue = result;
- *op->resnull = resultnull;
+ /* Apply the caller-requested NOT IN inversion, preserving NULL. */
+ if (invert_nonnull_result && !*resultnull)
+ *result = BoolGetDatum(!DatumGetBool(*result));
}
/*
@@ -4254,11 +4289,6 @@ ExecEvalHashedScalarArrayOp(ExprState *state, ExprEvalStep *op, ExprContext *eco
int16 typlen;
bool typbyval;
char typalign;
- uint8 typalignby;
- int nitems;
- char *s;
- uint8 *bitmap;
- int bitmask;
if (strictfunc)
{
@@ -4272,67 +4302,19 @@ ExecEvalHashedScalarArrayOp(ExprState *state, ExprEvalStep *op, ExprContext *eco
&typlen,
&typbyval,
&typalign);
- typalignby = typalign_to_alignby(typalign);
-
- nitems = ArrayGetNItems(ARR_NDIM(arr), ARR_DIMS(arr));
-
- /* Compute IN as an OR reduction of equality results. */
- result = BoolGetDatum(false);
- resultnull = false;
fcinfo->args[0].value = (Datum) 0;
fcinfo->args[0].isnull = true;
- s = (char *) ARR_DATA_PTR(arr);
- bitmap = ARR_NULLBITMAP(arr);
- bitmask = 1;
- for (int i = 0; i < nitems; i++)
- {
- Datum elt;
- Datum thisresult;
-
- /* Get array element, checking for NULL. */
- if (bitmap && (*bitmap & bitmask) == 0)
- {
- fcinfo->args[1].value = (Datum) 0;
- fcinfo->args[1].isnull = true;
- }
- else
- {
- elt = fetch_att(s, typbyval, typlen);
- s = att_addlength_pointer(s, typlen, s);
- s = (char *) att_nominal_alignby(s, typalignby);
- fcinfo->args[1].value = elt;
- fcinfo->args[1].isnull = false;
- }
-
- fcinfo->isnull = false;
- thisresult = op->d.hashedscalararrayop.finfo->fn_addr(fcinfo);
-
- if (fcinfo->isnull)
- resultnull = true;
- else if (DatumGetBool(thisresult))
- {
- result = BoolGetDatum(true);
- resultnull = false;
- break;
- }
-
- /* Advance bitmap pointer if any. */
- if (bitmap)
- {
- bitmask <<= 1;
- if (bitmask == 0x100)
- {
- bitmap++;
- bitmask = 1;
- }
- }
- }
-
- /* Reverse for NOT IN; NULL stays NULL. */
- if (!inclause && !resultnull)
- result = BoolGetDatum(!DatumGetBool(result));
+ ExecEvalArrayCompareReduce(fcinfo,
+ arr,
+ typlen,
+ typbyval,
+ typalign,
+ true,
+ !inclause,
+ &result,
+ &resultnull);
*op->resvalue = result;
*op->resnull = resultnull;
--
2.50.1 (Apple Git-155)
[application/octet-stream] v1-0001-Fix-hashed-ScalarArrayOp-NULL-handling-for-non-st.patch (6.5K, 4-v1-0001-Fix-hashed-ScalarArrayOp-NULL-handling-for-non-st.patch)
download | inline diff:
From 1b2c8bb9bf44534d50f565acc4ea0bec2786cfa2 Mon Sep 17 00:00:00 2001
From: Chengpeng Yan <[email protected]>
Date: Fri, 13 Feb 2026 13:11:20 +0800
Subject: [PATCH v1 1/2] Fix hashed ScalarArrayOp NULL handling for non-strict
operators
ExecEvalHashedScalarArrayOp() only short-circuits NULL lhs values for
strict operators. For non-strict operators, a NULL lhs could still reach
the hash probe path, diverging from ExecEvalScalarArrayOp() and returning
a non-NULL result where SQL's three-valued logic requires NULL. The hash
probe would also depend on a NULL lhs Datum payload, whose value is not
meaningful.
Fix this by bypassing the hash lookup for NULL lhs values with non-strict
operators and falling back to the same per-element reduction used by the
linear ScalarArrayOp path. Preserve the existing strict fast path and add
regression coverage for hashed and non-hashed IN / NOT IN cases.
---
src/backend/executor/execExprInterp.c | 95 ++++++++++++++++++++++-
src/test/regress/expected/expressions.out | 30 +++++++
src/test/regress/sql/expressions.sql | 11 +++
3 files changed, 132 insertions(+), 4 deletions(-)
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 3c4843cde86..5437a8cd4b7 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -4243,12 +4243,99 @@ ExecEvalHashedScalarArrayOp(ExprState *state, ExprEvalStep *op, ExprContext *eco
Assert(!*op->resnull);
/*
- * If the scalar is NULL, and the function is strict, return NULL; no
- * point in executing the search.
+ * If the scalar is NULL, we can only use the hash table with strict
+ * functions. For non-strict functions we must evaluate each element to
+ * preserve SQL's three-valued logic; probing the hash table would also
+ * depend on the NULL lhs Datum payload, whose value is not meaningful.
*/
- if (fcinfo->args[0].isnull && strictfunc)
+ if (scalar_isnull)
{
- *op->resnull = true;
+ ArrayType *arr;
+ int16 typlen;
+ bool typbyval;
+ char typalign;
+ uint8 typalignby;
+ int nitems;
+ char *s;
+ uint8 *bitmap;
+ int bitmask;
+
+ if (strictfunc)
+ {
+ *op->resnull = true;
+ return;
+ }
+
+ arr = DatumGetArrayTypeP(*op->resvalue);
+
+ get_typlenbyvalalign(ARR_ELEMTYPE(arr),
+ &typlen,
+ &typbyval,
+ &typalign);
+ typalignby = typalign_to_alignby(typalign);
+
+ nitems = ArrayGetNItems(ARR_NDIM(arr), ARR_DIMS(arr));
+
+ /* Compute IN as an OR reduction of equality results. */
+ result = BoolGetDatum(false);
+ resultnull = false;
+
+ fcinfo->args[0].value = (Datum) 0;
+ fcinfo->args[0].isnull = true;
+
+ s = (char *) ARR_DATA_PTR(arr);
+ bitmap = ARR_NULLBITMAP(arr);
+ bitmask = 1;
+ for (int i = 0; i < nitems; i++)
+ {
+ Datum elt;
+ Datum thisresult;
+
+ /* Get array element, checking for NULL. */
+ if (bitmap && (*bitmap & bitmask) == 0)
+ {
+ fcinfo->args[1].value = (Datum) 0;
+ fcinfo->args[1].isnull = true;
+ }
+ else
+ {
+ elt = fetch_att(s, typbyval, typlen);
+ s = att_addlength_pointer(s, typlen, s);
+ s = (char *) att_nominal_alignby(s, typalignby);
+ fcinfo->args[1].value = elt;
+ fcinfo->args[1].isnull = false;
+ }
+
+ fcinfo->isnull = false;
+ thisresult = op->d.hashedscalararrayop.finfo->fn_addr(fcinfo);
+
+ if (fcinfo->isnull)
+ resultnull = true;
+ else if (DatumGetBool(thisresult))
+ {
+ result = BoolGetDatum(true);
+ resultnull = false;
+ break;
+ }
+
+ /* Advance bitmap pointer if any. */
+ if (bitmap)
+ {
+ bitmask <<= 1;
+ if (bitmask == 0x100)
+ {
+ bitmap++;
+ bitmask = 1;
+ }
+ }
+ }
+
+ /* Reverse for NOT IN; NULL stays NULL. */
+ if (!inclause && !resultnull)
+ result = BoolGetDatum(!DatumGetBool(result));
+
+ *op->resvalue = result;
+ *op->resnull = resultnull;
return;
}
diff --git a/src/test/regress/expected/expressions.out b/src/test/regress/expected/expressions.out
index 9a3c97b15a3..5ce0c0d11de 100644
--- a/src/test/regress/expected/expressions.out
+++ b/src/test/regress/expected/expressions.out
@@ -388,6 +388,36 @@ default for type myint using hash as
function 1 myinthash(myint);
create table inttest (a myint);
insert into inttest values(1::myint),(null);
+-- scalar NULL with a non-strict operator still requires per-element checks.
+-- With no NULLs on the right side, IN and NOT IN should both yield NULL.
+select (a in (0::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint)) is null
+from inttest where a is null;
+ ?column?
+----------
+ t
+(1 row)
+
+select (a in (0::myint,1::myint,2::myint,3::myint,4::myint,5::myint)) is null
+from inttest where a is null;
+ ?column?
+----------
+ t
+(1 row)
+
+select (a not in (0::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint)) is null
+from inttest where a is null;
+ ?column?
+----------
+ t
+(1 row)
+
+select (a not in (0::myint,1::myint,2::myint,3::myint,4::myint,5::myint)) is null
+from inttest where a is null;
+ ?column?
+----------
+ t
+(1 row)
+
-- try an array with enough elements to cause hashing
select * from inttest where a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint, null);
a
diff --git a/src/test/regress/sql/expressions.sql b/src/test/regress/sql/expressions.sql
index e02c21f3368..33af3a17128 100644
--- a/src/test/regress/sql/expressions.sql
+++ b/src/test/regress/sql/expressions.sql
@@ -198,6 +198,17 @@ default for type myint using hash as
create table inttest (a myint);
insert into inttest values(1::myint),(null);
+-- scalar NULL with a non-strict operator still requires per-element checks.
+-- With no NULLs on the right side, IN and NOT IN should both yield NULL.
+select (a in (0::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint)) is null
+from inttest where a is null;
+select (a in (0::myint,1::myint,2::myint,3::myint,4::myint,5::myint)) is null
+from inttest where a is null;
+select (a not in (0::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint)) is null
+from inttest where a is null;
+select (a not in (0::myint,1::myint,2::myint,3::myint,4::myint,5::myint)) is null
+from inttest where a is null;
+
-- try an array with enough elements to cause hashing
select * from inttest where a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint, null);
select * from inttest where a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint, null);
--
2.50.1 (Apple Git-155)
[application/octet-stream] repro.sql (2.3K, 5-repro.sql)
download
^ permalink raw reply [nested|flat] 11+ messages in thread
* Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators
2026-04-16 13:01 [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators Chengpeng Yan <[email protected]>
@ 2026-04-17 12:45 ` =?utf-8?B?Y2NhNTUwNw==?= <[email protected]>
1 sibling, 0 replies; 11+ messages in thread
From: =?utf-8?B?Y2NhNTUwNw==?= @ 2026-04-17 12:45 UTC (permalink / raw)
To: =?utf-8?B?Q2hlbmdwZW5nIFlhbg==?= <[email protected]>; pgsql-hackers; +Cc: =?utf-8?B?RGF2aWQgUm93bGV5?= <[email protected]>
Hi Chengpeng,
Thanks for your report. After reading the code, I find that there is still
an issue even if the LHS is not null and the comparator is strict (make
myinteq always return null, see the attached sql for details).
The main reason is that we did not consider whether the comparator's
return value is null during looking up the hash. Not sure whether this
is a big problem.
--
Regards,
ChangAo Chen
Attachments:
[application/octet-stream] repro.sql (1.3K, 2-repro.sql)
download
^ permalink raw reply [nested|flat] 11+ messages in thread
* Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators
2026-04-16 13:01 [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators Chengpeng Yan <[email protected]>
@ 2026-04-19 13:06 ` David Rowley <[email protected]>
2026-04-20 03:46 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators David Rowley <[email protected]>
1 sibling, 1 reply; 11+ messages in thread
From: David Rowley @ 2026-04-19 13:06 UTC (permalink / raw)
To: Chengpeng Yan <[email protected]>; +Cc: pgsql-hackers
On Fri, 17 Apr 2026 at 01:01, Chengpeng Yan <[email protected]> wrote:
> The first attached patch fixes this by bypassing hash probing when the
> LHS is NULL and the comparator is non-strict, falling back to a linear
> evaluation consistent with ExecEvalScalarArrayOp(). For NOT IN, only
> non-NULL results are inverted.
Thanks for the bug report.
I don't think we need to fallback on a linear search. If the
non-strict function returns false for NULL = NULL, then as far as I
can see, we can still get the correct result by checking if the hash
table contains any other members. What I'm not certain of is if a
non-strict function must return NULL for NULL = non-NULL. If yes, then
we could just do it as the attached patch. I made this check the hash
table to see if it has non-NULL Datums hashed. This means something
like "WHERE NULL IN (NULL, 1)" for a non-strict function returning
false for NULL = NULL and NULL for NULL = 1 would evaluate the same as
"WHERE false OR NULL", which is NULL. Whereas, "WHERE NULL IN(NULL)"
would be "false".
If we need to assume the non-strict function could return false on
NULL = non-NULL, then we could test for that when inserting the first
datum into the hash table and store the behaviour in the expression.
It may also be worth doing that check for NULL = NULL so that we don't
need to call the equals function every time we see a NULL.
I'll need to dig a bit deeper to see if we've written down any rules
about non-strict equality functions anywhere...
David
Attachments:
[application/octet-stream] v2-0001-Fix-incorrect-logic-for-hashed-IN-NOT-IN-with-non.patch (14.0K, 2-v2-0001-Fix-incorrect-logic-for-hashed-IN-NOT-IN-with-non.patch)
download | inline diff:
From df834d3dca34a0ce3146d87108301b74c9e306c9 Mon Sep 17 00:00:00 2001
From: David Rowley <[email protected]>
Date: Sun, 19 Apr 2026 15:24:39 +1200
Subject: [PATCH v2] Fix incorrect logic for hashed IN / NOT IN with non-strict
operators
This fixes the incorrect logic for hashed IN and NOT IN clauses which
triggers when there are 9 or more constant elements in the clause and
the datatype supports hashing.
When the value being looked up was NULL, in some cases it was possible
that the hashed version evaluated to false rather than NULL, and in much
more rare cases, true instead of NULL. The former technically wouldn't
cause an issue for an IN or NOT IN in a WHERE clause as false and NULL
mean the same thing there, however, the latter could cause rows to match
the WHERE clause that shouldn't.
Since all built-in operators are strict, this could only affect custom
types.
Reported-by: Chengpeng Yan <[email protected]>
Discussion: https://postgr.es/m/[email protected]
---
src/backend/executor/execExprInterp.c | 109 ++++++++++-------
src/test/regress/expected/expressions.out | 141 ++++++++++++++++------
src/test/regress/sql/expressions.sql | 75 ++++++++++--
3 files changed, 236 insertions(+), 89 deletions(-)
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 3c4843cde86..ff561df4e1f 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -4246,9 +4246,10 @@ ExecEvalHashedScalarArrayOp(ExprState *state, ExprEvalStep *op, ExprContext *eco
* If the scalar is NULL, and the function is strict, return NULL; no
* point in executing the search.
*/
- if (fcinfo->args[0].isnull && strictfunc)
+ if (scalar_isnull && strictfunc)
{
*op->resnull = true;
+ *op->resvalue = (Datum) 0;
return;
}
@@ -4348,59 +4349,79 @@ ExecEvalHashedScalarArrayOp(ExprState *state, ExprEvalStep *op, ExprContext *eco
op->d.hashedscalararrayop.has_nulls = has_nulls;
}
- /* Check the hash to see if we have a match. */
- hashfound = NULL != saophash_lookup(elements_tab->hashtab, scalar);
-
- /* the result depends on if the clause is an IN or NOT IN clause */
- if (inclause)
- result = BoolGetDatum(hashfound); /* IN */
- else
- result = BoolGetDatum(!hashfound); /* NOT IN */
-
- resultnull = false;
-
- /*
- * If we didn't find a match in the array, we still might need to handle
- * the possibility of null values. We didn't put any NULLs into the
- * hashtable, but instead marked if we found any when building the table
- * in has_nulls.
- */
- if (!hashfound && op->d.hashedscalararrayop.has_nulls)
+ if (!scalar_isnull)
{
- if (strictfunc)
- {
+ /* perform hash lookup on the non-NULL value */
+ hashfound = NULL != saophash_lookup(elements_tab->hashtab, scalar);
- /*
- * We have nulls in the array so a non-null lhs and no match must
- * yield NULL.
- */
+ /*
+ * If we didn't find the value and the array contains a NULL, the
+ * result is NULL
+ */
+ if (!hashfound && op->d.hashedscalararrayop.has_nulls)
+ {
result = (Datum) 0;
resultnull = true;
}
else
{
- /*
- * Execute function will null rhs just once.
- *
- * The hash lookup path will have scribbled on the lhs argument so
- * we need to set it up also (even though we entered this function
- * with it already set).
- */
- fcinfo->args[0].value = scalar;
- fcinfo->args[0].isnull = scalar_isnull;
- fcinfo->args[1].value = (Datum) 0;
- fcinfo->args[1].isnull = true;
+ /* the result depends on if the clause is an IN or NOT IN clause */
+ if (inclause)
+ result = BoolGetDatum(hashfound); /* IN */
+ else
+ result = BoolGetDatum(!hashfound); /* NOT IN */
- result = op->d.hashedscalararrayop.finfo->fn_addr(fcinfo);
- resultnull = fcinfo->isnull;
+ resultnull = false;
+ }
+ }
+ else if (op->d.hashedscalararrayop.has_nulls)
+ {
+ /*
+ * The short-circuit earlier means we should never get here with a
+ * NULL LHS and a strict function.
+ */
+ Assert(!strictfunc);
- /*
- * Reverse the result for NOT IN clauses since the above function
- * is the equality function and we need not-equals.
- */
- if (!inclause)
- result = BoolGetDatum(!DatumGetBool(result));
+ /*
+ * We have non-strict operator, a NULL LHS and the array contains a
+ * NULL. Call the function to see if it classes NULLs as equal.
+ */
+ fcinfo->args[0].value = (Datum) 0;
+ fcinfo->args[0].isnull = true;
+ fcinfo->args[1].value = (Datum) 0;
+ fcinfo->args[1].isnull = true;
+
+ result = op->d.hashedscalararrayop.finfo->fn_addr(fcinfo);
+ resultnull = fcinfo->isnull;
+
+ /*
+ * If the non-strict function returned false for NULL = NULL then we
+ * must still consider other array members. If any non-NULL elements
+ * exists then NULL = any-non-NULL must yield NULL even with a
+ * non-strict function. Here we're emulating "FALSE OR NULL" for IN
+ * and "TRUE AND NULL" for NOT IN, both of which yield NULL. We don't
+ * need to handle NOT IN specifically here as we reverse the result of
+ * that below.
+ */
+ if (!resultnull && !DatumGetBool(result) &&
+ op->d.hashedscalararrayop.elements_tab->hashtab->members > 0)
+ {
+ result = (Datum) 0;
+ resultnull = true;
}
+
+ /*
+ * Reverse the result for NOT IN clauses since the above function is
+ * the equality function and we need not-equals.
+ */
+ else if (!inclause)
+ result = BoolGetDatum(!DatumGetBool(result));
+ }
+ else
+ {
+ /* NULL LHS and array has no NULLs. NULL result */
+ result = (Datum) 0;
+ resultnull = true;
}
*op->resvalue = result;
diff --git a/src/test/regress/expected/expressions.out b/src/test/regress/expected/expressions.out
index 9a3c97b15a3..cbfa3f5a1cc 100644
--- a/src/test/regress/expected/expressions.out
+++ b/src/test/regress/expected/expressions.out
@@ -387,42 +387,115 @@ default for type myint using hash as
operator 1 = (myint, myint),
function 1 myinthash(myint);
create table inttest (a myint);
-insert into inttest values(1::myint),(null);
--- try an array with enough elements to cause hashing
-select * from inttest where a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint, null);
- a
----
- 1
-
-(2 rows)
+insert into inttest values (1::myint), (2::myint), (null);
+-- Test EEOP_HASHED_SCALARARRAYOP against EEOP_SCALARARRAYOP. Ensure the
+-- result of non-hashed vs hashed is the same.
+select
+ a,
+ a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+ a | not_hashed | hashed
+---+------------+--------
+ 1 | t | t
+ 2 | t | t
+ | |
+(3 rows)
-select * from inttest where a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint, null);
- a
----
-(0 rows)
-
-select * from inttest where a not in (0::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint, null);
- a
----
-(0 rows)
-
--- ensure the result matched with the non-hashed version. We simply remove
--- some array elements so that we don't reach the hashing threshold.
-select * from inttest where a in (1::myint,2::myint,3::myint,4::myint,5::myint, null);
- a
----
- 1
-
-(2 rows)
+select
+ a,
+ a in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+ from inttest;
+ a | not_hashed | hashed
+---+------------+--------
+ 1 | |
+ 2 | t | t
+ | t | t
+(3 rows)
-select * from inttest where a not in (1::myint,2::myint,3::myint,4::myint,5::myint, null);
- a
----
-(0 rows)
+select
+ a,
+ a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+ a | not_hashed | hashed
+---+------------+--------
+ 1 | f | f
+ 2 | f | f
+ | |
+(3 rows)
+
+select
+ a,
+ a not in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a not in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+ a | not_hashed | hashed
+---+------------+--------
+ 1 | |
+ 2 | f | f
+ | f | f
+(3 rows)
-select * from inttest where a not in (0::myint,2::myint,3::myint,4::myint,5::myint, null);
- a
----
-(0 rows)
+-- Now make the equal function return false when given two NULLs
+create or replace function myinteq(myint, myint) returns bool as $$
+begin
+ if $1 is null and $2 is null then
+ return false;
+ else
+ return $1::int = $2::int;
+ end if;
+end;
+$$ language plpgsql immutable;
+-- And try the same again to ensure EEOP_HASHED_SCALARARRAYOP does the same
+-- thing as EEOP_SCALARARRAYOP.
+select
+ a,
+ a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+ a | not_hashed | hashed
+---+------------+--------
+ 1 | t | t
+ 2 | t | t
+ | |
+(3 rows)
+
+select
+ a,
+ a in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+ from inttest;
+ a | not_hashed | hashed
+---+------------+--------
+ 1 | |
+ 2 | t | t
+ | |
+(3 rows)
+
+select
+ a,
+ a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+ a | not_hashed | hashed
+---+------------+--------
+ 1 | f | f
+ 2 | f | f
+ | |
+(3 rows)
+
+select
+ a,
+ a not in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a not in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+ a | not_hashed | hashed
+---+------------+--------
+ 1 | |
+ 2 | f | f
+ | |
+(3 rows)
rollback;
diff --git a/src/test/regress/sql/expressions.sql b/src/test/regress/sql/expressions.sql
index e02c21f3368..9c362b10365 100644
--- a/src/test/regress/sql/expressions.sql
+++ b/src/test/regress/sql/expressions.sql
@@ -196,16 +196,69 @@ default for type myint using hash as
function 1 myinthash(myint);
create table inttest (a myint);
-insert into inttest values(1::myint),(null);
-
--- try an array with enough elements to cause hashing
-select * from inttest where a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint, null);
-select * from inttest where a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint, null);
-select * from inttest where a not in (0::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint, null);
--- ensure the result matched with the non-hashed version. We simply remove
--- some array elements so that we don't reach the hashing threshold.
-select * from inttest where a in (1::myint,2::myint,3::myint,4::myint,5::myint, null);
-select * from inttest where a not in (1::myint,2::myint,3::myint,4::myint,5::myint, null);
-select * from inttest where a not in (0::myint,2::myint,3::myint,4::myint,5::myint, null);
+insert into inttest values (1::myint), (2::myint), (null);
+
+-- Test EEOP_HASHED_SCALARARRAYOP against EEOP_SCALARARRAYOP. Ensure the
+-- result of non-hashed vs hashed is the same.
+select
+ a,
+ a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+
+select
+ a,
+ a in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+ from inttest;
+
+select
+ a,
+ a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+
+select
+ a,
+ a not in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a not in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+
+-- Now make the equal function return false when given two NULLs
+create or replace function myinteq(myint, myint) returns bool as $$
+begin
+ if $1 is null and $2 is null then
+ return false;
+ else
+ return $1::int = $2::int;
+ end if;
+end;
+$$ language plpgsql immutable;
+
+-- And try the same again to ensure EEOP_HASHED_SCALARARRAYOP does the same
+-- thing as EEOP_SCALARARRAYOP.
+select
+ a,
+ a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+
+select
+ a,
+ a in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+ from inttest;
+
+select
+ a,
+ a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+
+select
+ a,
+ a not in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a not in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
rollback;
--
2.51.0
^ permalink raw reply [nested|flat] 11+ messages in thread
* Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators
2026-04-16 13:01 [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators Chengpeng Yan <[email protected]>
2026-04-19 13:06 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators David Rowley <[email protected]>
@ 2026-04-20 03:46 ` David Rowley <[email protected]>
2026-04-20 04:14 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators Tom Lane <[email protected]>
2026-04-20 06:17 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators Chengpeng Yan <[email protected]>
0 siblings, 2 replies; 11+ messages in thread
From: David Rowley @ 2026-04-20 03:46 UTC (permalink / raw)
To: Chengpeng Yan <[email protected]>; +Cc: pgsql-hackers
On Mon, 20 Apr 2026 at 01:06, David Rowley <[email protected]> wrote:
> If we need to assume the non-strict function could return false on
> NULL = non-NULL, then we could test for that when inserting the first
> datum into the hash table and store the behaviour in the expression.
> It may also be worth doing that check for NULL = NULL so that we don't
> need to call the equals function every time we see a NULL.
>
> I'll need to dig a bit deeper to see if we've written down any rules
> about non-strict equality functions anywhere...
Of course, it is possible to make the strict function do that, and
non-hashed IN / NOT IN handles it, so the hashed version shouldn't
have an excuse to not do the right thing.
I've attached a version that "probes" the equality function for its
NULL = NULL behaviour and its NULL = non-NULL behaviour and returns
whatever the result of the probe was at the appropriate time.
What I came up with does feel quite elaborate, so I'd quite like a 2nd opinion.
The patch does assume that the non-strict function will return the
same thing for NULL = non-NULL as it will for non-NULL = NULL.
Technically, if you coded the function to do something different
there, the hashed vs non-hashed could differ in their result. My
thoughts there, if someone is expecting anything sane out of such an
equality function, then they're probably going to be disappointed due
to various other optimisations we have.
David
Attachments:
[application/octet-stream] v3-0001-Fix-incorrect-logic-for-hashed-IN-NOT-IN-with-non.patch (20.5K, 2-v3-0001-Fix-incorrect-logic-for-hashed-IN-NOT-IN-with-non.patch)
download | inline diff:
From 35c84ffbbc0fe7b09c7edea7570ff21963821c07 Mon Sep 17 00:00:00 2001
From: David Rowley <[email protected]>
Date: Sun, 19 Apr 2026 15:24:39 +1200
Subject: [PATCH v3] Fix incorrect logic for hashed IN / NOT IN with non-strict
operators
This fixes the incorrect logic for hashed IN and NOT IN clauses which
triggers when there are 9 or more constant elements in the clause and
the datatype supports hashing.
When the value being looked up was NULL, in some cases it was possible
that the hashed version evaluated to false rather than NULL, and in much
more rare cases, true instead of NULL. The former technically wouldn't
cause an issue for an IN or NOT IN in a WHERE clause as false and NULL
mean the same thing there, however, the latter could cause rows to match
the WHERE clause that shouldn't.
Since all built-in operators are strict, this could only affect custom
types.
Reported-by: Chengpeng Yan <[email protected]>
Discussion: https://postgr.es/m/[email protected]
---
src/backend/executor/execExprInterp.c | 161 ++++++++++++-----
src/include/executor/execExpr.h | 4 +
src/test/regress/expected/expressions.out | 203 ++++++++++++++++++----
src/test/regress/sql/expressions.sql | 114 ++++++++++--
4 files changed, 395 insertions(+), 87 deletions(-)
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 3c4843cde86..d7c61b5bd06 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -4246,9 +4246,10 @@ ExecEvalHashedScalarArrayOp(ExprState *state, ExprEvalStep *op, ExprContext *eco
* If the scalar is NULL, and the function is strict, return NULL; no
* point in executing the search.
*/
- if (fcinfo->args[0].isnull && strictfunc)
+ if (scalar_isnull && strictfunc)
{
*op->resnull = true;
+ *op->resvalue = (Datum) 0;
return;
}
@@ -4267,6 +4268,7 @@ ExecEvalHashedScalarArrayOp(ExprState *state, ExprEvalStep *op, ExprContext *eco
int bitmask;
MemoryContext oldcontext;
ArrayType *arr;
+ bool checked_null_eq_nonnull = strictfunc;
saop = op->d.hashedscalararrayop.saop;
@@ -4327,6 +4329,23 @@ ExecEvalHashedScalarArrayOp(ExprState *state, ExprEvalStep *op, ExprContext *eco
s = (char *) att_nominal_alignby(s, typalignby);
saophash_insert(elements_tab->hashtab, element, &hashfound);
+
+ /*
+ * For non-strict functions, check what NULL = non-NULL does.
+ * We do assume that this will be the same for every Datum
+ * value.
+ */
+ if (!checked_null_eq_nonnull)
+ {
+ fcinfo->args[0].value = (Datum) 0;
+ fcinfo->args[0].isnull = true;
+ fcinfo->args[1].value = element;
+ fcinfo->args[1].isnull = false;
+
+ op->d.hashedscalararrayop.null_eq_nonnull = DatumGetBool(op->d.hashedscalararrayop.finfo->fn_addr(fcinfo));
+ op->d.hashedscalararrayop.null_eq_nonnull_isnull = fcinfo->isnull;
+ checked_null_eq_nonnull = true;
+ }
}
/* Advance bitmap pointer if any. */
@@ -4346,61 +4365,119 @@ ExecEvalHashedScalarArrayOp(ExprState *state, ExprEvalStep *op, ExprContext *eco
* non-strict functions with a null lhs value if no match is found.
*/
op->d.hashedscalararrayop.has_nulls = has_nulls;
- }
- /* Check the hash to see if we have a match. */
- hashfound = NULL != saophash_lookup(elements_tab->hashtab, scalar);
+ if (has_nulls && !strictfunc)
+ {
+ fcinfo->args[0].value = (Datum) 0;
+ fcinfo->args[0].isnull = true;
+ fcinfo->args[1].value = (Datum) 0;
+ fcinfo->args[1].isnull = true;
- /* the result depends on if the clause is an IN or NOT IN clause */
- if (inclause)
- result = BoolGetDatum(hashfound); /* IN */
- else
- result = BoolGetDatum(!hashfound); /* NOT IN */
+ op->d.hashedscalararrayop.null_eq_null = DatumGetBool(op->d.hashedscalararrayop.finfo->fn_addr(fcinfo));
+ op->d.hashedscalararrayop.null_eq_null_isnull = fcinfo->isnull;
- resultnull = false;
+ }
+ }
- /*
- * If we didn't find a match in the array, we still might need to handle
- * the possibility of null values. We didn't put any NULLs into the
- * hashtable, but instead marked if we found any when building the table
- * in has_nulls.
- */
- if (!hashfound && op->d.hashedscalararrayop.has_nulls)
+ if (likely(!scalar_isnull))
{
- if (strictfunc)
- {
+ /* perform hash lookup on the non-NULL value */
+ hashfound = NULL != saophash_lookup(elements_tab->hashtab, scalar);
+ /*
+ * If we didn't find the value and the array contains a NULL, the
+ * result is NULL
+ */
+ if (!hashfound && op->d.hashedscalararrayop.has_nulls)
+ {
/*
- * We have nulls in the array so a non-null lhs and no match must
- * yield NULL.
+ * For strict functions, a failed lookup for arrays containing a
+ * NULL means a NULL result. For non-strict it depends on if the
+ * array contains any non-null values and what the strict function
+ * results for non-NULL = NULL.
*/
- result = (Datum) 0;
- resultnull = true;
+ if (likely(strictfunc) ||
+ op->d.hashedscalararrayop.elements_tab->hashtab->members == 0)
+ {
+ result = (Datum) 0;
+ resultnull = true;
+ }
+ else
+ {
+ /*
+ * Use the cached value for NULL = non-NULL. Let's just
+ * assume this is the same as non-NULL = NULL. Any non-strict
+ * function where that doesn't hold true will often result in
+ * surprising behavior.
+ */
+ result = BoolGetDatum(op->d.hashedscalararrayop.null_eq_nonnull);
+ resultnull = op->d.hashedscalararrayop.null_eq_nonnull_isnull;
+
+ /* Reverse any non-null result for NOT IN clauses */
+ if (!resultnull && !inclause)
+ result = BoolGetDatum(!DatumGetBool(result));
+ }
}
else
{
- /*
- * Execute function will null rhs just once.
- *
- * The hash lookup path will have scribbled on the lhs argument so
- * we need to set it up also (even though we entered this function
- * with it already set).
- */
- fcinfo->args[0].value = scalar;
- fcinfo->args[0].isnull = scalar_isnull;
- fcinfo->args[1].value = (Datum) 0;
- fcinfo->args[1].isnull = true;
+ /* the result depends on if the clause is an IN or NOT IN clause */
+ if (inclause)
+ result = BoolGetDatum(hashfound); /* IN */
+ else
+ result = BoolGetDatum(!hashfound); /* NOT IN */
- result = op->d.hashedscalararrayop.finfo->fn_addr(fcinfo);
- resultnull = fcinfo->isnull;
+ resultnull = false;
+ }
+ }
+ else if (op->d.hashedscalararrayop.has_nulls)
+ {
+ /*
+ * The short-circuit earlier means we should never get here with a
+ * NULL LHS and a strict function.
+ */
+ Assert(!strictfunc);
- /*
- * Reverse the result for NOT IN clauses since the above function
- * is the equality function and we need not-equals.
- */
- if (!inclause)
- result = BoolGetDatum(!DatumGetBool(result));
+ result = BoolGetDatum(op->d.hashedscalararrayop.null_eq_null);
+ resultnull = op->d.hashedscalararrayop.null_eq_null_isnull;
+
+ /*
+ * If the non-strict function returned false for NULL = NULL then we
+ * must still consider other array members. What we return here must
+ * depend on both if there are any non-nulls in the array and what the
+ * function returns for null = non-null. If null = non-null yields
+ * NULL then that's what we return so that we behave the same way as
+ * EEOP_SCALARARRAYOP.
+ */
+ if (!resultnull && !DatumGetBool(result) &&
+ op->d.hashedscalararrayop.null_eq_nonnull_isnull &&
+ op->d.hashedscalararrayop.elements_tab->hashtab->members > 0)
+ {
+ result = (Datum) 0;
+ resultnull = true;
}
+
+ /*
+ * Reverse the result for NOT IN clauses since the above function is
+ * the equality function and we need not-equals.
+ */
+ else if (!inclause)
+ result = BoolGetDatum(!DatumGetBool(result));
+ }
+ else
+ {
+ Assert(!strictfunc);
+
+ /*
+ * NULL LHS and array has no NULLs. Use the value that we cached from
+ * the test call of the function we did when inserting the first
+ * non-null into the hash table above.
+ */
+ result = op->d.hashedscalararrayop.null_eq_nonnull;
+ resultnull = op->d.hashedscalararrayop.null_eq_nonnull_isnull;
+
+ /* Reverse any non-null result for NOT IN clauses */
+ if (!resultnull && !inclause)
+ result = BoolGetDatum(!DatumGetBool(result));
}
*op->resvalue = result;
diff --git a/src/include/executor/execExpr.h b/src/include/executor/execExpr.h
index aa9b361fa31..56e6bb5d239 100644
--- a/src/include/executor/execExpr.h
+++ b/src/include/executor/execExpr.h
@@ -641,6 +641,10 @@ typedef struct ExprEvalStep
{
bool has_nulls;
bool inclause; /* true for IN and false for NOT IN */
+ bool null_eq_nonnull; /* non strict func null behavior */
+ bool null_eq_nonnull_isnull; /* the isnull for the above */
+ bool null_eq_null; /* non strict func null behavior */
+ bool null_eq_null_isnull; /* the isnull for the above */
struct ScalarArrayOpExprHashTable *elements_tab;
FmgrInfo *finfo; /* function's lookup data */
FunctionCallInfo fcinfo_data; /* arguments etc */
diff --git a/src/test/regress/expected/expressions.out b/src/test/regress/expected/expressions.out
index 9a3c97b15a3..57616a02c3c 100644
--- a/src/test/regress/expected/expressions.out
+++ b/src/test/regress/expected/expressions.out
@@ -387,42 +387,177 @@ default for type myint using hash as
operator 1 = (myint, myint),
function 1 myinthash(myint);
create table inttest (a myint);
-insert into inttest values(1::myint),(null);
--- try an array with enough elements to cause hashing
-select * from inttest where a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint, null);
- a
----
- 1
-
-(2 rows)
+insert into inttest values (1::myint), (2::myint), (null);
+-- Test EEOP_HASHED_SCALARARRAYOP against EEOP_SCALARARRAYOP. Ensure the
+-- result of non-hashed vs hashed is the same.
+select
+ a,
+ a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+ a | not_hashed | hashed
+---+------------+--------
+ 1 | t | t
+ 2 | t | t
+ | |
+(3 rows)
-select * from inttest where a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint, null);
- a
----
-(0 rows)
-
-select * from inttest where a not in (0::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint, null);
- a
----
-(0 rows)
-
--- ensure the result matched with the non-hashed version. We simply remove
--- some array elements so that we don't reach the hashing threshold.
-select * from inttest where a in (1::myint,2::myint,3::myint,4::myint,5::myint, null);
- a
----
- 1
-
-(2 rows)
+select
+ a,
+ a in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+ from inttest;
+ a | not_hashed | hashed
+---+------------+--------
+ 1 | |
+ 2 | t | t
+ | t | t
+(3 rows)
+
+select
+ a,
+ a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+ a | not_hashed | hashed
+---+------------+--------
+ 1 | f | f
+ 2 | f | f
+ | |
+(3 rows)
+
+select
+ a,
+ a not in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a not in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+ a | not_hashed | hashed
+---+------------+--------
+ 1 | |
+ 2 | f | f
+ | f | f
+(3 rows)
+
+-- Now make the equal function return false when given two NULLs
+create or replace function myinteq(myint, myint) returns bool as $$
+begin
+ if $1 is null and $2 is null then
+ return false;
+ else
+ return $1::int = $2::int;
+ end if;
+end;
+$$ language plpgsql immutable;
+-- And try the same again to ensure EEOP_HASHED_SCALARARRAYOP does the same
+-- thing as EEOP_SCALARARRAYOP.
+select
+ a,
+ a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+ a | not_hashed | hashed
+---+------------+--------
+ 1 | t | t
+ 2 | t | t
+ | |
+(3 rows)
-select * from inttest where a not in (1::myint,2::myint,3::myint,4::myint,5::myint, null);
- a
----
-(0 rows)
+select
+ a,
+ a in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+ from inttest;
+ a | not_hashed | hashed
+---+------------+--------
+ 1 | |
+ 2 | t | t
+ | |
+(3 rows)
+
+select
+ a,
+ a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+ a | not_hashed | hashed
+---+------------+--------
+ 1 | f | f
+ 2 | f | f
+ | |
+(3 rows)
+
+select
+ a,
+ a not in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a not in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+ a | not_hashed | hashed
+---+------------+--------
+ 1 | |
+ 2 | f | f
+ | |
+(3 rows)
-select * from inttest where a not in (0::myint,2::myint,3::myint,4::myint,5::myint, null);
- a
----
-(0 rows)
+-- Try one more time with something even more bizarre
+create or replace function myinteq(myint, myint) returns bool as $$
+begin
+ if $1 is null and $2 is null then
+ return false;
+ elsif $1 is null and $2 is not null then
+ return false;
+ elsif $1 is not null and $2 is null then
+ return false;
+ else
+ return $1::int = $2::int;
+ end if;
+end;
+$$ language plpgsql immutable;
+select
+ a,
+ a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+ a | not_hashed | hashed
+---+------------+--------
+ 1 | t | t
+ 2 | t | t
+ | f | f
+(3 rows)
+
+select
+ a,
+ a in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+ from inttest;
+ a | not_hashed | hashed
+---+------------+--------
+ 1 | f | f
+ 2 | t | t
+ | f | f
+(3 rows)
+
+select
+ a,
+ a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+ a | not_hashed | hashed
+---+------------+--------
+ 1 | f | f
+ 2 | f | f
+ | t | t
+(3 rows)
+
+select
+ a,
+ a not in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a not in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+ a | not_hashed | hashed
+---+------------+--------
+ 1 | t | t
+ 2 | f | f
+ | t | t
+(3 rows)
rollback;
diff --git a/src/test/regress/sql/expressions.sql b/src/test/regress/sql/expressions.sql
index e02c21f3368..e48d6b3184a 100644
--- a/src/test/regress/sql/expressions.sql
+++ b/src/test/regress/sql/expressions.sql
@@ -196,16 +196,108 @@ default for type myint using hash as
function 1 myinthash(myint);
create table inttest (a myint);
-insert into inttest values(1::myint),(null);
-
--- try an array with enough elements to cause hashing
-select * from inttest where a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint, null);
-select * from inttest where a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint, null);
-select * from inttest where a not in (0::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint, null);
--- ensure the result matched with the non-hashed version. We simply remove
--- some array elements so that we don't reach the hashing threshold.
-select * from inttest where a in (1::myint,2::myint,3::myint,4::myint,5::myint, null);
-select * from inttest where a not in (1::myint,2::myint,3::myint,4::myint,5::myint, null);
-select * from inttest where a not in (0::myint,2::myint,3::myint,4::myint,5::myint, null);
+insert into inttest values (1::myint), (2::myint), (null);
+
+-- Test EEOP_HASHED_SCALARARRAYOP against EEOP_SCALARARRAYOP. Ensure the
+-- result of non-hashed vs hashed is the same.
+select
+ a,
+ a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+
+select
+ a,
+ a in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+ from inttest;
+
+select
+ a,
+ a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+
+select
+ a,
+ a not in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a not in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+
+-- Now make the equal function return false when given two NULLs
+create or replace function myinteq(myint, myint) returns bool as $$
+begin
+ if $1 is null and $2 is null then
+ return false;
+ else
+ return $1::int = $2::int;
+ end if;
+end;
+$$ language plpgsql immutable;
+
+-- And try the same again to ensure EEOP_HASHED_SCALARARRAYOP does the same
+-- thing as EEOP_SCALARARRAYOP.
+select
+ a,
+ a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+
+select
+ a,
+ a in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+ from inttest;
+
+select
+ a,
+ a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+
+select
+ a,
+ a not in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a not in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+
+-- Try one more time with something even more bizarre
+create or replace function myinteq(myint, myint) returns bool as $$
+begin
+ if $1 is null and $2 is null then
+ return false;
+ elsif $1 is null and $2 is not null then
+ return false;
+ elsif $1 is not null and $2 is null then
+ return false;
+ else
+ return $1::int = $2::int;
+ end if;
+end;
+$$ language plpgsql immutable;
+
+select
+ a,
+ a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+
+select
+ a,
+ a in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+ from inttest;
+
+select
+ a,
+ a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+
+select
+ a,
+ a not in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a not in (null::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
rollback;
--
2.51.0
^ permalink raw reply [nested|flat] 11+ messages in thread
* Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators
2026-04-16 13:01 [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators Chengpeng Yan <[email protected]>
2026-04-19 13:06 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators David Rowley <[email protected]>
2026-04-20 03:46 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators David Rowley <[email protected]>
@ 2026-04-20 04:14 ` Tom Lane <[email protected]>
1 sibling, 0 replies; 11+ messages in thread
From: Tom Lane @ 2026-04-20 04:14 UTC (permalink / raw)
To: David Rowley <[email protected]>; +Cc: Chengpeng Yan <[email protected]>; pgsql-hackers
David Rowley <[email protected]> writes:
> I've attached a version that "probes" the equality function for its
> NULL = NULL behaviour and its NULL = non-NULL behaviour and returns
> whatever the result of the probe was at the appropriate time.
> What I came up with does feel quite elaborate, so I'd quite like a 2nd opinion.
> The patch does assume that the non-strict function will return the
> same thing for NULL = non-NULL as it will for non-NULL = NULL.
Meh. I think we assume that hashable equality functions satisfy the
symmetric law, ie A = B if and only if B = A, so that part is fine.
However, I do not care for the assumption that any random non-null
input will produce the same answer. As a quick counter-example,
consider a text-like datatype that tries to emulate Oracle's semantics
that an empty string is the same as NULL. Your code would arrive at
different results depending on whether the first non-null input
chanced to be an empty string.
(I've not read this whole thread, so I don't have a global opinion
on what we ought to do here. I suspect it's a tricky subject.)
regards, tom lane
^ permalink raw reply [nested|flat] 11+ messages in thread
* Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators
2026-04-16 13:01 [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators Chengpeng Yan <[email protected]>
2026-04-19 13:06 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators David Rowley <[email protected]>
2026-04-20 03:46 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators David Rowley <[email protected]>
@ 2026-04-20 06:17 ` Chengpeng Yan <[email protected]>
2026-04-22 23:33 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators David Rowley <[email protected]>
1 sibling, 1 reply; 11+ messages in thread
From: Chengpeng Yan @ 2026-04-20 06:17 UTC (permalink / raw)
To: David Rowley <[email protected]>; +Cc: pgsql-hackers; cca5507 <[email protected]>; Tom Lane <[email protected]>
> On Apr 20, 2026, at 11:46, David Rowley <[email protected]> wrote:
>
> Of course, it is possible to make the strict function do that, and
> non-hashed IN / NOT IN handles it, so the hashed version shouldn't
> have an excuse to not do the right thing.
>
> I've attached a version that "probes" the equality function for its
> NULL = NULL behaviour and its NULL = non-NULL behaviour and returns
> whatever the result of the probe was at the appropriate time.
>
> What I came up with does feel quite elaborate, so I'd quite like a 2nd opinion.
>
> The patch does assume that the non-strict function will return the
> same thing for NULL = non-NULL as it will for non-NULL = NULL.
> Technically, if you coded the function to do something different
> there, the hashed vs non-hashed could differ in their result. My
> thoughts there, if someone is expecting anything sane out of such an
> equality function, then they're probably going to be disappointed due
> to various other optimisations we have.
Hi,
Thanks for the discussion.
I agree with Tom's concern that it does not seem safe to generalize from
NULL = first-non-NULL to all non-NULL values. Unless I am missing one, I
do not know of a planner/executor-visible contract that would justify
that assumption.
For the original NULL-LHS bug, a linear fallback still seems like the
safest baseline fix to me. It is conservative, but it matches
ExecEvalScalarArrayOp() without adding extra assumptions. The obvious
downside is performance, although this path only triggers when the
runtime LHS is NULL and the comparator is non-strict. It may also be
possible to cache the NULL-LHS outcome once per expression, since the
RHS array is constant in the hashed SAOP case, which might help reduce
the cost of that fallback.
ChangAo's example also seems to expose a separate correctness issue. If
the comparator can return NULL even for non-NULL inputs, then a lookup
hit seems sufficient, but a miss is no longer enough to distinguish
FALSE for IN / TRUE for NOT IN from NULL.
A conservative fix there would again be a linear fallback after miss,
which should recover the right semantics, but that case does seem much
more performance-sensitive.
So I would be interested to hear what people think about both cases,
especially if there is a better way to preserve correctness without
paying the full linear-fallback cost.
--
Best regards,
Chengpeng Yan
^ permalink raw reply [nested|flat] 11+ messages in thread
* Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators
2026-04-16 13:01 [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators Chengpeng Yan <[email protected]>
2026-04-19 13:06 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators David Rowley <[email protected]>
2026-04-20 03:46 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators David Rowley <[email protected]>
2026-04-20 06:17 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators Chengpeng Yan <[email protected]>
@ 2026-04-22 23:33 ` David Rowley <[email protected]>
2026-04-23 04:31 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators Chengpeng Yan <[email protected]>
0 siblings, 1 reply; 11+ messages in thread
From: David Rowley @ 2026-04-22 23:33 UTC (permalink / raw)
To: Chengpeng Yan <[email protected]>; +Cc: pgsql-hackers; cca5507 <[email protected]>; Tom Lane <[email protected]>
On Mon, 20 Apr 2026 at 18:17, Chengpeng Yan <[email protected]> wrote:
> It may also be
> possible to cache the NULL-LHS outcome once per expression, since the
> RHS array is constant in the hashed SAOP case, which might help reduce
> the cost of that fallback.
Yeah, it doesn't make sense to repeatedly perform a linear search over
the array to check if NULL matches anything in the array. Let's just
do that once when we build the hash table and reuse that cached value
whenever we see a NULL. We can skip that step with strict functions
since we'll short-circuit earlier.
A patch for that is attached.
> ChangAo's example also seems to expose a separate correctness issue. If
> the comparator can return NULL even for non-NULL inputs, then a lookup
> hit seems sufficient, but a miss is no longer enough to distinguish
> FALSE for IN / TRUE for NOT IN from NULL.
IMO it's unrealistic to assume we can do anything sane with an
equality function that always returns NULL.
> A conservative fix there would again be a linear fallback after miss,
> which should recover the right semantics, but that case does seem much
> more performance-sensitive.
I really doubt it's worth troubling over that. If we did want to do
something, then it would be more efficient to probe the hash table
directly after we insert a Datum and verify we can find it again. If
we can't find any value we just inserted, mark the entire table as
broken and have it so we check for that and do a linear search.
David
Attachments:
[application/octet-stream] v3-0001-Fix-incorrect-logic-for-hashed-IN-NOT-IN-with-non.patch (20.6K, 2-v3-0001-Fix-incorrect-logic-for-hashed-IN-NOT-IN-with-non.patch)
download | inline diff:
From 535bff835ed25a10bddac1743107164a6bc7777b Mon Sep 17 00:00:00 2001
From: David Rowley <[email protected]>
Date: Thu, 23 Apr 2026 11:29:49 +1200
Subject: [PATCH v3] Fix incorrect logic for hashed IN / NOT IN with non-strict
operators
ExecEvalHashedScalarArrayOp(), when using a strict equality function,
performs a short-circuit when looking up NULL values. When the function
is non-strict, the code incorrectly looked up the hash table for a
zero-valued Datum, which could have resulted in an accidental true
return if the hash table contained zero valued Datum, or could result
in a crash for non-byval types.
Here we fix this by adding an extra step when we build the hash table to
check what the result of a NULL lookup would be. This requires looping
over the array and checking what the non-hashed version of the code
would do. We cache the results of that in the expression so that we can
reuse the result any time we're asked to search for a NULL value.
It's important to note that non-strict equality functions are free to
treat any NULL value as equal to any non-NULL value. For example,
someone may wish to design a type that treats an empty string and NULL
as equal.
All built-in types have strict equality functions, so this could affect
custom / user-defined types.
Author: Chengpeng Yan <[email protected]>
Reviewed-by: David Rowley <[email protected]>
Discussion: https://postgr.es/m/[email protected]
---
src/backend/executor/execExprInterp.c | 115 +++++++++---
src/include/executor/execExpr.h | 4 +
src/test/regress/expected/expressions.out | 203 ++++++++++++++++++----
src/test/regress/sql/expressions.sql | 114 ++++++++++--
4 files changed, 368 insertions(+), 68 deletions(-)
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 3c4843cde86..efc91097b11 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -178,6 +178,14 @@ static Datum ExecJustHashInnerVarVirt(ExprState *state, ExprContext *econtext, b
static Datum ExecJustHashOuterVarStrict(ExprState *state, ExprContext *econtext, bool *isnull);
/* execution helper functions */
+static pg_attribute_always_inline void ExecEvalArrayCompareInternal(FunctionCallInfo fcinfo,
+ ArrayType *arr,
+ int16 typlen,
+ bool typbyval,
+ char typalign,
+ bool useOr,
+ Datum *result,
+ bool *resultnull);
static pg_attribute_always_inline void ExecAggPlainTransByVal(AggState *aggstate,
AggStatePerTrans pertrans,
AggStatePerGroup pergroup,
@@ -4031,13 +4039,6 @@ ExecEvalScalarArrayOp(ExprState *state, ExprEvalStep *op)
int nitems;
Datum result;
bool resultnull;
- int16 typlen;
- bool typbyval;
- char typalign;
- uint8 typalignby;
- char *s;
- uint8 *bitmap;
- int bitmask;
/*
* If the array is NULL then we return NULL --- it's not very meaningful
@@ -4086,14 +4087,43 @@ ExecEvalScalarArrayOp(ExprState *state, ExprEvalStep *op)
op->d.scalararrayop.element_type = ARR_ELEMTYPE(arr);
}
- typlen = op->d.scalararrayop.typlen;
- typbyval = op->d.scalararrayop.typbyval;
- typalign = op->d.scalararrayop.typalign;
- typalignby = typalign_to_alignby(typalign);
+ ExecEvalArrayCompareInternal(fcinfo,
+ arr,
+ op->d.scalararrayop.typlen,
+ op->d.scalararrayop.typbyval,
+ op->d.scalararrayop.typalign,
+ useOr,
+ &result,
+ &resultnull);
+
+ *op->resvalue = result;
+ *op->resnull = resultnull;
+}
+
+/*
+ * Shared helper for ExecEvalScalarArrayOp() and the NULL-LHS fallback for
+ * non-strict ExecEvalHashedScalarArrayOp().
+ *
+ * Callers must handle the strict LHS-is-NULL; return NULL fast path prior to
+ * calling this.
+ */
+static pg_attribute_always_inline void
+ExecEvalArrayCompareInternal(FunctionCallInfo fcinfo, ArrayType *arr,
+ int16 typlen, bool typbyval, char typalign,
+ bool useOr, Datum *result, bool *resultnull)
+{
+ uint8 typalignby = typalign_to_alignby(typalign);
+ int nitems;
+ char *s;
+ uint8 *bitmap;
+ int bitmask;
+ bool strictfunc = fcinfo->flinfo->fn_strict;
+
+ nitems = ArrayGetNItems(ARR_NDIM(arr), ARR_DIMS(arr));
/* Initialize result appropriately depending on useOr */
- result = BoolGetDatum(!useOr);
- resultnull = false;
+ *result = BoolGetDatum(!useOr);
+ *resultnull = false;
/* Loop over the array elements */
s = (char *) ARR_DATA_PTR(arr);
@@ -4129,18 +4159,18 @@ ExecEvalScalarArrayOp(ExprState *state, ExprEvalStep *op)
else
{
fcinfo->isnull = false;
- thisresult = op->d.scalararrayop.fn_addr(fcinfo);
+ thisresult = fcinfo->flinfo->fn_addr(fcinfo);
}
/* Combine results per OR or AND semantics */
if (fcinfo->isnull)
- resultnull = true;
+ *resultnull = true;
else if (useOr)
{
if (DatumGetBool(thisresult))
{
- result = BoolGetDatum(true);
- resultnull = false;
+ *result = BoolGetDatum(true);
+ *resultnull = false;
break; /* needn't look at any more elements */
}
}
@@ -4148,8 +4178,8 @@ ExecEvalScalarArrayOp(ExprState *state, ExprEvalStep *op)
{
if (!DatumGetBool(thisresult))
{
- result = BoolGetDatum(false);
- resultnull = false;
+ *result = BoolGetDatum(false);
+ *resultnull = false;
break; /* needn't look at any more elements */
}
}
@@ -4165,9 +4195,6 @@ ExecEvalScalarArrayOp(ExprState *state, ExprEvalStep *op)
}
}
}
-
- *op->resvalue = result;
- *op->resnull = resultnull;
}
/*
@@ -4246,7 +4273,7 @@ ExecEvalHashedScalarArrayOp(ExprState *state, ExprEvalStep *op, ExprContext *eco
* If the scalar is NULL, and the function is strict, return NULL; no
* point in executing the search.
*/
- if (fcinfo->args[0].isnull && strictfunc)
+ if (scalar_isnull && strictfunc)
{
*op->resnull = true;
return;
@@ -4346,8 +4373,50 @@ ExecEvalHashedScalarArrayOp(ExprState *state, ExprEvalStep *op, ExprContext *eco
* non-strict functions with a null lhs value if no match is found.
*/
op->d.hashedscalararrayop.has_nulls = has_nulls;
+
+ /*
+ * When we have a non-strict equality function, check and cache the
+ * result from looking up a NULL. Non-strict functions are free to
+ * treat a NULL as equal to any other value, e.g. a 0 or an empty
+ * string. Here we perform a linear search over the array and cache
+ * the outcome so that we can result that any time we receive a NULL.
+ */
+ if (!strictfunc)
+ {
+ bool null_lhs_result;
+
+ fcinfo->args[0].value = (Datum) 0;
+ fcinfo->args[0].isnull = true;
+
+ ExecEvalArrayCompareInternal(fcinfo, arr, typlen, typbyval,
+ typalign, true, &result,
+ &resultnull);
+
+ null_lhs_result = DatumGetBool(result);
+
+ /* invert non-NULL results for NOT IN */
+ if (!resultnull && !inclause)
+ null_lhs_result = !null_lhs_result;
+
+ op->d.hashedscalararrayop.null_lhs_isnull = resultnull;
+ op->d.hashedscalararrayop.null_lhs_result = null_lhs_result;
+ }
+ }
+
+ /*
+ * When looking up an SQL NULL value with non-strict functions, we defer
+ * to the value we cached when building the hash table.
+ */
+ if (scalar_isnull)
+ {
+ Assert(!strictfunc);
+
+ *op->resnull = op->d.hashedscalararrayop.null_lhs_isnull;
+ *op->resvalue = BoolGetDatum(op->d.hashedscalararrayop.null_lhs_result);
+ return;
}
+
/* Check the hash to see if we have a match. */
hashfound = NULL != saophash_lookup(elements_tab->hashtab, scalar);
diff --git a/src/include/executor/execExpr.h b/src/include/executor/execExpr.h
index aa9b361fa31..c61b3d624d5 100644
--- a/src/include/executor/execExpr.h
+++ b/src/include/executor/execExpr.h
@@ -641,6 +641,10 @@ typedef struct ExprEvalStep
{
bool has_nulls;
bool inclause; /* true for IN and false for NOT IN */
+ bool null_lhs_result; /* for non-strict lookups, we
+ * cache what looking up NULL
+ * returns. */
+ bool null_lhs_isnull;
struct ScalarArrayOpExprHashTable *elements_tab;
FmgrInfo *finfo; /* function's lookup data */
FunctionCallInfo fcinfo_data; /* arguments etc */
diff --git a/src/test/regress/expected/expressions.out b/src/test/regress/expected/expressions.out
index 9a3c97b15a3..730f7bc7eba 100644
--- a/src/test/regress/expected/expressions.out
+++ b/src/test/regress/expected/expressions.out
@@ -387,42 +387,177 @@ default for type myint using hash as
operator 1 = (myint, myint),
function 1 myinthash(myint);
create table inttest (a myint);
-insert into inttest values(1::myint),(null);
--- try an array with enough elements to cause hashing
-select * from inttest where a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint, null);
- a
----
- 1
-
-(2 rows)
+insert into inttest values (null), (0::myint), (1::myint);
+-- Test EEOP_HASHED_SCALARARRAYOP against EEOP_SCALARARRAYOP. Ensure the
+-- result of non-hashed vs hashed is the same.
+select
+ a,
+ a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+ a | not_hashed | hashed
+---+------------+--------
+ | |
+ 0 | f | f
+ 1 | t | t
+(3 rows)
-select * from inttest where a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint, null);
- a
----
-(0 rows)
-
-select * from inttest where a not in (0::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint, null);
- a
----
-(0 rows)
-
--- ensure the result matched with the non-hashed version. We simply remove
--- some array elements so that we don't reach the hashing threshold.
-select * from inttest where a in (1::myint,2::myint,3::myint,4::myint,5::myint, null);
- a
----
- 1
-
-(2 rows)
+select
+ a,
+ a in (null::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint) as not_hashed,
+ a in (null::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as hashed
+ from inttest;
+ a | not_hashed | hashed
+---+------------+--------
+ | t | t
+ 0 | |
+ 1 | t | t
+(3 rows)
-select * from inttest where a not in (1::myint,2::myint,3::myint,4::myint,5::myint, null);
- a
----
-(0 rows)
+select
+ a,
+ a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+ a | not_hashed | hashed
+---+------------+--------
+ | |
+ 0 | t | t
+ 1 | f | f
+(3 rows)
-select * from inttest where a not in (0::myint,2::myint,3::myint,4::myint,5::myint, null);
- a
----
-(0 rows)
+select
+ a,
+ a not in (null::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint) as not_hashed,
+ a not in (null::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as hashed
+from inttest;
+ a | not_hashed | hashed
+---+------------+--------
+ | f | f
+ 0 | |
+ 1 | f | f
+(3 rows)
+
+-- Now make the equal function return false when given two NULLs
+create or replace function myinteq(myint, myint) returns bool as $$
+begin
+ if $1 is null and $2 is null then
+ return false;
+ else
+ return $1::int = $2::int;
+ end if;
+end;
+$$ language plpgsql immutable;
+-- And try the same again to ensure EEOP_HASHED_SCALARARRAYOP does the same
+-- thing as EEOP_SCALARARRAYOP.
+select
+ a,
+ a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+ a | not_hashed | hashed
+---+------------+--------
+ | |
+ 0 | f | f
+ 1 | t | t
+(3 rows)
+
+select
+ a,
+ a in (null::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint) as not_hashed,
+ a in (null::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as hashed
+ from inttest;
+ a | not_hashed | hashed
+---+------------+--------
+ | |
+ 0 | |
+ 1 | t | t
+(3 rows)
+
+select
+ a,
+ a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+ a | not_hashed | hashed
+---+------------+--------
+ | |
+ 0 | t | t
+ 1 | f | f
+(3 rows)
+
+select
+ a,
+ a not in (null::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint) as not_hashed,
+ a not in (null::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as hashed
+from inttest;
+ a | not_hashed | hashed
+---+------------+--------
+ | |
+ 0 | |
+ 1 | f | f
+(3 rows)
+
+-- Try again with an equality function that treats NULLs as equal to 0.
+create or replace function myinteq(myint, myint) returns bool as $$
+begin
+ if $1 is null and $2 is null then
+ return false;
+ else
+ return coalesce($1::int,0) = coalesce($2::int, 0);
+ end if;
+end;
+$$ language plpgsql immutable;
+select
+ a,
+ a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed,
+ a in (0::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint) as not_hashed_zero,
+ a in (0::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as hashed_zero
+from inttest;
+ a | not_hashed | hashed | not_hashed_zero | hashed_zero
+---+------------+--------+-----------------+-------------
+ | f | f | t | t
+ 0 | f | f | t | t
+ 1 | t | t | t | t
+(3 rows)
+
+select
+ a,
+ a in (null::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint) as not_hashed,
+ a in (null::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as hashed
+ from inttest;
+ a | not_hashed | hashed
+---+------------+--------
+ | f | f
+ 0 | t | t
+ 1 | t | t
+(3 rows)
+
+select
+ a,
+ a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed,
+ a not in (0::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint) as not_hashed_zero,
+ a not in (0::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as hashed_zero
+from inttest;
+ a | not_hashed | hashed | not_hashed_zero | hashed_zero
+---+------------+--------+-----------------+-------------
+ | t | t | f | f
+ 0 | t | t | f | f
+ 1 | f | f | f | f
+(3 rows)
+
+select
+ a,
+ a not in (null::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint) as not_hashed,
+ a not in (null::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as hashed
+from inttest;
+ a | not_hashed | hashed
+---+------------+--------
+ | t | t
+ 0 | f | f
+ 1 | f | f
+(3 rows)
rollback;
diff --git a/src/test/regress/sql/expressions.sql b/src/test/regress/sql/expressions.sql
index e02c21f3368..3b3048f9731 100644
--- a/src/test/regress/sql/expressions.sql
+++ b/src/test/regress/sql/expressions.sql
@@ -196,16 +196,108 @@ default for type myint using hash as
function 1 myinthash(myint);
create table inttest (a myint);
-insert into inttest values(1::myint),(null);
-
--- try an array with enough elements to cause hashing
-select * from inttest where a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint, null);
-select * from inttest where a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint, null);
-select * from inttest where a not in (0::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint, null);
--- ensure the result matched with the non-hashed version. We simply remove
--- some array elements so that we don't reach the hashing threshold.
-select * from inttest where a in (1::myint,2::myint,3::myint,4::myint,5::myint, null);
-select * from inttest where a not in (1::myint,2::myint,3::myint,4::myint,5::myint, null);
-select * from inttest where a not in (0::myint,2::myint,3::myint,4::myint,5::myint, null);
+insert into inttest values (null), (0::myint), (1::myint);
+
+-- Test EEOP_HASHED_SCALARARRAYOP against EEOP_SCALARARRAYOP. Ensure the
+-- result of non-hashed vs hashed is the same.
+select
+ a,
+ a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+
+select
+ a,
+ a in (null::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint) as not_hashed,
+ a in (null::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as hashed
+ from inttest;
+
+select
+ a,
+ a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+
+select
+ a,
+ a not in (null::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint) as not_hashed,
+ a not in (null::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as hashed
+from inttest;
+
+-- Now make the equal function return false when given two NULLs
+create or replace function myinteq(myint, myint) returns bool as $$
+begin
+ if $1 is null and $2 is null then
+ return false;
+ else
+ return $1::int = $2::int;
+ end if;
+end;
+$$ language plpgsql immutable;
+
+-- And try the same again to ensure EEOP_HASHED_SCALARARRAYOP does the same
+-- thing as EEOP_SCALARARRAYOP.
+select
+ a,
+ a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+
+select
+ a,
+ a in (null::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint) as not_hashed,
+ a in (null::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as hashed
+ from inttest;
+
+select
+ a,
+ a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed
+from inttest;
+
+select
+ a,
+ a not in (null::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint) as not_hashed,
+ a not in (null::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as hashed
+from inttest;
+
+-- Try again with an equality function that treats NULLs as equal to 0.
+create or replace function myinteq(myint, myint) returns bool as $$
+begin
+ if $1 is null and $2 is null then
+ return false;
+ else
+ return coalesce($1::int,0) = coalesce($2::int, 0);
+ end if;
+end;
+$$ language plpgsql immutable;
+
+select
+ a,
+ a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed,
+ a in (0::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint) as not_hashed_zero,
+ a in (0::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as hashed_zero
+from inttest;
+
+select
+ a,
+ a in (null::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint) as not_hashed,
+ a in (null::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as hashed
+ from inttest;
+
+select
+ a,
+ a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as not_hashed,
+ a not in (1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint,9::myint) as hashed,
+ a not in (0::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint) as not_hashed_zero,
+ a not in (0::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as hashed_zero
+from inttest;
+
+select
+ a,
+ a not in (null::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint) as not_hashed,
+ a not in (null::myint,1::myint,2::myint,3::myint,4::myint,5::myint,6::myint,7::myint,8::myint) as hashed
+from inttest;
rollback;
--
2.51.0
^ permalink raw reply [nested|flat] 11+ messages in thread
* Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators
2026-04-16 13:01 [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators Chengpeng Yan <[email protected]>
2026-04-19 13:06 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators David Rowley <[email protected]>
2026-04-20 03:46 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators David Rowley <[email protected]>
2026-04-20 06:17 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators Chengpeng Yan <[email protected]>
2026-04-22 23:33 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators David Rowley <[email protected]>
@ 2026-04-23 04:31 ` Chengpeng Yan <[email protected]>
2026-04-23 05:32 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators David Rowley <[email protected]>
0 siblings, 1 reply; 11+ messages in thread
From: Chengpeng Yan @ 2026-04-23 04:31 UTC (permalink / raw)
To: David Rowley <[email protected]>; +Cc: pgsql-hackers; cca5507 <[email protected]>; Tom Lane <[email protected]>
Hi,
> On Apr 23, 2026, at 07:33, David Rowley <[email protected]> wrote:
>
> Yeah, it doesn't make sense to repeatedly perform a linear search over
> the array to check if NULL matches anything in the array. Let's just
> do that once when we build the hash table and reuse that cached value
> whenever we see a NULL. We can skip that step with strict functions
> since we'll short-circuit earlier.
>
> A patch for that is attached.
Thanks for working on this. Overall, this version looks good to me, and
I'm fine with the current approach. One possible improvement, though not
a blocker, would be to defer the lhs-NULL handling until we actually
encounter the first NULL on the lhs. That could avoid a bit of extra
work in the common case where the lhs contains no NULLs. That said, I
think the current implementation is perfectly OK as-is.
> IMO it's unrealistic to assume we can do anything sane with an
> equality function that always returns NULL.
>
> I really doubt it's worth troubling over that. If we did want to do
> something, then it would be more efficient to probe the hash table
> directly after we insert a Datum and verify we can find it again. If
> we can't find any value we just inserted, mark the entire table as
> broken and have it so we check for that and do a linear search.
I tend to agree. Even if such a case can be constructed, it seems rare
enough that I am not sure it is worth adding more complexity, or extra
overhead in the common hashed SAOP path, to handle it in this patch. I
think we can revisit that separately if a concrete case turns up that
seems worth looking into.
--
Best regards,
Chengpeng Yan
^ permalink raw reply [nested|flat] 11+ messages in thread
* Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators
2026-04-16 13:01 [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators Chengpeng Yan <[email protected]>
2026-04-19 13:06 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators David Rowley <[email protected]>
2026-04-20 03:46 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators David Rowley <[email protected]>
2026-04-20 06:17 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators Chengpeng Yan <[email protected]>
2026-04-22 23:33 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators David Rowley <[email protected]>
2026-04-23 04:31 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators Chengpeng Yan <[email protected]>
@ 2026-04-23 05:32 ` David Rowley <[email protected]>
2026-04-23 05:47 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators Chengpeng Yan <[email protected]>
0 siblings, 1 reply; 11+ messages in thread
From: David Rowley @ 2026-04-23 05:32 UTC (permalink / raw)
To: Chengpeng Yan <[email protected]>; +Cc: pgsql-hackers; cca5507 <[email protected]>; Tom Lane <[email protected]>
Thanks for looking.
On Thu, 23 Apr 2026 at 16:31, Chengpeng Yan <[email protected]> wrote:
> One possible improvement, though not
> a blocker, would be to defer the lhs-NULL handling until we actually
> encounter the first NULL on the lhs. That could avoid a bit of extra
> work in the common case where the lhs contains no NULLs.
I thought of it, but didn't do it as it meant having to keep a bit
more state to track if we've filled the cache yet, plus the extra
costs incurred to check if we've done it yet that would have to be
paid for every NULL lookup. We currently have to check if the hash
table has been set up already, so I felt more comfortable installing
the new code in with that.
David
^ permalink raw reply [nested|flat] 11+ messages in thread
* Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators
2026-04-16 13:01 [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators Chengpeng Yan <[email protected]>
2026-04-19 13:06 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators David Rowley <[email protected]>
2026-04-20 03:46 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators David Rowley <[email protected]>
2026-04-20 06:17 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators Chengpeng Yan <[email protected]>
2026-04-22 23:33 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators David Rowley <[email protected]>
2026-04-23 04:31 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators Chengpeng Yan <[email protected]>
2026-04-23 05:32 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators David Rowley <[email protected]>
@ 2026-04-23 05:47 ` Chengpeng Yan <[email protected]>
2026-04-24 02:06 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators David Rowley <[email protected]>
0 siblings, 1 reply; 11+ messages in thread
From: Chengpeng Yan @ 2026-04-23 05:47 UTC (permalink / raw)
To: David Rowley <[email protected]>; +Cc: pgsql-hackers; cca5507 <[email protected]>; Tom Lane <[email protected]>
> On Apr 23, 2026, at 13:32, David Rowley <[email protected]> wrote:
>
> I thought of it, but didn't do it as it meant having to keep a bit
> more state to track if we've filled the cache yet, plus the extra
> costs incurred to check if we've done it yet that would have to be
> paid for every NULL lookup. We currently have to check if the hash
> table has been set up already, so I felt more comfortable installing
> the new code in with that.
That makes sense to me. I agree that tying it to the existing hash-table
setup is the simpler place to do it, and avoids adding extra state plus
another check on each NULL lookup.
So I am fine with keeping it as-is.
--
Best regards,
Chengpeng Yan
^ permalink raw reply [nested|flat] 11+ messages in thread
* Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators
2026-04-16 13:01 [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators Chengpeng Yan <[email protected]>
2026-04-19 13:06 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators David Rowley <[email protected]>
2026-04-20 03:46 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators David Rowley <[email protected]>
2026-04-20 06:17 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators Chengpeng Yan <[email protected]>
2026-04-22 23:33 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators David Rowley <[email protected]>
2026-04-23 04:31 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators Chengpeng Yan <[email protected]>
2026-04-23 05:32 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators David Rowley <[email protected]>
2026-04-23 05:47 ` Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators Chengpeng Yan <[email protected]>
@ 2026-04-24 02:06 ` David Rowley <[email protected]>
0 siblings, 0 replies; 11+ messages in thread
From: David Rowley @ 2026-04-24 02:06 UTC (permalink / raw)
To: Chengpeng Yan <[email protected]>; +Cc: pgsql-hackers; cca5507 <[email protected]>; Tom Lane <[email protected]>
On Thu, 23 Apr 2026 at 17:47, Chengpeng Yan <[email protected]> wrote:
> So I am fine with keeping it as-is.
Thanks. Pushed.
David
^ permalink raw reply [nested|flat] 11+ messages in thread
end of thread, other threads:[~2026-04-24 02:06 UTC | newest]
Thread overview: 11+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-04-16 13:01 [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators Chengpeng Yan <[email protected]>
2026-04-17 12:45 ` =?utf-8?B?Y2NhNTUwNw==?= <[email protected]>
2026-04-19 13:06 ` David Rowley <[email protected]>
2026-04-20 03:46 ` David Rowley <[email protected]>
2026-04-20 04:14 ` Tom Lane <[email protected]>
2026-04-20 06:17 ` Chengpeng Yan <[email protected]>
2026-04-22 23:33 ` David Rowley <[email protected]>
2026-04-23 04:31 ` Chengpeng Yan <[email protected]>
2026-04-23 05:32 ` David Rowley <[email protected]>
2026-04-23 05:47 ` Chengpeng Yan <[email protected]>
2026-04-24 02:06 ` David Rowley <[email protected]>
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox