Re: Reduce planning time for large NOT IN lists containing NULL

public inbox for [email protected]  
help / color / mirror / Atom feed

Re: Reduce planning time for large NOT IN lists containing NULL
9+ messages / 4 participants
[nested] [flat]

* Re: Reduce planning time for large NOT IN lists containing NULL
@ 2026-02-24 08:15 Ilia Evdokimov <[email protected]>
  2026-02-24 08:29 ` Re: Reduce planning time for large NOT IN lists containing NULL David Geier <[email protected]>
  0 siblings, 1 reply; 9+ messages in thread

From: Ilia Evdokimov @ 2026-02-24 08:15 UTC (permalink / raw)
  To: Zsolt Parragi <[email protected]>; +Cc: David Geier <[email protected]>; PostgreSQL Developers <[email protected]>

Hi,

On 2/23/26 22:44, Zsolt Parragi wrote:
> Hello
>
> I think it would be a good idea to add a test, I think there's a
> regression with this patch:
>
> CREATE TABLE notin_test AS SELECT generate_series(1, 1000) AS x;
> ANALYZE notin_test;
>
> CREATE FUNCTION replace_elem(arr int[], idx int, val int)
> RETURNS int[] AS $$
> BEGIN
>      arr[idx] := val;
>      RETURN arr;
> END;
> $$ LANGUAGE plpgsql IMMUTABLE;
>
> EXPLAIN SELECT * FROM notin_test WHERE x <> ALL(ARRAY[1,99,3]);
> -- same array, constructed from an array with a NULL
> EXPLAIN SELECT * FROM notin_test WHERE x <>
> ALL(replace_elem(ARRAY[1,NULL,3], 2, 99));
> DROP TABLE notin_test;
> DROP FUNCTION replace_elem;
>
> ARR_HASNULL probably should be array_contains_nulls, as ARR_HASNULL
> simply checks for the existence of a NULL bitmap.

Could you clarify what exactly this additional test meant to verify?

The current patch only introduces an early exit from the expensive 
per-element selectivity loop in the <> ALL case when a NULL is detected. 
If the goal is to verify the correctness of IN / NOT IN semantics, those 
cases already covered in expressions.sql, including scenarios with NULL 
elements.

I attached this thread to commitfest: 
https://commitfest.postgresql.org/patch/6519/

-- 
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/








^ permalink  raw  reply  [nested|flat] 9+ messages in thread

* Re: Reduce planning time for large NOT IN lists containing NULL
  2026-02-24 08:15 Re: Reduce planning time for large NOT IN lists containing NULL Ilia Evdokimov <[email protected]>
@ 2026-02-24 08:29 ` David Geier <[email protected]>
  2026-02-24 21:14   ` Re: Reduce planning time for large NOT IN lists containing NULL Ilia Evdokimov <[email protected]>
  0 siblings, 1 reply; 9+ messages in thread

From: David Geier @ 2026-02-24 08:29 UTC (permalink / raw)
  To: Ilia Evdokimov <[email protected]>; Zsolt Parragi <[email protected]>; +Cc: PostgreSQL Developers <[email protected]>

>> I think it would be a good idea to add a test, I think there's a
>> regression with this patch:
>>
>> CREATE TABLE notin_test AS SELECT generate_series(1, 1000) AS x;
>> ANALYZE notin_test;
>>
>> CREATE FUNCTION replace_elem(arr int[], idx int, val int)
>> RETURNS int[] AS $$
>> BEGIN
>>      arr[idx] := val;
>>      RETURN arr;
>> END;
>> $$ LANGUAGE plpgsql IMMUTABLE;
>>
>> EXPLAIN SELECT * FROM notin_test WHERE x <> ALL(ARRAY[1,99,3]);
>> -- same array, constructed from an array with a NULL
>> EXPLAIN SELECT * FROM notin_test WHERE x <>
>> ALL(replace_elem(ARRAY[1,NULL,3], 2, 99));
>> DROP TABLE notin_test;
>> DROP FUNCTION replace_elem;

Good catch. The macro name is misleading here. It should have been
called ARR_HASNULLBITMAP().

+1 on adding an explicit test that says why we care about that case.

>> ARR_HASNULL probably should be array_contains_nulls, as ARR_HASNULL
>> simply checks for the existence of a NULL bitmap.

Using array_contains_nulls() seems fine. In case the IN list doesn't
contain NULL, the function can immediately bail thanks to the
!ARR_HASNULL() check in the beginning.

It only needs to iterate over the NULL-bitmap, if it exists. This is the
case if there's actually a NULL element in the array, or if the array
initially contained NULL and all NULLs got removed subsequently.

If we ever find the latter case to matter we could remove the
NULL-bitmap in array_set_element() / array_set_element_expanded(), when
the last NULL element got removed.

> Could you clarify what exactly this additional test meant to verify?

Zsolt's test case creates an array that initially contains NULL. The
NULL element is subsequently replaced by a non-NULL value but
array_set_element_expanded() keeps the NULL-bitmap around. With that,
your ARR_ISNULL() check bails and causes the selectivity estimation to
incorrectly return 0.

> I attached this thread to commitfest: https://commitfest.postgresql.org/
> patch/6519/

I'll assign myself as reviewer.

--
David Geier

^ permalink  raw  reply  [nested|flat] 9+ messages in thread

* Re: Reduce planning time for large NOT IN lists containing NULL
  2026-02-24 08:15 Re: Reduce planning time for large NOT IN lists containing NULL Ilia Evdokimov <[email protected]>
  2026-02-24 08:29 ` Re: Reduce planning time for large NOT IN lists containing NULL David Geier <[email protected]>
@ 2026-02-24 21:14   ` Ilia Evdokimov <[email protected]>
  2026-02-25 13:43     ` Re: Reduce planning time for large NOT IN lists containing NULL Zsolt Parragi <[email protected]>
  0 siblings, 1 reply; 9+ messages in thread

From: Ilia Evdokimov @ 2026-02-24 21:14 UTC (permalink / raw)
  To: David Geier <[email protected]>; Zsolt Parragi <[email protected]>; +Cc: PostgreSQL Developers <[email protected]>

On 2/24/26 11:29, David Geier wrote:

> Using array_contains_nulls() seems fine. In case the IN list doesn't
> contain NULL, the function can immediately bail thanks to the
> !ARR_HASNULL() check in the beginning.
>
> It only needs to iterate over the NULL-bitmap, if it exists. This is the
> case if there's actually a NULL element in the array, or if the array
> initially contained NULL and all NULLs got removed subsequently.
>
> If we ever find the latter case to matter we could remove the
> NULL-bitmap in array_set_element() / array_set_element_expanded(), when
> the last NULL element got removed.
>
>> Could you clarify what exactly this additional test meant to verify?
> Zsolt's test case creates an array that initially contains NULL. The
> NULL element is subsequently replaced by a non-NULL value but
> array_set_element_expanded() keeps the NULL-bitmap around. With that,
> your ARR_ISNULL() check bails and causes the selectivity estimation to
> incorrectly return 0.

Ah, right - thanks for the clarification. I agree.

Regarding the regression test: the suggestion test case is good, but 
there is not a straightforward way to expose the estimated row count 
without also showing the costs, and costs are unstable. To avoid that, I 
reused the parsing approach already present in stats_ext.sql to extract 
only the estimated row count from EXPLAIN.

Since the test table contains exactly 1000 rows and we run ANALYZE, all 
rows are included in the statistics sample. Therefore the estimate for x 
<> ALL(array[1, 99, 2]) is deterministically 997 rows, and the test 
stable and ensures we detect the incorrect early-zero estimate.

Let me know if you'd prefer a different approach. I've attached v4 patch.

-- 
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/



Attachments:

  [text/x-patch] v4-0001-Reduce-planning-time-for-large-NOT-IN-lists-conta.patch (4.9K, 3-v4-0001-Reduce-planning-time-for-large-NOT-IN-lists-conta.patch)
  download | inline diff:
From a9dd1d1c9990a30173c5210570efccb034e8b5f4 Mon Sep 17 00:00:00 2001
From: Evdokimov Ilia <[email protected]>
Date: Wed, 25 Feb 2026 00:08:26 +0300
Subject: [PATCH v4] Reduce planning time for large NOT IN lists containing 
 NULL

For x <> ALL (...) / x NOT IN (...), the presence of a NULL element
makes the selectivity 0.0.

The planner currently still iterates over all elements and computes
per-element selectivity, even though the final result is known.

Add an early NULL check for constant arrays and immediately return
0.0 under ALL semantics.

This reduces planning time for large NOT IN / <> ALL lists without
changing semantics.
---
 src/backend/utils/adt/selfuncs.c          |  9 +++++
 src/test/regress/expected/expressions.out | 44 +++++++++++++++++++++++
 src/test/regress/sql/expressions.sql      | 41 +++++++++++++++++++++
 3 files changed, 94 insertions(+)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 29fec655593..eef3f0375a5 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -2018,6 +2018,11 @@ scalararraysel(PlannerInfo *root,
 		if (arrayisnull)		/* qual can't succeed if null array */
 			return (Selectivity) 0.0;
 		arrayval = DatumGetArrayTypeP(arraydatum);
+
+		/* Selectivity of "WHERE x NOT IN (NULL, ... )" is always 0 */
+		if (!useOr && array_contains_nulls(arrayval))
+			return (Selectivity) 0.0;
+
 		get_typlenbyvalalign(ARR_ELEMTYPE(arrayval),
 							 &elmlen, &elmbyval, &elmalign);
 		deconstruct_array(arrayval,
@@ -2115,6 +2120,10 @@ scalararraysel(PlannerInfo *root,
 			List	   *args;
 			Selectivity s2;
 
+			/* Selectivity of "WHERE x NOT IN (NULL, ... )" is always 0 */
+			if (!useOr && IsA(elem, Const) && ((Const *) elem)->constisnull)
+				return (Selectivity) 0.0;
+
 			/*
 			 * Theoretically, if elem isn't of nominal_element_type we should
 			 * insert a RelabelType, but it seems unlikely that any operator
diff --git a/src/test/regress/expected/expressions.out b/src/test/regress/expected/expressions.out
index 9a3c97b15a3..34f14a5775a 100644
--- a/src/test/regress/expected/expressions.out
+++ b/src/test/regress/expected/expressions.out
@@ -426,3 +426,47 @@ select * from inttest where a not in (0::myint,2::myint,3::myint,4::myint,5::myi
 (0 rows)
 
 rollback;
+-- Test <> ALL when array initially contained NULL but no longer does
+begin;
+create function check_estimated_rows(text) returns table (estimated int)
+language plpgsql as
+$$
+declare
+    ln text;
+    tmp text[];
+    first_row bool := true;
+begin
+    for ln in
+        execute format('explain %s', $1)
+    loop
+        if first_row then
+            first_row := false;
+            tmp := regexp_match(ln, 'rows=(\d*)');
+            return query select tmp[1]::int;
+        end if;
+    end loop;
+end;
+$$;
+create function replace_elem(arr int[], idx int, val int)
+returns int[] AS $$
+begin
+      arr[idx] := val;
+      return arr;
+end;
+$$ language plpgsql immutable;
+create table notin_test as select generate_series(1, 1000) as x;
+analyze notin_test;
+select * from check_estimated_rows('select * from notin_test where x <> all(array[1,99,3])');
+ estimated 
+-----------
+       997
+(1 row)
+
+-- same array, constructed from an array with a NULL
+select * from check_estimated_rows('select * from notin_test where x <> all(replace_elem(array[1,null,3], 2, 99))');
+ estimated 
+-----------
+       997
+(1 row)
+
+rollback;
diff --git a/src/test/regress/sql/expressions.sql b/src/test/regress/sql/expressions.sql
index e02c21f3368..ca94859bbf8 100644
--- a/src/test/regress/sql/expressions.sql
+++ b/src/test/regress/sql/expressions.sql
@@ -209,3 +209,44 @@ select * from inttest where a not in (1::myint,2::myint,3::myint,4::myint,5::myi
 select * from inttest where a not in (0::myint,2::myint,3::myint,4::myint,5::myint, null);
 
 rollback;
+
+-- Test <> ALL when array initially contained NULL but no longer does
+
+begin;
+
+create function check_estimated_rows(text) returns table (estimated int)
+language plpgsql as
+$$
+declare
+    ln text;
+    tmp text[];
+    first_row bool := true;
+begin
+    for ln in
+        execute format('explain %s', $1)
+    loop
+        if first_row then
+            first_row := false;
+            tmp := regexp_match(ln, 'rows=(\d*)');
+            return query select tmp[1]::int;
+        end if;
+    end loop;
+end;
+$$;
+
+create function replace_elem(arr int[], idx int, val int)
+returns int[] AS $$
+begin
+      arr[idx] := val;
+      return arr;
+end;
+$$ language plpgsql immutable;
+
+create table notin_test as select generate_series(1, 1000) as x;
+analyze notin_test;
+
+select * from check_estimated_rows('select * from notin_test where x <> all(array[1,99,3])');
+-- same array, constructed from an array with a NULL
+select * from check_estimated_rows('select * from notin_test where x <> all(replace_elem(array[1,null,3], 2, 99))');
+
+rollback;
\ No newline at end of file
-- 
2.34.1



^ permalink  raw  reply  [nested|flat] 9+ messages in thread

* Re: Reduce planning time for large NOT IN lists containing NULL
  2026-02-24 08:15 Re: Reduce planning time for large NOT IN lists containing NULL Ilia Evdokimov <[email protected]>
  2026-02-24 08:29 ` Re: Reduce planning time for large NOT IN lists containing NULL David Geier <[email protected]>
  2026-02-24 21:14   ` Re: Reduce planning time for large NOT IN lists containing NULL Ilia Evdokimov <[email protected]>
@ 2026-02-25 13:43     ` Zsolt Parragi <[email protected]>
  2026-03-03 01:08       ` Re: Reduce planning time for large NOT IN lists containing NULL David Rowley <[email protected]>
  0 siblings, 1 reply; 9+ messages in thread

From: Zsolt Parragi @ 2026-02-25 13:43 UTC (permalink / raw)
  To: Ilia Evdokimov <[email protected]>; +Cc: David Geier <[email protected]>; PostgreSQL Developers <[email protected]>

expressions.sql is missing a new line at the end of the file,
otherwise it looks good to me.






^ permalink  raw  reply  [nested|flat] 9+ messages in thread

* Re: Reduce planning time for large NOT IN lists containing NULL
  2026-02-24 08:15 Re: Reduce planning time for large NOT IN lists containing NULL Ilia Evdokimov <[email protected]>
  2026-02-24 08:29 ` Re: Reduce planning time for large NOT IN lists containing NULL David Geier <[email protected]>
  2026-02-24 21:14   ` Re: Reduce planning time for large NOT IN lists containing NULL Ilia Evdokimov <[email protected]>
  2026-02-25 13:43     ` Re: Reduce planning time for large NOT IN lists containing NULL Zsolt Parragi <[email protected]>
@ 2026-03-03 01:08       ` David Rowley <[email protected]>
  2026-03-17 10:50         ` Re: Reduce planning time for large NOT IN lists containing NULL Ilia Evdokimov <[email protected]>
  0 siblings, 1 reply; 9+ messages in thread

From: David Rowley @ 2026-03-03 01:08 UTC (permalink / raw)
  To: Ilia Evdokimov <[email protected]>; +Cc: Zsolt Parragi <[email protected]>; David Geier <[email protected]>; PostgreSQL Developers <[email protected]>

On Tue, 3 Mar 2026 at 04:04, Ilia Evdokimov
<[email protected]> wrote:
> I've fixed this in v5-patch.

I had a look at this and wondered if we guarantee that no rows will
match, then why can't we perform constant folding on the
ScalarArrayOpExpr when !useOr and the array contains a NULL element
and the operator is strict. Seemingly, one of the reasons for that is
down to the expression returning NULL vs false. Take the following two
tests from expressions.out:

select return_int_input(1) not in (10, 9, 2, 8, 3, 7, 4, 6, 5, 2, null);
 ?column?
----------

(1 row)

select return_int_input(1) not in (10, 9, 2, 8, 3, 7, 4, 6, 5, 1, null);
 ?column?
----------
 f
(1 row)

Here we see that we return false when we find the left operand in the
array, but NULL when we don't find it and the array contains NULL. So,
unless the left operand is a const, we wouldn't know how to simplify
the ScalarArrayOpExpr during planning as the false or NULL would only
be known during evaluation of the expression.

However, when the expression being simplified is an EXPRKIND_QUAL, it
shouldn't matter if the result is false or NULL as both mean the same
and there shouldn't be any code that cares about the difference.
Currently, we don't pass the "kind" down into
eval_const_expressions(), but I don't really see why we couldn't. It
would be a fair bit of work figuring out with confidence what the
extra arg should be passed as in all the existing call sites of that
function. We'd have to document in the header comment for
eval_const_expressions() that constant-folding on EXPRKIND_QUAL
expressions can enable additional optimisations which disregard the
difference between NULL and false.

For the patch, I imagine it's still a useful optimisation as the
ScalarArrayOpExpr might not be in an EXPRKIND_QUAL.

There are a couple of things I don't like:

1) The new test is in expressions.sql. The comment at the top of that
file reads: "expression evaluation tests that don't fit into a more
specific file". The new test isn't anything to do with expression
evaluation. It's about planner estimation. I see that
misc_function.sql has the explain_mask_costs() function. I'm not sure
that's the right place either, as the usages of that function are for
testing SupportRequestRows prosupport functions. I wonder if we need a
dedicated row_estimate.sql or selectivity_est.sql file. The
explain_mask_costs() wouldn't be out of place if they were moved into
a new test like that. It was me that started putting those in
misc_function.sql, and I don't object to them being moved to a new
test. I'd be as a separate commit, however.

2) The new test creates a new table and inserts 1000 rows. There does
not seem to be anything special about the new table. Why don't you use
one of the ones from test_setup.sql?

3) Looking at var_eq_const(), it seems like it's coded to assume the
operator is always strict, per "If the constant is NULL, assume
operator is strict and return zero". If that's good enough for
var_eq_const(), then it should be good enough for the new code. I
think it would be good if you wrote that or something similar in the
new code so that the reader knows taking the short-circuit with
non-strict functions is on purpose.

David





^ permalink  raw  reply  [nested|flat] 9+ messages in thread

* Re: Reduce planning time for large NOT IN lists containing NULL
  2026-02-24 08:15 Re: Reduce planning time for large NOT IN lists containing NULL Ilia Evdokimov <[email protected]>
  2026-02-24 08:29 ` Re: Reduce planning time for large NOT IN lists containing NULL David Geier <[email protected]>
  2026-02-24 21:14   ` Re: Reduce planning time for large NOT IN lists containing NULL Ilia Evdokimov <[email protected]>
  2026-02-25 13:43     ` Re: Reduce planning time for large NOT IN lists containing NULL Zsolt Parragi <[email protected]>
  2026-03-03 01:08       ` Re: Reduce planning time for large NOT IN lists containing NULL David Rowley <[email protected]>
@ 2026-03-17 10:50         ` Ilia Evdokimov <[email protected]>
  2026-03-18 04:32           ` Re: Reduce planning time for large NOT IN lists containing NULL David Rowley <[email protected]>
  0 siblings, 1 reply; 9+ messages in thread

From: Ilia Evdokimov @ 2026-03-17 10:50 UTC (permalink / raw)
  To: David Rowley <[email protected]>; +Cc: Zsolt Parragi <[email protected]>; David Geier <[email protected]>; PostgreSQL Developers <[email protected]>

Rebased the v7-patches to fix git apply failure.


-- 
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/


Attachments:

  [text/x-patch] v7-0002-Reduce-planning-time-for-large-NOT-IN-lists-conta.patch (4.6K, 2-v7-0002-Reduce-planning-time-for-large-NOT-IN-lists-conta.patch)
  download | inline diff:
From 33094d8e110152c5c9af119373699be2ceb42abb Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov <[email protected]>
Date: Tue, 17 Mar 2026 13:40:57 +0300
Subject: [PATCH v7 2/2] Reduce planning time for large NOT IN lists containing
 NULL

For x <> ALL (...), the presence of a NULL makes the selectivity 0.0.

The planner currently still iterates over all elements and computes
per-element selectivity, even though the final result is known.

Add an early NULL check for constant arrays and immediately return
0.0 under ALL semantics.

This reduces planning time for large <> ALL lists without
changing semantics.
---
 src/backend/utils/adt/selfuncs.c              | 17 +++++++++++
 src/test/regress/expected/selectivity_est.out | 30 +++++++++++++++++++
 src/test/regress/sql/selectivity_est.sql      | 22 ++++++++++++++
 3 files changed, 69 insertions(+)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index d4da0e8dea9..073d93f4f3d 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -2018,6 +2018,15 @@ scalararraysel(PlannerInfo *root,
 		if (arrayisnull)		/* qual can't succeed if null array */
 			return (Selectivity) 0.0;
 		arrayval = DatumGetArrayTypeP(arraydatum);
+
+		/*
+		 * For ALL semantics, if the array contains NULL, assume operator is
+		 * strict. The ScalarArrayOpExpr cannot evaluate to TRUE, so return
+		 * zero.
+		 */
+		if (!useOr && array_contains_nulls(arrayval))
+			return (Selectivity) 0.0;
+
 		get_typlenbyvalalign(ARR_ELEMTYPE(arrayval),
 							 &elmlen, &elmbyval, &elmalign);
 		deconstruct_array(arrayval,
@@ -2115,6 +2124,14 @@ scalararraysel(PlannerInfo *root,
 			List	   *args;
 			Selectivity s2;
 
+			/*
+			 * For ALL semantics, if the array contains NULL, assume operator
+			 * is strict. The ScalarArrayOpExpr cannot evaluate to TRUE, so
+			 * return zero.
+			 */
+			if (!useOr && IsA(elem, Const) && ((Const *) elem)->constisnull)
+				return (Selectivity) 0.0;
+
 			/*
 			 * Theoretically, if elem isn't of nominal_element_type we should
 			 * insert a RelabelType, but it seems unlikely that any operator
diff --git a/src/test/regress/expected/selectivity_est.out b/src/test/regress/expected/selectivity_est.out
index 8fc5c9c9e07..d482f2ae7a0 100644
--- a/src/test/regress/expected/selectivity_est.out
+++ b/src/test/regress/expected/selectivity_est.out
@@ -175,4 +175,34 @@ false, true, false, true);
  Function Scan on generate_series g  (cost=N..N rows=1000 width=N)
 (1 row)
 
+--
+-- Test <> ALL when array initially contained NULL but no longer does
+--
+CREATE FUNCTION replace_elem(arr int[], idx int, val int)
+RETURNS int[] AS $$
+BEGIN
+        arr[idx] := val;
+        RETURN arr;
+end;
+$$ language plpgsql IMMUTABLE;
+SELECT explain_mask_costs(
+    'SELECT * FROM tenk1 WHERE unique1 <> ALL(ARRAY[1,99,3])',
+false, true, false, true );
+                 explain_mask_costs                 
+----------------------------------------------------
+ Seq Scan on tenk1  (cost=N..N rows=9997 width=N)
+   Filter: (unique1 <> ALL ('{1,99,3}'::integer[]))
+(2 rows)
+
+-- same array, constructed from an array with a NULL
+SELECT explain_mask_costs(
+    'SELECT * FROM tenk1 WHERE unique1 <> ALL(replace_elem(ARRAY[1,NULL,3], 2, 99))',
+false, true, false, true );
+                 explain_mask_costs                 
+----------------------------------------------------
+ Seq Scan on tenk1  (cost=N..N rows=9997 width=N)
+   Filter: (unique1 <> ALL ('{1,99,3}'::integer[]))
+(2 rows)
+
+DROP FUNCTION replace_elem;
 DROP FUNCTION explain_mask_costs(text, bool, bool, bool, bool);
diff --git a/src/test/regress/sql/selectivity_est.sql b/src/test/regress/sql/selectivity_est.sql
index 416d5ea1f75..3ffec43907f 100644
--- a/src/test/regress/sql/selectivity_est.sql
+++ b/src/test/regress/sql/selectivity_est.sql
@@ -122,5 +122,27 @@ SELECT explain_mask_costs($$
 SELECT * FROM generate_series(25.0, 2.0, 0.0) g(s);$$,
 false, true, false, true);
 
+--
+-- Test <> ALL when array initially contained NULL but no longer does
+--
+CREATE FUNCTION replace_elem(arr int[], idx int, val int)
+RETURNS int[] AS $$
+BEGIN
+        arr[idx] := val;
+        RETURN arr;
+end;
+$$ language plpgsql IMMUTABLE;
+
+SELECT explain_mask_costs(
+    'SELECT * FROM tenk1 WHERE unique1 <> ALL(ARRAY[1,99,3])',
+false, true, false, true );
+
+-- same array, constructed from an array with a NULL
+SELECT explain_mask_costs(
+    'SELECT * FROM tenk1 WHERE unique1 <> ALL(replace_elem(ARRAY[1,NULL,3], 2, 99))',
+false, true, false, true );
+
+DROP FUNCTION replace_elem;
+
 
 DROP FUNCTION explain_mask_costs(text, bool, bool, bool, bool);
-- 
2.34.1



  [text/x-patch] v7-0001-Move-planner-row-estimation-tests-to-selectivity..patch (29.0K, 3-v7-0001-Move-planner-row-estimation-tests-to-selectivity..patch)
  download | inline diff:
From 644fa9a077aaf7d77c3a58add925d2e01cbcae20 Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov <[email protected]>
Date: Tue, 17 Mar 2026 13:39:38 +0300
Subject: [PATCH v7 1/2] Move planner row-estimation tests to selectivity.sql

Move explain_mask_costs() and the associated planner row-estimation
tests from misc_functions.sql to a new regression test file,
selectivity.sql.

The tests exercise SupportRequestRows support functions and other
planner selectivity estimation behavior, so they do not logically
belong in misc_functions.sql.
---
 src/test/regress/expected/misc_functions.out  | 178 ------------------
 src/test/regress/expected/selectivity_est.out | 178 ++++++++++++++++++
 src/test/regress/parallel_schedule            |   2 +-
 src/test/regress/sql/misc_functions.sql       | 125 ------------
 src/test/regress/sql/selectivity_est.sql      | 126 +++++++++++++
 5 files changed, 305 insertions(+), 304 deletions(-)
 create mode 100644 src/test/regress/expected/selectivity_est.out
 create mode 100644 src/test/regress/sql/selectivity_est.sql

diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index 6c03b1a79d7..cf55cdf3688 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -2,46 +2,6 @@
 \getenv libdir PG_LIBDIR
 \getenv dlsuffix PG_DLSUFFIX
 \set regresslib :libdir '/regress' :dlsuffix
--- Function to assist with verifying EXPLAIN which includes costs.  A series
--- of bool flags allows control over which portions are masked out
-CREATE FUNCTION explain_mask_costs(query text, do_analyze bool,
-    hide_costs bool, hide_row_est bool, hide_width bool) RETURNS setof text
-LANGUAGE plpgsql AS
-$$
-DECLARE
-    ln text;
-    analyze_str text;
-BEGIN
-    IF do_analyze = true THEN
-        analyze_str := 'on';
-    ELSE
-        analyze_str := 'off';
-    END IF;
-
-    -- avoid jit related output by disabling it
-    SET LOCAL jit = 0;
-
-    FOR ln IN
-        EXECUTE format('explain (analyze %s, costs on, summary off, timing off, buffers off) %s',
-            analyze_str, query)
-    LOOP
-        IF hide_costs = true THEN
-            ln := regexp_replace(ln, 'cost=\d+\.\d\d\.\.\d+\.\d\d', 'cost=N..N');
-        END IF;
-
-        IF hide_row_est = true THEN
-            -- don't use 'g' so that we leave the actual rows intact
-            ln := regexp_replace(ln, 'rows=\d+', 'rows=N');
-        END IF;
-
-        IF hide_width = true THEN
-            ln := regexp_replace(ln, 'width=\d+', 'width=N');
-        END IF;
-
-        RETURN NEXT ln;
-    END LOOP;
-END;
-$$;
 --
 -- num_nulls()
 --
@@ -671,143 +631,6 @@ SELECT * FROM tenk1 a JOIN my_gen_series(1,10) g ON a.unique1 = g;
          Index Cond: (unique1 = g.g)
 (4 rows)
 
---
--- Test the SupportRequestRows support function for generate_series_timestamp()
---
--- Ensure the row estimate matches the actual rows
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '1 day') g(s);$$,
-true, true, false, true);
-                                     explain_mask_costs                                      
----------------------------------------------------------------------------------------------
- Function Scan on generate_series g  (cost=N..N rows=30 width=N) (actual rows=30.00 loops=1)
-(1 row)
-
--- As above but with generate_series_timestamp
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(TIMESTAMP '2024-02-01', TIMESTAMP '2024-03-01', INTERVAL '1 day') g(s);$$,
-true, true, false, true);
-                                     explain_mask_costs                                      
----------------------------------------------------------------------------------------------
- Function Scan on generate_series g  (cost=N..N rows=30 width=N) (actual rows=30.00 loops=1)
-(1 row)
-
--- As above but with generate_series_timestamptz_at_zone()
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '1 day', 'UTC') g(s);$$,
-true, true, false, true);
-                                     explain_mask_costs                                      
----------------------------------------------------------------------------------------------
- Function Scan on generate_series g  (cost=N..N rows=30 width=N) (actual rows=30.00 loops=1)
-(1 row)
-
--- Ensure the estimated and actual row counts match when the range isn't
--- evenly divisible by the step
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '7 day') g(s);$$,
-true, true, false, true);
-                                    explain_mask_costs                                     
--------------------------------------------------------------------------------------------
- Function Scan on generate_series g  (cost=N..N rows=5 width=N) (actual rows=5.00 loops=1)
-(1 row)
-
--- Ensure the estimates match when step is decreasing
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(TIMESTAMPTZ '2024-03-01', TIMESTAMPTZ '2024-02-01', INTERVAL '-1 day') g(s);$$,
-true, true, false, true);
-                                     explain_mask_costs                                      
----------------------------------------------------------------------------------------------
- Function Scan on generate_series g  (cost=N..N rows=30 width=N) (actual rows=30.00 loops=1)
-(1 row)
-
--- Ensure an empty range estimates 1 row
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(TIMESTAMPTZ '2024-03-01', TIMESTAMPTZ '2024-02-01', INTERVAL '1 day') g(s);$$,
-true, true, false, true);
-                                    explain_mask_costs                                     
--------------------------------------------------------------------------------------------
- Function Scan on generate_series g  (cost=N..N rows=1 width=N) (actual rows=0.00 loops=1)
-(1 row)
-
--- Ensure we get the default row estimate for infinity values
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(TIMESTAMPTZ '-infinity', TIMESTAMPTZ 'infinity', INTERVAL '1 day') g(s);$$,
-false, true, false, true);
-                        explain_mask_costs                         
--------------------------------------------------------------------
- Function Scan on generate_series g  (cost=N..N rows=1000 width=N)
-(1 row)
-
--- Ensure the row estimate behaves correctly when step size is zero.
--- We expect generate_series_timestamp() to throw the error rather than in
--- the support function.
-SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '0 day') g(s);
-ERROR:  step size cannot equal zero
---
--- Test the SupportRequestRows support function for generate_series_numeric()
---
--- Ensure the row estimate matches the actual rows
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(1.0, 25.0) g(s);$$,
-true, true, false, true);
-                                     explain_mask_costs                                      
----------------------------------------------------------------------------------------------
- Function Scan on generate_series g  (cost=N..N rows=25 width=N) (actual rows=25.00 loops=1)
-(1 row)
-
--- As above but with non-default step
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(1.0, 25.0, 2.0) g(s);$$,
-true, true, false, true);
-                                     explain_mask_costs                                      
----------------------------------------------------------------------------------------------
- Function Scan on generate_series g  (cost=N..N rows=13 width=N) (actual rows=13.00 loops=1)
-(1 row)
-
--- Ensure the estimates match when step is decreasing
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(25.0, 1.0, -1.0) g(s);$$,
-true, true, false, true);
-                                     explain_mask_costs                                      
----------------------------------------------------------------------------------------------
- Function Scan on generate_series g  (cost=N..N rows=25 width=N) (actual rows=25.00 loops=1)
-(1 row)
-
--- Ensure an empty range estimates 1 row
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(25.0, 1.0, 1.0) g(s);$$,
-true, true, false, true);
-                                    explain_mask_costs                                     
--------------------------------------------------------------------------------------------
- Function Scan on generate_series g  (cost=N..N rows=1 width=N) (actual rows=0.00 loops=1)
-(1 row)
-
--- Ensure we get the default row estimate for error cases (infinity/NaN values
--- and zero step size)
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series('-infinity'::NUMERIC, 'infinity'::NUMERIC, 1.0) g(s);$$,
-false, true, false, true);
-                        explain_mask_costs                         
--------------------------------------------------------------------
- Function Scan on generate_series g  (cost=N..N rows=1000 width=N)
-(1 row)
-
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(1.0, 25.0, 'NaN'::NUMERIC) g(s);$$,
-false, true, false, true);
-                        explain_mask_costs                         
--------------------------------------------------------------------
- Function Scan on generate_series g  (cost=N..N rows=1000 width=N)
-(1 row)
-
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(25.0, 2.0, 0.0) g(s);$$,
-false, true, false, true);
-                        explain_mask_costs                         
--------------------------------------------------------------------
- Function Scan on generate_series g  (cost=N..N rows=1000 width=N)
-(1 row)
-
 --
 -- Test SupportRequestInlineInFrom request
 --
@@ -970,7 +793,6 @@ SELECT pg_column_toast_chunk_id(a) IS NULL,
 (1 row)
 
 DROP TABLE test_chunk_id;
-DROP FUNCTION explain_mask_costs(text, bool, bool, bool, bool);
 -- test stratnum translation support functions
 SELECT gist_translate_cmptype_common(7);
  gist_translate_cmptype_common 
diff --git a/src/test/regress/expected/selectivity_est.out b/src/test/regress/expected/selectivity_est.out
new file mode 100644
index 00000000000..8fc5c9c9e07
--- /dev/null
+++ b/src/test/regress/expected/selectivity_est.out
@@ -0,0 +1,178 @@
+-- Function to assist with verifying EXPLAIN which includes costs.  A series
+-- of bool flags allows control over which portions are masked out
+CREATE FUNCTION explain_mask_costs(query text, do_analyze bool,
+    hide_costs bool, hide_row_est bool, hide_width bool) RETURNS setof text
+LANGUAGE plpgsql AS
+$$
+DECLARE
+    ln text;
+    analyze_str text;
+BEGIN
+    IF do_analyze = true THEN
+        analyze_str := 'on';
+    ELSE
+        analyze_str := 'off';
+    END IF;
+
+    -- avoid jit related output by disabling it
+    SET LOCAL jit = 0;
+
+    FOR ln IN
+        EXECUTE format('explain (analyze %s, costs on, summary off, timing off, buffers off) %s',
+            analyze_str, query)
+    LOOP
+        IF hide_costs = true THEN
+            ln := regexp_replace(ln, 'cost=\d+\.\d\d\.\.\d+\.\d\d', 'cost=N..N');
+        END IF;
+
+        IF hide_row_est = true THEN
+            -- don't use 'g' so that we leave the actual rows intact
+            ln := regexp_replace(ln, 'rows=\d+', 'rows=N');
+        END IF;
+
+        IF hide_width = true THEN
+            ln := regexp_replace(ln, 'width=\d+', 'width=N');
+        END IF;
+
+        RETURN NEXT ln;
+    END LOOP;
+END;
+$$;
+--
+-- Test the SupportRequestRows support function for generate_series_timestamp()
+--
+-- Ensure the row estimate matches the actual rows
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '1 day') g(s);$$,
+true, true, false, true);
+                                     explain_mask_costs                                      
+---------------------------------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=30 width=N) (actual rows=30.00 loops=1)
+(1 row)
+
+-- As above but with generate_series_timestamp
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(TIMESTAMP '2024-02-01', TIMESTAMP '2024-03-01', INTERVAL '1 day') g(s);$$,
+true, true, false, true);
+                                     explain_mask_costs                                      
+---------------------------------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=30 width=N) (actual rows=30.00 loops=1)
+(1 row)
+
+-- As above but with generate_series_timestamptz_at_zone()
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '1 day', 'UTC') g(s);$$,
+true, true, false, true);
+                                     explain_mask_costs                                      
+---------------------------------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=30 width=N) (actual rows=30.00 loops=1)
+(1 row)
+
+-- Ensure the estimated and actual row counts match when the range isn't
+-- evenly divisible by the step
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '7 day') g(s);$$,
+true, true, false, true);
+                                    explain_mask_costs                                     
+-------------------------------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=5 width=N) (actual rows=5.00 loops=1)
+(1 row)
+
+-- Ensure the estimates match when step is decreasing
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(TIMESTAMPTZ '2024-03-01', TIMESTAMPTZ '2024-02-01', INTERVAL '-1 day') g(s);$$,
+true, true, false, true);
+                                     explain_mask_costs                                      
+---------------------------------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=30 width=N) (actual rows=30.00 loops=1)
+(1 row)
+
+-- Ensure an empty range estimates 1 row
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(TIMESTAMPTZ '2024-03-01', TIMESTAMPTZ '2024-02-01', INTERVAL '1 day') g(s);$$,
+true, true, false, true);
+                                    explain_mask_costs                                     
+-------------------------------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=1 width=N) (actual rows=0.00 loops=1)
+(1 row)
+
+-- Ensure we get the default row estimate for infinity values
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(TIMESTAMPTZ '-infinity', TIMESTAMPTZ 'infinity', INTERVAL '1 day') g(s);$$,
+false, true, false, true);
+                        explain_mask_costs                         
+-------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=1000 width=N)
+(1 row)
+
+-- Ensure the row estimate behaves correctly when step size is zero.
+-- We expect generate_series_timestamp() to throw the error rather than in
+-- the support function.
+SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '0 day') g(s);
+ERROR:  step size cannot equal zero
+--
+-- Test the SupportRequestRows support function for generate_series_numeric()
+--
+-- Ensure the row estimate matches the actual rows
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(1.0, 25.0) g(s);$$,
+true, true, false, true);
+                                     explain_mask_costs                                      
+---------------------------------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=25 width=N) (actual rows=25.00 loops=1)
+(1 row)
+
+-- As above but with non-default step
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(1.0, 25.0, 2.0) g(s);$$,
+true, true, false, true);
+                                     explain_mask_costs                                      
+---------------------------------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=13 width=N) (actual rows=13.00 loops=1)
+(1 row)
+
+-- Ensure the estimates match when step is decreasing
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(25.0, 1.0, -1.0) g(s);$$,
+true, true, false, true);
+                                     explain_mask_costs                                      
+---------------------------------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=25 width=N) (actual rows=25.00 loops=1)
+(1 row)
+
+-- Ensure an empty range estimates 1 row
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(25.0, 1.0, 1.0) g(s);$$,
+true, true, false, true);
+                                    explain_mask_costs                                     
+-------------------------------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=1 width=N) (actual rows=0.00 loops=1)
+(1 row)
+
+-- Ensure we get the default row estimate for error cases (infinity/NaN values
+-- and zero step size)
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series('-infinity'::NUMERIC, 'infinity'::NUMERIC, 1.0) g(s);$$,
+false, true, false, true);
+                        explain_mask_costs                         
+-------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=1000 width=N)
+(1 row)
+
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(1.0, 25.0, 'NaN'::NUMERIC) g(s);$$,
+false, true, false, true);
+                        explain_mask_costs                         
+-------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=1000 width=N)
+(1 row)
+
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(25.0, 2.0, 0.0) g(s);$$,
+false, true, false, true);
+                        explain_mask_costs                         
+-------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=1000 width=N)
+(1 row)
+
+DROP FUNCTION explain_mask_costs(text, bool, bool, bool, bool);
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index e779ada70cb..645f493319b 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -123,7 +123,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
 # The stats test resets stats, so nothing else needing stats access can be in
 # this group.
 # ----------
-test: partition_merge partition_split partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa eager_aggregate graph_table_rls
+test: partition_merge partition_split partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa eager_aggregate graph_table_rls selectivity_est
 
 # event_trigger depends on create_am and cannot run concurrently with
 # any test that runs DDL
diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql
index 35b7983996c..c8226652f2c 100644
--- a/src/test/regress/sql/misc_functions.sql
+++ b/src/test/regress/sql/misc_functions.sql
@@ -4,47 +4,6 @@
 
 \set regresslib :libdir '/regress' :dlsuffix
 
--- Function to assist with verifying EXPLAIN which includes costs.  A series
--- of bool flags allows control over which portions are masked out
-CREATE FUNCTION explain_mask_costs(query text, do_analyze bool,
-    hide_costs bool, hide_row_est bool, hide_width bool) RETURNS setof text
-LANGUAGE plpgsql AS
-$$
-DECLARE
-    ln text;
-    analyze_str text;
-BEGIN
-    IF do_analyze = true THEN
-        analyze_str := 'on';
-    ELSE
-        analyze_str := 'off';
-    END IF;
-
-    -- avoid jit related output by disabling it
-    SET LOCAL jit = 0;
-
-    FOR ln IN
-        EXECUTE format('explain (analyze %s, costs on, summary off, timing off, buffers off) %s',
-            analyze_str, query)
-    LOOP
-        IF hide_costs = true THEN
-            ln := regexp_replace(ln, 'cost=\d+\.\d\d\.\.\d+\.\d\d', 'cost=N..N');
-        END IF;
-
-        IF hide_row_est = true THEN
-            -- don't use 'g' so that we leave the actual rows intact
-            ln := regexp_replace(ln, 'rows=\d+', 'rows=N');
-        END IF;
-
-        IF hide_width = true THEN
-            ln := regexp_replace(ln, 'width=\d+', 'width=N');
-        END IF;
-
-        RETURN NEXT ln;
-    END LOOP;
-END;
-$$;
-
 --
 -- num_nulls()
 --
@@ -277,89 +236,6 @@ SELECT * FROM tenk1 a JOIN my_gen_series(1,1000) g ON a.unique1 = g;
 EXPLAIN (COSTS OFF)
 SELECT * FROM tenk1 a JOIN my_gen_series(1,10) g ON a.unique1 = g;
 
---
--- Test the SupportRequestRows support function for generate_series_timestamp()
---
-
--- Ensure the row estimate matches the actual rows
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '1 day') g(s);$$,
-true, true, false, true);
-
--- As above but with generate_series_timestamp
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(TIMESTAMP '2024-02-01', TIMESTAMP '2024-03-01', INTERVAL '1 day') g(s);$$,
-true, true, false, true);
-
--- As above but with generate_series_timestamptz_at_zone()
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '1 day', 'UTC') g(s);$$,
-true, true, false, true);
-
--- Ensure the estimated and actual row counts match when the range isn't
--- evenly divisible by the step
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '7 day') g(s);$$,
-true, true, false, true);
-
--- Ensure the estimates match when step is decreasing
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(TIMESTAMPTZ '2024-03-01', TIMESTAMPTZ '2024-02-01', INTERVAL '-1 day') g(s);$$,
-true, true, false, true);
-
--- Ensure an empty range estimates 1 row
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(TIMESTAMPTZ '2024-03-01', TIMESTAMPTZ '2024-02-01', INTERVAL '1 day') g(s);$$,
-true, true, false, true);
-
--- Ensure we get the default row estimate for infinity values
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(TIMESTAMPTZ '-infinity', TIMESTAMPTZ 'infinity', INTERVAL '1 day') g(s);$$,
-false, true, false, true);
-
--- Ensure the row estimate behaves correctly when step size is zero.
--- We expect generate_series_timestamp() to throw the error rather than in
--- the support function.
-SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '0 day') g(s);
-
---
--- Test the SupportRequestRows support function for generate_series_numeric()
---
-
--- Ensure the row estimate matches the actual rows
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(1.0, 25.0) g(s);$$,
-true, true, false, true);
-
--- As above but with non-default step
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(1.0, 25.0, 2.0) g(s);$$,
-true, true, false, true);
-
--- Ensure the estimates match when step is decreasing
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(25.0, 1.0, -1.0) g(s);$$,
-true, true, false, true);
-
--- Ensure an empty range estimates 1 row
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(25.0, 1.0, 1.0) g(s);$$,
-true, true, false, true);
-
--- Ensure we get the default row estimate for error cases (infinity/NaN values
--- and zero step size)
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series('-infinity'::NUMERIC, 'infinity'::NUMERIC, 1.0) g(s);$$,
-false, true, false, true);
-
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(1.0, 25.0, 'NaN'::NUMERIC) g(s);$$,
-false, true, false, true);
-
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(25.0, 2.0, 0.0) g(s);$$,
-false, true, false, true);
-
 --
 -- Test SupportRequestInlineInFrom request
 --
@@ -443,7 +319,6 @@ SELECT pg_column_toast_chunk_id(a) IS NULL,
   pg_column_toast_chunk_id(b) IN (SELECT chunk_id FROM pg_toast.:toastrel)
   FROM test_chunk_id;
 DROP TABLE test_chunk_id;
-DROP FUNCTION explain_mask_costs(text, bool, bool, bool, bool);
 
 -- test stratnum translation support functions
 SELECT gist_translate_cmptype_common(7);
diff --git a/src/test/regress/sql/selectivity_est.sql b/src/test/regress/sql/selectivity_est.sql
new file mode 100644
index 00000000000..416d5ea1f75
--- /dev/null
+++ b/src/test/regress/sql/selectivity_est.sql
@@ -0,0 +1,126 @@
+-- Function to assist with verifying EXPLAIN which includes costs.  A series
+-- of bool flags allows control over which portions are masked out
+CREATE FUNCTION explain_mask_costs(query text, do_analyze bool,
+    hide_costs bool, hide_row_est bool, hide_width bool) RETURNS setof text
+LANGUAGE plpgsql AS
+$$
+DECLARE
+    ln text;
+    analyze_str text;
+BEGIN
+    IF do_analyze = true THEN
+        analyze_str := 'on';
+    ELSE
+        analyze_str := 'off';
+    END IF;
+
+    -- avoid jit related output by disabling it
+    SET LOCAL jit = 0;
+
+    FOR ln IN
+        EXECUTE format('explain (analyze %s, costs on, summary off, timing off, buffers off) %s',
+            analyze_str, query)
+    LOOP
+        IF hide_costs = true THEN
+            ln := regexp_replace(ln, 'cost=\d+\.\d\d\.\.\d+\.\d\d', 'cost=N..N');
+        END IF;
+
+        IF hide_row_est = true THEN
+            -- don't use 'g' so that we leave the actual rows intact
+            ln := regexp_replace(ln, 'rows=\d+', 'rows=N');
+        END IF;
+
+        IF hide_width = true THEN
+            ln := regexp_replace(ln, 'width=\d+', 'width=N');
+        END IF;
+
+        RETURN NEXT ln;
+    END LOOP;
+END;
+$$;
+
+--
+-- Test the SupportRequestRows support function for generate_series_timestamp()
+--
+
+-- Ensure the row estimate matches the actual rows
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '1 day') g(s);$$,
+true, true, false, true);
+
+-- As above but with generate_series_timestamp
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(TIMESTAMP '2024-02-01', TIMESTAMP '2024-03-01', INTERVAL '1 day') g(s);$$,
+true, true, false, true);
+
+-- As above but with generate_series_timestamptz_at_zone()
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '1 day', 'UTC') g(s);$$,
+true, true, false, true);
+
+-- Ensure the estimated and actual row counts match when the range isn't
+-- evenly divisible by the step
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '7 day') g(s);$$,
+true, true, false, true);
+
+-- Ensure the estimates match when step is decreasing
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(TIMESTAMPTZ '2024-03-01', TIMESTAMPTZ '2024-02-01', INTERVAL '-1 day') g(s);$$,
+true, true, false, true);
+
+-- Ensure an empty range estimates 1 row
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(TIMESTAMPTZ '2024-03-01', TIMESTAMPTZ '2024-02-01', INTERVAL '1 day') g(s);$$,
+true, true, false, true);
+
+-- Ensure we get the default row estimate for infinity values
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(TIMESTAMPTZ '-infinity', TIMESTAMPTZ 'infinity', INTERVAL '1 day') g(s);$$,
+false, true, false, true);
+
+-- Ensure the row estimate behaves correctly when step size is zero.
+-- We expect generate_series_timestamp() to throw the error rather than in
+-- the support function.
+SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '0 day') g(s);
+
+--
+-- Test the SupportRequestRows support function for generate_series_numeric()
+--
+
+-- Ensure the row estimate matches the actual rows
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(1.0, 25.0) g(s);$$,
+true, true, false, true);
+
+-- As above but with non-default step
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(1.0, 25.0, 2.0) g(s);$$,
+true, true, false, true);
+
+-- Ensure the estimates match when step is decreasing
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(25.0, 1.0, -1.0) g(s);$$,
+true, true, false, true);
+
+-- Ensure an empty range estimates 1 row
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(25.0, 1.0, 1.0) g(s);$$,
+true, true, false, true);
+
+-- Ensure we get the default row estimate for error cases (infinity/NaN values
+-- and zero step size)
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series('-infinity'::NUMERIC, 'infinity'::NUMERIC, 1.0) g(s);$$,
+false, true, false, true);
+
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(1.0, 25.0, 'NaN'::NUMERIC) g(s);$$,
+false, true, false, true);
+
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(25.0, 2.0, 0.0) g(s);$$,
+false, true, false, true);
+
+
+DROP FUNCTION explain_mask_costs(text, bool, bool, bool, bool);
-- 
2.34.1



^ permalink  raw  reply  [nested|flat] 9+ messages in thread

* Re: Reduce planning time for large NOT IN lists containing NULL
  2026-02-24 08:15 Re: Reduce planning time for large NOT IN lists containing NULL Ilia Evdokimov <[email protected]>
  2026-02-24 08:29 ` Re: Reduce planning time for large NOT IN lists containing NULL David Geier <[email protected]>
  2026-02-24 21:14   ` Re: Reduce planning time for large NOT IN lists containing NULL Ilia Evdokimov <[email protected]>
  2026-02-25 13:43     ` Re: Reduce planning time for large NOT IN lists containing NULL Zsolt Parragi <[email protected]>
  2026-03-03 01:08       ` Re: Reduce planning time for large NOT IN lists containing NULL David Rowley <[email protected]>
  2026-03-17 10:50         ` Re: Reduce planning time for large NOT IN lists containing NULL Ilia Evdokimov <[email protected]>
@ 2026-03-18 04:32           ` David Rowley <[email protected]>
  2026-03-18 14:26             ` Re: Reduce planning time for large NOT IN lists containing NULL Ilia Evdokimov <[email protected]>
  0 siblings, 1 reply; 9+ messages in thread

From: David Rowley @ 2026-03-18 04:32 UTC (permalink / raw)
  To: Ilia Evdokimov <[email protected]>; +Cc: Zsolt Parragi <[email protected]>; David Geier <[email protected]>; PostgreSQL Developers <[email protected]>

On Tue, 17 Mar 2026 at 23:51, Ilia Evdokimov
<[email protected]> wrote:
> Rebased the v7-patches to fix git apply failure.

Thanks. I've pushed 0001.

I ended up renaming the new file to planner_est.sql as the function
handles width estimate masking too, so I thought just calling it
selectivity_est was a bit too restrictive. I went with planner_est.
That means 0002 needed rebased. I've done that in the attached. Will
look more closely at that one later.

David

From ae698d22b40a665b4048f5db61b3f791be0d246e Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov <[email protected]>
Date: Tue, 17 Mar 2026 13:40:57 +0300
Subject: [PATCH v8] Reduce planning time for large NOT IN lists containing
 NULL

For x <> ALL (...), the presence of a NULL makes the selectivity 0.0.

The planner currently still iterates over all elements and computes
per-element selectivity, even though the final result is known.

Add an early NULL check for constant arrays and immediately return
0.0 under ALL semantics.

This reduces planning time for large <> ALL lists without
changing semantics.
---
 src/backend/utils/adt/selfuncs.c          | 17 +++++++++++++
 src/test/regress/expected/planner_est.out | 30 +++++++++++++++++++++++
 src/test/regress/sql/planner_est.sql      | 22 +++++++++++++++++
 3 files changed, 69 insertions(+)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index d4da0e8dea9..073d93f4f3d 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -2018,6 +2018,15 @@ scalararraysel(PlannerInfo *root,
 		if (arrayisnull)		/* qual can't succeed if null array */
 			return (Selectivity) 0.0;
 		arrayval = DatumGetArrayTypeP(arraydatum);
+
+		/*
+		 * For ALL semantics, if the array contains NULL, assume operator is
+		 * strict. The ScalarArrayOpExpr cannot evaluate to TRUE, so return
+		 * zero.
+		 */
+		if (!useOr && array_contains_nulls(arrayval))
+			return (Selectivity) 0.0;
+
 		get_typlenbyvalalign(ARR_ELEMTYPE(arrayval),
 							 &elmlen, &elmbyval, &elmalign);
 		deconstruct_array(arrayval,
@@ -2115,6 +2124,14 @@ scalararraysel(PlannerInfo *root,
 			List	   *args;
 			Selectivity s2;
 
+			/*
+			 * For ALL semantics, if the array contains NULL, assume operator
+			 * is strict. The ScalarArrayOpExpr cannot evaluate to TRUE, so
+			 * return zero.
+			 */
+			if (!useOr && IsA(elem, Const) && ((Const *) elem)->constisnull)
+				return (Selectivity) 0.0;
+
 			/*
 			 * Theoretically, if elem isn't of nominal_element_type we should
 			 * insert a RelabelType, but it seems unlikely that any operator
diff --git a/src/test/regress/expected/planner_est.out b/src/test/regress/expected/planner_est.out
index 3a47061800a..7718197283b 100644
--- a/src/test/regress/expected/planner_est.out
+++ b/src/test/regress/expected/planner_est.out
@@ -183,4 +183,34 @@ false, true, false, true);
  Function Scan on generate_series g  (cost=N..N rows=1000 width=N)
 (1 row)
 
+--
+-- Test <> ALL when array initially contained NULL but no longer does
+--
+CREATE FUNCTION replace_elem(arr int[], idx int, val int)
+RETURNS int[] AS $$
+BEGIN
+        arr[idx] := val;
+        RETURN arr;
+end;
+$$ language plpgsql IMMUTABLE;
+SELECT explain_mask_costs(
+    'SELECT * FROM tenk1 WHERE unique1 <> ALL(ARRAY[1,99,3])',
+false, true, false, true );
+                 explain_mask_costs                 
+----------------------------------------------------
+ Seq Scan on tenk1  (cost=N..N rows=9997 width=N)
+   Filter: (unique1 <> ALL ('{1,99,3}'::integer[]))
+(2 rows)
+
+-- same array, constructed from an array with a NULL
+SELECT explain_mask_costs(
+    'SELECT * FROM tenk1 WHERE unique1 <> ALL(replace_elem(ARRAY[1,NULL,3], 2, 99))',
+false, true, false, true );
+                 explain_mask_costs                 
+----------------------------------------------------
+ Seq Scan on tenk1  (cost=N..N rows=9997 width=N)
+   Filter: (unique1 <> ALL ('{1,99,3}'::integer[]))
+(2 rows)
+
+DROP FUNCTION replace_elem;
 DROP FUNCTION explain_mask_costs(text, bool, bool, bool, bool);
diff --git a/src/test/regress/sql/planner_est.sql b/src/test/regress/sql/planner_est.sql
index 47d5ae679c7..391ec42a6e7 100644
--- a/src/test/regress/sql/planner_est.sql
+++ b/src/test/regress/sql/planner_est.sql
@@ -131,5 +131,27 @@ SELECT explain_mask_costs($$
 SELECT * FROM generate_series(25.0, 2.0, 0.0) g(s);$$,
 false, true, false, true);
 
+--
+-- Test <> ALL when array initially contained NULL but no longer does
+--
+CREATE FUNCTION replace_elem(arr int[], idx int, val int)
+RETURNS int[] AS $$
+BEGIN
+        arr[idx] := val;
+        RETURN arr;
+end;
+$$ language plpgsql IMMUTABLE;
+
+SELECT explain_mask_costs(
+    'SELECT * FROM tenk1 WHERE unique1 <> ALL(ARRAY[1,99,3])',
+false, true, false, true );
+
+-- same array, constructed from an array with a NULL
+SELECT explain_mask_costs(
+    'SELECT * FROM tenk1 WHERE unique1 <> ALL(replace_elem(ARRAY[1,NULL,3], 2, 99))',
+false, true, false, true );
+
+DROP FUNCTION replace_elem;
+
 
 DROP FUNCTION explain_mask_costs(text, bool, bool, bool, bool);
-- 
2.51.0



Attachments:

  [text/plain] v8-0001-Reduce-planning-time-for-large-NOT-IN-lists-conta.patch (4.5K, 2-v8-0001-Reduce-planning-time-for-large-NOT-IN-lists-conta.patch)
  download | inline diff:
From ae698d22b40a665b4048f5db61b3f791be0d246e Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov <[email protected]>
Date: Tue, 17 Mar 2026 13:40:57 +0300
Subject: [PATCH v8] Reduce planning time for large NOT IN lists containing
 NULL

For x <> ALL (...), the presence of a NULL makes the selectivity 0.0.

The planner currently still iterates over all elements and computes
per-element selectivity, even though the final result is known.

Add an early NULL check for constant arrays and immediately return
0.0 under ALL semantics.

This reduces planning time for large <> ALL lists without
changing semantics.
---
 src/backend/utils/adt/selfuncs.c          | 17 +++++++++++++
 src/test/regress/expected/planner_est.out | 30 +++++++++++++++++++++++
 src/test/regress/sql/planner_est.sql      | 22 +++++++++++++++++
 3 files changed, 69 insertions(+)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index d4da0e8dea9..073d93f4f3d 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -2018,6 +2018,15 @@ scalararraysel(PlannerInfo *root,
 		if (arrayisnull)		/* qual can't succeed if null array */
 			return (Selectivity) 0.0;
 		arrayval = DatumGetArrayTypeP(arraydatum);
+
+		/*
+		 * For ALL semantics, if the array contains NULL, assume operator is
+		 * strict. The ScalarArrayOpExpr cannot evaluate to TRUE, so return
+		 * zero.
+		 */
+		if (!useOr && array_contains_nulls(arrayval))
+			return (Selectivity) 0.0;
+
 		get_typlenbyvalalign(ARR_ELEMTYPE(arrayval),
 							 &elmlen, &elmbyval, &elmalign);
 		deconstruct_array(arrayval,
@@ -2115,6 +2124,14 @@ scalararraysel(PlannerInfo *root,
 			List	   *args;
 			Selectivity s2;
 
+			/*
+			 * For ALL semantics, if the array contains NULL, assume operator
+			 * is strict. The ScalarArrayOpExpr cannot evaluate to TRUE, so
+			 * return zero.
+			 */
+			if (!useOr && IsA(elem, Const) && ((Const *) elem)->constisnull)
+				return (Selectivity) 0.0;
+
 			/*
 			 * Theoretically, if elem isn't of nominal_element_type we should
 			 * insert a RelabelType, but it seems unlikely that any operator
diff --git a/src/test/regress/expected/planner_est.out b/src/test/regress/expected/planner_est.out
index 3a47061800a..7718197283b 100644
--- a/src/test/regress/expected/planner_est.out
+++ b/src/test/regress/expected/planner_est.out
@@ -183,4 +183,34 @@ false, true, false, true);
  Function Scan on generate_series g  (cost=N..N rows=1000 width=N)
 (1 row)
 
+--
+-- Test <> ALL when array initially contained NULL but no longer does
+--
+CREATE FUNCTION replace_elem(arr int[], idx int, val int)
+RETURNS int[] AS $$
+BEGIN
+        arr[idx] := val;
+        RETURN arr;
+end;
+$$ language plpgsql IMMUTABLE;
+SELECT explain_mask_costs(
+    'SELECT * FROM tenk1 WHERE unique1 <> ALL(ARRAY[1,99,3])',
+false, true, false, true );
+                 explain_mask_costs                 
+----------------------------------------------------
+ Seq Scan on tenk1  (cost=N..N rows=9997 width=N)
+   Filter: (unique1 <> ALL ('{1,99,3}'::integer[]))
+(2 rows)
+
+-- same array, constructed from an array with a NULL
+SELECT explain_mask_costs(
+    'SELECT * FROM tenk1 WHERE unique1 <> ALL(replace_elem(ARRAY[1,NULL,3], 2, 99))',
+false, true, false, true );
+                 explain_mask_costs                 
+----------------------------------------------------
+ Seq Scan on tenk1  (cost=N..N rows=9997 width=N)
+   Filter: (unique1 <> ALL ('{1,99,3}'::integer[]))
+(2 rows)
+
+DROP FUNCTION replace_elem;
 DROP FUNCTION explain_mask_costs(text, bool, bool, bool, bool);
diff --git a/src/test/regress/sql/planner_est.sql b/src/test/regress/sql/planner_est.sql
index 47d5ae679c7..391ec42a6e7 100644
--- a/src/test/regress/sql/planner_est.sql
+++ b/src/test/regress/sql/planner_est.sql
@@ -131,5 +131,27 @@ SELECT explain_mask_costs($$
 SELECT * FROM generate_series(25.0, 2.0, 0.0) g(s);$$,
 false, true, false, true);
 
+--
+-- Test <> ALL when array initially contained NULL but no longer does
+--
+CREATE FUNCTION replace_elem(arr int[], idx int, val int)
+RETURNS int[] AS $$
+BEGIN
+        arr[idx] := val;
+        RETURN arr;
+end;
+$$ language plpgsql IMMUTABLE;
+
+SELECT explain_mask_costs(
+    'SELECT * FROM tenk1 WHERE unique1 <> ALL(ARRAY[1,99,3])',
+false, true, false, true );
+
+-- same array, constructed from an array with a NULL
+SELECT explain_mask_costs(
+    'SELECT * FROM tenk1 WHERE unique1 <> ALL(replace_elem(ARRAY[1,NULL,3], 2, 99))',
+false, true, false, true );
+
+DROP FUNCTION replace_elem;
+
 
 DROP FUNCTION explain_mask_costs(text, bool, bool, bool, bool);
-- 
2.51.0



^ permalink  raw  reply  [nested|flat] 9+ messages in thread

* Re: Reduce planning time for large NOT IN lists containing NULL
  2026-02-24 08:15 Re: Reduce planning time for large NOT IN lists containing NULL Ilia Evdokimov <[email protected]>
  2026-02-24 08:29 ` Re: Reduce planning time for large NOT IN lists containing NULL David Geier <[email protected]>
  2026-02-24 21:14   ` Re: Reduce planning time for large NOT IN lists containing NULL Ilia Evdokimov <[email protected]>
  2026-02-25 13:43     ` Re: Reduce planning time for large NOT IN lists containing NULL Zsolt Parragi <[email protected]>
  2026-03-03 01:08       ` Re: Reduce planning time for large NOT IN lists containing NULL David Rowley <[email protected]>
  2026-03-17 10:50         ` Re: Reduce planning time for large NOT IN lists containing NULL Ilia Evdokimov <[email protected]>
  2026-03-18 04:32           ` Re: Reduce planning time for large NOT IN lists containing NULL David Rowley <[email protected]>
@ 2026-03-18 14:26             ` Ilia Evdokimov <[email protected]>
  2026-03-19 04:28               ` Re: Reduce planning time for large NOT IN lists containing NULL David Rowley <[email protected]>
  0 siblings, 1 reply; 9+ messages in thread

From: Ilia Evdokimov @ 2026-03-18 14:26 UTC (permalink / raw)
  To: David Rowley <[email protected]>; +Cc: Zsolt Parragi <[email protected]>; David Geier <[email protected]>; PostgreSQL Developers <[email protected]>

On 3/18/26 07:32, David Rowley wrote:

> Thanks. I've pushed 0001.
>
> I ended up renaming the new file to planner_est.sql as the function
> handles width estimate masking too, so I thought just calling it
> selectivity_est was a bit too restrictive. I went with planner_est.

+1. Thank you.

> That means 0002 needed rebased. I've done that in the attached.

After the new test was committed, I realized that v8 tests relies on 
selectivity calculation, which are not guaranteed to remain stable over 
time and way vary depending on planner heuristics or platform 
differences. Therefore, it seems better to remove tests from v8.

Instead, we can test the invariant behavior: when NULL is present in a 
<> ALL clause, the selectivity is always 0.0.

The v9-patch adds three test cases: a degenerate case with only NULL, 
NULL combined with constants, NULL combined with both constants and 
non-constant expression.

Thoughts?


-- 
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/


Attachments:

  [text/x-patch] v9-0001-Reduce-planning-time-for-large-NOT-IN-lists-conta.patch (4.9K, 3-v9-0001-Reduce-planning-time-for-large-NOT-IN-lists-conta.patch)
  download | inline diff:
From c0ba98141c183090783653d882d206acb7c38a29 Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov <[email protected]>
Date: Wed, 18 Mar 2026 17:23:50 +0300
Subject: [PATCH v9] Reduce planning time for large NOT IN lists containing
 NULL

For x <> ALL (...), the presence of a NULL makes the selectivity 0.0.

The planner currently still iterates over all elements and computes
per-element selectivity, even though the final result is known.

Add an early NULL check for constant arrays and immediately return
0.0 under ALL semantics.

This reduces planning time for large <> ALL lists without
changing semantics.
---
 src/backend/utils/adt/selfuncs.c          | 17 +++++++++++
 src/test/regress/expected/planner_est.out | 35 +++++++++++++++++++++++
 src/test/regress/sql/planner_est.sql      | 16 +++++++++++
 3 files changed, 68 insertions(+)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index d4da0e8dea9..073d93f4f3d 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -2018,6 +2018,15 @@ scalararraysel(PlannerInfo *root,
 		if (arrayisnull)		/* qual can't succeed if null array */
 			return (Selectivity) 0.0;
 		arrayval = DatumGetArrayTypeP(arraydatum);
+
+		/*
+		 * For ALL semantics, if the array contains NULL, assume operator is
+		 * strict. The ScalarArrayOpExpr cannot evaluate to TRUE, so return
+		 * zero.
+		 */
+		if (!useOr && array_contains_nulls(arrayval))
+			return (Selectivity) 0.0;
+
 		get_typlenbyvalalign(ARR_ELEMTYPE(arrayval),
 							 &elmlen, &elmbyval, &elmalign);
 		deconstruct_array(arrayval,
@@ -2115,6 +2124,14 @@ scalararraysel(PlannerInfo *root,
 			List	   *args;
 			Selectivity s2;
 
+			/*
+			 * For ALL semantics, if the array contains NULL, assume operator
+			 * is strict. The ScalarArrayOpExpr cannot evaluate to TRUE, so
+			 * return zero.
+			 */
+			if (!useOr && IsA(elem, Const) && ((Const *) elem)->constisnull)
+				return (Selectivity) 0.0;
+
 			/*
 			 * Theoretically, if elem isn't of nominal_element_type we should
 			 * insert a RelabelType, but it seems unlikely that any operator
diff --git a/src/test/regress/expected/planner_est.out b/src/test/regress/expected/planner_est.out
index 3a47061800a..624d5858687 100644
--- a/src/test/regress/expected/planner_est.out
+++ b/src/test/regress/expected/planner_est.out
@@ -183,4 +183,39 @@ false, true, false, true);
  Function Scan on generate_series g  (cost=N..N rows=1000 width=N)
 (1 row)
 
+--
+-- Test <> ALL behavior when NULL appears in the array
+--
+SELECT explain_mask_costs($$
+SELECT * FROM tenk1 WHERE unique1 <> ALL (ARRAY[NULL]::integer[]);$$,
+true, true, false, true);
+                            explain_mask_costs                            
+--------------------------------------------------------------------------
+ Seq Scan on tenk1  (cost=N..N rows=1 width=N) (actual rows=0.00 loops=1)
+   Filter: (unique1 <> ALL ('{NULL}'::integer[]))
+   Rows Removed by Filter: 10000
+(3 rows)
+
+SELECT explain_mask_costs($$
+SELECT * FROM tenk1 WHERE unique1 <> ALL (ARRAY[1, 2, 99, NULL]);$$,
+true, true, false, true);
+                            explain_mask_costs                            
+--------------------------------------------------------------------------
+ Seq Scan on tenk1  (cost=N..N rows=1 width=N) (actual rows=0.00 loops=1)
+   Filter: (unique1 <> ALL ('{1,2,99,NULL}'::integer[]))
+   Rows Removed by Filter: 10000
+(3 rows)
+
+SELECT explain_mask_costs($$
+SELECT * FROM tenk1 WHERE unique1 <> ALL (ARRAY[1, 2, 98, (SELECT 99), NULL]);$$,
+true, true, false, true);
+                                 explain_mask_costs                                  
+-------------------------------------------------------------------------------------
+ Seq Scan on tenk1  (cost=N..N rows=1 width=N) (actual rows=0.00 loops=1)
+   Filter: (unique1 <> ALL (ARRAY[1, 2, 98, (InitPlan expr_1).col1, NULL::integer]))
+   Rows Removed by Filter: 10000
+   InitPlan expr_1
+     ->  Result  (cost=N..N rows=1 width=N) (actual rows=1.00 loops=1)
+(5 rows)
+
 DROP FUNCTION explain_mask_costs(text, bool, bool, bool, bool);
diff --git a/src/test/regress/sql/planner_est.sql b/src/test/regress/sql/planner_est.sql
index 47d5ae679c7..61c9f640761 100644
--- a/src/test/regress/sql/planner_est.sql
+++ b/src/test/regress/sql/planner_est.sql
@@ -131,5 +131,21 @@ SELECT explain_mask_costs($$
 SELECT * FROM generate_series(25.0, 2.0, 0.0) g(s);$$,
 false, true, false, true);
 
+--
+-- Test <> ALL behavior when NULL appears in the array
+--
+
+SELECT explain_mask_costs($$
+SELECT * FROM tenk1 WHERE unique1 <> ALL (ARRAY[NULL]::integer[]);$$,
+true, true, false, true);
+
+SELECT explain_mask_costs($$
+SELECT * FROM tenk1 WHERE unique1 <> ALL (ARRAY[1, 2, 99, NULL]);$$,
+true, true, false, true);
+
+SELECT explain_mask_costs($$
+SELECT * FROM tenk1 WHERE unique1 <> ALL (ARRAY[1, 2, 98, (SELECT 99), NULL]);$$,
+true, true, false, true);
+
 
 DROP FUNCTION explain_mask_costs(text, bool, bool, bool, bool);
-- 
2.34.1



^ permalink  raw  reply  [nested|flat] 9+ messages in thread

* Re: Reduce planning time for large NOT IN lists containing NULL
  2026-02-24 08:15 Re: Reduce planning time for large NOT IN lists containing NULL Ilia Evdokimov <[email protected]>
  2026-02-24 08:29 ` Re: Reduce planning time for large NOT IN lists containing NULL David Geier <[email protected]>
  2026-02-24 21:14   ` Re: Reduce planning time for large NOT IN lists containing NULL Ilia Evdokimov <[email protected]>
  2026-02-25 13:43     ` Re: Reduce planning time for large NOT IN lists containing NULL Zsolt Parragi <[email protected]>
  2026-03-03 01:08       ` Re: Reduce planning time for large NOT IN lists containing NULL David Rowley <[email protected]>
  2026-03-17 10:50         ` Re: Reduce planning time for large NOT IN lists containing NULL Ilia Evdokimov <[email protected]>
  2026-03-18 04:32           ` Re: Reduce planning time for large NOT IN lists containing NULL David Rowley <[email protected]>
  2026-03-18 14:26             ` Re: Reduce planning time for large NOT IN lists containing NULL Ilia Evdokimov <[email protected]>
@ 2026-03-19 04:28               ` David Rowley <[email protected]>
  0 siblings, 0 replies; 9+ messages in thread

From: David Rowley @ 2026-03-19 04:28 UTC (permalink / raw)
  To: Ilia Evdokimov <[email protected]>; +Cc: Zsolt Parragi <[email protected]>; David Geier <[email protected]>; PostgreSQL Developers <[email protected]>

On Thu, 19 Mar 2026 at 03:26, Ilia Evdokimov
<[email protected]> wrote:
> After the new test was committed, I realized that v8 tests relies on selectivity calculation, which are not guaranteed to remain stable over time and way vary depending on planner heuristics or platform differences. Therefore, it seems better to remove tests from v8.
>
> Instead, we can test the invariant behavior: when NULL is present in a <> ALL clause, the selectivity is always 0.0.
>
> The v9-patch adds three test cases: a degenerate case with only NULL, NULL combined with constants, NULL combined with both constants and non-constant expression.
>
> Thoughts?

I've now pushed the main patch.

I did end up removing the first test of the v9 tests since it was
exercising the same code path as the 2nd test. I also didn't see the
need to execute the query, so I changed it to run EXPLAIN without
ANALYZE.

For the main patch, I only adjusted the comments a little. I wanted to
make reference to var_eq_const() as the short-circuit really is trying
to follow what that function would have done if the short-circuit path
hadn't been taken.

I also added comments in the tests to explain what we are expecting to
see. I am probably in a minority of people who do this. I find it's
important to understand that when it comes to updating the expected
results of existing tests. I expect we have plenty of tests that no
longer test what the original test was meant to test as a result of
people not doing this.

David





^ permalink  raw  reply  [nested|flat] 9+ messages in thread

end of thread, other threads:[~2026-03-19 04:28 UTC | newest]

Thread overview: 9+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-02-24 08:15 Re: Reduce planning time for large NOT IN lists containing NULL Ilia Evdokimov <[email protected]>
2026-02-24 08:29 ` David Geier <[email protected]>
2026-02-24 21:14   ` Ilia Evdokimov <[email protected]>
2026-02-25 13:43     ` Zsolt Parragi <[email protected]>
2026-03-03 01:08       ` David Rowley <[email protected]>
2026-03-17 10:50         ` Ilia Evdokimov <[email protected]>
2026-03-18 04:32           ` David Rowley <[email protected]>
2026-03-18 14:26             ` Ilia Evdokimov <[email protected]>
2026-03-19 04:28               ` David Rowley <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox