Re: Reduce planning time for large NOT IN lists containing NULL

public inbox for [email protected]  
help / color / mirror / Atom feed

Re: Reduce planning time for large NOT IN lists containing NULL
5+ messages / 2 participants
[nested] [flat]

* Re: Reduce planning time for large NOT IN lists containing NULL
@ 2026-02-19 15:38 David Geier <[email protected]>
  2026-02-20 09:06 ` Re: Reduce planning time for large NOT IN lists containing NULL Ilia Evdokimov <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: David Geier @ 2026-02-19 15:38 UTC (permalink / raw)
  To: Ilia Evdokimov <[email protected]>; PostgreSQL Developers <[email protected]>

Hi Ilia!

On 18.02.2026 15:11, Ilia Evdokimov wrote:
> Hi hackers,
> 
> In this thread [0] an interesting idea came up about avoiding
> unnecessary work during selectivity estimation for x <> ALL (NULL, ...)
> or x NOT IN (NULL, ...)
> 
> Semantically, if the array contains at least one NULL, the selectivity
> of x NOT IN (...) is always 0.0, regardless of the other elements in the
> list.
> 
> Currently, the planner still iterates over all array elements and
> invokes the operator's selectivity estimator for each of them. For large
> IN / ALL lists, this increases planning time.

+1 on the general idea.

> For constant arrays I propose adding a simple pre-check before entering
> the per-element loop: detect whether the array contains at least one
> NULL element (e.g., via memchr() for the deconstructed array case). If
> so, and we are in the ALL / NOT IN case, we can immediately return
> selectivity = 0.0 and skip all further computation. This would avoid
> extra per-element estimation work while preserving semantics.

How much overhead does the memchr() call add? It seems like this
approach optimizes the edge case at the expense of the common case,
which doesn't seem like a good trade-off.

How about instead adding a flag to ArrayType which indicates if the
array contains NULL or not. This flag could be set in
construct_md_array() which already iterates over all elements. The flag
would need to be kept up-to-date, e.g. in array_set_element() and
possibly other functions modifying the array.

> In cases where array elements are not known to be constants in advance,
> such a pre-check is less straightforward, because each element must
> first be inspected to determine whether it is a Const and whether it is
> NULL. That already requires iterating through the list, so introducing a
> separate early pass would not actually reduce the amount of work.
> Therefore, it like makes sense to keep the current behavior in that
> situation.

Agreed.

--
David Geier






^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Reduce planning time for large NOT IN lists containing NULL
  2026-02-19 15:38 Re: Reduce planning time for large NOT IN lists containing NULL David Geier <[email protected]>
@ 2026-02-20 09:06 ` Ilia Evdokimov <[email protected]>
  2026-02-20 13:25   ` Re: Reduce planning time for large NOT IN lists containing NULL Ilia Evdokimov <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: Ilia Evdokimov @ 2026-02-20 09:06 UTC (permalink / raw)
  To: David Geier <[email protected]>; PostgreSQL Developers <[email protected]>

Hi David,

Thanks for review

On 2/19/26 18:38, David Geier wrote:
> +1 on the general idea.
>
>> For constant arrays I propose adding a simple pre-check before entering
>> the per-element loop: detect whether the array contains at least one
>> NULL element (e.g., via memchr() for the deconstructed array case). If
>> so, and we are in the ALL / NOT IN case, we can immediately return
>> selectivity = 0.0 and skip all further computation. This would avoid
>> extra per-element estimation work while preserving semantics.
> How much overhead does the memchr() call add? It seems like this
> approach optimizes the edge case at the expense of the common case,
> which doesn't seem like a good trade-off.
>
> How about instead adding a flag to ArrayType which indicates if the
> array contains NULL or not. This flag could be set in
> construct_md_array() which already iterates over all elements. The flag
> would need to be kept up-to-date, e.g. in array_set_element() and
> possibly other functions modifying the array.

It seems we might reinventing the wheel.

There is already ARR_HASNULL() which allows us to detect the presence of 
NULL in ArrayType.

>> In cases where array elements are not known to be constants in advance,
>> such a pre-check is less straightforward, because each element must
>> first be inspected to determine whether it is a Const and whether it is
>> NULL. That already requires iterating through the list, so introducing a
>> separate early pass would not actually reduce the amount of work.
>> Therefore, it like makes sense to keep the current behavior in that
>> situation.
> Agreed.

After thinking about this more, is seems reasonable to short-circuit еру 
loop when we detect a NULL element by checking whether the element is a 
Const and NULL.


I've attached v2 patch.

-- 
Best regards.
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/


Attachments:

  [text/x-patch] v2-0001-Reduce-planning-time-for-large-NOT-IN-lists-conta.patch (1.7K, 3-v2-0001-Reduce-planning-time-for-large-NOT-IN-lists-conta.patch)
  download | inline diff:
From 76775013947d9345b4754e33d9b807845f86546f Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov <[email protected]>
Date: Fri, 20 Feb 2026 12:03:10 +0300
Subject: [PATCH v2] Reduce planning time for large NOT IN lists containing
 NULL

For x <> ALL (...) / x NOT IN (...), the presence of a NULL element
makes the selectivity 0.0.

The planner currently still iterates over all elements and computes
per-element selectivity, even though the final result is known.

Add an early NULL check for constant arrays and immediately return
0.0 under ALL semantics.

This reduces planning time for large NOT IN / <> ALL lists without
changing semantics.
---
 src/backend/utils/adt/selfuncs.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 29fec655593..9770656f125 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -2025,6 +2025,10 @@ scalararraysel(PlannerInfo *root,
 						  elmlen, elmbyval, elmalign,
 						  &elem_values, &elem_nulls, &num_elems);
 
+		/* Selectivity of "WHERE x NOT IN (NULL, ... )" is always 0 "*/
+		if (!useOr && ARR_HASNULL(arrayval))
+			return (Selectivity) 0.0;
+
 		/*
 		 * For generic operators, we assume the probability of success is
 		 * independent for each array element.  But for "= ANY" or "<> ALL",
@@ -2115,6 +2119,9 @@ scalararraysel(PlannerInfo *root,
 			List	   *args;
 			Selectivity s2;
 
+			/* Selectivity of "WHERE x NOT IN (NULL, ... )" is always 0 "*/
+			if (!useOr && IsA(elem, Const) && ((Const *) elem)->constisnull)
+				return (Selectivity) 0.0;
 			/*
 			 * Theoretically, if elem isn't of nominal_element_type we should
 			 * insert a RelabelType, but it seems unlikely that any operator
-- 
2.34.1



^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Reduce planning time for large NOT IN lists containing NULL
  2026-02-19 15:38 Re: Reduce planning time for large NOT IN lists containing NULL David Geier <[email protected]>
  2026-02-20 09:06 ` Re: Reduce planning time for large NOT IN lists containing NULL Ilia Evdokimov <[email protected]>
@ 2026-02-20 13:25   ` Ilia Evdokimov <[email protected]>
  2026-02-20 13:28     ` Re: Reduce planning time for large NOT IN lists containing NULL Ilia Evdokimov <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: Ilia Evdokimov @ 2026-02-20 13:25 UTC (permalink / raw)
  To: David Geier <[email protected]>; PostgreSQL Developers <[email protected]>

On 2/20/26 12:06, Ilia Evdokimov wrote:

> There is already ARR_HASNULL() which allows us to detect the presence 
> of NULL in ArrayType.
>
I've moved the NULL check higher, placing it immediately after 
DatumGetArrayTypeP(). This allows us to detect the presence of NULL 
elements before decontructing the array.


I tested this with several queries of the form:

WHERE x NOT IN (NULL, ...)

In my tests (with column x having detailed statistics, so selectivity 
estimation performs more work), planning time decreases from *5-200 ms 
before the patch* to *~ 1-2 ms after the patch*, depending on the list size.

-- 
Best regards.
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/


^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Reduce planning time for large NOT IN lists containing NULL
  2026-02-19 15:38 Re: Reduce planning time for large NOT IN lists containing NULL David Geier <[email protected]>
  2026-02-20 09:06 ` Re: Reduce planning time for large NOT IN lists containing NULL Ilia Evdokimov <[email protected]>
  2026-02-20 13:25   ` Re: Reduce planning time for large NOT IN lists containing NULL Ilia Evdokimov <[email protected]>
@ 2026-02-20 13:28     ` Ilia Evdokimov <[email protected]>
  2026-02-20 14:22       ` Re: Reduce planning time for large NOT IN lists containing NULL David Geier <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: Ilia Evdokimov @ 2026-02-20 13:28 UTC (permalink / raw)
  To: David Geier <[email protected]>; PostgreSQL Developers <[email protected]>

On 2/20/26 16:25, Ilia Evdokimov wrote:

> I've moved the NULL check higher, placing it immediately after 
> DatumGetArrayTypeP(). This allows us to detect the presence of NULL 
> elements before decontructing the array.
>
Sorry, I forgot to attach the patch in my previous mail. Attaching it now.

-- 
Best regards.
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/


Attachments:

  [text/x-patch] v3-0001-Reduce-planning-time-for-large-NOT-IN-lists-conta.patch (1.8K, 2-v3-0001-Reduce-planning-time-for-large-NOT-IN-lists-conta.patch)
  download | inline diff:
From 909e4683a400f0aa613fc77fa4bb6f0208c75f78 Mon Sep 17 00:00:00 2001
From: Ilia Evdokimov <[email protected]>
Date: Fri, 20 Feb 2026 16:14:12 +0300
Subject: [PATCH v3] Reduce planning time for large NOT IN lists containing
 NULL

For x <> ALL (...) / x NOT IN (...), the presence of a NULL element
makes the selectivity 0.0.

The planner currently still iterates over all elements and computes
per-element selectivity, even though the final result is known.

Add an early NULL check for constant arrays and immediately return
0.0 under ALL semantics.

This reduces planning time for large NOT IN / <> ALL lists without
changing semantics.
---
 src/backend/utils/adt/selfuncs.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 29fec655593..624356a30b2 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -2018,6 +2018,11 @@ scalararraysel(PlannerInfo *root,
 		if (arrayisnull)		/* qual can't succeed if null array */
 			return (Selectivity) 0.0;
 		arrayval = DatumGetArrayTypeP(arraydatum);
+
+		/* Selectivity of "WHERE x NOT IN (NULL, ... )" is always 0 */
+		if (!useOr && ARR_HASNULL(arrayval))
+			return (Selectivity) 0.0;
+
 		get_typlenbyvalalign(ARR_ELEMTYPE(arrayval),
 							 &elmlen, &elmbyval, &elmalign);
 		deconstruct_array(arrayval,
@@ -2115,6 +2120,10 @@ scalararraysel(PlannerInfo *root,
 			List	   *args;
 			Selectivity s2;
 
+			/* Selectivity of "WHERE x NOT IN (NULL, ... )" is always 0 */
+			if (!useOr && IsA(elem, Const) && ((Const *) elem)->constisnull)
+				return (Selectivity) 0.0;
+
 			/*
 			 * Theoretically, if elem isn't of nominal_element_type we should
 			 * insert a RelabelType, but it seems unlikely that any operator
-- 
2.34.1



^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Reduce planning time for large NOT IN lists containing NULL
  2026-02-19 15:38 Re: Reduce planning time for large NOT IN lists containing NULL David Geier <[email protected]>
  2026-02-20 09:06 ` Re: Reduce planning time for large NOT IN lists containing NULL Ilia Evdokimov <[email protected]>
  2026-02-20 13:25   ` Re: Reduce planning time for large NOT IN lists containing NULL Ilia Evdokimov <[email protected]>
  2026-02-20 13:28     ` Re: Reduce planning time for large NOT IN lists containing NULL Ilia Evdokimov <[email protected]>
@ 2026-02-20 14:22       ` David Geier <[email protected]>
  0 siblings, 0 replies; 5+ messages in thread

From: David Geier @ 2026-02-20 14:22 UTC (permalink / raw)
  To: Ilia Evdokimov <[email protected]>; PostgreSQL Developers <[email protected]>

On 20.02.2026 14:28, Ilia Evdokimov wrote:
> On 2/20/26 16:25, Ilia Evdokimov wrote:
> 
>> I've moved the NULL check higher, placing it immediately after
>> DatumGetArrayTypeP(). This allows us to detect the presence of NULL
>> elements before decontructing the array.
>>
> Sorry, I forgot to attach the patch in my previous mail. Attaching it now.

Cool that the array code had that functionality already.

The patch looks good to me and now no longer regresses other cases. The
speedup will be less pronounced once the hash-based [NOT] IN code is
merged [1] but will still save considerable amounts of cycles.

It seems like we don't have a regression test which has a NULL value in
the NOT IN list. Maybe you can find a good place to add that one?

--
David Geier

[1]
https://www.postgresql.org/message-id/flat/7db341e0-fbc6-4ec5-922c-11fdafe7be12%40tantorlabs.com






^ permalink  raw  reply  [nested|flat] 5+ messages in thread

end of thread, other threads:[~2026-02-20 14:22 UTC | newest]

Thread overview: 5+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-02-19 15:38 Re: Reduce planning time for large NOT IN lists containing NULL David Geier <[email protected]>
2026-02-20 09:06 ` Ilia Evdokimov <[email protected]>
2026-02-20 13:25   ` Ilia Evdokimov <[email protected]>
2026-02-20 13:28     ` Ilia Evdokimov <[email protected]>
2026-02-20 14:22       ` David Geier <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox