public inbox for [email protected]
help / color / mirror / Atom feedFrom: David Geier <[email protected]>
To: Ilia Evdokimov <[email protected]>
To: PostgreSQL Developers <[email protected]>
Subject: Re: Reduce planning time for large NOT IN lists containing NULL
Date: Thu, 19 Feb 2026 16:38:41 +0100
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
Hi Ilia!
On 18.02.2026 15:11, Ilia Evdokimov wrote:
> Hi hackers,
>
> In this thread [0] an interesting idea came up about avoiding
> unnecessary work during selectivity estimation for x <> ALL (NULL, ...)
> or x NOT IN (NULL, ...)
>
> Semantically, if the array contains at least one NULL, the selectivity
> of x NOT IN (...) is always 0.0, regardless of the other elements in the
> list.
>
> Currently, the planner still iterates over all array elements and
> invokes the operator's selectivity estimator for each of them. For large
> IN / ALL lists, this increases planning time.
+1 on the general idea.
> For constant arrays I propose adding a simple pre-check before entering
> the per-element loop: detect whether the array contains at least one
> NULL element (e.g., via memchr() for the deconstructed array case). If
> so, and we are in the ALL / NOT IN case, we can immediately return
> selectivity = 0.0 and skip all further computation. This would avoid
> extra per-element estimation work while preserving semantics.
How much overhead does the memchr() call add? It seems like this
approach optimizes the edge case at the expense of the common case,
which doesn't seem like a good trade-off.
How about instead adding a flag to ArrayType which indicates if the
array contains NULL or not. This flag could be set in
construct_md_array() which already iterates over all elements. The flag
would need to be kept up-to-date, e.g. in array_set_element() and
possibly other functions modifying the array.
> In cases where array elements are not known to be constants in advance,
> such a pre-check is less straightforward, because each element must
> first be inspected to determine whether it is a Const and whether it is
> NULL. That already requires iterating through the list, so introducing a
> separate early pass would not actually reduce the amount of work.
> Therefore, it like makes sense to keep the current behavior in that
> situation.
Agreed.
--
David Geier
view thread (5+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected]
Subject: Re: Reduce planning time for large NOT IN lists containing NULL
In-Reply-To: <[email protected]>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox