Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vtMTY-001YUw-0k for pgsql-hackers@arkaria.postgresql.org; Fri, 20 Feb 2026 09:07:05 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vtMTU-00774D-0X for pgsql-hackers@arkaria.postgresql.org; Fri, 20 Feb 2026 09:07:00 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vtMTT-007745-2D for pgsql-hackers@lists.postgresql.org; Fri, 20 Feb 2026 09:06:59 +0000 Received: from forward501a.mail.yandex.net ([2a02:6b8:c0e:500:1:45:d181:d501]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.98.2) (envelope-from ) id 1vtMTQ-00000000M8w-0npp for pgsql-hackers@lists.postgresql.org; Fri, 20 Feb 2026 09:06:59 +0000 Received: from mail-nwsmtp-smtp-production-main-55.vla.yp-c.yandex.net (mail-nwsmtp-smtp-production-main-55.vla.yp-c.yandex.net [IPv6:2a02:6b8:c1f:582e:0:640:200:0]) by forward501a.mail.yandex.net (Yandex) with ESMTPS id 080EA81268; Fri, 20 Feb 2026 12:06:56 +0300 (MSK) Received: by mail-nwsmtp-smtp-production-main-55.vla.yp-c.yandex.net (smtp/Yandex) with ESMTPSA id t6YC59mGtCg0-YfHAres9; Fri, 20 Feb 2026 12:06:55 +0300 X-Yandex-Fwd: 1 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tantorlabs.com; s=mail; t=1771578415; bh=aeSLO0ULChMsHR/+Z2pylAQN/gYGbo6MBMbJvBDUYYU=; h=In-Reply-To:References:To:Subject:Message-ID:Date:From; b=oqofv3/ON0ToErss7WjV1CvvyhpM2hfhcTrYh5YJDHLFhFk2Rkb5esXUIURjrPGdF J9G8XuEQut1qvq98OKQw2wsy8NZ+n9v3si9eEpQn+iuG60E3gNgSUNIw2vq7SmkexV OAAErBlw/wfawQ11ljHGPrxJU/NtL6nGCslkF5c0= Authentication-Results: mail-nwsmtp-smtp-production-main-55.vla.yp-c.yandex.net; dkim=pass header.i=@tantorlabs.com Content-Type: multipart/mixed; boundary="------------5zRGx0TmCB4vzKWybUsQvigs" Message-ID: <7791ec74-369d-48c7-969c-cf4ad18024f2@tantorlabs.com> Date: Fri, 20 Feb 2026 12:06:55 +0300 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: Reduce planning time for large NOT IN lists containing NULL To: David Geier , PostgreSQL Developers References: <52596e08-0902-475b-ad0f-265ec6918f19@gmail.com> Content-Language: en-US From: Ilia Evdokimov In-Reply-To: <52596e08-0902-475b-ad0f-265ec6918f19@gmail.com> List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk This is a multi-part message in MIME format. --------------5zRGx0TmCB4vzKWybUsQvigs Content-Type: multipart/alternative; boundary="------------r8LAm0pPWkkB9qEMRqs6QNgT" --------------r8LAm0pPWkkB9qEMRqs6QNgT Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Hi David, Thanks for review On 2/19/26 18:38, David Geier wrote: > +1 on the general idea. > >> For constant arrays I propose adding a simple pre-check before entering >> the per-element loop: detect whether the array contains at least one >> NULL element (e.g., via memchr() for the deconstructed array case). If >> so, and we are in the ALL / NOT IN case, we can immediately return >> selectivity = 0.0 and skip all further computation. This would avoid >> extra per-element estimation work while preserving semantics. > How much overhead does the memchr() call add? It seems like this > approach optimizes the edge case at the expense of the common case, > which doesn't seem like a good trade-off. > > How about instead adding a flag to ArrayType which indicates if the > array contains NULL or not. This flag could be set in > construct_md_array() which already iterates over all elements. The flag > would need to be kept up-to-date, e.g. in array_set_element() and > possibly other functions modifying the array. It seems we might reinventing the wheel. There is already ARR_HASNULL() which allows us to detect the presence of NULL in ArrayType. >> In cases where array elements are not known to be constants in advance, >> such a pre-check is less straightforward, because each element must >> first be inspected to determine whether it is a Const and whether it is >> NULL. That already requires iterating through the list, so introducing a >> separate early pass would not actually reduce the amount of work. >> Therefore, it like makes sense to keep the current behavior in that >> situation. > Agreed. After thinking about this more, is seems reasonable to short-circuit еру loop when we detect a NULL element by checking whether the element is a Const and NULL. I've attached v2 patch. -- Best regards. Ilia Evdokimov, Tantor Labs LLC, https://tantorlabs.com/ --------------r8LAm0pPWkkB9qEMRqs6QNgT Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: 8bit

Hi David,

Thanks for review

On 2/19/26 18:38, David Geier wrote:
+1 on the general idea.

For constant arrays I propose adding a simple pre-check before entering
the per-element loop: detect whether the array contains at least one
NULL element (e.g., via memchr() for the deconstructed array case). If
so, and we are in the ALL / NOT IN case, we can immediately return
selectivity = 0.0 and skip all further computation. This would avoid
extra per-element estimation work while preserving semantics.
How much overhead does the memchr() call add? It seems like this
approach optimizes the edge case at the expense of the common case,
which doesn't seem like a good trade-off.

How about instead adding a flag to ArrayType which indicates if the
array contains NULL or not. This flag could be set in
construct_md_array() which already iterates over all elements. The flag
would need to be kept up-to-date, e.g. in array_set_element() and
possibly other functions modifying the array.

It seems we might reinventing the wheel.

There is already ARR_HASNULL() which allows us to detect the presence of NULL in ArrayType.

In cases where array elements are not known to be constants in advance,
such a pre-check is less straightforward, because each element must
first be inspected to determine whether it is a Const and whether it is
NULL. That already requires iterating through the list, so introducing a
separate early pass would not actually reduce the amount of work.
Therefore, it like makes sense to keep the current behavior in that
situation.
Agreed.

After thinking about this more, is seems reasonable to short-circuit еру loop when we detect a NULL element by checking whether the element is a Const and NULL.


I've attached v2 patch.

--
Best regards.
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/

--------------r8LAm0pPWkkB9qEMRqs6QNgT-- --------------5zRGx0TmCB4vzKWybUsQvigs Content-Type: text/x-patch; charset=UTF-8; name="v2-0001-Reduce-planning-time-for-large-NOT-IN-lists-conta.patch" Content-Disposition: attachment; filename*0="v2-0001-Reduce-planning-time-for-large-NOT-IN-lists-conta.pa"; filename*1="tch" Content-Transfer-Encoding: base64 RnJvbSA3Njc3NTAxMzk0N2Q5MzQ1YjQ3NTRlMzNkOWI4MDc4NDVmODY1NDZmIE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBJbGlhIEV2ZG9raW1vdiA8aWx5YS5ldmRva2ltb3ZA dGFudG9ybGFicy5ydT4KRGF0ZTogRnJpLCAyMCBGZWIgMjAyNiAxMjowMzoxMCArMDMwMApT dWJqZWN0OiBbUEFUQ0ggdjJdIFJlZHVjZSBwbGFubmluZyB0aW1lIGZvciBsYXJnZSBOT1Qg SU4gbGlzdHMgY29udGFpbmluZwogTlVMTAoKRm9yIHggPD4gQUxMICguLi4pIC8geCBOT1Qg SU4gKC4uLiksIHRoZSBwcmVzZW5jZSBvZiBhIE5VTEwgZWxlbWVudAptYWtlcyB0aGUgc2Vs ZWN0aXZpdHkgMC4wLgoKVGhlIHBsYW5uZXIgY3VycmVudGx5IHN0aWxsIGl0ZXJhdGVzIG92 ZXIgYWxsIGVsZW1lbnRzIGFuZCBjb21wdXRlcwpwZXItZWxlbWVudCBzZWxlY3Rpdml0eSwg ZXZlbiB0aG91Z2ggdGhlIGZpbmFsIHJlc3VsdCBpcyBrbm93bi4KCkFkZCBhbiBlYXJseSBO VUxMIGNoZWNrIGZvciBjb25zdGFudCBhcnJheXMgYW5kIGltbWVkaWF0ZWx5IHJldHVybgow LjAgdW5kZXIgQUxMIHNlbWFudGljcy4KClRoaXMgcmVkdWNlcyBwbGFubmluZyB0aW1lIGZv ciBsYXJnZSBOT1QgSU4gLyA8PiBBTEwgbGlzdHMgd2l0aG91dApjaGFuZ2luZyBzZW1hbnRp Y3MuCi0tLQogc3JjL2JhY2tlbmQvdXRpbHMvYWR0L3NlbGZ1bmNzLmMgfCA3ICsrKysrKysK IDEgZmlsZSBjaGFuZ2VkLCA3IGluc2VydGlvbnMoKykKCmRpZmYgLS1naXQgYS9zcmMvYmFj a2VuZC91dGlscy9hZHQvc2VsZnVuY3MuYyBiL3NyYy9iYWNrZW5kL3V0aWxzL2FkdC9zZWxm dW5jcy5jCmluZGV4IDI5ZmVjNjU1NTkzLi45NzcwNjU2ZjEyNSAxMDA2NDQKLS0tIGEvc3Jj L2JhY2tlbmQvdXRpbHMvYWR0L3NlbGZ1bmNzLmMKKysrIGIvc3JjL2JhY2tlbmQvdXRpbHMv YWR0L3NlbGZ1bmNzLmMKQEAgLTIwMjUsNiArMjAyNSwxMCBAQCBzY2FsYXJhcnJheXNlbChQ bGFubmVySW5mbyAqcm9vdCwKIAkJCQkJCSAgZWxtbGVuLCBlbG1ieXZhbCwgZWxtYWxpZ24s CiAJCQkJCQkgICZlbGVtX3ZhbHVlcywgJmVsZW1fbnVsbHMsICZudW1fZWxlbXMpOwogCisJ CS8qIFNlbGVjdGl2aXR5IG9mICJXSEVSRSB4IE5PVCBJTiAoTlVMTCwgLi4uICkiIGlzIGFs d2F5cyAwICIqLworCQlpZiAoIXVzZU9yICYmIEFSUl9IQVNOVUxMKGFycmF5dmFsKSkKKwkJ CXJldHVybiAoU2VsZWN0aXZpdHkpIDAuMDsKKwogCQkvKgogCQkgKiBGb3IgZ2VuZXJpYyBv cGVyYXRvcnMsIHdlIGFzc3VtZSB0aGUgcHJvYmFiaWxpdHkgb2Ygc3VjY2VzcyBpcwogCQkg KiBpbmRlcGVuZGVudCBmb3IgZWFjaCBhcnJheSBlbGVtZW50LiAgQnV0IGZvciAiPSBBTlki IG9yICI8PiBBTEwiLApAQCAtMjExNSw2ICsyMTE5LDkgQEAgc2NhbGFyYXJyYXlzZWwoUGxh bm5lckluZm8gKnJvb3QsCiAJCQlMaXN0CSAgICphcmdzOwogCQkJU2VsZWN0aXZpdHkgczI7 CiAKKwkJCS8qIFNlbGVjdGl2aXR5IG9mICJXSEVSRSB4IE5PVCBJTiAoTlVMTCwgLi4uICki IGlzIGFsd2F5cyAwICIqLworCQkJaWYgKCF1c2VPciAmJiBJc0EoZWxlbSwgQ29uc3QpICYm ICgoQ29uc3QgKikgZWxlbSktPmNvbnN0aXNudWxsKQorCQkJCXJldHVybiAoU2VsZWN0aXZp dHkpIDAuMDsKIAkJCS8qCiAJCQkgKiBUaGVvcmV0aWNhbGx5LCBpZiBlbGVtIGlzbid0IG9m IG5vbWluYWxfZWxlbWVudF90eXBlIHdlIHNob3VsZAogCQkJICogaW5zZXJ0IGEgUmVsYWJl bFR5cGUsIGJ1dCBpdCBzZWVtcyB1bmxpa2VseSB0aGF0IGFueSBvcGVyYXRvcgotLSAKMi4z NC4xCgo= --------------5zRGx0TmCB4vzKWybUsQvigs--