public inbox for [email protected]  
help / color / mirror / Atom feed
From: Tatsuya Kawata <[email protected]>
To: David Geier <[email protected]>
Cc: Ilia Evdokimov <[email protected]>
Cc: Chengpeng Yan <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: Hash-based MCV matching for large IN-lists
Date: Sat, 7 Feb 2026 16:42:01 +0900
Message-ID: <CAHza6qdAH99C0xa27YDTixiNVFa99j90QYquUPAxL0JwahmggA@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>

Hi,

Thank you for this patch.
I've been studying how PostgreSQL handles selectivity estimation, and this
optimization for large IN-lists looks very useful.

I ran some tests for the special cases David mentioned:

- NULL + ALL: correctly returns selectivity ≈ 0 (rows=1)
- isInequality: <> ALL estimates match NOT IN
- Cross-type: int = ANY(bigint[]) works correctly
- Duplicate values: IN (1,1,1,2,2,3) preserves existing behavior

I noticed a few minor points:

1. The comment in MCVInHashEntry struct contains a typo:
   "number of occurrences od current value" -> "of"

2. The ALL + NULL early-return logic appears in two places (lines 2579-2591
and 2644-2656). I initially considered consolidating this by checking for
NULL elements before building the hash table, but realized this would add
an extra loop in the common case where there are no NULLs.
   Perhaps a brief comment explaining why this check is duplicated (to
avoid the overhead of a separate NULL-scanning loop) would help future
readers understand the design choice?

3. Minor style suggestion: adding a brief SQL example in the header comment
(e.g., "WHERE x IN (1,2,3,...)" or "WHERE x = ANY(ARRAY[...])") might help
future readers quickly understand the use case.

Thanks again for working on this optimization. It's been very educational
to follow the discussion and understand how selectivity estimation works in
PostgreSQL.

Regards,
Tatsuya Kawata


view thread (10+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Hash-based MCV matching for large IN-lists
  In-Reply-To: <CAHza6qdAH99C0xa27YDTixiNVFa99j90QYquUPAxL0JwahmggA@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox