Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vocxq-00BRBi-0a for pgsql-hackers@arkaria.postgresql.org; Sat, 07 Feb 2026 07:42:46 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vocxm-006IGm-2y for pgsql-hackers@arkaria.postgresql.org; Sat, 07 Feb 2026 07:42:42 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vocxm-006IGG-1K for pgsql-hackers@lists.postgresql.org; Sat, 07 Feb 2026 07:42:42 +0000 Received: from mail-wr1-x42f.google.com ([2a00:1450:4864:20::42f]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1vocxk-00000000xbz-0alj for pgsql-hackers@lists.postgresql.org; Sat, 07 Feb 2026 07:42:41 +0000 Received: by mail-wr1-x42f.google.com with SMTP id ffacd0b85a97d-4359a16a400so2658957f8f.1 for ; Fri, 06 Feb 2026 23:42:40 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1770450157; cv=none; d=google.com; s=arc-20240605; b=HR3CCXmo7iw7KESP+vMsgIGQmMffXCcKN4ukhwc/g/g+2fhmAo3QagBy1R/eTOP7tW Xio4V/qadQBhvu9VKeloAeLaFbLnpMzHv1+xBiwfEY0ziK4g7uV/izqPJXSjMm2kLv1d KEv+L1XC45gufSFIB/NQDYUPPFiwIMd3C+dbqaYyJsdD6RlQw3E2i36yAvYWQHsEnZLG ob5ev1RItHZqMNZoraxHtOL3zvlsSvdOmvM4yynGtLDYlBXR9kemLW/LQc+z6hItm3Fj 5sX8k0QWCS8/7SxJFtDQy1RnEDaQw+o/w4ZpqgTFONYgLiyePG273AxXIemg5gihixZ3 xZJw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:dkim-signature; bh=k5G6ig6Jmt2+cKlOQIveXxAp42clfAK9oNYhWpSbPeA=; fh=w+xu6R+jtArE5Ck5Ww501x8cKqVwOi3qkyFuzAh+ZFw=; b=RUWIdhwVODBwWwEszgpLMROlFWZMX2XpifbUjDwi0pKV1SS/VxAFVoxCZ9hNaSjyUJ Jz5X0brSIwT0FxKePkXklsTA5MerRn6uYKhcZrXXGHZjM9kUOGl95YPV2F+ULIaT2rBS YV2PjXpyy3EDIupsOmk41HwBZcCn9IuhabC4OriVtDRjrSgYaE92OOzuusRprIrhcY/m hJtrxXeMlGAP6YreH3QoZrwSKmXx2OW+t3st/8K9W9vp6U0R7j/AEJO/3XVFh8cJNn8Z +DAy2VZ3wroYJHJKOnJckgEONKdYYy/jyi07xr9Tx/iwouonSW78AJU9E3mVzph27Q+9 q23g==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1770450157; x=1771054957; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=k5G6ig6Jmt2+cKlOQIveXxAp42clfAK9oNYhWpSbPeA=; b=Z7T0052jwEP2B7tTmufeVlOHewaQZTVEJmiB2wlSHIOHqZAskYzc6XiK0FhZ/7hyOB ZvKF80Ur6tHc0H/eDFKxal2K8CNyfDXe5woKDfzSaJpUx6sd/KMD6t7RkCFOYJ8yeQkj RO+1wD+JKnr0c2akdHk3paLEi4VY+oymz/faSCqcq6VT53nJJkNLTQnGe14cAamfgHvX +2Mg6i2YFo0cpfA0CCkDAzMyUA8aJJjGnCPoA0jubLlogrzVIax7+ADJFnpgO7t0WcfX xgJ4plnL6hrtELIljHG7XA1BxV9m+DLCOzrrrhQPtZPDvvRIYcxAl1qccLRZE2U4iNxr NgQA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1770450157; x=1771054957; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=k5G6ig6Jmt2+cKlOQIveXxAp42clfAK9oNYhWpSbPeA=; b=l3Jz1yWkReldCPNzGhQHrZazVP/ff3bDlWfCrLks/JsAMOs24GDLgXIHSvc65hU4/w F2NLH+dOsFrfiw0QDfKovCAD5/irtlR1fPHgdsm0q7YfsTUxHygMuY8ZFu4q2Hcz5AlR vyT6MzE7T9R9U5E1Io3BDUPH6qrxf5Gj+JNQlBDC6fHZEFJl1V90iRZOObLC4NsOUlQJ /VMy3RBchIfcrIIgEc/JMLJLWAslAv05OnZr6YhA3mbb2IQg2O38Woaatfooh5P7Io81 aStjFH17xWty5TJYqOqvWFuI4KLVW+m1j1a8fk9QPXwL1oerLqUMAPJg9nBr7n2tXWAs YL8A== X-Forwarded-Encrypted: i=1; AJvYcCVRLD+WaDdUoZXVLC+hXPxKyHDEw8n1rpu0G/sKDkpJoOT8INc9MFZiCKDzzKtZqVCt0yCRRT9BR0c6nOnq@lists.postgresql.org X-Gm-Message-State: AOJu0YwnJzshCs/CPp4DjTDUDD4Arw/RFvKkElazmeKSelG+EivJHgUI fvKQ23iI2dqcNRV/1SiYf5eYUVuQVabjCifAtTVr6J1eS/mZ2a8Zn24so+10w2sSolijxAP+biq G6gSL9pSArs4F9SA3Mdr3kSvofOEXZic= X-Gm-Gg: AZuq6aIQ3ClfEYCiMqh0p7IpkCD4sGKyyCx1/v8pq6b/g+eCvCv+7YG/vop73ISeRqi ydyrJO5OAATVvV5Pit8atV3S9iBJfl2eawfvkF5JPkgJVE3R2GrTidYzXepRdgLotadV2NrO9zr xV+npNJCP4VzvwnAbBebdsjyIruUtDc1rLAn832pIKuD9vOgAf5zNTuNWpmaTz6xvRcFYc/WE1d 776z6/Zm2s1SGx0OdePm6Sy489tf5GgOZo1LWPexp1ozTNUGBDqEmO0WHRclEPBsFC6MY1dG1TT w1uSJXwLYMzYdz+Xm+CRHjw8vCvLGNBwaihPdZLbNuo8WY0OwyHQCH7SraGUTxGCbllWew== X-Received: by 2002:a05:6000:2210:b0:432:c0e8:4a33 with SMTP id ffacd0b85a97d-43629341477mr7769818f8f.22.1770450157136; Fri, 06 Feb 2026 23:42:37 -0800 (PST) MIME-Version: 1.0 References: <7db341e0-fbc6-4ec5-922c-11fdafe7be12@tantorlabs.com> <988e3168-6096-488a-bb42-787e1e8c21a4@tantorlabs.com> <21FF6A0B-886F-4132-B6A6-0F4E934B4CEE@Outlook.com> <8ee9d903-bb13-434c-8145-b3769bf2b3cb@tantorlabs.com> In-Reply-To: From: Tatsuya Kawata Date: Sat, 7 Feb 2026 16:42:01 +0900 X-Gm-Features: AZwV_QhAoNtWb0d3KhmBAIPYLPvVy4mRa82dHMyGLc59oL7vkNAFWPOjnrDXG-Y Message-ID: Subject: Re: Hash-based MCV matching for large IN-lists To: David Geier Cc: Ilia Evdokimov , Chengpeng Yan , "pgsql-hackers@lists.postgresql.org" Content-Type: multipart/alternative; boundary="000000000000b56d50064a370cd0" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000b56d50064a370cd0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi, Thank you for this patch. I've been studying how PostgreSQL handles selectivity estimation, and this optimization for large IN-lists looks very useful. I ran some tests for the special cases David mentioned: - NULL + ALL: correctly returns selectivity =E2=89=88 0 (rows=3D1) - isInequality: <> ALL estimates match NOT IN - Cross-type: int =3D ANY(bigint[]) works correctly - Duplicate values: IN (1,1,1,2,2,3) preserves existing behavior I noticed a few minor points: 1. The comment in MCVInHashEntry struct contains a typo: "number of occurrences od current value" -> "of" 2. The ALL + NULL early-return logic appears in two places (lines 2579-2591 and 2644-2656). I initially considered consolidating this by checking for NULL elements before building the hash table, but realized this would add an extra loop in the common case where there are no NULLs. Perhaps a brief comment explaining why this check is duplicated (to avoid the overhead of a separate NULL-scanning loop) would help future readers understand the design choice? 3. Minor style suggestion: adding a brief SQL example in the header comment (e.g., "WHERE x IN (1,2,3,...)" or "WHERE x =3D ANY(ARRAY[...])") might hel= p future readers quickly understand the use case. Thanks again for working on this optimization. It's been very educational to follow the discussion and understand how selectivity estimation works in PostgreSQL. Regards, Tatsuya Kawata --000000000000b56d50064a370cd0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi,

Thank you for this patch.
I've been stud= ying how PostgreSQL handles selectivity estimation, and this optimization f= or large IN-lists looks very useful.

I ran some tests for the specia= l cases David mentioned:

- NULL + ALL: correctly returns selectivity= =E2=89=88 0 (rows=3D1)
- isInequality: <> ALL estimates match NOT= IN
- Cross-type: int =3D ANY(bigint[]) works correctly
- Duplicate v= alues: IN (1,1,1,2,2,3) preserves existing behavior

I noticed a few = minor points:

1. The comment in MCVInHashEntry struct contains a typ= o:
=C2=A0 =C2=A0"number of occurrences od current value" ->= "of"

2. The ALL + NULL early-return logic appears in two = places (lines 2579-2591 and 2644-2656). I initially considered consolidatin= g this by checking for NULL elements before building the hash table, but re= alized this would add an extra loop in the common case where there are no N= ULLs.
=C2=A0 =C2=A0Perhaps a brief comment explaining why this check is = duplicated (to avoid the overhead of a separate NULL-scanning loop) would h= elp future readers understand the design choice?

3. Minor style sugg= estion: adding a brief SQL example in the header comment (e.g., "WHERE= x IN (1,2,3,...)" or "WHERE x =3D ANY(ARRAY[...])") might h= elp future readers quickly understand the use case.

Thanks again for= working on this optimization. It's been very educational to follow the= discussion and understand how selectivity estimation works in PostgreSQL.<= br>
Regards,
Tatsuya Kawata
--000000000000b56d50064a370cd0--