Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wP0Wv-000LLF-30 for pgsql-hackers@arkaria.postgresql.org; Mon, 18 May 2026 16:09:22 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wP0Ws-002kv1-2M for pgsql-hackers@arkaria.postgresql.org; Mon, 18 May 2026 16:09:19 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wP0Ws-002kus-0c for pgsql-hackers@lists.postgresql.org; Mon, 18 May 2026 16:09:19 +0000 Received: from mail-lj1-x234.google.com ([2a00:1450:4864:20::234]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1wP0Wp-00000000C9A-3oyH for pgsql-hackers@lists.postgresql.org; Mon, 18 May 2026 16:09:18 +0000 Received: by mail-lj1-x234.google.com with SMTP id 38308e7fff4ca-393c40246afso30265561fa.1 for ; Mon, 18 May 2026 09:09:16 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1779120555; cv=none; d=google.com; s=arc-20240605; b=YrYOctcOZYbyxiJKiWRVMhZbIBkvhDphlTUYxC7DCsrijPDPlKzssNFQ5ovBvBX2UB YUULh1moVIGH7YVPkyPQBK1a6eG6ImXBtlorx9nLdcnjVaIt7eoOdHIWGmnRgd6ba23o Ky9MxpwGUchslo4e67kkh/jyu5ZUO496PJPLhb3iuxOHAkaaE+ISbprTxVTxFqCIectc 1+2nxLpTpgVCQjcA7/lXuEJEDbbNKPijIeDGIoi9Xn+eov/K9x1JMGHOVR/qsoySWsev 5ru9CPo28nt62wDB1LdA8PNuXwDZlyQIqbuLt5ej9fRUaPkwBE20MbWgHtX9CoznlIhK N2eA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=to:subject:message-id:date:from:mime-version:dkim-signature; bh=TyLqnmtqqZQVk2WYh/ONzk6x7sF/bCfKjbxzDPMfzwI=; fh=nwNxTtLLPTU0ewfLM7SSbrjMajMl+wwnFkCY/fi90vE=; b=eXd6JRE6GPwfZtTHFZoKi0sBDMk+UBeI8WXy4USlmBIzheJlHpWsIj4aFQW5eN7MHL XjuuXdcFiiBmiSjVsDojOPiIjYq6IxmT9FCNZaql98zUF6c90xvd8xRmRrnpgU5/kYAx 89w8N5oB7YzJr2A92CmpfICw8EAoy1ijui7V10zAXmo0muL/4PU4iTGZYs2XGYoI+7kF kmDO9rJIqiyXLuv5GEy9Qv82r3sm+hPLCe/yVK7rvt+9LuH8poPg7YoG12u3/V4CRBwu vAfD5POq1GJ4Sn5rONeDkvy/ncpcYn3iJVyPuWjMfmciLvW7ZHJ99Vm9T2xoJqpSlNTX yJBw==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1779120555; x=1779725355; darn=lists.postgresql.org; h=to:subject:message-id:date:from:mime-version:from:to:cc:subject :date:message-id:reply-to; bh=TyLqnmtqqZQVk2WYh/ONzk6x7sF/bCfKjbxzDPMfzwI=; b=NNzt26jJntlGQPMCjoMntYwj6DAUfPUokKygkefROgWK4f4sE2RkX4uyM82SypSpPn QVjj7rx+0P5sih1/FTQJ6aYgngrlEXNKvwpyLJuX6rCnMrn8JJr/OXS5ZNP0weUQnfZ6 GAEGdmQ+T/ob+U1cFcw31eX/4+ACLa4Y5sbx75c3kMkaUgVaI2fAZFTxq62p8xOmnjI5 dFlxb5ORa7PAat2LcMS4yqloYB3Vrw0lENk/u2c5mVCPWPSc8rcJYRUV2zte32BCY03b VfFQI/g5qT1TDT38Bf9Z3Bs5m6g7/vC9PiVgrS6wMKtihjKnWT9RAmugPEmeK4XiFagB 97cQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779120555; x=1779725355; h=to:subject:message-id:date:from:mime-version:x-gm-gg :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=TyLqnmtqqZQVk2WYh/ONzk6x7sF/bCfKjbxzDPMfzwI=; b=iaEsuvEjzwR70r3B5LTK9DH79HBrK3YRd9jBV2+KjmJW9lV2Z1W2kIYJ/2YQSj2nIU U1qBFiFbpfkZleUUYNty6ODDOP2jLbFmzU0cx27mdXFT8PAi5NPgVmyYJCnatR/qORJG W7tH5xKKV5Ms4pcbDIzmBhh37S91SAlTtQ9OdrO33rFY5hMZ9m1/8qpCmZaOnf19z7Sa qYJ2It/ITEivFqKSC8yDy+2nHnGButVMDSinQZFTziEw+jM8OQ304CfpHAqHOiCZCibt pZqqGToiI5n10aElravcG8LjEM/RCuSzS9ILjCwPS3jQ8lDVL2wbAZy5rT6RC1WdIylA rz4Q== X-Gm-Message-State: AOJu0YyYHLdkv+5EDcU+uYVzZEOOK5YSC/9Xjy7TBruiGl5CCZxi4D+b u16EEEgbrTXKvCPeKBNuC+igWqwVQTPBxxNamRZj59DDXGRBYhv/BB/WzIHt7BEB0/YohENk2tR 3i+V+HYFusebJ1UT4ncKYbwPcL4A0KeuULRdU X-Gm-Gg: Acq92OHd/0U9C2R1HDKpXwTUJHQTPl1LwZ1uEGkrgwps9Zzo/kdV2O1rxHoAvCI2dbR hxaZuNSGwZv2Xjt8XekU3WSFA5KRrS6Xgi62OedQ+x09SZURbqE7XMTBkM0QXiNAM5BNgaeccRl C4y0NulVrI3pgLDdhh1Rh8+0j1KXxLXOc1VvsRCMCrp7xWVvr/5uFIaVsu6hPBWD6xuyYsUU7ho VJJKd5/smm9WlHu2hpEeOVxJQlKNryPNgIm6I6ZySAatkKz/Csljd3gK3AZvsbRii3Gjm4Oaa+v Y69NT72IdjDWoSLfKDh+GVKwqKDbQqpwfP+Z0DkO3w== X-Received: by 2002:a05:651c:1609:b0:395:9ffa:58a0 with SMTP id 38308e7fff4ca-3959ffa5d2dmr5626511fa.12.1779120554480; Mon, 18 May 2026 09:09:14 -0700 (PDT) MIME-Version: 1.0 From: =?UTF-8?Q?Enrique_S=C3=A1nchez?= Date: Mon, 18 May 2026 18:09:03 +0200 X-Gm-Features: AVHnY4LpQQjURCF-68g5m_AaUqbeJDz5k1cTnXJOmehQsOpFiB1UmTUGeDizH1c Message-ID: Subject: Extended statistics improvement: multi-column MCV missing values To: pgsql-hackers@lists.postgresql.org Content-Type: multipart/alternative; boundary="000000000000a99865065219c8cc" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000a99865065219c8cc Content-Type: text/plain; charset="UTF-8" Hi, Postgres only uses multi-column MCVs when the value we are looking for is in the list. If not, it falls back into individual independent statistics to estimate selectivity. However, a miss in a multi-column MCV list still yields valuable information that it currently throws away: we know that the combination's frequency is strictly bounded by the frequency of the last (least common) item in that MCV list. Test case ========= ``` -- 1. Create a minimal table DROP TABLE IF EXISTS planner_trap; CREATE TABLE planner_trap ( col_a integer, col_b integer, col_c integer, sort_col integer ); -- 2. 1 becomes the most common value (MCV) for all three columns. INSERT INTO planner_trap (col_a, col_b, col_c, sort_col) SELECT (pow(random(), 5) * 100)::integer + 1, (pow(random(), 5) * 10)::integer + 1, (pow(random(), 5) * 50)::integer + 1, i FROM generate_series(1, 100000) AS i; -- 3. Create indexes CREATE INDEX idx_planner_trap_sort ON planner_trap(sort_col); CREATE INDEX idx_planner_trap_a_b_c ON planner_trap(col_a, col_b, col_c); -- 4. Make the exact combination (1, 1, 1) yield zero rows. This ensures it won't appear -- in the MCV list, even though the value '1' remains the most common for each individual column. UPDATE planner_trap SET col_a = 2 WHERE col_a = 1 AND col_b = 1 AND col_c = 1; -- 5. Create MCV statistics CREATE STATISTICS (mcv) on col_a, col_b, col_c FROM planner_trap; ANALYZE planner_trap; ``` The planner multiplies the individual selectivities and overestimates the row count (5909; actual=0): ``` postgres=# EXPLAIN ANALYZE SELECT * FROM planner_trap WHERE col_a = 1 AND col_b = 1 AND col_c = 1 ORDER BY sort_col DESC LIMIT 10; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.29..7.72 rows=10 width=16) (actual time=14.632..14.633 rows=0.00 loops=1) Buffers: shared hit=16355 dirtied=1 -> Index Scan Backward using idx_planner_trap_sort on planner_trap (cost=0.29..4386.99 rows=5909 width=16) (actual time=14.630..14.630 rows=0.00 loops=1) Filter: ((col_a = 1) AND (col_b = 1) AND (col_c = 1)) Rows Removed by Filter: 100000 Index Searches: 1 Buffers: shared hit=16355 dirtied=1 Planning: Buffers: shared hit=33 read=1 Planning Time: 0.478 ms Execution Time: 14.651 ms ``` 1. Cap selectivity with the last MCV item's frequency ===================================================== Applying the last MCV cap here, Postgres estimates 117 rows: 0.001167 * 100000 = 117 ``` postgres=# SELECT m.* FROM pg_statistic_ext join pg_statistic_ext_data on (oid = stxoid), pg_mcv_list_items(stxdmcv) m WHERE stxname = 'planner_trap_col_a_col_b_col_c_stat' and index = 99; index | values | nulls | frequency | base_frequency -------+----------+---------+-----------------------+---------------- 99 | {1,1,31} | {f,f,f} | 0.0011666666666666668 | 0.001049409536 ``` making postgres to choose a better plan: ``` Limit (cost=300.72..300.75 rows=10 width=16) (actual time=0.045..0.046 rows=0.00 loops=1) Buffers: shared hit=3 read=2 -> Sort (cost=300.72..301.02 rows=117 width=16) (actual time=0.043..0.044 rows=0.00 loops=1) Sort Key: sort_col DESC Sort Method: quicksort Memory: 25kB Buffers: shared hit=3 read=2 -> Bitmap Heap Scan on planner_trap (cost=5.78..298.19 rows=117 width=16) (actual time=0.026..0.027 rows=0.00 loops=1) Recheck Cond: ((col_a = 1) AND (col_b = 1) AND (col_c = 1)) Buffers: shared read=2 -> Bitmap Index Scan on idx_planner_trap_a_b_c (cost=0.00..5.76 rows=117 width=0) (actual time=0.019..0.020 rows=0.00 loops=1) Index Cond: ((col_a = 1) AND (col_b = 1) AND (col_c = 1)) Index Searches: 1 Buffers: shared read=2 Planning: Buffers: shared hit=51 read=8 dirtied=4 Planning Time: 0.742 ms Execution Time: 0.094 ms ``` 2. Estimate selectivity as Postgres does for single-column values not in MCVs ============================================================================= While that significantly improves estimations, we could mirror what Postgres already does for individual MCVs. Quote from the official documentation: > The approach is to use the fact that the value is not in the list, combined with the knowledge of the frequencies for all of the MCVs: > That is, add up all the frequencies for the MCVs and subtract them from one, then divide by the number of other distinct values. To achieve this, we need to store an ndistinct estimation alongside the MCVs that can be used for partial or entire column match. P(1, 1, 1) = (1 - sum(MCVs)) / (ndistinct(col_a, col_b, col_c) - MCV_list_size) ``` postgres=# SELECT sum(m.frequency) FROM pg_statistic_ext join pg_statistic_ext_data on (oid = stxoid), pg_mcv_list_items(stxdmcv) m WHERE stxname = 'planner_trap_col_a_col_b_col_c_stat'; sum --------------------- 0.39456666666666645 postgres=# SELECT stxkeys AS k, stxdndistinct AS nd FROM pg_statistic_ext join pg_statistic_ext_data on (oid = stxoid) WHERE stxname = 'stts2'; k | nd -------+--------------------------------------------------- 1 2 3 | [...{"attributes": [1, 2, 3], "ndistinct": 8511}] ``` Then (1 - 0.39456666666666645) / (8511 - 100) = 0.000071981; 0.000071981 * 100000 = 7 rows ``` Limit (cost=30.30..30.32 rows=7 width=16) (actual time=0.035..0.036 rows=0.00 loops=1) Buffers: shared hit=2 -> Sort (cost=30.30..30.32 rows=7 width=16) (actual time=0.033..0.034 rows=0.00 loops=1) Sort Key: sort_col DESC Sort Method: quicksort Memory: 25kB Buffers: shared hit=2 -> Bitmap Heap Scan on planner_trap (cost=4.38..30.20 rows=7 width=16) (actual time=0.028..0.029 rows=0.00 loops=1) Recheck Cond: ((col_a = 1) AND (col_b = 1) AND (col_c = 1)) Buffers: shared hit=2 -> Bitmap Index Scan on idx_planner_trap_a_b_c (cost=0.00..4.38 rows=7 width=0) (actual time=0.022..0.022 rows=0.00 loops=1) Index Cond: ((col_a = 1) AND (col_b = 1) AND (col_c = 1)) Index Searches: 1 Buffers: shared hit=2 Planning Time: 0.192 ms Execution Time: 0.057 ms ``` I think this is a cheap way to prevent bad estimations. The storage overhead of adding an ndistinct field is trivial compared to the MCV list itself, and the O(1) arithmetic during planning adds no measurable overhead. I look forward to your feedback before drafting a patch. Best, Enrique. --000000000000a99865065219c8cc Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi,

Postgres = only uses multi-column MCVs when the value we are looking for is in the lis= t. If not, it falls back into individual independent statistics to estimate= selectivity.
However, a miss in a multi-column MCV list still yields va= luable information that it currently throws away: we know that the combinat= ion's frequency is strictly bounded by the frequency of the last (least= common) item in that MCV list.

Test case
=3D=3D=3D=3D=3D=3D=3D= =3D=3D
```
-- 1. Create a minimal table
DROP TABLE IF EXISTS plann= er_trap;
CREATE TABLE planner_trap (
=C2=A0 =C2=A0 col_a integer,
= =C2=A0 =C2=A0 col_b integer,
=C2=A0 =C2=A0 col_c integer,
=C2=A0 =C2= =A0 sort_col integer
);

-- 2. 1 becomes the most common value (MC= V) for all three columns.
INSERT INTO planner_trap (col_a, col_b, col_c,= sort_col)
SELECT
=C2=A0 =C2=A0 (pow(random(), 5) * 100)::integer + = 1,
=C2=A0 =C2=A0 (pow(random(), 5) * 10)::integer + 1,
=C2=A0 =C2=A0 = (pow(random(), 5) * 50)::integer + 1,
=C2=A0 =C2=A0 i
FROM generate_s= eries(1, 100000) AS i;

-- 3. Create indexes
CREATE INDEX idx_plan= ner_trap_sort ON planner_trap(sort_col);
CREATE INDEX idx_planner_trap_a= _b_c ON planner_trap(col_a, col_b, col_c);

-- 4. Make the exact comb= ination (1, 1, 1) yield zero rows. This ensures it won't appear
-- i= n the MCV list, even though the value '1' remains the most common f= or each individual column.
UPDATE planner_trap
SET col_a =3D 2
W= HERE col_a =3D 1 AND col_b =3D 1 AND col_c =3D 1;

-- 5. Create MCV s= tatistics
CREATE STATISTICS (mcv) on col_a, col_b, col_c FROM planner_tr= ap;
ANALYZE planner_trap;
```

The planner multiplies the indiv= idual selectivities and overestimates the row count (5909; actual=3D0):
= ```
postgres=3D# EXPLAIN ANALYZE SELECT * FROM planner_trap WHERE col_a = =3D 1 AND col_b =3D 1 AND col_c =3D 1 ORDER BY sort_col DESC LIMIT 10;
= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 QUERY PLAN =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0
---------------------------------------------------------= ---------------------------------------------------------------------------= ---------------------------
=C2=A0Limit =C2=A0(cost=3D0.29..7.72 rows=3D= 10 width=3D16) (actual time=3D14.632..14.633 rows=3D0.00 loops=3D1)
=C2= =A0 =C2=A0Buffers: shared hit=3D16355 dirtied=3D1
=C2=A0 =C2=A0-> =C2= =A0Index Scan Backward using idx_planner_trap_sort on planner_trap =C2=A0(c= ost=3D0.29..4386.99 rows=3D5909 width=3D16) (actual time=3D14.630..14.630 r= ows=3D0.00 loops=3D1)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Filter: ((col_a = =3D 1) AND (col_b =3D 1) AND (col_c =3D 1))
=C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0Rows Removed by Filter: 100000
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0I= ndex Searches: 1
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Buffers: shared hit= =3D16355 dirtied=3D1
=C2=A0Planning:
=C2=A0 =C2=A0Buffers: shared hit= =3D33 read=3D1
=C2=A0Planning Time: 0.478 ms
=C2=A0Execution Time: 14= .651 ms
```

1. Cap selectivity with the last MCV item's frequ= ency
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D
Applying the last MCV cap here, Postgres estimates 11= 7 rows: 0.001167 * 100000 =3D 117
```
postgres=3D# SELECT m.* FROM pg= _statistic_ext join pg_statistic_ext_data on (oid =3D stxoid),
=C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 pg_mcv_list_items(stxdmcv)= m WHERE stxname =3D 'planner_trap_col_a_col_b_col_c_stat' and inde= x =3D 99;
=C2=A0index | =C2=A0values =C2=A0| =C2=A0nulls =C2=A0| =C2=A0 = =C2=A0 =C2=A0 frequency =C2=A0 =C2=A0 =C2=A0 | base_frequency
-------+-= ---------+---------+-----------------------+----------------
=C2=A0 =C2= =A0 99 | {1,1,31} | {f,f,f} | 0.0011666666666666668 | 0.001049409536
```=

making postgres to choose a better plan:
```
=C2=A0Limit =C2= =A0(cost=3D300.72..300.75 rows=3D10 width=3D16) (actual time=3D0.045..0.046= rows=3D0.00 loops=3D1)
=C2=A0 =C2=A0Buffers: shared hit=3D3 read=3D2=C2=A0 =C2=A0-> =C2=A0Sort =C2=A0(cost=3D300.72..301.02 rows=3D117 widt= h=3D16) (actual time=3D0.043..0.044 rows=3D0.00 loops=3D1)
=C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0Sort Key: sort_col DESC
=C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0Sort Method: quicksort =C2=A0Memory: 25kB
=C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0Buffers: shared hit=3D3 read=3D2
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0-> =C2=A0Bitmap Heap Scan on planner_trap =C2=A0(cost=3D5.78..= 298.19 rows=3D117 width=3D16) (actual time=3D0.026..0.027 rows=3D0.00 loops= =3D1)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Recheck Con= d: ((col_a =3D 1) AND (col_b =3D 1) AND (col_c =3D 1))
=C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Buffers: shared read=3D2
=C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0-> =C2=A0Bitmap Index Sc= an on idx_planner_trap_a_b_c =C2=A0(cost=3D0.00..5.76 rows=3D117 width=3D0)= (actual time=3D0.019..0.020 rows=3D0.00 loops=3D1)
=C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Index Cond: ((col_a= =3D 1) AND (col_b =3D 1) AND (col_c =3D 1))
=C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Index Searches: 1
=C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Bu= ffers: shared read=3D2
=C2=A0Planning:
=C2=A0 =C2=A0Buffers: shared h= it=3D51 read=3D8 dirtied=3D4
=C2=A0Planning Time: 0.742 ms
=C2=A0Exec= ution Time: 0.094 ms
```

2. Estimate selectivity as Postgres does= for single-column values not in MCVs
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
While that significantl= y improves estimations, we could mirror what Postgres already does for indi= vidual MCVs. Quote from the official documentation:
> The approach is= to use the fact that the value is not in the list, combined with the knowl= edge of the frequencies for all of the MCVs:
> That is, add up all th= e frequencies for the MCVs and subtract them from one, then divide by the n= umber of other distinct values.

To achieve this, we need to store an= ndistinct estimation alongside the MCVs that can be used for partial or en= tire column match.

P(1, 1, 1) =3D (1 - sum(MCVs)) / (ndistinct(col_a= , col_b, col_c) - MCV_list_size)

```
postgres=3D# SELECT sum(m.fr= equency) FROM pg_statistic_ext join pg_statistic_ext_data on (oid =3D stxoi= d),
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 pg_mcv_list_= items(stxdmcv) m WHERE stxname =3D 'planner_trap_col_a_col_b_col_c_stat= ';
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0sum =C2=A0 =C2=A0 =C2=A0 =C2=A0=
---------------------
=C2=A00.39456666666666645
=C2=A0
postgr= es=3D# SELECT stxkeys AS k, stxdndistinct AS nd
=C2=A0 FROM pg_statistic= _ext join pg_statistic_ext_data on (oid =3D stxoid)
=C2=A0 WHERE stxname= =3D 'stts2';
=C2=A0 =C2=A0k =C2=A0 | =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0nd =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0
-------+---= ------------------------------------------------
=C2=A01 2 3 | [...{&quo= t;attributes": [1, 2, 3], "ndistinct": 8511}]
```

= Then (1 - 0.39456666666666645) / (8511 - 100) =3D 0.000071981; 0.000071981 = * 100000 =3D 7 rows
```
=C2=A0Limit =C2=A0(cost=3D30.30..30.32 rows= =3D7 width=3D16) (actual time=3D0.035..0.036 rows=3D0.00 loops=3D1)
=C2= =A0 =C2=A0Buffers: shared hit=3D2
=C2=A0 =C2=A0-> =C2=A0Sort =C2=A0(c= ost=3D30.30..30.32 rows=3D7 width=3D16) (actual time=3D0.033..0.034 rows=3D= 0.00 loops=3D1)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Sort Key: sort_col DES= C
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Sort Method: quicksort =C2=A0Memory:= 25kB
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Buffers: shared hit=3D2
=C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0-> =C2=A0Bitmap Heap Scan on planner_trap= =C2=A0(cost=3D4.38..30.20 rows=3D7 width=3D16) (actual time=3D0.028..0.029= rows=3D0.00 loops=3D1)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0Recheck Cond: ((col_a =3D 1) AND (col_b =3D 1) AND (col_c =3D 1))=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Buffers: shared hit= =3D2
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0-> =C2=A0= Bitmap Index Scan on idx_planner_trap_a_b_c =C2=A0(cost=3D0.00..4.38 rows= =3D7 width=3D0) (actual time=3D0.022..0.022 rows=3D0.00 loops=3D1)
=C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0In= dex Cond: ((col_a =3D 1) AND (col_b =3D 1) AND (col_c =3D 1))
=C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Index Sea= rches: 1
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0Buffers: shared hit=3D2
=C2=A0Planning Time: 0.192 ms
= =C2=A0Execution Time: 0.057 ms
```

I think this is a cheap way to= prevent bad estimations. The storage overhead of adding an ndistinct field= is trivial compared to the MCV list itself, and the O(1) arithmetic during= planning adds no measurable overhead. I look forward to your feedback befo= re drafting a patch.

Best,
Enrique.
--000000000000a99865065219c8cc--