public inbox for [email protected]  
help / color / mirror / Atom feed
From: Alexandra Wang <[email protected]>
To: Corey Huinker <[email protected]>
Cc: Tom Lane <[email protected]>
Cc: Tomas Vondra <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Jeff Davis <[email protected]>
Subject: Re: Is there value in having optimizer stats for joins/foreignkeys?
Date: Wed, 28 Jan 2026 21:04:19 -0800
Message-ID: <CAK98qZ0LwJbUoiZjjFXitojHy4UskkjYDiSd_JZfGE9LbfZm9w@mail.gmail.com> (raw)
In-Reply-To: <CAK98qZ2mW=geT9NKe5vC68-sB9EJe_887uV=MCFt6y9AhyTp7A@mail.gmail.com>
References: <CADkLM=cUwMftPLFq0iD6-qKRyNiRM2HZGYVp6=0noxA8GfuEtA@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<CADkLM=fEi_GeeS3zyg6B5WgswyPe0wNXHfKQOxjy8A5fXHD7=A@mail.gmail.com>
	<CAK98qZ2mW=geT9NKe5vC68-sB9EJe_887uV=MCFt6y9AhyTp7A@mail.gmail.com>

Hi hackers,

As promised in my previous email, I'm sharing a proof-of-concept patch
exploring join statistics for correlated columns across relations.
This is a POC at this point, but I hope the performance numbers below
give a better idea of both the potential usefulness of join statistics
and the complexity of implementing them.

Join Order Benchmark (JOB)
---------------------------------------

I got interested in this work after reading a 2015 VLDB paper [1] that
introduced the Join Order Benchmark (JOB) [2].  JOB uses the IMDB
dataset, with queries having 3-16 joins (average 8). Unlike standard
benchmarks such as TPC-H which use randomly generated uniform and
independent data, JOB uses real-world data with correlations and
skewed distributions, which make cardinality estimation much harder.

The paper tested PostgreSQL among other databases. I reran the
benchmark on current PG 19devel to see where we stand today. (The
original data source is gone, but CedarDB hosts a mirror [3]; I
created a fork [4] and added some scripts. The schema DDLs and queries
are unchanged.)

Test setup:
MacBook, Apple M3 Pro
Postgres master branch (19devel), built with -O3
cold and warm runs, 3 runs each

Results (average of 3 runs):

Postgres 19devel branch:
Total execution time cold run: 140.75 s (±2.12 s)
Total execution time warm run: 117.72 s (±0.76 s)

Postgres 19devel branch, SET enable_nestloop = off:
Total execution time cold run: 87.16 s (±0.17 s)
Total execution time warm run: 86.68 s (±0.42 s)

These results suggest that the planner is frequently choosing nested
loop joins where hash or merge joins would be better.

(As a side note: with nestloops disabled, plans are dominated by hash
joins, so cold vs warm runs are very similar due to limited cache
reuse.)

Problem
--------

The slowest query in this workload is 16b.sql (attached). Its plan
contains 7 nested loop joins.

The innermost join is estimated as 34 rows but actually produces
41,840 rows — a ~1230x underestimation:

Nested Loop  (cost=6.80..3813.14 rows=34 width=4) (actual
time=2.336..304.160 rows=41840 loops=1)
  -> Seq Scan on keyword k (cost=0.00..2685.11 rows=1 width=4) (actual
time=0.402..6.161 rows=1.00 loops=1)
     Filter: (keyword = 'character-name-in-title')
  -> Bitmap Heap Scan on movie_keyword mk (cost=6.80..1124.98 rows=305
width=8) (actual time=1.933..295.493 rows=41840.00 loops=1)
     Recheck Cond: (k.id = keyword_id)

This underestimation cascades to the outermost join: ~3K estimated vs
~3.7M actual rows.

The problem can be reduced to the following query:

SELECT * FROM movie_keyword mk, keyword k
WHERE k.keyword = 'character-name-in-title'
AND k.id = mk.keyword_id;

The tables:

keyword(id PK, keyword, phonetic_code)
movie_keyword(id PK, movie_id, keyword_id)

These tables are strongly correlated via keyword_id -> keyword.id. We
can even add:

ALTER TABLE movie_keyword
  ADD CONSTRAINT movie_keyword_keyword_id_fkey
  FOREIGN KEY (keyword_id) REFERENCES keyword(id);
CREATE STATISTICS keyword_stats (dependencies)
  ON id, keyword FROM keyword;
ANALYZE keyword, movie_keyword;

However, even with the FK constraint and extended statistics on the
keyword table, the estimate remains unchanged:

SELECT * FROM check_estimated_rows(
  'SELECT * FROM movie_keyword mk, keyword k
   WHERE k.keyword = ''character-name-in-title''
     AND k.id = mk.keyword_id');
 estimated | actual
-----------+--------
        34 |  41840

This indicates that the planner cannot connect the skewed distribution
of the join key on the referencing table with the filter selectivity
on the referenced table as it applies to the join result. This is the
gap that join statistics aim to fill.

Proposed Solution
------------------

I've attached two patches:

0001: Extend CREATE STATISTICS for join MCV statistics
0002: Automatically create join MCV statistics from FK constraints

0001 is the core patch. 0002 is an optional convenience feature
that detects FK relationships during ANALYZE and auto-creates
join stats.

Syntax:

CREATE STATISTICS <name> (mcv)
  ON <other_table>.<filter_col> [, ...]
  FROM <primary_table> alias
  JOIN <other_table> alias ON (<join_condition>);

Example -- single filter column:

CREATE STATISTICS mk_keyword_stats (mcv)
  ON k.keyword
  FROM movie_keyword mk
  JOIN keyword k ON (mk.keyword_id = k.id);
ANALYZE movie_keyword;

Example -- multiple filter columns:

CREATE STATISTICS mk_multi_stats (mcv)
  ON k.keyword, k.phonetic_code
  FROM movie_keyword mk
  JOIN keyword k ON (mk.keyword_id = k.id);
ANALYZE movie_keyword;

Catalog changes:

pg_statistic_ext:
- New stats kind 'c' (join MCV) in stxkind
- New field stxotherrel (Oid): the other table in the join
- New field stxjoinkeys (int2vector): join column pair [primary_joinkey,
  other_joinkey]
- Existing stxkeys stores the filter column(s) on stxotherrel
- New index pg_statistic_ext_otherrel_index on (stxrelid, stxotherrel)

pg_statistic_ext_data:
- New field stxdjoinmcv (pg_join_mcv_list): serialized join MCV data

How stats are collected:

Join statistics are collected during ANALYZE of the primary table
(stxrelid). The current approach assumes a dependency relationship
between the join key column and the filter column(s) on the other
table. Specifically, during ANALYZE, we reuse the already-computed
single-column MCV stats for the primary table's join key column. For
each MCV value, we look up the matching row in the other table via the
join key and extract the filter column values, carrying over the
primary-side MCV frequency. This is not accurate in all cases, but
works reasonably well for the POC, especially for foreign-key-like
joins where the primary table's join key distribution is
representative of the join result.

For example, the catalog contents look like:

stxrelid      | stxotherrel | stxjoinkeys | stxkeys | stxkind
--------------+-------------+-------------+---------+--------
movie_keyword | keyword     | 2 1         | 2       | {c}

index | values       | nulls | frequency
------+--------------+-------+----------
    0 | {keyword_1}  | {f}   |      0.06
    1 | {keyword_2}  | {f}   |      0.06
    2 | {keyword_3}  | {f}   |      0.06
  ... | ...          | ...   |       ...

How the planner uses join stats:

I added the join pattern detection logic in
clauselist_selectivity_ext() and get_foreign_key_join_selectivity().
The planner looks for a pattern of:

- An equijoin clause between two relations (the join key pair), and
- Equality or IN filter clauses on columns of one relation (the filter
  columns)

When this pattern is found and matching join MCV stats exist, the
planner compares the filter values against the stored MCV items to
compute the join selectivity, replacing the default estimate.

JOB Benchmark Results
---------------------

I created a single join statistics object for the (movie_keyword,
keyword) pair and reran the JOB benchmark:

CREATE STATISTICS movie_keyword_keyword_stats (mcv)
  ON k.keyword
  FROM movie_keyword mk
  JOIN keyword k ON (mk.keyword_id = k.id);

Total execution times (average of 3 runs):

                      Cold run          Warm run
Default (baseline)  140.75s (+/-2.1)  117.72s (+/-0.8)
enable_nestloop=off   87.16s (+/-0.2)   86.68s (+/-0.4)
With join stats       85.81s (+/-3.4)   65.24s (+/-0.5)

With a single join statistics object:
- Cold run: -39% vs baseline (140.75s -> 85.81s),
  comparable to enable_nestloop=off
- Warm run: -45% vs baseline (117.72s -> 65.24s),
  -25% vs enable_nestloop=off (86.68s -> 65.24s)

Per-query times for the 10 slowest (out of 113) queries on baseline
(master branch, enable_nestloop = on):

  Query | Baseline  | No Nestloop | Join Stats | Best
  ------+-----------+-------------+------------+----------
  16b   | 10,673 ms |   2,789 ms  |  2,775 ms  | 3.8x JS
  17e   |  7,709 ms |   2,260 ms  |  2,149 ms  | 3.6x JS
  17f   |  7,506 ms |   2,304 ms  |  2,460 ms  | 3.3x NL
  17a   |  7,288 ms |   1,453 ms  |  2,388 ms  | 5.0x NL
  17b   |  6,490 ms |   1,580 ms  |  5,413 ms  | 4.1x NL
  17c   |  6,240 ms |   1,095 ms  |  5,268 ms  | 5.7x NL
  17d   |  6,261 ms |   1,263 ms  |  5,291 ms  | 5.0x NL
  6d    |  6,234 ms |     988 ms  |  1,565 ms  | 6.3x NL
  25c   |  6,133 ms |   1,848 ms  |  1,738 ms  | 3.5x JS
  6f    |  5,728 ms |   1,785 ms  |  1,556 ms  | 3.7x JS

  (JS = join stats fastest, NL = no-nestloop fastest)

All 10 queries improved over baseline. For 16b, 3 nested loop joins
were replaced with 2 hash joins and 1 merge join. Some queries (e.g.
17b) kept the same plan shape and joins but gained a Memoize node from
better cardinality estimates.

Current Limitations
-------------------

This is a proof of concept. Known limitations include:

1. The current catalog design is not ideal. It is asymmetric (a
"primary" and an "other" table), which is natural for FK-like joins,
but less intuitive for other joins.

2. Stats collection piggybacks on ANALYZE of the primary table and
uses its single-column MCV for the join key.  This can be inaccurate
when the MCV values on the "primary" side don't cover the important
values on the other side, or when the filter column isn't fully
dependent on the join key. A more accurate approach would execute the
actual join during collection, which could also decouple join stats
collection from single-table ANALYZE.

3. Currently limited to: equality join clauses, equality and IN filter
clauses, simple Var stats objects (no expressions), inner joins only,
and two-way joins only. Some of these are easier to extend; others may
be harder or unnecessary (like n-way joins).

4. Patch 0002 (auto-creation from FK constraints) should probably be
gated behind a GUC. I'm not strongly attached to this patch, but kept
it because FK joins seem like a natural and common use case.

Conclusion
----------

Even with all the current limitations, I think this experiment
suggests that join-level statistics can significantly improve plans,
as a single join stats object can materially improve workload
performance.

If there's interest, I'm happy to continue iterating on the design.

In particular, I'd welcome feedback on:
- whether this is a direction worth pursuing,
- the catalog design,
- the stats collection approach,
- the planner integration strategy,
- and scope (what kinds of joins / predicates are worth supporting).

Best,
Alex

[1] https://www.vldb.org/pvldb/vol9/p204-leis.pdf
[2] https://github.com/gregrahn/join-order-benchmark
[3] https://cedardb.com/docs/example_datasets/job/
[4] https://github.com/l-wang/join-order-benchmark



-- 
Alexandra Wang
EDB: https://www.enterprisedb.com

Pager usage is off.
SET
                                                                                        QUERY PLAN                                                                                        
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=5353.71..5353.72 rows=1 width=64) (actual time=10607.642..10607.644 rows=1.00 loops=1)
   Buffers: shared hit=22223141 read=1381834
   ->  Nested Loop  (cost=9.37..5337.88 rows=3165 width=33) (actual time=2.503..9525.673 rows=3710592.00 loops=1)
         Join Filter: (an.person_id = n.id)
         Buffers: shared hit=22223141 read=1381834
         ->  Nested Loop  (cost=8.95..4667.49 rows=1378 width=25) (actual time=2.490..6179.436 rows=2832555.00 loops=1)
               Buffers: shared hit=11897439 read=1107501
               ->  Nested Loop  (cost=8.52..4048.53 rows=1378 width=21) (actual time=2.481..3069.508 rows=2832555.00 loops=1)
                     Join Filter: (ci.movie_id = t.id)
                     Buffers: shared hit=3159885 read=751362
                     ->  Nested Loop  (cost=8.08..3929.09 rows=63 width=29) (actual time=2.245..937.683 rows=68316.00 loops=1)
                           Buffers: shared hit=815660 read=146762
                           ->  Nested Loop  (cost=7.66..3850.15 rows=177 width=33) (actual time=2.224..598.822 rows=148552.00 loops=1)
                                 Join Filter: (mc.movie_id = t.id)
                                 Buffers: shared hit=295838 read=72376
                                 ->  Nested Loop  (cost=7.23..3829.71 rows=34 width=25) (actual time=2.185..494.519 rows=41840.00 loops=1)
                                       Buffers: shared hit=125415 read=54538
                                       ->  Nested Loop  (cost=6.80..3813.14 rows=34 width=4) (actual time=2.177..346.122 rows=41840.00 loops=1)
                                             Buffers: shared hit=24 read=12569
                                             ->  Seq Scan on keyword k  (cost=0.00..2685.11 rows=1 width=4) (actual time=0.382..7.022 rows=1.00 loops=1)
                                                   Filter: (keyword = 'character-name-in-title'::text)
                                                   Rows Removed by Filter: 134168
                                                   Buffers: shared hit=22 read=986
                                             ->  Bitmap Heap Scan on movie_keyword mk  (cost=6.80..1124.98 rows=305 width=8) (actual time=1.794..336.613 rows=41840.00 loops=1)
                                                   Recheck Cond: (k.id = keyword_id)
                                                   Heap Blocks: exact=11547
                                                   Buffers: shared hit=2 read=11583
                                                   ->  Bitmap Index Scan on keyword_id_movie_keyword  (cost=0.00..6.72 rows=305 width=0) (actual time=0.965..0.965 rows=41840.00 loops=1)
                                                         Index Cond: (keyword_id = k.id)
                                                         Index Searches: 1
                                                         Buffers: shared hit=2 read=36
                                       ->  Index Scan using title_pkey on title t  (cost=0.43..0.49 rows=1 width=21) (actual time=0.003..0.003 rows=1.00 loops=41840)
                                             Index Cond: (id = mk.movie_id)
                                             Index Searches: 41840
                                             Buffers: shared hit=125391 read=41969
                                 ->  Index Scan using movie_id_movie_companies on movie_companies mc  (cost=0.43..0.54 rows=5 width=8) (actual time=0.002..0.002 rows=3.55 loops=41840)
                                       Index Cond: (movie_id = mk.movie_id)
                                       Index Searches: 41840
                                       Buffers: shared hit=170423 read=17838
                           ->  Index Scan using company_name_pkey on company_name cn  (cost=0.42..0.45 rows=1 width=4) (actual time=0.002..0.002 rows=0.46 loops=148552)
                                 Index Cond: (id = mc.company_id)
                                 Filter: ((country_code)::text = '[us]'::text)
                                 Rows Removed by Filter: 1
                                 Index Searches: 148552
                                 Buffers: shared hit=519822 read=74386
                     ->  Index Scan using movie_id_cast_info on cast_info ci  (cost=0.44..1.40 rows=40 width=8) (actual time=0.003..0.029 rows=41.46 loops=68316)
                           Index Cond: (movie_id = mk.movie_id)
                           Index Searches: 68316
                           Buffers: shared hit=2344225 read=604600
               ->  Index Only Scan using name_pkey on name n  (cost=0.43..0.45 rows=1 width=4) (actual time=0.001..0.001 rows=1.00 loops=2832555)
                     Index Cond: (id = ci.person_id)
                     Heap Fetches: 0
                     Index Searches: 2832555
                     Buffers: shared hit=8737554 read=356139
         ->  Index Scan using person_id_aka_name on aka_name an  (cost=0.42..0.46 rows=2 width=20) (actual time=0.001..0.001 rows=1.31 loops=2832555)
               Index Cond: (person_id = ci.person_id)
               Index Searches: 2832555
               Buffers: shared hit=10325702 read=274333
 Planning:
   Buffers: shared hit=550 read=125
 Planning Time: 4.686 ms
 Execution Time: 10607.707 ms
(62 rows)


Pager usage is off.
SET
                                                                                                QUERY PLAN                                                                                                
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=280264.91..280264.92 rows=1 width=64) (actual time=2741.164..2763.554 rows=1.00 loops=1)
   Buffers: shared hit=4178268 read=1093110, temp read=22572 written=22652
   ->  Gather  (cost=280264.69..280264.90 rows=2 width=64) (actual time=2738.152..2763.550 rows=3.00 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=4178268 read=1093110, temp read=22572 written=22652
         ->  Partial Aggregate  (cost=279264.69..279264.70 rows=1 width=64) (actual time=2736.217..2736.250 rows=1.00 loops=3)
               Buffers: shared hit=4178268 read=1093110, temp read=22572 written=22652
               ->  Parallel Hash Join  (cost=136368.90..271857.81 rows=1481375 width=33) (actual time=2244.259..2397.555 rows=1236864.00 loops=3)
                     Hash Cond: (n.id = an.person_id)
                     Buffers: shared hit=4178268 read=1093110, temp read=22572 written=22652
                     ->  Nested Loop  (cost=114301.82..235944.12 rows=635491 width=25) (actual time=198.873..2044.905 rows=944185.00 loops=3)
                           Buffers: shared hit=4177986 read=1081976, temp read=1698 written=1704
                           ->  Nested Loop  (cost=114301.38..188137.25 rows=635491 width=21) (actual time=198.784..1236.382 rows=944185.00 loops=3)
                                 Join Filter: (ci.movie_id = t.id)
                                 Buffers: shared hit=2546486 read=732334, temp read=1698 written=1704
                                 ->  Nested Loop  (cost=114300.94..131790.25 rows=30693 width=29) (actual time=198.716..273.204 rows=22772.00 loops=3)
                                       Buffers: shared hit=268202 read=61791, temp read=1698 written=1704
                                       ->  Merge Join  (cost=114300.51..116833.42 rows=30693 width=8) (actual time=198.599..222.967 rows=22772.00 loops=3)
                                             Merge Cond: (mc.movie_id = mk.movie_id)
                                             Buffers: shared hit=23792 read=32935, temp read=1698 written=1704
                                             ->  Sort  (cost=81821.71..82837.92 rows=406485 width=4) (actual time=126.514..138.221 rows=384599.00 loops=3)
                                                   Sort Key: mc.movie_id
                                                   Sort Method: external merge  Disk: 4584kB
                                                   Buffers: shared hit=8 read=21864, temp read=1698 written=1704
                                                   Worker 0:  Sort Method: external merge  Disk: 4536kB
                                                   Worker 1:  Sort Method: external merge  Disk: 4464kB
                                                   ->  Parallel Hash Join  (cost=5393.42..38393.85 rows=406485 width=4) (actual time=10.192..94.773 rows=384599.00 loops=3)
                                                         Hash Cond: (mc.company_id = cn.id)
                                                         Buffers: shared read=21864
                                                         ->  Parallel Seq Scan on movie_companies mc  (cost=0.00..30052.81 rows=1122881 width=8) (actual time=0.108..28.237 rows=869709.33 loops=3)
                                                               Buffers: shared read=18824
                                                         ->  Parallel Hash  (cost=4767.91..4767.91 rows=50041 width=4) (actual time=10.007..10.007 rows=28281.00 loops=3)
                                                               Buckets: 131072  Batches: 1  Memory Usage: 4416kB
                                                               Buffers: shared read=3040
                                                               ->  Parallel Seq Scan on company_name cn  (cost=0.00..4767.91 rows=50041 width=4) (actual time=0.104..8.026 rows=28281.00 loops=3)
                                                                     Filter: ((country_code)::text = '[us]'::text)
                                                                     Rows Removed by Filter: 50051
                                                                     Buffers: shared read=3040
                                             ->  Sort  (cost=32478.65..32575.53 rows=38749 width=4) (actual time=72.067..73.364 rows=52744.00 loops=3)
                                                   Sort Key: mk.movie_id
                                                   Sort Method: quicksort  Memory: 1537kB
                                                   Buffers: shared hit=23784 read=11071
                                                   Worker 0:  Sort Method: quicksort  Memory: 1537kB
                                                   Worker 1:  Sort Method: quicksort  Memory: 1537kB
                                                   ->  Nested Loop  (cost=444.74..29525.62 rows=38749 width=4) (actual time=2.513..69.487 rows=41840.00 loops=3)
                                                         Buffers: shared hit=23784 read=11071
                                                         ->  Seq Scan on keyword k  (cost=0.00..2685.11 rows=1 width=4) (actual time=0.302..3.581 rows=1.00 loops=3)
                                                               Filter: (keyword = 'character-name-in-title'::text)
                                                               Rows Removed by Filter: 134168
                                                               Buffers: shared hit=2987 read=37
                                                         ->  Bitmap Heap Scan on movie_keyword mk  (cost=444.74..26453.01 rows=38749 width=8) (actual time=2.209..63.804 rows=41840.00 loops=3)
                                                               Recheck Cond: (k.id = keyword_id)
                                                               Heap Blocks: exact=11547
                                                               Buffers: shared hit=20797 read=11034
                                                               ->  Bitmap Index Scan on keyword_id_movie_keyword  (cost=0.00..435.05 rows=38749 width=0) (actual time=1.266..1.266 rows=41840.00 loops=3)
                                                                     Index Cond: (keyword_id = k.id)
                                                                     Index Searches: 3
                                                                     Buffers: shared hit=80 read=36
                                       ->  Index Scan using title_pkey on title t  (cost=0.43..0.49 rows=1 width=21) (actual time=0.002..0.002 rows=1.00 loops=68316)
                                             Index Cond: (id = mk.movie_id)
                                             Index Searches: 68316
                                             Buffers: shared hit=244410 read=28856
                                 ->  Index Scan using movie_id_cast_info on cast_info ci  (cost=0.44..1.36 rows=38 width=8) (actual time=0.003..0.040 rows=41.46 loops=68316)
                                       Index Cond: (movie_id = mk.movie_id)
                                       Index Searches: 68316
                                       Buffers: shared hit=2278284 read=670543
                           ->  Memoize  (cost=0.44..0.46 rows=1 width=4) (actual time=0.001..0.001 rows=1.00 loops=2832555)
                                 Cache Key: ci.person_id
                                 Cache Mode: logical
                                 Estimates: capacity=71064 distinct keys=71064 lookups=635491 hit percent=88.82%
                                 Hits: 724363  Misses: 200258  Evictions: 119594  Overflows: 0  Memory Usage: 8193kB
                                 Buffers: shared hit=1631500 read=349642
                                 Worker 0:  Hits: 744868  Misses: 209441  Evictions: 128778  Overflows: 0  Memory Usage: 8193kB
                                 Worker 1:  Hits: 745857  Misses: 207768  Evictions: 127108  Overflows: 0  Memory Usage: 8193kB
                                 ->  Index Only Scan using name_pkey on name n  (cost=0.43..0.45 rows=1 width=4) (actual time=0.003..0.003 rows=1.00 loops=617467)
                                       Index Cond: (id = ci.person_id)
                                       Heap Fetches: 0
                                       Index Searches: 617467
                                       Buffers: shared hit=1631500 read=349642
                     ->  Parallel Hash  (cost=15171.59..15171.59 rows=375559 width=20) (actual time=48.749..48.750 rows=300447.33 loops=3)
                           Buckets: 131072  Batches: 8  Memory Usage: 7296kB
                           Buffers: shared hit=282 read=11134, temp written=4116
                           ->  Parallel Seq Scan on aka_name an  (cost=0.00..15171.59 rows=375559 width=20) (actual time=0.095..18.852 rows=300447.33 loops=3)
                                 Buffers: shared hit=282 read=11134
 Planning:
   Buffers: shared hit=837 read=144
 Planning Time: 5.321 ms
 Execution Time: 2764.320 ms
(89 rows)


Pager usage is off.
SET
                                                                                        QUERY PLAN                                                                                        
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=5353.71..5353.72 rows=1 width=64) (actual time=10607.642..10607.644 rows=1.00 loops=1)
   Buffers: shared hit=22223141 read=1381834
   ->  Nested Loop  (cost=9.37..5337.88 rows=3165 width=33) (actual time=2.503..9525.673 rows=3710592.00 loops=1)
         Join Filter: (an.person_id = n.id)
         Buffers: shared hit=22223141 read=1381834
         ->  Nested Loop  (cost=8.95..4667.49 rows=1378 width=25) (actual time=2.490..6179.436 rows=2832555.00 loops=1)
               Buffers: shared hit=11897439 read=1107501
               ->  Nested Loop  (cost=8.52..4048.53 rows=1378 width=21) (actual time=2.481..3069.508 rows=2832555.00 loops=1)
                     Join Filter: (ci.movie_id = t.id)
                     Buffers: shared hit=3159885 read=751362
                     ->  Nested Loop  (cost=8.08..3929.09 rows=63 width=29) (actual time=2.245..937.683 rows=68316.00 loops=1)
                           Buffers: shared hit=815660 read=146762
                           ->  Nested Loop  (cost=7.66..3850.15 rows=177 width=33) (actual time=2.224..598.822 rows=148552.00 loops=1)
                                 Join Filter: (mc.movie_id = t.id)
                                 Buffers: shared hit=295838 read=72376
                                 ->  Nested Loop  (cost=7.23..3829.71 rows=34 width=25) (actual time=2.185..494.519 rows=41840.00 loops=1)
                                       Buffers: shared hit=125415 read=54538
                                       ->  Nested Loop  (cost=6.80..3813.14 rows=34 width=4) (actual time=2.177..346.122 rows=41840.00 loops=1)
                                             Buffers: shared hit=24 read=12569
                                             ->  Seq Scan on keyword k  (cost=0.00..2685.11 rows=1 width=4) (actual time=0.382..7.022 rows=1.00 loops=1)
                                                   Filter: (keyword = 'character-name-in-title'::text)
                                                   Rows Removed by Filter: 134168
                                                   Buffers: shared hit=22 read=986
                                             ->  Bitmap Heap Scan on movie_keyword mk  (cost=6.80..1124.98 rows=305 width=8) (actual time=1.794..336.613 rows=41840.00 loops=1)
                                                   Recheck Cond: (k.id = keyword_id)
                                                   Heap Blocks: exact=11547
                                                   Buffers: shared hit=2 read=11583
                                                   ->  Bitmap Index Scan on keyword_id_movie_keyword  (cost=0.00..6.72 rows=305 width=0) (actual time=0.965..0.965 rows=41840.00 loops=1)
                                                         Index Cond: (keyword_id = k.id)
                                                         Index Searches: 1
                                                         Buffers: shared hit=2 read=36
                                       ->  Index Scan using title_pkey on title t  (cost=0.43..0.49 rows=1 width=21) (actual time=0.003..0.003 rows=1.00 loops=41840)
                                             Index Cond: (id = mk.movie_id)
                                             Index Searches: 41840
                                             Buffers: shared hit=125391 read=41969
                                 ->  Index Scan using movie_id_movie_companies on movie_companies mc  (cost=0.43..0.54 rows=5 width=8) (actual time=0.002..0.002 rows=3.55 loops=41840)
                                       Index Cond: (movie_id = mk.movie_id)
                                       Index Searches: 41840
                                       Buffers: shared hit=170423 read=17838
                           ->  Index Scan using company_name_pkey on company_name cn  (cost=0.42..0.45 rows=1 width=4) (actual time=0.002..0.002 rows=0.46 loops=148552)
                                 Index Cond: (id = mc.company_id)
                                 Filter: ((country_code)::text = '[us]'::text)
                                 Rows Removed by Filter: 1
                                 Index Searches: 148552
                                 Buffers: shared hit=519822 read=74386
                     ->  Index Scan using movie_id_cast_info on cast_info ci  (cost=0.44..1.40 rows=40 width=8) (actual time=0.003..0.029 rows=41.46 loops=68316)
                           Index Cond: (movie_id = mk.movie_id)
                           Index Searches: 68316
                           Buffers: shared hit=2344225 read=604600
               ->  Index Only Scan using name_pkey on name n  (cost=0.43..0.45 rows=1 width=4) (actual time=0.001..0.001 rows=1.00 loops=2832555)
                     Index Cond: (id = ci.person_id)
                     Heap Fetches: 0
                     Index Searches: 2832555
                     Buffers: shared hit=8737554 read=356139
         ->  Index Scan using person_id_aka_name on aka_name an  (cost=0.42..0.46 rows=2 width=20) (actual time=0.001..0.001 rows=1.31 loops=2832555)
               Index Cond: (person_id = ci.person_id)
               Index Searches: 2832555
               Buffers: shared hit=10325702 read=274333
 Planning:
   Buffers: shared hit=550 read=125
 Planning Time: 4.686 ms
 Execution Time: 10607.707 ms
(62 rows)



Attachments:

  [text/plain] 16b_nl.txt (5.6K, 3-16b_nl.txt)
  download | inline:
Pager usage is off.
SET
                                                                                        QUERY PLAN                                                                                        
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=5353.71..5353.72 rows=1 width=64) (actual time=10607.642..10607.644 rows=1.00 loops=1)
   Buffers: shared hit=22223141 read=1381834
   ->  Nested Loop  (cost=9.37..5337.88 rows=3165 width=33) (actual time=2.503..9525.673 rows=3710592.00 loops=1)
         Join Filter: (an.person_id = n.id)
         Buffers: shared hit=22223141 read=1381834
         ->  Nested Loop  (cost=8.95..4667.49 rows=1378 width=25) (actual time=2.490..6179.436 rows=2832555.00 loops=1)
               Buffers: shared hit=11897439 read=1107501
               ->  Nested Loop  (cost=8.52..4048.53 rows=1378 width=21) (actual time=2.481..3069.508 rows=2832555.00 loops=1)
                     Join Filter: (ci.movie_id = t.id)
                     Buffers: shared hit=3159885 read=751362
                     ->  Nested Loop  (cost=8.08..3929.09 rows=63 width=29) (actual time=2.245..937.683 rows=68316.00 loops=1)
                           Buffers: shared hit=815660 read=146762
                           ->  Nested Loop  (cost=7.66..3850.15 rows=177 width=33) (actual time=2.224..598.822 rows=148552.00 loops=1)
                                 Join Filter: (mc.movie_id = t.id)
                                 Buffers: shared hit=295838 read=72376
                                 ->  Nested Loop  (cost=7.23..3829.71 rows=34 width=25) (actual time=2.185..494.519 rows=41840.00 loops=1)
                                       Buffers: shared hit=125415 read=54538
                                       ->  Nested Loop  (cost=6.80..3813.14 rows=34 width=4) (actual time=2.177..346.122 rows=41840.00 loops=1)
                                             Buffers: shared hit=24 read=12569
                                             ->  Seq Scan on keyword k  (cost=0.00..2685.11 rows=1 width=4) (actual time=0.382..7.022 rows=1.00 loops=1)
                                                   Filter: (keyword = 'character-name-in-title'::text)
                                                   Rows Removed by Filter: 134168
                                                   Buffers: shared hit=22 read=986
                                             ->  Bitmap Heap Scan on movie_keyword mk  (cost=6.80..1124.98 rows=305 width=8) (actual time=1.794..336.613 rows=41840.00 loops=1)
                                                   Recheck Cond: (k.id = keyword_id)
                                                   Heap Blocks: exact=11547
                                                   Buffers: shared hit=2 read=11583
                                                   ->  Bitmap Index Scan on keyword_id_movie_keyword  (cost=0.00..6.72 rows=305 width=0) (actual time=0.965..0.965 rows=41840.00 loops=1)
                                                         Index Cond: (keyword_id = k.id)
                                                         Index Searches: 1
                                                         Buffers: shared hit=2 read=36
                                       ->  Index Scan using title_pkey on title t  (cost=0.43..0.49 rows=1 width=21) (actual time=0.003..0.003 rows=1.00 loops=41840)
                                             Index Cond: (id = mk.movie_id)
                                             Index Searches: 41840
                                             Buffers: shared hit=125391 read=41969
                                 ->  Index Scan using movie_id_movie_companies on movie_companies mc  (cost=0.43..0.54 rows=5 width=8) (actual time=0.002..0.002 rows=3.55 loops=41840)
                                       Index Cond: (movie_id = mk.movie_id)
                                       Index Searches: 41840
                                       Buffers: shared hit=170423 read=17838
                           ->  Index Scan using company_name_pkey on company_name cn  (cost=0.42..0.45 rows=1 width=4) (actual time=0.002..0.002 rows=0.46 loops=148552)
                                 Index Cond: (id = mc.company_id)
                                 Filter: ((country_code)::text = '[us]'::text)
                                 Rows Removed by Filter: 1
                                 Index Searches: 148552
                                 Buffers: shared hit=519822 read=74386
                     ->  Index Scan using movie_id_cast_info on cast_info ci  (cost=0.44..1.40 rows=40 width=8) (actual time=0.003..0.029 rows=41.46 loops=68316)
                           Index Cond: (movie_id = mk.movie_id)
                           Index Searches: 68316
                           Buffers: shared hit=2344225 read=604600
               ->  Index Only Scan using name_pkey on name n  (cost=0.43..0.45 rows=1 width=4) (actual time=0.001..0.001 rows=1.00 loops=2832555)
                     Index Cond: (id = ci.person_id)
                     Heap Fetches: 0
                     Index Searches: 2832555
                     Buffers: shared hit=8737554 read=356139
         ->  Index Scan using person_id_aka_name on aka_name an  (cost=0.42..0.46 rows=2 width=20) (actual time=0.001..0.001 rows=1.31 loops=2832555)
               Index Cond: (person_id = ci.person_id)
               Index Searches: 2832555
               Buffers: shared hit=10325702 read=274333
 Planning:
   Buffers: shared hit=550 read=125
 Planning Time: 4.686 ms
 Execution Time: 10607.707 ms
(62 rows)


  [application/octet-stream] v1-0002-Automatically-create-join-MCV-statistics-from-FK-.patch (33.2K, 4-v1-0002-Automatically-create-join-MCV-statistics-from-FK-.patch)
  download | inline diff:
From acecea4421a69d70ba07786682237872a3deb50e Mon Sep 17 00:00:00 2001
From: Alexandra Wang <[email protected]>
Date: Mon, 26 Jan 2026 14:00:59 -0800
Subject: [PATCH v1 2/2] Automatically create join MCV statistics from FK
 constraints

This commit adds automatic detection and creation of join MCV statistics
during ANALYZE when the table has foreign key constraints and the
referenced table has functional dependencies.

Example:

-- Given these tables:
CREATE TABLE keywords (id INT PRIMARY KEY, keyword TEXT);
CREATE TABLE movie_keywords (
    movie_id INT,
    keyword_id INT REFERENCES keywords(id)
);

-- And functional dependencies on referenced table:
CREATE STATISTICS kw_deps (dependencies) ON id, keyword FROM keywords;

-- ANALYZE automatically detects the FK and creates join MCV stats for
-- movie_keywords.keyword_id and keywords.keyword.
ANALYZE movie_keywords;

Current limitation: currently, the functional dependency values are
just assumed to be 1, which could lead to less accurate estimation.
---
 src/backend/statistics/extended_stats.c       | 494 ++++++++++++++++++
 .../statistics/extended_stats_internal.h      |  11 +
 .../regress/expected/stats_ext_crossrel.out   | 249 +++++++++
 src/test/regress/sql/stats_ext_crossrel.sql   | 140 +++++
 4 files changed, 894 insertions(+)

diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index 8939f56cc67..5b8b4cf6c42 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -20,7 +20,11 @@
 #include "access/genam.h"
 #include "access/htup_details.h"
 #include "access/table.h"
+#include "access/tableam.h"
+#include "catalog/dependency.h"
 #include "catalog/indexing.h"
+#include "catalog/pg_constraint.h"
+#include "catalog/pg_namespace.h"
 #include "catalog/pg_statistic_ext.h"
 #include "catalog/pg_statistic_ext_data.h"
 #include "commands/defrem.h"
@@ -129,6 +133,22 @@ bms_to_int2vector(Bitmapset *bms)
 	return buildint2vector(attnums, nmembers);
 }
 
+/* Join MCV functions */
+static bool statext_join_mcv_exists(List *statslist,
+									Oid primary_rel, AttrNumber primary_joinkey_attr,
+									Oid other_rel, AttrNumber other_joinkey_attr,
+									AttrNumber filter_attr);
+static void build_implicit_join_mcv_stats_from_fk(Relation onerel, List *statslist,
+												  Relation pg_stext, bool inh,
+												  int numrows, HeapTuple *rows,
+												  int natts, VacAttrStats **vacattrstats);
+static List *detect_join_stats_candidates(Relation rel);
+static Oid	create_implicit_join_stat(Relation pg_stext,
+									  Oid referencing_rel,
+									  AttrNumber referencing_attr,
+									  Oid referenced_rel,
+									  AttrNumber referenced_attr,
+									  AttrNumber filter_attr);
 
 /*
  * Compute requested extended stats, using the rows sampled for the plain
@@ -320,6 +340,14 @@ BuildRelationExtStatistics(Relation onerel, bool inh, double totalrows,
 		MemoryContextReset(cxt);
 	}
 
+	/*
+	 * Now that we've built stats for explicit and previously existing
+	 * implicit stats objects, detect and build new implicit FK-based join
+	 * stats if available.
+	 */
+	build_implicit_join_mcv_stats_from_fk(onerel, statslist, pg_stext, inh,
+										  numrows, rows, natts, vacattrstats);
+
 	MemoryContextSwitchTo(oldcxt);
 	MemoryContextDelete(cxt);
 
@@ -2807,3 +2835,469 @@ make_build_data(Relation rel, StatExtEntry *stat, int numrows, HeapTuple *rows,
 
 	return result;
 }
+
+/*
+ * statext_join_mcv_exists
+ *		Check if a join mcv statistic already exists in the statslist.
+ *
+ * This prevents duplicate creation during FK detection when the stat
+ * was already manually created or exists from a previous ANALYZE.
+ */
+static bool
+statext_join_mcv_exists(List *statslist,
+						Oid primary_rel, AttrNumber primary_joinkey_attr,
+						Oid other_rel, AttrNumber other_joinkey_attr,
+						AttrNumber filter_attr)
+{
+	ListCell   *lc;
+
+	foreach(lc, statslist)
+	{
+		StatExtEntry *stat = (StatExtEntry *) lfirst(lc);
+
+		if (!list_member_int(stat->types, STATS_EXT_JOIN_MCV))
+			continue;
+
+		if (stat->otherrel != other_rel)
+			continue;
+
+		/* Check if join keys match: [target_joinkey, other_joinkey] */
+		if (!stat->joinkeys || stat->joinkeys->dim1 != 2)
+			continue;
+		if (stat->joinkeys->values[0] != primary_joinkey_attr ||
+			stat->joinkeys->values[1] != other_joinkey_attr)
+			continue;
+
+		/* Check if filter column matches */
+		if (bms_num_members(stat->columns) != 1)
+			continue;
+		if (!bms_is_member(filter_attr, stat->columns))
+			continue;
+
+		/* Found a match! */
+		return true;
+	}
+
+	return false;
+}
+
+/*
+ * get_functional_dependents
+ *		Find columns in referenced table that are functionally dependent on
+ *		the referenced column.
+ *
+ * Scans pg_statistic_ext for the referenced table, loads any functional
+ * dependency statistics, and returns a list of AttrNumbers for columns
+ * that have a dependency: referenced_attr → column
+ *
+ * Returns: List of AttrNumbers
+ */
+static List *
+get_functional_dependents(Oid referenced_rel, AttrNumber referenced_attr)
+{
+	List	   *dependents = NIL;
+	Relation	pg_stext;
+	SysScanDesc scan;
+	ScanKeyData key;
+	HeapTuple	tuple;
+
+	/* Open pg_statistic_ext and scan for entries on referenced_rel */
+	pg_stext = table_open(StatisticExtRelationId, AccessShareLock);
+
+	ScanKeyInit(&key,
+				Anum_pg_statistic_ext_stxrelid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(referenced_rel));
+
+	scan = systable_beginscan(pg_stext, StatisticExtRelidIndexId, true,
+							  NULL, 1, &key);
+
+	while (HeapTupleIsValid(tuple = systable_getnext(scan)))
+	{
+		Form_pg_statistic_ext stat = (Form_pg_statistic_ext) GETSTRUCT(tuple);
+		MVDependencies *dependencies;
+		int			i;
+
+		/* Check if this stats object has dependencies */
+		if (!statext_is_kind_built(tuple, STATS_EXT_DEPENDENCIES))
+			continue;
+
+		/* Load dependencies */
+		dependencies = statext_dependencies_load(stat->oid, false);
+		if (!dependencies)
+			continue;
+
+		/* Scan each dependency for pattern: referenced_attr → other_col */
+		for (i = 0; i < dependencies->ndeps; i++)
+		{
+			MVDependency *dep = dependencies->deps[i];
+
+			/*
+			 * Check if this is a simple dependency (2 attributes) where the
+			 * first attribute is our referenced_attr
+			 */
+			if (dep->nattributes == 2 &&
+				dep->attributes[0] == referenced_attr)
+			{
+				AttrNumber	dependent_attr = dep->attributes[1];
+
+				/* Add to list if not already there */
+				if (!list_member_int(dependents, dependent_attr))
+					dependents = lappend_int(dependents, dependent_attr);
+			}
+		}
+
+		pfree(dependencies);
+	}
+
+	systable_endscan(scan);
+	table_close(pg_stext, AccessShareLock);
+
+	return dependents;
+}
+
+/*
+ * build_implicit_join_mcv_stats_from_fk
+ *		Build implicit FK-based join MCV statistics for the relation being analyzed
+ */
+static void
+build_implicit_join_mcv_stats_from_fk(Relation onerel, List *statslist,
+									  Relation pg_stext, bool inh,
+									  int numrows, HeapTuple *rows,
+									  int natts, VacAttrStats **vacattrstats)
+{
+	List	   *candidates;
+	ListCell   *lc_cand;
+
+	elog(DEBUG1, "Join mcv stats: Starting FK-based candidate detection for relation %s",
+		 RelationGetRelationName(onerel));
+
+	/* Detect candidates based on FK constraints and functional dependencies */
+	candidates = detect_join_stats_candidates(onerel);
+
+	foreach(lc_cand, candidates)
+	{
+		FKJoinStatsCandidate *candidate = (FKJoinStatsCandidate *) lfirst(lc_cand);
+		JoinMCVList *join_mcv;
+		Oid			stat_oid;
+		int2vector *joinkeys;
+		int2vector *filter_attnums;
+		int16		joinkeys_array[2];
+		int16		filter_array[1];
+
+		elog(DEBUG1, "Join mcv stats: Checking FK candidate for referencing_rel=%u attr=%d, "
+			 "referenced_rel=%u attr=%d, filter_attr=%d",
+			 candidate->referencing_rel, candidate->referencing_attr,
+			 candidate->referenced_rel, candidate->referenced_attr,
+			 candidate->filter_attr);
+
+		/*
+		 * Skip stat that already exists (either manually created or from a
+		 * previous ANALYZE).
+		 */
+		if (statext_join_mcv_exists(statslist,
+									candidate->referencing_rel,
+									candidate->referencing_attr,
+									candidate->referenced_rel,
+									candidate->referenced_attr,
+									candidate->filter_attr))
+		{
+			elog(DEBUG1, "Join mcv stats: Stat already exists, skipping FK candidate");
+			continue;
+		}
+
+		elog(DEBUG1, "Join mcv stats: Creating new implicit stat for FK candidate");
+
+		/* Create implicit statistics object first to get stat_oid */
+		stat_oid = create_implicit_join_stat(pg_stext,
+											 candidate->referencing_rel,
+											 candidate->referencing_attr,
+											 candidate->referenced_rel,
+											 candidate->referenced_attr,
+											 candidate->filter_attr);
+
+		/* Construct joinkeys: [referencing_attr, referenced_attr] */
+		joinkeys_array[0] = candidate->referencing_attr;
+		joinkeys_array[1] = candidate->referenced_attr;
+		joinkeys = buildint2vector(joinkeys_array, 2);
+
+		/* Construct filter_attnums: [filter_attr] */
+		filter_array[0] = candidate->filter_attr;
+		filter_attnums = buildint2vector(filter_array, 1);
+
+		join_mcv = statext_join_mcv_build(stat_oid,
+										  candidate->referencing_rel,
+										  candidate->referenced_rel,
+										  joinkeys,
+										  filter_attnums,
+										  numrows,
+										  rows,
+										  natts,
+										  vacattrstats);
+
+		elog(DEBUG1, "Join mcv stats: Built MCV list with %d items",
+			 join_mcv ? join_mcv->nitems : 0);
+
+		if (!join_mcv)
+			continue;
+
+		statext_store(stat_oid, inh,
+					  NULL, NULL, NULL, (Datum) 0, NULL,
+					  join_mcv);
+
+		pfree(join_mcv);
+	}
+
+	list_free_deep(candidates);
+}
+
+/*
+ * detect_join_stats_candidates
+ *		Scan foreign key constraints on the analyzed table and identify
+ *		candidates for join MCV statistics collection.
+ *
+ * For each FK constraint:
+ * 1. Get the FK column (referencing side)
+ * 2. Get the referenced table and column
+ * 3. Find columns in referenced table with functional dependencies
+ * 4. Create a candidate for each dependent column
+ *
+ * Returns: List of FKJoinStatsCandidate structs
+ */
+static List *
+detect_join_stats_candidates(Relation rel)
+{
+	List	   *candidates = NIL;
+	Relation	pg_constraint;
+	SysScanDesc scan;
+	ScanKeyData key;
+	HeapTuple	tuple;
+
+	/* Open pg_constraint and scan for FK constraints on this table */
+	pg_constraint = table_open(ConstraintRelationId, AccessShareLock);
+
+	ScanKeyInit(&key,
+				Anum_pg_constraint_conrelid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(RelationGetRelid(rel)));
+
+	scan = systable_beginscan(pg_constraint, ConstraintRelidTypidNameIndexId,
+							  true, NULL, 1, &key);
+
+	while (HeapTupleIsValid(tuple = systable_getnext(scan)))
+	{
+		Form_pg_constraint con = (Form_pg_constraint) GETSTRUCT(tuple);
+		Datum		conkey_datum;
+		Datum		confkey_datum;
+		bool		isnull;
+		ArrayType  *conkey_array;
+		ArrayType  *confkey_array;
+		int			numfks;
+		AttrNumber *conkey;
+		AttrNumber *confkey;
+		List	   *dependents;
+		ListCell   *lc;
+		int			i;
+
+		/* Only interested in foreign key constraints */
+		if (con->contype != CONSTRAINT_FOREIGN)
+			continue;
+
+		/* Get the FK column numbers */
+		conkey_datum = heap_getattr(tuple, Anum_pg_constraint_conkey,
+									RelationGetDescr(pg_constraint), &isnull);
+		Assert(!isnull);
+		conkey_array = DatumGetArrayTypeP(conkey_datum);
+
+		/* Get the referenced column numbers */
+		confkey_datum = heap_getattr(tuple, Anum_pg_constraint_confkey,
+									 RelationGetDescr(pg_constraint), &isnull);
+		Assert(!isnull);
+		confkey_array = DatumGetArrayTypeP(confkey_datum);
+
+		/* Decode the arrays */
+		numfks = ARR_DIMS(conkey_array)[0];
+		conkey = (AttrNumber *) ARR_DATA_PTR(conkey_array);
+		confkey = (AttrNumber *) ARR_DATA_PTR(confkey_array);
+
+		/*
+		 * For each FK column, find functional dependents in referenced table
+		 * TODO: For now we only handle single-column FKs
+		 */
+		for (i = 0; i < numfks; i++)
+		{
+			AttrNumber	referencing_attr = conkey[i];
+			AttrNumber	referenced_attr = confkey[i];
+			Oid			referenced_rel = con->confrelid;
+
+			/* Find columns dependent on the referenced column */
+			dependents = get_functional_dependents(referenced_rel, referenced_attr);
+
+			/* Create a candidate for each dependent column */
+			foreach(lc, dependents)
+			{
+				AttrNumber	filter_attr = lfirst_int(lc);
+				FKJoinStatsCandidate *candidate;
+
+				candidate = (FKJoinStatsCandidate *) palloc(sizeof(FKJoinStatsCandidate));
+				candidate->referencing_rel = RelationGetRelid(rel);
+				candidate->referencing_attr = referencing_attr;
+				candidate->referenced_rel = referenced_rel;
+				candidate->referenced_attr = referenced_attr;
+				candidate->filter_attr = filter_attr;
+
+				candidates = lappend(candidates, candidate);
+			}
+
+			list_free(dependents);
+		}
+	}
+
+	systable_endscan(scan);
+	table_close(pg_constraint, AccessShareLock);
+
+	return candidates;
+}
+
+/*
+ * create_implicit_join_stat
+ *		Create an implicit pg_statistic_ext entry for join MCV statistics
+ *
+ * This creates a statistics object in pg_statistic_ext to hold the
+ * join MCV data. The object is "implicit" in the sense that it's
+ * automatically created by ANALYZE, not explicitly via CREATE STATISTICS.
+ *
+ * Returns the OID of the created statistics object, or InvalidOid if creation
+ * failed or an equivalent object already exists.
+ */
+static Oid
+create_implicit_join_stat(Relation pg_stext,
+						  Oid referencing_rel,
+						  AttrNumber referencing_attr,
+						  Oid referenced_rel,
+						  AttrNumber referenced_attr,
+						  AttrNumber filter_attr)
+{
+	NameData	stat_name;
+	Oid			stat_oid;
+	Datum		values[Natts_pg_statistic_ext];
+	bool		nulls[Natts_pg_statistic_ext];
+	HeapTuple	htup;
+	Datum		kinds[1];
+	ArrayType  *stxkind;
+	int16		attnums[1];
+	ArrayType  *stxkeys;
+	char		namebuf[NAMEDATALEN];
+	HeapTuple	classtuple;
+	Form_pg_class classform;
+	int16		joinkeys[2];
+	int2vector *jk;
+	Oid			stxowner;
+	ObjectAddress myself;
+	ObjectAddress parentobject;
+
+	/*
+	 * Generate a name for this statistics object in format:
+	 * "_join_mcv_<refrel>_<refattr>_<refedrel>_<refedattr>_<filterattr>"
+	 */
+	snprintf(namebuf, NAMEDATALEN, "_join_mcv_%u_%d_%u_%d_%d",
+			 referencing_rel, referencing_attr,
+			 referenced_rel, referenced_attr, filter_attr);
+	namestrcpy(&stat_name, namebuf);
+
+	/*
+	 * Check if a statistics object with this name already exists in this
+	 * namespace. If so, we'll reuse it rather than creating a duplicate.
+	 */
+	stat_oid = GetSysCacheOid2(STATEXTNAMENSP, Anum_pg_statistic_ext_oid,
+							   PointerGetDatum(&stat_name),
+							   ObjectIdGetDatum(get_rel_namespace(referencing_rel)));
+	if (OidIsValid(stat_oid))
+		return stat_oid;
+
+	/* Create new statistics object */
+	memset(values, 0, sizeof(values));
+	memset(nulls, false, sizeof(nulls));
+
+	/* Generate new OID */
+	stat_oid = GetNewOidWithIndex(pg_stext, StatisticExtOidIndexId,
+								  Anum_pg_statistic_ext_oid);
+
+	values[Anum_pg_statistic_ext_oid - 1] = ObjectIdGetDatum(stat_oid);
+	values[Anum_pg_statistic_ext_stxrelid - 1] = ObjectIdGetDatum(referencing_rel);
+	values[Anum_pg_statistic_ext_stxname - 1] = NameGetDatum(&stat_name);
+	values[Anum_pg_statistic_ext_stxnamespace - 1] =
+		ObjectIdGetDatum(get_rel_namespace(referencing_rel));
+
+	/* Get owner from pg_class */
+	classtuple = SearchSysCache1(RELOID, ObjectIdGetDatum(referencing_rel));
+	if (!HeapTupleIsValid(classtuple))
+		elog(ERROR, "cache lookup failed for relation %u", referencing_rel);
+	classform = (Form_pg_class) GETSTRUCT(classtuple);
+	stxowner = classform->relowner;
+	values[Anum_pg_statistic_ext_stxowner - 1] = ObjectIdGetDatum(stxowner);
+	ReleaseSysCache(classtuple);
+
+	/* stxkeys contains the filter column (from referenced table) */
+	attnums[0] = filter_attr;
+	stxkeys = construct_array_builtin((Datum *) attnums, 1, INT2OID);
+	values[Anum_pg_statistic_ext_stxkeys - 1] = PointerGetDatum(stxkeys);
+
+	/* stxkind contains only STATS_EXT_JOIN_MCV */
+	kinds[0] = CharGetDatum(STATS_EXT_JOIN_MCV);
+	stxkind = construct_array_builtin(kinds, 1, CHAROID);
+	values[Anum_pg_statistic_ext_stxkind - 1] = PointerGetDatum(stxkind);
+
+	/* stxstattarget is NULL (use default) */
+	nulls[Anum_pg_statistic_ext_stxstattarget - 1] = true;
+
+	/* stxexprs is NULL (no expressions) */
+	nulls[Anum_pg_statistic_ext_stxexprs - 1] = true;
+
+	/* stxotherrel - the referenced/orther table */
+	values[Anum_pg_statistic_ext_stxotherrel - 1] = ObjectIdGetDatum(referenced_rel);
+
+	/* stxjoinkeys - join column pairs: [referencing_attr, referenced_attr] */
+	joinkeys[0] = referencing_attr;
+	joinkeys[1] = referenced_attr;
+	jk = buildint2vector(joinkeys, 2);
+	values[Anum_pg_statistic_ext_stxjoinkeys - 1] = PointerGetDatum(jk);
+
+	/* Insert the tuple */
+	htup = heap_form_tuple(RelationGetDescr(pg_stext), values, nulls);
+	CatalogTupleInsert(pg_stext, htup);
+	heap_freetuple(htup);
+
+	/*
+	 * Add dependencies on columns used in the stats, so that the stats object
+	 * goes away if any or all of them get dropped.
+	 */
+	ObjectAddressSet(myself, StatisticExtRelationId, stat_oid);
+
+	/* Dependency on filter column (from referenced table) */
+	ObjectAddressSubSet(parentobject, RelationRelationId,
+						referenced_rel, filter_attr);
+	recordDependencyOn(&myself, &parentobject, DEPENDENCY_AUTO);
+
+	/* Dependencies on join columns from both tables */
+	ObjectAddressSubSet(parentobject, RelationRelationId,
+						referencing_rel, referencing_attr);
+	recordDependencyOn(&myself, &parentobject, DEPENDENCY_AUTO);
+
+	ObjectAddressSubSet(parentobject, RelationRelationId,
+						referenced_rel, referenced_attr);
+	recordDependencyOn(&myself, &parentobject, DEPENDENCY_AUTO);
+
+	/*
+	 * Also add dependencies on namespace and owner.  These are required
+	 * because the stats object might have a different namespace and/or owner
+	 * than the underlying table(s).
+	 */
+	ObjectAddressSet(parentobject, NamespaceRelationId,
+					 get_rel_namespace(referencing_rel));
+	recordDependencyOn(&myself, &parentobject, DEPENDENCY_NORMAL);
+
+	recordDependencyOnOwner(StatisticExtRelationId, stat_oid, stxowner);
+
+	return stat_oid;
+}
diff --git a/src/include/statistics/extended_stats_internal.h b/src/include/statistics/extended_stats_internal.h
index 982d1babaa8..6a9668b13ec 100644
--- a/src/include/statistics/extended_stats_internal.h
+++ b/src/include/statistics/extended_stats_internal.h
@@ -68,6 +68,17 @@ typedef struct StatsBuildData
 	bool	  **nulls;
 } StatsBuildData;
 
+/* candidate for implicit foreign key join MCV statistics collection */
+typedef struct FKJoinStatsCandidate
+{
+	Oid			referencing_rel;
+	AttrNumber	referencing_attr;	/* FK column in referencing table */
+	Oid			referenced_rel;
+	AttrNumber	referenced_attr;	/* PK column in referenced table */
+	AttrNumber	filter_attr;	/* filter column in referenced table
+								 * (dependent on referenced_attr) */
+}			FKJoinStatsCandidate;
+
 extern MVNDistinct *statext_ndistinct_build(double totalrows, StatsBuildData *data);
 extern bytea *statext_ndistinct_serialize(MVNDistinct *ndistinct);
 extern MVNDistinct *statext_ndistinct_deserialize(bytea *data);
diff --git a/src/test/regress/expected/stats_ext_crossrel.out b/src/test/regress/expected/stats_ext_crossrel.out
index d41dd37db3f..2be46c87a16 100644
--- a/src/test/regress/expected/stats_ext_crossrel.out
+++ b/src/test/regress/expected/stats_ext_crossrel.out
@@ -283,6 +283,255 @@ CREATE STATISTICS bad_stats5 (mcv) ON lower(k.keyword) FROM movie_keywords2 mk J
 ERROR:  expressions are not supported in join statistics
 CREATE STATISTICS bad_stats6 (mcv) ON k.keyword FROM (movie_keywords2 mk JOIN keywords2 k ON (mk.keyword_id = k.id)) JOIN keywords2 k2 ON (k.id = k2.id);
 ERROR:  extended join statistics currently support only simple two-table joins
+--
+-- Test automatic collection of join MCV statistics for FK joins
+-- with functional dependencies.
+--
+CREATE TABLE keywords (
+    id INTEGER PRIMARY KEY,
+    keyword TEXT NOT NULL,
+    phonetic_code character varying(5)
+);
+CREATE TABLE movie_keywords (
+    movie_id INTEGER PRIMARY KEY,
+    keyword_id INTEGER NOT NULL REFERENCES keywords(id)
+);
+-- Insert tightly correlated data into the "referenced" table
+INSERT INTO keywords (id, keyword, phonetic_code)
+SELECT
+    i,
+    'keyword_' || i,
+    'ph_' || i
+FROM generate_series(1, 50) i;
+-- Insert data into the referencing table with skewed distribution
+INSERT INTO movie_keywords (movie_id, keyword_id)
+SELECT
+    i,
+    CASE
+        WHEN i % 100 < 60 THEN (i % 10) + 1      -- 60% keyword_ids 1-10 (6% frequency per keyword)
+        WHEN i % 100 < 90 THEN (i % 10) + 11     -- 30% keyword_ids 11-20 (3% frequency per keyword)
+        ELSE (i % 10) + 21                       -- 10% keyword_ids 21-30 (1% frequency per keyword)
+        END
+FROM generate_series(1, 10000) i;
+-- Create functional dependency statistics on the referenced table
+CREATE STATISTICS keywords_deps_stat (dependencies) ON id, keyword, phonetic_code FROM keywords;
+ANALYZE keywords;
+-- Analyze the referencing table to trigger join MCV collection
+ANALYZE movie_keywords;
+-- Show the join statistics
+SELECT s.stxrelid::regclass,
+       s.stxotherrel::regclass,
+       s.stxjoinkeys,
+       s.stxkeys,
+       s.stxkind,
+       s.stxstattarget,
+       s.stxexprs
+FROM pg_statistic_ext s
+WHERE s.stxrelid = 'movie_keywords'::regclass
+ORDER BY s.oid;
+    stxrelid    | stxotherrel | stxjoinkeys | stxkeys | stxkind | stxstattarget | stxexprs 
+----------------+-------------+-------------+---------+---------+---------------+----------
+ movie_keywords | keywords    | 2 1         | 2       | {c}     |               | 
+ movie_keywords | keywords    | 2 1         | 3       | {c}     |               | 
+(2 rows)
+
+-- Note: the MCV list items only store the filter values (join values are implicit)
+SELECT m.index,
+       m.values,
+       m.nulls,
+       ROUND(m.frequency::numeric, 2) AS frequency
+FROM pg_statistic_ext s
+         JOIN pg_statistic_ext_data d ON (s.oid = d.stxoid)
+         CROSS JOIN LATERAL pg_join_mcv_list_items(d.stxdjoinmcv) AS m
+WHERE s.stxrelid = 'movie_keywords'::regclass
+  AND 'c' = ANY(s.stxkind)
+ORDER BY s.stxname, m.index;
+ index |    values    | nulls | frequency 
+-------+--------------+-------+-----------
+     0 | {keyword_1}  | {f}   |      0.06
+     1 | {keyword_2}  | {f}   |      0.06
+     2 | {keyword_3}  | {f}   |      0.06
+     3 | {keyword_4}  | {f}   |      0.06
+     4 | {keyword_5}  | {f}   |      0.06
+     5 | {keyword_6}  | {f}   |      0.06
+     6 | {keyword_7}  | {f}   |      0.06
+     7 | {keyword_8}  | {f}   |      0.06
+     8 | {keyword_9}  | {f}   |      0.06
+     9 | {keyword_10} | {f}   |      0.06
+    10 | {keyword_11} | {f}   |      0.03
+    11 | {keyword_12} | {f}   |      0.03
+    12 | {keyword_13} | {f}   |      0.03
+    13 | {keyword_14} | {f}   |      0.03
+    14 | {keyword_15} | {f}   |      0.03
+    15 | {keyword_16} | {f}   |      0.03
+    16 | {keyword_17} | {f}   |      0.03
+    17 | {keyword_18} | {f}   |      0.03
+    18 | {keyword_19} | {f}   |      0.03
+    19 | {keyword_20} | {f}   |      0.03
+    20 | {keyword_21} | {f}   |      0.01
+    21 | {keyword_22} | {f}   |      0.01
+    22 | {keyword_23} | {f}   |      0.01
+    23 | {keyword_24} | {f}   |      0.01
+    24 | {keyword_25} | {f}   |      0.01
+    25 | {keyword_26} | {f}   |      0.01
+    26 | {keyword_27} | {f}   |      0.01
+    27 | {keyword_28} | {f}   |      0.01
+    28 | {keyword_29} | {f}   |      0.01
+    29 | {keyword_30} | {f}   |      0.01
+     0 | {ph_1}       | {f}   |      0.06
+     1 | {ph_2}       | {f}   |      0.06
+     2 | {ph_3}       | {f}   |      0.06
+     3 | {ph_4}       | {f}   |      0.06
+     4 | {ph_5}       | {f}   |      0.06
+     5 | {ph_6}       | {f}   |      0.06
+     6 | {ph_7}       | {f}   |      0.06
+     7 | {ph_8}       | {f}   |      0.06
+     8 | {ph_9}       | {f}   |      0.06
+     9 | {ph_10}      | {f}   |      0.06
+    10 | {ph_11}      | {f}   |      0.03
+    11 | {ph_12}      | {f}   |      0.03
+    12 | {ph_13}      | {f}   |      0.03
+    13 | {ph_14}      | {f}   |      0.03
+    14 | {ph_15}      | {f}   |      0.03
+    15 | {ph_16}      | {f}   |      0.03
+    16 | {ph_17}      | {f}   |      0.03
+    17 | {ph_18}      | {f}   |      0.03
+    18 | {ph_19}      | {f}   |      0.03
+    19 | {ph_20}      | {f}   |      0.03
+    20 | {ph_21}      | {f}   |      0.01
+    21 | {ph_22}      | {f}   |      0.01
+    22 | {ph_23}      | {f}   |      0.01
+    23 | {ph_24}      | {f}   |      0.01
+    24 | {ph_25}      | {f}   |      0.01
+    25 | {ph_26}      | {f}   |      0.01
+    26 | {ph_27}      | {f}   |      0.01
+    27 | {ph_28}      | {f}   |      0.01
+    28 | {ph_29}      | {f}   |      0.01
+    29 | {ph_30}      | {f}   |      0.01
+(60 rows)
+
+-- Ensure the join MCV statistics are used for single equality predicates
+-- on the filter column of the referenced table
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords mk, keywords k
+    WHERE k.keyword = ''keyword_1'' AND k.id = mk.keyword_id
+');
+ estimated | actual 
+-----------+--------
+       600 |    600
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords mk, keywords k
+    WHERE k.keyword = ''keyword_15'' AND k.id = mk.keyword_id
+');
+ estimated | actual 
+-----------+--------
+       300 |    300
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords mk, keywords k
+    WHERE k.keyword = ''keyword_25'' AND k.id = mk.keyword_id
+');
+ estimated | actual 
+-----------+--------
+       100 |    100
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords mk, keywords k
+    WHERE k.phonetic_code = ''ph_1'' AND k.id = mk.keyword_id
+');
+ estimated | actual 
+-----------+--------
+       600 |    600
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords mk, keywords k
+    WHERE k.phonetic_code = ''ph_15'' AND k.id = mk.keyword_id
+');
+ estimated | actual 
+-----------+--------
+       300 |    300
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords mk, keywords k
+    WHERE k.phonetic_code = ''ph_25'' AND k.id = mk.keyword_id
+');
+ estimated | actual 
+-----------+--------
+       100 |    100
+(1 row)
+
+-- No filter on referenced table, should not use join stats
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords mk, keywords k
+    WHERE k.id = mk.keyword_id
+');
+ estimated | actual 
+-----------+--------
+     10000 |  10000
+(1 row)
+
+-- OR Predicates (currently NOT supported by join MCV stats)
+-- Expected: Will likely underestimate
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords mk, keywords k
+    WHERE (k.keyword = ''keyword_1'' OR k.keyword = ''keyword_2'')
+      AND k.id = mk.keyword_id
+');
+ estimated | actual 
+-----------+--------
+       400 |   1200
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords mk, keywords k
+    WHERE (k.keyword = ''keyword_1'' OR k.keyword = ''keyword_15'')
+      AND k.id = mk.keyword_id
+');
+ estimated | actual 
+-----------+--------
+       400 |    900
+(1 row)
+
+-- Ensure the join MCV statistics are used for IN predicates
+-- on the filter column of the referenced table
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords mk, keywords k
+    WHERE k.keyword IN (''keyword_1'', ''keyword_2'', ''keyword_3'')
+      AND k.id = mk.keyword_id
+');
+ estimated | actual 
+-----------+--------
+      1800 |   1800
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords mk, keywords k
+    WHERE k.keyword IN (''keyword_1'', ''keyword_15'', ''keyword_25'')
+      AND k.id = mk.keyword_id
+');
+ estimated | actual 
+-----------+--------
+      1000 |   1000
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords mk, keywords k
+    WHERE k.phonetic_code IN (''ph_1'', ''ph_15'')
+      AND k.id = mk.keyword_id
+');
+ estimated | actual 
+-----------+--------
+       900 |    900
+(1 row)
+
 -- Cleanup
 DROP TABLE movie_keywords2 CASCADE;
 DROP TABLE keywords2 CASCADE;
+DROP TABLE movie_keywords CASCADE;
+DROP TABLE keywords CASCADE;
diff --git a/src/test/regress/sql/stats_ext_crossrel.sql b/src/test/regress/sql/stats_ext_crossrel.sql
index b771a30f34d..4ba9a9a926c 100644
--- a/src/test/regress/sql/stats_ext_crossrel.sql
+++ b/src/test/regress/sql/stats_ext_crossrel.sql
@@ -167,6 +167,146 @@ CREATE STATISTICS bad_stats4 (mcv) ON keyword FROM movie_keywords2 mk JOIN keywo
 CREATE STATISTICS bad_stats5 (mcv) ON lower(k.keyword) FROM movie_keywords2 mk JOIN keywords2 k ON (mk.keyword_id = k.id);
 CREATE STATISTICS bad_stats6 (mcv) ON k.keyword FROM (movie_keywords2 mk JOIN keywords2 k ON (mk.keyword_id = k.id)) JOIN keywords2 k2 ON (k.id = k2.id);
 
+--
+-- Test automatic collection of join MCV statistics for FK joins
+-- with functional dependencies.
+--
+
+CREATE TABLE keywords (
+    id INTEGER PRIMARY KEY,
+    keyword TEXT NOT NULL,
+    phonetic_code character varying(5)
+);
+
+CREATE TABLE movie_keywords (
+    movie_id INTEGER PRIMARY KEY,
+    keyword_id INTEGER NOT NULL REFERENCES keywords(id)
+);
+
+-- Insert tightly correlated data into the "referenced" table
+INSERT INTO keywords (id, keyword, phonetic_code)
+SELECT
+    i,
+    'keyword_' || i,
+    'ph_' || i
+FROM generate_series(1, 50) i;
+
+-- Insert data into the referencing table with skewed distribution
+INSERT INTO movie_keywords (movie_id, keyword_id)
+SELECT
+    i,
+    CASE
+        WHEN i % 100 < 60 THEN (i % 10) + 1      -- 60% keyword_ids 1-10 (6% frequency per keyword)
+        WHEN i % 100 < 90 THEN (i % 10) + 11     -- 30% keyword_ids 11-20 (3% frequency per keyword)
+        ELSE (i % 10) + 21                       -- 10% keyword_ids 21-30 (1% frequency per keyword)
+        END
+FROM generate_series(1, 10000) i;
+
+-- Create functional dependency statistics on the referenced table
+CREATE STATISTICS keywords_deps_stat (dependencies) ON id, keyword, phonetic_code FROM keywords;
+ANALYZE keywords;
+
+-- Analyze the referencing table to trigger join MCV collection
+ANALYZE movie_keywords;
+
+-- Show the join statistics
+SELECT s.stxrelid::regclass,
+       s.stxotherrel::regclass,
+       s.stxjoinkeys,
+       s.stxkeys,
+       s.stxkind,
+       s.stxstattarget,
+       s.stxexprs
+FROM pg_statistic_ext s
+WHERE s.stxrelid = 'movie_keywords'::regclass
+ORDER BY s.oid;
+
+-- Note: the MCV list items only store the filter values (join values are implicit)
+SELECT m.index,
+       m.values,
+       m.nulls,
+       ROUND(m.frequency::numeric, 2) AS frequency
+FROM pg_statistic_ext s
+         JOIN pg_statistic_ext_data d ON (s.oid = d.stxoid)
+         CROSS JOIN LATERAL pg_join_mcv_list_items(d.stxdjoinmcv) AS m
+WHERE s.stxrelid = 'movie_keywords'::regclass
+  AND 'c' = ANY(s.stxkind)
+ORDER BY s.stxname, m.index;
+
+-- Ensure the join MCV statistics are used for single equality predicates
+-- on the filter column of the referenced table
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords mk, keywords k
+    WHERE k.keyword = ''keyword_1'' AND k.id = mk.keyword_id
+');
+
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords mk, keywords k
+    WHERE k.keyword = ''keyword_15'' AND k.id = mk.keyword_id
+');
+
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords mk, keywords k
+    WHERE k.keyword = ''keyword_25'' AND k.id = mk.keyword_id
+');
+
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords mk, keywords k
+    WHERE k.phonetic_code = ''ph_1'' AND k.id = mk.keyword_id
+');
+
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords mk, keywords k
+    WHERE k.phonetic_code = ''ph_15'' AND k.id = mk.keyword_id
+');
+
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords mk, keywords k
+    WHERE k.phonetic_code = ''ph_25'' AND k.id = mk.keyword_id
+');
+
+-- No filter on referenced table, should not use join stats
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords mk, keywords k
+    WHERE k.id = mk.keyword_id
+');
+
+-- OR Predicates (currently NOT supported by join MCV stats)
+-- Expected: Will likely underestimate
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords mk, keywords k
+    WHERE (k.keyword = ''keyword_1'' OR k.keyword = ''keyword_2'')
+      AND k.id = mk.keyword_id
+');
+
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords mk, keywords k
+    WHERE (k.keyword = ''keyword_1'' OR k.keyword = ''keyword_15'')
+      AND k.id = mk.keyword_id
+');
+
+-- Ensure the join MCV statistics are used for IN predicates
+-- on the filter column of the referenced table
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords mk, keywords k
+    WHERE k.keyword IN (''keyword_1'', ''keyword_2'', ''keyword_3'')
+      AND k.id = mk.keyword_id
+');
+
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords mk, keywords k
+    WHERE k.keyword IN (''keyword_1'', ''keyword_15'', ''keyword_25'')
+      AND k.id = mk.keyword_id
+');
+
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords mk, keywords k
+    WHERE k.phonetic_code IN (''ph_1'', ''ph_15'')
+      AND k.id = mk.keyword_id
+');
+
 -- Cleanup
 DROP TABLE movie_keywords2 CASCADE;
 DROP TABLE keywords2 CASCADE;
+DROP TABLE movie_keywords CASCADE;
+DROP TABLE keywords CASCADE;
-- 
2.50.1 (Apple Git-155)



  [application/octet-stream] v1-0001-Extend-CREATE-STATISTICS-syntax-for-join-MCV-stat.patch (127.5K, 5-v1-0001-Extend-CREATE-STATISTICS-syntax-for-join-MCV-stat.patch)
  download | inline diff:
From db7092824ad3ca532104e71b76390aabf8c3a207 Mon Sep 17 00:00:00 2001
From: Alexandra Wang <[email protected]>
Date: Tue, 20 Jan 2026 09:59:35 -0800
Subject: [PATCH v1 1/2] Extend CREATE STATISTICS syntax for join MCV
 statistics

This patch extends CREATE STATISTICS to support creating join MCV
statistics using a JOIN clause.

Proposed syntax:

  CREATE STATISTICS stat_name (mcv)
  ON other_table.filter_col
  FROM primary_table JOIN other_table ON (primary_table.fk = other_table.pk);

Join MCV statistics capture the joint distribution of filter column
values and their frequencies in the primary (referencing) table. This
enables improved cardinality estimation for queries with selective
filters on joined tables.

In order to store the join mcv stats in the cataloag, The following
changes are made in pg_statistic_ext and pg_statistic_ext_data:

- Add new kind 'c' as an option of stxkind
- Add stxotherrel (oid) and stxjoinkeys (int2vector) fields to
  pg_statistic_ext
- Use the existing stxkeys (int2vector) to store the filter attributes
  of stxortherrel
- Create pg_statistic_ext_otherrel_index on (stxrelid, stxotherrel)
- Add stxdjoinmcv (pg_join_mcv_list) to pg_statistic_ext_data

Pattern detection (join + filter combinations) are added in the
following planner functions to calculate join mcv selectivity:

- clauselist_selectivity_ext()
- get_foreign_key_join_selectivity()

Current scope and limitations:

- Consider only joins on 2 tables
- Consider only one pair of equality join clause, but can be extended to multiple easily
- Filter clause support only Equality and IN operators for now
- Consider only Vars, not considering Expressions yet
- Mainly looked at inner join, but semi/anti/full shouldn't be too difficult
---
 src/backend/commands/statscmds.c              |  457 +++--
 src/backend/optimizer/path/clausesel.c        |   43 +-
 src/backend/optimizer/path/costsize.c         |   57 +-
 src/backend/parser/gram.y                     |    7 +
 src/backend/parser/parse_utilcmd.c            |  128 +-
 src/backend/statistics/Makefile               |    1 +
 src/backend/statistics/extended_stats.c       |  238 ++-
 src/backend/statistics/extended_stats_funcs.c |    1 +
 src/backend/statistics/join_mcv.c             | 1493 +++++++++++++++++
 src/backend/statistics/meson.build            |    1 +
 src/backend/tcop/utility.c                    |   52 +-
 src/include/catalog/catversion.h              |    2 +-
 src/include/catalog/pg_proc.dat               |   20 +
 src/include/catalog/pg_statistic_ext.h        |    8 +
 src/include/catalog/pg_statistic_ext_data.h   |    1 +
 src/include/catalog/pg_type.dat               |    7 +
 .../statistics/extended_stats_internal.h      |   13 +-
 src/include/statistics/statistics.h           |   84 +
 src/test/regress/expected/oidjoins.out        |   13 +
 src/test/regress/expected/stats_ext.out       |   14 +-
 .../regress/expected/stats_ext_crossrel.out   |  288 ++++
 src/test/regress/expected/type_sanity.out     |   10 +-
 src/test/regress/parallel_schedule            |    3 +
 src/test/regress/sql/oidjoins.sql             |   12 +
 src/test/regress/sql/stats_ext_crossrel.sql   |  172 ++
 25 files changed, 2943 insertions(+), 182 deletions(-)
 create mode 100644 src/backend/statistics/join_mcv.c
 create mode 100644 src/test/regress/expected/stats_ext_crossrel.out
 create mode 100644 src/test/regress/sql/stats_ext_crossrel.sql

diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index c1da79f36ba..529293576d0 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -43,6 +43,9 @@
 static char *ChooseExtendedStatisticName(const char *name1, const char *name2,
 										 const char *label, Oid namespaceid);
 static char *ChooseExtendedStatisticNameAddition(List *exprs);
+static bool extract_join_columns(JoinExpr *join, int stxkeys_varno,
+								 Oid *relid, Oid *other_relid,
+								 int16 *jk_attnums);
 
 
 /* qsort comparator for the attnums in CreateStatistics */
@@ -78,7 +81,7 @@ CreateStatistics(CreateStatsStmt *stmt, bool check_rights)
 	Datum		exprsDatum;
 	Relation	statrel;
 	Relation	rel = NULL;
-	Oid			relid;
+	Oid			relid = InvalidOid;
 	ObjectAddress parentobject,
 				myself;
 	Datum		types[4];		/* one for each possible type of statistic */
@@ -92,29 +95,30 @@ CreateStatistics(CreateStatsStmt *stmt, bool check_rights)
 	int			i;
 	ListCell   *cell;
 	ListCell   *cell2;
+	Node	   *rln;
+	JoinExpr   *join = NULL;
+	Relation	other_rel = NULL;
+	Oid			other_relid = InvalidOid;
+	int16		jk_attnums[2];	/* Join key attribute numbers: [rel's jk
+								 * attnum1, other_rel's jk attnum2] */
+	int			stxkeys_varno = 0;	/* For joins: which table has the
+									 * statistics columns */
 
 	Assert(IsA(stmt, CreateStatsStmt));
 
 	/*
-	 * Examine the FROM clause.  Currently, we only allow it to be a single
-	 * simple table, but later we'll probably allow multiple tables and JOIN
-	 * syntax.  The grammar is already prepared for that, so we have to check
-	 * here that what we got is what we can support.
+	 * Examine the FROM clause. We support either: 1. Single RangeVar for
+	 * single-table statistics 2. JoinExpr for join statistics
 	 */
 	if (list_length(stmt->relations) != 1)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("only a single relation is allowed in CREATE STATISTICS")));
-
-	foreach(cell, stmt->relations)
-	{
-		Node	   *rln = (Node *) lfirst(cell);
+				 errmsg("only a single relation or JOIN is allowed in CREATE STATISTICS")));
 
-		if (!IsA(rln, RangeVar))
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("only a single relation is allowed in CREATE STATISTICS")));
+	rln = (Node *) linitial(stmt->relations);
 
+	if (IsA(rln, RangeVar))
+	{
 		/*
 		 * CREATE STATISTICS will influence future execution plans but does
 		 * not interfere with currently executing plans.  So it should be
@@ -152,69 +156,19 @@ CreateStatistics(CreateStatsStmt *stmt, bool check_rights)
 					(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
 					 errmsg("permission denied: \"%s\" is a system catalog",
 							RelationGetRelationName(rel))));
-	}
-
-	Assert(rel);
-	relid = RelationGetRelid(rel);
 
-	/*
-	 * If the node has a name, split it up and determine creation namespace.
-	 * If not, put the object in the same namespace as the relation, and cons
-	 * up a name for it.  (This can happen either via "CREATE STATISTICS ..."
-	 * or via "CREATE TABLE ... (LIKE)".)
-	 */
-	if (stmt->defnames)
-		namespaceId = QualifiedNameGetCreationNamespace(stmt->defnames,
-														&namestr);
-	else
-	{
-		namespaceId = RelationGetNamespace(rel);
-		namestr = ChooseExtendedStatisticName(RelationGetRelationName(rel),
-											  ChooseExtendedStatisticNameAddition(stmt->exprs),
-											  "stat",
-											  namespaceId);
+		relid = RelationGetRelid(rel);
 	}
-	namestrcpy(&stxname, namestr);
-
-	/*
-	 * Check we have creation rights in target namespace.  Skip check if
-	 * caller doesn't want it.
-	 */
-	if (check_rights)
+	else if (IsA(rln, JoinExpr))
 	{
-		AclResult	aclresult;
-
-		aclresult = object_aclcheck(NamespaceRelationId, namespaceId,
-									GetUserId(), ACL_CREATE);
-		if (aclresult != ACLCHECK_OK)
-			aclcheck_error(aclresult, OBJECT_SCHEMA,
-						   get_namespace_name(namespaceId));
+		Assert(stmt->transformed);
+		join = (JoinExpr *) rln;
 	}
-
-	/*
-	 * Deal with the possibility that the statistics object already exists.
-	 */
-	if (SearchSysCacheExists2(STATEXTNAMENSP,
-							  CStringGetDatum(namestr),
-							  ObjectIdGetDatum(namespaceId)))
+	else
 	{
-		if (stmt->if_not_exists)
-		{
-			/*
-			 * Since stats objects aren't members of extensions (see comments
-			 * below), no need for checkMembershipInCurrentExtension here.
-			 */
-			ereport(NOTICE,
-					(errcode(ERRCODE_DUPLICATE_OBJECT),
-					 errmsg("statistics object \"%s\" already exists, skipping",
-							namestr)));
-			relation_close(rel, NoLock);
-			return InvalidObjectAddress;
-		}
-
 		ereport(ERROR,
-				(errcode(ERRCODE_DUPLICATE_OBJECT),
-				 errmsg("statistics object \"%s\" already exists", namestr)));
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("only a single relation or JOIN is allowed in CREATE STATISTICS")));
 	}
 
 	/*
@@ -252,6 +206,12 @@ CreateStatistics(CreateStatsStmt *stmt, bool check_rights)
 			Form_pg_attribute attForm;
 			TypeCacheEntry *type;
 
+			/* Join stats require table-qualified column names */
+			if (join)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+						 errmsg("join statistics require table-qualified column names")));
+
 			attname = selem->name;
 
 			atttuple = SearchSysCacheAttName(relid, attname);
@@ -290,6 +250,7 @@ CreateStatistics(CreateStatsStmt *stmt, bool check_rights)
 		{
 			Var		   *var = (Var *) selem->expr;
 			TypeCacheEntry *type;
+			Oid			var_relid;
 
 			/* Disallow use of system attributes in extended stats */
 			if (var->varattno <= 0)
@@ -297,22 +258,48 @@ CreateStatistics(CreateStatsStmt *stmt, bool check_rights)
 						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 						 errmsg("statistics creation on system columns is not supported")));
 
-			/* Disallow use of virtual generated columns in extended stats */
-			if (get_attgenerated(relid, var->varattno) == ATTRIBUTE_GENERATED_VIRTUAL)
-				ereport(ERROR,
-						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						 errmsg("statistics creation on virtual generated columns is not supported")));
+			/*
+			 * For join stats, track which table the stats objects are from.
+			 * Currently only supports one table (other_rel).
+			 */
+			if (join)
+			{
+				/* Track which table has the statistics columns */
+				if (stxkeys_varno == 0)
+					stxkeys_varno = var->varno;
+				else if (stxkeys_varno != var->varno)
+					ereport(ERROR,
+							(errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+							 errmsg("join statistics columns from different tables are not supported")));
+
+				/*
+				 * For joins, we'll validate the column after extracting join
+				 * info and determining which table is which.
+				 */
+				attnums[nattnums] = var->varattno;
+				nattnums++;
+			}
+			else
+			{
+				var_relid = relid;
 
-			/* Disallow data types without a less-than operator */
-			type = lookup_type_cache(var->vartype, TYPECACHE_LT_OPR);
-			if (type->lt_opr == InvalidOid)
-				ereport(ERROR,
-						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						 errmsg("column \"%s\" cannot be used in statistics because its type %s has no default btree operator class",
-								get_attname(relid, var->varattno, false), format_type_be(var->vartype))));
+				/* Disallow use of virtual generated columns in extended stats */
+				if (get_attgenerated(var_relid, var->varattno) == ATTRIBUTE_GENERATED_VIRTUAL)
+					ereport(ERROR,
+							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+							 errmsg("statistics creation on virtual generated columns is not supported")));
 
-			attnums[nattnums] = var->varattno;
-			nattnums++;
+				/* Disallow data types without a less-than operator */
+				type = lookup_type_cache(var->vartype, TYPECACHE_LT_OPR);
+				if (type->lt_opr == InvalidOid)
+					ereport(ERROR,
+							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+							 errmsg("column \"%s\" cannot be used in statistics because its type %s has no default btree operator class",
+									get_attname(var_relid, var->varattno, false), format_type_be(var->vartype))));
+
+				attnums[nattnums] = var->varattno;
+				nattnums++;
+			}
 		}
 		else					/* expression */
 		{
@@ -324,6 +311,12 @@ CreateStatistics(CreateStatsStmt *stmt, bool check_rights)
 
 			Assert(expr != NULL);
 
+			/* Join stats only support simple column references */
+			if (join)
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("expressions are not supported in join statistics")));
+
 			pull_varattnos(expr, 1, &attnums);
 
 			k = -1;
@@ -366,6 +359,111 @@ CreateStatistics(CreateStatsStmt *stmt, bool check_rights)
 		}
 	}
 
+	/*
+	 * For join statistics, now that we know which table has the statistics
+	 * columns, extract the join information.
+	 */
+	if (join)
+	{
+		Assert(stxkeys_varno != 0);
+
+		if (!extract_join_columns(join, stxkeys_varno,
+								  &relid, &other_relid,
+								  jk_attnums))
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("join statistics require a simple equijoin condition")));
+
+		/* Open both relations now for uniform access later */
+		rel = relation_open(relid, NoLock);
+		other_rel = relation_open(other_relid, NoLock);
+
+		/*
+		 * Validate statistics columns now that we know which tables they're
+		 * from
+		 */
+		for (i = 0; i < nattnums; i++)
+		{
+			TypeCacheEntry *type;
+			Oid			atttype;
+
+			if (get_attgenerated(other_relid, attnums[i]) == ATTRIBUTE_GENERATED_VIRTUAL)
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("statistics creation on virtual generated columns is not supported")));
+
+			/* Disallow data types without a less-than operator */
+			atttype = get_atttype(other_relid, attnums[i]);
+			type = lookup_type_cache(atttype, TYPECACHE_LT_OPR);
+			if (type->lt_opr == InvalidOid)
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("column \"%s\" cannot be used in statistics because its type %s has no default btree operator class",
+								get_attname(other_relid, attnums[i], false),
+								format_type_be(atttype))));
+		}
+	}
+
+	/*
+	 * Now determine namespace and name. Use rel (primary table for joins).
+	 */
+	if (stmt->defnames)
+		namespaceId = QualifiedNameGetCreationNamespace(stmt->defnames,
+														&namestr);
+	else
+	{
+		namespaceId = RelationGetNamespace(rel);
+		namestr = ChooseExtendedStatisticName(RelationGetRelationName(rel),
+											  ChooseExtendedStatisticNameAddition(stmt->exprs),
+											  "stat",
+											  namespaceId);
+	}
+	namestrcpy(&stxname, namestr);
+
+	/*
+	 * Check we have creation rights in target namespace.  Skip check if
+	 * caller doesn't want it.
+	 */
+	if (check_rights)
+	{
+		AclResult	aclresult;
+
+		aclresult = object_aclcheck(NamespaceRelationId, namespaceId,
+									GetUserId(), ACL_CREATE);
+		if (aclresult != ACLCHECK_OK)
+			aclcheck_error(aclresult, OBJECT_SCHEMA,
+						   get_namespace_name(namespaceId));
+	}
+
+	/*
+	 * Deal with the possibility that the statistics object already exists.
+	 */
+	if (SearchSysCacheExists2(STATEXTNAMENSP,
+							  CStringGetDatum(namestr),
+							  ObjectIdGetDatum(namespaceId)))
+	{
+		if (stmt->if_not_exists)
+		{
+			/*
+			 * Since stats objects aren't members of extensions (see comments
+			 * below), no need for checkMembershipInCurrentExtension here.
+			 */
+			ereport(NOTICE,
+					(errcode(ERRCODE_DUPLICATE_OBJECT),
+					 errmsg("statistics object \"%s\" already exists, skipping",
+							namestr)));
+			/* Close relations */
+			relation_close(rel, NoLock);
+			if (other_rel)
+				relation_close(other_rel, NoLock);
+			return InvalidObjectAddress;
+		}
+
+		ereport(ERROR,
+				(errcode(ERRCODE_DUPLICATE_OBJECT),
+				 errmsg("statistics object \"%s\" already exists", namestr)));
+	}
+
 	/*
 	 * Parse the statistics kinds.
 	 *
@@ -431,13 +529,24 @@ CreateStatistics(CreateStatsStmt *stmt, bool check_rights)
 	build_expressions = (stxexprs != NIL);
 
 	/*
-	 * Check that at least two columns were specified in the statement, or
-	 * that we're building statistics on a single expression.
+	 * Check column count requirements.
 	 */
-	if ((numcols < 2) && (list_length(stxexprs) != 1))
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
-				 errmsg("extended statistics require at least 2 columns")));
+	if (join)
+	{
+		/* Join stats need at least 1 column (join adds implicit correlation) */
+		if (numcols < 1)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+					 errmsg("extended join statistics require at least one column")));
+	}
+	else
+	{
+		/* Single-table stats need at least 2 columns or 1 expression */
+		if (numcols < 2 && list_length(stxexprs) != 1)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+					 errmsg("extended statistics require at least 2 columns")));
+	}
 
 	/*
 	 * Sort the attnums, which makes detecting duplicates somewhat easier, and
@@ -496,14 +605,23 @@ CreateStatistics(CreateStatsStmt *stmt, bool check_rights)
 	/* Form an int2vector representation of the sorted column list */
 	stxkeys = buildint2vector(attnums, nattnums);
 
-	/* construct the char array of enabled statistic types */
+	/*
+	 * Construct the char array of enabled statistic types. For join
+	 * statistics, only MCV is supported (as STATS_EXT_JOIN_MCV).
+	 */
+	if (join && (build_ndistinct || build_dependencies || build_expressions))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("only MCV statistics are supported for join statistics"),
+				 errhint("ndistinct, dependencies, and expression statistics require a single table.")));
+
 	ntypes = 0;
 	if (build_ndistinct)
 		types[ntypes++] = CharGetDatum(STATS_EXT_NDISTINCT);
 	if (build_dependencies)
 		types[ntypes++] = CharGetDatum(STATS_EXT_DEPENDENCIES);
 	if (build_mcv)
-		types[ntypes++] = CharGetDatum(STATS_EXT_MCV);
+		types[ntypes++] = CharGetDatum(other_rel ? STATS_EXT_JOIN_MCV : STATS_EXT_MCV);
 	if (build_expressions)
 		types[ntypes++] = CharGetDatum(STATS_EXT_EXPRESSIONS);
 	Assert(ntypes > 0 && ntypes <= lengthof(types));
@@ -544,6 +662,30 @@ CreateStatistics(CreateStatsStmt *stmt, bool check_rights)
 	if (exprsDatum == (Datum) 0)
 		nulls[Anum_pg_statistic_ext_stxexprs - 1] = true;
 
+	/*
+	 * For join statistics, populate stxotherrel and stxjoinkeys. For
+	 * single-table statistics, these fields are NULL.
+	 */
+	if (join)
+	{
+		int2vector *stxjoinkeys;
+
+		/*
+		 * Build the join keys vector from jk_attnums: [primary_attnum,
+		 * other_attnum]
+		 */
+		stxjoinkeys = buildint2vector(jk_attnums, 2);
+
+		values[Anum_pg_statistic_ext_stxotherrel - 1] = ObjectIdGetDatum(other_relid);
+		values[Anum_pg_statistic_ext_stxjoinkeys - 1] = PointerGetDatum(stxjoinkeys);
+	}
+	else
+	{
+		/* Cross-table fields are NULL for single-table statistics */
+		nulls[Anum_pg_statistic_ext_stxotherrel - 1] = true;
+		nulls[Anum_pg_statistic_ext_stxjoinkeys - 1] = true;
+	}
+
 	/* insert it into pg_statistic_ext */
 	htup = heap_form_tuple(statrel->rd_att, values, nulls);
 	CatalogTupleInsert(statrel, htup);
@@ -560,11 +702,17 @@ CreateStatistics(CreateStatsStmt *stmt, bool check_rights)
 	InvokeObjectPostCreateHook(StatisticExtRelationId, statoid, 0);
 
 	/*
-	 * Invalidate relcache so that others see the new statistics object.
+	 * Invalidate relcache so that others see the new statistics object. For
+	 * join stats, invalidate both tables.
 	 */
 	CacheInvalidateRelcache(rel);
+	if (other_rel)
+		CacheInvalidateRelcache(other_rel);
 
+	/* Close relations */
 	relation_close(rel, NoLock);
+	if (other_rel)
+		relation_close(other_rel, NoLock);
 
 	/*
 	 * Add an AUTO dependency on each column used in the stats, so that the
@@ -575,7 +723,21 @@ CreateStatistics(CreateStatsStmt *stmt, bool check_rights)
 	/* add dependencies for plain column references */
 	for (i = 0; i < nattnums; i++)
 	{
-		ObjectAddressSubSet(parentobject, RelationRelationId, relid, attnums[i]);
+		Oid			col_relid = other_rel ? other_relid : relid;
+
+		ObjectAddressSubSet(parentobject, RelationRelationId, col_relid, attnums[i]);
+		recordDependencyOn(&myself, &parentobject, DEPENDENCY_AUTO);
+	}
+
+	/* For join stats, also add dependencies on join columns */
+	if (other_rel)
+	{
+		ObjectAddressSubSet(parentobject, RelationRelationId,
+							relid, jk_attnums[0]);
+		recordDependencyOn(&myself, &parentobject, DEPENDENCY_AUTO);
+
+		ObjectAddressSubSet(parentobject, RelationRelationId,
+							other_relid, jk_attnums[1]);
 		recordDependencyOn(&myself, &parentobject, DEPENDENCY_AUTO);
 	}
 
@@ -589,8 +751,11 @@ CreateStatistics(CreateStatsStmt *stmt, bool check_rights)
 	 * XXX We intentionally don't consider the expressions before adding this
 	 * dependency, because recordDependencyOnSingleRelExpr may not create any
 	 * dependencies for whole-row Vars.
+	 *
+	 * Join stats are excluded because they span two tables (no single "whole
+	 * table" exists).
 	 */
-	if (!nattnums)
+	if (!join && !nattnums)
 	{
 		ObjectAddressSet(parentobject, RelationRelationId, relid);
 		recordDependencyOn(&myself, &parentobject, DEPENDENCY_AUTO);
@@ -930,6 +1095,106 @@ ChooseExtendedStatisticNameAddition(List *exprs)
 	return pstrdup(buf);
 }
 
+/*
+ * extract_join_columns
+ *
+ * Extract join column information from a JoinExpr for join statistics.
+ * Uses stxkeys_varno to determine which table has the statistics columns (currently
+ * this must be the "other" table, not the "primary" table).
+ *
+ * Populates jk_attnums array with join key attribute numbers: [rel's jk, other_rel's jk]
+ *
+ * Returns true if successful, false if the join condition is too complex.
+ *
+ * Currently, supports only simple equijoin: table1.col1 = table2.col2
+ */
+static bool
+extract_join_columns(JoinExpr *join, int stxkeys_varno,
+					 Oid *relid, Oid *other_relid,
+					 int16 *jk_attnums)
+{
+	Oid			left_relid;
+	Oid			right_relid;
+	Node	   *quals;
+	OpExpr	   *opexpr;
+	Var		   *left_var;
+	Var		   *right_var;
+	AttrNumber	left_attnum;
+	AttrNumber	right_attnum;
+
+	if (!IsA(join->larg, RangeVar) || !IsA(join->rarg, RangeVar))
+		return false;
+
+	left_relid = RangeVarGetRelid((RangeVar *) join->larg, NoLock, false);
+	right_relid = RangeVarGetRelid((RangeVar *) join->rarg, NoLock, false);
+
+	quals = join->quals;
+	if (quals == NULL || !IsA(quals, OpExpr))
+		return false;
+
+	opexpr = (OpExpr *) quals;
+
+	/*
+	 * Must be a binary operator (simple equality join) and both operands must
+	 * be column references
+	 */
+	if (list_length(opexpr->args) != 2 ||
+		!IsA(linitial(opexpr->args), Var) || !IsA(lsecond(opexpr->args), Var))
+		return false;
+
+	left_var = (Var *) linitial(opexpr->args);
+	right_var = (Var *) lsecond(opexpr->args);
+
+	/* Extract join columns */
+	if (left_var->varno == 1 && right_var->varno == 2)
+	{
+		left_attnum = left_var->varattno;
+		right_attnum = right_var->varattno;
+	}
+	else if (left_var->varno == 2 && right_var->varno == 1)
+	{
+		left_attnum = right_var->varattno;
+		right_attnum = left_var->varattno;
+	}
+	else
+	{
+		/* Unexpected varno values */
+		return false;
+	}
+
+	/*
+	 * Now assign primary vs other based on which table has the statistics
+	 * columns. Currently, statistics columns must be on the "other" table
+	 * (specified by stxkeys_varno), not the primary table. The primary table
+	 * is the estimation target.
+	 *
+	 * Populate jk_attnums as: [primary_attnum, other_attnum]
+	 */
+	if (stxkeys_varno == 1)
+	{
+		/* Left table has statistics columns, so it's the other table */
+		*other_relid = left_relid;
+		*relid = right_relid;
+		jk_attnums[0] = right_attnum;	/* primary table join column */
+		jk_attnums[1] = left_attnum;	/* other table join column */
+	}
+	else if (stxkeys_varno == 2)
+	{
+		/* Right table has statistics columns, so it's the other table */
+		*other_relid = right_relid;
+		*relid = left_relid;
+		jk_attnums[0] = left_attnum;	/* primary table join column */
+		jk_attnums[1] = right_attnum;	/* other table join column */
+	}
+	else
+	{
+		/* Invalid stxkeys_varno */
+		return false;
+	}
+
+	return true;
+}
+
 /*
  * StatisticsGetRelation: given a statistics object's OID, get the OID of
  * the relation it is defined on.  Uses the system cache.
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 25c4d177ad9..91630df6b1a 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -129,19 +129,44 @@ clauselist_selectivity_ext(PlannerInfo *root,
 	int			listidx;
 
 	/*
-	 * If there's exactly one clause, just go directly to
-	 * clause_selectivity_ext(). None of what we might do below is relevant.
+	 * Determine if these clauses reference a single relation.
 	 */
-	if (list_length(clauses) == 1)
+	rel = find_single_rel_for_clauses(root, clauses);
+
+	/*
+	 * If there's exactly one clause that references a single relation, just
+	 * go directly to clause_selectivity_ext(). None of what we might do below
+	 * is relevant. If multiple relations are referenced, we must NOT
+	 * short-circuit because we need to detect join stats opportunities.
+	 */
+	if (list_length(clauses) == 1 && rel != NULL)
 		return clause_selectivity_ext(root, (Node *) linitial(clauses),
 									  varRelid, jointype, sjinfo,
 									  use_extended_stats);
 
 	/*
-	 * Determine if these clauses reference a single relation.  If so, and if
-	 * it has extended statistics, try to apply those.
+	 * Try applying extended statistics for joins/parameterized scans.
+	 *
+	 * Note: FK-based joins are already handled by
+	 * get_foreign_key_join_selectivity() which runs before
+	 * clauselist_selectivity(). This primarily benefits: - Parameterized path
+	 * estimation (before FK selectivity runs) - Non-FK join path estimation
+	 */
+	if (use_extended_stats && !rel)
+	{
+		/*
+		 * Estimate as many clauses as possible using extended statistics.
+		 *
+		 * 'estimatedclauses' is populated with the 0-based list position
+		 * index of clauses estimated here, and that should be ignored below.
+		 */
+		s1 *= statext_join_mcv_clauselist_selectivity(root, clauses, varRelid,
+													  &estimatedclauses);
+	}
+
+	/*
+	 * Try applying extended statistics for single table scans.
 	 */
-	rel = find_single_rel_for_clauses(root, clauses);
 	if (use_extended_stats && rel && rel->rtekind == RTE_RELATION && rel->statlist != NIL)
 	{
 		/*
@@ -150,9 +175,9 @@ clauselist_selectivity_ext(PlannerInfo *root,
 		 * 'estimatedclauses' is populated with the 0-based list position
 		 * index of clauses estimated here, and that should be ignored below.
 		 */
-		s1 = statext_clauselist_selectivity(root, clauses, varRelid,
-											jointype, sjinfo, rel,
-											&estimatedclauses, false);
+		s1 *= statext_clauselist_selectivity(root, clauses, varRelid,
+											 jointype, sjinfo, rel,
+											 &estimatedclauses, false);
 	}
 
 	/*
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 16bf1f61a0f..0b1281d43e1 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -108,6 +108,7 @@
 #include "utils/selfuncs.h"
 #include "utils/spccache.h"
 #include "utils/tuplesort.h"
+#include "statistics/statistics.h"
 
 
 #define LOG2(x)  (log(x) / 0.693147180559945)
@@ -5692,6 +5693,10 @@ get_foreign_key_join_selectivity(PlannerInfo *root,
 		bool		ref_is_outer;
 		List	   *removedlist;
 		ListCell   *cell;
+		Selectivity join_mcv_sel = 0.0;
+		RelOptInfo *con_rel;
+		RelOptInfo *ref_rel;
+		JoinMCVOpportunity *join_opp;
 
 		/*
 		 * This FK is not relevant unless it connects a baserel on one side of
@@ -5821,6 +5826,10 @@ get_foreign_key_join_selectivity(PlannerInfo *root,
 		 * knowledge that each referencing row will match exactly one row in
 		 * the referenced table.
 		 *
+		 * ENHANCED: First check if extended join MCV statistics exist for
+		 * this FK. If so, use the more accurate MCV-based selectivity instead
+		 * of the FK heuristic (1.0 / ref_tuples).
+		 *
 		 * XXX that's not true in the presence of nulls in the referencing
 		 * column(s), so in principle we should derate the estimate for those.
 		 * However (1) if there are any strict restriction clauses for the
@@ -5839,6 +5848,36 @@ get_foreign_key_join_selectivity(PlannerInfo *root,
 		 * work, it is uncommon in practice to have an FK referencing a parent
 		 * table.  So, at least for now, disregard inheritance here.
 		 */
+		con_rel = find_base_rel(root, fkinfo->con_relid);
+		ref_rel = find_base_rel(root, fkinfo->ref_relid);
+
+		/* Try to detect join MCV opportunity */
+		join_opp = find_join_mcv_opportunity(root,
+											 ref_is_outer ? ref_rel : con_rel,
+											 ref_is_outer ? con_rel : ref_rel,
+											 *restrictlist);
+		if (join_opp)
+		{
+			JoinMCVList *join_mcv;
+
+			join_mcv = statext_join_mcv_load(join_opp->target_rel,
+											 join_opp->target_joinkey,
+											 join_opp->other_rel,
+											 join_opp->other_joinkey,
+											 join_opp->filter_attnums);
+			if (join_mcv)
+			{
+				join_mcv_sel = join_mcv_clauselist_selectivity(join_mcv,
+															   join_opp->filter_values,
+															   join_opp->filter_attnums,
+															   join_opp->collation);
+				pfree(join_mcv);
+
+				ereport(DEBUG1,
+						(errmsg("  FK: extended join MCV found, selectivity = %.6f", join_mcv_sel)));
+			}
+		}
+
 		if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
 		{
 			/*
@@ -5852,11 +5891,26 @@ get_foreign_key_join_selectivity(PlannerInfo *root,
 			 * restriction clauses, which is rows / tuples; but we must guard
 			 * against tuples == 0.
 			 */
-			RelOptInfo *ref_rel = find_base_rel(root, fkinfo->ref_relid);
 			double		ref_tuples = Max(ref_rel->tuples, 1.0);
 
 			fkselec *= ref_rel->rows / ref_tuples;
 		}
+		else if (join_mcv_sel > 0)
+		{
+			/*
+			 * Apply extended join MCV selectivity.
+			 *
+			 * For IN clauses: join_mcv_sel = average selectivity per filter
+			 * constant For equality:   join_mcv_sel = selectivity for that
+			 * single constant Using the average ensures multiplying by
+			 * inner_rows recovers the correct total selectivity.
+			 */
+			fkselec *= join_mcv_sel;
+
+			ereport(DEBUG1,
+					(errmsg("  FK APPLIED (MCV): selectivity = %.6f, cumulative fkselec = %.6f",
+							join_mcv_sel, fkselec)));
+		}
 		else
 		{
 			/*
@@ -5864,7 +5918,6 @@ get_foreign_key_join_selectivity(PlannerInfo *root,
 			 * guard against tuples == 0.  Note we should use the raw table
 			 * tuple count, not any estimate of its filtered or joined size.
 			 */
-			RelOptInfo *ref_rel = find_base_rel(root, fkinfo->ref_relid);
 			double		ref_tuples = Max(ref_rel->tuples, 1.0);
 
 			fkselec *= 1.0 / ref_tuples;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 713ee5c10a2..2bccc1147b1 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -4867,6 +4867,13 @@ stats_param:	ColId
 					$$->name = $1;
 					$$->expr = NULL;
 				}
+			| ColId indirection
+				{
+					/* Table-qualified column reference (e.g., tbl.col) */
+					$$ = makeNode(StatsElem);
+					$$->name = NULL;
+					$$->expr = (Node *) makeColumnRef($1, $2, @1, yyscanner);
+				}
 			| func_expr_windowless
 				{
 					$$ = makeNode(StatsElem);
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index b5f4c72459d..897836f7fc8 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -3155,7 +3155,10 @@ transformStatsStmt(Oid relid, CreateStatsStmt *stmt, const char *queryString)
 	ParseState *pstate;
 	ParseNamespaceItem *nsitem;
 	ListCell   *l;
-	Relation	rel;
+	Relation	rel = NULL;
+	bool		isjoin = !OidIsValid(relid);
+	Relation	left_rel = NULL;
+	Relation	right_rel = NULL;
 
 	/* Nothing to do if statement already transformed. */
 	if (stmt->transformed)
@@ -3165,18 +3168,80 @@ transformStatsStmt(Oid relid, CreateStatsStmt *stmt, const char *queryString)
 	pstate = make_parsestate(NULL);
 	pstate->p_sourcetext = queryString;
 
-	/*
-	 * Put the parent table into the rtable so that the expressions can refer
-	 * to its fields without qualification.  Caller is responsible for locking
-	 * relation, but we still need to open it.
-	 */
-	rel = relation_open(relid, NoLock);
-	nsitem = addRangeTableEntryForRelation(pstate, rel,
-										   AccessShareLock,
-										   NULL, false, true);
+	if (isjoin)
+	{
+		/*
+		 * Extract the two tables from the JOIN expression and add them to the
+		 * namespace.
+		 */
+		Node	   *fromNode = (Node *) linitial(stmt->relations);
+		JoinExpr   *join;
+		RangeVar   *lrel;
+		RangeVar   *rrel;
+		ParseNamespaceItem *left_nsitem;
+		ParseNamespaceItem *right_nsitem;
+
+		if (!IsA(fromNode, JoinExpr))
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("extended join statistics require a JOIN clause")));
 
-	/* no to join list, yes to namespaces */
-	addNSItemToQuery(pstate, nsitem, false, true, true);
+		join = (JoinExpr *) fromNode;
+
+		/* Extract left and right RangeVars */
+		if (!IsA(join->larg, RangeVar) || !IsA(join->rarg, RangeVar))
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("extended join statistics currently support only simple two-table joins")));
+
+		lrel = (RangeVar *) join->larg;
+		rrel = (RangeVar *) join->rarg;
+
+		/*
+		 * Open both relations and add them to the namespace. We need
+		 * ShareUpdateExclusiveLock for statistics creation (see comments in
+		 * CreateStatistics).
+		 */
+		left_rel = table_openrv(lrel, ShareUpdateExclusiveLock);
+		right_rel = table_openrv(rrel, ShareUpdateExclusiveLock);
+
+		/* Add left table to rtable and namespace */
+		left_nsitem = addRangeTableEntryForRelation(pstate, left_rel,
+													AccessShareLock,
+													lrel->alias, false, true);
+		addNSItemToQuery(pstate, left_nsitem, false, true, true);
+
+		/* Add right table to rtable and namespace */
+		right_nsitem = addRangeTableEntryForRelation(pstate, right_rel,
+													 AccessShareLock,
+													 rrel->alias, false, true);
+		addNSItemToQuery(pstate, right_nsitem, false, true, true);
+
+		/*
+		 * Transform the join quals so they become Var nodes. This is
+		 * necessary for extract_join_columns() to work.
+		 */
+		if (join->quals)
+		{
+			join->quals = transformExpr(pstate, join->quals, EXPR_KIND_JOIN_ON);
+			assign_expr_collations(pstate, join->quals);
+		}
+	}
+	else
+	{
+		/*
+		 * Put the parent table into the rtable so that the expressions can
+		 * refer to its fields without qualification.  Caller is responsible
+		 * for locking relation, but we still need to open it.
+		 */
+		rel = relation_open(relid, NoLock);
+		nsitem = addRangeTableEntryForRelation(pstate, rel,
+											   AccessShareLock,
+											   NULL, false, true);
+
+		/* no to join list, yes to namespaces */
+		addNSItemToQuery(pstate, nsitem, false, true, true);
+	}
 
 	/* take care of any expressions */
 	foreach(l, stmt->exprs)
@@ -3194,19 +3259,36 @@ transformStatsStmt(Oid relid, CreateStatsStmt *stmt, const char *queryString)
 		}
 	}
 
-	/*
-	 * Check that only the base rel is mentioned.  (This should be dead code
-	 * now that add_missing_from is history.)
-	 */
-	if (list_length(pstate->p_rtable) != 1)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
-				 errmsg("statistics expressions can refer only to the table being referenced")));
+	if (!isjoin)
+	{
+		/*
+		 * Check that only the base rel is mentioned.  (This should be dead
+		 * code now that add_missing_from is history.)
+		 */
+		if (list_length(pstate->p_rtable) != 1)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+					 errmsg("statistics expressions can refer only to the table being referenced")));
 
-	free_parsestate(pstate);
+		/* Close relation */
+		table_close(rel, NoLock);
+	}
+	else
+	{
+		/*
+		 * For extended join stats, we expect exactly 2 tables in the rtable.
+		 */
+		if (list_length(pstate->p_rtable) != 2)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+					 errmsg("extended join statistics must reference exactly two tables")));
 
-	/* Close relation */
-	table_close(rel, NoLock);
+		/* Close relations */
+		table_close(left_rel, NoLock);
+		table_close(right_rel, NoLock);
+	}
+
+	free_parsestate(pstate);
 
 	/* Mark statement as successfully transformed */
 	stmt->transformed = true;
diff --git a/src/backend/statistics/Makefile b/src/backend/statistics/Makefile
index 7ff5938b027..9639e0b95f5 100644
--- a/src/backend/statistics/Makefile
+++ b/src/backend/statistics/Makefile
@@ -17,6 +17,7 @@ OBJS = \
 	dependencies.o \
 	extended_stats.o \
 	extended_stats_funcs.o \
+	join_mcv.o \
 	mcv.o \
 	mvdistinct.o \
 	relation_stats.o \
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index 3895ed72ef7..8939f56cc67 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -70,15 +70,18 @@ typedef struct StatExtEntry
 	List	   *types;			/* 'char' list of enabled statistics kinds */
 	int			stattarget;		/* statistics target (-1 for default) */
 	List	   *exprs;			/* expressions */
+
+	/* join statistics fields (NULL/invalid for single-table stats) */
+	Oid			otherrel;		/* other table's OID (for join stats) */
+	int2vector *joinkeys;		/* join keys [target_joinkey, other_joinkey] */
 } StatExtEntry;
 
 
 static List *fetch_statentries_for_relation(Relation pg_statext, Oid relid);
 static VacAttrStats **lookup_var_attr_stats(Bitmapset *attrs, List *exprs,
 											int nvacatts, VacAttrStats **vacatts);
-static void statext_store(Oid statOid, bool inh,
-						  MVNDistinct *ndistinct, MVDependencies *dependencies,
-						  MCVList *mcv, Datum exprs, VacAttrStats **stats);
+static void statext_store(Oid statOid, bool inh, MVNDistinct *ndistinct, MVDependencies *dependencies, MCVList *mcv, Datum exprs,
+						  VacAttrStats **stats, JoinMCVList * join_mcv);
 static int	statext_compute_stattarget(int stattarget,
 									   int nattrs, VacAttrStats **stats);
 
@@ -99,6 +102,33 @@ static StatsBuildData *make_build_data(Relation rel, StatExtEntry *stat,
 									   int numrows, HeapTuple *rows,
 									   VacAttrStats **stats, int stattarget);
 
+/*
+ * bms_to_int2vector
+ *		Convert a Bitmapset to an int2vector.
+ *
+ * Returns NULL if the Bitmapset is empty or NULL.
+ * The result is palloc'd in the current memory context.
+ */
+static int2vector *
+bms_to_int2vector(Bitmapset *bms)
+{
+	int			nmembers;
+	int16	   *attnums;
+	int			i = 0;
+	int			k = -1;
+
+	if (bms_is_empty(bms))
+		return NULL;
+
+	nmembers = bms_num_members(bms);
+	attnums = palloc(nmembers * sizeof(int16));
+
+	while ((k = bms_next_member(bms, k)) >= 0)
+		attnums[i++] = (int16) k;
+
+	return buildint2vector(attnums, nmembers);
+}
+
 
 /*
  * Compute requested extended stats, using the rows sampled for the plain
@@ -155,6 +185,7 @@ BuildRelationExtStatistics(Relation onerel, bool inh, double totalrows,
 		MVNDistinct *ndistinct = NULL;
 		MVDependencies *dependencies = NULL;
 		MCVList    *mcv = NULL;
+		JoinMCVList *join_mcv = NULL;
 		Datum		exprstats = (Datum) 0;
 		VacAttrStats **stats;
 		ListCell   *lc2;
@@ -164,10 +195,14 @@ BuildRelationExtStatistics(Relation onerel, bool inh, double totalrows,
 		/*
 		 * Check if we can build these stats based on the column analyzed. If
 		 * not, report this fact (except in autovacuum) and move on.
+		 *
+		 * Note: join mcv stats don't have columns in the primary table, so
+		 * lookup_var_attr_stats will return NULL. We handle this below.
 		 */
 		stats = lookup_var_attr_stats(stat->columns, stat->exprs,
 									  natts, vacattrstats);
-		if (!stats)
+		if (!stats && !(list_length(stat->types) == 1 &&
+						linitial_int(stat->types) == STATS_EXT_JOIN_MCV))
 		{
 			if (!AmAutoVacuumWorkerProcess())
 				ereport(WARNING,
@@ -180,21 +215,34 @@ BuildRelationExtStatistics(Relation onerel, bool inh, double totalrows,
 			continue;
 		}
 
-		/* compute statistics target for this statistics object */
-		stattarget = statext_compute_stattarget(stat->stattarget,
-												bms_num_members(stat->columns),
-												stats);
-
 		/*
-		 * Don't rebuild statistics objects with statistics target set to 0
-		 * (we just leave the existing values around, just like we do for
-		 * regular per-column statistics).
+		 * Join-mcv stats reuse MCVs already computed for the join column, so
+		 * no separate stattarget or StatsBuildData is needed.
 		 */
-		if (stattarget == 0)
-			continue;
+		if (list_length(stat->types) == 1 &&
+			linitial_int(stat->types) == STATS_EXT_JOIN_MCV)
+		{
+			stattarget = -1;
+			data = NULL;
+		}
+		else
+		{
+			/* compute statistics target for this statistics object */
+			stattarget = statext_compute_stattarget(stat->stattarget,
+													bms_num_members(stat->columns),
+													stats);
+
+			/*
+			 * Don't rebuild statistics objects with statistics target set to
+			 * 0 (we just leave the existing values around, just like we do
+			 * for regular per-column statistics).
+			 */
+			if (stattarget == 0)
+				continue;
 
-		/* evaluate expressions (if the statistics object has any) */
-		data = make_build_data(onerel, stat, numrows, rows, stats, stattarget);
+			/* evaluate expressions (if the statistics object has any) */
+			data = make_build_data(onerel, stat, numrows, rows, stats, stattarget);
+		}
 
 		/* compute statistic of each requested type */
 		foreach(lc2, stat->types)
@@ -223,11 +271,46 @@ BuildRelationExtStatistics(Relation onerel, bool inh, double totalrows,
 
 				exprstats = serialize_expr_stats(exprdata, nexprs);
 			}
+			else if (t == STATS_EXT_JOIN_MCV)
+			{
+				int2vector *filter_attnums;
+
+				if (!OidIsValid(stat->otherrel))
+				{
+					elog(WARNING, "skipping join mcv stat \"%s.%s\": missing otherrel",
+						 stat->schema, stat->name);
+					continue;
+				}
+
+				if (!stat->joinkeys || stat->joinkeys->dim1 != 2)
+				{
+					elog(WARNING, "skipping join mcv stat \"%s.%s\": invalid joinkeys",
+						 stat->schema, stat->name);
+					continue;
+				}
+
+				/* Convert columns Bitmapset to int2vector for filter attnums */
+				filter_attnums = bms_to_int2vector(stat->columns);
+
+				join_mcv = statext_join_mcv_build(stat->statOid,
+												  RelationGetRelid(onerel),
+												  stat->otherrel,
+												  stat->joinkeys,
+												  filter_attnums,
+												  numrows,
+												  rows,
+												  natts,
+												  vacattrstats);
+
+				if (filter_attnums)
+					pfree(filter_attnums);
+			}
 		}
 
 		/* store the statistics in the catalog */
 		statext_store(stat->statOid, inh,
-					  ndistinct, dependencies, mcv, exprstats, stats);
+					  ndistinct, dependencies, mcv, exprstats,
+					  stats, join_mcv);
 
 		/* for reporting progress */
 		pgstat_progress_update_param(PROGRESS_ANALYZE_EXT_STATS_COMPUTED,
@@ -405,6 +488,10 @@ statext_is_kind_built(HeapTuple htup, char type)
 			attnum = Anum_pg_statistic_ext_data_stxdexpr;
 			break;
 
+		case STATS_EXT_JOIN_MCV:
+			attnum = Anum_pg_statistic_ext_data_stxdjoinmcv;
+			break;
+
 		default:
 			elog(ERROR, "unexpected statistics type requested: %d", type);
 	}
@@ -474,7 +561,8 @@ fetch_statentries_for_relation(Relation pg_statext, Oid relid)
 			Assert((enabled[i] == STATS_EXT_NDISTINCT) ||
 				   (enabled[i] == STATS_EXT_DEPENDENCIES) ||
 				   (enabled[i] == STATS_EXT_MCV) ||
-				   (enabled[i] == STATS_EXT_EXPRESSIONS));
+				   (enabled[i] == STATS_EXT_EXPRESSIONS) ||
+				   (enabled[i] == STATS_EXT_JOIN_MCV));
 			entry->types = lappend_int(entry->types, (int) enabled[i]);
 		}
 
@@ -507,6 +595,27 @@ fetch_statentries_for_relation(Relation pg_statext, Oid relid)
 
 		entry->exprs = exprs;
 
+		/*
+		 * Fetch join statistics fields (stxotherrel, stxjoinkeys). These are
+		 * NULL/invalid for single-table statistics.
+		 */
+		datum = SysCacheGetAttr(STATEXTOID, htup,
+								Anum_pg_statistic_ext_stxotherrel, &isnull);
+		entry->otherrel = isnull ? InvalidOid : DatumGetObjectId(datum);
+
+		datum = SysCacheGetAttr(STATEXTOID, htup,
+								Anum_pg_statistic_ext_stxjoinkeys, &isnull);
+		if (!isnull)
+		{
+			/* Copy the int2vector so it survives catalog cache invalidation */
+			int2vector *orig = (int2vector *) DatumGetPointer(datum);
+
+			entry->joinkeys = (int2vector *) palloc(VARSIZE(orig));
+			memcpy(entry->joinkeys, orig, VARSIZE(orig));
+		}
+		else
+			entry->joinkeys = NULL;
+
 		result = lappend(result, entry);
 	}
 
@@ -756,9 +865,8 @@ lookup_var_attr_stats(Bitmapset *attrs, List *exprs,
  *	tuple.
  */
 static void
-statext_store(Oid statOid, bool inh,
-			  MVNDistinct *ndistinct, MVDependencies *dependencies,
-			  MCVList *mcv, Datum exprs, VacAttrStats **stats)
+statext_store(Oid statOid, bool inh, MVNDistinct *ndistinct, MVDependencies *dependencies, MCVList *mcv, Datum exprs,
+			  VacAttrStats **stats, JoinMCVList * join_mcv)
 {
 	Relation	pg_stextdata;
 	HeapTuple	stup;
@@ -808,6 +916,13 @@ statext_store(Oid statOid, bool inh,
 		nulls[Anum_pg_statistic_ext_data_stxdexpr - 1] = false;
 		values[Anum_pg_statistic_ext_data_stxdexpr - 1] = exprs;
 	}
+	if (join_mcv != NULL)
+	{
+		bytea	   *data = statext_join_mcv_serialize(join_mcv);
+
+		nulls[Anum_pg_statistic_ext_data_stxdjoinmcv - 1] = (data == NULL);
+		values[Anum_pg_statistic_ext_data_stxdjoinmcv - 1] = PointerGetDatum(data);
+	}
 
 	/*
 	 * Delete the old tuple if it exists, and insert a new one. It's easier
@@ -1973,6 +2088,87 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli
 	return sel;
 }
 
+/*
+ * statext_join_mcv_clauselist_selectivity
+ *		Estimate selectivity using extended statistics for joins.
+ *
+ * This function handles cases where clauses involve exactly two base relations:
+ * - Regular joins (varRelid == 0)
+ * - Parameterized paths (varRelid != 0, but clauses reference 2 relations)
+ *
+ * Returns selectivity if join stats were successfully applied, or 1.0 otherwise.
+ * 'estimatedclauses' is populated with the 0-based list position indexes of
+ * clauses whose selectivity was estimated here. The caller should skip these
+ * clauses to avoid double-counting.
+ */
+Selectivity
+statext_join_mcv_clauselist_selectivity(PlannerInfo *root,
+										List *clauses,
+										int varRelid,
+										Bitmapset **estimatedclauses)
+{
+	JoinMCVOpportunity *join_opp;
+
+	/* Try to detect join MCV opportunity from clauses */
+	join_opp = find_join_mcv_opportunity_in_clauses(root, clauses);
+
+	if (join_opp)
+	{
+		JoinMCVList *join_mcv;
+		Selectivity selec = 0.0;
+
+		join_mcv = statext_join_mcv_load(join_opp->target_rel,
+										 join_opp->target_joinkey,
+										 join_opp->other_rel,
+										 join_opp->other_joinkey,
+										 join_opp->filter_attnums);
+		if (join_mcv)
+		{
+			selec = join_mcv_clauselist_selectivity(join_mcv,
+													join_opp->filter_values,
+													join_opp->filter_attnums,
+													join_opp->collation);
+			pfree(join_mcv);
+		}
+
+		if (selec > 0 && selec < 1.0)
+		{
+			int			clause_idx;
+			ListCell   *lc;
+
+			/*
+			 * Mark the join and filter clauses as estimated so they won't be
+			 * double-counted in the per-clause loop.
+			 */
+			clause_idx = -1;
+			foreach(lc, clauses)
+			{
+				RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+				clause_idx++;
+
+				/* Mark join clauses */
+				if (list_member_ptr(join_opp->join_rinfos, rinfo))
+				{
+					*estimatedclauses = bms_add_member(*estimatedclauses, clause_idx);
+					continue;
+				}
+
+				/* Mark filter clauses */
+				if (list_member_ptr(join_opp->filter_rinfos, rinfo))
+				{
+					*estimatedclauses = bms_add_member(*estimatedclauses, clause_idx);
+				}
+			}
+
+			return selec;
+		}
+	}
+
+	/* No join stats applicable */
+	return 1.0;
+}
+
 /*
  * statext_clauselist_selectivity
  *		Estimate clauses using the best multi-column statistics.
diff --git a/src/backend/statistics/extended_stats_funcs.c b/src/backend/statistics/extended_stats_funcs.c
index 48fa2efee76..d59f03b54c1 100644
--- a/src/backend/statistics/extended_stats_funcs.c
+++ b/src/backend/statistics/extended_stats_funcs.c
@@ -370,6 +370,7 @@ extended_statistics_update(FunctionCallInfo fcinfo)
 	nulls[Anum_pg_statistic_ext_data_stxddependencies - 1] = true;
 	nulls[Anum_pg_statistic_ext_data_stxdmcv - 1] = true;
 	nulls[Anum_pg_statistic_ext_data_stxdexpr - 1] = true;
+	nulls[Anum_pg_statistic_ext_data_stxdjoinmcv - 1] = true;
 
 	/*
 	 * For each stats kind, deserialize the data at hand and perform a round
diff --git a/src/backend/statistics/join_mcv.c b/src/backend/statistics/join_mcv.c
new file mode 100644
index 00000000000..efca5cc4c48
--- /dev/null
+++ b/src/backend/statistics/join_mcv.c
@@ -0,0 +1,1493 @@
+/*-------------------------------------------------------------------------
+ *
+ * join_mcv.c
+ *	  POSTGRES join MCV lists
+ *
+ *
+ * Portions Copyright (c) 1996-2026, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/statistics/join_mcv.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/table.h"
+#include "access/tableam.h"
+#include "catalog/pg_am.h"
+#include "catalog/pg_statistic_ext.h"
+#include "catalog/pg_statistic_ext_data.h"
+#include "commands/defrem.h"
+#include "fmgr.h"
+#include "funcapi.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/pathnode.h"
+#include "statistics/extended_stats_internal.h"
+#include "statistics/statistics.h"
+#include "utils/array.h"
+#include "utils/builtins.h"
+#include "utils/datum.h"
+#include "utils/fmgrprotos.h"
+#include "utils/lsyscache.h"
+#include "utils/snapmgr.h"
+#include "utils/syscache.h"
+#include "utils/fmgroids.h"
+#include "utils/typcache.h"
+
+/*
+ * statext_join_mcv_serialize
+ * 			Serialize a JoinMCVList into bytea format for storage
+ *
+ * The overall structure of the serialized representation looks like this:
+ *
+ * +---------+-------+----------+
+ * | header  | items | varlena  |
+ * +---------+-------+----------+
+ *
+ * The header contains magic number, type, and filter metadata (number of
+ * filters, type OIDs, typlens, typbyvaltypbyvals).  Items store the frequency values
+ * plus Datum arrays.  For each Datum, byval types are stored inline while
+ * non-byval types are stored as offsets into the varlena section, which
+ * contains the actual variable-length data.
+ *
+ * Unlike regular MCV lists which deduplicate values and use indexes into
+ * separate arrays, join MCV lists store all values directly in each item.
+ * During serialization, Datum pointers are converted to offsets; during
+ * deserialization, offsets are converted back to pointers.
+ */
+bytea *
+statext_join_mcv_serialize(JoinMCVList * mcvlist)
+{
+	int			i,
+				j;
+	Size		len;
+	bytea	   *result;
+	char	   *ptr;
+	char	   *data_ptr;
+	Size		header_len;
+	AttrNumber	nfilters;
+
+	if (!mcvlist || mcvlist->nitems == 0)
+		return NULL;
+
+	nfilters = mcvlist->ndimensions;
+
+	Assert(nfilters > 0 && nfilters <= STATS_MAX_DIMENSIONS);
+	Assert(mcvlist->nitems <= MAX_STATISTICS_TARGET);
+
+	/* Calculate header size (up to items array) */
+	header_len = offsetof(JoinMCVList, items);
+
+	/* Start with header size */
+	len = header_len;
+
+	/* Add space for item structs */
+	len += mcvlist->nitems * sizeof(JoinMCVItem);
+
+	/* Add space for isnull arrays (one per item) */
+	len += mcvlist->nitems * nfilters * sizeof(bool);
+
+	/* Add space for values arrays (one per item) */
+	len += mcvlist->nitems * nfilters * sizeof(Datum);
+
+	/* Add space for actual datum data (for non-byval types) */
+	for (i = 0; i < mcvlist->nitems; i++)
+	{
+		JoinMCVItem *item = &mcvlist->items[i];
+
+		for (j = 0; j < nfilters; j++)
+		{
+			if (!item->isnull[j])
+			{
+				bool		typbyval = get_typbyval(mcvlist->filter_types[j]);
+				int16		typlen = get_typlen(mcvlist->filter_types[j]);
+
+				if (!typbyval)
+					len += datumGetSize(item->values[j], false, typlen);
+			}
+		}
+	}
+
+	/* Allocate result */
+	result = (bytea *) palloc(len + VARHDRSZ);
+	SET_VARSIZE(result, len + VARHDRSZ);
+
+	/* Initialize pointers */
+	ptr = VARDATA(result);
+	data_ptr = ptr;
+
+	/* Copy header (fixed fields before items array) */
+	memcpy(data_ptr, mcvlist, header_len);
+	data_ptr += header_len;
+
+	/* Copy item structs (we'll fix pointers later) */
+	memcpy(data_ptr, mcvlist->items, mcvlist->nitems * sizeof(JoinMCVItem));
+	data_ptr += mcvlist->nitems * sizeof(JoinMCVItem);
+
+	/* Copy isnull arrays and set pointers */
+	for (i = 0; i < mcvlist->nitems; i++)
+	{
+		JoinMCVItem *out_item = &((JoinMCVList *) ptr)->items[i];
+		JoinMCVItem *in_item = &mcvlist->items[i];
+
+		/* Store offset to isnull array */
+		out_item->isnull = (bool *) (data_ptr - ptr);
+
+		/* Copy isnull array */
+		memcpy(data_ptr, in_item->isnull, nfilters * sizeof(bool));
+		data_ptr += nfilters * sizeof(bool);
+	}
+
+	/* Copy values arrays and set pointers */
+	for (i = 0; i < mcvlist->nitems; i++)
+	{
+		JoinMCVItem *out_item = &((JoinMCVList *) ptr)->items[i];
+		JoinMCVItem *in_item = &mcvlist->items[i];
+
+		/* Store offset to values array */
+		out_item->values = (Datum *) (data_ptr - ptr);
+
+		/* Copy values array (byval types) and prepare for non-byval */
+		memcpy(data_ptr, in_item->values, nfilters * sizeof(Datum));
+		data_ptr += nfilters * sizeof(Datum);
+	}
+
+	/* Now copy actual data for non-byval types and fix up pointers */
+	for (i = 0; i < mcvlist->nitems; i++)
+	{
+		JoinMCVItem *out_item = &((JoinMCVList *) ptr)->items[i];
+		JoinMCVItem *in_item = &mcvlist->items[i];
+		Datum	   *out_values = (Datum *) (ptr + (size_t) out_item->values);
+
+		for (j = 0; j < nfilters; j++)
+		{
+			if (!in_item->isnull[j])
+			{
+				bool		typbyval = get_typbyval(mcvlist->filter_types[j]);
+				int16		typlen = get_typlen(mcvlist->filter_types[j]);
+
+				if (!typbyval)
+				{
+					Size		datum_len = datumGetSize(in_item->values[j], false, typlen);
+					Size		offset = data_ptr - ptr;
+
+					/* Store offset in the values array */
+					out_values[j] = PointerGetDatum((char *) NULL + offset);
+
+					/* Copy the actual data */
+					memcpy(data_ptr, DatumGetPointer(in_item->values[j]), datum_len);
+					data_ptr += datum_len;
+				}
+			}
+		}
+	}
+
+	/* Verify we didn't write past the allocated size */
+	Assert(data_ptr <= ptr + len);
+
+	return result;
+}
+
+/*
+ * statext_join_mcv_deserialize
+ *		Deserialize a JoinMCVList from bytea format
+ */
+JoinMCVList *
+statext_join_mcv_deserialize(bytea *data)
+{
+	JoinMCVList *mcvlist;
+	Size		len;
+	char	   *base;
+	int			i,
+				j;
+	AttrNumber	nfilters;
+
+	if (!data)
+		return NULL;
+
+	len = VARSIZE_ANY_EXHDR(data);
+
+	/* Allocate and copy */
+	mcvlist = (JoinMCVList *) palloc(len);
+	base = VARDATA_ANY(data);
+	memcpy(mcvlist, base, len);
+
+	/* Verify magic number */
+	if (mcvlist->magic != STATS_JOIN_MCV_MAGIC)
+		elog(ERROR, "invalid magic number in cross-table MCV list");
+
+	nfilters = mcvlist->ndimensions;
+	Assert(nfilters > 0 && nfilters <= STATS_MAX_DIMENSIONS);
+
+	/*
+	 * Fix up pointers for isnull arrays, values arrays, and non-byval Datums.
+	 * These are stored as offsets from the start of the data.
+	 */
+	for (i = 0; i < mcvlist->nitems; i++)
+	{
+		JoinMCVItem *item = &mcvlist->items[i];
+
+		/* Fix isnull array pointer */
+		item->isnull = (bool *) ((char *) mcvlist + (size_t) item->isnull);
+
+		/* Fix values array pointer */
+		item->values = (Datum *) ((char *) mcvlist + (size_t) item->values);
+
+		/* Fix individual non-byval Datum pointers */
+		for (j = 0; j < nfilters; j++)
+		{
+			if (!item->isnull[j])
+			{
+				bool		typbyval = get_typbyval(mcvlist->filter_types[j]);
+
+				if (!typbyval)
+				{
+					Size		offset = (Size) DatumGetPointer(item->values[j]);
+
+					item->values[j] = PointerGetDatum((char *) mcvlist + offset);
+				}
+			}
+		}
+	}
+
+	return mcvlist;
+}
+
+/*
+ * statext_join_mcv_build
+ *		Build join MCV statistics from sampled rows
+ *
+ * This function builds a JoinMCVList by:
+ * 1. Extracting the referencing column values from the sampled rows
+ * 2. Looking up the corresponding filter column values from referenced table
+ * 3. Counting occurrences of each (referencing, filter) pair
+ * 4. Keeping the most common combinations
+ *
+ * The sampled rows come from ANALYZE of the referencing table.
+ */
+JoinMCVList *
+statext_join_mcv_build(Oid stxoid,
+					   Oid primary_relid,
+					   Oid orther_relid,
+					   int2vector *joinkeys,
+					   int2vector *filter_attnums,
+					   int numrows,
+					   HeapTuple *rows,
+					   int natts,
+					   VacAttrStats **vacattrstats)
+{
+	JoinMCVList *mcvlist;
+	Relation	other_rel;
+	TupleDesc	other_desc;
+	AttrNumber	primary_joinkey;
+	AttrNumber	other_joinkey;
+	AttrNumber	nfilters;
+	int			i;
+	Datum	   *primary_joinkey_mcv_values = NULL;
+	float4	   *primary_joinkey_mcv_freqs = NULL;
+	int			num_primary_joinkey_mcvs = 0;
+	Snapshot	snapshot;
+	bool		pushed_snapshot = false;
+	double		total_freq = 0.0;
+	VacAttrStats *primary_joinkey_stats = NULL;
+	int			mcv_slot_idx = -1;
+
+	/* Skip invalid stats */
+	if (!OidIsValid(primary_relid) || !OidIsValid(orther_relid))
+	{
+		elog(DEBUG1, "statext_join_mcv_build: invalid relation OIDs (primary=%u, other=%u)",
+			 primary_relid, orther_relid);
+		return NULL;
+	}
+
+	/*
+	 * Extract join keys TODO: currently only supports single equality qual
+	 */
+	Assert(joinkeys->dim1 == 2);
+	primary_joinkey = joinkeys->values[0];
+	other_joinkey = joinkeys->values[1];
+
+	/*
+	 * Extract filter attributes TODO: currently filter attributes are only
+	 * from the other_rel
+	 */
+	nfilters = filter_attnums->dim1;
+	Assert(nfilters > 0 && nfilters <= STATS_MAX_DIMENSIONS);
+
+	elog(DEBUG1, "statext_join_mcv_build: stxoid=%u, target_rel=%u, other_rel=%u, target_joinkey=%d, other_joinkey=%d, ndimensions=%d",
+		 stxoid, primary_relid, orther_relid, primary_joinkey, other_joinkey, nfilters);
+
+	/*
+	 * We reuse the already-computed MCV statistics for the join key column of
+	 * the primary relation. This gives us ~100 most common values with their
+	 * frequencies already calculated by the current ANALYZE.
+	 *
+	 * We get the MCV data from the in-memory VacAttrStats structure, not from
+	 * pg_statistic, so this works in a single ANALYZE pass.
+	 *
+	 * FIXME: This approach works well for FK joins (many-to-one) where MCV in
+	 * primary table ≈ MCV after join. For general joins (M:N), we should
+	 * join the sample rows with other_rel and find MCVs in the join result,
+	 * not assume primary table MCVs remain representative.
+	 */
+	for (i = 0; i < natts; i++)
+	{
+		if (vacattrstats[i]->tupattnum == primary_joinkey)
+		{
+			primary_joinkey_stats = vacattrstats[i];
+			break;
+		}
+	}
+
+	if (!primary_joinkey_stats)
+	{
+		elog(DEBUG1, "statext_join_mcv_build: VacAttrStats not found for primary joinkey attr=%d",
+			 primary_joinkey);
+		return NULL;
+	}
+
+	/* Find the MCV slot in the stats */
+	for (i = 0; i < STATISTIC_NUM_SLOTS; i++)
+	{
+		if (primary_joinkey_stats->stakind[i] == STATISTIC_KIND_MCV)
+		{
+			mcv_slot_idx = i;
+			break;
+		}
+	}
+
+	if (mcv_slot_idx < 0)
+	{
+		elog(DEBUG1, "statext_join_mcv_build: No MCV statistics for primary joinkey attr=%d",
+			 primary_joinkey);
+		return NULL;
+	}
+
+	primary_joinkey_mcv_values = primary_joinkey_stats->stavalues[mcv_slot_idx];
+	primary_joinkey_mcv_freqs = primary_joinkey_stats->stanumbers[mcv_slot_idx];
+	num_primary_joinkey_mcvs = primary_joinkey_stats->numvalues[mcv_slot_idx];
+
+	if (num_primary_joinkey_mcvs <= 0)
+		return NULL;
+
+	elog(DEBUG1, "statext_join_mcv_build: Found %d MCV entries, building join mcv stats",
+		 num_primary_joinkey_mcvs);
+
+	/* Open the other table */
+	other_rel = table_open(orther_relid, AccessShareLock);
+	other_desc = RelationGetDescr(other_rel);
+
+	/* Ensure we have an active snapshot */
+	if (!ActiveSnapshotSet())
+	{
+		PushActiveSnapshot(GetTransactionSnapshot());
+		pushed_snapshot = true;
+	}
+	snapshot = GetActiveSnapshot();
+
+	/* Build the join mcv list */
+	mcvlist = (JoinMCVList *) palloc0(
+									  offsetof(JoinMCVList, items) +
+									  num_primary_joinkey_mcvs * sizeof(JoinMCVItem));
+
+	mcvlist->magic = STATS_JOIN_MCV_MAGIC;
+	mcvlist->type = STATS_JOIN_MCV_TYPE_BASIC;
+	mcvlist->nitems = num_primary_joinkey_mcvs;
+	mcvlist->ndimensions = nfilters;
+
+	/* Get filter attribute numbers and types from the other table */
+	for (i = 0; i < nfilters; i++)
+	{
+		mcvlist->filter_attnums[i] = filter_attnums->values[i];
+		mcvlist->filter_types[i] = TupleDescAttr(other_desc,
+												 filter_attnums->values[i] - 1)->atttypid;
+	}
+
+	/*
+	 * For each MCV value of the join key column, look up the corresponding
+	 * filter value in the referenced table.
+	 */
+	for (i = 0; i < num_primary_joinkey_mcvs; i++)
+	{
+		Datum		joinkey_value = primary_joinkey_mcv_values[i];
+		float4		frequency = primary_joinkey_mcv_freqs[i];
+		ScanKeyData scankey;
+		TableScanDesc scan;
+		HeapTuple	orthertuple;
+		Oid			eq_opr;
+		Oid			eq_func;
+		Oid			primary_type;
+		int			j;
+
+		primary_type = get_atttype(primary_relid, primary_joinkey);
+
+		/* Get the equality operator for lookups */
+		eq_opr = get_opfamily_member(get_opclass_family(
+														GetDefaultOpClass(primary_type, BTREE_AM_OID)),
+									 primary_type, primary_type,
+									 BTEqualStrategyNumber);
+		eq_func = get_opcode(eq_opr);
+
+		ScanKeyInit(&scankey,
+					other_joinkey,
+					BTEqualStrategyNumber,
+					eq_func,
+					joinkey_value);
+
+		/* Set collation for the scan key */
+		scankey.sk_collation = TupleDescAttr(other_desc, other_joinkey - 1)->attcollation;
+
+		/* TODO: table_beginscan does sequential scan - use index if available */
+		scan = table_beginscan(other_rel, snapshot, 1, &scankey);
+		orthertuple = heap_getnext(scan, ForwardScanDirection);
+
+		/* Allocate arrays for all filter columns */
+		mcvlist->items[i].frequency = frequency;
+		mcvlist->items[i].isnull = (bool *) palloc(nfilters * sizeof(bool));
+		mcvlist->items[i].values = (Datum *) palloc(nfilters * sizeof(Datum));
+
+		if (HeapTupleIsValid(orthertuple))
+		{
+			/* Extract all filter column values */
+			for (j = 0; j < nfilters; j++)
+			{
+				AttrNumber	filter_attr = filter_attnums->values[j];
+				Datum		filter_value;
+				bool		filter_isnull;
+
+				filter_value = heap_getattr(orthertuple, filter_attr,
+											other_desc, &filter_isnull);
+
+				/* Copy the value if not null */
+				if (!filter_isnull)
+					filter_value = datumCopy(filter_value,
+											 TupleDescAttr(other_desc, filter_attr - 1)->attbyval,
+											 TupleDescAttr(other_desc, filter_attr - 1)->attlen);
+
+				mcvlist->items[i].isnull[j] = filter_isnull;
+				mcvlist->items[i].values[j] = filter_value;
+			}
+		}
+		else
+		{
+			/* No matching row found - mark all filter values as NULL */
+			for (j = 0; j < nfilters; j++)
+			{
+				mcvlist->items[i].isnull[j] = true;
+				mcvlist->items[i].values[j] = (Datum) 0;
+			}
+		}
+
+		table_endscan(scan);
+
+		total_freq += frequency;
+	}
+
+	/* Clean up */
+	if (pushed_snapshot)
+		PopActiveSnapshot();
+
+	table_close(other_rel, AccessShareLock);
+
+	elog(DEBUG1, "statext_join_mcv_build: Built join MCV with %d items, ndimensions=%d, total_freq=%.3f",
+		 num_primary_joinkey_mcvs, nfilters, total_freq);
+
+	return mcvlist;
+}
+
+/*
+ * statext_join_mcv_load
+ *		Look up join MCV statistics from the catalog
+ *
+ * Uses the index on (stxrelid, stxotherrel) for lookup, then post-filters
+ * on stxjoinkeys and filter_attnums since int2vector can't be indexed.
+ *
+ * Returns deserialized JoinMCVList if found, NULL otherwise.
+ * The caller is responsible for freeing the returned structure.
+ */
+JoinMCVList *
+statext_join_mcv_load(Oid relid,
+					  AttrNumber rel_joinkey_attnum,
+					  Oid other_relid,
+					  AttrNumber otherrel_joinkey_attnum,
+					  List *filter_attnums)
+{
+	Relation	statext_rel;
+	Relation	statext_data_rel;
+	SysScanDesc scan;
+	ScanKeyData keys[2];
+	HeapTuple	htup;
+	JoinMCVList *mcvlist = NULL;
+	int16		expected_joinkeys[2];
+
+	/* Build expected joinkeys for comparison */
+	expected_joinkeys[0] = rel_joinkey_attnum;
+	expected_joinkeys[1] = otherrel_joinkey_attnum;
+
+	/* Open both catalog relations */
+	statext_rel = table_open(StatisticExtRelationId, AccessShareLock);
+	statext_data_rel = table_open(StatisticExtDataRelationId, AccessShareLock);
+
+	/*
+	 * Use the index on (stxrelid, stxotherrel) to find statistics objects for
+	 * this table pair.
+	 */
+	ScanKeyInit(&keys[0],
+				Anum_pg_statistic_ext_stxrelid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(relid));
+
+	ScanKeyInit(&keys[1],
+				Anum_pg_statistic_ext_stxotherrel,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(other_relid));
+
+	scan = systable_beginscan(statext_rel, StatisticExtOtherrelIndexId, true,
+							  NULL, 2, keys);
+
+	/*
+	 * Iterate through matching rows (typically 1-2 for a given table pair).
+	 * Post-filter on stxjoinkeys and filter_attnums.
+	 */
+	while (HeapTupleIsValid(htup = systable_getnext(scan)))
+	{
+		Form_pg_statistic_ext st = (Form_pg_statistic_ext) GETSTRUCT(htup);
+		Datum		datum;
+		bool		isnull;
+		int2vector *joinkeys;
+		int2vector *stxkeys;
+		bool		match = true;
+		ArrayType  *arr;
+		char	   *kinds;
+		int			nkinds;
+		bool		has_join_mcv = false;
+		HeapTuple	data_htup;
+		int			i,
+					j;
+
+		/* Skip if not join MCV type */
+		datum = SysCacheGetAttrNotNull(STATEXTOID, htup, Anum_pg_statistic_ext_stxkind);
+		arr = DatumGetArrayTypeP(datum);
+		kinds = (char *) ARR_DATA_PTR(arr);
+		nkinds = ArrayGetNItems(ARR_NDIM(arr), ARR_DIMS(arr));
+
+		for (i = 0; i < nkinds; i++)
+		{
+			if (kinds[i] == STATS_EXT_JOIN_MCV)
+			{
+				has_join_mcv = true;
+				break;
+			}
+		}
+
+		if (!has_join_mcv)
+			continue;
+
+		/* Check stxjoinkeys matches */
+		datum = SysCacheGetAttr(STATEXTOID, htup, Anum_pg_statistic_ext_stxjoinkeys, &isnull);
+		if (isnull)
+			continue;
+
+		joinkeys = (int2vector *) DatumGetPointer(datum);
+		if (joinkeys->dim1 != 2 ||
+			joinkeys->values[0] != expected_joinkeys[0] ||
+			joinkeys->values[1] != expected_joinkeys[1])
+			continue;
+
+		/* Check filter columns match (stxkeys) */
+		datum = SysCacheGetAttrNotNull(STATEXTOID, htup, Anum_pg_statistic_ext_stxkeys);
+		stxkeys = (int2vector *) DatumGetPointer(datum);
+
+		if (filter_attnums != NIL)
+		{
+			/*
+			 * Check if the stat contains ALL the queried filter columns. For
+			 * marginalization, we allow the stat to have more columns than
+			 * the query (subset matching), but the query columns must all be
+			 * present in the stat.
+			 *
+			 * Example: Query filters on phonetic_code (col 3), stat has
+			 * (keyword, phonetic_code) = {2, 3} -> match!
+			 */
+			ListCell   *lc;
+
+			foreach(lc, filter_attnums)
+			{
+				AttrNumber	attnum = lfirst_int(lc);
+				bool		found = false;
+
+				for (j = 0; j < stxkeys->dim1; j++)
+				{
+					if (attnum == stxkeys->values[j])
+					{
+						found = true;
+						break;
+					}
+				}
+				if (!found)
+				{
+					match = false;
+					break;
+				}
+			}
+
+			if (!match)
+				continue;
+		}
+
+		/*
+		 * Found a matching stats object! Now load the join MCV data.
+		 */
+		data_htup = SearchSysCache2(STATEXTDATASTXOID,
+									ObjectIdGetDatum(st->oid),
+									BoolGetDatum(false));
+
+		if (HeapTupleIsValid(data_htup))
+		{
+			datum = SysCacheGetAttr(STATEXTDATASTXOID, data_htup,
+									Anum_pg_statistic_ext_data_stxdjoinmcv,
+									&isnull);
+
+			if (!isnull)
+			{
+				/* Deserialize the MCV list */
+				mcvlist = statext_join_mcv_deserialize(DatumGetByteaP(datum));
+				ReleaseSysCache(data_htup);
+				break;			/* Found it! */
+			}
+
+			ReleaseSysCache(data_htup);
+		}
+	}
+
+	systable_endscan(scan);
+	table_close(statext_data_rel, AccessShareLock);
+	table_close(statext_rel, AccessShareLock);
+
+	return mcvlist;
+}
+
+/*
+ * join_mcv_clauselist_selectivity
+ *		Apply join MCV statistics to estimate selectivity
+ *
+ * Given a JoinMCVList and filter values, compute the selectivity by
+ * summing the frequencies of MCV items that match the filter criteria.
+ *
+ * For single-column queries on multi-column stats, we compute the
+ * marginal distribution by summing frequencies across non-queried
+ * dimensions.
+ *
+ * For multi-column queries, we match ALL queried columns with their
+ * corresponding dimensions and find MCV items where all dimensions match.
+ */
+Selectivity
+join_mcv_clauselist_selectivity(JoinMCVList * mcvlist,
+								List *filter_values,
+								List *filter_attnums,
+								Oid collation)
+{
+	int			item_idx;
+	Selectivity total_sel;
+	ListCell   *lc;
+	ListCell   *lc_attnum;
+	int			nqcols;
+	int			qcol;
+	int		   *qcol_dims;
+	FmgrInfo   *eq_funcs;
+	FunctionCallInfo *fcinfo_arr;
+
+	if (!mcvlist || mcvlist->nitems == 0 || filter_values == NIL || filter_attnums == NIL)
+		return 0.0;
+
+	/*
+	 * Map queried columns to stat dimensions.
+	 *
+	 * For single-column queries on multi-column stats, we compute the
+	 * marginal distribution by summing frequencies across non-queried
+	 * dimensions.
+	 *
+	 * For multi-column queries, we match ALL queried columns with their
+	 * corresponding dimensions and find MCV items where all dimensions match.
+	 */
+	nqcols = list_length(filter_attnums);
+
+	if (nqcols > STATS_MAX_DIMENSIONS)
+	{
+		elog(DEBUG1, "join_mcv_clauselist_selectivity: too many query columns: %d", nqcols);
+		return 0.0;
+	}
+	qcol_dims = (int *) palloc(nqcols * sizeof(int));
+	qcol = 0;
+
+	/* Map each queried column to its dimension in the stat */
+	foreach(lc_attnum, filter_attnums)
+	{
+		AttrNumber	qattnum = lfirst_int(lc_attnum);
+		bool		found = false;
+
+		for (int d = 0; d < mcvlist->ndimensions; d++)
+		{
+			if (mcvlist->filter_attnums[d] == qattnum)
+			{
+				qcol_dims[qcol] = d;
+				found = true;
+				break;
+			}
+		}
+
+		if (!found)
+		{
+			elog(DEBUG1, "join_mcv_clauselist_selectivity: query attnum %d not found in stat", qattnum);
+			return 0.0;
+		}
+		qcol++;
+	}
+
+	elog(DEBUG1, "join_mcv_clauselist_selectivity: matched %d query columns to stat dimensions (ndimensions=%d)",
+		 nqcols, mcvlist->ndimensions);
+
+	/* Setup equality functions for each queried dimension */
+	eq_funcs = (FmgrInfo *) palloc(nqcols * sizeof(FmgrInfo));
+	fcinfo_arr = (FunctionCallInfo *) palloc(nqcols * sizeof(FunctionCallInfo));
+
+	for (qcol = 0; qcol < nqcols; qcol++)
+	{
+		int			dim = qcol_dims[qcol];
+		Oid			qtype = mcvlist->filter_types[dim];
+		TypeCacheEntry *typentry;
+		Oid			eq_func_oid;
+
+		typentry = lookup_type_cache(qtype, TYPECACHE_EQ_OPR);
+		if (!OidIsValid(typentry->eq_opr))
+			return 0.0;
+
+		eq_func_oid = get_opcode(typentry->eq_opr);
+		if (!OidIsValid(eq_func_oid))
+			return 0.0;
+
+		fmgr_info(eq_func_oid, &eq_funcs[qcol]);
+
+		/* Allocate fcinfo for this dimension */
+		fcinfo_arr[qcol] = (FunctionCallInfo) palloc(SizeForFunctionCallInfo(2));
+		InitFunctionCallInfoData(*fcinfo_arr[qcol], &eq_funcs[qcol], 2, collation, NULL, NULL);
+	}
+
+	total_sel = 0.0;
+
+	elog(DEBUG1, "join_mcv_clauselist_selectivity: searching %d filter values in %d MCV items",
+		 list_length(filter_values), mcvlist->nitems);
+
+	/* For single-column queries: iterate filter values */
+	if (nqcols == 1)
+	{
+		int			dim = qcol_dims[0];
+		bool	   *matched_items;
+
+		/*
+		 * Track which MCV items have been matched to avoid double-counting
+		 * when multiple filter values match the same MCV item.
+		 */
+		matched_items = (bool *) palloc0(mcvlist->nitems * sizeof(bool));
+
+		foreach(lc, filter_values)
+		{
+			Datum		filter_value = PointerGetDatum(lfirst(lc));
+
+			fcinfo_arr[0]->args[1].value = filter_value;
+			fcinfo_arr[0]->args[1].isnull = false;
+
+			for (item_idx = 0; item_idx < mcvlist->nitems; item_idx++)
+			{
+				JoinMCVItem *item;
+				Datum		fresult;
+
+				if (matched_items[item_idx])
+					continue;
+
+				item = &mcvlist->items[item_idx];
+
+				if (item->isnull[dim])
+					continue;
+
+				fcinfo_arr[0]->args[0].value = item->values[dim];
+				fcinfo_arr[0]->args[0].isnull = false;
+				fcinfo_arr[0]->isnull = false;
+
+				fresult = FunctionCallInvoke(fcinfo_arr[0]);
+
+				if (!fcinfo_arr[0]->isnull && DatumGetBool(fresult))
+				{
+					elog(DEBUG1, "  MATCH at item[%d] dim[%d], frequency=%.6f", item_idx, dim, item->frequency);
+					total_sel += item->frequency;
+					matched_items[item_idx] = true;
+
+					if (mcvlist->ndimensions == 1)
+						break;
+				}
+			}
+		}
+	}
+	/* For multi-column queries: exact match on all dimensions */
+	else
+	{
+		Datum	   *query_values;
+
+		/*
+		 * TODO: Support IN clauses on multiple columns. Currently we only
+		 * support exact equality (one value per column) in multi-column
+		 * filters.
+		 */
+		if (list_length(filter_values) != nqcols)
+		{
+			elog(DEBUG1, "join MCV selectivity: multi-column IN clauses not yet supported (expected %d values, got %d)",
+				 nqcols, list_length(filter_values));
+			return 0.0;
+		}
+
+		/* Extract query values for each dimension */
+		query_values = (Datum *) palloc(nqcols * sizeof(Datum));
+		qcol = 0;
+		foreach(lc, filter_values)
+		{
+			/* Each element should be a single-element list for equality */
+			List	   *val_list = (List *) lfirst(lc);
+
+			if (list_length(val_list) != 1)
+			{
+				elog(DEBUG1, "join MCV selectivity: multi-value filters not yet supported for multi-column filters");
+				return 0.0;
+			}
+			query_values[qcol] = PointerGetDatum(linitial(val_list));
+			qcol++;
+		}
+
+		/* Find MCV items that match on ALL queried dimensions */
+		for (item_idx = 0; item_idx < mcvlist->nitems; item_idx++)
+		{
+			JoinMCVItem *item = &mcvlist->items[item_idx];
+			bool		all_match = true;
+
+			/* Check each queried dimension */
+			for (qcol = 0; qcol < nqcols; qcol++)
+			{
+				int			dim = qcol_dims[qcol];
+				Datum		fresult;
+
+				if (item->isnull[dim])
+				{
+					all_match = false;
+					break;
+				}
+
+				fcinfo_arr[qcol]->args[0].value = item->values[dim];
+				fcinfo_arr[qcol]->args[0].isnull = false;
+				fcinfo_arr[qcol]->args[1].value = query_values[qcol];
+				fcinfo_arr[qcol]->args[1].isnull = false;
+				fcinfo_arr[qcol]->isnull = false;
+
+				fresult = FunctionCallInvoke(fcinfo_arr[qcol]);
+
+				if (fcinfo_arr[qcol]->isnull || !DatumGetBool(fresult))
+				{
+					all_match = false;
+					break;
+				}
+			}
+
+			if (all_match)
+			{
+				elog(DEBUG1, "  MATCH at item[%d] (all %d dimensions), frequency=%.6f", item_idx, nqcols, item->frequency);
+				total_sel += item->frequency;
+			}
+		}
+	}
+
+	elog(DEBUG1, "join_mcv_clauselist_selectivity: total_sel=%.6f", total_sel);
+
+	/*
+	 * For IN clauses with multiple values, return PER-VALUE selectivity.
+	 * PostgreSQL's join size formula multiplies by inner_rows (number of
+	 * values in IN list), so returning per-value selectivity gives the
+	 * correct result.
+	 *
+	 * Currently only single-column queries support multiple filter values;
+	 * multi-column queries bails out early. When multi-column IN support is
+	 * added, this adjustment will apply there too.
+	 */
+	if (nqcols == 1 && list_length(filter_values) > 1)
+	{
+		int			num_values = list_length(filter_values);
+
+		elog(DEBUG1, "join_mcv_clauselist_selectivity: IN clause with %d values, returning per-value: %.6f / %d = %.6f",
+			 num_values, total_sel, num_values, total_sel / num_values);
+
+		total_sel /= num_values;
+	}
+
+	return total_sel;
+}
+
+/*
+ * extract_filter_info
+ *		Extract filter column and constant value(s) from a filter clause
+ *
+ * Handles two types of filter clauses:
+ * 1. OpExpr: col = constant
+ * 2. ScalarArrayOpExpr: col IN (const1, const2, ...)
+ *
+ * Returns true if a valid filter pattern is found, false otherwise.
+ * On success, sets *filter_var, *filter_values (list of Datums),
+ * *filter_type, *collation, and *is_in_clause.
+ */
+static bool
+extract_filter_info(Node *clause,
+					Index expected_relid,
+					Var **filter_var,
+					List **filter_values,
+					Oid *filter_type,
+					Oid *collation,
+					bool *is_in_clause)
+{
+	*filter_var = NULL;
+	*filter_values = NIL;
+	*is_in_clause = false;
+
+	/* Case 1: OpExpr - simple equality (col = const) */
+	if (IsA(clause, OpExpr))
+	{
+		OpExpr	   *opexpr = (OpExpr *) clause;
+		Node	   *var_node = NULL;
+		Const	   *const_node = NULL;
+		bool		expronleft;
+
+		if (list_length(opexpr->args) != 2)
+			return false;
+
+		/* Check for pattern: Var = Const or Const = Var */
+		if (!examine_opclause_args(opexpr->args, &var_node, &const_node, &expronleft))
+			return false;
+
+		if (!var_node || !const_node || !IsA(var_node, Var))
+			return false;
+
+		*filter_var = (Var *) var_node;
+
+		/* Verify the Var is from the expected relation */
+		if ((*filter_var)->varno != expected_relid)
+			return false;
+
+		/* Create single-element list - store Datum as pointer */
+		*filter_values = list_make1(DatumGetPointer(const_node->constvalue));
+		*filter_type = const_node->consttype;
+		*collation = (*filter_var)->varcollid;
+		*is_in_clause = false;
+
+		return true;
+	}
+
+	/* Case 2: ScalarArrayOpExpr - IN clause (col IN (...)) */
+	else if (IsA(clause, ScalarArrayOpExpr))
+	{
+		ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+		Node	   *scalar_node;
+		Node	   *array_node;
+		Const	   *array_const;
+		ArrayType  *arr;
+		int			nitems;
+		Datum	   *items;
+		bool	   *nulls;
+		int			i;
+		Oid			elmtype;
+		int16		elmlen;
+		bool		elmbyval;
+		char		elmalign;
+
+		/* Only support ANY (IN), not ALL */
+		if (!saop->useOr)
+			return false;
+
+		if (list_length(saop->args) != 2)
+			return false;
+
+		scalar_node = (Node *) linitial(saop->args);
+		array_node = (Node *) lsecond(saop->args);
+
+		/* Strip any RelabelType nodes (e.g., varchar cast to text) */
+		scalar_node = strip_implicit_coercions(scalar_node);
+
+		/* Scalar must be a Var after stripping coercions */
+		if (!IsA(scalar_node, Var))
+			return false;
+
+		*filter_var = (Var *) scalar_node;
+
+		/* Verify the Var is from the expected relation */
+		if ((*filter_var)->varno != expected_relid)
+			return false;
+
+		/* Array must be a Const for us to extract values */
+		if (!IsA(array_node, Const))
+			return false;
+
+		array_const = (Const *) array_node;
+
+		/* Can't handle NULL arrays */
+		if (array_const->constisnull)
+			return false;
+
+		/* Deconstruct the array */
+		arr = DatumGetArrayTypeP(array_const->constvalue);
+		elmtype = ARR_ELEMTYPE(arr);
+
+		/* Get type info for deconstruction */
+		get_typlenbyvalalign(elmtype, &elmlen, &elmbyval, &elmalign);
+
+		deconstruct_array(arr, elmtype, elmlen, elmbyval, elmalign,
+						  &items, &nulls, &nitems);
+
+		/* Build list of non-NULL Datums */
+		*filter_values = NIL;
+		for (i = 0; i < nitems; i++)
+		{
+			if (!nulls[i])
+			{
+				/*
+				 * Store Datum as pointer - safe since Datum and pointer are
+				 * same size
+				 */
+				*filter_values = lappend(*filter_values, DatumGetPointer(items[i]));
+			}
+		}
+
+		/* If all values were NULL, we can't use this */
+		if (*filter_values == NIL)
+			return false;
+
+		*filter_type = elmtype;
+		*collation = saop->inputcollid;
+		*is_in_clause = true;
+
+		return true;
+	}
+
+	return false;
+}
+
+/*
+ * find_join_mcv_opportunity
+ *		Detect if a join+filter combination can use join MCV statistics.
+ *
+ * Looks for two patterns:
+ *
+ * Pattern 1: outer_rel (referencing) JOIN inner_rel (referenced)
+ *            WHERE inner_rel.col = constant [or IN (...)]
+ * Pattern 2: outer_rel (referenced) JOIN inner_rel (referencing)
+ *            WHERE outer_rel.col = constant [or IN (...)]
+ *
+ * Returns an allocated JoinMCVOpportunity if pattern is found, NULL otherwise.
+ * The caller is responsible for freeing the returned structure.
+ */
+JoinMCVOpportunity *
+find_join_mcv_opportunity(PlannerInfo *root,
+						  RelOptInfo *outer_rel,
+						  RelOptInfo *inner_rel,
+						  List *restrictlist)
+{
+	ListCell   *lc;
+	Var		   *join_var_outer = NULL;
+	Var		   *join_var_inner = NULL;
+	RestrictInfo *join_rinfo = NULL;
+	RelOptInfo *filtered_rel;
+	Var		   *filtered_var;
+	Var		   *target_var;
+	RangeTblEntry *filtered_rte;
+	RangeTblEntry *target_rte;
+	List	   *filter_attnums_list = NIL;
+	List	   *filter_values_list = NIL;
+	List	   *filter_types_list = NIL;
+	List	   *filter_rinfos_list = NIL;
+	Oid			collation = InvalidOid;
+	bool		all_filters_valid = true;
+	JoinMCVOpportunity *join_opp;
+
+	/* Find join clause: outer_rel.col = inner_rel.col */
+	foreach(lc, restrictlist)
+	{
+		RestrictInfo *rinfo;
+		OpExpr	   *opexpr;
+		Var		   *left_var;
+		Var		   *right_var;
+		Node	   *node = lfirst(lc);
+
+		/*
+		 * Skip if not a RestrictInfo. During index path costing, restrictlist
+		 * can contain raw expression nodes (OpExpr, ScalarArrayOpExpr, etc.)
+		 * from index predicates, not just RestrictInfo wrappers.
+		 */
+		if (!IsA(node, RestrictInfo))
+			continue;
+
+		rinfo = (RestrictInfo *) node;
+		if (!IsA(rinfo->clause, OpExpr))
+			continue;
+
+		opexpr = (OpExpr *) rinfo->clause;
+
+		if (list_length(opexpr->args) != 2)
+			continue;
+
+		/* FIXME: Extract left and right arguments smarter */
+		left_var = (Var *) linitial(opexpr->args);
+		right_var = (Var *) lsecond(opexpr->args);
+
+		/* Check if this matches our join pattern */
+		if (IsA(left_var, Var) && IsA(right_var, Var))
+		{
+			if (bms_is_member(left_var->varno, outer_rel->relids) &&
+				bms_is_member(right_var->varno, inner_rel->relids))
+			{
+				join_var_outer = left_var;
+				join_var_inner = right_var;
+				join_rinfo = rinfo;
+				break;
+			}
+			else if (bms_is_member(right_var->varno, outer_rel->relids) &&
+					 bms_is_member(left_var->varno, inner_rel->relids))
+			{
+				join_var_outer = right_var;
+				join_var_inner = left_var;
+				join_rinfo = rinfo;
+				break;
+			}
+		}
+	}
+
+	if (!join_var_outer || !join_var_inner || !join_rinfo)
+		return NULL;			/* No join clause found */
+
+	/*
+	 * Find filter clauses: other_rel.col = constant [or IN (...)]
+	 *
+	 * Determine which relation has filter conditions. We look for a base
+	 * relation (singleton) with baserestrictinfo clauses.
+	 */
+	if (bms_membership(inner_rel->relids) == BMS_SINGLETON &&
+		inner_rel->baserestrictinfo != NIL)
+	{
+		/* Pattern 1: inner has filters, outer is target */
+		filtered_rel = inner_rel;
+		filtered_var = join_var_inner;
+		target_var = join_var_outer;
+	}
+	else if (bms_membership(outer_rel->relids) == BMS_SINGLETON &&
+			 outer_rel->baserestrictinfo != NIL)
+	{
+		/* Pattern 2: outer has filters, inner is target */
+		filtered_rel = outer_rel;
+		filtered_var = join_var_outer;
+		target_var = join_var_inner;
+	}
+	else
+	{
+		/* No base relation with filters found */
+		return NULL;
+	}
+
+	/* Try to extract filter info from ALL baserestrictinfo clauses */
+	foreach(lc, filtered_rel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+		Var		   *filter_var;
+		List	   *filter_values;
+		Oid			filter_type;
+		Oid			filter_collation;
+		bool		filter_is_in;
+
+		if (extract_filter_info((Node *) rinfo->clause, filtered_rel->relid,
+								&filter_var, &filter_values, &filter_type,
+								&filter_collation, &filter_is_in))
+		{
+			filter_attnums_list = lappend_int(filter_attnums_list, filter_var->varattno);
+			filter_values_list = lappend(filter_values_list, filter_values);
+			filter_types_list = lappend_oid(filter_types_list, filter_type);
+			filter_rinfos_list = lappend(filter_rinfos_list, rinfo);
+
+			/* Use first filter's collation (should be consistent) */
+			if (!OidIsValid(collation))
+				collation = filter_collation;
+		}
+		else
+		{
+			/* This filter clause doesn't match our pattern */
+			all_filters_valid = false;
+			break;
+		}
+	}
+
+	/* If all filters were successfully extracted, create the match */
+	if (!all_filters_valid || filter_attnums_list == NIL)
+		return NULL;
+
+	/* Build the JoinMCVOpportunity result */
+	join_opp = palloc(sizeof(JoinMCVOpportunity));
+
+	/* Use Var's varno to get the actual table RTEs, not the join relid */
+	target_rte = root->simple_rte_array[target_var->varno];
+	filtered_rte = root->simple_rte_array[filtered_var->varno];
+
+	join_opp->target_rel = target_rte->relid;
+	join_opp->target_joinkey = target_var->varattno;
+	join_opp->other_rel = filtered_rte->relid;
+	join_opp->other_joinkey = filtered_var->varattno;
+	join_opp->filter_attnums = filter_attnums_list;
+	join_opp->filter_values = list_length(filter_values_list) == 1 ?
+		linitial(filter_values_list) : filter_values_list;
+	join_opp->collation = collation;
+
+	/* Track which clauses were used */
+	join_opp->join_rinfos = list_make1(join_rinfo);
+	join_opp->filter_rinfos = filter_rinfos_list;
+
+	return join_opp;
+}
+
+/*
+ * find_join_mcv_opportunity_in_clauses
+ *		Detect join MCV opportunity from a list of clauses
+ *
+ * Examines the clause list to identify if exactly two base relations are involved,
+ * then calls find_join_mcv_opportunity() with those relations.
+ *
+ * Returns NULL if no join mcv stats would possibly be applicable.
+ */
+JoinMCVOpportunity *
+find_join_mcv_opportunity_in_clauses(PlannerInfo *root, List *clauses)
+{
+	Relids		clause_relids = NULL;
+	ListCell   *lc;
+	int			relid1 = -1;
+	int			relid2 = -1;
+	RelOptInfo *rel1;
+	RelOptInfo *rel2;
+	JoinMCVOpportunity *join_opp;
+
+	/* Collect all relids mentioned in clauses */
+	foreach(lc, clauses)
+	{
+		RestrictInfo *rinfo;
+
+		if (!IsA(lfirst(lc), RestrictInfo))
+			continue;
+
+		rinfo = (RestrictInfo *) lfirst(lc);
+		clause_relids = bms_union(clause_relids, rinfo->clause_relids);
+	}
+
+	if (clause_relids == NULL)
+		return NULL;
+
+	/* Extract exactly two base relids */
+	relid1 = bms_next_member(clause_relids, -1);
+	if (relid1 < 0)
+		return NULL;
+
+	relid2 = bms_next_member(clause_relids, relid1);
+	if (relid2 < 0)
+		return NULL;			/* Only one relation */
+
+	if (bms_next_member(clause_relids, relid2) >= 0)
+		return NULL;			/* More than two relations */
+
+	/* Get RelOptInfo for both relations (skip if not base relations) */
+	rel1 = find_base_rel_ignore_join(root, relid1);
+	rel2 = find_base_rel_ignore_join(root, relid2);
+
+	if (!rel1 || !rel2)
+		return NULL;			/* One or both are join relations, not base
+								 * tables */
+
+	/* Call the main detection function - try both orders */
+	join_opp = find_join_mcv_opportunity(root, rel1, rel2, clauses);
+	if (join_opp)
+		return join_opp;
+
+	return find_join_mcv_opportunity(root, rel2, rel1, clauses);
+}
+
+/*
+ * pg_join_mcv_list_items
+ *		Returns a set of rows with information about join MCV items.
+ *
+ * For the lean structure, we only return filter column values (no join values).
+ * Returns tuples with:
+ * - index (int)
+ * - values (text[]) - filter column values as text array
+ * - nulls (bool[]) - NULL flags for filter columns
+ * - frequency (float8)
+ */
+Datum
+pg_join_mcv_list_items(PG_FUNCTION_ARGS)
+{
+	FuncCallContext *funcctx;
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		MemoryContext oldcontext;
+		JoinMCVList *mcvlist;
+		TupleDesc	tupdesc;
+
+		funcctx = SRF_FIRSTCALL_INIT();
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+		mcvlist = statext_join_mcv_deserialize(PG_GETARG_BYTEA_P(0));
+		funcctx->user_fctx = mcvlist;
+
+		funcctx->max_calls = 0;
+		if (mcvlist != NULL)
+			funcctx->max_calls = mcvlist->nitems;
+
+		if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("function returning record called in context that cannot accept type record")));
+
+		funcctx->tuple_desc = BlessTupleDesc(tupdesc);
+		MemoryContextSwitchTo(oldcontext);
+	}
+
+	funcctx = SRF_PERCALL_SETUP();
+
+	if (funcctx->call_cntr < funcctx->max_calls)
+	{
+		Datum		values[4];
+		bool		nulls[4];
+		HeapTuple	tuple;
+		JoinMCVList *mcvlist;
+		JoinMCVItem *item;
+		int			i;
+		int			dims[1];
+		int			lbs[1];
+		Datum	   *text_values;
+		bool	   *text_nulls;
+		ArrayType  *text_array;
+		ArrayType  *nulls_array;
+
+		mcvlist = (JoinMCVList *) funcctx->user_fctx;
+		item = &mcvlist->items[funcctx->call_cntr];
+
+		if (mcvlist->ndimensions <= 0 || mcvlist->ndimensions > STATS_MAX_DIMENSIONS)
+			elog(ERROR, "pg_join_mcv_list_items: corrupted mcvlist->ndimensions=%d (magic=%u, nitems=%u)",
+				 mcvlist->ndimensions, mcvlist->magic, mcvlist->nitems);
+
+		values[0] = Int32GetDatum(funcctx->call_cntr);
+		nulls[0] = false;
+
+		/* values[] - convert filter Datums to text array */
+		text_values = (Datum *) palloc(mcvlist->ndimensions * sizeof(Datum));
+		text_nulls = (bool *) palloc(mcvlist->ndimensions * sizeof(bool));
+
+		for (i = 0; i < mcvlist->ndimensions; i++)
+		{
+			if (item->isnull[i])
+			{
+				text_values[i] = (Datum) 0;
+				text_nulls[i] = true;
+			}
+			else
+			{
+				Oid			outfunc;
+				bool		isvarlena;
+
+				getTypeOutputInfo(mcvlist->filter_types[i], &outfunc, &isvarlena);
+				text_values[i] = PointerGetDatum(cstring_to_text(
+																 OidOutputFunctionCall(outfunc, item->values[i])));
+				text_nulls[i] = false;
+			}
+		}
+		dims[0] = mcvlist->ndimensions;
+		lbs[0] = 1;				/* Arrays are 1-indexed */
+
+		text_array = construct_md_array(text_values, text_nulls, 1, dims, lbs, TEXTOID, -1, false, TYPALIGN_INT);
+		values[1] = PointerGetDatum(text_array);
+		nulls[1] = false;
+
+		/* Convert bool array to Datum array */
+		{
+			Datum	   *bool_datums = (Datum *) palloc(mcvlist->ndimensions * sizeof(Datum));
+
+			for (i = 0; i < mcvlist->ndimensions; i++)
+				bool_datums[i] = BoolGetDatum(item->isnull[i]);
+
+			nulls_array = construct_array(bool_datums, mcvlist->ndimensions, BOOLOID,
+										  sizeof(bool), true, TYPALIGN_CHAR);
+			pfree(bool_datums);
+		}
+		values[2] = PointerGetDatum(nulls_array);
+		nulls[2] = false;
+
+		/* frequency */
+		values[3] = Float8GetDatum(item->frequency);
+		nulls[3] = false;
+
+		tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
+		SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(tuple));
+	}
+
+	SRF_RETURN_DONE(funcctx);
+}
+
+/*
+ * pg_join_mcv_list_in - input routine for type pg_join_mcv_list.
+ *
+ * pg_join_mcv_list stores data in binary form and parsing text input
+ * is not needed, so disallow this.
+ */
+Datum
+pg_join_mcv_list_in(PG_FUNCTION_ARGS)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("cannot accept a value of type %s", "pg_join_mcv_list")));
+
+	PG_RETURN_VOID();			/* keep compiler quiet */
+}
+
+/*
+ * pg_join_mcv_list_out - output routine for type pg_join_mcv_list.
+ *
+ * Join MCV lists are serialized into a bytea value, so we simply call
+ * byteaout() to serialize the value into text.
+ */
+Datum
+pg_join_mcv_list_out(PG_FUNCTION_ARGS)
+{
+	return byteaout(fcinfo);
+}
+
+/*
+ * pg_join_mcv_list_recv - binary input routine for type pg_join_mcv_list.
+ */
+Datum
+pg_join_mcv_list_recv(PG_FUNCTION_ARGS)
+{
+	ereport(ERROR,
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+			 errmsg("cannot accept a value of type %s", "pg_join_mcv_list")));
+
+	PG_RETURN_VOID();			/* keep compiler quiet */
+}
+
+/*
+ * pg_join_mcv_list_send - binary output routine for type pg_join_mcv_list.
+ */
+Datum
+pg_join_mcv_list_send(PG_FUNCTION_ARGS)
+{
+	return byteasend(fcinfo);
+}
diff --git a/src/backend/statistics/meson.build b/src/backend/statistics/meson.build
index 9a7bf55e301..450b5ef0c64 100644
--- a/src/backend/statistics/meson.build
+++ b/src/backend/statistics/meson.build
@@ -5,6 +5,7 @@ backend_sources += files(
   'dependencies.c',
   'extended_stats.c',
   'extended_stats_funcs.c',
+  'join_mcv.c',
   'mcv.c',
   'mvdistinct.c',
   'relation_stats.c',
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 34dd6e18df5..edd441bd9b6 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1878,28 +1878,44 @@ ProcessUtilitySlow(ParseState *pstate,
 				{
 					Oid			relid;
 					CreateStatsStmt *stmt = (CreateStatsStmt *) parsetree;
-					RangeVar   *rel = (RangeVar *) linitial(stmt->relations);
-
-					if (!IsA(rel, RangeVar))
-						ereport(ERROR,
-								(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-								 errmsg("CREATE STATISTICS only supports relation names in the FROM clause")));
+					Node	   *fromNode = (Node *) linitial(stmt->relations);
 
 					/*
-					 * CREATE STATISTICS will influence future execution plans
-					 * but does not interfere with currently executing plans.
-					 * So it should be enough to take ShareUpdateExclusiveLock
-					 * on relation, conflicting with ANALYZE and other DDL
-					 * that sets statistical information, but not with normal
-					 * queries.
-					 *
-					 * XXX RangeVarCallbackOwnsRelation not needed here, to
-					 * keep the same behavior as before.
+					 * For single-table stats: FROM clause is a simple
+					 * RangeVar, for join stats: FROM clause is a JoinExpr
 					 */
-					relid = RangeVarGetRelid(rel, ShareUpdateExclusiveLock, false);
+					if (IsA(fromNode, RangeVar))
+					{
+						RangeVar   *rel = (RangeVar *) fromNode;
 
-					/* Run parse analysis ... */
-					stmt = transformStatsStmt(relid, stmt, queryString);
+						/*
+						 * CREATE STATISTICS will influence future execution
+						 * plans but does not interfere with currently
+						 * executing plans. So it should be enough to take
+						 * ShareUpdateExclusiveLock on relation, conflicting
+						 * with ANALYZE and other DDL that sets statistical
+						 * information, but not with normal queries.
+						 *
+						 * XXX RangeVarCallbackOwnsRelation not needed here,
+						 * to keep the same behavior as before.
+						 */
+						relid = RangeVarGetRelid(rel, ShareUpdateExclusiveLock, false);
+
+						/* Run parse analysis ... */
+						stmt = transformStatsStmt(relid, stmt, queryString);
+					}
+					else if (IsA(fromNode, JoinExpr))
+					{
+						/* Join statistics passes 0 as relid */
+						relid = InvalidOid;
+
+						/* Run parse analysis ... */
+						stmt = transformStatsStmt(relid, stmt, queryString);
+					}
+					else
+						ereport(ERROR,
+								(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+								 errmsg("CREATE STATISTICS only supports relation names or JOIN clauses in the FROM clause")));
 
 					address = CreateStatistics(stmt, true);
 				}
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index fb577026666..6cf6ea998eb 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -57,6 +57,6 @@
  */
 
 /*							yyyymmddN */
-#define CATALOG_VERSION_NO	202601261
+#define CATALOG_VERSION_NO	202601262
 
 #endif
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 5e5e33f64fc..15420484c1e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5530,6 +5530,26 @@
   proname => 'pg_mcv_list_send', provolatile => 's', prorettype => 'bytea',
   proargtypes => 'pg_mcv_list', prosrc => 'pg_mcv_list_send' },
 
+{ oid => '8329', descr => 'I/O',
+  proname => 'pg_join_mcv_list_in', prorettype => 'pg_join_mcv_list',
+  proargtypes => 'cstring', prosrc => 'pg_join_mcv_list_in' },
+{ oid => '8330', descr => 'I/O',
+  proname => 'pg_join_mcv_list_out', prorettype => 'cstring',
+  proargtypes => 'pg_join_mcv_list', prosrc => 'pg_join_mcv_list_out' },
+{ oid => '8331', descr => 'I/O',
+  proname => 'pg_join_mcv_list_recv', provolatile => 's',
+  prorettype => 'pg_join_mcv_list', proargtypes => 'internal',
+  prosrc => 'pg_join_mcv_list_recv' },
+{ oid => '8332', descr => 'I/O',
+  proname => 'pg_join_mcv_list_send', provolatile => 's', prorettype => 'bytea',
+  proargtypes => 'pg_join_mcv_list', prosrc => 'pg_join_mcv_list_send' },
+{ oid => '8333', descr => 'inspect cross-table MCV list',
+  proname => 'pg_join_mcv_list_items', prorows => '100', proretset => 't',
+  provolatile => 's', prorettype => 'record', proargtypes => 'pg_join_mcv_list',
+  proallargtypes => '{pg_join_mcv_list,int4,_text,_bool,float8}',
+  proargmodes => '{i,o,o,o,o}',
+  proargnames => '{mcvlist,index,values,nulls,frequency}',
+  prosrc => 'pg_join_mcv_list_items' },
 { oid => '3427', descr => 'details about MCV list items',
   proname => 'pg_mcv_list_items', prorows => '1000', proretset => 't',
   provolatile => 's', prorettype => 'record', proargtypes => 'pg_mcv_list',
diff --git a/src/include/catalog/pg_statistic_ext.h b/src/include/catalog/pg_statistic_ext.h
index 6842954d4cb..0a05d92b8e8 100644
--- a/src/include/catalog/pg_statistic_ext.h
+++ b/src/include/catalog/pg_statistic_ext.h
@@ -57,6 +57,12 @@ CATALOG(pg_statistic_ext,3381,StatisticExtRelationId)
 	pg_node_tree stxexprs;		/* A list of expression trees for stats
 								 * attributes that are not simple column
 								 * references. */
+
+	/* Fields for join statistics (NULL for single-table stats) */
+	Oid			stxotherrel BKI_LOOKUP_OPT(pg_class);	/* other table in join
+														 * (for join stats) */
+	int2vector	stxjoinkeys;	/* join column pairs: [target_joinkey,
+								 * other_joinkey] (for join stats) */
 #endif
 
 } FormData_pg_statistic_ext;
@@ -73,6 +79,7 @@ DECLARE_TOAST(pg_statistic_ext, 3439, 3440);
 DECLARE_UNIQUE_INDEX_PKEY(pg_statistic_ext_oid_index, 3380, StatisticExtOidIndexId, pg_statistic_ext, btree(oid oid_ops));
 DECLARE_UNIQUE_INDEX(pg_statistic_ext_name_index, 3997, StatisticExtNameIndexId, pg_statistic_ext, btree(stxname name_ops, stxnamespace oid_ops));
 DECLARE_INDEX(pg_statistic_ext_relid_index, 3379, StatisticExtRelidIndexId, pg_statistic_ext, btree(stxrelid oid_ops));
+DECLARE_INDEX(pg_statistic_ext_otherrel_index, 9876, StatisticExtOtherrelIndexId, pg_statistic_ext, btree(stxrelid oid_ops, stxotherrel oid_ops));
 
 MAKE_SYSCACHE(STATEXTOID, pg_statistic_ext_oid_index, 4);
 MAKE_SYSCACHE(STATEXTNAMENSP, pg_statistic_ext_name_index, 4);
@@ -85,6 +92,7 @@ DECLARE_ARRAY_FOREIGN_KEY((stxrelid, stxkeys), pg_attribute, (attrelid, attnum))
 #define STATS_EXT_DEPENDENCIES		'f'
 #define STATS_EXT_MCV				'm'
 #define STATS_EXT_EXPRESSIONS		'e'
+#define STATS_EXT_JOIN_MCV			'c'
 
 #endif							/* EXPOSE_TO_CLIENT_CODE */
 
diff --git a/src/include/catalog/pg_statistic_ext_data.h b/src/include/catalog/pg_statistic_ext_data.h
index 7b7f2593491..8fef6851531 100644
--- a/src/include/catalog/pg_statistic_ext_data.h
+++ b/src/include/catalog/pg_statistic_ext_data.h
@@ -40,6 +40,7 @@ CATALOG(pg_statistic_ext_data,3429,StatisticExtDataRelationId)
 	pg_dependencies stxddependencies;	/* dependencies (serialized) */
 	pg_mcv_list stxdmcv;		/* MCV (serialized) */
 	pg_statistic stxdexpr[1];	/* stats for expressions */
+	pg_join_mcv_list stxdjoinmcv;	/* join MCV (serialized) */
 
 #endif
 
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index a1a753d1797..358cd0e316e 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -166,6 +166,13 @@
   typinput => 'pg_mcv_list_in', typoutput => 'pg_mcv_list_out',
   typreceive => 'pg_mcv_list_recv', typsend => 'pg_mcv_list_send',
   typalign => 'i', typstorage => 'x', typcollation => 'default' },
+{ oid => '8328', descr => 'join MCV list',
+  typname => 'pg_join_mcv_list', typlen => '-1', typbyval => 'f',
+  typcategory => 'Z', typinput => 'pg_join_mcv_list_in',
+  typoutput => 'pg_join_mcv_list_out',
+  typreceive => 'pg_join_mcv_list_recv',
+  typsend => 'pg_join_mcv_list_send', typalign => 'i', typstorage => 'x',
+  typcollation => 'default' },
 { oid => '32', descr => 'internal type for passing CollectedCommand',
   typname => 'pg_ddl_command', typlen => 'SIZEOF_POINTER', typbyval => 't',
   typtype => 'p', typcategory => 'P', typinput => 'pg_ddl_command_in',
diff --git a/src/include/statistics/extended_stats_internal.h b/src/include/statistics/extended_stats_internal.h
index 54b4a26273d..982d1babaa8 100644
--- a/src/include/statistics/extended_stats_internal.h
+++ b/src/include/statistics/extended_stats_internal.h
@@ -68,7 +68,6 @@ typedef struct StatsBuildData
 	bool	  **nulls;
 } StatsBuildData;
 
-
 extern MVNDistinct *statext_ndistinct_build(double totalrows, StatsBuildData *data);
 extern bytea *statext_ndistinct_serialize(MVNDistinct *ndistinct);
 extern MVNDistinct *statext_ndistinct_deserialize(bytea *data);
@@ -91,6 +90,18 @@ extern bytea *statext_mcv_serialize(MCVList *mcvlist, VacAttrStats **stats);
 extern MCVList *statext_mcv_deserialize(bytea *data);
 extern void statext_mcv_free(MCVList *mcvlist);
 
+extern JoinMCVList * statext_join_mcv_build(Oid stxoid,
+											Oid primary_relid,
+											Oid orther_relid,
+											int2vector *joinkeys,
+											int2vector *filter_attnums,
+											int numrows,
+											HeapTuple *rows,
+											int natts,
+											VacAttrStats **vacattrstats);
+extern bytea *statext_join_mcv_serialize(JoinMCVList * mcvlist);
+extern JoinMCVList * statext_join_mcv_deserialize(bytea *data);
+
 extern MultiSortSupport multi_sort_init(int ndims);
 extern void multi_sort_add_dimension(MultiSortSupport mss, int sortdim,
 									 Oid oper, Oid collation);
diff --git a/src/include/statistics/statistics.h b/src/include/statistics/statistics.h
index 38a56f6ccb3..a019ff41578 100644
--- a/src/include/statistics/statistics.h
+++ b/src/include/statistics/statistics.h
@@ -94,6 +94,69 @@ typedef struct MCVList
 	MCVItem		items[FLEXIBLE_ARRAY_MEMBER];	/* array of MCV items */
 } MCVList;
 
+/* Join MCV statistics */
+#define STATS_JOIN_MCV_MAGIC		0xC805E7AB	/* marks serialized bytea */
+#define STATS_JOIN_MCV_TYPE_BASIC	1	/* join MCV list type */
+
+/*
+ * Join MCV (most-common value) lists
+ *
+ * Similar to MCVItem but for join statistics.
+ * Represents: one MCV entry from filter columns observed through a join
+ * Example: for join A.fk = B.id collecting stats on B.category, an item stores
+ * one category value and its frequency in table A.
+ * Example: values=['electronics'], frequency=0.20 means 20% of A's rows
+ * join to B rows where category='electronics'
+ *
+ * Note: Join column values are NOT stored (they're implicit in the join pattern).
+ * Only filter column values from the one of the tables are stored.
+ */
+typedef struct JoinMCVItem
+{
+	double		frequency;		/* frequency in join result (0.0 to 1.0) */
+	bool	   *isnull;			/* NULL flags for filter values */
+	Datum	   *values;			/* filter column values */
+}			JoinMCVItem;
+
+/* join MCV list - essentially an array of join MCV items */
+typedef struct JoinMCVList
+{
+	uint32		magic;			/* magic constant marker */
+	uint32		type;			/* type of join MCV (BASIC) */
+	uint32		nitems;			/* number of MCV items */
+	AttrNumber	ndimensions;	/* number of filter columns (length of
+								 * stxkeys) */
+
+	/* Column metadata for filter columns (cached for lookup performance) */
+	AttrNumber	filter_attnums[STATS_MAX_DIMENSIONS];	/* attribute numbers of
+														 * filter columns */
+	Oid			filter_types[STATS_MAX_DIMENSIONS]; /* types of filter columns */
+
+	/* Variable-length array of MCV items follows */
+	JoinMCVItem items[FLEXIBLE_ARRAY_MEMBER];
+}			JoinMCVList;
+
+/*
+ * JoinMCVOpportunity - Information about a detected join MCV opportunity
+ * Used internally to track a join+filter pattern where MCV stats can help
+ */
+typedef struct JoinMCVOpportunity
+{
+	Oid			target_rel;		/* table OID of the estimation target */
+	AttrNumber	target_joinkey; /* target_rel's join column */
+	Oid			other_rel;		/* table OID of the filter source */
+	AttrNumber	other_joinkey;	/* other_rel's join column */
+	List	   *filter_attnums; /* list of AttrNumbers for filter columns in
+								 * other_rel */
+	List	   *filter_values;	/* list of Datum constant values being
+								 * filtered */
+	Oid			collation;		/* collation for comparisons */
+
+	/* Additional info to avoid duplicate work */
+	List	   *join_rinfos;	/* list of join clause RestrictInfos */
+	List	   *filter_rinfos;	/* list of filter clause RestrictInfos */
+}			JoinMCVOpportunity;
+
 extern MVNDistinct *statext_ndistinct_load(Oid mvoid, bool inh);
 extern MVDependencies *statext_dependencies_load(Oid mvoid, bool inh);
 extern MCVList *statext_mcv_load(Oid mvoid, bool inh);
@@ -127,4 +190,25 @@ extern StatisticExtInfo *choose_best_statistics(List *stats, char requiredkind,
 												int nclauses);
 extern HeapTuple statext_expressions_load(Oid stxoid, bool inh, int idx);
 
+/* Join MCV statistics functions */
+extern JoinMCVList * statext_join_mcv_load(Oid relid,
+										   AttrNumber rel_joinkey_attnum,
+										   Oid other_relid,
+										   AttrNumber otherrel_joinkey_attnum,
+										   List *filter_attnums);
+extern JoinMCVOpportunity * find_join_mcv_opportunity(PlannerInfo *root,
+													  RelOptInfo *outer_rel,
+													  RelOptInfo *inner_rel,
+													  List *restrictlist);
+extern JoinMCVOpportunity * find_join_mcv_opportunity_in_clauses(PlannerInfo *root,
+																 List *clauses);
+extern Selectivity join_mcv_clauselist_selectivity(JoinMCVList * mcvlist,
+												   List *filter_values,
+												   List *filter_attnums,
+												   Oid collation);
+extern Selectivity statext_join_mcv_clauselist_selectivity(PlannerInfo *root,
+														   List *clauses,
+														   int varRelid,
+														   Bitmapset **estimatedclauses);
+
 #endif							/* STATISTICS_H */
diff --git a/src/test/regress/expected/oidjoins.out b/src/test/regress/expected/oidjoins.out
index 25aaae8d05a..0f2c009e9b9 100644
--- a/src/test/regress/expected/oidjoins.out
+++ b/src/test/regress/expected/oidjoins.out
@@ -24,6 +24,11 @@ begin
       end loop;
       cmd := cmd || ', unnest(' || quote_ident(fk.fkcols[nkeys]);
       cmd := cmd || ') as ' || quote_ident(fk.fkcols[nkeys]);
+      -- For pg_statistic_ext, include stxkind for join stats filtering
+      if fk.fktable = 'pg_statistic_ext'::regclass and
+         fk.fkcols = ARRAY['stxrelid', 'stxkeys'] then
+        cmd := cmd || ', stxkind';
+      end if;
       cmd := cmd || ' FROM ' || fk.fktable::text || ') fk WHERE ';
     else
       cmd := cmd || ' FROM ' || fk.fktable::text || ' fk WHERE ';
@@ -33,6 +38,13 @@ begin
         cmd := cmd || quote_ident(fk.fkcols[i]) || ' != 0 AND ';
       end loop;
     end if;
+    -- Special case: For join statistics, stxkeys references attributes from
+    -- stxotherrel (the referenced table), not from stxrelid (the referencing
+    -- table). Skip the FK check for join stats where stxkind contains 'c'.
+    if fk.fktable = 'pg_statistic_ext'::regclass and
+       fk.fkcols = ARRAY['stxrelid', 'stxkeys'] then
+      cmd := cmd || 'NOT (''c'' = ANY(stxkind)) AND ';
+    end if;
     cmd := cmd || 'NOT EXISTS(SELECT 1 FROM ' || fk.pktable::text || ' pk WHERE ';
     for i in 1 .. nkeys loop
       if i > 1 then cmd := cmd || ' AND '; end if;
@@ -166,6 +178,7 @@ NOTICE:  checking pg_statistic {starelid,staattnum} => pg_attribute {attrelid,at
 NOTICE:  checking pg_statistic_ext {stxrelid} => pg_class {oid}
 NOTICE:  checking pg_statistic_ext {stxnamespace} => pg_namespace {oid}
 NOTICE:  checking pg_statistic_ext {stxowner} => pg_authid {oid}
+NOTICE:  checking pg_statistic_ext {stxotherrel} => pg_class {oid}
 NOTICE:  checking pg_statistic_ext {stxrelid,stxkeys} => pg_attribute {attrelid,attnum}
 NOTICE:  checking pg_statistic_ext_data {stxoid} => pg_statistic_ext {oid}
 NOTICE:  checking pg_rewrite {ev_class} => pg_class {oid}
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index b2a06579135..b42a8a03d6d 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -56,22 +56,22 @@ CREATE STATISTICS tst (unrecognized) ON x, y FROM ext_stats_test;
 ERROR:  unrecognized statistics kind "unrecognized"
 -- unsupported targets
 CREATE STATISTICS tst ON a FROM (VALUES (x)) AS foo;
-ERROR:  CREATE STATISTICS only supports relation names in the FROM clause
+ERROR:  CREATE STATISTICS only supports relation names or JOIN clauses in the FROM clause
 CREATE STATISTICS tst ON a FROM foo NATURAL JOIN bar;
-ERROR:  CREATE STATISTICS only supports relation names in the FROM clause
+ERROR:  relation "foo" does not exist
 CREATE STATISTICS tst ON a FROM (SELECT * FROM ext_stats_test) AS foo;
-ERROR:  CREATE STATISTICS only supports relation names in the FROM clause
+ERROR:  CREATE STATISTICS only supports relation names or JOIN clauses in the FROM clause
 CREATE STATISTICS tst ON a FROM ext_stats_test s TABLESAMPLE system (x);
-ERROR:  CREATE STATISTICS only supports relation names in the FROM clause
+ERROR:  CREATE STATISTICS only supports relation names or JOIN clauses in the FROM clause
 CREATE STATISTICS tst ON a FROM XMLTABLE('foo' PASSING 'bar' COLUMNS a text);
-ERROR:  CREATE STATISTICS only supports relation names in the FROM clause
+ERROR:  CREATE STATISTICS only supports relation names or JOIN clauses in the FROM clause
 CREATE STATISTICS tst ON a FROM JSON_TABLE(jsonb '123', '$' COLUMNS (item int));
-ERROR:  CREATE STATISTICS only supports relation names in the FROM clause
+ERROR:  CREATE STATISTICS only supports relation names or JOIN clauses in the FROM clause
 CREATE FUNCTION tftest(int) returns table(a int, b int) as $$
 SELECT $1, $1+i FROM generate_series(1,5) g(i);
 $$ LANGUAGE sql IMMUTABLE STRICT;
 CREATE STATISTICS alt_stat2 ON a FROM tftest(1);
-ERROR:  CREATE STATISTICS only supports relation names in the FROM clause
+ERROR:  CREATE STATISTICS only supports relation names or JOIN clauses in the FROM clause
 DROP FUNCTION tftest;
 -- incorrect expressions
 CREATE STATISTICS tst ON (y) FROM ext_stats_test; -- single column reference
diff --git a/src/test/regress/expected/stats_ext_crossrel.out b/src/test/regress/expected/stats_ext_crossrel.out
new file mode 100644
index 00000000000..d41dd37db3f
--- /dev/null
+++ b/src/test/regress/expected/stats_ext_crossrel.out
@@ -0,0 +1,288 @@
+-- Join MCV statistics tests
+--
+-- Note: tables for which we check estimated row counts should be created
+-- with autovacuum_enabled = off, so that we don't have unstable results
+-- from auto-analyze happening when we didn't expect it.
+--
+--
+-- Test CREATE STATISTICS syntax for join MCV statistics.
+--
+CREATE TABLE keywords2 (
+    id INTEGER PRIMARY KEY,
+    keyword TEXT NOT NULL,
+    phonetic_code character varying(5)
+);
+CREATE TABLE movie_keywords2 (
+    movie_id INTEGER PRIMARY KEY,
+    keyword_id INTEGER NOT NULL  -- No FOREIGN KEY reference
+);
+-- Insert tightly correlated data into the "referenced" table
+INSERT INTO keywords2 (id, keyword, phonetic_code)
+SELECT
+    i,
+    'keyword_' || i,
+    'ph_' || i
+FROM generate_series(1, 50) i;
+-- Insert data into the "referencing" table with skewed distribution
+INSERT INTO movie_keywords2 (movie_id, keyword_id)
+SELECT
+    i,
+    CASE
+        WHEN i % 100 < 60 THEN (i % 10) + 1      -- 60% keyword_ids 1-10 (6% frequency per keyword)
+        WHEN i % 100 < 90 THEN (i % 10) + 11     -- 30% keyword_ids 11-20 (3% frequency per keyword)
+        ELSE (i % 10) + 21                       -- 10% keyword_ids 21-30 (1% frequency per keyword)
+    END
+FROM generate_series(1, 10000) i;
+ANALYZE keywords2;
+ANALYZE movie_keywords2;
+-- Create join MCV statistics on a single filter column (keyword)
+CREATE STATISTICS movie_keywords2_keyword_stats (mcv)
+ON k.keyword
+FROM movie_keywords2 mk JOIN keywords2 k ON (mk.keyword_id = k.id);
+ANALYZE movie_keywords2;
+-- Show the stats in catalog
+SELECT s.stxname,
+       s.stxrelid::regclass,
+       s.stxotherrel::regclass,
+       s.stxjoinkeys,
+       s.stxkeys,
+       s.stxkind,
+       s.stxstattarget,
+       s.stxexprs
+FROM pg_statistic_ext s
+WHERE s.stxname = 'movie_keywords2_keyword_stats';
+            stxname            |    stxrelid     | stxotherrel | stxjoinkeys | stxkeys | stxkind | stxstattarget | stxexprs 
+-------------------------------+-----------------+-------------+-------------+---------+---------+---------------+----------
+ movie_keywords2_keyword_stats | movie_keywords2 | keywords2   | 2 1         | 2       | {c}     |               | 
+(1 row)
+
+-- FIXME: this is incorrect.
+-- Need to implement pg_get_statisticsobjdef for join MCV statistics
+SELECT pg_get_statisticsobjdef(oid) FROM pg_statistic_ext WHERE stxname = 'movie_keywords2_keyword_stats';
+                                  pg_get_statisticsobjdef                                  
+-------------------------------------------------------------------------------------------
+ CREATE STATISTICS public.movie_keywords2_keyword_stats ON keyword_id FROM movie_keywords2
+(1 row)
+
+SELECT m.index,
+       m.values,
+       m.nulls,
+       ROUND(m.frequency::numeric, 2) AS frequency
+FROM pg_statistic_ext s
+JOIN pg_statistic_ext_data d ON (s.oid = d.stxoid)
+CROSS JOIN LATERAL pg_join_mcv_list_items(d.stxdjoinmcv) AS m
+WHERE s.stxname = 'movie_keywords2_keyword_stats'
+ORDER BY m.index;
+ index |    values    | nulls | frequency 
+-------+--------------+-------+-----------
+     0 | {keyword_1}  | {f}   |      0.06
+     1 | {keyword_2}  | {f}   |      0.06
+     2 | {keyword_3}  | {f}   |      0.06
+     3 | {keyword_4}  | {f}   |      0.06
+     4 | {keyword_5}  | {f}   |      0.06
+     5 | {keyword_6}  | {f}   |      0.06
+     6 | {keyword_7}  | {f}   |      0.06
+     7 | {keyword_8}  | {f}   |      0.06
+     8 | {keyword_9}  | {f}   |      0.06
+     9 | {keyword_10} | {f}   |      0.06
+    10 | {keyword_11} | {f}   |      0.03
+    11 | {keyword_12} | {f}   |      0.03
+    12 | {keyword_13} | {f}   |      0.03
+    13 | {keyword_14} | {f}   |      0.03
+    14 | {keyword_15} | {f}   |      0.03
+    15 | {keyword_16} | {f}   |      0.03
+    16 | {keyword_17} | {f}   |      0.03
+    17 | {keyword_18} | {f}   |      0.03
+    18 | {keyword_19} | {f}   |      0.03
+    19 | {keyword_20} | {f}   |      0.03
+    20 | {keyword_21} | {f}   |      0.01
+    21 | {keyword_22} | {f}   |      0.01
+    22 | {keyword_23} | {f}   |      0.01
+    23 | {keyword_24} | {f}   |      0.01
+    24 | {keyword_25} | {f}   |      0.01
+    25 | {keyword_26} | {f}   |      0.01
+    26 | {keyword_27} | {f}   |      0.01
+    27 | {keyword_28} | {f}   |      0.01
+    28 | {keyword_29} | {f}   |      0.01
+    29 | {keyword_30} | {f}   |      0.01
+(30 rows)
+
+-- Create join MCV statistics on multiple filter columns (keyword + phonetic_code)
+CREATE STATISTICS movie_keywords2_multi_stats (mcv)
+ON k.keyword, k.phonetic_code
+FROM movie_keywords2 mk JOIN keywords2 k ON (mk.keyword_id = k.id);
+ANALYZE movie_keywords2;
+-- Show the stats in catalog
+SELECT s.stxname,
+       s.stxrelid::regclass,
+       s.stxotherrel::regclass,
+       s.stxjoinkeys,
+       s.stxkeys,
+       s.stxkind,
+       s.stxstattarget,
+       s.stxexprs
+FROM pg_statistic_ext s
+WHERE s.stxname = 'movie_keywords2_multi_stats';
+           stxname           |    stxrelid     | stxotherrel | stxjoinkeys | stxkeys | stxkind | stxstattarget | stxexprs 
+-----------------------------+-----------------+-------------+-------------+---------+---------+---------------+----------
+ movie_keywords2_multi_stats | movie_keywords2 | keywords2   | 2 1         | 2 3     | {c}     |               | 
+(1 row)
+
+SELECT m.index,
+       m.values,
+       m.nulls,
+       ROUND(m.frequency::numeric, 2) AS frequency
+FROM pg_statistic_ext s
+JOIN pg_statistic_ext_data d ON (s.oid = d.stxoid)
+CROSS JOIN LATERAL pg_join_mcv_list_items(d.stxdjoinmcv) AS m
+WHERE s.stxname = 'movie_keywords2_multi_stats'
+ORDER BY m.index;
+ index |       values       | nulls | frequency 
+-------+--------------------+-------+-----------
+     0 | {keyword_1,ph_1}   | {f,f} |      0.06
+     1 | {keyword_2,ph_2}   | {f,f} |      0.06
+     2 | {keyword_3,ph_3}   | {f,f} |      0.06
+     3 | {keyword_4,ph_4}   | {f,f} |      0.06
+     4 | {keyword_5,ph_5}   | {f,f} |      0.06
+     5 | {keyword_6,ph_6}   | {f,f} |      0.06
+     6 | {keyword_7,ph_7}   | {f,f} |      0.06
+     7 | {keyword_8,ph_8}   | {f,f} |      0.06
+     8 | {keyword_9,ph_9}   | {f,f} |      0.06
+     9 | {keyword_10,ph_10} | {f,f} |      0.06
+    10 | {keyword_11,ph_11} | {f,f} |      0.03
+    11 | {keyword_12,ph_12} | {f,f} |      0.03
+    12 | {keyword_13,ph_13} | {f,f} |      0.03
+    13 | {keyword_14,ph_14} | {f,f} |      0.03
+    14 | {keyword_15,ph_15} | {f,f} |      0.03
+    15 | {keyword_16,ph_16} | {f,f} |      0.03
+    16 | {keyword_17,ph_17} | {f,f} |      0.03
+    17 | {keyword_18,ph_18} | {f,f} |      0.03
+    18 | {keyword_19,ph_19} | {f,f} |      0.03
+    19 | {keyword_20,ph_20} | {f,f} |      0.03
+    20 | {keyword_21,ph_21} | {f,f} |      0.01
+    21 | {keyword_22,ph_22} | {f,f} |      0.01
+    22 | {keyword_23,ph_23} | {f,f} |      0.01
+    23 | {keyword_24,ph_24} | {f,f} |      0.01
+    24 | {keyword_25,ph_25} | {f,f} |      0.01
+    25 | {keyword_26,ph_26} | {f,f} |      0.01
+    26 | {keyword_27,ph_27} | {f,f} |      0.01
+    27 | {keyword_28,ph_28} | {f,f} |      0.01
+    28 | {keyword_29,ph_29} | {f,f} |      0.01
+    29 | {keyword_30,ph_30} | {f,f} |      0.01
+(30 rows)
+
+-- Verify the join MCV statistics are used for single equality predicates
+-- on the filter column of the referenced table
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords2 mk, keywords2 k
+    WHERE k.keyword = ''keyword_1'' AND k.id = mk.keyword_id
+');
+ estimated | actual 
+-----------+--------
+       600 |    600
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords2 mk, keywords2 k
+    WHERE k.keyword = ''keyword_15'' AND k.id = mk.keyword_id
+');
+ estimated | actual 
+-----------+--------
+       300 |    300
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords2 mk, keywords2 k
+    WHERE k.keyword = ''keyword_25'' AND k.id = mk.keyword_id
+');
+ estimated | actual 
+-----------+--------
+       100 |    100
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords2 mk, keywords2 k
+    WHERE k.phonetic_code = ''ph_1'' AND k.id = mk.keyword_id
+');
+ estimated | actual 
+-----------+--------
+       600 |    600
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords2 mk, keywords2 k
+    WHERE k.phonetic_code = ''ph_15'' AND k.id = mk.keyword_id
+');
+ estimated | actual 
+-----------+--------
+       300 |    300
+(1 row)
+
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords2 mk, keywords2 k
+    WHERE k.phonetic_code = ''ph_25'' AND k.id = mk.keyword_id
+');
+ estimated | actual 
+-----------+--------
+       100 |    100
+(1 row)
+
+-- Ensure the join MCV statistics are used for IN predicates
+-- on the filter column of the referenced table
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords2 mk, keywords2 k
+    WHERE k.keyword IN (''keyword_1'', ''keyword_2'', ''keyword_3'')
+      AND k.id = mk.keyword_id
+');
+ estimated | actual 
+-----------+--------
+      1800 |   1800
+(1 row)
+
+-- Ensure the join MCV statistics are used for equality predicates
+-- on multiple filter columns of the referenced table
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords2 mk, keywords2 k
+    WHERE k.keyword = ''keyword_1''
+      AND k.phonetic_code = ''ph_1''
+      AND k.id = mk.keyword_id
+');
+ estimated | actual 
+-----------+--------
+       600 |    600
+(1 row)
+
+-- Zero join MCV match, falls back to standard estimation: 10000 * 1 / 50 = 200.
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords2 mk, keywords2 k
+    WHERE k.keyword = ''keyword_1''
+      AND k.phonetic_code = ''ph_15''
+      AND k.id = mk.keyword_id
+');
+ estimated | actual 
+-----------+--------
+       200 |      0
+(1 row)
+
+-- Verify syntax error cases
+CREATE STATISTICS bad_stats1 (mcv) ON k.keyword;
+ERROR:  syntax error at or near ";"
+LINE 1: CREATE STATISTICS bad_stats1 (mcv) ON k.keyword;
+                                                       ^
+CREATE STATISTICS bad_stats2 (mcv) ON k.keyword FROM movie_keywords2 mk, keywords2 k;
+ERROR:  missing FROM-clause entry for table "k"
+LINE 1: CREATE STATISTICS bad_stats2 (mcv) ON k.keyword FROM movie_k...
+                                              ^
+CREATE STATISTICS bad_stats3 (mcv) FROM movie_keywords2 mk JOIN keywords2 k ON (mk.keyword_id = k.id);
+ERROR:  syntax error at or near "FROM"
+LINE 1: CREATE STATISTICS bad_stats3 (mcv) FROM movie_keywords2 mk J...
+                                           ^
+CREATE STATISTICS bad_stats4 (mcv) ON keyword FROM movie_keywords2 mk JOIN keywords2 k ON (mk.keyword_id = k.id);
+ERROR:  join statistics require table-qualified column names
+CREATE STATISTICS bad_stats5 (mcv) ON lower(k.keyword) FROM movie_keywords2 mk JOIN keywords2 k ON (mk.keyword_id = k.id);
+ERROR:  expressions are not supported in join statistics
+CREATE STATISTICS bad_stats6 (mcv) ON k.keyword FROM (movie_keywords2 mk JOIN keywords2 k ON (mk.keyword_id = k.id)) JOIN keywords2 k2 ON (k.id = k2.id);
+ERROR:  extended join statistics currently support only simple two-table joins
+-- Cleanup
+DROP TABLE movie_keywords2 CASCADE;
+DROP TABLE keywords2 CASCADE;
diff --git a/src/test/regress/expected/type_sanity.out b/src/test/regress/expected/type_sanity.out
index 1d21d3eb446..cb95cfabb1f 100644
--- a/src/test/regress/expected/type_sanity.out
+++ b/src/test/regress/expected/type_sanity.out
@@ -79,7 +79,8 @@ ORDER BY t1.oid;
  4600 | pg_brin_bloom_summary
  4601 | pg_brin_minmax_multi_summary
  5017 | pg_mcv_list
-(6 rows)
+ 8328 | pg_join_mcv_list
+(7 rows)
 
 -- Make sure typarray points to a "true" array type of our own base
 SELECT t1.oid, t1.typname as basetype, t2.typname as arraytype,
@@ -815,7 +816,8 @@ SELECT oid, typname, typtype, typelem, typarray
                     WHERE a.atttypid=t.oid AND
                           a.attnum > 0 AND
                           a.attrelid='tab_core_types'::regclass);
- oid | typname | typtype | typelem | typarray 
------+---------+---------+---------+----------
-(0 rows)
+ oid  |     typname      | typtype | typelem | typarray 
+------+------------------+---------+---------+----------
+ 8328 | pg_join_mcv_list | b       |       0 |        0
+(1 row)
 
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 021d57f66bb..f95802ba3ca 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -83,6 +83,9 @@ test: create_table_like alter_generic alter_operator misc async dbsize merge mis
 # amutils depends on geometry, create_index_spgist, hash_index, brin
 test: rules psql psql_crosstab psql_pipeline amutils stats_ext collate.linux.utf8 collate.windows.win1252
 
+# join MCV statistics tests
+test: stats_ext_crossrel
+
 # ----------
 # Run these alone so they don't run out of parallel workers
 # select_parallel depends on create_misc
diff --git a/src/test/regress/sql/oidjoins.sql b/src/test/regress/sql/oidjoins.sql
index 8b22e6d10c5..c59551f1b8c 100644
--- a/src/test/regress/sql/oidjoins.sql
+++ b/src/test/regress/sql/oidjoins.sql
@@ -24,6 +24,11 @@ begin
       end loop;
       cmd := cmd || ', unnest(' || quote_ident(fk.fkcols[nkeys]);
       cmd := cmd || ') as ' || quote_ident(fk.fkcols[nkeys]);
+      -- For pg_statistic_ext, include stxkind for join stats filtering
+      if fk.fktable = 'pg_statistic_ext'::regclass and
+         fk.fkcols = ARRAY['stxrelid', 'stxkeys'] then
+        cmd := cmd || ', stxkind';
+      end if;
       cmd := cmd || ' FROM ' || fk.fktable::text || ') fk WHERE ';
     else
       cmd := cmd || ' FROM ' || fk.fktable::text || ' fk WHERE ';
@@ -33,6 +38,13 @@ begin
         cmd := cmd || quote_ident(fk.fkcols[i]) || ' != 0 AND ';
       end loop;
     end if;
+    -- Special case: For join statistics, stxkeys references attributes from
+    -- stxotherrel (the referenced table), not from stxrelid (the referencing
+    -- table). Skip the FK check for join stats where stxkind contains 'c'.
+    if fk.fktable = 'pg_statistic_ext'::regclass and
+       fk.fkcols = ARRAY['stxrelid', 'stxkeys'] then
+      cmd := cmd || 'NOT (''c'' = ANY(stxkind)) AND ';
+    end if;
     cmd := cmd || 'NOT EXISTS(SELECT 1 FROM ' || fk.pktable::text || ' pk WHERE ';
     for i in 1 .. nkeys loop
       if i > 1 then cmd := cmd || ' AND '; end if;
diff --git a/src/test/regress/sql/stats_ext_crossrel.sql b/src/test/regress/sql/stats_ext_crossrel.sql
new file mode 100644
index 00000000000..b771a30f34d
--- /dev/null
+++ b/src/test/regress/sql/stats_ext_crossrel.sql
@@ -0,0 +1,172 @@
+-- Join MCV statistics tests
+--
+-- Note: tables for which we check estimated row counts should be created
+-- with autovacuum_enabled = off, so that we don't have unstable results
+-- from auto-analyze happening when we didn't expect it.
+--
+
+--
+-- Test CREATE STATISTICS syntax for join MCV statistics.
+--
+
+CREATE TABLE keywords2 (
+    id INTEGER PRIMARY KEY,
+    keyword TEXT NOT NULL,
+    phonetic_code character varying(5)
+);
+
+CREATE TABLE movie_keywords2 (
+    movie_id INTEGER PRIMARY KEY,
+    keyword_id INTEGER NOT NULL  -- No FOREIGN KEY reference
+);
+
+-- Insert tightly correlated data into the "referenced" table
+INSERT INTO keywords2 (id, keyword, phonetic_code)
+SELECT
+    i,
+    'keyword_' || i,
+    'ph_' || i
+FROM generate_series(1, 50) i;
+
+-- Insert data into the "referencing" table with skewed distribution
+INSERT INTO movie_keywords2 (movie_id, keyword_id)
+SELECT
+    i,
+    CASE
+        WHEN i % 100 < 60 THEN (i % 10) + 1      -- 60% keyword_ids 1-10 (6% frequency per keyword)
+        WHEN i % 100 < 90 THEN (i % 10) + 11     -- 30% keyword_ids 11-20 (3% frequency per keyword)
+        ELSE (i % 10) + 21                       -- 10% keyword_ids 21-30 (1% frequency per keyword)
+    END
+FROM generate_series(1, 10000) i;
+
+ANALYZE keywords2;
+ANALYZE movie_keywords2;
+
+-- Create join MCV statistics on a single filter column (keyword)
+CREATE STATISTICS movie_keywords2_keyword_stats (mcv)
+ON k.keyword
+FROM movie_keywords2 mk JOIN keywords2 k ON (mk.keyword_id = k.id);
+ANALYZE movie_keywords2;
+
+-- Show the stats in catalog
+SELECT s.stxname,
+       s.stxrelid::regclass,
+       s.stxotherrel::regclass,
+       s.stxjoinkeys,
+       s.stxkeys,
+       s.stxkind,
+       s.stxstattarget,
+       s.stxexprs
+FROM pg_statistic_ext s
+WHERE s.stxname = 'movie_keywords2_keyword_stats';
+
+-- FIXME: this is incorrect.
+-- Need to implement pg_get_statisticsobjdef for join MCV statistics
+SELECT pg_get_statisticsobjdef(oid) FROM pg_statistic_ext WHERE stxname = 'movie_keywords2_keyword_stats';
+
+SELECT m.index,
+       m.values,
+       m.nulls,
+       ROUND(m.frequency::numeric, 2) AS frequency
+FROM pg_statistic_ext s
+JOIN pg_statistic_ext_data d ON (s.oid = d.stxoid)
+CROSS JOIN LATERAL pg_join_mcv_list_items(d.stxdjoinmcv) AS m
+WHERE s.stxname = 'movie_keywords2_keyword_stats'
+ORDER BY m.index;
+
+-- Create join MCV statistics on multiple filter columns (keyword + phonetic_code)
+CREATE STATISTICS movie_keywords2_multi_stats (mcv)
+ON k.keyword, k.phonetic_code
+FROM movie_keywords2 mk JOIN keywords2 k ON (mk.keyword_id = k.id);
+ANALYZE movie_keywords2;
+
+-- Show the stats in catalog
+SELECT s.stxname,
+       s.stxrelid::regclass,
+       s.stxotherrel::regclass,
+       s.stxjoinkeys,
+       s.stxkeys,
+       s.stxkind,
+       s.stxstattarget,
+       s.stxexprs
+FROM pg_statistic_ext s
+WHERE s.stxname = 'movie_keywords2_multi_stats';
+
+SELECT m.index,
+       m.values,
+       m.nulls,
+       ROUND(m.frequency::numeric, 2) AS frequency
+FROM pg_statistic_ext s
+JOIN pg_statistic_ext_data d ON (s.oid = d.stxoid)
+CROSS JOIN LATERAL pg_join_mcv_list_items(d.stxdjoinmcv) AS m
+WHERE s.stxname = 'movie_keywords2_multi_stats'
+ORDER BY m.index;
+
+-- Verify the join MCV statistics are used for single equality predicates
+-- on the filter column of the referenced table
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords2 mk, keywords2 k
+    WHERE k.keyword = ''keyword_1'' AND k.id = mk.keyword_id
+');
+
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords2 mk, keywords2 k
+    WHERE k.keyword = ''keyword_15'' AND k.id = mk.keyword_id
+');
+
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords2 mk, keywords2 k
+    WHERE k.keyword = ''keyword_25'' AND k.id = mk.keyword_id
+');
+
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords2 mk, keywords2 k
+    WHERE k.phonetic_code = ''ph_1'' AND k.id = mk.keyword_id
+');
+
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords2 mk, keywords2 k
+    WHERE k.phonetic_code = ''ph_15'' AND k.id = mk.keyword_id
+');
+
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords2 mk, keywords2 k
+    WHERE k.phonetic_code = ''ph_25'' AND k.id = mk.keyword_id
+');
+
+-- Ensure the join MCV statistics are used for IN predicates
+-- on the filter column of the referenced table
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords2 mk, keywords2 k
+    WHERE k.keyword IN (''keyword_1'', ''keyword_2'', ''keyword_3'')
+      AND k.id = mk.keyword_id
+');
+
+-- Ensure the join MCV statistics are used for equality predicates
+-- on multiple filter columns of the referenced table
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords2 mk, keywords2 k
+    WHERE k.keyword = ''keyword_1''
+      AND k.phonetic_code = ''ph_1''
+      AND k.id = mk.keyword_id
+');
+
+-- Zero join MCV match, falls back to standard estimation: 10000 * 1 / 50 = 200.
+SELECT * FROM check_estimated_rows('
+    SELECT * FROM movie_keywords2 mk, keywords2 k
+    WHERE k.keyword = ''keyword_1''
+      AND k.phonetic_code = ''ph_15''
+      AND k.id = mk.keyword_id
+');
+
+-- Verify syntax error cases
+CREATE STATISTICS bad_stats1 (mcv) ON k.keyword;
+CREATE STATISTICS bad_stats2 (mcv) ON k.keyword FROM movie_keywords2 mk, keywords2 k;
+CREATE STATISTICS bad_stats3 (mcv) FROM movie_keywords2 mk JOIN keywords2 k ON (mk.keyword_id = k.id);
+CREATE STATISTICS bad_stats4 (mcv) ON keyword FROM movie_keywords2 mk JOIN keywords2 k ON (mk.keyword_id = k.id);
+CREATE STATISTICS bad_stats5 (mcv) ON lower(k.keyword) FROM movie_keywords2 mk JOIN keywords2 k ON (mk.keyword_id = k.id);
+CREATE STATISTICS bad_stats6 (mcv) ON k.keyword FROM (movie_keywords2 mk JOIN keywords2 k ON (mk.keyword_id = k.id)) JOIN keywords2 k2 ON (k.id = k2.id);
+
+-- Cleanup
+DROP TABLE movie_keywords2 CASCADE;
+DROP TABLE keywords2 CASCADE;
-- 
2.50.1 (Apple Git-155)



  [application/octet-stream] 16b.sql (668B, 6-16b.sql)
  download

  [text/plain] 16b_js.txt (9.4K, 7-16b_js.txt)
  download | inline:
Pager usage is off.
SET
                                                                                                QUERY PLAN                                                                                                
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=280264.91..280264.92 rows=1 width=64) (actual time=2741.164..2763.554 rows=1.00 loops=1)
   Buffers: shared hit=4178268 read=1093110, temp read=22572 written=22652
   ->  Gather  (cost=280264.69..280264.90 rows=2 width=64) (actual time=2738.152..2763.550 rows=3.00 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=4178268 read=1093110, temp read=22572 written=22652
         ->  Partial Aggregate  (cost=279264.69..279264.70 rows=1 width=64) (actual time=2736.217..2736.250 rows=1.00 loops=3)
               Buffers: shared hit=4178268 read=1093110, temp read=22572 written=22652
               ->  Parallel Hash Join  (cost=136368.90..271857.81 rows=1481375 width=33) (actual time=2244.259..2397.555 rows=1236864.00 loops=3)
                     Hash Cond: (n.id = an.person_id)
                     Buffers: shared hit=4178268 read=1093110, temp read=22572 written=22652
                     ->  Nested Loop  (cost=114301.82..235944.12 rows=635491 width=25) (actual time=198.873..2044.905 rows=944185.00 loops=3)
                           Buffers: shared hit=4177986 read=1081976, temp read=1698 written=1704
                           ->  Nested Loop  (cost=114301.38..188137.25 rows=635491 width=21) (actual time=198.784..1236.382 rows=944185.00 loops=3)
                                 Join Filter: (ci.movie_id = t.id)
                                 Buffers: shared hit=2546486 read=732334, temp read=1698 written=1704
                                 ->  Nested Loop  (cost=114300.94..131790.25 rows=30693 width=29) (actual time=198.716..273.204 rows=22772.00 loops=3)
                                       Buffers: shared hit=268202 read=61791, temp read=1698 written=1704
                                       ->  Merge Join  (cost=114300.51..116833.42 rows=30693 width=8) (actual time=198.599..222.967 rows=22772.00 loops=3)
                                             Merge Cond: (mc.movie_id = mk.movie_id)
                                             Buffers: shared hit=23792 read=32935, temp read=1698 written=1704
                                             ->  Sort  (cost=81821.71..82837.92 rows=406485 width=4) (actual time=126.514..138.221 rows=384599.00 loops=3)
                                                   Sort Key: mc.movie_id
                                                   Sort Method: external merge  Disk: 4584kB
                                                   Buffers: shared hit=8 read=21864, temp read=1698 written=1704
                                                   Worker 0:  Sort Method: external merge  Disk: 4536kB
                                                   Worker 1:  Sort Method: external merge  Disk: 4464kB
                                                   ->  Parallel Hash Join  (cost=5393.42..38393.85 rows=406485 width=4) (actual time=10.192..94.773 rows=384599.00 loops=3)
                                                         Hash Cond: (mc.company_id = cn.id)
                                                         Buffers: shared read=21864
                                                         ->  Parallel Seq Scan on movie_companies mc  (cost=0.00..30052.81 rows=1122881 width=8) (actual time=0.108..28.237 rows=869709.33 loops=3)
                                                               Buffers: shared read=18824
                                                         ->  Parallel Hash  (cost=4767.91..4767.91 rows=50041 width=4) (actual time=10.007..10.007 rows=28281.00 loops=3)
                                                               Buckets: 131072  Batches: 1  Memory Usage: 4416kB
                                                               Buffers: shared read=3040
                                                               ->  Parallel Seq Scan on company_name cn  (cost=0.00..4767.91 rows=50041 width=4) (actual time=0.104..8.026 rows=28281.00 loops=3)
                                                                     Filter: ((country_code)::text = '[us]'::text)
                                                                     Rows Removed by Filter: 50051
                                                                     Buffers: shared read=3040
                                             ->  Sort  (cost=32478.65..32575.53 rows=38749 width=4) (actual time=72.067..73.364 rows=52744.00 loops=3)
                                                   Sort Key: mk.movie_id
                                                   Sort Method: quicksort  Memory: 1537kB
                                                   Buffers: shared hit=23784 read=11071
                                                   Worker 0:  Sort Method: quicksort  Memory: 1537kB
                                                   Worker 1:  Sort Method: quicksort  Memory: 1537kB
                                                   ->  Nested Loop  (cost=444.74..29525.62 rows=38749 width=4) (actual time=2.513..69.487 rows=41840.00 loops=3)
                                                         Buffers: shared hit=23784 read=11071
                                                         ->  Seq Scan on keyword k  (cost=0.00..2685.11 rows=1 width=4) (actual time=0.302..3.581 rows=1.00 loops=3)
                                                               Filter: (keyword = 'character-name-in-title'::text)
                                                               Rows Removed by Filter: 134168
                                                               Buffers: shared hit=2987 read=37
                                                         ->  Bitmap Heap Scan on movie_keyword mk  (cost=444.74..26453.01 rows=38749 width=8) (actual time=2.209..63.804 rows=41840.00 loops=3)
                                                               Recheck Cond: (k.id = keyword_id)
                                                               Heap Blocks: exact=11547
                                                               Buffers: shared hit=20797 read=11034
                                                               ->  Bitmap Index Scan on keyword_id_movie_keyword  (cost=0.00..435.05 rows=38749 width=0) (actual time=1.266..1.266 rows=41840.00 loops=3)
                                                                     Index Cond: (keyword_id = k.id)
                                                                     Index Searches: 3
                                                                     Buffers: shared hit=80 read=36
                                       ->  Index Scan using title_pkey on title t  (cost=0.43..0.49 rows=1 width=21) (actual time=0.002..0.002 rows=1.00 loops=68316)
                                             Index Cond: (id = mk.movie_id)
                                             Index Searches: 68316
                                             Buffers: shared hit=244410 read=28856
                                 ->  Index Scan using movie_id_cast_info on cast_info ci  (cost=0.44..1.36 rows=38 width=8) (actual time=0.003..0.040 rows=41.46 loops=68316)
                                       Index Cond: (movie_id = mk.movie_id)
                                       Index Searches: 68316
                                       Buffers: shared hit=2278284 read=670543
                           ->  Memoize  (cost=0.44..0.46 rows=1 width=4) (actual time=0.001..0.001 rows=1.00 loops=2832555)
                                 Cache Key: ci.person_id
                                 Cache Mode: logical
                                 Estimates: capacity=71064 distinct keys=71064 lookups=635491 hit percent=88.82%
                                 Hits: 724363  Misses: 200258  Evictions: 119594  Overflows: 0  Memory Usage: 8193kB
                                 Buffers: shared hit=1631500 read=349642
                                 Worker 0:  Hits: 744868  Misses: 209441  Evictions: 128778  Overflows: 0  Memory Usage: 8193kB
                                 Worker 1:  Hits: 745857  Misses: 207768  Evictions: 127108  Overflows: 0  Memory Usage: 8193kB
                                 ->  Index Only Scan using name_pkey on name n  (cost=0.43..0.45 rows=1 width=4) (actual time=0.003..0.003 rows=1.00 loops=617467)
                                       Index Cond: (id = ci.person_id)
                                       Heap Fetches: 0
                                       Index Searches: 617467
                                       Buffers: shared hit=1631500 read=349642
                     ->  Parallel Hash  (cost=15171.59..15171.59 rows=375559 width=20) (actual time=48.749..48.750 rows=300447.33 loops=3)
                           Buckets: 131072  Batches: 8  Memory Usage: 7296kB
                           Buffers: shared hit=282 read=11134, temp written=4116
                           ->  Parallel Seq Scan on aka_name an  (cost=0.00..15171.59 rows=375559 width=20) (actual time=0.095..18.852 rows=300447.33 loops=3)
                                 Buffers: shared hit=282 read=11134
 Planning:
   Buffers: shared hit=837 read=144
 Planning Time: 5.321 ms
 Execution Time: 2764.320 ms
(89 rows)


  [text/plain] 16b_baseline.txt (5.6K, 8-16b_baseline.txt)
  download | inline:
Pager usage is off.
SET
                                                                                        QUERY PLAN                                                                                        
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=5353.71..5353.72 rows=1 width=64) (actual time=10607.642..10607.644 rows=1.00 loops=1)
   Buffers: shared hit=22223141 read=1381834
   ->  Nested Loop  (cost=9.37..5337.88 rows=3165 width=33) (actual time=2.503..9525.673 rows=3710592.00 loops=1)
         Join Filter: (an.person_id = n.id)
         Buffers: shared hit=22223141 read=1381834
         ->  Nested Loop  (cost=8.95..4667.49 rows=1378 width=25) (actual time=2.490..6179.436 rows=2832555.00 loops=1)
               Buffers: shared hit=11897439 read=1107501
               ->  Nested Loop  (cost=8.52..4048.53 rows=1378 width=21) (actual time=2.481..3069.508 rows=2832555.00 loops=1)
                     Join Filter: (ci.movie_id = t.id)
                     Buffers: shared hit=3159885 read=751362
                     ->  Nested Loop  (cost=8.08..3929.09 rows=63 width=29) (actual time=2.245..937.683 rows=68316.00 loops=1)
                           Buffers: shared hit=815660 read=146762
                           ->  Nested Loop  (cost=7.66..3850.15 rows=177 width=33) (actual time=2.224..598.822 rows=148552.00 loops=1)
                                 Join Filter: (mc.movie_id = t.id)
                                 Buffers: shared hit=295838 read=72376
                                 ->  Nested Loop  (cost=7.23..3829.71 rows=34 width=25) (actual time=2.185..494.519 rows=41840.00 loops=1)
                                       Buffers: shared hit=125415 read=54538
                                       ->  Nested Loop  (cost=6.80..3813.14 rows=34 width=4) (actual time=2.177..346.122 rows=41840.00 loops=1)
                                             Buffers: shared hit=24 read=12569
                                             ->  Seq Scan on keyword k  (cost=0.00..2685.11 rows=1 width=4) (actual time=0.382..7.022 rows=1.00 loops=1)
                                                   Filter: (keyword = 'character-name-in-title'::text)
                                                   Rows Removed by Filter: 134168
                                                   Buffers: shared hit=22 read=986
                                             ->  Bitmap Heap Scan on movie_keyword mk  (cost=6.80..1124.98 rows=305 width=8) (actual time=1.794..336.613 rows=41840.00 loops=1)
                                                   Recheck Cond: (k.id = keyword_id)
                                                   Heap Blocks: exact=11547
                                                   Buffers: shared hit=2 read=11583
                                                   ->  Bitmap Index Scan on keyword_id_movie_keyword  (cost=0.00..6.72 rows=305 width=0) (actual time=0.965..0.965 rows=41840.00 loops=1)
                                                         Index Cond: (keyword_id = k.id)
                                                         Index Searches: 1
                                                         Buffers: shared hit=2 read=36
                                       ->  Index Scan using title_pkey on title t  (cost=0.43..0.49 rows=1 width=21) (actual time=0.003..0.003 rows=1.00 loops=41840)
                                             Index Cond: (id = mk.movie_id)
                                             Index Searches: 41840
                                             Buffers: shared hit=125391 read=41969
                                 ->  Index Scan using movie_id_movie_companies on movie_companies mc  (cost=0.43..0.54 rows=5 width=8) (actual time=0.002..0.002 rows=3.55 loops=41840)
                                       Index Cond: (movie_id = mk.movie_id)
                                       Index Searches: 41840
                                       Buffers: shared hit=170423 read=17838
                           ->  Index Scan using company_name_pkey on company_name cn  (cost=0.42..0.45 rows=1 width=4) (actual time=0.002..0.002 rows=0.46 loops=148552)
                                 Index Cond: (id = mc.company_id)
                                 Filter: ((country_code)::text = '[us]'::text)
                                 Rows Removed by Filter: 1
                                 Index Searches: 148552
                                 Buffers: shared hit=519822 read=74386
                     ->  Index Scan using movie_id_cast_info on cast_info ci  (cost=0.44..1.40 rows=40 width=8) (actual time=0.003..0.029 rows=41.46 loops=68316)
                           Index Cond: (movie_id = mk.movie_id)
                           Index Searches: 68316
                           Buffers: shared hit=2344225 read=604600
               ->  Index Only Scan using name_pkey on name n  (cost=0.43..0.45 rows=1 width=4) (actual time=0.001..0.001 rows=1.00 loops=2832555)
                     Index Cond: (id = ci.person_id)
                     Heap Fetches: 0
                     Index Searches: 2832555
                     Buffers: shared hit=8737554 read=356139
         ->  Index Scan using person_id_aka_name on aka_name an  (cost=0.42..0.46 rows=2 width=20) (actual time=0.001..0.001 rows=1.31 loops=2832555)
               Index Cond: (person_id = ci.person_id)
               Index Searches: 2832555
               Buffers: shared hit=10325702 read=274333
 Planning:
   Buffers: shared hit=550 read=125
 Planning Time: 4.686 ms
 Execution Time: 10607.707 ms
(62 rows)


reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Is there value in having optimizer stats for joins/foreignkeys?
  In-Reply-To: <CAK98qZ0LwJbUoiZjjFXitojHy4UskkjYDiSd_JZfGE9LbfZm9w@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox