Message-ID: <a652515e-278a-4838-815b-ecd2ad4495f6@vondra.me>
Date: Mon, 2 Feb 2026 00:54:14 +0100
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: New access method for b-tree.
To: Alexandre Felipe <alexandre.felipe@tpro.io>,
 "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
References: <AS1PR02MB784695AFEC37179FFAF7EAE19A9DA@AS1PR02MB7846.eurprd02.prod.outlook.com>
Content-Language: en-US
From: Tomas Vondra <tomas@vondra.me>
In-Reply-To: <AS1PR02MB784695AFEC37179FFAF7EAE19A9DA@AS1PR02MB7846.eurprd02.prod.outlook.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Archived-At: <https://www.postgresql.org/message-id/a652515e-278a-4838-815b-ecd2ad4495f6%40vondra.me>
Precedence: bulk

Hello Felipe,

On 2/1/26 11:02, Alexandre Felipe wrote:
> Hello Hackers,
> 
> Please check this out,
> 
> It is an access method to scan a table sorting by the second column of
> an index, filtered on the first.
> Queries like
> SELECT x, y FROM grid
> WHERE x in (array of Nx elements)
> ORDER BY y, x
> LIMIT M
> 
> Can execute streaming the rows directly from disk instead of loading
> everything.
> 
> Using  btree index on (x, y)
> 
> On a grid with N x N will run by fetching only what is necessary
> A skip scal will run with O(N * Nx) I/O, O(N x Nx) space, O(N x Nx *
> log( N * Nx)) comput (assuming a generic in memory sort)
> 
> The proposed access method does it O(M + Nx) I/O, O(Nx) space, and O(M *
> log(Nx)) compute.
> 

So how does this compare to skip scan in practice? It's hard to compare,
as the patch does not implement an actual access path, but I tried this:

  CREATE TABLE merge_test_int (
    prefix_col int4,
    suffix_col int4
  );

  INSERT INTO merge_test_int
  SELECT p, s
  FROM generate_series(1, 10000) AS p,
       generate_series(1, 1000) AS s;
  CREATE INDEX merge_test_int_idx
      ON merge_test_int (prefix_col, suffix_col);

and then

1) master

  SELECT * FROM merge_test_int
  WHERE prefix_col IN (1,3,4,5,6,7,8,9,10,11,12,13,14,15)
    AND suffix_col >= 900
  ORDER BY suffix_col LIMIT 100;

vs.

2) merge scan

  SELECT * FROM test_btree_merge_scan_int(

      'merge_test_int',
      'merge_test_int_idx',
      ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
      900,
      100);

And with explain analyze we get this:

1) Buffers: shared hit=26 read=25

2) Buffers: shared hit=143 read=17

So it seems to access many more buffers, even if the number of reads is
lower. Presumably the merge scan is not always better than skip scan,
probably depending on number of prefixes in the query etc. What is the
cost model to decide between those two?

If you had to construct the best case and worst cases (vs. skip scan),
what would that look like?

I'm also wondering how common is the targeted query pattern? How common
it is to have an IN condition on the leading column in an index, and
ORDER BY on the second one?


regards

-- 
Tomas Vondra