Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vmhGs-00FOCP-2a for pgsql-hackers@arkaria.postgresql.org; Sun, 01 Feb 2026 23:54:26 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vmhGo-00Akvg-2M for pgsql-hackers@arkaria.postgresql.org; Sun, 01 Feb 2026 23:54:23 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vmhGo-00AkvY-14 for pgsql-hackers@lists.postgresql.org; Sun, 01 Feb 2026 23:54:23 +0000 Received: from relay4-d.mail.gandi.net ([2001:4b98:dc4:8::224]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.98.2) (envelope-from ) id 1vmhGm-000000006NB-20Kw for pgsql-hackers@postgresql.org; Sun, 01 Feb 2026 23:54:21 +0000 Received: by mail.gandi.net (Postfix) with ESMTPSA id 9DE9843350; Sun, 1 Feb 2026 23:54:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=vondra.me; s=gm1; t=1769990055; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=B++rdM2ToC2HZzY6tt6cqXK46DS0snaqawd29kmYbUg=; b=fQM7gJV6lLFqtfCljTaOSX+EEn90ntin16Xy0vJXnbAPFUz9lsSeU/MEOmzrSE46QidmZr G+8D4B9VcaDJUAtoD71TdLIxb6yRj9562jlHHLGpPTI65/+2vT1zpZZYz1RwAum6dAO3V+ EEyWy2/fJ/xKVPTPlsXhq+Y1s8o4QVgdJS+Vq9LsXVlbpvVuK8cfEI5QKDHf94cJiSjfll noNFn2oo3Vf0iUSC9e9Y3MeY5zOmc6YFioGTAExrQp9SxBT655FlErmIEsJGnTl0B8wD4n J5D5ScbxOG+Ucpz3a8WOS/mszz3J7VQkgg4coBGU+++7OfKdcIwYN0jcHn4XBA== Message-ID: Date: Mon, 2 Feb 2026 00:54:14 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: New access method for b-tree. To: Alexandre Felipe , "pgsql-hackers@postgresql.org" References: Content-Language: en-US From: Tomas Vondra In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-GND-Sasl: tomas@vondra.me X-GND-Score: -100 X-GND-Cause: gggruggvucftvghtrhhoucdtuddrgeefgedrtddtgddujeeiudeiucetufdoteggodetrfdotffvucfrrhhofhhilhgvmecuifetpfffkfdpucggtfgfnhhsuhgsshgtrhhisggvnecuuegrihhlohhuthemuceftddunecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenucfjughrpefkffggfgfuvfhfhfgjtgfgsehtkeertddtvdejnecuhfhrohhmpefvohhmrghsucggohhnughrrgcuoehtohhmrghssehvohhnughrrgdrmhgvqeenucggtffrrghtthgvrhhnpeekffdvudegteefieelffetkeelffeggffhuefffefhleekleethfefieeggfffkeenucfkphepvddufedrvdegiedrvdefkedrudekudenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepihhnvghtpedvudefrddvgeeirddvfeekrddukedupdhhvghloheplgdutddrudefjedrtddrudekngdpmhgrihhlfhhrohhmpehtohhmrghssehvohhnughrrgdrmhgvpdhqihgupeelfffgleekgeeffeehtddpmhhouggvpehsmhhtphhouhhtpdhnsggprhgtphhtthhopedvpdhrtghpthhtoheprghlvgigrghnughrvgdrfhgvlhhiphgvsehtphhrohdrihhopdhrtghpthhtohepphhgshhqlhdqhhgrtghkvghrshesphhoshhtghhrvghsqhhlrdhorhhg X-GND-State: clean List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hello Felipe, On 2/1/26 11:02, Alexandre Felipe wrote: > Hello Hackers, > > Please check this out, > > It is an access method to scan a table sorting by the second column of > an index, filtered on the first. > Queries like > SELECT x, y FROM grid > WHERE x in (array of Nx elements) > ORDER BY y, x > LIMIT M > > Can execute streaming the rows directly from disk instead of loading > everything. > > Using  btree index on (x, y) > > On a grid with N x N will run by fetching only what is necessary > A skip scal will run with O(N * Nx) I/O, O(N x Nx) space, O(N x Nx * > log( N * Nx)) comput (assuming a generic in memory sort) > > The proposed access method does it O(M + Nx) I/O, O(Nx) space, and O(M * > log(Nx)) compute. > So how does this compare to skip scan in practice? It's hard to compare, as the patch does not implement an actual access path, but I tried this: CREATE TABLE merge_test_int ( prefix_col int4, suffix_col int4 ); INSERT INTO merge_test_int SELECT p, s FROM generate_series(1, 10000) AS p, generate_series(1, 1000) AS s; CREATE INDEX merge_test_int_idx ON merge_test_int (prefix_col, suffix_col); and then 1) master SELECT * FROM merge_test_int WHERE prefix_col IN (1,3,4,5,6,7,8,9,10,11,12,13,14,15) AND suffix_col >= 900 ORDER BY suffix_col LIMIT 100; vs. 2) merge scan SELECT * FROM test_btree_merge_scan_int( 'merge_test_int', 'merge_test_int_idx', ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], 900, 100); And with explain analyze we get this: 1) Buffers: shared hit=26 read=25 2) Buffers: shared hit=143 read=17 So it seems to access many more buffers, even if the number of reads is lower. Presumably the merge scan is not always better than skip scan, probably depending on number of prefixes in the query etc. What is the cost model to decide between those two? If you had to construct the best case and worst cases (vs. skip scan), what would that look like? I'm also wondering how common is the targeted query pattern? How common it is to have an IN condition on the leading column in an index, and ORDER BY on the second one? regards -- Tomas Vondra