Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1ulwv2-007iGI-TS for pgsql-hackers@arkaria.postgresql.org; Tue, 12 Aug 2025 21:52:33 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1ulwuz-00A1EL-HK for pgsql-hackers@arkaria.postgresql.org; Tue, 12 Aug 2025 21:52:29 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1ulwuz-00A1Dx-53 for pgsql-hackers@lists.postgresql.org; Tue, 12 Aug 2025 21:52:29 +0000 Received: from relay15.mail.gandi.net ([217.70.178.235]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1ulwuv-000QIu-0c for pgsql-hackers@lists.postgresql.org; Tue, 12 Aug 2025 21:52:27 +0000 Received: by mail.gandi.net (Postfix) with ESMTPSA id 41B8143142; Tue, 12 Aug 2025 21:52:19 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=vondra.me; s=gm1; t=1755035543; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Pcb/W739ctBWqocYba5Ez9u7JsXvxn+1ySd1r5f/9Cg=; b=BjpMHojnVZSg6MY9ym5AUkBReicuzWfKENsSoF8dWkqVkPdSEcSwT5i1JjEGsked96d6B5 hT6OhZGyKibox81ZDUi/mlIGzx5PdQk5AdbrFwpPQ46qm0XoqSRityycaJtwuyM/+EF4Z3 Dxze7yjy8s1QTpNPxxoM5SG7VUZ3xhppaKu7mj7HH4szCkP71Cf6ppyUt51Xo+uq/VdnWT RBR6BosMuLZopp7Kulx8wQ8x8pyVMUUSdjOoUnV71oly2ddysXSskV98wA/0Sr78ctPx2J IASuh5z5MJHBzJ0iZjQwrYLN+DzcJ9BFEFf/itqO8f9ZsVqq07mC2LaFKjMRcQ== Message-ID: <51b5f71b-5f19-4453-91ff-2b9f2a840c58@vondra.me> Date: Tue, 12 Aug 2025 23:52:17 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: index prefetching To: Peter Geoghegan , Thomas Munro Cc: Andres Freund , Nazir Bilal Yavuz , Robert Haas , Melanie Plageman , PostgreSQL Hackers , Georgios , Konstantin Knizhnik , Dilip Kumar References: <8ed1d326-5c6e-476e-b3fd-30d3da210546@vondra.me> <38b865bd-2ae9-4a94-a788-6e3dc99ccd70@vondra.me> <306fc8c0-c882-4602-86f5-a106b9ace603@vondra.me> <152ea782-5bd4-4435-b021-0ab2da61e63d@vondra.me> <7c2f6350-6fca-4e39-b0a8-8ac735f5d58a@vondra.me> <2e63cadd-2a03-46b1-866e-7ea5d3ffd37f@vondra.me> Content-Language: en-US From: Tomas Vondra In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-GND-State: clean X-GND-Score: -100 X-GND-Cause: gggruggvucftvghtrhhoucdtuddrgeeffedrtdefgddufeeigeeiucetufdoteggodetrfdotffvucfrrhhofhhilhgvmecuifetpfffkfdpucggtfgfnhhsuhgsshgtrhhisggvnecuuegrihhlohhuthemuceftddunecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenucfjughrpefkffggfgfuvfevfhfhjggtgfesthejredttddvjeenucfhrhhomhepvfhomhgrshcugghonhgurhgruceothhomhgrshesvhhonhgurhgrrdhmvgeqnecuggftrfgrthhtvghrnhepledugeeikefglefhgfffuedvleetteevgefhvdeikeefudduuddvhfevudefhfevnecukfhppeekiedrgeelrddvfedtrddvtdeinecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehinhgvthepkeeirdegledrvdeftddrvddtiedphhgvlhhopegluddtrddufeejrddtrddvngdpmhgrihhlfhhrohhmpehtohhmrghssehvohhnughrrgdrmhgvpdhnsggprhgtphhtthhopedutddprhgtphhtthhopehpghessghofihtrdhivgdprhgtphhtthhopehthhhomhgrshdrmhhunhhrohesghhmrghilhdrtghomhdprhgtphhtthhopegrnhgurhgvshesrghnrghrrgiivghlrdguvgdprhgtphhtthhopegshigrvhhuiiekudesghhmrghilhdrtghomhdprhgtphhtthhopehrohgsvghrthhmhhgrrghssehgmhgrihhlrdgtohhmpdhrtghpthhtohepmhgvlhgrnhhivghplhgrghgvmhgrnhesghhmrghilhdrtghomhdprhgtphhtt hhopehpghhsqhhlqdhhrggtkhgvrhhssehlihhsthhsrdhpohhsthhgrhgvshhqlhdrohhrghdprhgtphhtthhopehgkhhokhholhgrthhoshesphhrohhtohhnmhgrihhlrdgtohhm List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On 8/12/25 23:22, Peter Geoghegan wrote: > ... > > It looks like the patch does significantly better with the forwards scan, > compared to the backwards scan (though both are improved by a lot). But that's > not the main thing about these results that I find interesting. > > The really odd thing is that we get "shared hit=6619 read=49933" for the > forwards scan, and "shared hit=10350 read=49933" for the backwards scan. The > latter matches master (regardless of the scan direction used on master), while > the former just looks wrong. What explains the "missing buffer hits" seen with > the forwards scan? > > Discrepancies > ------------- > > All 4 query executions agree that "rows=1048576.00", so the patch doesn't appear > to simply be broken/giving wrong answers. Might it be that the "Buffers" > instrumentation is broken? > I think a bug in the prefetch patch is more likely. I tried with a patch that adds various prefetch-related counters to explain, and I see this: test=# EXPLAIN (ANALYZE, VERBOSE, COSTS OFF) SELECT * FROM t WHERE a BETWEEN 16336 AND 49103 ORDER BY a; QUERY PLAN ------------------------------------------------------------------------ Index Scan using idx on public.t (actual time=0.682..527.055 rows=1048576.00 loops=1) Output: a, b Index Cond: ((t.a >= 16336) AND (t.a <= 49103)) Index Searches: 1 Prefetch Distance: 271.263 Prefetch Count: 60888 Prefetch Stalls: 1 Prefetch Skips: 991211 Prefetch Resets: 3 Prefetch Histogram: [2,4) => 2, [4,8) => 8, [8,16) => 17, [16,32) => 24, [32,64) => 34, [64,128) => 52, [128,256) => 82, [256,512) => 60669 Buffers: shared hit=5027 read=50872 I/O Timings: shared read=33.528 Planning: Buffers: shared hit=78 read=23 I/O Timings: shared read=2.349 Planning Time: 3.686 ms Execution Time: 559.659 ms (17 rows) test=# EXPLAIN (ANALYZE, VERBOSE, COSTS OFF) SELECT * FROM t WHERE a BETWEEN 16336 AND 49103 ORDER BY a DESC; QUERY PLAN ------------------------------------------------------------------------ Index Scan Backward using idx on public.t (actual time=1.110..4116.201 rows=1048576.00 loops=1) Output: a, b Index Cond: ((t.a >= 16336) AND (t.a <= 49103)) Index Searches: 1 Prefetch Distance: 271.061 Prefetch Count: 118806 Prefetch Stalls: 1 Prefetch Skips: 962515 Prefetch Resets: 3 Prefetch Histogram: [2,4) => 2, [4,8) => 7, [8,16) => 12, [16,32) => 17, [32,64) => 24, [64,128) => 3, [128,256) => 4, [256,512) => 118737 Buffers: shared hit=30024 read=50872 I/O Timings: shared read=581.353 Planning: Buffers: shared hit=82 read=23 I/O Timings: shared read=3.168 Planning Time: 4.289 ms Execution Time: 4185.407 ms (17 rows) These two parts are interesting: Prefetch Count: 60888 Prefetch Skips: 991211 Prefetch Count: 118806 Prefetch Skips: 962515 It looks like the backwards scan skips fewer blocks. This is based on the lastBlock optimization, i.e. looking for runs of the same block number. I don't quite see why would it affect just the backwards scan, though. Seems weird. > The premise of my original complaint was that big inconsistencies in performance > shouldn't happen between similar forwards and backwards scans (at least not with > direct I/O). I now have serious doubts about that premise, since it looks like > OS readahead remains a big factor with direct I/O. Did I just miss something > obvious? > I don't think you missed anything. It does seem the assumption relies on the OS handling the underlying I/O patterns equally, and unfortunately that does not seem to be the case. Maybe we could "invert" the data set, i.e. make it "descending" instead of "ascending"? That would make the heap access direction "forward" again ... regards -- Tomas Vondra