Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1urifX-00BDEK-AM for pgsql-hackers@arkaria.postgresql.org; Thu, 28 Aug 2025 19:52:24 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1urifU-004b19-IE for pgsql-hackers@arkaria.postgresql.org; Thu, 28 Aug 2025 19:52:21 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1urifT-004b10-Io for pgsql-hackers@lists.postgresql.org; Thu, 28 Aug 2025 19:52:20 +0000 Received: from fout-b2-smtp.messagingengine.com ([202.12.124.145]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1urifR-002NCH-0K for pgsql-hackers@lists.postgresql.org; Thu, 28 Aug 2025 19:52:19 +0000 Received: from phl-compute-06.internal (phl-compute-06.internal [10.202.2.46]) by mailfout.stl.internal (Postfix) with ESMTP id 123BB1D000DA; Thu, 28 Aug 2025 15:52:14 -0400 (EDT) Received: from phl-mailfrontend-02 ([10.202.2.163]) by phl-compute-06.internal (MEProxy); Thu, 28 Aug 2025 15:52:14 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=anarazel.de; h= cc:cc:content-type:content-type:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:subject :subject:to:to; s=fm1; t=1756410733; x=1756497133; bh=tPO1hwtR2c Mf0MPsUPbYmjjuzpURr5QR/lDwLxS9mnU=; b=Wqw+CB0cew8/1o/hbpFCHe4xtP HRXkaVZkCbIP9U34DLweZxMa1PSDlHE7hf2b84icOEH3nKULnPPQ6tyAy7941APQ 6jcol0qV62jq4OVEjnMCCoCqt/1GgQL8remN6kp55tdWyP2ucscQOxDNdVkXk0h2 gTfghAC5obm51qwBAwtCN3SBoHYWrH2hAmA+rr2yYGfGF8G0FbjyE1O4tUQLEzPA ka0glalSyOfbKPvFj/hQRqSzwRrfT+za6kgE51vf2bDtnfTUFM6S+9tQ3rnjfi69 AsCMNxzBhq9pvygRATuw1VcMsHL483kCcDqqSDlt8EATksm90ZSBWjZYsN+A== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:subject:subject:to :to:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm1; t= 1756410733; x=1756497133; bh=tPO1hwtR2cMf0MPsUPbYmjjuzpURr5QR/lD wLxS9mnU=; b=CAsidlIHaSYGgiaDaZOseaMVdQvDZRn2UsiONgMpMEw9HoPPFAx gpzh+Z7ixxnZyieNYT2ZmLeVMTnk1oBWSWjmwmmIz4qK9zLquXrOe7uoy4dvKwUq UcJHcoWDrTz1At+yv2elNQVW9BQotvMWeMFVSwS2S/F3YzDFFScoPxv1LJim/nG9 Jd5Ow/cOAw9b/ew+sAkOlB+PBXvO27KDVqTuQ5wMAiRO6vQoSVyqUDumAF9lhMDS ZWTmDlgvijuqMubCOSXDkg6bQNKGj0hsut5YhiGMAa83/3aHROhtYDFTx1OswLhu 4g4bvTIbjFyYLrXj2FjTH6B4CzOeUc70G0g== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeeffedrtdefgddukedukeekucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceu rghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmnecujf gurhepfffhvfevuffkfhggtggujgesthdtsfdttddtvdenucfhrhhomheptehnughrvghs ucfhrhgvuhhnugcuoegrnhgurhgvshesrghnrghrrgiivghlrdguvgeqnecuggftrfgrth htvghrnhepfeffgfelvdffgedtveelgfdtgefghfdvkefggeetieevjeekteduleevjefh ueegnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomheprg hnughrvghssegrnhgrrhgriigvlhdruggvpdhnsggprhgtphhtthhopedutddpmhhouggv pehsmhhtphhouhhtpdhrtghpthhtohepphhgsegsohifthdrihgvpdhrtghpthhtohepkh hnihiihhhnihhksehgrghrrhgvthdrrhhupdhrtghpthhtohepsgihrghvuhiikedusehg mhgrihhlrdgtohhmpdhrtghpthhtohepughilhhiphgsrghlrghuthesghhmrghilhdrtg homhdprhgtphhtthhopehmvghlrghnihgvphhlrghgvghmrghnsehgmhgrihhlrdgtohhm pdhrtghpthhtoheprhhosggvrhhtmhhhrggrshesghhmrghilhdrtghomhdprhgtphhtth hopehthhhomhgrshdrmhhunhhrohesghhmrghilhdrtghomhdprhgtphhtthhopehpghhs qhhlqdhhrggtkhgvrhhssehlihhsthhsrdhpohhsthhgrhgvshhqlhdrohhrghdprhgtph htthhopehgkhhokhholhgrthhoshesphhrohhtohhnmhgrihhlrdgtohhm X-ME-Proxy: Feedback-ID: id4a34324:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 28 Aug 2025 15:52:12 -0400 (EDT) Date: Thu, 28 Aug 2025 15:52:12 -0400 From: Andres Freund To: Tomas Vondra Cc: Peter Geoghegan , Thomas Munro , Nazir Bilal Yavuz , Robert Haas , Melanie Plageman , PostgreSQL Hackers , Georgios , Konstantin Knizhnik , Dilip Kumar Subject: Re: index prefetching Message-ID: <5pltwb73d7cynsxo2yb54ygjk7haviatkrx43mnzihc6kkield@ahnstpgof46i> References: <5v2wuxg65l5e3s6uf373zskcqqoukmraxiucnvgn4t7b5cmeqx@5mhqsurdj6xn> <6butbqln6ewi5kuxz3kfv2mwomnlgtate4mb4lpa7gb2l63j4t@stlwbi2dvvev> <0dd33755-cab8-49c8-b1ed-698732577fbb@vondra.me> <1c9302da-c834-4773-a527-1c1a7029c5a3@vondra.me> <6d59c277-c440-4d1f-a46e-157958c06a5f@vondra.me> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <6d59c277-c440-4d1f-a46e-157958c06a5f@vondra.me> List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hi, On 2025-08-28 19:08:40 +0200, Tomas Vondra wrote: > On 8/28/25 18:16, Andres Freund wrote: > >> So I think the IPC overhead with "worker" can be quite significant, > >> especially for cases with distance=1. I don't think it's a major issue > >> for PG18, because seq/bitmap scans are unlikely to collapse the distance > >> like this. And with larger distances the cost amortizes. It's much > >> bigger issue for the index prefetching, it seems. > > > > I couldn't keep up with all the discussion, but is there actually valid I/O > > bound cases (i.e. not ones were we erroneously keep the distance short) where > > index scans end can't have a higher distance? > > > > I don't know, really. > > Is the presented exaple really a case of an "erroneously short > distance"? I think the query isn't actually measuring something particularly useful in the general case. You're benchmarking something were the results are never looked at - which means the time between two index fetches is unrealistically short. That means any tiny latency increase matters a lot more than with realistic queries. And this is, IIUC, on a local SSD. I'd bet that on cloud latencies AIO would still be a huge win. > From the 2x regression (compared to master) it might seem like that, but > even with the increased distance it's still slower than master (by 25%). So > maybe the "error" is to use AIO in these cases, instead of just switching to > I/O done by the backend. If it's slower at a higher distance, we're missing something. > It may be a bit worse for non-btree indexes, e.g. for for ordered scans > on gist indexes (getting the next tuple may require reading many leaf > pages, so maybe we can't look too far ahead?). Or for indexes with > naturally "fat" tuples, which limits how many tuples we see ahead. I am not worried at all about those cases. If you have to read a lot of index leaf pages to get a heap fetch, a distance of even just 2 will be fine, because the IPC overhead is a neglegible cost compared to the index processing. Similarly, if you have to do very deep index traversals due to wide index tuples, there's going to be more time between two table fetches. > > Obviously you can construct cases with a low distance by having indexes point > > to a lot of tiny tuples pointing to perfectly correlated pages, but in that > > case IO can't be a significant factor. > > > > It's definitely true the examples the script finds are "adversary", but > also not entirely unrealistic. I think doing index scans where the results are just thrown out are entirely unrealistic... > I suppose there will be such cases for any heuristics we come up with. Agreed. > There's probably more cases like this, where we end up with many hits. > Say, a merge join may visit index tuples repeatedly, and so on. But then > it's likely in shared buffers, so there won't be any IPC. Yea, I'd not expect a meaningful impact of any of this in a workload like that. Greetings, Andres Freund