Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w05rq-001byB-2d for pgsql-hackers@arkaria.postgresql.org; Tue, 10 Mar 2026 22:47:58 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w05ro-006Vln-0X for pgsql-hackers@arkaria.postgresql.org; Tue, 10 Mar 2026 22:47:56 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w05rn-006Vle-2M for pgsql-hackers@lists.postgresql.org; Tue, 10 Mar 2026 22:47:56 +0000 Received: from fout-a4-smtp.messagingengine.com ([103.168.172.147]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.98.2) (envelope-from ) id 1w05rl-00000001WFJ-32sO for pgsql-hackers@lists.postgresql.org; Tue, 10 Mar 2026 22:47:55 +0000 Received: from phl-compute-11.internal (phl-compute-11.internal [10.202.2.51]) by mailfout.phl.internal (Postfix) with ESMTP id 0D060EC08DE; Tue, 10 Mar 2026 18:47:53 -0400 (EDT) Received: from phl-frontend-04 ([10.202.2.163]) by phl-compute-11.internal (MEProxy); Tue, 10 Mar 2026 18:47:53 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=anarazel.de; h= cc:cc:content-transfer-encoding:content-type:content-type:date :date:from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to; s=fm1; t=1773182873; x=1773269273; bh=2kkUz8fygoerHns7EFgoWpxODUqXJCU7KT0sDIihY0Q=; b= YqabAVE3wFt2tUrnpzTlo6t0gss7sGaH/eXIAsYaC8dm7XzoN6epw1N1Lua2Us8r EuWQUJfZrIXoq8Nl+Nb6cr2CBUU4CHSqtS8huIK0R49uNLb08e8hf0FjoLasGYX8 G56QA7Ll7mp4ubaKcduzIa1QG29EkVF7U2LiiVPlmmSx5z4LIadPfHFQ3zgQ6Sa3 Dxbg5mTXuuQ3UslYYJR81coYcuYutBraHtB72GFVWm57VkRMPu0NERHXtAxoorSj kvG0QevVyArUpqEL7EbSwtiZyCVEr/nVuDBnGc3SZ4LJl8E27T0PeEOjPWrut8jl AogA2bxlhJ3mBWDj4qCw1g== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm1; t=1773182873; x= 1773269273; bh=2kkUz8fygoerHns7EFgoWpxODUqXJCU7KT0sDIihY0Q=; b=4 iX27UxQAreyFMlMpgCxdiGid9WHrall7Bx11oqse97pD6zIHYPx5xHQKNzgFbvJW 16f8FBhyafGE1eE8PjDHYPGc5BI3DNZZQJ+lHx1cj89cNFhiXLGiFxJGS1njbF3n S0xBe/ULnuKRJ4aP4WN7OIV6ICsY8/4OkofB0S2/8K4zujUZLw91V61GHYErC2W0 FqHOVCXJgSDlNKfDTD38P3uXS58SSU/Oj1VGIWdyObaCnZ9iJ5D5Yusymcx1Kche brFFdlRKqiKlqK2G8qNnzCFDE4ZGiiLgqgVHCPnoBBAavmyrjCQieWkz2JgDezHY yN1giSPNb44ru1N13AvcA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefgedrtddtgddvkedvvdeiucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceu rghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmnecujf gurhepfffhvfevuffkfhggtggugfgjsehtkefstddttdejnecuhfhrohhmpeetnhgurhgv shcuhfhrvghunhguuceorghnughrvghssegrnhgrrhgriigvlhdruggvqeenucggtffrrg htthgvrhhnpeeluefgueegfeevvddttddtieegveelkeetgfeuhefhvdfhueetudehhfdt gffgieenucffohhmrghinhepphhoshhtghhrrdgvshenucevlhhushhtvghrufhiiigvpe dtnecurfgrrhgrmhepmhgrihhlfhhrohhmpegrnhgurhgvshesrghnrghrrgiivghlrdgu vgdpnhgspghrtghpthhtohepuddupdhmohguvgepshhmthhpohhuthdprhgtphhtthhope hpghessghofihtrdhivgdprhgtphhtthhopehknhhiiihhnhhikhesghgrrhhrvghtrdhr uhdprhgtphhtthhopegshigrvhhuiiekudesghhmrghilhdrtghomhdprhgtphhtthhope guihhlihhpsggrlhgruhhtsehgmhgrihhlrdgtohhmpdhrtghpthhtohepmhgvlhgrnhhi vghplhgrghgvmhgrnhesghhmrghilhdrtghomhdprhgtphhtthhopehordgrlhgvgigrnh gurhgvrdhfvghlihhpvgesghhmrghilhdrtghomhdprhgtphhtthhopehrohgsvghrthhm hhgrrghssehgmhgrihhlrdgtohhmpdhrtghpthhtohepthhhohhmrghsrdhmuhhnrhhose hgmhgrihhlrdgtohhmpdhrtghpthhtohepphhgshhqlhdqhhgrtghkvghrsheslhhishht shdrphhoshhtghhrvghsqhhlrdhorhhg X-ME-Proxy: Feedback-ID: id4a34324:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 10 Mar 2026 18:47:52 -0400 (EDT) Date: Tue, 10 Mar 2026 18:47:52 -0400 From: Andres Freund To: Peter Geoghegan Cc: Tomas Vondra , Alexandre Felipe , Thomas Munro , Nazir Bilal Yavuz , Robert Haas , Melanie Plageman , PostgreSQL Hackers , Georgios , Konstantin Knizhnik , Dilip Kumar Subject: Re: index prefetching Message-ID: References: <52512325-b1f2-4fff-819e-f68122b2e427@vondra.me> <64mfcfv7iihc4pmqlxarii4esnmqry52ckz5m7lmwylnfnuxuz@oxh4ioxkjtep> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hi, On 2026-03-10 16:57:35 -0400, Peter Geoghegan wrote: > On Fri, Feb 27, 2026 at 6:52 PM Andres Freund wrote: > > This is a huge change. Is there a chance we can break it up into more > > manageable chunks? > > Attached is v12, which has revisions that address most of your > feedback items. It also includes items that address problems that I > noticed during performance validation work. > > Highlights: > > * Substantial revisions that give table AMs and index AMs direct > control over batch layout -- without giving up on batch > recycling/caching. This is essentially what you (Andres) requested > because the design from v11 was not sufficiently AM agnostic. In > particular: > > - Table AMs now control the size and layout of visibility information > (in practice heapam uses this to store per-item visibility state from > the visibility map). > > - Index AMs have their own opaque state for things like sibling link > block numbers, avoiding the assumption that other index AMs supporting > amgetbatch will need to work like nbtree and hash as regards how they > navigate to the next index page/index keyspace associated with each > batch. Nice! > * No more read stream yielding. Numerous new patches from Andres are > now included, which helps with this. In particular, "WIP: read_stream: > Only increase distance when waiting for IO" fixes the problematic > regression in an adversarial query -- the one that prompted me to > invent yielding in the first place. As a result of all this, the read > stream callback added by the prefetching commit itself is now > substantially simpler than it was in v11. Yay. > * There are now a couple of extra patches created by breaking things > into more distinct commits. Namely, there's a new "heapam: Track heap > block in IndexFetchHeapData using xs_blk" commit, as well as a new > "Make IndexScanInstrumentation a pointer in executor scan nodes" > commit. Yay^2. > * Moreover, some commits now appear in a slightly different order, > prioritizing work closer to being committable; those commits now come > first. Yay^3. > * New commit "Use simple hash for PrivateRefCount" addresses some of > the problems we were seeing with PrivateRefCount performance. This > generic optimization addresses an existing problem that would > otherwise be much worse with the index prefetching work in place. Let's get that in soon. Alexandre Felipe posted an implementation of this in https://postgr.es/m/CAE8JnxNTETEUiAOF31%3D_yo%3DpvyAi9npOeJfcTvEJJbi4vomtYA%40mail.gmail.com I don't agree with many of the other changes, but the simplehash conversion contains an interesting piece - the ability to avoid the status field. I'd encourage Alexandre to upstream that separately from this thread (and also separately from the rest of the patches in the above thread). > However, I have NOT yet acted on a few feedback items from Andres: > > * I still don't know what Andres meant about requiring table AMs to > free batch index page buffer pins representing a modularity violation. > I don't see how we can reasonably avoid it while still preserving the > guarantees needed to safely drop buffer pins eagerly during index-only > scans that require prefetching. > > * I'm also not at all sure what Andres meant about index AMs like hash > not holding onto their own buffer pins, given that prefetching uses a > read stream sensitive to the number of buffer pins the backend holds. I tried to respond in https://postgr.es/m/vbb4naf2tvm2tm7yoml54pzvrmn77p4nvq4awfa4wufc3hn7qx%40mof5q6li3xzv to explain my concerns / what I think needs to happen. Greetings, Andres Freund