Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w5D3H-0030kD-2g for pgsql-hackers@arkaria.postgresql.org; Wed, 25 Mar 2026 01:28:56 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w5D3G-00AAPq-1F for pgsql-hackers@arkaria.postgresql.org; Wed, 25 Mar 2026 01:28:54 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w5D3F-00AAPa-36 for pgsql-hackers@lists.postgresql.org; Wed, 25 Mar 2026 01:28:54 +0000 Received: from mail-wm1-x32f.google.com ([2a00:1450:4864:20::32f]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1w5D3D-00000000ydx-2f68 for pgsql-hackers@lists.postgresql.org; Wed, 25 Mar 2026 01:28:54 +0000 Received: by mail-wm1-x32f.google.com with SMTP id 5b1f17b1804b1-486fba7ce4cso18235245e9.3 for ; Tue, 24 Mar 2026 18:28:50 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1774402130; cv=none; d=google.com; s=arc-20240605; b=B3pqPCuIyeR0nJvSZjmQz49lLTGNCay7XI+Fb/f1dE/e/b8EdPEyrjV6CeYwPjyIHY ogziF5121h7pyBvcrYsCaBt2MGHd9SejA4I7+eKOstEacFslypIC6hmH3zSCbYrbyhpV IITF0acWVx9cmrAEppn3nMUR4X69/CkrFTnSv1yUKNlnLJ0rumC6FlUMG0m7WoxbSUMF uhSsWkJ9/Q74KaAQ4waz7jJd4z0hMlX5kkaK34pNlWCAcwgCIQ8/fZa7hRrpEkQXSdht 899Ed+gjZ5E0B4tP5AFltS6qApgx+HKKJKjwKOywRYolez7KvRLp5K+Brem4VtER9we4 Xbiw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=ar7p4lSMoxwaxq6N1tkkgqkscxyHiJk0H//DoLz+BzQ=; fh=kR9PJTn2czNKlhZ6hv/reSp7XrFn78EUOPft5u2rwSM=; b=EQWeAOrwe0Z48BbxzNevwImurZlQr/+08IPEOE7AsI69P3wr8kK4Zq4uAFDyFnbgLo /80CWCpu/6imgU56T1CCSxMvkH0lRcTII3smbdDVEwHBEgj53FBfUrBqdJhQYRt8cJ8a QRLjiUG2IfXzceHBvkav3DufcIX8Kh4vyja5C/MCt9GMaQjm6yfr1Adb6gHQFC0L8PW1 hMbGPRkDyVJ5LfmxlFtP0BN9EHLZo/MFZnTJgBekYHy1C08VeZCGH3M+h5Uo31THw1z+ 9RencwoTxoxLM2du+jeVAcgp0hah8TmIkaDE+rUtCuVYQtsQmRmZY71OVGpIcXTJHi1a z+lg==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bowt-ie.20230601.gappssmtp.com; s=20230601; t=1774402130; x=1775006930; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=ar7p4lSMoxwaxq6N1tkkgqkscxyHiJk0H//DoLz+BzQ=; b=L2DKBqVSjDJdlBsrosT9wvV0p3aSBG1X7StlXgflsyTv19t0auIFn0OjgdK4C2oV3Z kUh5+JW6/aLHG6Zm37ALhW39zpGhwUpnOS876+osN/LcPHW89JxLiig96VU0ol6Wdikg EGntxfMUC1lDCPI8oU/mQZ32/a+6FA9l1v0EjCtWwQ1BpBg6iCsfF6yfVKNHgRCFtw5d nsMeLqBectv5MIgFWYO4yWyMmqYloZWljE4EXnvXjaL7kr5QBBZHO5YMIVCzy71rZpO3 xWVVSzPvglXT3t/tF02GuWohZyPX+jvYy4WdPLGdtHQOUFRPkPltlgZzxiXBGjNZ58Ly MHmQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1774402130; x=1775006930; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=ar7p4lSMoxwaxq6N1tkkgqkscxyHiJk0H//DoLz+BzQ=; b=b+eDYSw9wGyzhmiXaYrLlvof7yolDkXzlR6lOMEfzhcfFMiBHM5BSVgkmkV/bPTy4h HRBlBNDhIxs6EHrbXY/lfathUpXEtjsNzysHIINgWl3pqI/11NY2HYtFwTOnjychFbDC z4uN4HRNVtgRNzks7gcwgnqaXLlXVlFrwCZka4kJ/vRIyCc/hdETnqcjO4sGVVQVdVnE FR8KomuTTxEyZtXwAd8c3JlCWvDJGtxnPqEce2Xdmx0axQ3icY6wvLh5foL8FnK0E4gw qxxx0dzPr3WSyV/BB8E5jSpprRyXfHiJJGP28lWJ1ttJX1avjCuQcmiG/orLm22tjLik CrQw== X-Forwarded-Encrypted: i=1; AJvYcCXv7ocBm+8aCYPzfiMIBdiceGHZf1nibWJ8g4vYuiXuZd501JvMuFn8MAyRc9jpUIjFcOftzw6zgdTPYImk@lists.postgresql.org X-Gm-Message-State: AOJu0YwwvSpYz4g+biAM1IfiVt0iQVglzWJ6Qm6zmCYnVtMUXLd6lsqX By7G8Zxt+jt6igeVsrXZDiIr09vaHRmyR/I/dzXT/NfNSf3SID3MOL46elu0cfRvehiwr+DdoD0 PmT/y1gm8/0RSOZ2MutD3mSh5aQlSNU2vue+MIvEdQA== X-Gm-Gg: ATEYQzw1GBmL6eOHcTJUinyBRB0cSlCzuRUEHddD8GgEAxgXcMaxl7si1/8uuNRm/75 c4Mj3mczR/q0mUM1GzHl2Ob6oY+iSnIgN9Pmbe3oKAL9p4iCbKKCPXdLf11Z7JFi8EmiLeOdd/T wZ7w0ZOHYzhOaOxdRS/Bg/smb/fIrcLbjEtzmigYBxiQf7ZZ078u5df84uz+wgE4jzjMWGmVvNw SRSCQ7AtSShjgKusDDyLAwhrjuNbkaWiuZXMbYhIiQmoNiyoPNvgNNEVLYGcQ9mw5DxCxZJHKlX H7mGvmP2GTRnziCzVeJQD57qV/YrafhedThV+qaaXGw+GOtCOd1vNd4JQ/641535 X-Received: by 2002:a05:600c:8011:b0:485:3f65:94a1 with SMTP id 5b1f17b1804b1-4871605c226mr22117355e9.18.1774402129767; Tue, 24 Mar 2026 18:28:49 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Peter Geoghegan Date: Tue, 24 Mar 2026 21:28:23 -0400 X-Gm-Features: AQROBzD35wTf5JD92_IvRaZQscPXtW4gCU6RdxARAVtAsjPAN7_BlUN01o5lz9M Message-ID: Subject: Re: index prefetching To: Andres Freund Cc: Tomas Vondra , Alexandre Felipe , Thomas Munro , Nazir Bilal Yavuz , Robert Haas , Melanie Plageman , PostgreSQL Hackers , Georgios , Konstantin Knizhnik , Dilip Kumar Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Tue, Mar 24, 2026 at 12:26=E2=80=AFPM Andres Freund = wrote: > > > > This is preparatory work for an upcoming commit that will need xs_b= lk > > > > to manage buffer pin transfers between the scan and the executor sl= ot. > > > > > > A subsequent commit adds an earlier ExecClearTuple(slot) to make the = buffer > > > refcount decrement cheaper (due to hitting the one-element cache in > > > bufmgr.c). I wonder if it's worth pulling that into this commit? Most= ly to > > > make that larger commit smaller. > > > > Is it really worth doing that without also doing the > > xs_lastinblock/ExecStorePinnedBufferHeapTuple stuff? We need batches > > to do the latter. > > I see perf benefits from it alone, yes. Does that mean it should be in a separate commit? The upcoming v18 will significantly change the patch set's structure by breaking up the big patch so that the slot interface changes are in their own commit. Along with pushing the VM accesses down into heapam. That feels like a big enough change on its own. There is bound to be a v19, so to me it makes sense to defer a decision on this kind of thing until you see v18. > > > The commit message doesn't mention that this affects ammarkpos/amrest= rpos. > > > > It does not. But FWIW there's a prominent "Note" about it in the SGML d= ocs. > > Approximately nobody looking at the commit to see what they need to chang= e > will see that... In any case v17 updated the commit message to point out the removal of ammarkpos/amrestrpos. > > No, I didn't try. I doubt saving space is worthwhile, since we'll need > > a relatively large allocation for batches used during index-only scans > > regardless. > > Seems fine to not care for now. But, FWIW, the motivating reason wouldn't= be > to to really save memory, it'd be to make it more likely the data fits in= to a > higher level of the cache. Understood. > > We're now advertising that indexam_util_batch_unlock is optional, and > > that index AMs can go there own way when needed. > > Fair enough. Cool. > > > > + > > > > + /* > > > > + * heap blocks fetched counts (incremented by index_getnext_s= lot calls > > > > + * within table AMs, though only during index-only scans) > > > > + */ > > > > + uint64 nheapfetches; > > > > } IndexScanInstrumentation; > > > > > > s/heap/table/ for anything new imo. > > > > Usually I'd agree, but here we're using the same name as the one > > presented in EXPLAIN ANALYZE. IMV this should match that. (Maybe the > > EXPLAIN ANALYZE output should change, and this field name along with > > it, but that's another discussion entirely.) > > I think if we add it into more and more places it'll get harder and harde= r to > eventually fix... Okay, I'll make this change for v18. > Code reuse and making it easier for other AMs to adapt this... Of course, that was what I meant. > > > I also suspect it'd be worth creating a new heapam.c file for this ne= w > > > code. heapam_index.c or such. > > > > I had thought about that myself. How would that be structured, in > > terms of the commits? > > I'm imagining something like heapam_iscan.c or heapam_indexfetch.c or suc= h. > I'd introduce it by adding it in a commit that moves heap_hot_search_buff= er(), > and heapam_index_fetch_{begin,reset,end}() into it. > > Moving heap_hot_search_buffer() into the same file will be nice because i= t'll > allow partial inlining of it into some really performance sensitive funct= ions. I'm not opposed, of course. But let's leave that question until after I post v18. > I think we should move the seq/tid scan stuff into its own file too, but > that's obviously a separate thread. Makes sense. > > > Any reason this isn't in index_beginscan_internal(), given both > > > index_beginscan() and index_beginscan_parallel() need it? I realize = you'd > > > need to add arguments to index_beginscan_internal(), but I don't see = a problem > > > with that. Alternatively a helper for this seems like a possibility = too. > > > > Fixed, by moving much more of the initialization done by each variant > > (index_beginscan, index_beginscan_bitmap, index_beginscan_parallel) > > into index_beginscan_internal itself. > > Nice. Haven't checked out your new version yet - are you doing that as a > separate commit? Maybe. Again, let's see how things shake out in v18, and then revisit. > I don't think we necessarily need the coverage of your full torture test = suite > in core, but I feel some basic sanity tests really ought to be in the cor= e > tests. There's very little, from what I can tell. Even just making sure = we > have coverage for a index [only] scans going forward and backward, and th= e > same for merge & nestloop joins would be quite a win. We probably have that level of coverage. What we lack is coverage for things that are actually tricky. For example, the regression tests lack coverage for an nbtree scan that uses a scrollable cursor + an SAOP scan key that backs up across a page boundary, relying directly on a call to the new amposreset routine for correct behavior. Nor do we have the equivalent SAOP merge join that restores a mark across a page/batch boundary (note that we can safely skip the amposreset in the restore "scanBatch =3D=3D markBatch" happy path). The underlying difficulty is that these things tend to "acidentally fail to fail". Some of these cases were rather difficult to write any kind of test for, even for my own purposes. > > IOW, when we're called through index_batchscan_end (specifically, when > > we're called through index_batchscan_reset when it is called by > > index_batchscan_end) we don't want to use the cache. It seemed to make > > sense to implement this in a way that didn't require any special > > handling from within indexam_util_batch_release (since it's not a > > concern of index AMs). > > I am not really following. Right now there's two close copies of this co= de: > Yes, one of them has the additional "if (scan->xs_heapfetch)" condition, = but > that hardly seems like a real problem preventing sharing the code. I'll try to address this for v18. Thanks --=20 Peter Geoghegan