Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w47Ro-001spK-3A for pgsql-hackers@arkaria.postgresql.org; Sun, 22 Mar 2026 01:17:45 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w47Rl-00CBwo-33 for pgsql-hackers@arkaria.postgresql.org; Sun, 22 Mar 2026 01:17:42 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w47Rl-00CBwg-1Y for pgsql-hackers@lists.postgresql.org; Sun, 22 Mar 2026 01:17:42 +0000 Received: from mail-wm1-x333.google.com ([2a00:1450:4864:20::333]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1w47Ri-00000000Qg3-1IUr for pgsql-hackers@lists.postgresql.org; Sun, 22 Mar 2026 01:17:41 +0000 Received: by mail-wm1-x333.google.com with SMTP id 5b1f17b1804b1-486ff201041so13145505e9.1 for ; Sat, 21 Mar 2026 18:17:38 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1774142256; cv=none; d=google.com; s=arc-20240605; b=aPm0vJP7V1vbvYjR5mhMEUIZ/Ne5nHBIaUfrkESEVOFUHHDEVmXZ4YfpBT6gXxHSRQ ZtTGI78FMheICDzWi//ev3AZR/wTRUa1Rby0OERHe5ZovpZzXh03kUeYxQXQ1Os2MVF9 8OCZgHULOUhP3uvOiCalZ8I8dyDkE769SXt+41WXcq207Ffdg2epMwbXKtzWkwxdLitb EvuXAPoWM5lbYSO23FkZde8yFX5uBkO0SHp+1PSGcYkokL84n/B2aXyDv2Ht1WEMUXcR cNfpf4AwbX2IP2I6RIleUiV70sv7kHCDUKMX+3fwQ8xGXT/smrMNryCpfIzMmNG9aiRo pEng== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=KMxAQrULJMVvHO6TrIHHNrihVWOd0sLKYgAcLopcZBE=; fh=1xrGKSk2y/Gbg4iNnZYnMNOtZcu/k5YCSN5+56Vc4ew=; b=cZe3RPYe6fcb6lcT+3T9TaNkPAnuy9jrEeYoC/kKmMgmNksQH3l3OFF8D1NKj9VpPU JOD8f8V1E0EpxoIPGf2Xbs5rWEaqAXcY6YzvZfzwBgicMYjOUszA3vD/EA/5Zvtu+4ur oCs+UmcD00yCNrAOkrkeB7BqD49/ipjm5M5AlfwESIBRkf8OalfWdbeY2Vfgs3UDt+CT 2PZU21P39paSx2heUxpikS4XncTljebQESyfrKkYbSpxdXsACuqlPhiSKizg34XJ0Hni 1DpWsaJ+zWGMj5MBC6V3smphTh6TNkS9Q4pXzp7PA3XAJZVSmgH4rLQ4l71bnvBbmGxH qV5Q==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bowt-ie.20230601.gappssmtp.com; s=20230601; t=1774142256; x=1774747056; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=KMxAQrULJMVvHO6TrIHHNrihVWOd0sLKYgAcLopcZBE=; b=m3bse1WUnU0Bx6Ku9uCkusfQgLm2mZfODWSvD4RftRhXCteDF93wLVc1Qa/2JbnVWt 4HZuaD1RFrtw0ZexfgRYWfsfRCPN4Tng86D2Us0XDDG+IaLHR7WlhaEGamG8LDb8V809 sPrbBo8lXkA4tKvj+mLwJyNoY0IymWt/R0OXU8mj5Sulk+m746Gc2duvnJQUknuZ0wzO zofBO8qKs3yqcBO0Kf+oIgcBDiypHW41XDK1hLWcHybD4fZkYgIlcYQhpa+7Lv1IE3qH wk3EazvoyE08FAliBxU7xChSUH7c5FSaDfmMZ3VTmcIfDTyouwaqvDk4r+OxYlYM15Bx hyxQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1774142256; x=1774747056; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=KMxAQrULJMVvHO6TrIHHNrihVWOd0sLKYgAcLopcZBE=; b=LOIB3joDszsjnKwoXef4OgmmXau7XmVf4M9CvtKTEY0eoWgUYlg9j613wrljrjy2Kx LmLTqLoyDBl8PcO1iOifXdeUd3NFy/poq1CgxGpeBXvMTWqgdIg0KkGb5/hrDknTKKv5 YGKKivrj16oZEqDMj0A/OJWFlToldZT2//GtJAsM4afCpFrGCDWVrFhoo/NDR2yOQf1n UgIaS+dYnJPFpQl1KsVAk/Wful/AE2SQOXysTld57eWOKP3boGXczB1pwnJf48gOHFrF BLnSbi1Z0FbfGjwacQaFwCuoIIROlT4L1qPsH8vbeO2ZeFKITXo0cf9xF1IbJJCFbq2N 2u4A== X-Forwarded-Encrypted: i=1; AJvYcCUak3iTd/6nILpcfJg+7aIzaxM3uj2TMtsmqk9yc2dJ5I15Ik/yS7AL5IRvl0oItbyD8klzyfXJLkIEXjFz@lists.postgresql.org X-Gm-Message-State: AOJu0YxXk2MO4w1lyrLrce4YGp7LGy2jo4h9eb4Et472FI7X9ZIOpVPu hFObImBMjR40/Ukq7DGNLbpXKmwiJhpQ+Dfqp/RUKpgL1Kt/IzPfgxRFscHhhGQ9oAeXPJrWEMx gGI8LlMZy3G1kCXiqxAXSTAPIKjEPCnfijqCJ+/gewg== X-Gm-Gg: ATEYQzwuwFXEV466IIG86zFJx0wivs9n6RLU4xAg0U5IQZWHyFy+U6jDr3Qe0yE4v/R eztTbngNCvNU5goVlf0YbPoNW8vgVPqqTE4fvt6+MWKSXrAvQPUQBuULOMn74NcjPHeZ/zunTyZ ybNEl0i5undlnx/F7KF3O6M658C3rvU5EsaRcbBGeAqfQd1N5JG9rESxveBzWdZKepYEKDo/k4i yK3djOIcXdzg9Ibdsh9ENMX8demz3tu1E0Yg7uGzyuKQlwQiC30+beoMfvHpcxOKV5AL5oGWr8x HQsc7Y4OVpleOYalV2TSc76iUc33jdTNor8GeSHzgwOZiJy6E3Z+7w== X-Received: by 2002:a05:600c:528e:b0:485:3fc8:de9c with SMTP id 5b1f17b1804b1-486fedb2ef0mr101788935e9.12.1774142254653; Sat, 21 Mar 2026 18:17:34 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Peter Geoghegan Date: Sat, 21 Mar 2026 21:17:08 -0400 X-Gm-Features: AaiRm53HUf2RXbPUkLzSSRLygzpwxMPT0jjxiyzV1BG1PdwnqIReoU902Ep8h44 Message-ID: Subject: Re: index prefetching To: Andres Freund Cc: Tomas Vondra , Alexandre Felipe , Thomas Munro , Nazir Bilal Yavuz , Robert Haas , Melanie Plageman , PostgreSQL Hackers , Georgios , Konstantin Knizhnik , Dilip Kumar Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Sat, Mar 21, 2026 at 3:54=E2=80=AFPM Andres Freund = wrote: > I'm still wondering about whether 0003 can be split apart - it's just hug= e and > does a lot of things at once. I find it hard to get patches of that size = into > a mergeable state at once. > > What about splitting out the changes to the indexscan API, i.e. moving IO= S > handling to indexam.c / heapam_handler.c into its own commit? I feel lik= e > that's a sizeable chunk that needs to touch a lot of places that don't ne= ed to > be touched in the commit adding batching. And it's such an architectural > improvement that it's a worthwhile thing to do on its own. I'll try to get that done for v18 (the version *after* the next version I'll post, not the next version/v17). I want to commit the 2 patches that are now ready ahead of producing a v17. I don't want to wait for a solution to this unrelated problem (nor do I want to leave CFTester unhappy). > One advantage of that would be that we could merge the VISITED_PAGES_LIMI= T > before the introduction of batches. Good point. > If the the amgetbatch introduction didn't also replace ammarkpos/amrestrp= os > with amposreset (see also question in previous email), I'd say that I'd s= plit > off the introduction of the batch interface from the conversion of btree = to > it. But it looks like it'd be not entirely trivial to keep both interface= s > around, so maybe that's not worth it. Right. The whole concept of a batch kinda exists in nbtree already, and has for a long time: there's always been markPos, which overwrites the scan's currPos on restore. This shares many of the same features (e.g., an independent markTuples array is used during index-only scans). So there's just no practical way to separate the btgetbatch/nbtree changes from the core changes that add functionality to indexbatch.c. > > + /* > > + * Limit on distinct heap pages visited before giving up (0 =3D n= o limit). > > + * Used by selfuncs.c to bound the cost of get_actual_variable_en= dpoint(). > > + */ > > + uint8 xs_visited_pages_limit; > > I'd maybe phrase it a bit more vaguely, e.g > > An approximate limit on the amount of work, measured in pages touched= , > imposed on the index scan. The default, 0, means no limit. Used by > selfuncs.c to bound the cost of get_actual_variable_endpoint(). > > so that, in the near future (presumably 20), we can just start to also ta= ke > the number of index pages visited into account. Fixed, using your wording now. > Because I do think your > concern that leaving out the number of index pages from the limit is > correct. The easiest way there seems to be to just have one shared limit. Yeah, there's just no question that it's a real problem. > > @@ -824,6 +824,7 @@ heapam_index_only_getnext_slot(IndexScanDesc scan, = ScanDirection direction, > > heap_continue =3D false; > > > > Assert(scan->xs_want_itup); > > + Assert(scan->xs_visited_pages_limit =3D=3D 0); > Why do we not need to support this for amgettuple indexes? Fixed -- amgettuple index AMs will support the xs_visited_pages_limit stuff in the next version. > I wonder if the accounting for the number of visited pages wouldn't bette= r be > done in heapam_index_fetch_tuple(), as that already has a dedicated branc= h for > the cross-heap-page-boundary case. I guess the current location was > convenient because it provides for an easy control flow for ending the sc= an? That's right, that's what makes it convenient. > > @@ -7180,6 +7180,7 @@ get_actual_variable_endpoint(Relation heapRel, > > &Snapsho= tNonVacuumable, NULL, > > 1, 0); > > Assert(index_scan->xs_want_itup); > > + index_scan->xs_visited_pages_limit =3D VISITED_PAGES_LIMIT; > > index_rescan(index_scan, scankeys, 1, NULL, 0); > > Wonder if it'd be worth doing this via a wrapper like index_limit_effort(= ) > that could check that the scan is an IOS, uses batch mode etc. We could always add it in the future. Maybe when fixing the issue with scanning too many index leaf pages. > > Testing has shown that index prefetching can make index scans much > > faster. Large range scans that return many tuples can be as much as 35= x > > faster. > > I think on networked storage (unfortunately what most postgres instances = run > on these days), there's easily much bigger wins :) I'm excited to see what users do with this. That's such an enormous difference that it likely enables using the system in ways that wouldn't have made sense in the past. > Perhaps worth mentioning that the wins are bigger with higher latency sto= rage? Will fix for next version. > > The heuristic used to decide when to begin prefetching -- delaying unti= l > > the scan's second batch -- is imperfect. > > Perhaps worth introducing the heuristic, and the need for it, in a bit mo= re > detail? Will do. I need to mention that the heuristic doesn't handle certain cases well (which we've discussed privately). These are nestloop join queries with a limit on the inner side of the scan that the prior batch heuristic doesn't quite save. (Previously, the tuples_needed patch helped with this, but since we're removing that patch, it makes sense to frame this as a deficiency of the priorbatch heuristic.) > > Selective index scans that > > access randomly-ordered heap pages will benefit from prefetching, but > > s/will/would/? Will fix. > Continuing to think that a bunch of this really doesn't belong into relsc= an.h > but should be in a dedicated indexbatch.h or such. It'll work that way in the next revision. > Just reminded me: At some point soon we're going to have to adjust the co= sting > model to account for prefetching. I think there will be a fewer cases whe= re > bitmap scans are the correct choice than there are today... I think so too. Obviously, the main benefit here is significantly improved performance. But ISTM that there's a second order benefit: the uncertainty about how much slower an index scan will be in the worst case is likely much lower. This is particularly true of index-only scans that require at least some heap fetches; it's difficult to predict how many a given scan will perform. While it isn't all that bad when the heap pages are in shared_buffers, it's much much slower when they aren't. The optimizer has only a very basic understanding of how much data is likely cached in shared_buffers. It's very hard to improve that model because it relies so much on things that can change quickly. Making the true cost profile of index scans simpler and more predictable might make the optimizer's cost model much more accurate (perhaps with a little work on the cost function side). It's just easier to fit a cost function to that profile because being wrong (about IoS heap fetches or correlation) matters significantly less. > I suspect there's a good argument for moving the decision about whether t= o > enable prefetching to the planner instead of doing it here. But doing it = here > doesn't have a lot of architectural impact for now, so we don't necessari= ly > have to move the point of decision making now. I think that it makes sense for the optimizer to give advice without being prescriptive. Basically something a bit like the tuples_needed patch, that table AMs are free to ignore when they find the hint to be wrong. > I think heapam_getnext_stream() is too generally named. For one it could = just > as well be for a sequential scan, for another it sounds like it's a funct= ion > like heap_getnext() or such. Maybe heapam_index_scan_next_block() or suc= h? Fixed, will use your name in the next version. > > + /* Reset read stream direction unconditionally */ > > + hscan->xs_read_stream_dir =3D NoMovementScanDirection; > > That's obviously the case given the assignment, but it doesn't even hint = at > why? It's just defensive. I'll add a comment to that effect. > > @@ -192,7 +210,14 @@ heapam_index_fetch_tuple(struct IndexFetchTableDat= a *scan, > > if (BufferIsValid(hscan->xs_cbuf)) > > ReleaseBuffer(hscan->xs_cbuf); > > > > - hscan->xs_cbuf =3D ReadBuffer(hscan->xs_base.rel, hscan->= xs_blk); > > + /* > > + * When using a read stream, the stream will already know= which block > > + * number comes next (though an assertion will verify a m= atch below) > > + */ > > + if (hscan->xs_read_stream) > > + hscan->xs_cbuf =3D read_stream_next_buffer(hscan-= >xs_read_stream, NULL); > > + else > > + hscan->xs_cbuf =3D ReadBuffer(hscan->xs_base.rel,= hscan->xs_blk); > > > > /* > > * Prune page when it is pinned for the first time > > The assertion mentioned is only after the call to heap_pageprune_opt(). W= hich > is probably fine, but perhaps it's worth just doing it immediately after = the > above if block, so if the assertion is triggered one can figure out quick= ly > it's the fault of the read stream getting confused. But that means that it won't be triggered when we don't enter the "if (hscan->xs_blk !=3D ItemPointerGetBlockNumber(tid))" block that contains all this code. Besides, it just doesn't seem possible that heap_page_prune_opt would release its caller's pin. > I think it'd be worth also explaining why it's safe to use the cached VM = bits > in the case where the tuples on an all-vsibile pge were only deleted afte= r > after we fetched the VM. I.e. that it's ok for this scan to think the pag= e is > all visible, because for the purpose of the tids it already had fetched f= rom > the index, it was all visible, i.e. all those tids will also be visible t= o the > current mvcc snapshot. Added a small additional paragraph to that effect to my pending v17 of the patch set. > > @@ -558,6 +609,35 @@ heapam_batch_getnext(IndexScanDesc scan, ScanDirec= tion direction, > > I'm also worried a bit about the namespacing of these functions (were > introduced in an earlier commit, but ...). Not at all clear that a functi= on of > that name couldn't be part of a seqscan or such. > > Maybe we should prefix all of these in something like heapam_iscan_ or su= ch? I'm going to rename things as follows, which is both self-consistent and as consistent with established convention as possible: heapam_batch_getnext_tid -> heapam_index_getnext_tid heapam_batch_getnext -> heapam_index_batch_getnext heapam_batch_resolve_visibility -> heapam_index_resolve_visibility heapam_batch_return_tid -> heapam_index_return_tid heapam_dirchange_readstream_reset -> heapam_index_dirchange_reset That'll be in the next version. > > + if (!hscan->xs_read_stream && priorBatch && scan->MVCCSca= n && > > + hscan->xs_blk !=3D InvalidBlockNumber && /* for = index-only scans */ > > + enable_indexscan_prefetch) > > I wonder if it'd be worth moving this into its own helper function > (e.g. heapam_index_consider_prefetching()), just to make it a bit more ob= vious > that that's an important decision point. I'd also expect the logic to be= come > a bit more complicated in the future. That'll be in the next revision, too. > It's too bad efficiency wise that we can't safely use > READ_STREAM_USE_BATCHING. For some workloads it looks like it's a decent > improvement. We could probably do the work to make it safe (basically ex= iting > batchmode in heapam_getnext_stream() when we have to do index IO and then > re-enabling it afterwards), but that's clearly for later. Agreed. Though I wasn't able to see much benefit here, when I tried that. > > + */ > > +static pg_noinline void > > +heapam_dirchange_readstream_reset(IndexFetchHeapData *hscan, > Comparing this to heapam_index_fetch_reset(): > Should this also invalidate hscan->xs_prefetch_block? It isn't necessary, because hscan->xs_prefetch_block is initialized from scratch when the read stream callback is called following our invalidating prefetchPos. > And the other way round, should heapam_index_fetch_reset() not reset > batchringbuf->prefetchPos.valid? Nope, for the same reason. > > + /* > > + * If we're about to release the batch that prefetchPos c= urrently > > + * points to, just invalidate prefetchPos. We'll reiniti= alize it > > + * using scanPos if and when heapam_getnext_stream is nex= t called. (We > > + * must avoid confusing a prefetchPos->batch that's actua= lly before > > + * headBatch with one that's after nextBatch due to uint8= overflow; > > + * simplest way is to invalidate prefetchPos like this.) > > + */ > > + if (prefetchPos->valid && > > + prefetchPos->batch =3D=3D batchringbuf->headBatch= ) > > + prefetchPos->valid =3D false; > > + > > What causes us to reach this condition? As a general rule, prefetchPos can only advance in the read stream callback. It is mostly up to the read stream callback to notice when prefetchPos has fallen behind so it can reinitialize from scanPos. But we need this extra check to make its approach robust against prefetchPos->batch getting so far behind that uint8 wrap makes it look like it's ahead instead of behind. Importantly, while prefetchPos can fall behind scanPos, it's only possible in a limited, trivial sense: it can only fall behind for as long as we go without calling the read stream callback. Once the read stream callback is called, we can rely on its postcondition: when it returns, prefetchPos must be >=3D scanPos. So, in effect, everything behaves exactly as it would if it really was impossible for prefetchPos to be < scanPos for even a picosecond (we don't want to have to check prefetchPos needlessly in hot code paths). > Seems like we should take care that > the scan position doesn't overtake the prefetch position? That's precisely what we're doing here. We're just not checking prefetchPos on every item returned (we only do so when advancing to the next batch) to save cycles. > Does this only > happen when paused? This "prefetchPos->valid =3D false" stuff is approximately the opposite of pausing. Pausing resolves the problem of prefetchPos getting so far ahead of scanPos that the batch ring buffer runs out of slots. Whereas this prefetchPos invalidation code helps the read stream deal with prefetchPos falling behind scanPos. This is a fairly rare edge-case, that doesn't actually need consideration in the high-level design -- in a certain sense the high level design pretends that prefetchPos cannot fall behind scanPos for even one picosecond (it can, but not in a way that actually matters/affects the invariants for the read stream callback). Note that prefetchPos falling behind scanPos is most likely during index-only scans that require some heap fetches, but not too many. The read stream is seldom called, but is called with such a scan. So prefetchPos could fall far behind scanPos (or would, if we didn't have this code). > > + /* > > + * prefetchPos might not yet be valid. It might have also fallen= behind > > + * scanPos. Deal with both. > > + * > > + * If prefetchPos has not been initialized yet, that typically in= dicates > > + * that this is the first call here for the entire scan. We init= ialize > > + * prefetchPos using the current scanPos, since the current scanB= atch > > + * item's TID should have its block number returned by the read s= tream > > + * first. > > Just for my understanding: I assume right now we'll always at the start o= f a > scan batch when prefetchPos is invalid? Not quite. We might have invalidated prefetchPos explicitly in that code path that you just asked about. We deal with prefetchPos falling behind scanPos there, but also here, in the read stream -- which has a "index_scan_pos_cmp(prefetchPos, scanPos, xs_read_stream_dir) < 0" check for that. > I guess this could (not necessarily should) be avoided by having > heapam_getnext_stream() advance the prefetchPos beyond a block of runs wi= th > the same buffer before returning? If we did that, we should never fall b= ehind > anymore, is that right? The problem isn't what the read stream callback does or doesn't do. As I said, we can rely on its postcondition: when it returns, prefetchPos must be >=3D scanPos. However, there's no way to be sure the read stream callback will get called again before we consume many more batches -- particularly during index-only scans that need only a few heap fetches (but also with plain index scans that see a run of leaf pages that each have items that all point to the same heap block, which is unlikely but certainly possible). > Wonder if it's worth somehow asserting that after this the page is actual= ly > unguarded after the call. We used to, but the new layering forced me to remove it. Any ideas about how to add it back? > > + /* > > + * We never call amgetbatch without immediately releasing= the batch's > > + * index AM resources (which requires special care during= index-only > > + * scans). The read stream is sensitive to buffer shorta= ges, so we > > + * defensively avoid anything that visibly affects the pe= r-backend > > + * buffer limit. > > + */ > > It's probably just me (not the read stream) being sensitive, but I'd say = that > the read stream tries to be careful about not pinning too many buffers, a= nd > that's harder to do reliably if there are variable numbers of pins taken > without such care... Will use that wording. > > + hscan->xs_paused =3D true; > > + return read_stream_pause(stream); > > + } > > Just for my understanding: What is the easiest way to hit this? I assume= a > well correlated indexscan with small heap tuples (and thus few heap pages= ) is > the most obvious path? Depends on what you mean. If you just want to test it quickly, then you can easily do so by temporarily reducing INDEX_SCAN_MAX_BATCHES to 2 (the minimum usable value). If you meant to ask the most likely way this will happen in the real world, I can think of a couple of things. Possibly, the easiest way is to perform an index-only scan that requires only a few heap fetches. These fetches can be spaced out such that we have a hard time maintaining an adequate prefetch distance, and have to read many leaf pages just to do one more fetch. (In general, these kinds of index-only scans might be the best way to create extreme conditions in the read stream callback.) > > + prefetchBatch =3D heapam_batch_getnext(scan, xs_r= ead_stream_dir, > > + = prefetchBatch, prefetchPos); > > + if (!prefetchBatch) > > + { > > + /* > > + * Failed to load next batch, so all the = batches that the scan > > + * will ever require (barring a change in= scan direction) are > > + * now loaded > > + */ > > + return InvalidBlockNumber; > > + } > > I assume failed means that we reached the end of the scan? If so failed s= ounds > a bit odd. Fixed. > > From 01164d1c646d441b508fb7b67fc8bea7583cb222 Mon Sep 17 00:00:00 2001 > > From: Peter Geoghegan > > Date: Sun, 18 Jan 2026 11:32:52 -0500 > > Subject: [PATCH v16 16/17] Add fake LSN support to hash index AM. > If you reformulated the commit message to not reference the commit I see = no > real reason to hold off pushing this? I'll commit this one ahead of the next revision, then. > Admittedly I only skimmed it, but it looks like it looks pretty much the = same > as when I last looked at it (which is fine, I think it was ready to commi= t > then, except for the dependency on XLogGetFakeLSN, which hadn't yet been > committed)? Right. > > @@ -510,7 +512,11 @@ _hash_freeovflpage(Relation rel, Buffer bucketbuf,= Buffer ovflbuf, > .. > > + else /* !RelationNeeds= WAL(rel) */ > > + { > > + recptr =3D XLogGetFakeLSN(rel); > > + > > + /* Determine if wbuf is modified */ > > + if (nitups > 0) > > + mod_wbuf =3D true; > > + else if (is_prev_bucket_same_wrt) > > + mod_wbuf =3D true; > > + } > > Any reason to not just compute mod_wbuf before the if (NeedsWal()), just = like > you did for is_prim_bucket_same_wrt etc? No good reason. Will fix. Thanks again for the review! --=20 Peter Geoghegan