public inbox for [email protected]
help / color / mirror / Atom feedFrom: lakshmi <[email protected]>
To: Kirill Reshke <[email protected]>
Cc: pgsql-hackers <[email protected]>
Subject: Re: Use log_newpage_range in HASH index build
Date: Mon, 5 Jan 2026 14:20:27 +0530
Message-ID: <CAEvyyTjGtYRq3Aa0tY6=_QGV6z+6KTQVF=J75Acx2L6i0JOkGw@mail.gmail.com> (raw)
In-Reply-To: <CAEvyyTjUTLXvrme3CNMjJYwA9j4usrq53PyBPv47Apk58Qz+GA@mail.gmail.com>
References: <CALdSSPgu6fnoOYzgiFF4_Etr96zEHvSwvYJDemc3o++EZbUQMA@mail.gmail.com>
<CAEvyyTjUTLXvrme3CNMjJYwA9j4usrq53PyBPv47Apk58Qz+GA@mail.gmail.com>
Hi Kirill,
Following up on my earlier note, I implemented the proposed HASH overflow
page reuse enhancement.Recently freed overflow pages are recorded in
_hash_freeovflpage( ),and _hash_addovflpage( ) now prefers reusing those
pages during allocation before falling back to the bitmap scan.
The change is backend-local and allocation-only,with no WAL or on-disk
format changes.I verified correctness using index build/drop and VACUUM
cycles,and confirmed WAL neutrality by comparing WAL generated during HASH
index builds with and without this change (no observable difference beyond
normal noise).
The patch is attached for review.Feedback is welcome.
Best regards,
Lakshmi
On Tue, Dec 23, 2025 at 5:23 PM lakshmi <[email protected]> wrote:
> Hi Kirill,
>
> I tested your patch on the current master and confirmed the WAL reduction
> during HASH index build.
>
> While testing, I noticed a possible small follow-up improvement in HASH
> overflow handling. Currently, any free overflow page may be reused, which
> can scatter overflow chains and hurt cache locality. Reusing recently freed
> overflow pages first could help, without changing WAL behavior or on-disk
> format.
>
> I would like to work on this as a follow-up enhancement and would welcome
> any suggestions.
>
> Best regards,
> Lakshmi
>
> On Tue, Dec 23, 2025 at 2:31 PM Kirill Reshke <[email protected]>
> wrote:
>
>> There exists an optimization to index creation process, when we omit
>> to write any WAL
>> for index build. It is currently supported in B Tree, GIN, GiST, spg
>> indexes.
>> It works because we do not need to recover anything if index creation
>> fails, because if was not used by any query. So, the index can be
>> built on-disk, and then, just before making the index alive, we can
>> simply log all pages to WAL.
>>
>> Hash index currently lacks this optimization.
>> PFA implementation.
>>
>> During my testing, I checked the amount of WAL generated by index
>> build before and after patch applied. My script was something like:
>>
>> select pg_current_wal_insert_lsn();
>>
>> create index on t using hash (i);
>>
>> select pg_current_wal_insert_lsn();
>>
>> select pg_lsn_wal_diff(lsn1, lsn2);
>>
>> Resulting numbers depend on index size, but I got 2.5-3.5 times less
>> WAL with this patch and 8 times less WAL with this patch +
>> wal_compression=on.
>> Index creation time, however, did not change much...
>>
>> About implementation:
>> These are many types of record that can be generated during index build.
>> I know for sure these are possible (double-checked using pg_waldump):
>>
>> SPLIT_COMPLETE
>> INSERT
>> SPLIT_ALLOCATE_PAGE
>> SPLIT_PAGE
>> ADD_OVFL_PAGE
>> SQUEEZE_PAGE
>> INIT_META_PAGE
>> INIT_BITMAP_PAGE
>>
>>
>> Looks like SPLIT_COMPLETE and VACUUM_ONE_PAGE are never generated
>> during index build. I'm not sure about MOVE_PAGE_CONTENTS.
>>
>> So, implementation is simply pass isbuild flag everywhere something is
>> wal-logged. Looks like it is less invasive than alternatives.
>>
>> --
>> Best regards,
>> Kirill Reshke
>>
>
Attachments:
[text/x-patch] 0001-hash-reuse-recently-freed-overflow-pages.patch (3.4K, 3-0001-hash-reuse-recently-freed-overflow-pages.patch)
download | inline diff:
From 34d4d99baa5377a4dabb8cf9c4bcd0e9bb2c6376 Mon Sep 17 00:00:00 2001
From: Lakshmi <[email protected]>
Date: Mon, 5 Jan 2026 10:38:23 +0530
Subject: [PATCH] hash: reuse recently freed overflow pages
---
src/backend/access/hash/hashovfl.c | 77 ++++++++++++++++++++++++++++++
1 file changed, 77 insertions(+)
diff --git a/src/backend/access/hash/hashovfl.c b/src/backend/access/hash/hashovfl.c
index 3277da19840..c66f8adc056 100644
--- a/src/backend/access/hash/hashovfl.c
+++ b/src/backend/access/hash/hashovfl.c
@@ -22,6 +22,63 @@
#include "access/xloginsert.h"
#include "miscadmin.h"
#include "utils/rel.h"
+/*-------------------------------------------------------------------------
+ * Backend-local reuse cache for recently freed overflow pages.
+ * Best-effort hint only: no correctness or WAL dependency.
+ *-------------------------------------------------------------------------
+ */
+#define HASH_OVFL_REUSE_CACHE_SIZE 32
+
+typedef struct HashOvflReuseCache
+{
+ BlockNumber blocks[HASH_OVFL_REUSE_CACHE_SIZE];
+ int head;
+ int count;
+} HashOvflReuseCache;
+
+static HashOvflReuseCache ovfl_reuse_cache;
+
+/* Record a freed overflow page */
+static void
+hash_record_freed_ovflpage(BlockNumber blkno)
+{
+ ovfl_reuse_cache.blocks[ovfl_reuse_cache.head] = blkno;
+ ovfl_reuse_cache.head =
+ (ovfl_reuse_cache.head + 1) % HASH_OVFL_REUSE_CACHE_SIZE;
+
+ if (ovfl_reuse_cache.count < HASH_OVFL_REUSE_CACHE_SIZE)
+ ovfl_reuse_cache.count++;
+}
+
+/* Try to reuse a recently freed overflow page */
+static bool
+hash_try_reuse_cached_ovflpage(Relation rel, Buffer metabuf,
+ BlockNumber *reuse_blkno)
+{
+ HashMetaPage metap = HashPageGetMeta(BufferGetPage(metabuf));
+ int i;
+
+ for (i = 0; i < ovfl_reuse_cache.count; i++)
+ {
+ int idx =
+ (ovfl_reuse_cache.head - 1 - i + HASH_OVFL_REUSE_CACHE_SIZE)
+ % HASH_OVFL_REUSE_CACHE_SIZE;
+
+ BlockNumber blkno = ovfl_reuse_cache.blocks[idx];
+ uint32 bitno = _hash_ovflblkno_to_bitno(metap, blkno);
+
+ /* Bitmap is authoritative */
+ if (_hash_isbitset(metap->hashm_mapp, bitno))
+ {
+ _hash_clrbit(metap->hashm_mapp, bitno);
+ *reuse_blkno = blkno;
+ return true;
+ }
+ }
+
+ return false;
+}
+
static uint32 _hash_firstfreebit(uint32 map);
@@ -132,6 +189,10 @@ _hash_addovflpage(Relation rel, Buffer metabuf, Buffer buf, bool retain_pin, boo
uint32 i,
j;
bool page_found = false;
+ BlockNumber reuse_blkno;
+
+
+
/*
* Write-lock the tail page. Here, we need to maintain locking order such
@@ -186,6 +247,16 @@ _hash_addovflpage(Relation rel, Buffer metabuf, Buffer buf, bool retain_pin, boo
_hash_checkpage(rel, metabuf, LH_META_PAGE);
metap = HashPageGetMeta(BufferGetPage(metabuf));
+ /*
+ * NEW: Try reuse cache before scanning bitmap
+ */
+ if (hash_try_reuse_cached_ovflpage(rel, metabuf, &reuse_blkno))
+ {
+ ovflbuf = _hash_getinitbuf(rel, reuse_blkno);
+ page_found = true;
+ goto found;
+ }
+
/* start search at hashm_firstfree */
orig_firstfree = metap->hashm_firstfree;
first_page = orig_firstfree >> BMPG_SHIFT(metap);
@@ -633,6 +704,12 @@ _hash_freeovflpage(Relation rel, Buffer bucketbuf, Buffer ovflbuf,
CLRBIT(freep, bitmapbit);
MarkBufferDirty(mapbuf);
+ /*
+ * NEW: Remember this overflow page for reuse
+ */
+
+ hash_record_freed_ovflpage(ovflblkno);
+
/* if this is now the first free page, update hashm_firstfree */
if (ovflbitno < metap->hashm_firstfree)
{
--
2.39.5
view thread (3+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected]
Subject: Re: Use log_newpage_range in HASH index build
In-Reply-To: <CAEvyyTjGtYRq3Aa0tY6=_QGV6z+6KTQVF=J75Acx2L6i0JOkGw@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox