public inbox for [email protected]
help / color / mirror / Atom feedFrom: Peter Geoghegan <[email protected]>
To: Andrey Borodin <[email protected]>
Cc: pgsql-hackers <[email protected]>
Cc: Kirk Wolak <[email protected]>
Cc: Nikolay Samokhvalov <[email protected]>
Subject: Re: [WiP] B-tree page merge during vacuum to reduce index bloat
Date: Wed, 27 Aug 2025 17:06:28 -0400
Message-ID: <CAH2-Wz=e2LG=a9X9bsHVCdCsouH=6EE94NSBErwF5pY=1K2efA@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
On Tue, Aug 26, 2025 at 5:40 AM Andrey Borodin <[email protected]> wrote:
> *** Correctness:
>
> The implementation reuses existing locking, critical sections and WAL logging infrastructure. To handle cross-page dependencies during WAL replay, when tuples are merged, the right sibling buffer is registered with `REGBUF_FORCE_IMAGE`, this is a temporary workaround.
I don't think that you can reuse existing locking for this. It needs
to be totally reevaluated in light of the way that you've changed page
deletion.
In general, page deletion ends up being very complicated with the
Lehman & Yao design -- even with the existing restrictions. It's a
natural downside of the very permissive locking rules: there is no
need for most searchers and inserters to ever hold more than a single
buffer lock at a time. Searchers generally hold *no* locks when
descending the index at points where they're "between levels". Almost
anything can happen in the meantime -- including page deletion (it's
just that the deleted page cannot be concurrently recycled right away,
which is kind of a separate process).
That original block number reference still has to work for these
searchers, no matter what. It must at least not give them an
irredeemably bad picture of what's going on; they might have to move
right to deal with concurrent page deletion and page splits, but
that's it.
If you merge together underful pages like this, you change the key
space boundaries between pages. I don't see a way to make that safe/an
atomic operation, except by adding significantly more locking in the
critical path of searchers.
> *** Current Status & Questions:
>
> The patch successfully reduces index bloat and handles basic scenarios, but we've identified some areas that need community input:
Have you benchmarked it?
I wouldn't assume that free-at-empty actually is worse than merging
underfull pages. It's complicated, but see this paper for some
counter-arguments:
https://www.sciencedirect.com/science/article/pii/002200009390020W
This old Dr. Dobbs article from the same authors gives a lighter
summary of the same ideas:
https://web.archive.org/web/20200811014238/https://www.drdobbs.com/reexamining-b-trees/184408694?pgn...
In any case, I believe that this optimization isn't widely implemented
by other major RDBMSs, despite being a textbook technique that was
known about and described early in the history of B-Trees.
--
Peter Geoghegan
view thread (13+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected]
Subject: Re: [WiP] B-tree page merge during vacuum to reduce index bloat
In-Reply-To: <CAH2-Wz=e2LG=a9X9bsHVCdCsouH=6EE94NSBErwF5pY=1K2efA@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox