public inbox for [email protected]  
help / color / mirror / Atom feed
From: Matthias van de Meent <[email protected]>
To: Andrey Borodin <[email protected]>
Cc: Peter Geoghegan <[email protected]>
Cc: pgsql-hackers <[email protected]>
Cc: Kirk Wolak <[email protected]>
Cc: Nikolay Samokhvalov <[email protected]>
Subject: Re: [WiP] B-tree page merge during vacuum to reduce index bloat
Date: Tue, 11 Nov 2025 13:06:30 +0100
Message-ID: <CAEze2WizYvqCerpb4F_YzW6n=t23PaDgkeCkp0kvWJ=C=Vhi4A@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<CAH2-Wz=e2LG=a9X9bsHVCdCsouH=6EE94NSBErwF5pY=1K2efA@mail.gmail.com>
	<[email protected]>
	<[email protected]>

On Sun, 31 Aug 2025 at 14:15, Andrey Borodin <[email protected]> wrote:
> > On 29 Aug 2025, at 13:39, Andrey Borodin <[email protected]> wrote:
> >
> > I think to establish baseline for locking correctness we are going to start from writing index scan tests, that fail with proposed merge patch and pass on current HEAD. I want to observe that forward scan is showing duplicates and backward scan misses tuples.
>
> Well, that was unexpectedly easy. See patch 0001. It brings a test where we create sparse tree, and injection point that will wait on a scan stepping into some middle leaf page.
> Then the test invokes vacuum. There are ~35 leaf pages, most of them will be merged into just a few pages.
> As expected, both scans produce incorrect results.
> t/008_btree_merge_scan_correctness.pl .. 1/?
> #   Failed test 'Forward scan returns correct count'
> #   at t/008_btree_merge_scan_correctness.pl line 132.
> #          got: '364'
> #     expected: '250'
>
> #   Failed test 'Backward scan returns correct count'
> #   at t/008_btree_merge_scan_correctness.pl line 133.
> #          got: '142'
> #     expected: '250'
> # Looks like you failed 2 tests of 2.
>
>
> > From that we will try to design locking that does not affect performance significantly, but allows to merge pages. Perhaps, we can design a way to switch new index scans to "safe mode" during index vacuum and waiting for existing scans to complete.
>
> What if we just abort a scan, that stepped on the page where tuples were moved out?

I don't think that's viable nor valid. We can't let vacuum processes
abort active scans; it'd go against the principles of vacuum being a
background process; and it'd be a recipe for disaster when it triggers
in catalog scans. It'd probably also fail to detect the case where a
non-backward index scan's just-accessed data was merged to the right,
onto the page that the scan will access next (or, a backward scan's
just-accessed data merged to the left).

I think the issue isn't _just_ what to do when we detect that a page
was merged (it's relatively easy to keep track of what data was
present on a merged-now-empty page), but rather how we detect that
pages got merged, and how we then work to return to a valid scan
position.
Preferably, we'd have a way that guarantees that a scan can not fail
to notice that a keyspace with live tuples was concurrently merged
onto a page, what keyspace this was, and whether that change was
relevant to the currently active index scan; all whilst still obeying
the other nbtree invariants that scans currently rely on.

> I've prototype this approach, please see patch 0002. Maybe in future we will improve locking protocol if we will observe high error rates.
> Unfortunately, this approach leads to default mergefactor 0 instead of 5%.
>
> What do you think? Should we add this to CF or the idea is too wild for a review?

Aborting queries to be able to merge pages is not a viable approach;
-1 on that. It would make running VACUUM concurrently with other
workloads unsafe, and that's not acceptable.


Kind regards,

Matthias van de Meent
Databricks (https://databricks.com)





view thread (13+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: [WiP] B-tree page merge during vacuum to reduce index bloat
  In-Reply-To: <CAEze2WizYvqCerpb4F_YzW6n=t23PaDgkeCkp0kvWJ=C=Vhi4A@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox