public inbox for [email protected]
help / color / mirror / Atom feedFrom: Alexander Korotkov <[email protected]>
To: Pavel Borisov <[email protected]>
Cc: Xuneng Zhou <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: [email protected]
Subject: Re: Odd code around ginScanToDelete
Date: Thu, 12 Mar 2026 11:05:51 +0200
Message-ID: <CAPpHfdsCgY2a3DeFZttTAj5cmk7K3L7y3PH9dMZTTPVqb8Picg@mail.gmail.com> (raw)
In-Reply-To: <CALT9ZEE6Ragvz4Gnjef4Lq30dL-_FDf1bOGoYooqVT-jm01FrA@mail.gmail.com>
References: <utrlxij43fbguzw4kldte2spc4btoldizutcqyrfakqnbrp3ir@ph3sphpj4asz>
<CAPpHfdt5BHkpvFnwg_RiMrCdD8W5FhOoahqY2Td-y3kb45UpZw@mail.gmail.com>
<CAPpHfduJQPTrXpJr=-mHtsJH8+1v_X389K6trajPzhnA37C9nA@mail.gmail.com>
<CALT9ZEGioDdvkx1rmjVzOW5nyxsxjX16FVifYY4b7MVFuc_80g@mail.gmail.com>
<CABPTF7X2HXXiUk1+Vd_pzPhiYOHw0y=OnDXW87G7hfPL0577hw@mail.gmail.com>
<CAPpHfdszQKx+OmxPtU+th3SLiJbf7KbhBbHnTZ1yE6ohQ3-wDA@mail.gmail.com>
<CALT9ZEE6Ragvz4Gnjef4Lq30dL-_FDf1bOGoYooqVT-jm01FrA@mail.gmail.com>
Hi, Pavel!
On Thu, Mar 12, 2026 at 10:41 AM Pavel Borisov <[email protected]> wrote:
>
> On Thu, 12 Mar 2026 at 02:52, Alexander Korotkov <[email protected]> wrote:
> >
> > On Tue, Mar 10, 2026 at 11:19 AM Xuneng Zhou <[email protected]> wrote:
> > > On Fri, Mar 6, 2026 at 10:45 PM Pavel Borisov <[email protected]> wrote:
> > > >
> > > > Hi, Andres and Alexander!
> > > >
> > > > On Sat, 21 Feb 2026 at 13:55, Alexander Korotkov <[email protected]> wrote:
> > > > >
> > > > > On Wed, Feb 4, 2026 at 12:32 AM Alexander Korotkov <[email protected]> wrote:
> > > > > >
> > > > > > On Tue, Feb 3, 2026 at 6:26 PM Andres Freund <[email protected]> wrote:
> > > > > > >
> > > > > > > While looking at converting more places to UnlockReleaseBuffer(), in the
> > > > > > > course of making UnlockReleaseBuffer() faster than the two separate
> > > > > > > operations, I found this code:
> > > > > > >
> > > > > > > static bool
> > > > > > > ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
> > > > > > > DataPageDeleteStack *parent, OffsetNumber myoff)
> > > > > > > ...
> > > > > > >
> > > > > > > if (!meDelete)
> > > > > > > {
> > > > > > > if (BufferIsValid(me->leftBuffer))
> > > > > > > UnlockReleaseBuffer(me->leftBuffer);
> > > > > > > me->leftBuffer = buffer;
> > > > > > > }
> > > > > > > else
> > > > > > > {
> > > > > > > if (!isRoot)
> > > > > > > LockBuffer(buffer, GIN_UNLOCK);
> > > > > > >
> > > > > > > ReleaseBuffer(buffer);
> > > > > > > }
> > > > > > >
> > > > > > > if (isRoot)
> > > > > > > ReleaseBuffer(buffer);
> > > > > > >
> > > > > > >
> > > > > > > Which sure looks like it'd release buffer twice if isRoot is set? I guess
> > > > > > > that's not reachable, because presumably the root page will always go down the
> > > > > > > !meDelete path. But it sure made me wonder if there's a hard to reach bug.
> > > > > >
> > > > > > Yes, it's not possible to have meDelete set for root, because
> > > > > > me->leftBuffer is always InvalidBuffer for the root. So the branch
> > > > > > handling meDelete case should better do Assert(!isRoot).
> > > > > > > This code was introduced in
> > > > > > > commit e14641197a5
> > > > > > > Author: Alexander Korotkov <[email protected]>
> > > > > > > Date: 2019-11-19 23:07:36 +0300
> > > > > > >
> > > > > > > Fix deadlock between ginDeletePage() and ginStepRight()
> > > > > > >
> > > > > > > I didn't trace it further to see if it existed before that in some fashion.
> > > > > >
> > > > > > Yes. I think generally this area needs to be reworked to become more
> > > > > > clear, and have vast more comments. It was wrong from my side trying
> > > > > > to fix bugs there without reworking it into something more
> > > > > > appropriate. I'm planning to put work on this during this week.
> > > > > >
> > > > > > > There's another oddity here: ginScanToDelete() requires that the root page has
> > > > > > > been locked by the caller already, but will afaict re-read the root page? But
> > > > > > > then have code to avoid locking it again, because that would not have worked?
> > > > > > > Seems odd.
> > > > > >
> > > > > >
> > > > > > It seems a bit odd for me that caller already have locked buffer, but
> > > > > > passes BlockNumber making us re-read the buffer. But I'm not sure
> > > > > > that's the same as your point. Could you, please, elaborate more on
> > > > > > this?
> > > > >
> > > > > Here is the refactoring patch. Sorry for the delay.
> > > >
> > > > Hi, Andres and Alexander!
> > > >
> > > > I've looked into the patch v1.
> > > > Overall, it looks good to me.
> > >
> > > The refactor LGTM in general. The buffer-ownership rewrite looks
> > > cleaner and safer overall.
> > >
> > > > Some thoughts:
> > > >
> > > > Is it worth/possible in recursive calls of ginScanToDelete() to free
> > > > allocated myStackItem->child after processing all children of the
> > > > current level, when they are not needed anymore?
> > > > Previously to this patch, palloc-ed "me" variable also was't freed at
> > > > recursion levels.
> > >
> > > I think freeing myStackItem->child inside recursive calls might not be
> > > worthwhile here. That node is intentionally reused for subsequent
> > > siblings at the same depth, and it carries state (leftBuffer) that can
> > > still be needed until the level is fully processed.
> > > Freeing/reallocating it per subtree would add churn and make the
> > > lifetime rules harder to reason about without meaningful memory
> > > savings (the number of nodes is bounded by tree depth, not number of
> > > pages). We currently free the chain once after ginScanToDelete()
> > > returns in ginVacuumPostingTree(), which matches the natural lifetime
> > > boundary
> > >
> > > > Could limiting the maximum recursion level be useful?
> > >
> > > Posting-tree depth is naturally small; a hard cap seems to add failure
> > > risk with little practical benefit.
> > >
> > > > In the comment to myStackItem before ginScanToDelete(), it might be
> > > > worth adding that after processing all pages on the current level,
> > > > myStackItem is not needed anymore.
> > > >
> > > > > > Yes, it's not possible to have meDelete set for root, because
> > > > > > me->leftBuffer is always InvalidBuffer for the root. So the branch
> > > > > > handling meDelete case should better do Assert(!isRoot).
> > > > Looks like this additional Assert is not in patch v1.
> > > >
> > > > In the root call of ginScanToDelete(gvs, &root); we can add Assert
> > > > checking that its return result is false:
> > > > - ginScanToDelete(gvs, &root);
> > > > + deleted = ginScanToDelete(gvs, &root);
> > > > +. Assert(!deleted); /* Root page is never deleted */
> > >
> > > + 1, this is a good invariant check and improves readability
> > >
> > > One minor nit for the comment:
> > >
> > > The DataPageDeleteStack.buffer field comment says "valid only while
> > > recursing into children"
> > > this is true for internal pages, but for leaf pages the buffer is
> > > valid until the pageWasDeleted / leftBuffer decision. The validity
> > > window is actually "from when the caller sets it until the
> > > post-recursion cleanup."
> >
> > Thank you for catching this. I decided to remove this statement from
> > v2. It's hard to explain the life cycle of the buffer clearly in one
> > sentence. On the other hand, it's explained in the comments of
> > ginScanPostingTreeToDelete().
>
> Patch v2 looks good to me. I also agree with Xuneng that this
> refactoring improved the logic to make it look clearer.
> Thank you for the explanation of buffers lifetime!
Thank you for your feedback. I'll push the patch if no objections.
------
Regards,
Alexander Korotkov
Supabase
view thread (13+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected]
Subject: Re: Odd code around ginScanToDelete
In-Reply-To: <CAPpHfdsCgY2a3DeFZttTAj5cmk7K3L7y3PH9dMZTTPVqb8Picg@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox