Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w0SPN-001vFL-1C for pgsql-hackers@arkaria.postgresql.org; Wed, 11 Mar 2026 22:52:05 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w0SPL-00CRKb-1U for pgsql-hackers@arkaria.postgresql.org; Wed, 11 Mar 2026 22:52:03 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w0SPL-00CRKR-01 for pgsql-hackers@lists.postgresql.org; Wed, 11 Mar 2026 22:52:03 +0000 Received: from mail-ot1-x329.google.com ([2607:f8b0:4864:20::329]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1w0SPJ-00000001gLA-0UeG for pgsql-hackers@postgresql.org; Wed, 11 Mar 2026 22:52:02 +0000 Received: by mail-ot1-x329.google.com with SMTP id 46e09a7af769-7d74c1157a4so512936a34.1 for ; Wed, 11 Mar 2026 15:52:01 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1773269520; cv=none; d=google.com; s=arc-20240605; b=RX1xow+PR8kjhz3gMRTq4Dkhysqtdavvw6bJ2ST36jUOSjSsiG8sCETPd4s7CBJi9J Uc+liQuixhCjLd/sMz1IWY842xolMyL1YWpPQNVX/8kYv8edmJ0cB4uWJJFCoUNd6K8t ILzNXs1iwvjBemO+LArr5b1QJnqBOw4Q2UI8iFt6CdyDEFwbJgfLRR0/flPA8OCyDQEU 8dqMhNJlTurXVi33LK8ciKQYX3m1X0sx11qTBZURZqqJeG+ZlaKfzmXvUZ9pITfZHQEN S3iZnOITR9A2RG5IPcPz31WIWrFfeggaeU0D9tE0j26Bulmv8gfd2HKON4JBAkL/ikHR OM8g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=rdzue9XGqzFNCcMaf78Isq2n228/qGBm+ZGR/4pJdN0=; fh=Mmu5MPAeSdenancbp07q8OgBw0nBncr8Dkq8jKrlu+k=; b=T3asE5GDqSvi3Vy3J67Zx1GyPstoAgIlSY1aKXLiUiYGP5RCB7f3bJIJ7SEn8YEpKh mknQ1uLEMi47azwGfxCyIZE6f7p//o3L6N6NJJMjdVy4jhP08n/A4dkQZlYQU8982lLG HrMmPct9p9BYoBxFwkgyPw6pVen5x4mauQcBtdBwaRecmojIQE2rxp6W6wwLWwzjsPr3 td4GXYgVwQSovKysRZm9xpTIT3T+eDC6Gun6d/Ta5in88y5vDqQjMVRW/JVAP/nXK6jl Og6lnuFcbtiZeuRTjBnc/HCGU3zZ96gE3e037HeHWCN3+6ZCuhNkFi/a+QEtCGTixYck Z4uA==; darn=postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1773269520; x=1773874320; darn=postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=rdzue9XGqzFNCcMaf78Isq2n228/qGBm+ZGR/4pJdN0=; b=BeGfeJN7eDC3ySQiNfgkhdWbb6I1OYagqj4cqOHxst7w5HdmUh+tQQu7uFkgSUWMXR 0yZnRBh3Q2/j4tfJanTP55gAFSjaUv7AFBimnkDmjbMtgr2h17AWP0Rd5hdRMUd0wvQm iwm2t4KlCI5TOPKkb28Yw1bG+K6JopLIT7hXjFnVSSnQ8Vr1U3niK0wx+WuSJuLvc0XV 45dAokERb4AvvimEXyp/vFbv8CUu8SfCsi27fbr9SIk4/joij206QZ0vN6GoVQuKKmYZ tPRdDM9HNW6v4HubzB+6X4ihJCxSerDDZSCT0s6SG72q4Nh3kwd0QEnjtyPdzfx5d1dK 2wbA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1773269520; x=1773874320; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=rdzue9XGqzFNCcMaf78Isq2n228/qGBm+ZGR/4pJdN0=; b=JVT3m6W6InGtMF9OM2WUU2vUI6+wHLA9bJzguxFVZDxJ+/uNCKjmpRZ4OxUxYsi/JW 5FpZurdIOBq7MFC5ZUzEbkSwQSqnC3kRwBgnVhEGgl+r1dQvBROCtYvby9sJeCFh8z9z yk+fYD6oJc+ELd+aeb/QIzWCV7Mon5TLcVBho0aR/IVhBYweUIMzyrb88oNPzvE6rHgZ uQ9pXzVgsE/3mVJeQe5xWzga4MOlc1KsNfRUmoVeRmz4GLFIMcuhdOwgNUj8mnYpbCXM g4YiRXshr4pLHc00TmZSxlFQ4hOpadJY3M3OZViMgBnzDNvG25V+ZHjzkvXBsJSM5Nl1 9p/A== X-Forwarded-Encrypted: i=1; AJvYcCWLuASXH3y3qbUy9Oog33KKKr/13Wi2jqa7/YdxK6sf/4XNSe6uJtnIypZntmz6DZeOALgi1xZVSEyVkDVL@postgresql.org X-Gm-Message-State: AOJu0YxkP0YZGh5WcT4jdkgXyEqZ6dbFmSe1iVPNycBlOydL6Qepfh1m O0bUYf/lyhEPCZkW2UPX+jM3EpO7uPEBRHTTiIyG9ABbkN8PiwcikjoXSUiGCJzzrPZ8xQRTA4m qGbMknSNima47HljGLGBWiqfruJdtnSNBgackMng= X-Gm-Gg: ATEYQzxkZWFFU5X6EEjTswBTuQji8w2ICIMSYz+r5dA3Y1/iWCyvII7EDFk3Y3yUhAe ZIHzRTYJEZ9S9axUYJyY2axEGM0HD97VY4U5wSLFnl5kdZu3XAwhHQgLvx7/6T9OyJZrGOrb//c ci8E+zYv2452hDuj9fxUVwS3poWMOqcoVmd936hvGebtaEPKbkyMwJ9Gt5stGJuP31nQdHOF7Gm bxgXQquIBdZaZTCeG88P5FxiROIa+57AP25hY6WFVvOQP4tGYO7ZjOEA8nK8RyGWlnUFjrFqcqq sTpMxSkDB7oO6M4f60MDdRTclRc6EGnxJlbpOES3+OSg4Pi4E6H39OxjSmPh7JM6wTUSuVgc3sy hai0hgmDuwmhNaDpKqpdowh8M8jDvQ01Tg3z+ZXM= X-Received: by 2002:a4a:e909:0:b0:67b:c368:1369 with SMTP id 006d021491bc7-67bc8a63d2fmr2755653eaf.67.1773269519836; Wed, 11 Mar 2026 15:51:59 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Alexander Korotkov Date: Thu, 12 Mar 2026 00:51:47 +0200 X-Gm-Features: AaiRm51YB6gaHclQIwfXE3rAPHI-fopVNbiirc3e1Dah-APTzTfeEQCqoFvcfGQ Message-ID: Subject: Re: Odd code around ginScanToDelete To: Xuneng Zhou Cc: Pavel Borisov , Andres Freund , pgsql-hackers@postgresql.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Tue, Mar 10, 2026 at 11:19=E2=80=AFAM Xuneng Zhou = wrote: > On Fri, Mar 6, 2026 at 10:45=E2=80=AFPM Pavel Borisov wrote: > > > > Hi, Andres and Alexander! > > > > On Sat, 21 Feb 2026 at 13:55, Alexander Korotkov = wrote: > > > > > > On Wed, Feb 4, 2026 at 12:32=E2=80=AFAM Alexander Korotkov wrote: > > > > > > > > On Tue, Feb 3, 2026 at 6:26=E2=80=AFPM Andres Freund wrote: > > > > > > > > > > While looking at converting more places to UnlockReleaseBuffer(),= in the > > > > > course of making UnlockReleaseBuffer() faster than the two separa= te > > > > > operations, I found this code: > > > > > > > > > > static bool > > > > > ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool isRo= ot, > > > > > DataPageDeleteStack *parent, Offs= etNumber myoff) > > > > > ... > > > > > > > > > > if (!meDelete) > > > > > { > > > > > if (BufferIsValid(me->leftBuffer)) > > > > > UnlockReleaseBuffer(me->leftBuffer); > > > > > me->leftBuffer =3D buffer; > > > > > } > > > > > else > > > > > { > > > > > if (!isRoot) > > > > > LockBuffer(buffer, GIN_UNLOCK); > > > > > > > > > > ReleaseBuffer(buffer); > > > > > } > > > > > > > > > > if (isRoot) > > > > > ReleaseBuffer(buffer); > > > > > > > > > > > > > > > Which sure looks like it'd release buffer twice if isRoot is set?= I guess > > > > > that's not reachable, because presumably the root page will alway= s go down the > > > > > !meDelete path. But it sure made me wonder if there's a hard to r= each bug. > > > > > > > > Yes, it's not possible to have meDelete set for root, because > > > > me->leftBuffer is always InvalidBuffer for the root. So the branch > > > > handling meDelete case should better do Assert(!isRoot). > > > > > This code was introduced in > > > > > commit e14641197a5 > > > > > Author: Alexander Korotkov > > > > > Date: 2019-11-19 23:07:36 +0300 > > > > > > > > > > Fix deadlock between ginDeletePage() and ginStepRight() > > > > > > > > > > I didn't trace it further to see if it existed before that in som= e fashion. > > > > > > > > Yes. I think generally this area needs to be reworked to become mo= re > > > > clear, and have vast more comments. It was wrong from my side tryi= ng > > > > to fix bugs there without reworking it into something more > > > > appropriate. I'm planning to put work on this during this week. > > > > > > > > > There's another oddity here: ginScanToDelete() requires that the = root page has > > > > > been locked by the caller already, but will afaict re-read the ro= ot page? But > > > > > then have code to avoid locking it again, because that would not = have worked? > > > > > Seems odd. > > > > > > > > > > > > It seems a bit odd for me that caller already have locked buffer, b= ut > > > > passes BlockNumber making us re-read the buffer. But I'm not sure > > > > that's the same as your point. Could you, please, elaborate more o= n > > > > this? > > > > > > Here is the refactoring patch. Sorry for the delay. > > > > Hi, Andres and Alexander! > > > > I've looked into the patch v1. > > Overall, it looks good to me. > > The refactor LGTM in general. The buffer-ownership rewrite looks > cleaner and safer overall. > > > Some thoughts: > > > > Is it worth/possible in recursive calls of ginScanToDelete() to free > > allocated myStackItem->child after processing all children of the > > current level, when they are not needed anymore? > > Previously to this patch, palloc-ed "me" variable also was't freed at > > recursion levels. > > I think freeing myStackItem->child inside recursive calls might not be > worthwhile here. That node is intentionally reused for subsequent > siblings at the same depth, and it carries state (leftBuffer) that can > still be needed until the level is fully processed. > Freeing/reallocating it per subtree would add churn and make the > lifetime rules harder to reason about without meaningful memory > savings (the number of nodes is bounded by tree depth, not number of > pages). We currently free the chain once after ginScanToDelete() > returns in ginVacuumPostingTree(), which matches the natural lifetime > boundary > > > Could limiting the maximum recursion level be useful? > > Posting-tree depth is naturally small; a hard cap seems to add failure > risk with little practical benefit. > > > In the comment to myStackItem before ginScanToDelete(), it might be > > worth adding that after processing all pages on the current level, > > myStackItem is not needed anymore. > > > > > > Yes, it's not possible to have meDelete set for root, because > > > > me->leftBuffer is always InvalidBuffer for the root. So the branch > > > > handling meDelete case should better do Assert(!isRoot). > > Looks like this additional Assert is not in patch v1. > > > > In the root call of ginScanToDelete(gvs, &root); we can add Assert > > checking that its return result is false: > > - ginScanToDelete(gvs, &root); > > + deleted =3D ginScanToDelete(gvs, &root); > > +. Assert(!deleted); /* Root page is never deleted */ > > + 1, this is a good invariant check and improves readability > > One minor nit for the comment: > > The DataPageDeleteStack.buffer field comment says "valid only while > recursing into children" > this is true for internal pages, but for leaf pages the buffer is > valid until the pageWasDeleted / leftBuffer decision. The validity > window is actually "from when the caller sets it until the > post-recursion cleanup." Thank you for catching this. I decided to remove this statement from v2. It's hard to explain the life cycle of the buffer clearly in one sentence. On the other hand, it's explained in the comments of ginScanPostingTreeToDelete(). ------ Regards, Alexander Korotkov Supabase