Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w0bzh-0023Uk-1a for pgsql-hackers@arkaria.postgresql.org; Thu, 12 Mar 2026 09:06:13 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w0bzf-00Dven-11 for pgsql-hackers@arkaria.postgresql.org; Thu, 12 Mar 2026 09:06:11 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w0bze-00Dvee-3D for pgsql-hackers@lists.postgresql.org; Thu, 12 Mar 2026 09:06:11 +0000 Received: from mail-ot1-x334.google.com ([2607:f8b0:4864:20::334]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1w0bza-00000002H6S-07s2 for pgsql-hackers@postgresql.org; Thu, 12 Mar 2026 09:06:11 +0000 Received: by mail-ot1-x334.google.com with SMTP id 46e09a7af769-7d73ccee442so856731a34.1 for ; Thu, 12 Mar 2026 02:06:06 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1773306364; cv=none; d=google.com; s=arc-20240605; b=MKzdcvj/AgLYY+rHvdX5KzE6+w7G0NHAcHmRwhmO8dzk4g3sRiU36v0bNJxLWly1tM bz/RxKiTMighMhhiyIgOhZZLjzlAnGe33V0tZOlXAhusXFo9rhhlVufKfQMeVtEh5A7v iN5tGgrctI/fm3ln8h9lq78wIDkiTtrLRVppJeHwNWRtr47P7k0lJxoE+xr+X06NUx4x gIoGDOHHq7dBDfWu1CbkuFCARtGdmF28+qyJzF7XIZ5qtiie3IW0/cD6pfA2fpgUbghL quFRX2t8AanvRrCDadctbhQiqlclHuhWIbNCKnjUTkpYERTLSWvQXUJuKqr/wbm++TZR 8Fqw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=PlHazSf5tImhVYvTH4T+sUMowoTsjXuGQOyXgYfetdM=; fh=ozGdSIcZ9GoQifmEggpaoJrI+H6WDf/VrjCSOOUBMR4=; b=WQC369OJ88ujRPhev2Qz2//GzLXui35PxqLSVMZEF8BuIxky9ks1nwL/sCIYQyM2Ym 7T38Rtg6TGD6drMYEwf/oCtvEzfB9MMj/BASA40fMiYNRgDyME1v9NdQseG1WACHNjkU zJnriupAuFhOSjAX3iD4Spep3IURMcxVeTlm+NNFPdZLAd37ETz21sEKvv1wN1CrmalB k1QCV7udPfpNjglrUv7zUmfIB+BmZl9CD+hyW1l+Fs5A+CqCTTPVkKsIcRRSPnRqFpbm X6KW0WNOC0PXszCwk9Jw0zu1+w+pTdYGM8/6TkgpzcU7pxltM5kC3oe06X/+hBcqL+ko gXVQ==; darn=postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1773306364; x=1773911164; darn=postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=PlHazSf5tImhVYvTH4T+sUMowoTsjXuGQOyXgYfetdM=; b=Fo37NrCAXCEc5OPtgVhl6y6pFTvBe/2JUJREzQ4nQAtM8CglknJUKnpqE3DwfWrIdl UxJx2p7MSMWuw6L5wCXjeI7THjbRSjO7xcTpUM/3L77vHrA0gcSqg8pQNdLLd8fcGYQs 2gd9bmgPPiWrZBKdZIxeEOlFtv7sEOxswZ9u/0ExMjK9qijck0UUDgZhI3zey7RDYdJ7 oYtV5LOQj3yGocB6PfXn1S5dl6oWWFLMF0atlBpe0N0nc8Q03fS2RrQgWIWCrDOMyye4 dMH2ztqx/+eC9VYF2SoLYQ7vGvWkrSz6Hw+4mQ7PRie8UN1qF88IQEYva40hu1Mex4ge NizQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1773306364; x=1773911164; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=PlHazSf5tImhVYvTH4T+sUMowoTsjXuGQOyXgYfetdM=; b=YRSRjSbmCH/g0bfm2v73jTpZK+Ki8ZFCK9IY6926ACBvcJa0LVBCaKb3pR5EnbT7Zl zqgEsxjh4933kmKApkz7jclSyMtbLJG0Pqo3qjYolLGhCaSVebfa4ZPHqK2CAH+C0QdL Go24VgNvlPcyXZkCe0LEPqAU50aBs9JuUJ0MbVMGUzb46fQ2l1Bw/IWZ5531Swz0QljI eYik9AfKXj0vp47gJjW92tUqyx4WJi5Ge8h6JE+Vc3yDD0JhWIEgkjZfCBTOhXv4BYs5 2Ph9RAmBJhq5N3L5vuhSKo6T8xC/k7skeJPDrd4JyvU9AKZ9xI/B8HwH8uiHHCbVMkrs 7qqw== X-Forwarded-Encrypted: i=1; AJvYcCUUbnRrMbWksO1wB+nAxc7nW0DZciu5uzsx9Kl6YxIfGKu2H+DMXsLP5U2xwmetnuBBFsnjuzp4QepxIQHp@postgresql.org X-Gm-Message-State: AOJu0Yxts+gg6u0CuYfyx5QKzN+XvYs+2YaZlVxul/53xrKFEn90/FgM xXhQmxyQLhDAbyUmcIUbLgHQ+6iUUtdC2JlAdlmMqTCpxbYTSZ1Tk6tud5p6FSP+/E66r60/WjP x+ol++vLHu+tShsRix+VoyFepBPhRLnc= X-Gm-Gg: ATEYQzwYyFfsZf4/pMu33hSJByCRoOktK4YmTxCanKGT/l+mXtzao1SQPCA2PKagLiW wZj+IadaBhETRNkNBbEgAdeHSkNppromQQ9EFwJhSFn3Oj9BT+Sbdat+X/RPyWbFnIL6HJRKVvv eBU7PmxkPrZHdge/oFZTA6onppd29aHmX1t9SY3b0EEXr1QG2BA+D4vNZ1P4MGXQ9MSSkJsnaWC 4oMdApR7w1HH80MZ1iBtLWANrFJ3J5Ce5Sr2JdbDiOy2fvYIl96ypL88BA5jmV0zQ77NXk94fAh 7Bqvjovx/UR9LzqX1z2vxhNiiN8aF/N7xvhqzzP/YhbrL7K6CjPueL4ZK53K8rf6MC1Ue55PtBw lyvhVVNM= X-Received: by 2002:a05:6820:3086:b0:67b:b45c:bd71 with SMTP id 006d021491bc7-67bc8a8a653mr3870927eaf.73.1773306364150; Thu, 12 Mar 2026 02:06:04 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Alexander Korotkov Date: Thu, 12 Mar 2026 11:05:51 +0200 X-Gm-Features: AaiRm50hnizGBTfZZwKgBcrvpyQsr7zSLAHDBPvDRV4RENZscXbaY7_l3IfVplI Message-ID: Subject: Re: Odd code around ginScanToDelete To: Pavel Borisov Cc: Xuneng Zhou , Andres Freund , pgsql-hackers@postgresql.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hi, Pavel! On Thu, Mar 12, 2026 at 10:41=E2=80=AFAM Pavel Borisov wrote: > > On Thu, 12 Mar 2026 at 02:52, Alexander Korotkov w= rote: > > > > On Tue, Mar 10, 2026 at 11:19=E2=80=AFAM Xuneng Zhou wrote: > > > On Fri, Mar 6, 2026 at 10:45=E2=80=AFPM Pavel Borisov wrote: > > > > > > > > Hi, Andres and Alexander! > > > > > > > > On Sat, 21 Feb 2026 at 13:55, Alexander Korotkov wrote: > > > > > > > > > > On Wed, Feb 4, 2026 at 12:32=E2=80=AFAM Alexander Korotkov wrote: > > > > > > > > > > > > On Tue, Feb 3, 2026 at 6:26=E2=80=AFPM Andres Freund wrote: > > > > > > > > > > > > > > While looking at converting more places to UnlockReleaseBuffe= r(), in the > > > > > > > course of making UnlockReleaseBuffer() faster than the two se= parate > > > > > > > operations, I found this code: > > > > > > > > > > > > > > static bool > > > > > > > ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool = isRoot, > > > > > > > DataPageDeleteStack *parent, = OffsetNumber myoff) > > > > > > > ... > > > > > > > > > > > > > > if (!meDelete) > > > > > > > { > > > > > > > if (BufferIsValid(me->leftBuffer)) > > > > > > > UnlockReleaseBuffer(me->leftBuffer); > > > > > > > me->leftBuffer =3D buffer; > > > > > > > } > > > > > > > else > > > > > > > { > > > > > > > if (!isRoot) > > > > > > > LockBuffer(buffer, GIN_UNLOCK); > > > > > > > > > > > > > > ReleaseBuffer(buffer); > > > > > > > } > > > > > > > > > > > > > > if (isRoot) > > > > > > > ReleaseBuffer(buffer); > > > > > > > > > > > > > > > > > > > > > Which sure looks like it'd release buffer twice if isRoot is = set? I guess > > > > > > > that's not reachable, because presumably the root page will a= lways go down the > > > > > > > !meDelete path. But it sure made me wonder if there's a hard = to reach bug. > > > > > > > > > > > > Yes, it's not possible to have meDelete set for root, because > > > > > > me->leftBuffer is always InvalidBuffer for the root. So the br= anch > > > > > > handling meDelete case should better do Assert(!isRoot). > > > > > > > This code was introduced in > > > > > > > commit e14641197a5 > > > > > > > Author: Alexander Korotkov > > > > > > > Date: 2019-11-19 23:07:36 +0300 > > > > > > > > > > > > > > Fix deadlock between ginDeletePage() and ginStepRight() > > > > > > > > > > > > > > I didn't trace it further to see if it existed before that in= some fashion. > > > > > > > > > > > > Yes. I think generally this area needs to be reworked to becom= e more > > > > > > clear, and have vast more comments. It was wrong from my side = trying > > > > > > to fix bugs there without reworking it into something more > > > > > > appropriate. I'm planning to put work on this during this week= . > > > > > > > > > > > > > There's another oddity here: ginScanToDelete() requires that = the root page has > > > > > > > been locked by the caller already, but will afaict re-read th= e root page? But > > > > > > > then have code to avoid locking it again, because that would = not have worked? > > > > > > > Seems odd. > > > > > > > > > > > > > > > > > > It seems a bit odd for me that caller already have locked buffe= r, but > > > > > > passes BlockNumber making us re-read the buffer. But I'm not s= ure > > > > > > that's the same as your point. Could you, please, elaborate mo= re on > > > > > > this? > > > > > > > > > > Here is the refactoring patch. Sorry for the delay. > > > > > > > > Hi, Andres and Alexander! > > > > > > > > I've looked into the patch v1. > > > > Overall, it looks good to me. > > > > > > The refactor LGTM in general. The buffer-ownership rewrite looks > > > cleaner and safer overall. > > > > > > > Some thoughts: > > > > > > > > Is it worth/possible in recursive calls of ginScanToDelete() to fre= e > > > > allocated myStackItem->child after processing all children of the > > > > current level, when they are not needed anymore? > > > > Previously to this patch, palloc-ed "me" variable also was't freed = at > > > > recursion levels. > > > > > > I think freeing myStackItem->child inside recursive calls might not b= e > > > worthwhile here. That node is intentionally reused for subsequent > > > siblings at the same depth, and it carries state (leftBuffer) that ca= n > > > still be needed until the level is fully processed. > > > Freeing/reallocating it per subtree would add churn and make the > > > lifetime rules harder to reason about without meaningful memory > > > savings (the number of nodes is bounded by tree depth, not number of > > > pages). We currently free the chain once after ginScanToDelete() > > > returns in ginVacuumPostingTree(), which matches the natural lifetime > > > boundary > > > > > > > Could limiting the maximum recursion level be useful? > > > > > > Posting-tree depth is naturally small; a hard cap seems to add failur= e > > > risk with little practical benefit. > > > > > > > In the comment to myStackItem before ginScanToDelete(), it might be > > > > worth adding that after processing all pages on the current level, > > > > myStackItem is not needed anymore. > > > > > > > > > > Yes, it's not possible to have meDelete set for root, because > > > > > > me->leftBuffer is always InvalidBuffer for the root. So the br= anch > > > > > > handling meDelete case should better do Assert(!isRoot). > > > > Looks like this additional Assert is not in patch v1. > > > > > > > > In the root call of ginScanToDelete(gvs, &root); we can add Assert > > > > checking that its return result is false: > > > > - ginScanToDelete(gvs, &root); > > > > + deleted =3D ginScanToDelete(gvs, &root); > > > > +. Assert(!deleted); /* Root page is never deleted */ > > > > > > + 1, this is a good invariant check and improves readability > > > > > > One minor nit for the comment: > > > > > > The DataPageDeleteStack.buffer field comment says "valid only while > > > recursing into children" > > > this is true for internal pages, but for leaf pages the buffer is > > > valid until the pageWasDeleted / leftBuffer decision. The validity > > > window is actually "from when the caller sets it until the > > > post-recursion cleanup." > > > > Thank you for catching this. I decided to remove this statement from > > v2. It's hard to explain the life cycle of the buffer clearly in one > > sentence. On the other hand, it's explained in the comments of > > ginScanPostingTreeToDelete(). > > Patch v2 looks good to me. I also agree with Xuneng that this > refactoring improved the logic to make it look clearer. > Thank you for the explanation of buffers lifetime! Thank you for your feedback. I'll push the patch if no objections. ------ Regards, Alexander Korotkov Supabase