Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w0bcH-0023AW-2j for pgsql-hackers@arkaria.postgresql.org; Thu, 12 Mar 2026 08:42:02 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w0bcG-00DrpP-0z for pgsql-hackers@arkaria.postgresql.org; Thu, 12 Mar 2026 08:42:00 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w0bcF-00DrpG-2a for pgsql-hackers@lists.postgresql.org; Thu, 12 Mar 2026 08:42:00 +0000 Received: from mail-ed1-x52b.google.com ([2a00:1450:4864:20::52b]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1w0bcE-00000001k5i-0f58 for pgsql-hackers@postgresql.org; Thu, 12 Mar 2026 08:41:59 +0000 Received: by mail-ed1-x52b.google.com with SMTP id 4fb4d7f45d1cf-660a58841d4so860731a12.0 for ; Thu, 12 Mar 2026 01:41:58 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1773304917; cv=none; d=google.com; s=arc-20240605; b=gKBfd4WlKX4oas40MtqJQQWK9FgDemBmA90qKjRtFOpagT7+a5GkSK5QH1C+XFS4Qk hfcWP3ROhW6BlUcbOBz49vB/OQoyEvURhF1ZEDChAqP6pW313khcV3wOuNG3iPTwI6jB DdQ4xM9O0DwcW/Mch+fUDz/n1Hw8zswyA/+g8iRLFo4M5zEArpQLSt9BDcHdrbxzNyL4 iAEomciPHJ2lr6C/VoS8wed4wnO4w5sTq5hGuV+Way72kfTzXIssPH/Jx+lo0AzdNrtB J561A7ekKoV1cgeLvdDJXdEV3VtxDjcnOr+4/8JWXNEreaWjWw5Hwc6pd5n3mMOzmFTE 5F6Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=VTesBsOSu7pqw59TiR4tg4wLaoZsxip9oIYu6Jngs6A=; fh=z4vrAGE85rs5d0S5Z7zPalhDPUcDCFqn5EP1Qa7ynn4=; b=NpLceJrRyP/3CvX4evhtxqpnM6DBbgAeRkVzysS4lcCoEyxo1SlEKCmdruiSj97ID8 1k1GCHQlhW/u0vuPQKFwlV50ThFtkISBKo11iYVKoqu0XU+E6oSD6hQpXWBkUwnUzWt8 h2pqpnkzssmzLrSTJoZUuQAjKmyTMU63ZXvcpcJC3vOOhZZPsw8KYTwsollFQvkFJVYG T7MKHCyRiXE9paF4b8ZvVrbZjkMWsyliymbEvmxwxFCPJcPO3/1RJ29+iwNDNPslSUj6 SckenXsalG7cuCu1fp2Ez/WoT/wZg2/Hg7o7LMR7jbSLaU8MmQQ3vxBf4ZEKjAUabB45 nurg==; darn=postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1773304917; x=1773909717; darn=postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=VTesBsOSu7pqw59TiR4tg4wLaoZsxip9oIYu6Jngs6A=; b=NcmwJEMXS+Yc9o9kVcHeNYOru6IicJvDYBwRIZjA5YHOAbWXF5NEY8mJP2VrFDA86B Pe929LtNGQhZh0+HQrqW9OFtp58BCpZ0B7KLpqcTdn+TfzFOPc3tCM+80PAcuHRkIXkg btgCjMrf4xspBVoiEIPBWH7uejmlmpxzbOYapl0DNI9Ahf5utIzk6asqTsHfg96VrwxP 4u6964V6E5HCwNkLt/q/frs2Fq6io+tCA8xLKDOTVeDdJa2ZNvIqJzLnuAjlWxnoSVXh OCGfYlqf0hAQ5N39zT5sUk9WsMzljzpY055QYmBgV28yOAAH6iVXgKOrWzVHRLYW/mqd x4Dw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1773304917; x=1773909717; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=VTesBsOSu7pqw59TiR4tg4wLaoZsxip9oIYu6Jngs6A=; b=PunSRgx1q5jXC1FH4vX/ge2ajRviALC+5OYAdeilFiRdLD9HJRVN+GkFIPtXT0VkJr pclYOiIE5WP5h8klaJUBMgUi0GXVDdSfYUd7nELwb7m+ZMI/6AYghN0c/xZQ7XRY7dHk /OMS1CBOvAF8jE1wqKK+HudmEMSuFVzd4R2V4RtAFXZ2qQ29x+0P44SxM0yRPPX+GWdv 85hhm4kMylDH90OPmiHZMXjrApZiQrlMCznYZmCpEGhHiu5RQUP35RyQwrzXMf0n7kGQ GnhVKOM9spzONarARE+JjqUYH5Sld8Q0H+D8pPu8DXQAIEV+08/PX9f75zbR44CUQeLI sPmA== X-Forwarded-Encrypted: i=1; AJvYcCW9G6XFRvBi8nRaBrZFqTr6onQYfvCz/7+2R9THqdGUpzRC6nmMYvkpq+jQCkcBIkMOPtm+FfxzsMTJ1MKN@postgresql.org X-Gm-Message-State: AOJu0Yy/8cQ1H93/Kiv+AqHmL4EN5ppqMojVQqNx+B60FhGms0nzowVx 1psStcVGkHx6blIZlEbPNXZG5IZo30JrTzfdnuAh4QrHGgLHWfJBbvok3wRjARIoASEzDYCTMHX erIkv89SzgTlBCySvLJbjVtbDq7Ohce8= X-Gm-Gg: ATEYQzzq8KcAuSPV7S/XpkOoN4rrA/nBHQqtcerPeNshnL9hliw1Jg8duiAgzZP1jeN tmRv6Jw2xLGyRVeKwbCHKgpjRGqaUitYxcO+EE/FhEYq9lqhXdOzwQQgJbudsf0AQIZ6BjyQh6Q hrzp4rgH+svO/O0V3Y6j522GZ6Zzd11x9kzojTDA8N+Ju0X4Rks6DBLC86U1stPqpLhTx8O08Gk cjaHHcOhIxNksqxZR53JZduPguu3G636gRF9Hhkts6V5qdXst+pk6dkn2qwa7o+YjJkTAyQXkqY Cf+GynKa6JYeVM56U6VY X-Received: by 2002:a05:6402:5188:b0:660:a23d:4666 with SMTP id 4fb4d7f45d1cf-66319cc6d5amr3067601a12.18.1773304916413; Thu, 12 Mar 2026 01:41:56 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Pavel Borisov Date: Thu, 12 Mar 2026 12:41:44 +0400 X-Gm-Features: AaiRm514W1AB8WzRDIFI5CXQk5MF5-BaZ6ex8K6Mlu961OElLc0KGr2qa5uys_0 Message-ID: Subject: Re: Odd code around ginScanToDelete To: Alexander Korotkov Cc: Xuneng Zhou , Andres Freund , pgsql-hackers@postgresql.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Thu, 12 Mar 2026 at 02:52, Alexander Korotkov wro= te: > > On Tue, Mar 10, 2026 at 11:19=E2=80=AFAM Xuneng Zhou wrote: > > On Fri, Mar 6, 2026 at 10:45=E2=80=AFPM Pavel Borisov wrote: > > > > > > Hi, Andres and Alexander! > > > > > > On Sat, 21 Feb 2026 at 13:55, Alexander Korotkov wrote: > > > > > > > > On Wed, Feb 4, 2026 at 12:32=E2=80=AFAM Alexander Korotkov wrote: > > > > > > > > > > On Tue, Feb 3, 2026 at 6:26=E2=80=AFPM Andres Freund wrote: > > > > > > > > > > > > While looking at converting more places to UnlockReleaseBuffer(= ), in the > > > > > > course of making UnlockReleaseBuffer() faster than the two sepa= rate > > > > > > operations, I found this code: > > > > > > > > > > > > static bool > > > > > > ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool is= Root, > > > > > > DataPageDeleteStack *parent, Of= fsetNumber myoff) > > > > > > ... > > > > > > > > > > > > if (!meDelete) > > > > > > { > > > > > > if (BufferIsValid(me->leftBuffer)) > > > > > > UnlockReleaseBuffer(me->leftBuffer); > > > > > > me->leftBuffer =3D buffer; > > > > > > } > > > > > > else > > > > > > { > > > > > > if (!isRoot) > > > > > > LockBuffer(buffer, GIN_UNLOCK); > > > > > > > > > > > > ReleaseBuffer(buffer); > > > > > > } > > > > > > > > > > > > if (isRoot) > > > > > > ReleaseBuffer(buffer); > > > > > > > > > > > > > > > > > > Which sure looks like it'd release buffer twice if isRoot is se= t? I guess > > > > > > that's not reachable, because presumably the root page will alw= ays go down the > > > > > > !meDelete path. But it sure made me wonder if there's a hard to= reach bug. > > > > > > > > > > Yes, it's not possible to have meDelete set for root, because > > > > > me->leftBuffer is always InvalidBuffer for the root. So the bran= ch > > > > > handling meDelete case should better do Assert(!isRoot). > > > > > > This code was introduced in > > > > > > commit e14641197a5 > > > > > > Author: Alexander Korotkov > > > > > > Date: 2019-11-19 23:07:36 +0300 > > > > > > > > > > > > Fix deadlock between ginDeletePage() and ginStepRight() > > > > > > > > > > > > I didn't trace it further to see if it existed before that in s= ome fashion. > > > > > > > > > > Yes. I think generally this area needs to be reworked to become = more > > > > > clear, and have vast more comments. It was wrong from my side tr= ying > > > > > to fix bugs there without reworking it into something more > > > > > appropriate. I'm planning to put work on this during this week. > > > > > > > > > > > There's another oddity here: ginScanToDelete() requires that th= e root page has > > > > > > been locked by the caller already, but will afaict re-read the = root page? But > > > > > > then have code to avoid locking it again, because that would no= t have worked? > > > > > > Seems odd. > > > > > > > > > > > > > > > It seems a bit odd for me that caller already have locked buffer,= but > > > > > passes BlockNumber making us re-read the buffer. But I'm not sur= e > > > > > that's the same as your point. Could you, please, elaborate more= on > > > > > this? > > > > > > > > Here is the refactoring patch. Sorry for the delay. > > > > > > Hi, Andres and Alexander! > > > > > > I've looked into the patch v1. > > > Overall, it looks good to me. > > > > The refactor LGTM in general. The buffer-ownership rewrite looks > > cleaner and safer overall. > > > > > Some thoughts: > > > > > > Is it worth/possible in recursive calls of ginScanToDelete() to free > > > allocated myStackItem->child after processing all children of the > > > current level, when they are not needed anymore? > > > Previously to this patch, palloc-ed "me" variable also was't freed at > > > recursion levels. > > > > I think freeing myStackItem->child inside recursive calls might not be > > worthwhile here. That node is intentionally reused for subsequent > > siblings at the same depth, and it carries state (leftBuffer) that can > > still be needed until the level is fully processed. > > Freeing/reallocating it per subtree would add churn and make the > > lifetime rules harder to reason about without meaningful memory > > savings (the number of nodes is bounded by tree depth, not number of > > pages). We currently free the chain once after ginScanToDelete() > > returns in ginVacuumPostingTree(), which matches the natural lifetime > > boundary > > > > > Could limiting the maximum recursion level be useful? > > > > Posting-tree depth is naturally small; a hard cap seems to add failure > > risk with little practical benefit. > > > > > In the comment to myStackItem before ginScanToDelete(), it might be > > > worth adding that after processing all pages on the current level, > > > myStackItem is not needed anymore. > > > > > > > > Yes, it's not possible to have meDelete set for root, because > > > > > me->leftBuffer is always InvalidBuffer for the root. So the bran= ch > > > > > handling meDelete case should better do Assert(!isRoot). > > > Looks like this additional Assert is not in patch v1. > > > > > > In the root call of ginScanToDelete(gvs, &root); we can add Assert > > > checking that its return result is false: > > > - ginScanToDelete(gvs, &root); > > > + deleted =3D ginScanToDelete(gvs, &root); > > > +. Assert(!deleted); /* Root page is never deleted */ > > > > + 1, this is a good invariant check and improves readability > > > > One minor nit for the comment: > > > > The DataPageDeleteStack.buffer field comment says "valid only while > > recursing into children" > > this is true for internal pages, but for leaf pages the buffer is > > valid until the pageWasDeleted / leftBuffer decision. The validity > > window is actually "from when the caller sets it until the > > post-recursion cleanup." > > Thank you for catching this. I decided to remove this statement from > v2. It's hard to explain the life cycle of the buffer clearly in one > sentence. On the other hand, it's explained in the comments of > ginScanPostingTreeToDelete(). Hi, Alexander! Patch v2 looks good to me. I also agree with Xuneng that this refactoring improved the logic to make it look clearer. Thank you for the explanation of buffers lifetime! Regards, Pavel Borisov