Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vxi5j-00H3rv-1s for pgsql-hackers@arkaria.postgresql.org; Wed, 04 Mar 2026 09:00:27 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vxi5i-00Bp7b-07 for pgsql-hackers@arkaria.postgresql.org; Wed, 04 Mar 2026 09:00:26 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vxi5h-00Bp7Q-23 for pgsql-hackers@lists.postgresql.org; Wed, 04 Mar 2026 09:00:26 +0000 Received: from mail-pl1-x62e.google.com ([2607:f8b0:4864:20::62e]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1vxi5f-00000000LTV-1cwH for pgsql-hackers@lists.postgresql.org; Wed, 04 Mar 2026 09:00:25 +0000 Received: by mail-pl1-x62e.google.com with SMTP id d9443c01a7336-2ae45a4cc54so22207045ad.0 for ; Wed, 04 Mar 2026 01:00:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1772614823; x=1773219623; darn=lists.postgresql.org; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=GGoh5POFN91h1TlG3+CY5/7lvk4COWxyg4pzljkGi6w=; b=gMCcSmwnpj7dnghYryrd9zkWBQOI3pa0d3iDGFesmt5Zoe3ZiJN3s99IX5+TOUI4W7 3L9IAzG9UsLkJ2mHMGnmHMPCGU/R9xNZdH3HbbYwhDo+qEZeS5/Cf82L3TAwtqd8iSHM d/TGB8EOKcz5Im29knVpNWPWGEik4IlWYWVzEpQdYrV49XnTmS8B4SqxRVLDu2ttt9zS Dm4gAd7+f1ZSdmpDS3WgpaVGTCFLH4dbtwep4Eluy2dzRCtbe6lO81DtYLt2fl4amhRo HyIEIJZ5X64D37AcWkdwCmwmlItQdStkwlUnngX2t2qaC9XsRkBV27mMfWLOt0pAKjLw 46BQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1772614823; x=1773219623; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:x-gm-gg:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=GGoh5POFN91h1TlG3+CY5/7lvk4COWxyg4pzljkGi6w=; b=qmYH187E8QHIT8iUckHFCYfqpI+Pa7A9s/qBfLnSRgGV+XqVwRuQ0Ouv2iIr+kuh+f WX/1TirIgGJvGApTVN0YbRy3bI7+agh1r+/FZHZnCk2zPoCHaeGCi7WCF27u7JGNb/Jf V4TlIt7uSKdnaMM6fFDw9UJZ5L9GuyhRRSIPy585uVI7jEPvAdY77AKlJQgyGq/xnvy7 aX696g87KiX/17ZJD4Gm90Mm8RxafFHb+MEFbtf8unA4oEL4C0PZJRKzkVHF0TSYBaAF SQtNLyu2kMSU1AuFC8AIwTBOiNaWE6Dwtjj9MOKvBZxPhXB5Tnnz0iXlmkZT93mbRZ8E qN7w== X-Forwarded-Encrypted: i=1; AJvYcCW+1eaa3t/LDKR/uztiKMlJh/yp9gETQCW3G4BxHn2LS02A9lx2F76g36/l9LST7+mnRO0yz8Y3oD3dV6wD@lists.postgresql.org X-Gm-Message-State: AOJu0YyIuXazUuFI6gcr1qbW1VLh/2Re6/vQ7MgGh1FnWbWcP8YWuUJg c6/ARnAPQCmrfJFvySTOs8yznfQTF8FKQFnj0YLTHFw559/Rwhhr+0eh X-Gm-Gg: ATEYQzzYKtYqKoLJUX8Qkwqz89wYP6u4TYqA6Ya8X4wwl0FilhHgagPSFmvj8zgTCuU mZzPk05470g7VPRAxyBOTS3z1EbdGDJfhGuLSAgF3YpghGnKiAfjOkQ7kHMHr4JAZ/KI5CQfyxn MEPuR8OuHQSEY6qaAkR9Ptx7nI62ZUplg4Naw7gKO5y2kaV4BiXXWu2Aqu4/jnZLc2yVG5F4Bj0 slKGdGiPndQt+vczkbqQcVLDmwGW1OprOw99L6E0glMUPnE5REokhHdA0coqJzfBNdRzHwHa9WR NM3xBUq6TVGdijELfpj5Ts9CCGLRanuWBUJ8SlaN3I2JWzvOqiUEBl9jXCVXl+qa/uqkNlwUj2Z tRp/o7tcYGG4VM6aGdo0bgLF/ciA1e6FimGoqMqxxKa22cwWVu92mwexxl/3tQfSybOlWP9ZWkO DRIcotxtwDGP/v0cR1qC9H5P/64CBaF+Y= X-Received: by 2002:a17:903:2f90:b0:2a9:616c:1711 with SMTP id d9443c01a7336-2ae6ab92b0cmr14595975ad.42.1772614817818; Wed, 04 Mar 2026 01:00:17 -0800 (PST) Received: from smtpclient.apple ([203.76.245.26]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-2ae496470d3sm96327685ad.15.2026.03.04.01.00.14 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Wed, 04 Mar 2026 01:00:17 -0800 (PST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3864.400.21\)) Subject: Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) From: Chao Li In-Reply-To: Date: Wed, 4 Mar 2026 16:59:41 +0800 Cc: Andres Freund , Andrey Borodin , Kirill Reshke , Xuneng Zhou , Robert Haas , PostgreSQL Hackers , Heikki Linnakangas Content-Transfer-Encoding: quoted-printable Message-Id: References: <6BC5DBAB-6084-4BB8-8450-52E9648AB021@gmail.com> <7F5BCD7A-764D-4D8D-8E27-6F2CAAEA1CEE@gmail.com> <4379FDA3-9446-4E2C-9C15-32EFE8D4F31B@yandex-team.ru> <7ib3sa55sapwjlaz4sijbiq7iezna27kjvvvar4dpgkmadml6t@gfpkkwmdnepx> To: Melanie Plageman X-Mailer: Apple Mail (2.3864.400.21) List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk > On Mar 3, 2026, at 23:52, Melanie Plageman = wrote: >=20 >=20 >> Otherwise, if prstate->pagefrz.FreezePageConflictXid is still = possibly be InvalidTransactionId, then the Assert should be changed to = something like: >>=20 >> Assert(prstate->pagefrz.FreezePageConflictXid =3D=3D = InvalidTransactionId || >> = TransactionIdPrecedesOrEquals(prstate->pagefrz.FreezePageConflictXid, = prstate->cutoffs->OldestXmin) >=20 > This is covered by TransactionIdPrecedesOrEquals because > InvalidTransactionId is 0. We assume that in many places throughout > the code. >=20 I understood that TransactionIdPrecedesOrEquals(InvalidTransactionId, = prstate->cutoffs->OldestXmin) is true, but that would leave an = impression to code readers that prstate->pagefrz.FreezePageConflictXid = could not be InvalidTransactionId. Thus I think my version explicitly = tells that prstate->pagefrz.FreezePageConflictXid could be = InvalidTransactionId at the point. >> I will continue with 0005 tomorrow. >=20 4 - 0005 ``` * Caller must have pin on the buffer, and must *not* have a lock on = it. */ void -heap_page_prune_opt(Relation relation, Buffer buffer) +heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer) ``` I don=E2=80=99t see why vmbuffer has to be of pointer type. Buffer type = is underlying int, I checked the last commit, vmbuffer only passes in = data into the function without passing out anything. As we add the new parameter vmbuffer, though it=E2=80=99s not used in = this commit, I think it=E2=80=99d be better to update the header commit = to explain what this parameter will do. 5 - 0006 ``` + * + * heap_fix_vm_corruption() makes changes to the VM and, potentially, = the heap + * page, but it does not need to be done in a critical section because + * clearing the VM is not WAL-logged. + */ +static void +heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum) ``` Nit: why the last paragraph of the header comments uses the function = name instead of =E2=80=9Cthis function=E2=80=9D? Looks like a = copy-pasto. 6 - 0006 ``` + if (prstate->lpdead_items > 0) + { + ereport(WARNING, + = (errcode(ERRCODE_DATA_CORRUPTED), + errmsg("LP_DEAD item found on = page marked as all-visible"), + errdetail("relation \"%s\", = page %u, tuple %u", + = RelationGetRelationName(prstate->relation), + = prstate->block, offnum))); + } + else + { + ereport(WARNING, + = (errcode(ERRCODE_DATA_CORRUPTED), + errmsg("tuple not visible to = all found on page marked as all-visible"), + errdetail("relation \"%s\", = page %u, tuple %u", + = RelationGetRelationName(prstate->relation), + = prstate->block, offnum))); + } ``` I recently just learned that a detail message should use complete = sentences, and end each with a period, and capitalize the first word of = sentences. See = https://www.postgresql.org/docs/current/error-style-guide.html. 7 - 0006 ``` + else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS) + { + /* + * As of PostgreSQL 9.2, the visibility map bit should = never be set if + * the page-level bit is clear. However, it's possible = that the bit + * got cleared after heap_vac_scan_next_block() was = called, so we must + * recheck with buffer lock before concluding that the = VM is corrupt. + */ + ereport(WARNING, + (errcode(ERRCODE_DATA_CORRUPTED), + errmsg("page %u in \"%s\" is not marked = all-visible but visibility map bit is set", + prstate->block, + = RelationGetRelationName(prstate->relation)))); + } ``` The comment says =E2=80=9Cwe must recheck with buffer lock before=E2=80=A6= =E2=80=9D, but it only log a warning message. Is the comment stale? 8 - 0007 ``` +static void +heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult = *presult) +{ + OffsetNumber maxoff =3D PageGetMaxOffsetNumber(prstate->page); + Page page =3D prstate->page; + + Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN || + (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE && + !prstate->attempt_freeze)); + + /* We'll fill in presult for the caller */ + memset(presult, 0, sizeof(PruneFreezeResult)); + + /* + * Since the page is all-visible, a count of the normal ItemIds = on the + * page should be sufficient for vacuum's live tuple count. + */ + for (OffsetNumber off =3D FirstOffsetNumber; + off <=3D maxoff; + off =3D OffsetNumberNext(off)) + { + if (ItemIdIsNormal(PageGetItemId(page, off))) + prstate->live_tuples++; + } + + presult->live_tuples =3D prstate->live_tuples; + + /* Clear any stale prune hint */ + if (TransactionIdIsValid(PageGetPruneXid(page))) + { + PageClearPrunable(page); + MarkBufferDirtyHint(prstate->buffer, true); + } + + presult->vmbits =3D prstate->vmbits; + + if (!PageIsEmpty(page)) + presult->hastup =3D true; +} ``` * Given this function has done PageIsEmpty(page), that that is true, we = don=E2=80=99t need to count live_tuples, right? That could be a tiny = optimization. * I see heap_page_bypass_prune_freeze() is only called in one place and = immediately after prune_freeze_setup() and heap_fix_vm_corruption(), so = prstate->vmbits must be 0, so do we need to do presult->vmbits =3D = prstate->vmbits;? * Do we need to set all_visible and all_frozen to presult? 0008 LGTM I will continue with 0009 tomorrow. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/