Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vTXxt-004JAC-1i for pgsql-hackers@arkaria.postgresql.org; Thu, 11 Dec 2025 04:07:42 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vTXxs-002BGM-0b for pgsql-hackers@arkaria.postgresql.org; Thu, 11 Dec 2025 04:07:40 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vTXxr-002BGD-2S for pgsql-hackers@lists.postgresql.org; Thu, 11 Dec 2025 04:07:40 +0000 Received: from mail-pj1-x102b.google.com ([2607:f8b0:4864:20::102b]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vTXxr-00054w-0R for pgsql-hackers@lists.postgresql.org; Thu, 11 Dec 2025 04:07:39 +0000 Received: by mail-pj1-x102b.google.com with SMTP id 98e67ed59e1d1-343ee44d89aso585201a91.2 for ; Wed, 10 Dec 2025 20:07:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1765426058; x=1766030858; darn=lists.postgresql.org; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=OmxIFo76lT7So1vshWPlQ9RE0WhnIMn1W6DNw7Uu4Ho=; b=NsU268yXRHupA3yrTnD/4wTZIFeE3rYYjHksZzC+boaZeLh9nUF1rUaW7lJa4f4ujL DvAkHc5d7E7vHzM1ga8KyIsvRHWU811QNET0XKtdzlb7YR3/v6jHITiOgAYcK4qSeW3R RfUO+UcGr59BFHemgR7Q59GqVpteHcgERRVEy6fcaIm+hR9c5LhOv6cJpIOalL7ilk+i tubmlqovvsqwEuxBPOCsFPtf5h8tkq1BD538n2Cu2m7gruytMCJzbrqCdtrfx0AxjfVw lslCsN1ulx7C1IP2gOq9MUEEZhdkVc7tJBcEVnxvGqG6waTa4VmGxk/e6gOXUDcSnBXQ Dgqg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1765426058; x=1766030858; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:x-gm-gg:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=OmxIFo76lT7So1vshWPlQ9RE0WhnIMn1W6DNw7Uu4Ho=; b=GRGZixwU7d9M30bMdEao0Wh2EUZ3r2woam11J0gP9lSLAeJ+SGvFsiCV2n/VjdGN9j eQy+hSCaPGngn6D+c4vKb0jBiA0lZGjkJmHh0zwXLVjkQCRNDWDuj18rekdYiB4AdXlr E/15eUF4h7YOX96nbRuoh/cL056eaafSuctBlZyNgPrOs3c4bbW8VYBtjujwKZeMhuhx NxfLmqz6xPO52On5HPM1/PDdYx6Sig04cNUyG0xHCFBul4bSq0J4d4Xl8QA5Z7UAg04U jyySqGhCDG5CSyGVGT5qZkewkkeMCjXMY/SumnOuE2BemVfkEK+bwCc7/mVAdTWNtKKz 8mZQ== X-Forwarded-Encrypted: i=1; AJvYcCWL8fImwMeXItHeljCu/RZLdqiiDZLVLHkzgaSP0GAXsR/NjCtgmbsbo2T9AroXponWicFYL5KBX7Hp9KcY@lists.postgresql.org X-Gm-Message-State: AOJu0Yz7BdfwmrU7BFMH/ooOgODE2q5lJjvXqeHOnliXXWA9xpo8oK+W L0sTbEArvJ52+1qcZltb2VaASrVqqfOTDIaEUO+NZ5eTE4ypaH/cBLHv X-Gm-Gg: AY/fxX66UYDcZCzU3f9k2IMeJ1VXSGUAZLU/ZqeXx9j9mdWEud+KUJJOK1jLcnniclL t0uqeyubXE+UOWRoBx/HjbfLaSkJITrWqgSloXgZIRJIqcgEj4pFclDf3BOMpp4XpcyEJGPHZ97 xbO9cKDRxgekr9miWCzqDrxaTKOmnmEzia/qxTcS2IFiha77ckwVI418m7UklDkGB5ITZ9Ih5AQ Zhm5dx5AU5lZ6QbP34ZyL3Top9kHzuTpurTdBDFTykssnioVAte0mNpqw8I5s7ofdGIOhQH35Vn jFV/LcDwtZq2wZeIQsOl/LnC8OU/pIdNWexZd+C65tLGfmVqabDDhWaBxXvj/yagFyI1VWbFChI SAa8UAzTnO7RPwyWKzvm2f4XTji0udHGztrqMeaz22/vUGEi4Y397ljz+ejRhy4EVd/euwnYXB3 ni+jIIPpKxJkso+zEGBFuTtekpKTdchQ== X-Google-Smtp-Source: AGHT+IGG5DXiQBhumP4nPbvIzXp1dxRPpvAaNvmobYrdHGOKMDY3id8erOfNqm4SrehStIpmNHNTaA== X-Received: by 2002:a17:90b:3dcb:b0:32d:db5b:7636 with SMTP id 98e67ed59e1d1-34a72885f90mr4404422a91.27.1765426057813; Wed, 10 Dec 2025 20:07:37 -0800 (PST) Received: from smtpclient.apple ([45.32.121.103]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-34a9285fa0esm401733a91.9.2025.12.10.20.07.34 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Wed, 10 Dec 2025 20:07:37 -0800 (PST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3826.700.81\)) Subject: Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) From: Chao Li In-Reply-To: Date: Thu, 11 Dec 2025 12:06:57 +0800 Cc: Andres Freund , Kirill Reshke , Robert Haas , Andrey Borodin , PostgreSQL Hackers , Heikki Linnakangas Content-Transfer-Encoding: quoted-printable Message-Id: <5CEAA162-67B1-44DA-B60D-8B65717E8B05@gmail.com> References: <2wk7jo4m4qwh5sn33pfgerdjfujebbccsmmlownybddbh6nawl@mdyyqpqzxjek> <9B2211B5-14B4-4C22-96BD-F7B32E577DB6@gmail.com> To: Melanie Plageman X-Mailer: Apple Mail (2.3826.700.81) List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk > On Dec 11, 2025, at 07:35, Melanie Plageman = wrote: >=20 > On Tue, Dec 9, 2025 at 12:48=E2=80=AFPM Melanie Plageman > wrote: >>=20 >> In this set 0001 and 0002 are independent. 0003-0007 are all small >> steps toward the single change in 0007 which combines the VM updates >> into the same WAL record as pruning and freezing. 0008 and 0009 are >> removing the rest of XLOG_HEAP2_VISIBLE. 0010 - 0012 are refactoring >> needed to set the VM during on-access pruning. 0013 - 0015 are small >> steps toward setting the VM on-access. And 0016 sets the prune xid on >> insert so we may set the VM on-access for pages that have only new >> data. >=20 > I committed 0001 and 0002. attached v25 reflects that. > 0001-0004 refactoring steps for eliminate visible record from phase I > (not probably independent commits in the end) > 0005 eliminate XLOG_HEAP2_VISIBLE from phase I vac > 0006-0007 removing the rest of XLOG_HEAP2_VISIBLE > 0008-0010 refactoring for setting VM on-access > 0011-0013 setting the VM on-access > 0014 - setting pd_prune_xid on insert >=20 > - Melanie > = A few more small comments. Sorry for keeping come out new comments. = Actually I learned a lot about vacuum from reviewing this patch. 1 - 0001 ``` +-- the checkpoint cleans the buffer dirtied by freezing the sole tuple +checkpoint; +-- truncating the VM ensures that the next vacuum will need to set it +select pg_truncate_visibility_map('test_vac_unmodified_heap'); +-- vacuum sets the VM but does not need to set PD_ALL_VISIBLE so no = heap page +-- modification +vacuum test_vac_unmodified_heap; ``` The last vacuum is expected to set vm bits, but the test doesn=E2=80=99t = verify that. Should we verify that like: ``` evantest=3D# SELECT blkno, all_visible, all_frozen FROM = pg_visibility_map('test_vac_unmodified_heap'); blkno | all_visible | all_frozen -------+-------------+------------ 0 | t | t (1 row) ``` As you have been using the extension pg_visibility, adding the = verification with pg_visibility_map() should not be a burden. 2 - 0001 ``` if (presult.all_frozen) { + /* + * We can pass InvalidTransactionId as our = cutoff_xid, since a + * snapshotConflictHorizon sufficient to make = everything safe for + * REDO was logged when the page's tuples were = frozen. + */ = Assert(!TransactionIdIsValid(presult.vm_conflict_horizon)); - flags |=3D VISIBILITYMAP_ALL_FROZEN; + new_vmbits |=3D VISIBILITYMAP_ALL_FROZEN; } ``` The comment here is a little confusing. In the old code, the Assert() as = immediately above the call visibilitymap_set(), and cutoff_xid is a = parameter to the call. But the new code moves the Assert() as well as = the comment far away from the call visibilitymap_set(), so I think the = comment should stay together with the call of visibilitymap_set(). 3 - 0002 ``` * If it finds that the page-level visibility hint or VM is corrupted, = it will * fix them by clearing the VM bits and visibility page hint. This does = not ``` In the second line, =E2=80=9Cvisibility page hint=E2=80=9D is = understandable but feels not quite good. I know it=E2=80=99s actually = =E2=80=9Cpage-level visibility hint=E2=80=9D, so how about just = =E2=80=9Cvisibility hint=E2=80=9D. 4 - 0002 ``` /* - * As of PostgreSQL 9.2, the visibility map bit should never be = set if the - * page-level bit is clear. However, it's possible that the bit = got - * cleared after heap_vac_scan_next_block() was called, so we = must recheck - * with buffer lock before concluding that the VM is corrupt. + * For the purposes of logging, count whether or not the page = was newly + * set all-visible and, potentially, all-frozen. */ - else if (all_visible_according_to_vm && !PageIsAllVisible(page) = && - visibilitymap_get_status(vacrel->rel, blkno, = &vmbuffer) !=3D 0) + if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) =3D=3D 0 && + (new_vmbits & VISIBILITYMAP_ALL_VISIBLE) !=3D 0) { ``` Without do_set_vm=3D=3Dtrue, old_vmbits will only be 0, thus this = =E2=80=9Cif-elseif=E2=80=9D that uses old_vmbits should be moved into = =E2=80=9Cif (do_set_vm)=E2=80=9D. =46rom this perspective, if not = do_set_vm, this function can return early, like: ``` Do_set_vm =3D heap_page_will_set_vm(&new_vmbits) If (!do_set_vm) Return presult.ndeleted; PageSetAllVisible(page); MarkBufferDirty(buf); old_vmbits =3D visibilitymap_set(new_vmbits); If (old_vmbits..) { .. } Else if (old_vmbits=E2=80=A6) { =E2=80=A6 } Return presult.ndeleted; ``` 5 - 0003 ``` /* * lazy_scan_prune() -- lazy_scan_heap() pruning and freezing. * @@ -2076,15 +1979,14 @@ lazy_scan_prune(LVRelState *vacrel, bool *vm_page_frozen) { Relation rel =3D vacrel->rel; - bool do_set_vm =3D false; - uint8 new_vmbits =3D 0; - uint8 old_vmbits =3D 0; PruneFreezeResult presult; PruneFreezeParams params =3D { .relation =3D rel, .buffer =3D buf, + .vmbuffer =3D vmbuffer, + .blk_known_av =3D all_visible_according_to_vm, .reason =3D PRUNE_VACUUM_SCAN, - .options =3D HEAP_PAGE_PRUNE_FREEZE, + .options =3D HEAP_PAGE_PRUNE_FREEZE | = HEAP_PAGE_PRUNE_UPDATE_VM, .vistest =3D vacrel->vistest, .cutoffs =3D &vacrel->cutoffs, }; ``` This maybe a legacy bug. Here presult is not initialized, and it is = immediately passed to heap_page_prune_and_freeze(): ``` heap_page_prune_and_freeze(¶ms, &presult, = <=3D=3D=3D here = &vacrel->offnum, = &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid); ``` Then heap_page_prune_and_freeze() immediately calls = prune_freeze_setup(): ``` /* Initialize prstate */ prune_freeze_setup(params, new_relfrozen_xid, = new_relmin_mxid, presult, &prstate); ``` And prune_freeze_setup() takes presult as a const pointer: ``` static void prune_freeze_setup(PruneFreezeParams *params, TransactionId *new_relfrozen_xid, MultiXactId *new_relmin_mxid, const PruneFreezeResult *presult, = <=3D=3D=3D here PruneState *prstate) { prstate->deadoffsets =3D (OffsetNumber *) presult->deadoffsets; <=3D=3D= here, presult->deadoffsets could be a random value } ``` As this is a separate issue off the current patch, I just filed a new = patch to fix it. Please take a look at: = https://www.postgresql.org/message-id/CAEoWx2%3DjiD1nqch4JQN%2BodAxZSD7mRv= doHUGJYN2r6tQG_66yQ%40mail.gmail.com 6 - 0003 ``` + * Returns true if one or both VM bits should be set, along with = returning the + * desired what bits should be set in the VM in *new_vmbits. ``` Looks like a typo: =E2=80=9Creturning the desired what bits should be = set=E2=80=9D, maybe change to =E2=80=9Creturning the desired bits to be = set=E2=80=9D. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/