Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vXaDN-00FdsU-1d for pgsql-hackers@arkaria.postgresql.org; Mon, 22 Dec 2025 07:20:22 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vXaDL-00DeOB-1R for pgsql-hackers@arkaria.postgresql.org; Mon, 22 Dec 2025 07:20:20 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vXaDK-00DeNy-32 for pgsql-hackers@lists.postgresql.org; Mon, 22 Dec 2025 07:20:19 +0000 Received: from mail-pg1-x52a.google.com ([2607:f8b0:4864:20::52a]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vXaDJ-001vXq-0J for pgsql-hackers@lists.postgresql.org; Mon, 22 Dec 2025 07:20:18 +0000 Received: by mail-pg1-x52a.google.com with SMTP id 41be03b00d2f7-c075ec1a58aso2041803a12.0 for ; Sun, 21 Dec 2025 23:20:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1766388016; x=1766992816; darn=lists.postgresql.org; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=5ulZhONoB8uLwe3ofeKOoKAJGpbYce1ln1KQRELYHTg=; b=c0INPbPhisDG0Bqgz8M8FLpN6VXuOV6f9MZx8Yb+HwwPb15ehNmLoB27UnjCjJPLxQ RMz/tyxM6lDOgzlXUKrEwVJL08qmV5Yq2xrg3QK5RPbgyHylVR7NBpQ0sCn4el6ghd0j lZY5P/dsLIL5cZeiuXTYh7LGh7IIgMmLsIxL4gNfW9j1RTS5ReuvBXlXpYP7SaCSINGh sp0KcjES9/9N8jtf8Y3PO7IRSuK7bI94EEAVi6GktSixXtp2VwxAVNj0+qZXwXT2P9p9 XGLusxtrJ/ZkV8k1DoBp8OKDiQ3nHXT5ulVIhbVLNFGyRP1pAEAswpxP5eQlFvFTK705 lqKQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1766388016; x=1766992816; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:x-gm-gg:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=5ulZhONoB8uLwe3ofeKOoKAJGpbYce1ln1KQRELYHTg=; b=JfqgkWHqzEHlRj3svW7/owh4iJhK/dFfKnMo6nYzRp8iDLTUEuGOB1cJaNLP/5StpU 05CvjZzNxN/4GPfuPaqTZs6bhJY6nWQCEtCq7xRJqF9N9YXRBQi35Jx4NuCDwdbe8GEY zozVgFyrZG0TZ2vXvw5GmLMxs+ZhpNAgXUcZio5z/R0Gu/yzAcJ99osGV8b5e9aatJQ+ urxvs1o4gCOsfedOozrGmZEGwagCSm3Vp9oqfDTqeRBxLt1OhkN5YGYcmlZK3RdPHI8t i8cyLwrfGIaBhJMQuw3T4uBbJy+ZhlwpEpQ3yZMqW5c+tsaZ5p6lHJfV7sn0jr0VXww2 6elQ== X-Forwarded-Encrypted: i=1; AJvYcCUG44ztS46Lo5VUBReoLbIS6Ff3fsrwKbPiw/NE20/6xisGQCCif0scjg+fnRpMzHdKyueYQfAcYMQ1Cg5p@lists.postgresql.org X-Gm-Message-State: AOJu0YwejcecoC9BXrQWfrKaNRZYzCxEwfn4z5BaCG3topYZ7JeN4MEV kB4cHSnj8W0RucPR/vH7brGzR1Zpi6Gz1oDnu/WIDA2ImKShZDzUiQfF X-Gm-Gg: AY/fxX4gRnUDy4oMafhBtFMJ/j7a21JMNuErCY5Kkw67KMkjW/jpvqjVi88zD8ZDDDD 1aGWih9yDC89cHtHIoFHMAWikmwhjR4pamQ1t7ZfU6AsfQA8p9NDBZfN50LUvkMEVxcxwhCOuwr qR1OgsuuWK53VyH6ce/wHfLF33XjOpR3hL1XNAYRmDnYm8/8PP21Von5mLKpeW2cvz/wi5j+Te9 W6gO+2hgb8dq+5lK+bG+gCbZ8mkuYD8zoBZYhuPcoKzJd9qqT4jpYFjoLofWNBzyfznsA196oH9 MpIiK+GzSuN/L8sQigZzHmGzljHbs5sz13e6FYIEXevlPAau4KxKX6ws24XYPrZ6cKP+3aRSBPu 3CbkKMHvwnXe0qg4hN6WK/2rstdnm3+sqt02p4A7CCaxVTEfLIIFavlkGI38HbLWVqRj1438tb+ 8zM8IYCY+DI6SrLasbZ0xl X-Google-Smtp-Source: AGHT+IHjSvy+TCzVS5zMNyuyEMbISqaAWC7UX8d3w4/lXerD2EMcfUWXz/uram6My9EdRotiWDSYVA== X-Received: by 2002:a05:7300:f404:b0:2ae:5bd5:c22e with SMTP id 5a478bee46e88-2b05ec8a77emr6546730eec.30.1766388015623; Sun, 21 Dec 2025 23:20:15 -0800 (PST) Received: from smtpclient.apple ([142.171.105.12]) by smtp.gmail.com with ESMTPSA id 5a478bee46e88-2b05fcfc1b7sm23765601eec.0.2025.12.21.23.20.13 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Sun, 21 Dec 2025 23:20:15 -0800 (PST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3826.700.81\)) Subject: Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) From: Chao Li In-Reply-To: Date: Mon, 22 Dec 2025 15:19:39 +0800 Cc: Xuneng Zhou , Andres Freund , Kirill Reshke , Robert Haas , Andrey Borodin , PostgreSQL Hackers , Heikki Linnakangas Content-Transfer-Encoding: quoted-printable Message-Id: <6BC5DBAB-6084-4BB8-8450-52E9648AB021@gmail.com> References: <2wk7jo4m4qwh5sn33pfgerdjfujebbccsmmlownybddbh6nawl@mdyyqpqzxjek> To: Melanie Plageman X-Mailer: Apple Mail (2.3826.700.81) List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk > On Dec 20, 2025, at 05:09, Melanie Plageman = wrote: >=20 > Attached v29 addresses some feedback and also corrects a small error > with the assertion I had added in the previous version's 0009. >=20 > On Thu, Dec 18, 2025 at 10:38=E2=80=AFPM Xuneng Zhou = wrote: >>=20 >> I=E2=80=99ve done a basic review of patches 1 and 2. Here are some = comments >> which may be somewhat immature, as this is a fairly large change set >> and I=E2=80=99m new to some parts of the code. >>=20 >> 1) Potential stale old_vmbits after VM repair n v2 >=20 > Good catch! I've fixed this in attached v29. >=20 >> 2) Add Assert(BufferIsDirty(buf)) >>=20 >> Since the patch's core claim is "buffer must be dirty before WAL >> registration", an assertion encodes this invariant. Should we add: >>=20 >> Assert(BufferIsValid(buf)); >> Assert(BufferIsDirty(buf)); >>=20 >> right before the visibilitymap_set() call? >=20 > There are already assertions that will trip in various places -- most > importantly in XLogRegisterBuffer(), which is the one that inspired > this refactor. >=20 >> The comment at lines: >>> "The only scenario where it is not already dirty is if the VM was = removed=E2=80=A6" >>=20 >> This phrasing could become misleading after future refactors. Can we >> make it more direct like: >>=20 >>> "We must mark the heap buffer dirty before calling = visibilitymap_set(), because it may WAL-log the buffer and = XLogRegisterBuffer() requires it." >=20 > I see your point about future refactors missing updating comments like > this. But, I don't think we are going to refactor the code such that > we can have PD_ALL_VISIBLE set without the VM bits set more often. > Also, it is common practice in Postgres to describe very specific edge > cases or odd scenarios in order to explain code that may seem > confusing without the comment. It does risk that comment later > becoming stale, but it is better that future developers understand why > the code is there. >=20 > That being said, I take your point that the comment is confusing. I > have updated it in a different way. >=20 >>> "Even if PD_ALL_VISIBLE is already set, we don't need to worry about = unnecessarily dirtying the heap buffer, as it must be marked dirty = before adding it to the WAL chain. The only scenario where it is not = already dirty is if the VM was removed..." >>=20 >> In this test we now call MarkBufferDirty() on the heap page even when >> only setting the VM, so the comments claiming =E2=80=9Cdoes not need = to modify >> the heap buffer=E2=80=9D/=E2=80=9Cno heap page modification=E2=80=9D = might be misleading. It >> might be better to say the test doesn=E2=80=99t need to modify heap >> tuples/page contents or doesn=E2=80=99t need to prune/freeze. >=20 > The point I'm trying to make is that we have to dirty the buffer even > if we don't modify the page because of the XLOG sub-system > requirements. And, it may seem like a waste to do that if not > modifying the page, but the page will rarely be clean anyway. I've > tried to make this more clear in attached v29. >=20 > - Melanie > = A few more comments on v29: 1 - 0002 - Looks like since 0002, visibilitymap_set()=E2=80=99s return = value is no longer used, so do we need to update the function and change = return type to void? I remember in some patches, to address Coverity = alerts, people had to do =E2=80=9C(void) = function_with_a_return_value()=E2=80=9D. 2 - 0003 ``` + * Helper to correct any corruption detected on an heap page and its ``` Nit: =E2=80=9Can=E2=80=9D -> =E2=80=9Ca=E2=80=9D 3 - 0003 ``` +static bool +identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer, + BlockNumber = heap_blk, Page heap_page, + int = nlpdead_items, + Buffer = vmbuffer, + uint8 vmbits) +{ + Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) =3D=3D = vmbits); ``` Right before this function is called: ``` old_vmbits =3D visibilitymap_get_status(vacrel->rel, blkno, = &vmbuffer); + if (identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, = page, + = presult.lpdead_items, vmbuffer, + = old_vmbits)) ``` So, the Assert() is checking if old_vmbits is newly returned from = visibilitymap_get_status(), in that case, = identify_and_fix_vm_corruption() can take vmbits as a pointer , and it = calls visibilitymap_get_status() to get vmbits itself and returns vmbits = via the pointer, so that we don=E2=80=99t need to call = visibilitymap_get_status() twice. 4 - 0004 ``` + * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is = set and + * we have attempted to update the VM. + */ + uint8 new_vmbits; + uint8 old_vmbits; ``` The comment feels a little confusing to me. "HEAP_PAGE_PRUNE_UPDATE_VM = option is set=E2=80=9D is a clear indication, but how to decide "we have = attempted to update the VM=E2=80=9D? By reading the code: ``` + prstate->attempt_update_vm =3D + (params->options & HEAP_PAGE_PRUNE_UPDATE_VM) !=3D 0; ``` It=E2=80=99s just the result of HEAP_PAGE_PRUNE_UPDATE_VM being set. So, = maybe we don=E2=80=99t the =E2=80=9Cand=E2=80=9D part. 5 - 0004 ``` + * Returns true if one or both VM bits should be set, along with = returning the + * current value of the VM bits in *old_vmbits and the desired new = value of + * the VM bits in *new_vmbits. + */ +static bool +heap_page_will_set_vm(PruneState *prstate, + Relation relation, + BlockNumber heap_blk, Buffer = heap_buffer, Page heap_page, + Buffer vmbuffer, + int nlpdead_items, + uint8 *old_vmbits, + uint8 *new_vmbits) +{ + if (!prstate->attempt_update_vm) + return false; ``` old_vmbits and new_vmbits are purely output parameters. So, maybe we = should set them to 0 inside this function instead of relying on callers = to initialize them. I think this is a similar case where I raised a comment earlier about = initializing presult to {0} in the callers, and you only wanted to set = presult in heap_page_prune_and_freeze(). 6 - 0004 ``` @@ -823,13 +975,19 @@ heap_page_prune_and_freeze(PruneFreezeParams = *params, MultiXactId = *new_relmin_mxid) { Buffer buffer =3D params->buffer; + Buffer vmbuffer =3D params->vmbuffer; Page page =3D BufferGetPage(buffer); + BlockNumber blockno =3D BufferGetBlockNumber(buffer); PruneState prstate; bool do_freeze; bool do_prune; bool do_hint_prune; + bool do_set_vm; bool did_tuple_hint_fpi; int64 fpi_before =3D pgWalUsage.wal_fpi; + uint8 new_vmbits =3D 0; + uint8 old_vmbits =3D 0; + =20 /* Initialize prstate */ ``` Nit: an extra empty line is added. 7 - 0005 ``` - * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze = tuples, and - * will return 'all_visible', 'all_frozen' flags to the caller. + * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze = tuples ``` Nit: a tailing dot is needed in the end of the comment line. 8 - 0005 ``` @@ -978,6 +1003,7 @@ heap_page_prune_and_freeze(PruneFreezeParams = *params, Buffer vmbuffer =3D params->vmbuffer; Page page =3D BufferGetPage(buffer); BlockNumber blockno =3D BufferGetBlockNumber(buffer); + TransactionId vm_conflict_horizon =3D InvalidTransactionId; ``` I guess the variable name =E2=80=9Cvm_conflict_horizon=E2=80=9D comes = from the old "presult->vm_conflict_horizon=E2=80=9D. But in the new = logic, this variable is used more generic, for example = Assert(debug_cutoff =3D=3D vm_conflict_horizon). I see 0006 has renamed = to =E2=80=9Cconflict_xid=E2=80=9D, so it=E2=80=99s up to you if or not = rename it. But to make the commit self-contained, I=E2=80=99d suggest = renaming it. 9 - 0006 ``` @@ -3537,6 +3537,7 @@ heap_page_would_be_all_visible(Relation rel, = Buffer buf, { ItemId itemid; HeapTupleData tuple; + TransactionId dead_after =3D InvalidTransactionId; ``` This initialization seems to not needed, as = HeapTupleSatisfiesVacuumHorizon() will always set a value to it. 10 - 0010 ``` + * there is any snapshot that still = consider the newest xid on ``` Nit: consider -> considers 11 - 0011 ``` + * page. If we won't attempt freezing, just unset all-visible = now, though. */ + if (!prstate->attempt_freeze) + { + prstate->all_visible =3D false; + prstate->all_frozen =3D false; + } ``` The comment says =E2=80=9Cjust unset all-visible=E2=80=9D, but the code = actually also unset all_frozen. 12 - 0012 ``` + /* + * RT indexes of relations modified by the query either through + * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE + */ + Bitmapset *es_modified_relids; ``` As we intentionally only want indexes, does it make sense to just name = the field es_modified_rtindexes to make it more explicit. 13 - 0012 ``` + /* If it has a rowmark, the relation is modified = */ + estate->es_modified_relids =3D = bms_add_member(estate->es_modified_relids, + = rc->rti); ``` I think this comment is a little misleading, because SELECT FOR = UPDATE/SHARE doesn=E2=80=99t always modify tuples of the relation. If a = reader not associating this code with this patch, he may consider the = comment is wrong. So, I think we should make the comment more explicit. = Maybe rephrase like =E2=80=9CIf it has a rowmark, the relation may = modify or lock heap pages=E2=80=9D. 14 - 0015 - commit message ``` Setting pd_prune_xid on insert can cause a page to be dirtied and written out when it previously would not have been, affetcting the ``` Typo: affetcting -> affecting Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/