Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vy4SZ-00HOc5-18 for pgsql-hackers@arkaria.postgresql.org; Thu, 05 Mar 2026 08:53:31 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vy4SX-00GgtK-2U for pgsql-hackers@arkaria.postgresql.org; Thu, 05 Mar 2026 08:53:30 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vy4SX-00Ggt7-19 for pgsql-hackers@lists.postgresql.org; Thu, 05 Mar 2026 08:53:29 +0000 Received: from mail-pf1-x42e.google.com ([2607:f8b0:4864:20::42e]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1vy4SV-00000000VeP-0y6b for pgsql-hackers@lists.postgresql.org; Thu, 05 Mar 2026 08:53:28 +0000 Received: by mail-pf1-x42e.google.com with SMTP id d2e1a72fcca58-824c9da9928so4687512b3a.3 for ; Thu, 05 Mar 2026 00:53:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1772700806; x=1773305606; darn=lists.postgresql.org; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=MtT7neTfdU8VzFTtTG45Q1ZcR6dk+HtzlwPxIN5L1f4=; b=UDpVaP+KFomeEDFLlf/kzV/HqUB5NfiqS/fJDos5PidPWvP9FpF3w0qKK1ivvsbYO3 qzEUmecO11C+cDg0+h2Nl8aiyzZL+F6HIgBJwW/SuXCUsjRvOrn+sR+F5kajltn9zx2V B8qHCZCrp00hpHYfQ0jCLMVdya/n6uMGTckUgQPdPrjOZKSoqlo/2qh6e060f76kM3xD cyXn8jyoyNCCS5uI6gv68pPkBRT7LSBofyUWDk2b7ZZAPjkOUjDqJiHeowHdWSVD2A5e 1cr84bWUhLzDfBytYqAo9CU4i2s29MNtGDd1mFVsNS9DzwAviuW7/bB4m8wKpLdyHC77 PQkg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1772700806; x=1773305606; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:x-gm-gg:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=MtT7neTfdU8VzFTtTG45Q1ZcR6dk+HtzlwPxIN5L1f4=; b=oQp1/1+vY5zLhYslgur+KShRklmskRNbsKvf8KwFLuKqK83whqHTyuEiSkRL01VNpp eoGp0C2f5f7n561B7UuwBWW2gA/L+r1EO5GP0Ux+X4qeXWZ+LuGHJXGoaMiR3da+hdqI Lcgp/4sS1jESKCypA6OU2Clcp3apDCVMoeLx6pIg9Nqv6hNxVM8SwRLXTMqrGi9vGFAr DsgVm3FankzMxhOBFLCFCOOoQ2WeIMsaZ4hGmjBLa6QYds4W1+J9X0F4AX2RFAfGKWKb VkBJEvMg3tylpZ3cgUYGCVgmfeYTL8iqHmKuH6+ac9VrLfoPZXi1/nGZI+5NfE6/S7cW rEcQ== X-Forwarded-Encrypted: i=1; AJvYcCVZp3+FiiTsAOSzvEPiJeguehYEGBRzgVmI1BDFKKiCV1JxK8arATiA4Zeviaf9rawQlENQTSBUhpdyF8hd@lists.postgresql.org X-Gm-Message-State: AOJu0Yz9bSAzUmGYFfi0OvzHJ1reDnOe0kMf0RbhRqN9DYVMgrrOgyWx TT/iLOW5sLmu2QhNVgsNmRor8zKkAZCsXffFCIyZS7dXSkeJOIfSwqII X-Gm-Gg: ATEYQzy87MDIOEXuooqFjkPKG7iFhmWMTled8JkqCdrIZK3pS6OiUuZ47OPNKRPTGoz XR56HljcAYfQHsU/O1fTh6IbWs02iz6+rB/T90+YqALs6VpQst4LqiXKY9Hoiiv/JHOa9bUAWT1 BdsKYwnympRhzVpZBD2jh979WjH+arkoAqwPR179R4SZVqoFtdD1IuANOs+3U9Fl40dSrsLvt1x Faa0FH9cTPb9guPnhYTNf/LvZ1Nr/cCVHf3Gpt+weS5g1jqtIwX6tMrClsZXiL/5CxyMMPfr80u Q/H5ICsOfaCVvoAHXf07wvzwDHXy4vEPdRDG1OsUxwNgoG0OQA79slc7m3WigZuljbnHGAb/8Sr sEkfYOZyb9ZovsrYdhX20pjlmyFogAkQLgRGceygf1bmHcsySyLzwqDL/gAReCbzu84D4hNnQur 0P9WohTKehanJ9j92RhN3bL8MHo7WNpJI= X-Received: by 2002:a05:6a00:2e1f:b0:7fb:cf05:93cb with SMTP id d2e1a72fcca58-82972b7df71mr4014627b3a.1.1772700806073; Thu, 05 Mar 2026 00:53:26 -0800 (PST) Received: from smtpclient.apple ([45.32.121.103]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-82739ff1ca9sm19815618b3a.36.2026.03.05.00.53.22 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 05 Mar 2026 00:53:25 -0800 (PST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3864.400.21\)) Subject: Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) From: Chao Li In-Reply-To: Date: Thu, 5 Mar 2026 16:52:45 +0800 Cc: Andres Freund , Andrey Borodin , Kirill Reshke , Xuneng Zhou , Robert Haas , PostgreSQL Hackers , Heikki Linnakangas Content-Transfer-Encoding: quoted-printable Message-Id: <81DAFACF-7D55-4A84-ACB0-0425D1669DB4@gmail.com> References: <6BC5DBAB-6084-4BB8-8450-52E9648AB021@gmail.com> <7F5BCD7A-764D-4D8D-8E27-6F2CAAEA1CEE@gmail.com> <4379FDA3-9446-4E2C-9C15-32EFE8D4F31B@yandex-team.ru> <7ib3sa55sapwjlaz4sijbiq7iezna27kjvvvar4dpgkmadml6t@gfpkkwmdnepx> To: Melanie Plageman X-Mailer: Apple Mail (2.3864.400.21) List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk > On Mar 4, 2026, at 16:59, Chao Li wrote: >=20 >=20 >=20 >> On Mar 3, 2026, at 23:52, Melanie Plageman = wrote: >>=20 >>=20 >>> Otherwise, if prstate->pagefrz.FreezePageConflictXid is still = possibly be InvalidTransactionId, then the Assert should be changed to = something like: >>>=20 >>> Assert(prstate->pagefrz.FreezePageConflictXid =3D=3D = InvalidTransactionId || >>> = TransactionIdPrecedesOrEquals(prstate->pagefrz.FreezePageConflictXid, = prstate->cutoffs->OldestXmin) >>=20 >> This is covered by TransactionIdPrecedesOrEquals because >> InvalidTransactionId is 0. We assume that in many places throughout >> the code. >>=20 >=20 > I understood that TransactionIdPrecedesOrEquals(InvalidTransactionId, = prstate->cutoffs->OldestXmin) is true, but that would leave an = impression to code readers that prstate->pagefrz.FreezePageConflictXid = could not be InvalidTransactionId. Thus I think my version explicitly = tells that prstate->pagefrz.FreezePageConflictXid could be = InvalidTransactionId at the point. >=20 >=20 >>> I will continue with 0005 tomorrow. >>=20 >=20 > 4 - 0005 > ``` > * Caller must have pin on the buffer, and must *not* have a lock on = it. > */ > void > -heap_page_prune_opt(Relation relation, Buffer buffer) > +heap_page_prune_opt(Relation relation, Buffer buffer, Buffer = *vmbuffer) > ``` >=20 > I don=E2=80=99t see why vmbuffer has to be of pointer type. Buffer = type is underlying int, I checked the last commit, vmbuffer only passes = in data into the function without passing out anything. >=20 > As we add the new parameter vmbuffer, though it=E2=80=99s not used in = this commit, I think it=E2=80=99d be better to update the header commit = to explain what this parameter will do. >=20 > 5 - 0006 > ``` > + * > + * heap_fix_vm_corruption() makes changes to the VM and, potentially, = the heap > + * page, but it does not need to be done in a critical section = because > + * clearing the VM is not WAL-logged. > + */ > +static void > +heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum) > ``` >=20 > Nit: why the last paragraph of the header comments uses the function = name instead of =E2=80=9Cthis function=E2=80=9D? Looks like a = copy-pasto. >=20 > 6 - 0006 > ``` > + if (prstate->lpdead_items > 0) > + { > + ereport(WARNING, > + (errcode(ERRCODE_DATA_CORRUPTED), > + errmsg("LP_DEAD item found on page marked as all-visible"), > + errdetail("relation \"%s\", page %u, tuple %u", > + RelationGetRelationName(prstate->relation), > + prstate->block, offnum))); > + } > + else > + { > + ereport(WARNING, > + (errcode(ERRCODE_DATA_CORRUPTED), > + errmsg("tuple not visible to all found on page marked as = all-visible"), > + errdetail("relation \"%s\", page %u, tuple %u", > + RelationGetRelationName(prstate->relation), > + prstate->block, offnum))); > + } > ``` >=20 > I recently just learned that a detail message should use complete = sentences, and end each with a period, and capitalize the first word of = sentences. See = https://www.postgresql.org/docs/current/error-style-guide.html. >=20 > 7 - 0006 > ``` > + else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS) > + { > + /* > + * As of PostgreSQL 9.2, the visibility map bit should never be set = if > + * the page-level bit is clear. However, it's possible that the bit > + * got cleared after heap_vac_scan_next_block() was called, so we = must > + * recheck with buffer lock before concluding that the VM is corrupt. > + */ > + ereport(WARNING, > + (errcode(ERRCODE_DATA_CORRUPTED), > + errmsg("page %u in \"%s\" is not marked all-visible but visibility = map bit is set", > + prstate->block, > + RelationGetRelationName(prstate->relation)))); > + } > ``` >=20 > The comment says =E2=80=9Cwe must recheck with buffer lock = before=E2=80=A6=E2=80=9D, but it only log a warning message. Is the = comment stale? >=20 > 8 - 0007 > ``` > +static void > +heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult = *presult) > +{ > + OffsetNumber maxoff =3D PageGetMaxOffsetNumber(prstate->page); > + Page page =3D prstate->page; > + > + Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN || > + (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE && > + !prstate->attempt_freeze)); > + > + /* We'll fill in presult for the caller */ > + memset(presult, 0, sizeof(PruneFreezeResult)); > + > + /* > + * Since the page is all-visible, a count of the normal ItemIds on = the > + * page should be sufficient for vacuum's live tuple count. > + */ > + for (OffsetNumber off =3D FirstOffsetNumber; > + off <=3D maxoff; > + off =3D OffsetNumberNext(off)) > + { > + if (ItemIdIsNormal(PageGetItemId(page, off))) > + prstate->live_tuples++; > + } > + > + presult->live_tuples =3D prstate->live_tuples; > + > + /* Clear any stale prune hint */ > + if (TransactionIdIsValid(PageGetPruneXid(page))) > + { > + PageClearPrunable(page); > + MarkBufferDirtyHint(prstate->buffer, true); > + } > + > + presult->vmbits =3D prstate->vmbits; > + > + if (!PageIsEmpty(page)) > + presult->hastup =3D true; > +} > ``` >=20 > * Given this function has done PageIsEmpty(page), that that is true, = we don=E2=80=99t need to count live_tuples, right? That could be a tiny = optimization. > * I see heap_page_bypass_prune_freeze() is only called in one place = and immediately after prune_freeze_setup() and heap_fix_vm_corruption(), = so prstate->vmbits must be 0, so do we need to do presult->vmbits =3D = prstate->vmbits;? > * Do we need to set all_visible and all_frozen to presult? >=20 > 0008 LGTM >=20 > I will continue with 0009 tomorrow. >=20 9 - 0009 =C2=B7=C2=B7=C2=B7 + * Currently, only VACUUM performs freezing, but other callers = may in the + * future. Other callers must initialize prstate.all_frozen to = false, =C2=B7=C2=B7=C2=B7 Nit: prstate.all_frozen -> prstate.set_all_frozen I saw you have fixed this in 0010, but I think it=E2=80=99s better also = fix it here. 10 - 0010 ``` + * Whether or not the page was newly set all-visible and = all-frozen during + * phase I of vacuuming. */ - uint8 vmbits; + BlockNumber new_all_visible_pages; + BlockNumber new_all_visible_frozen_pages; + BlockNumber new_all_frozen_pages; ``` These 3 fields are actually counts rather than pointers to blocks, using = type BlockNumber are quite confusing, though underlying BlockNumber is = uint32. I think they can be just int type. 11 - 0010 ``` + BlockNumber new_all_visible_pages; + BlockNumber new_all_visible_frozen_pages; + BlockNumber new_all_frozen_pages; ``` I don=E2=80=99t see where these 3 fields are initialized. In = lazy_scan_prune(), presult is defined as: ``` PruneFreezeResult presult; ``` So, those fields will hold random values. 12 - 0010 ``` + * conflict would ahve been handled in reaction to the WAL = record freezing ``` Nit: ahve -> have 0011 LGTM 13 - 0012 - bufmask.c ``` + * we don't mark the page all-visible. See = heap_xlog_prune_and_freeze() + * for more details. ``` I don=E2=80=99t find a function named heap_xlog_prune_and_freeze(). 14 - 0012 - heapam_xlog.c ``` + * same approach is taken when replaying XLOG_HEAP2_PRUNE* = records (see + * heap_xlog_prune_and_freeze()). ``` Same as 13. 0013 LGTM I will try to finish the rest 5 commits tomorrow. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/