Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vxDFg-003NQ6-2y for pgsql-hackers@arkaria.postgresql.org; Tue, 03 Mar 2026 00:04:41 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vxDFe-003fx5-2x for pgsql-hackers@arkaria.postgresql.org; Tue, 03 Mar 2026 00:04:39 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vxDFe-003fwx-1w for pgsql-hackers@lists.postgresql.org; Tue, 03 Mar 2026 00:04:39 +0000 Received: from mail-ed1-x533.google.com ([2a00:1450:4864:20::533]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1vxDFc-000000005yJ-3rGb for pgsql-hackers@lists.postgresql.org; Tue, 03 Mar 2026 00:04:38 +0000 Received: by mail-ed1-x533.google.com with SMTP id 4fb4d7f45d1cf-65f8c8c3a4aso9384180a12.2 for ; Mon, 02 Mar 2026 16:04:37 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1772496275; cv=none; d=google.com; s=arc-20240605; b=MIaloRWYguJ5W4seYbWF0VbHYZRS+WQBvK/meNI8Z9KaIcOq/GY0boohpusac6A8jn YNcAj4AfQ+HqLOTRX17YHyVd3jIiWVsVmly8uquKlZtx6Xu/PCDzeSaBJXrI7+EfFtAc hvW6LZtxff6l+Gs0TsCW58HJwgm6dHoC3wpA7nFr59juiIlR05mTxMSVR8/S2bkYMUfN NnD7LnlZr4EyRp8mj0ayyOnclKtWvB7mMzOCiNuXHIgWkSduq18oXLZMkm1SSUdY0tV8 Mk17Ik10N39r7zWYdVnwaxBD5+OJBrBornbVYHzwh6q54/VzOhe/sv2wC254WEaPRH9A HoVA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=eXewK6PRvmQvK5F4CC5vsYwuKFjQmRl+aMIq/n8j3+0=; fh=/L6FhSksF6H2jWPiFMC5MRpLTPfidRHTOICLuye4rjw=; b=OH2VVhfoejRPGsnoXWQf+oBH/lPTJLx8bRj7IaOk91ooWv3FnrAdUJ844rq3FrEAAx H9eCy/MBUSyoGFdCU/0YymcdQnE4q5ZdzyPi5jjgDQt5y8YiyC4UP0CX+rZTEufvBDW4 T2lco/ZrTcPGUMejaw1dqZjgVun2BBjYU8Psi7aVXBFe5v12ME1OtBbz65TieFjvKYVE dgL1Wku1qZR1xr7iRLmdloyV/u8ZM+R3j5FwD+K91cdbNPRe09QvEAp/vplCngqYLIfs Xc6wq+Zr3ytH60BsSd3nyC7S+wYhSBw9AP6VZkBglGx6vQMJiZP/QwlsjjL7smUtn5cB wcuA==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1772496275; x=1773101075; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=eXewK6PRvmQvK5F4CC5vsYwuKFjQmRl+aMIq/n8j3+0=; b=STIyMk0PR5PmuEs0pFJcEAN8pz7SfmnftZuo9EwpkP9YqAJimncBB4bm4bQOMaxVzg 4Ga+y0gW9iZYEjkG2lxKY1PJJkBqwo8d3Fxy6xPRaUmUMcsDJqKIAS6feF65O+5HV9wN hFB6z1jEzlNEqmLnwMqnSzk4pfPhB1SzOylm2sjnaUUtsES4cZBN0zXfY7xw/FpDNC4J 7ywmGi0ak9vf3EE0Me2qM8hGikDC4mbv/1VjA9KKDiDOWp/AfWU97UhBgGsXAtLRCXD+ SxvaskGqePbepsJ7sbKVEnWEBOTh2/osJ7GgEWDyErNmKZA0BLd5790l6CPL5qR7v/Fa Dn/A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1772496275; x=1773101075; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=eXewK6PRvmQvK5F4CC5vsYwuKFjQmRl+aMIq/n8j3+0=; b=NC0JNFsHlkEbI63gxma/iKYnwIiet36MWvBIHmd8ls0tQewyXC6i2au7DPLlndNd5l zFP0nxkMxecJH2Eh00NDMjIFWnm8spwIdtlUTzPV1D+3guvypffCxZsEaNoD8fa8rA1P LVAln0PTB/x6rp8PUBbzKGAY4aPLQrfKVSZHmyEiY696z3ak6XAEo4qnz5b9fKqxMXNd xADva45nXOEMFkEWVhFa4itWytgLyET/QjHQdXUqQl0XEcDaPF/whLYIU4DP29WHVYYG zQ4JtmmLkgzOq+I9mJVY7mYP5dq49iDbevITyzZeUPeDF4whWqg+L72EC6JNXW4fsHK5 TmsA== X-Forwarded-Encrypted: i=1; AJvYcCWk8KKTU7Emou/ExgUzGxk+FBz/nRmOvV9adOEbi22gF3somJvVa3MZeg5uOVOWuUHCIXkct9d47wwr9iKd@lists.postgresql.org X-Gm-Message-State: AOJu0Yzm+rGImS7skB3kJydy5RE4GLZM7uwUO3HIyMKNB8LTG9iIPH5t qWQ/fkUkY1qkxUU8GuFxK11xeWKD+bUNTmsens3j9TUOna73QTgUewgwCL7uW7f7r16P9k0HXIw HtDHbOOS24FzCczrX398y3f1nbLuGpE0= X-Gm-Gg: ATEYQzxaTQ82M2nWLMdCENEPmjKvucAFH/0HA7Xw9T4JC3Wgk52RabgU09uDasJF2t4 rGJOYGv6W7GHHl0kS+V02rP69PRxRrO8oltduKzsi7Zb6D/zTj+EMAckXiYk7hqTSlO1keI9xGY P4zvGBxzHi1jbo+yubdeA/h22a4DvaqVPByOMYIz1FSy099WArQHraTOywfFS+O/a4aGRS/ykNQ pBdZOkSOt66xalnhMzbytQt2y5miPEkP0zdtguaxORRy9901mCs43zySZO3Kd0hLi0R0iB4hl69 Z5f93z8QMpJhyOhKP+dZYF/EYG37SD6LFRfs4roqaDuDBETpnKQNP9WvxNpsnytIvtlP2669Bzl t7yUAsWdgfVtKNA817qU= X-Received: by 2002:a05:6402:51ce:b0:658:377e:f05d with SMTP id 4fb4d7f45d1cf-65fddee8c32mr7423254a12.24.1772496274933; Mon, 02 Mar 2026 16:04:34 -0800 (PST) MIME-Version: 1.0 References: <6BC5DBAB-6084-4BB8-8450-52E9648AB021@gmail.com> <7F5BCD7A-764D-4D8D-8E27-6F2CAAEA1CEE@gmail.com> <4379FDA3-9446-4E2C-9C15-32EFE8D4F31B@yandex-team.ru> <7ib3sa55sapwjlaz4sijbiq7iezna27kjvvvar4dpgkmadml6t@gfpkkwmdnepx> In-Reply-To: From: Melanie Plageman Date: Mon, 2 Mar 2026 19:04:24 -0500 X-Gm-Features: AaiRm538wCU0nk25fxtndlSuXIEwLHhvi0SJDbbFneiu0AZSiQoH0Jff_BP1vzA Message-ID: Subject: Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) To: Andres Freund Cc: Andrey Borodin , Kirill Reshke , Chao Li , Xuneng Zhou , Robert Haas , PostgreSQL Hackers , Heikki Linnakangas Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Fri, Feb 20, 2026 at 4:34=E2=80=AFPM Andres Freund = wrote: > > On 2026-01-28 18:16:10 -0500, Melanie Plageman wrote: > > Subject: [PATCH v34 13/14] Allow on-access pruning to set pages all-vis= ible > > > > Many queries do not modify the underlying relation. For such queries, i= f > > on-access pruning occurs during the scan, we can check whether the page > > has become all-visible and update the visibility map accordingly. > > Previously, only vacuum and COPY FREEZE marked pages as all-visible or > > all-frozen. > > > > This commit implements on-access VM setting for sequential scans as wel= l > > as for the underlying heap relation in index scans and bitmap heap > > scans. > > For evaluating this, did you build anything that evaluates the frequency = of > this succeeding, causing unnecessary un-all-visibling etc during benchmar= ks? I didn't develop a specific micro-benchmark for this, but I did run some generic pgbenches (which does a single tuple update on accounts followed by a select) because I thought there would be a good amount of un-all-visibling there. I didn't gather stats to confirm though and who knows with a random data distribution (IIRC it was a relatively small working set, but still). I can develop something more targeted, though. > > @@ -631,7 +632,9 @@ heap_prepare_pagescan(TableScanDesc sscan) > > /* > > * Prune and repair fragmentation for the whole page, if possible= . > > */ > > - heap_page_prune_opt(scan->rs_base.rs_rd, buffer); > > + if (sscan->rs_flags & SO_HINT_REL_READ_ONLY) > > + vmbuffer =3D &scan->rs_vmbuffer; > > + heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer); > > I don't love that the signalling to heap_page_prune_opt() about this is b= y > passing vmbuffer or NULL. v35 is more explicit and heap_page_prune_opt() has a rel_read_only flag. > We clearly don't want to actually freeze rows if we're doing an update an= d > might just update the rows again. But it's less clear to me that, if we a= re > pruning dead row versions *and* the page is already all-visible after tha= t > (say because only HOT versions were removed), we shouldn't mark the page = as > such? If we're doing an update and the new tuple fits on the same page, then the page will not be all-visible by the time the update is over, right? And if the new tuple doesn't fit on the same page as the old tuple, then while it would be nice to mark the old page as all-visible, don't we on-access prune the page before actually updating the tuple? Like we are scanning in the old page to update it and on-access prune then to make space for it and then we make the page modification. > > @@ -306,6 +312,13 @@ heap_page_prune_opt(Relation relation, Buffer buff= er) > > .cutoffs =3D NULL, > > }; > > > > + if (vmbuffer) > > + { > > + visibilitymap_pin(relation, BufferGetBloc= kNumber(buffer), vmbuffer); > > + params.options |=3D HEAP_PAGE_PRUNE_UPDAT= E_VM; > > + params.vmbuffer =3D *vmbuffer; > > Why do we pin the buffer at this time, rather than deferring that until w= e > actually need it? I guess we just always will access it, but that doesn'= t > seem like it's inherent (c.f. my earlier points about a faster exit when > looking at an already all-frozen page or such). We would need to pin the VM to see if it is all-frozen to exit early. For the on-access case, since we won't freeze, we could rely on PD_ALL_VISIBLE to exit early, but that means we wouldn't be able to identify and fix PD_ALL_VISIBLE/VM-all-visible mismatches. > It's not clear to me why we are pinning the page in lazy_scan_heap(), bef= ore > it's clear that we need it, either. But there the cost is often very low= , > because we have a lot of sequential accesses. But here we might be calle= d > from an index scan, with very little locality of access. Now that, as of v35, we check for VM corruption unconditionally at the start of heap_page_prune_and_freeze() and check the VM to potentially exit early, there's no benefit in deferring pinning the VM in either vacuum or on-access. - Melanie