Date: Thu, 18 Sep 2025 12:48:45 -0400
From: Andres Freund
To: Melanie Plageman
Cc: Robert Haas, Kirill Reshke, Andrey Borodin, PostgreSQL Hackers, Heikki Linnakangas
Subject: Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

Hi,

On 2025-09-17 20:10:07 -0400, Melanie Plageman wrote:
> 0001 is RFC but waiting on one other reviewer

> From cacff6c95e38d370b87148bc48cf6ac5f086ed07 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman
> Date: Tue, 17 Jun 2025 17:22:10 -0400
> Subject: [PATCH v14 01/24] Eliminate COPY FREEZE use of XLOG_HEAP2_VISIBLE

> diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
> index cf843277938..faa7c561a8a 100644
> --- a/src/backend/access/heap/heapam_xlog.c
> +++ b/src/backend/access/heap/heapam_xlog.c
> @@ -551,6 +551,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
>      int         i;
>      bool        isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
>      XLogRedoAction action;
> +    Buffer      vmbuffer = InvalidBuffer;
>
>      /*
>       * Insertion doesn't overwrite MVCC data, so no conflict processing is
> @@ -571,11 +572,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
>      if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
>      {
>          Relation    reln = CreateFakeRelcacheEntry(rlocator);
> -        Buffer      vmbuffer = InvalidBuffer;
>
>          visibilitymap_pin(reln, blkno, &vmbuffer);
>          visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
>          ReleaseBuffer(vmbuffer);
> +        vmbuffer = InvalidBuffer;
>          FreeFakeRelcacheEntry(reln);
>      }
>
> @@ -662,6 +663,57 @@ heap_xlog_multi_insert(XLogReaderState *record)
>      if (BufferIsValid(buffer))
>          UnlockReleaseBuffer(buffer);
>
> +    buffer = InvalidBuffer;
> +
> +    /*
> +     * Now read and update the VM block.
> +     *
> +     * Note that the heap relation may have been dropped or truncated, leading
> +     * us to skip updating the heap block due to the LSN interlock.

I don't fully understand this - how does dropping/truncating the relation lead
to skipping due to the LSN interlock?

> +     * even in that case, it's still safe to update the visibility map. Any
> +     * WAL record that clears the visibility map bit does so before checking
> +     * the page LSN, so any bits that need to be cleared will still be
> +     * cleared.
> +     *
> +     * Note that the lock on the heap page was dropped above. In normal
> +     * operation this would never be safe because a concurrent query could
> +     * modify the heap page and clear PD_ALL_VISIBLE -- violating the
> +     * invariant that PD_ALL_VISIBLE must be set if the corresponding bit in
> +     * the VM is set.
> +     *
> +     * In recovery, we expect no other writers, so writing to the VM page
> +     * without holding a lock on the heap page is considered safe enough. It
> +     * is done this way when replaying xl_heap_visible records (see
> +     * heap_xlog_visible()).
> +     */
> +    if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
> +        XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
> +                                      &vmbuffer) == BLK_NEEDS_REDO)
> +    {

Why are we using RBM_ZERO_ON_ERROR here? I know it's copied from
heap_xlog_visible(), but I don't immediately understand (or remember) why we
do so there either?

> +        Page        vmpage = BufferGetPage(vmbuffer);
> +        Relation    reln = CreateFakeRelcacheEntry(rlocator);

Hm. Do we really need to continue doing this ugly fake relcache stuff? I'd
really like to eventually get rid of that and given that the new "code shape"
delegates a lot more responsibility to the redo routines, they should have a
fairly easy time not needing a fake relcache? Afaict the relation already is
not used outside of debugging paths?

> +        /* initialize the page if it was read as zeros */
> +        if (PageIsNew(vmpage))
> +            PageInit(vmpage, BLCKSZ, 0);
> +
> +        visibilitymap_set_vmbits(reln, blkno,
> +                                 vmbuffer,
> +                                 VISIBILITYMAP_ALL_VISIBLE |
> +                                 VISIBILITYMAP_ALL_FROZEN);
> +
> +        /*
> +         * It is not possible that the VM was already set for this heap page,
> +         * so the vmbuffer must have been modified and marked dirty.
> +         */

I assume that's because we a) checked the LSN interlock b) are replaying
something that needed to newly set the bit?

Except for the above comments, this looks pretty good to me.

Seems 0002 should just be applied...

Re 0003: I wonder if it's getting to the point that a struct should be used as
the argument.

Greetings,

Andres Freund
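
For reference, the "LSN interlock" that the quoted comment relies on can be
sketched as a toy model. This is only an illustration, not PostgreSQL code:
ToyPage, toy_redo_action() and toy_replay_set_flag() are hypothetical names,
and the real check is made by XLogReadBufferForRedo() and its Extended
variant. The assumption modeled here is that a record is replayed against a
page only if the page's LSN is older than the record's LSN, and that each
page referenced by a record is checked separately.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names exist in PostgreSQL. */
typedef uint64_t ToyLsn;

typedef enum
{
	TOY_BLK_NEEDS_REDO,			/* page is older than the record: replay */
	TOY_BLK_DONE				/* page already reflects the record: skip */
} ToyRedoAction;

typedef struct ToyPage
{
	ToyLsn		lsn;			/* LSN of the last record applied to the page */
	bool		flag;			/* stands in for a VM bit or PD_ALL_VISIBLE */
} ToyPage;

/* The interlock: skip the record if the page is at or past its LSN. */
static ToyRedoAction
toy_redo_action(const ToyPage *page, ToyLsn record_lsn)
{
	return (record_lsn <= page->lsn) ? TOY_BLK_DONE : TOY_BLK_NEEDS_REDO;
}

/*
 * Replay "set the flag" against one page. Returns true if the page was
 * modified. The check is per page, so one record can be skipped for one
 * page and still applied to another.
 */
static bool
toy_replay_set_flag(ToyPage *page, ToyLsn record_lsn)
{
	if (toy_redo_action(page, record_lsn) != TOY_BLK_NEEDS_REDO)
		return false;

	page->flag = true;
	page->lsn = record_lsn;		/* advance the LSN so a re-replay is a no-op */
	return true;
}
```

In this model a heap page with a newer LSN is left untouched while the VM page
for the same record is still updated, which is the situation the quoted
comment describes.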