From: Antonin Houska
To: Andres Freund
Cc: Kirill Reshke, Heikki Linnakangas, Melanie Plageman, Matthias van de Meent, pgsql-hackers@postgresql.org, Thomas Munro, Noah Misch, Robert Haas, Michael Paquier
Subject: Re: Buffer locking is special (hints, checksums, AIO writes)
Date: Tue, 10 Feb 2026 08:46:27 +0100
Message-ID: <19720.1770709587@localhost>

Andres Freund wrote:

> > While troubleshooting REPACK issue [1], I realized that
> > HeapTupleSatisfiesMVCCBatch() can also be called during logical decoding - in
> > that case we need to use a historic MVCC snapshot.
>
> Huh. Indeed. That's unintentional - the path should never have been reached,
> we are checking that an MVCC snapshot is used. Unfortunately, somebody
> (i.e. probably me) at some point defined the relevant macro as
>
> /* This macro encodes the knowledge of which snapshots are MVCC-safe */
> #define IsMVCCSnapshot(snapshot)  \
>     ((snapshot)->snapshot_type == SNAPSHOT_MVCC || \
>      (snapshot)->snapshot_type == SNAPSHOT_HISTORIC_MVCC)
>
> Which makes sense for some places, but not for plenty others.
>
> The reason this didn't cause more widespread issues is that during logical
> decoding we mostly don't use sequential scans etc. that are affected by
> these paths.

> > My proposal to fix the problem is attached.
>
> That's imo not at all the right fix - it'd make visibility checking during
> seqscans noticeably slower.

OK.

> I think we ought to instead restrict the page-at-a-time scans to only happen
> with "real" mvcc snapshots. I.e. this:
>
>     /*
>      * Disable page-at-a-time mode if it's not a MVCC-safe snapshot.
>      */
>     if (!(snapshot && IsMVCCSnapshot(snapshot)))
>         scan->rs_base.rs_flags &= ~SO_ALLOW_PAGEMODE;
>
> should trigger for historic snapshots as well.

I suppose you mean changing it to

    if (!(snapshot && IsMVCCSnapshot(snapshot) &&
          !IsHistoricMVCCSnapshot(snapshot)))
        scan->rs_base.rs_flags &= ~SO_ALLOW_PAGEMODE;

> Does that fix the issue for you?

Yes, with this change, I don't hit the problem anymore.

> What's your reproducer?

Check out this branch

https://github.com/michail-nikolaev/postgres/tree/repack_concurrently_repro_22

and run t/008_repack_concurrently.pl in contrib/amcheck. The error we saw in
most cases was "ERROR: cache lookup failed for relation".
I noticed that the related pg_class entries had hint bits set incorrectly, so
I added the following to see when exactly it happens (actually I used a lower
elevel first, to find out that the decoding worker is responsible for the
problem):

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 75ae268d753..ebf38460873 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -73,6 +73,7 @@
 #include "access/transam.h"
 #include "access/xact.h"
 #include "access/xlog.h"
+#include "commands/cluster.h"
 #include "storage/bufmgr.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
@@ -938,6 +939,8 @@ HeapTupleSatisfiesMVCC(HeapTuple htup, Snapshot snapshot,
                             HeapTupleHeaderGetRawXmin(tuple));
             else
             {
+                if (am_decoding_for_repack())
+                    elog(PANIC, "HEAP_XMIN_INVALID set");
                 /* it must have aborted or crashed */
                 SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
                             InvalidTransactionId);

The backtrace looked like:

#3  0x0000000000c367ad errfinish (postgres + 0x8367ad)
#4  0x0000000000517ee1 HeapTupleSatisfiesMVCC (postgres + 0x117ee1)
#5  0x0000000000518e14 HeapTupleSatisfiesMVCCBatch (postgres + 0x118e14)
#6  0x000000000050483e page_collect_tuples (postgres + 0x10483e)
#7  0x0000000000504a8e heap_prepare_pagescan (postgres + 0x104a8e)
#8  0x0000000000505344 heapgettup_pagemode (postgres + 0x105344)
#9  0x0000000000505cff heap_getnextslot (postgres + 0x105cff)
#10 0x000000000052ca08 table_scan_getnextslot (postgres + 0x12ca08)
#11 0x000000000052d4ed systable_getnext (postgres + 0x12d4ed)
#12 0x0000000000c2098e ScanPgRelation (postgres + 0x82098e)
#13 0x0000000000c23aa2 RelationReloadIndexInfo (postgres + 0x823aa2)
#14 0x0000000000c24376 RelationRebuildRelation (postgres + 0x824376)
#15 0x0000000000c23784 RelationIdGetRelation (postgres + 0x823784)
#16 0x00000000004b558f relation_open (postgres + 0xb558f)
#17 0x000000000052e073 index_open (postgres + 0x12e073)
#18 0x000000000052d0c5 systable_beginscan (postgres + 0x12d0c5)
#19 0x0000000000c2097e ScanPgRelation (postgres + 0x82097e)
#20 0x0000000000c23dde RelationReloadNailed (postgres + 0x823dde)
#21 0x0000000000c24399 RelationRebuildRelation (postgres + 0x824399)
#22 0x0000000000c23784 RelationIdGetRelation (postgres + 0x823784)
#23 0x00000000004b558f relation_open (postgres + 0xb558f)
#24 0x0000000000573ab1 table_open (postgres + 0x173ab1)
#25 0x0000000000c20919 ScanPgRelation (postgres + 0x820919)
#26 0x0000000000c21c98 RelationBuildDesc (postgres + 0x821c98)
#27 0x0000000000c237d6 RelationIdGetRelation (postgres + 0x8237d6)
#28 0x00000000009a028a ReorderBufferProcessTXN (postgres + 0x5a028a)
#29 0x00000000009a10ed ReorderBufferReplay (postgres + 0x5a10ed)
#30 0x00000000009a116b ReorderBufferCommit (postgres + 0x5a116b)
#31 0x000000000098c7fe DecodeCommit (postgres + 0x58c7fe)
#32 0x000000000098ba67 xact_decode (postgres + 0x58ba67)
#33 0x000000000098b685 LogicalDecodingProcessRecord (postgres + 0x58b685)
#34 0x00000000006a3e4b decode_concurrent_changes (postgres + 0x2a3e4b)
#35 0x00000000006a5ea3 repack_worker_internal (postgres + 0x2a5ea3)
#36 0x00000000006a5d5e RepackWorkerMain (postgres + 0x2a5d5e)
#37 0x0000000000961b2b BackgroundWorkerMain (postgres + 0x561b2b)
#38 0x0000000000964a61 postmaster_child_launch (postgres + 0x564a61)
#39 0x000000000096b58d StartBackgroundWorker (postgres + 0x56b58d)
#40 0x000000000096b80a maybe_start_bgworkers (postgres + 0x56b80a)
#41 0x000000000096a5d9 LaunchMissingBackgroundProcesses (postgres + 0x56a5d9)
#42 0x00000000009683fc ServerLoop (postgres + 0x5683fc)
#43 0x0000000000967d37 PostmasterMain (postgres + 0x567d37)
#44 0x0000000000813421 main (postgres + 0x413421)

Thanks.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
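
Appendix (illustration only): the page-mode condition discussed in this message can be tried outside the tree with a minimal standalone sketch like the one below. The enum, the struct, the SO_ALLOW_PAGEMODE bit value and the helper adjust_scan_flags() are simplified, hypothetical stand-ins and not the PostgreSQL definitions. IsMVCCSnapshot() mirrors the macro quoted above, while the body of IsHistoricMVCCSnapshot() is an assumption about how that macro is defined.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for illustration; NOT the PostgreSQL definitions. */
typedef enum SnapshotType
{
    SNAPSHOT_MVCC,
    SNAPSHOT_SELF,
    SNAPSHOT_ANY,
    SNAPSHOT_HISTORIC_MVCC
} SnapshotType;

typedef struct SnapshotData
{
    SnapshotType snapshot_type;
} SnapshotData;

typedef SnapshotData *Snapshot;

/* Hypothetical bit value; only the name comes from the thread. */
#define SO_ALLOW_PAGEMODE (1u << 0)

/* Same shape as the macro quoted in the message. */
#define IsMVCCSnapshot(snapshot) \
    ((snapshot)->snapshot_type == SNAPSHOT_MVCC || \
     (snapshot)->snapshot_type == SNAPSHOT_HISTORIC_MVCC)

/* Assumed definition: true only for historic (logical decoding) snapshots. */
#define IsHistoricMVCCSnapshot(snapshot) \
    ((snapshot)->snapshot_type == SNAPSHOT_HISTORIC_MVCC)

/*
 * Hypothetical helper: clear the page-mode bit unless the snapshot is a
 * "real" (non-historic) MVCC snapshot, as in the condition proposed above.
 */
static uint32_t
adjust_scan_flags(Snapshot snapshot, uint32_t flags)
{
    if (!(snapshot && IsMVCCSnapshot(snapshot) &&
          !IsHistoricMVCCSnapshot(snapshot)))
        flags &= ~SO_ALLOW_PAGEMODE;
    return flags;
}
```

With these stand-ins, only a plain SNAPSHOT_MVCC snapshot keeps SO_ALLOW_PAGEMODE set. A historic snapshot, a non-MVCC snapshot and a NULL snapshot all have the bit cleared, so they would take the tuple-at-a-time path.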