From: Antonin Houska
To: Amit Kapila
Cc: Mihail Nikalayeu, Andres Freund, Alvaro Herrera, Srinath Reddy Sadipiralla, Matthias van de Meent, Pg Hackers, Robert Treat
Subject: Re: Adding REPACK [concurrently]
Comments: In-reply-to Amit Kapila message dated "Sun, 10 May 2026 17:01:04 +0530."
Date: Mon, 11 May 2026 17:17:52 +0200
Message-ID: <70574.1778512672@localhost>

Amit Kapila wrote:

> On Tue, May 5, 2026 at 6:17 PM Antonin Houska wrote:
> >
> > Antonin Houska wrote:
> >
> > I think the problem is that with database-specific snapshot,
> > SnapBuildProcessRunningXacts() returns early, w/o adjusting builder->xmin
> >
> > 	/*
> > 	 * Database specific transaction info may exist to reach CONSISTENT state
> > 	 * faster, however the code below makes no use of it. Moreover, such
> > 	 * record might cause problems because the following normal (cluster-wide)
> > 	 * record can have lower value of oldestRunningXid. In that case, let's
> > 	 * wait with the cleanup for the next regular cluster-wide record.
> > 	 */
> > 	if (OidIsValid(running->dbid))
> > 		return;
> >
> > and thus some transactions whose XID is below running->oldestRunningXid may
> > continue to be incorrectly considered running.
> >
> > I originally thought that this should not happen because such transactions
> > will be added to the builder's array of committed transactions by
> > SnapBuildCommitTxn() anyway. However, I failed to notice that COMMIT record of
> > a transaction listed in the xl_running_xacts WAL record is not guaranteed to
> > follow the xl_running_xacts record in WAL. In other words, even if
> > xl_running_xacts is created before a COMMIT record of the contained
> > transaction, it may end up at higher LSN in WAL. So the cleanup I relied on
> > might not take place.
> >
>
> BTW, is it possible to write a test by using injection_points or via
> manual steps (by using debugger, etc) so that we can more clearly
> understand this problem and proposed fix?

So far I have only observed the situation in WAL; I have no idea how it can
happen. For example, transaction 49242 commits here:

rmgr: Transaction len (rec/tot): 46/ 46, tx: 49242, lsn: 0/18BC28C8, prev 0/18BC2890, desc: COMMIT 2026-05-11 16:38:16.603265 CEST

and then it appears in the 'xids' list of RUNNING_XACTS:

rmgr: Standby len (rec/tot): 106/ 106, tx: 0, lsn: 0/18BC3140, prev 0/18BC3100, desc: RUNNING_XACTS nextXid 49255 latestCompletedXid 49241 oldestRunningXid 49242; 13 xacts: 49248 49249 49246 49243 49252 49251 49244 49245 49242 49250 49253 49254 49247; dbid:5

I thought the situation was quite common (and therefore that no part of
SnapBuildProcessRunningXacts() should be skipped), but while trying to
reproduce the problem, I noticed that LogStandbySnapshot() should not allow
this ordering when logical decoding is enabled:

	/*
	 * GetRunningTransactionData() acquired ProcArrayLock, we must release it.
	 * For Hot Standby this can be done before inserting the WAL record
	 * because ProcArrayApplyRecoveryInfo() rechecks the commit status using
	 * the clog. For logical decoding, though, the lock can't be released
	 * early because the clog might be "in the future" from the POV of the
	 * historic snapshot. This would allow for situations where we're waiting
	 * for the end of a transaction listed in the xl_running_xacts record
	 * which, according to the WAL, has committed before the xl_running_xacts
	 * record. Fortunately this routine isn't executed frequently, and it's
	 * only a shared lock.
	 */
	if (!logical_decoding_enabled)
		LWLockRelease(ProcArrayLock);

So I don't have the answer right now.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com