From: Antonin Houska
To: Amit Kapila
Cc: Mihail Nikalayeu, Andres Freund, Alvaro Herrera, Srinath Reddy Sadipiralla, Matthias van de Meent, Pg Hackers, Robert Treat
Subject: Re: Adding REPACK [concurrently]
References: <202604071230.b5axxf3qna3m@alvherre.pgsql> <227677.1775576304@localhost> <85813.1777901089@localhost> <27869.1777985266@localhost> <70574.1778512672@localhost> <82942.1778527801@localhost>
Comments: In-reply-to Amit Kapila message dated "Tue, 12 May 2026 13:27:29 +0530."
Date: Tue, 12 May 2026 13:08:20 +0200
Message-ID: <40976.1778584100@localhost>

Amit Kapila wrote:

> On Tue, May 12, 2026 at 1:00 AM Antonin Houska wrote:
> >
> > Antonin Houska wrote:
> >
> > > Amit Kapila wrote:
> > >
> > > > On Tue, May 5, 2026 at 6:17 PM Antonin Houska wrote:
> > > > >
> > > > > Antonin Houska wrote:
> > > > >
> > > > > I think the problem is that with database-specific snapshot,
> > > > > SnapBuildProcessRunningXacts() returns early, w/o adjusting builder->xmin
> > > > >
> > > > >     /*
> > > > >      * Database specific transaction info may exist to reach CONSISTENT state
> > > > >      * faster, however the code below makes no use of it. Moreover, such
> > > > >      * record might cause problems because the following normal (cluster-wide)
> > > > >      * record can have lower value of oldestRunningXid. In that case, let's
> > > > >      * wait with the cleanup for the next regular cluster-wide record.
> > > > >      */
> > > > >     if (OidIsValid(running->dbid))
> > > > >         return;
> > > > >
> > > > > and thus some transactions whose XID is below running->oldestRunningXid may
> > > > > continue to be incorrectly considered running.
> > > > >
> > > > > I originally thought that this should not happen because such transactions
> > > > > will be added to the builder's array of committed transactions by
> > > > > SnapBuildCommitTxn() anyway. However, I failed to notice that COMMIT record of
> > > > > a transaction listed in the xl_running_xacts WAL record is not guaranteed to
> > > > > follow the xl_running_xacts record in WAL. In other words, even if
> > > > > xl_running_xacts is created before a COMMIT record of the contained
> > > > > transaction, it may end up at higher LSN in WAL. So the cleanup I relied on
> > > > > might not take place.
> > > > >
> > > >
> > > > BTW, is it possible to write a test by using injection_points or via
> > > > manual steps (by using debugger, etc) so that we can more clearly
> > > > understand this problem and proposed fix?
> > >
> > > So far I could observe the situation in WAL, but have no idea how it can
> > > happen.
> > > For example, transaction 49242 gets committed here
> > >
> > > rmgr: Transaction len (rec/tot): 46/ 46, tx: 49242, lsn: 0/18BC28C8, prev
> > > 0/18BC2890, desc: COMMIT 2026-05-11 16:38:16.603265 CEST
> > >
> > > and then it appears in the 'xids' list of RUNNING_XACTS:
> > >
> > > rmgr: Standby len (rec/tot): 106/ 106, tx: 0, lsn:
> > > 0/18BC3140, prev 0/18BC3100, desc: RUNNING_XACTS nextXid 49255
> > > latestCompletedXid 49241 oldestRunningXid 49242; 13 xacts: 49248 49249 49246
> > > 49243 49252 49251 49244 49245 49242 49250 49253 49254 49247; dbid:5
> > >
> > >
> > > I thought the situation is quite common (and therefore nothing of
> > > SnapBuildProcessRunningXacts() should be skipped), but when trying to
> > > reproduce the problem, I noticed that LogStandbySnapshot() shouldn't allow
> > > that ordering issue when logical decoding is enabled:
> > >
> > >     /*
> > >      * GetRunningTransactionData() acquired ProcArrayLock, we must release it.
> > >      * For Hot Standby this can be done before inserting the WAL record
> > >      * because ProcArrayApplyRecoveryInfo() rechecks the commit status using
> > >      * the clog. For logical decoding, though, the lock can't be released
> > >      * early because the clog might be "in the future" from the POV of the
> > >      * historic snapshot. This would allow for situations where we're waiting
> > >      * for the end of a transaction listed in the xl_running_xacts record
> > >      * which, according to the WAL, has committed before the xl_running_xacts
> > >      * record. Fortunately this routine isn't executed frequently, and it's
> > >      * only a shared lock.
> > >      */
> > >     if (!logical_decoding_enabled)
> > >         LWLockRelease(ProcArrayLock);
> > >
> > > So I don't have the answer right now.
> >
> > I think now that "waiting for the end of a transaction listed in the
> > xl_running_xacts record" in the comment is about transaction removal from
> > procarray.
> > However, the COMMIT record can still be ahead of xl_running_xacts
> > because RecordTransactionCommit() is called before
> > ProcArrayEndTransaction().
> >
>
> I see your point. Due to this, once the xmin regresses based on
> cluster-wide running_xact, some transaction could appear to be running
> when it should have appeared as committed.

The problem is that xmin does not advance when it should. Attached is a test
that reproduces the problem (it includes [1], to handle injection points in a
background worker). I hope the comments in the specification file are helpful.

It's actually not exactly the problem reported in the stress test, but IMO the
core issue is the same: the effects of some transactions are lost. In the stress
test, a tuple deletion was lost, so the error was "could not create unique
index". Here I only demonstrate a lost INSERT.

> Assuming, the problematic case is something
> like what I described, even than the fix of skipping cluster-wide
> running xacts and instead LOG db-specific running_xacts to help
> updating builder's xmin sounds inelegant and probably inefficient. For
> example, I think such a dependency means we can never enable
> db-specific snapshots on standby.

ok

[1] https://www.postgresql.org/message-id/4703.1774250534%40localhost

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

--=-=-=
Content-Type: text/x-diff
Content-Disposition: attachment; filename=0001-Test-to-demonstrate-bug-in-commit-0d3dba38c7-and-.patch