Date: Wed, 25 Mar 2026 10:50:53 +0900
From: Michael Paquier
To: Andy Fan
Cc: PostgreSQL Hackers
Subject: Re: raise ERROR between EndPrepare and PostPrepare_Locks causes ROLLBACK 2pc PAINC
References: <87341p7dc4.fsf@163.com> <87h5q468us.fsf@163.com>

On Wed, Mar 25, 2026 at 08:39:07AM +0800, Andy Fan wrote:
> I found a similar but not exactly same case at 2014 [1] which
> might be helpful to recall a boarder understanding on this area.
>
> [1] https://www.postgresql.org/message-id/534AF601.1030007%40vmware.com

Incorrect shared state when an ERROR happens at an arbitrary location
is usually bad, yes. For this one, your suggestion of delaying the
end of the critical section started at StartPrepare() and ending in
EndPrepare() is not an acceptable solution as far as I can see,
unfortunately: it would mean doing a SyncRepWaitForLSN() while in a
critical section, and I doubt we'd want to do that.

Anyway, I doubt that this one is worth caring about. The current 2PC
locking scheme means, as far as I remember, that it is not really
possible for an external command to interact with a specific session
between the EndPrepare() and the PostPrepare_Locks() calls.

To put it in other words, let's imagine that we use a breakpoint
between these two calls (or a wait injection point if you automate
that). Is it possible for a second backend to mess with the state of
the first backend while it waits for its locks to be transferred to
the dummy PGPROC entry? That's what the 2014 thread is about: there
was a race condition reachable between two sessions. If the answer to
this question is yes, I'd agree that this is something that deserves
a closer look.
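
As a rough illustration of that thought experiment, here is a toy model
in Python. It is not PostgreSQL code: ToyEntry, ToyTwoPhaseState and all
method names are made up for this sketch. It only assumes that the
preparing backend keeps the 2PC entry marked as locked until
PostPrepare_Twophase(), so a second backend asking for the entry in the
meantime is refused.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class EntryBusy(Exception):
    """The toy 2PC entry is already locked by another backend."""


@dataclass
class ToyEntry:
    gid: str
    valid: bool = False                    # set once the prepare is finished
    locking_backend: Optional[int] = None  # backend currently holding the entry


class ToyTwoPhaseState:
    """Hypothetical stand-in for the shared array of prepared transactions."""

    def __init__(self) -> None:
        self.entries: Dict[str, ToyEntry] = {}

    def begin_prepare(self, gid: str, backend: int) -> None:
        # The preparing backend creates the entry and locks it right away.
        self.entries[gid] = ToyEntry(gid, locking_backend=backend)

    def end_prepare(self, gid: str) -> None:
        # The entry becomes visible to others, but it is still locked.
        self.entries[gid].valid = True

    def post_prepare_twophase(self, gid: str) -> None:
        # Last step of the prepare: the first backend lets go of the entry.
        self.entries[gid].locking_backend = None

    def lock_entry(self, gid: str, backend: int) -> ToyEntry:
        # What a second backend would do before COMMIT/ROLLBACK PREPARED.
        entry = self.entries.get(gid)
        if entry is None or not entry.valid:
            raise LookupError(f"no prepared transaction {gid!r}")
        if entry.locking_backend is not None:
            raise EntryBusy(f"prepared transaction {gid!r} is busy")
        entry.locking_backend = backend
        return entry


def second_backend_outcome(state: ToyTwoPhaseState, gid: str, backend: int) -> str:
    """Return 'missing', 'busy' or 'locked' for an attempt by another backend."""
    try:
        state.lock_entry(gid, backend)
        return "locked"
    except EntryBusy:
        return "busy"
    except LookupError:
        return "missing"
```

Pausing the first backend between end_prepare() and
post_prepare_twophase() in this model corresponds to the breakpoint
described above; the second backend can only take the entry once the
first one has released it.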

And before you ask: attempting to interact with a 2PC state from a
second session while a first session waits between these two points
would not work: the 2PC entry is locked, and that lock is only
released after EndPrepare() and PostPrepare_Locks(), in
PostPrepare_Twophase(). Trying to request access to this entry fails,
as the first backend is marked as locking it. A second backend
attempting to lock it would fail, complaining that the 2PC entry with
the given GID is "busy".

SyncRepWaitForLSN() would be a problematic pattern between the
EndPrepare() and the PostPrepare_Locks(), but we never ERROR there on
purpose: even if we cancel while waiting for a transaction commit we'd
just get a WARNING, meaning that we'd be able to transfer our locks
anyway.

Or perhaps you have a realistic scenario where it is possible to mess
up the shared state, other than an elog(ERROR) forced between these
two points?
--
Michael