From: Antonin Houska
To: Mihail Nikalayeu
Cc: Amit Kapila, Andres Freund, Alvaro Herrera, Srinath Reddy Sadipiralla, Matthias van de Meent, Pg Hackers, Robert Treat
Subject: Re: Adding REPACK [concurrently]
In-reply-to: <85813.1777901089@localhost>
References: <202604071230.b5axxf3qna3m@alvherre.pgsql> <227677.1775576304@localhost> <85813.1777901089@localhost>
Comments: In-reply-to Antonin Houska message dated "Mon, 04 May 2026 15:24:49 +0200."
Date: Tue, 05 May 2026 14:47:46 +0200
Message-ID: <27869.1777985266@localhost>

Antonin Houska wrote:

> Mihail Nikalayeu wrote:
>
> > On Mon, Apr 27, 2026 at 6:25 AM Amit Kapila wrote:
> > > Alvaro, others, what is your take on this?
> >
> > I agree with you here - we should AT LEAST make that an ERROR instead
> > of an assert and also check it during cache access (not only during
> > the scan because of cache misses).
> > But I think it will still be fragile in case of some extensions installed.
> >
> > Anyway... We also have an issue with correctness right now.
> >
> > I took the old stress test from [0] (the first two) and it fails now,
> > even with the fix from [1] ("Possible premature SNAPBUILD_CONSISTENT
> > with DB-specific running_xacts").
> >
> > It looks like [1] fixes 008_repack_concurrently.pl, but
> > 007_repack_concurrently.pl fails anyway, including
> >
> > pgbench: error: client 1 script 0 aborted in command 10 query 0:
> > ERROR: could not create unique index "tbl_pkey_repacknew"
> > # DETAIL: Key (i)=(383) is duplicated.
> > and
> > 'pgbench: error: pgbench:client 23 script 0 aborted in command 31
> > query 0: ERROR: division by zero
> >
> > Last one is not MVCC-related; you can see from the logs that it
> > performs something like SELECT (509063) / 0 when the table sum
> > changes.
> >
> > Setting need_shared_catalogs = true make them pass, so something is
> > wrong with its correctness.
>
> Thanks for testing again. Whether we keep the "database specific slots" or
> not, it'd be good to know what exactly the reason of these errors is. I wonder
> if the feature just exposes a problem that remains shadowed otherwise, due to
> the contention on replication slot. I'm going to investigate.

I think the problem is that with a database-specific snapshot,
SnapBuildProcessRunningXacts() returns early, without adjusting builder->xmin:

    /*
     * Database specific transaction info may exist to reach CONSISTENT state
     * faster, however the code below makes no use of it. Moreover, such
     * record might cause problems because the following normal (cluster-wide)
     * record can have lower value of oldestRunningXid. In that case, let's
     * wait with the cleanup for the next regular cluster-wide record.
     */
    if (OidIsValid(running->dbid))
        return;

and thus some transactions whose XID is below running->oldestRunningXid may
continue to be incorrectly considered running.

I originally thought that this should not happen because such transactions
will be added to the builder's array of committed transactions by
SnapBuildCommitTxn() anyway. However, I failed to notice that the COMMIT record
of a transaction listed in the xl_running_xacts WAL record is not guaranteed to
follow the xl_running_xacts record in WAL.
In other words, even if xl_running_xacts is created before the COMMIT record of
a transaction it lists, the xl_running_xacts record may still end up at a
higher LSN in WAL than that COMMIT record. So the cleanup I relied on might not
take place.

I have no good idea how to fix that, and I'm not sure I'm able to pursue the
"database-specific snapshots" feature now.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com
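
The ordering problem described in the message can be sketched as a toy model. This is not PostgreSQL code: every name below (ToyBuilder, ToyRecord, toy_process, toy_replay) is hypothetical, XID wraparound is ignored, and the sketch rests on two simplifying assumptions: a COMMIT decoded before the builder has seen its first running-xacts record is not remembered, and the first running-xacts record initializes xmin. It only illustrates how a COMMIT that lands at a lower LSN than the running-xacts record listing that transaction, combined with the early return, could leave the XID stuck as "running".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_COMMITTED 8

typedef uint32_t ToyXid;

typedef enum { TOY_COMMIT, TOY_RUNNING_XACTS } ToyRecType;

/* One toy WAL record; the array order in toy_replay() stands in for LSN order. */
typedef struct {
    ToyRecType type;
    ToyXid     xid;            /* TOY_COMMIT: the committing transaction */
    ToyXid     oldest_running; /* TOY_RUNNING_XACTS: oldest XID still running */
    bool       db_specific;    /* TOY_RUNNING_XACTS: stand-in for a valid dbid */
} ToyRecord;

/* Hypothetical, heavily simplified snapshot builder state. */
typedef struct {
    bool   tracking;  /* false until the first running-xacts record is seen */
    ToyXid xmin;      /* XIDs below this are treated as finished */
    ToyXid committed[TOY_MAX_COMMITTED];
    int    ncommitted;
} ToyBuilder;

static bool toy_in_committed(const ToyBuilder *b, ToyXid xid)
{
    for (int i = 0; i < b->ncommitted; i++)
        if (b->committed[i] == xid)
            return true;
    return false;
}

/* An XID counts as running if it is not below xmin and no commit was recorded. */
static bool toy_considered_running(const ToyBuilder *b, ToyXid xid)
{
    return xid >= b->xmin && !toy_in_committed(b, xid);
}

static void toy_process(ToyBuilder *b, const ToyRecord *rec, bool early_return)
{
    if (rec->type == TOY_COMMIT) {
        if (!b->tracking)
            return;   /* assumption: a commit decoded this early is not remembered */
        assert(b->ncommitted < TOY_MAX_COMMITTED);
        b->committed[b->ncommitted++] = rec->xid;
        return;
    }

    if (!b->tracking) {           /* first running-xacts record: start tracking */
        b->tracking = true;
        b->xmin = rec->oldest_running;
        return;
    }

    if (rec->db_specific && early_return)
        return;                   /* mirrors the quoted early return: xmin untouched */

    if (rec->oldest_running > b->xmin)
        b->xmin = rec->oldest_running;   /* the cleanup step that gets skipped */
}

/* Replays a fixed toy WAL; returns true if xid 100 is still considered running. */
bool toy_replay(bool early_return)
{
    static const ToyRecord wal[] = {
        { TOY_COMMIT,        100, 0,   false }, /* COMMIT of xid 100 lands first */
        { TOY_RUNNING_XACTS, 0,   100, true  }, /* yet this record lists 100 as running */
        { TOY_RUNNING_XACTS, 0,   101, true  }, /* later record: oldest running is 101 */
    };
    ToyBuilder b = { false, 0, {0}, 0 };

    for (size_t i = 0; i < sizeof(wal) / sizeof(wal[0]); i++)
        toy_process(&b, &wal[i], early_return);

    return toy_considered_running(&b, 100);
}
```

In this model, toy_replay(true) returns true (xid 100 stays "running" because neither a recorded commit nor an xmin advance ever clears it), while toy_replay(false) returns false because the third record advances xmin past 100.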