From: Antonin Houska
To: Srinath Reddy Sadipiralla
Cc: alvherre@alvh.no-ip.org, Mihail Nikalayeu, Pg Hackers, Robert Treat
Subject: Re: Adding REPACK [concurrently]
Date: Wed, 25 Feb 2026 20:41:15 +0100
Message-ID: <100248.1772048475@localhost>
References: <202602241757.6ac3iss2u4vo@alvherre.pgsql> <9116.1772009759@localhost>

Srinath Reddy Sadipiralla wrote:

> I did stress testing on v35 patches, where I did concurrency test using
> pgbench with 50 concurrent clients, 4 threads with the below pgbench
> script (dual_chaos.sql) on the following table setup (setup.sql).
> I ran pgbench with 5M rows for 10 minutes and 50M for ~45 minutes
> multiple times. REPACK (concurrently) ran successfully except "once" (see below).
> I created a shadow/clone table to use for checking the correctness after doing
> the concurrency test. I used 4 checks to verify that data is intact and
> REPACK (concurrently) ran successfully.
>
> 1) table file OID (relfilenode) swapped?
> 2) bloat gone? victim relation size should be less than
> shadow relation size.
> 3) using FULL JOIN logic (borrowed from repack.spec, with small change)
> against the shadow table which goes under the same concurrent ops
> done on the victim table, basically doing dual writes (see dual_chaos.sql) to
> verify table data integrity.
> 4) Physical Index Integrity (amcheck) (borrowed from Mihail's tests)

Thanks!

> The concurrency test failed once. I tried to reproduce the below scenario
> but no luck. I think the assert failure happened because
> after the speculative insert there might be no spec CONFIRM or ABORT, thoughts?

Perhaps, I'll try. I'm not sure the REPACK decoding worker does anything
special regarding decoding. If you happen to see the problem again, please try
to preserve the related WAL segments - if this is a bug in the PG executor,
pg_waldump might reveal that.

> Crash Test:
> I did a crash test using a debugger, with a breakpoint inside apply_concurrent_changes
> to simulate a crash while concurrent changes are being applied. After a few concurrent changes
> were done, I crashed the server using "pg_ctl -m immediate stop", then restarted the server.
> I observed that REPACK (concurrently) didn't complete (expected), files were not swapped, and data
> on the victim table is intact (checked using FULL JOIN with the shadow table), but there are
> some leftovers of the transient table we used for REPACK (concurrently), such as
> 1) transient table's relation files - these consume extra space. I think this was the
> case with VACUUM FULL previously, so these have to be removed manually, but
> I think this time we have a "leverage" which we can use to remove the extra space.
> 2) transient table's WALs - these are generated because of concurrent changes done while
> applying the logically decoded changes on the new transient table. I think this won't be a problem
> as long as they only get recycled, but if they get archived, they are of no use; instead they
> consume more space and time during the archival process.

VACUUM FULL / CLUSTER also produces (a lot of) WAL, so IMO there's nothing
specific about REPACK.

Regarding the transient table, I have a draft patch (for future versions) that
creates the transient table in a separate transaction and commits it.
(This is part of the effort to not block the progress of the VACUUM xmin
horizon. The point is that most of the time REPACK should not have an XID
assigned.) With this design, each time REPACK starts, it checks (in the
pg_depend catalog) whether the transient table exists for the particular
table, and if it does, it drops it.

> "Leverage" Idea:
> I think we can re-use these transient table's relation files and WALs during crash recovery,
> so that the user doesn't have to re-run REPACK (concurrently) after the server has recovered.
> For this we might need to write a WAL record for REPACK (concurrently) to let the startup process
> know REPACK (concurrently) occurred, which sets a flag, so at the end of the startup process
> all the WALs of the transient table are already applied, so the transient table is perfect now.
> At the end we can do the swapping (finish_heap_swap) after checking the flag. These are
> all my initial thoughts on this idea to reuse the "residue" files of the transient table.
> I could be totally wrong :) Please correct me if I am.

I think it'd be quite difficult to restart REPACK exactly at the point it
crashed. Especially if the tables are unlocked between server restart and the
restart of REPACK.

> I think we need to update this statement in repack.sgml regarding wal_level
>
> The wal_level configuration parameter is less than logical.
>
> because of this commit: POC: enable logical decoding when wal_level = 'replica' without a server restart (67c2097)

I'm aware of this commit and have already updated the regression tests, but I
forgot to update the user documentation. Thanks for the reminder.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com