Re: Adding REPACK [concurrently]

public inbox for [email protected]  
help / color / mirror / Atom feed

Re: Adding REPACK [concurrently]
18+ messages / 7 participants
[nested] [flat]

* Re: Adding REPACK [concurrently]
@ 2025-12-04 15:17 David Klika <[email protected]>
  2025-12-04 15:43 ` Re: Adding REPACK [concurrently] Álvaro Herrera <[email protected]>
  0 siblings, 1 reply; 18+ messages in thread

From: David Klika @ 2025-12-04 15:17 UTC (permalink / raw)
  To: [email protected]; [email protected]; +Cc: [email protected]; [email protected]; [email protected]; [email protected]

Hello

Great to hear about this feature.

You speak about table rewrite (suppose a whole-table rewrite). I would 
like to share idea of an alternative approach that also takes into 
account amount of WAL generated during the operation. Applicable to 
non-clustered case only.

Let's consider a large table where 80% blocks are fine (filled enough by 
live tuples). The table could be scanned from the beginning (left side) 
to identify "not enough filled" blocks and also from the end (right 
side) to process live tuples by moving them to the blocks identified 
by the left side scan. The work is over when both scan reaches the same 
position.

Example:

_ stands for filled enough blocks

D stands for blocks with (many) dead tuples

123456789
___DD____

Left scan identifies page #4 and tuples from the right scan (page #9) 
are moved here. The same with tuples from #8 to #5. Two pages from the 
data file are trimmed and (only) pages #4 and #5 are written in WAL, 
others are untouched.

Regards
David

^ permalink  raw  reply  [nested|flat] 18+ messages in thread

* Re: Adding REPACK [concurrently]
  2025-12-04 15:17 Re: Adding REPACK [concurrently] David Klika <[email protected]>
@ 2025-12-04 15:43 ` Álvaro Herrera <[email protected]>
  2025-12-04 17:47   ` Re: Adding REPACK [concurrently] Marcos Pegoraro <[email protected]>
  2025-12-08 09:13   ` Re: Adding REPACK [concurrently] David Klika <[email protected]>
  0 siblings, 2 replies; 18+ messages in thread

From: Álvaro Herrera @ 2025-12-04 15:43 UTC (permalink / raw)
  To: David Klika <[email protected]>; +Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]

Hello David,

Thanks for your interest in this.

On 2025-Dec-04, David Klika wrote:

> Let's consider a large table where 80% blocks are fine (filled enough by
> live tuples). The table could be scanned from the beginning (left side) to
> identify "not enough filled" blocks and also from the end (right side) to
> process live tuples by moving them to the blocks identified by the left side
> scan. The work is over when both scan reaches the same position.

If you only have a small number of pages that have this problem, then
you don't actually need to do anything -- the pages will be marked free
by regular vacuuming, and future inserts or updates can make use of
those pages.  It's not a problem to have a small number of pages in
empty state for some time.

So if you're trying to do this, the number of problematic pages must be
large.

Now, the issue with what you propose is that you need to make either the
old tuples or the new tuples visible to concurrent transactions.  If at
any point they are both visible, or none of them is visible, then you
have potentially corrupted the results that would be obtained by a query
that's scanning the table and halfway through.

The other point is that you need to keep indexes updated.  That is, you
need to make the indexes point to both the old and new, until you remove
the old tuples from the table, then remove those index pointers.
This process bloats the indexes, which is not insignificant, considering
that the number of tuples to process is large.  If there are several
indexes, this makes your process take even longer.

You can fix the concurrency problem by holding a lock on the table that
ensures nobody is reading the table until you've finished.  But we don't
want to have to hold such a lock for long!  And we already established
that the number of pages to check is large, which means you're going to
work for a long time.

So, I'm not really sure that it's practical to implement what you
suggest.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/

^ permalink  raw  reply  [nested|flat] 18+ messages in thread

* Re: Adding REPACK [concurrently]
  2025-12-04 15:17 Re: Adding REPACK [concurrently] David Klika <[email protected]>
  2025-12-04 15:43 ` Re: Adding REPACK [concurrently] Álvaro Herrera <[email protected]>
@ 2025-12-04 17:47   ` Marcos Pegoraro <[email protected]>
  2025-12-06 10:58     ` Re: Adding REPACK [concurrently] Álvaro Herrera <[email protected]>
  1 sibling, 1 reply; 18+ messages in thread

From: Marcos Pegoraro @ 2025-12-04 17:47 UTC (permalink / raw)
  To: Álvaro Herrera <[email protected]>; +Cc: David Klika <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]

Em qui., 4 de dez. de 2025 às 12:43, Álvaro Herrera <[email protected]>
escreveu:

> If you only have a small number of pages that have this problem, then
> you don't actually need to do anything -- the pages will be marked free
> by regular vacuuming, and future inserts or updates can make use of
> those pages.  It's not a problem to have a small number of pages in
> empty state for some time.
>
> So if you're trying to do this, the number of problematic pages must be
> large.


Not necessarily. I have some tables where I like to use CLUSTER
every 2 or 3 months, to reorganize the data based on an index
and consequently load fewer pages with each call. These tables
don't have more than 2 or 3% of dead records, but they are quite
disorganized from the point of view of that index, since the
inserted and updated records don't follow the order I determined.


regards
Marcos


^ permalink  raw  reply  [nested|flat] 18+ messages in thread

* Re: Adding REPACK [concurrently]
  2025-12-04 15:17 Re: Adding REPACK [concurrently] David Klika <[email protected]>
  2025-12-04 15:43 ` Re: Adding REPACK [concurrently] Álvaro Herrera <[email protected]>
  2025-12-04 17:47   ` Re: Adding REPACK [concurrently] Marcos Pegoraro <[email protected]>
@ 2025-12-06 10:58     ` Álvaro Herrera <[email protected]>
  0 siblings, 0 replies; 18+ messages in thread

From: Álvaro Herrera @ 2025-12-06 10:58 UTC (permalink / raw)
  To: Marcos Pegoraro <[email protected]>; +Cc: David Klika <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]

Hello,

On 2025-Dec-04, Marcos Pegoraro wrote:

> Em qui., 4 de dez. de 2025 às 12:43, Álvaro Herrera <[email protected]>
> escreveu:
> 
> > So if you're trying to do this, the number of problematic pages must
> > be large.
> 
> Not necessarily. I have some tables where I like to use CLUSTER every
> 2 or 3 months, to reorganize the data based on an index and
> consequently load fewer pages with each call. These tables don't have
> more than 2 or 3% of dead records, but they are quite disorganized
> from the point of view of that index, since the inserted and updated
> records don't follow the order I determined.

I don't understand what does this have to do with what David was
proposing.  I mean, you're right: if all you want is to CLUSTER, you may
not have an enormous number of pages to get rid of.  But how can you use
the technique he proposes to deal with reordering tuples?  If you just
move the tuples from the end of the table to where some random hole has
appeared, you've not clustered the table at all.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"People get annoyed when you try to debug them."  (Larry Wall)





^ permalink  raw  reply  [nested|flat] 18+ messages in thread

* Re: Adding REPACK [concurrently]
  2025-12-04 15:17 Re: Adding REPACK [concurrently] David Klika <[email protected]>
  2025-12-04 15:43 ` Re: Adding REPACK [concurrently] Álvaro Herrera <[email protected]>
@ 2025-12-08 09:13   ` David Klika <[email protected]>
  1 sibling, 0 replies; 18+ messages in thread

From: David Klika @ 2025-12-08 09:13 UTC (permalink / raw)
  To: Álvaro Herrera <[email protected]>; +Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]

Hello Alvaro

Thank you for the detailed analysis.

Dne 04.12.2025 v 16:43 Álvaro Herrera napsal(a):
> Hello David,
>
> Thanks for your interest in this.
>
> On 2025-Dec-04, David Klika wrote:
>
>> Let's consider a large table where 80% blocks are fine (filled enough by
>> live tuples). The table could be scanned from the beginning (left side) to
>> identify "not enough filled" blocks and also from the end (right side) to
>> process live tuples by moving them to the blocks identified by the left side
>> scan. The work is over when both scan reaches the same position.
> If you only have a small number of pages that have this problem, then
> you don't actually need to do anything -- the pages will be marked free
> by regular vacuuming, and future inserts or updates can make use of
> those pages.  It's not a problem to have a small number of pages in
> empty state for some time.
>
> So if you're trying to do this, the number of problematic pages must be
> large.

I agree, I had in mind about 20-40% of the table that could have tenths 
of GB.

> Now, the issue with what you propose is that you need to make either the
> old tuples or the new tuples visible to concurrent transactions.  If at
> any point they are both visible, or none of them is visible, then you
> have potentially corrupted the results that would be obtained by a query
> that's scanning the table and halfway through.

When performing a tuple movement from a (right) page to a (left) page, 
both of pages must be hold in shared buffers. I suppose the other 
processes scanning the table also access the table data through the 
shared buffers so the movement could be handled at this level. If the 
tuple movement does not change its xid, it wouldn't even have to be in 
conflict with other transactions that locked/modified the tuple (in 
buffer cache again, just changing the physical location). Looks like 
something dirty...

> The other point is that you need to keep indexes updated.  That is, you
> need to make the indexes point to both the old and new, until you remove
> the old tuples from the table, then remove those index pointers.
> This process bloats the indexes, which is not insignificant, considering
> that the number of tuples to process is large.  If there are several
> indexes, this makes your process take even longer.
>
> You can fix the concurrency problem by holding a lock on the table that
> ensures nobody is reading the table until you've finished.  But we don't
> want to have to hold such a lock for long!  And we already established
> that the number of pages to check is large, which means you're going to
> work for a long time.
> So, I'm not really sure that it's practical to implement what you
> suggest.

I agree. Proposed tuple shuffle might work better compared to the 
current VACUUM FULL (i.e. blocking non-clustered maintenance) but I 
understand that you prefer an universal method of data files maintenance 
(the concurrent variant will be amazing).

Regards David






^ permalink  raw  reply  [nested|flat] 18+ messages in thread

* Re: Adding REPACK [concurrently]
@ 2026-05-19 18:52 Alvaro Herrera <[email protected]>
  2026-05-23 15:29 ` Re: Adding REPACK [concurrently] Amit Kapila <[email protected]>
  0 siblings, 1 reply; 18+ messages in thread

From: Alvaro Herrera @ 2026-05-19 18:52 UTC (permalink / raw)
  To: Amit Kapila <[email protected]>; +Cc: Antonin Houska <[email protected]>; Mihail Nikalayeu <[email protected]>; Andres Freund <[email protected]>; Srinath Reddy Sadipiralla <[email protected]>; Matthias van de Meent <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

On 2026-May-14, Amit Kapila wrote:

> The broader issue is that the entire logical decoding mechanism is
> designed to process cluster-wide transactions. This patch tries to
> bypass that foundational assumption, but only during the initial
> snapshot construction while processing running_xacts record.
> 
> To be clear, I am not against the idea of db-specific snapshots to
> enable concurrent repacks. My concern is simply the time required to
> get the architecture right. In its current state, we need more time to
> carefully consider how this db-specific concept interacts with the
> rest of the logical decoding machinery, which is built for
> cluster-wide records.

Hmm.  So at this point I have to admit that the time I'll have before
beta 1 is going to be very scarce.  You're probably right that it's
better to revert db-specific snapshots in pg19, and try again for 20.
The revert should be a simple patch.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/






^ permalink  raw  reply  [nested|flat] 18+ messages in thread

* Re: Adding REPACK [concurrently]
  2026-05-19 18:52 Re: Adding REPACK [concurrently] Alvaro Herrera <[email protected]>
@ 2026-05-23 15:29 ` Amit Kapila <[email protected]>
  2026-05-24 11:29   ` Re: Adding REPACK [concurrently] Alvaro Herrera <[email protected]>
  0 siblings, 1 reply; 18+ messages in thread

From: Amit Kapila @ 2026-05-23 15:29 UTC (permalink / raw)
  To: Alvaro Herrera <[email protected]>; +Cc: Antonin Houska <[email protected]>; Mihail Nikalayeu <[email protected]>; Andres Freund <[email protected]>; Srinath Reddy Sadipiralla <[email protected]>; Matthias van de Meent <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

On Tue, May 19, 2026 at 11:52 AM Alvaro Herrera <[email protected]> wrote:
>
> On 2026-May-14, Amit Kapila wrote:
>
> > The broader issue is that the entire logical decoding mechanism is
> > designed to process cluster-wide transactions. This patch tries to
> > bypass that foundational assumption, but only during the initial
> > snapshot construction while processing running_xacts record.
> >
> > To be clear, I am not against the idea of db-specific snapshots to
> > enable concurrent repacks. My concern is simply the time required to
> > get the architecture right. In its current state, we need more time to
> > carefully consider how this db-specific concept interacts with the
> > rest of the logical decoding machinery, which is built for
> > cluster-wide records.
>
> Hmm.  So at this point I have to admit that the time I'll have before
> beta 1 is going to be very scarce.  You're probably right that it's
> better to revert db-specific snapshots in pg19, and try again for 20.
>

Sounds reasonable.

> The revert should be a simple patch.
>

I also think so.

-- 
With Regards,
Amit Kapila.






^ permalink  raw  reply  [nested|flat] 18+ messages in thread

* Re: Adding REPACK [concurrently]
  2026-05-19 18:52 Re: Adding REPACK [concurrently] Alvaro Herrera <[email protected]>
  2026-05-23 15:29 ` Re: Adding REPACK [concurrently] Amit Kapila <[email protected]>
@ 2026-05-24 11:29   ` Alvaro Herrera <[email protected]>
  2026-06-03 17:27     ` Re: Adding REPACK [concurrently] Mihail Nikalayeu <[email protected]>
  0 siblings, 1 reply; 18+ messages in thread

From: Alvaro Herrera @ 2026-05-24 11:29 UTC (permalink / raw)
  To: Amit Kapila <[email protected]>; +Cc: Antonin Houska <[email protected]>; Mihail Nikalayeu <[email protected]>; Andres Freund <[email protected]>; Srinath Reddy Sadipiralla <[email protected]>; Matthias van de Meent <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

On 2026-May-23, Amit Kapila wrote:

> > The revert should be a simple patch.
> 
> I also think so.

Okay, pushed the revert after seeing it pass CI:
  https://cirrus-ci.com/build/5520722497372160

Thanks!

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/






^ permalink  raw  reply  [nested|flat] 18+ messages in thread

* Re: Adding REPACK [concurrently]
  2026-05-19 18:52 Re: Adding REPACK [concurrently] Alvaro Herrera <[email protected]>
  2026-05-23 15:29 ` Re: Adding REPACK [concurrently] Amit Kapila <[email protected]>
  2026-05-24 11:29   ` Re: Adding REPACK [concurrently] Alvaro Herrera <[email protected]>
@ 2026-06-03 17:27     ` Mihail Nikalayeu <[email protected]>
  0 siblings, 0 replies; 18+ messages in thread

From: Mihail Nikalayeu @ 2026-06-03 17:27 UTC (permalink / raw)
  To: Alvaro Herrera <[email protected]>; +Cc: Amit Kapila <[email protected]>; Antonin Houska <[email protected]>; Andres Freund <[email protected]>; Srinath Reddy Sadipiralla <[email protected]>; Matthias van de Meent <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

Hello, everyone!

I think it is a good time to remind we still have a potential deadlock
issue leading to loosing vast amount of REPACK work.

Best soltuion I was able to craft so far is [1].

Best regards,
Mikhail.

P.S.
I'll continue working on the repack stress test suite in the near future.

[1]: https://www.postgresql.org/message-id/flat/CADzfLwXbUWuS6H4uJEFVL1jS1kzsVnuJ%2BzX1%2BtAEhQxBnEiGKw%4...






^ permalink  raw  reply  [nested|flat] 18+ messages in thread

* RE: Adding REPACK [concurrently]
@ 2026-05-25 06:26 Zhijie Hou (Fujitsu) <[email protected]>
  2026-05-26 15:31 ` Re: Adding REPACK [concurrently] Alvaro Herrera <[email protected]>
  0 siblings, 1 reply; 18+ messages in thread

From: Zhijie Hou (Fujitsu) @ 2026-05-25 06:26 UTC (permalink / raw)
  To: Antonin Houska <[email protected]>; Alvaro Herrera <[email protected]>; +Cc: Amit Kapila <[email protected]>; Hayato Kuroda (Fujitsu) <[email protected]>; Srinath Reddy Sadipiralla <[email protected]>; Mihail Nikalayeu <[email protected]>; Matthias van de Meent <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>



> -----Original Message-----
> From: Antonin Houska <[email protected]>
> Sent: Friday, April 10, 2026 9:22 PM
> To: Hou, Zhijie/侯 志杰 <[email protected]>
> Cc: Alvaro Herrera <[email protected]>; Amit Kapila
> <[email protected]>; Kuroda, Hayato/黒田 隼人
> <[email protected]>; Srinath Reddy Sadipiralla
> <[email protected]>; Mihail Nikalayeu <[email protected]>;
> Matthias van de Meent <[email protected]>; Pg Hackers
> <[email protected]>; Robert Treat <[email protected]>
> Subject: Re: Adding REPACK [concurrently]
> 
> Zhijie Hou (Fujitsu) <[email protected]> wrote:
> 
> > When testing REPACK concurrently, I noticed that all WALs are retained
> > from the moment REPACK begins copying data to the new table until the
> > command finishes replaying concurrent changes on the new table and
> > stops the repack decoding worker.
> >
> > I understand the reason: the REPACK command itself starts a
> > long-running transaction, and logical decoding does not advance
> > restart_lsn beyond the oldest running transaction's start position. As
> > a result, slot.restart_lsn remains unchanged, preventing the checkpointer
> from recycling WALs.
> 
> I think you're right, sorry for the omission.
> 
> > IIUC, REPACK without using concurrent option does not have this issue.
> 
> It does not have the WAL recycling issue because it does not need to read
> WAL. However it also runs in a long transaction. Even though it does not need
> XID for the actual heap rewriting, it gets one at the moment it locks the table
> using AccessExclusiveLock (which is at the very beginning).
> 
> > Given that we do not restart a REPACK, I think the repack decoding
> > worker should be able to advance restart_lsn each time after writing
> > changes (similar to how a physical slot behaves). To illustrate this,
> > I've written a patch (attached) that implements this approach, and it works
> fine for me.
> 
> LGTM, thanks!
> 

Thanks for reviewing!

After listening to the REPACK talk at pgconf.dev this year, I understand that
WAL accumulation during REPACK CONCURRENTLY is not intended behavior. I think we
can consider fixing this in the current release. Attached is the rebased
patch, with comments adjusted based on Chao Li's comments.

Best Regards,
Hou zj


Attachments:

  [application/octet-stream] v2-0001-Allow-old-WAL-recycling-during-REPACK-CONCURRENTL.patch (3.0K, 2-v2-0001-Allow-old-WAL-recycling-during-REPACK-CONCURRENTL.patch)
  download | inline diff:
From 87ba918b9020ce9fa0d4852f222221f521d8701f Mon Sep 17 00:00:00 2001
From: Zhijie Hou <[email protected]>
Date: Fri, 10 Apr 2026 16:24:55 +0800
Subject: [PATCH v2] Allow old WAL recycling during REPACK CONCURRENTLY

During REPACK CONCURRENTLY, logical decoding can keep replication
slot.restart_lsn pinned behind the oldest running transaction, which is often
the long-lived REPACK transaction itself. As a result, old WAL segments are
retained longer than necessary.

This commit advances the replication slot each time WAL insertion crosses a
segment boundary, so obsolete WAL files can be recycled while REPACK is still
running.

This change does not advance catalog_xmin. REPACK already holds a snapshot that
prevents catalog dead tuple removal, so catalog_xmin handling can be addressed
independently.
---
 src/backend/commands/repack_worker.c      | 14 +++++++++++++-
 src/backend/replication/logical/logical.c |  8 +++++++-
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/src/backend/commands/repack_worker.c b/src/backend/commands/repack_worker.c
index b84041372b8..eed69d36508 100644
--- a/src/backend/commands/repack_worker.c
+++ b/src/backend/commands/repack_worker.c
@@ -397,12 +397,24 @@ decode_concurrent_changes(LogicalDecodingContext *ctx,
 
 			/*
 			 * If WAL segment boundary has been crossed, inform the decoding
-			 * system that the catalog_xmin can advance.
+			 * system that the slot can advance.
+			 *
+			 * Once REPACK begins copying data to the new table, the logical
+			 * decoding machinery prevents the slot from advancing beyond the
+			 * oldest running transaction (which is the REPACK transaction
+			 * itself). As a result, restart_lsn and catalog_xmin can no
+			 * longer advance automatically.
+			 *
+			 * To allow old WAL files to be recycled, we manually advance the
+			 * slot each time a WAL segment boundary is crossed. We do not
+			 * advance catalog_xmin here because the REPACK transaction anyway
+			 * holds a snapshot that prevents catalog dead tuple removal.
 			 */
 			end_lsn = ctx->reader->EndRecPtr;
 			XLByteToSeg(end_lsn, segno_new, wal_segment_size);
 			if (segno_new != repack_current_segment)
 			{
+				LogicalIncreaseRestartDecodingForSlot(end_lsn, end_lsn);
 				LogicalConfirmReceivedLocation(end_lsn);
 				elog(DEBUG1, "REPACK: confirmed receive location %X/%X",
 					 (uint32) (end_lsn >> 32), (uint32) end_lsn);
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index a33a685dcc6..11b64de3d90 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1913,8 +1913,14 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
 			SpinLockRelease(&MyReplicationSlot->mutex);
 
 			ReplicationSlotsComputeRequiredXmin(false);
-			ReplicationSlotsComputeRequiredLSN();
 		}
+
+		/*
+		 * Now the new restart_lsn is safely on disk, recompute the global WAL
+		 * retention requirement.
+		 */
+		if (updated_restart)
+			ReplicationSlotsComputeRequiredLSN();
 	}
 	else
 	{
-- 
2.43.0



^ permalink  raw  reply  [nested|flat] 18+ messages in thread

* Re: Adding REPACK [concurrently]
  2026-05-25 06:26 RE: Adding REPACK [concurrently] Zhijie Hou (Fujitsu) <[email protected]>
@ 2026-05-26 15:31 ` Alvaro Herrera <[email protected]>
  2026-05-27 08:08   ` RE: Adding REPACK [concurrently] Zhijie Hou (Fujitsu) <[email protected]>
  0 siblings, 1 reply; 18+ messages in thread

From: Alvaro Herrera @ 2026-05-26 15:31 UTC (permalink / raw)
  To: Zhijie Hou (Fujitsu) <[email protected]>; +Cc: Antonin Houska <[email protected]>; Amit Kapila <[email protected]>; Hayato Kuroda (Fujitsu) <[email protected]>; Srinath Reddy Sadipiralla <[email protected]>; Mihail Nikalayeu <[email protected]>; Matthias van de Meent <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

On 2026-May-25, Zhijie Hou (Fujitsu) wrote:

> After listening to the REPACK talk at pgconf.dev this year, I understand that
> WAL accumulation during REPACK CONCURRENTLY is not intended behavior. I think we
> can consider fixing this in the current release. Attached is the rebased
> patch, with comments adjusted based on Chao Li's comments.

You're right, this is a thinko.  I'll look at your patch hoping to get
it pushed shortly.  I wonder if we should add a TAP test to verify that
WAL files are actually removed?  Sounds a bit excessive TBH, but maybe
it isn't really.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"I think my standards have lowered enough that now I think 'good design'
is when the page doesn't irritate the living f*ck out of me." (JWZ)





^ permalink  raw  reply  [nested|flat] 18+ messages in thread

* RE: Adding REPACK [concurrently]
  2026-05-25 06:26 RE: Adding REPACK [concurrently] Zhijie Hou (Fujitsu) <[email protected]>
  2026-05-26 15:31 ` Re: Adding REPACK [concurrently] Alvaro Herrera <[email protected]>
@ 2026-05-27 08:08   ` Zhijie Hou (Fujitsu) <[email protected]>
  2026-05-28 00:31     ` Re: Adding REPACK [concurrently] Amit Kapila <[email protected]>
  2026-05-29 22:25     ` Re: Adding REPACK [concurrently] Alvaro Herrera <[email protected]>
  0 siblings, 2 replies; 18+ messages in thread

From: Zhijie Hou (Fujitsu) @ 2026-05-27 08:08 UTC (permalink / raw)
  To: Alvaro Herrera <[email protected]>; +Cc: Antonin Houska <[email protected]>; Amit Kapila <[email protected]>; Hayato Kuroda (Fujitsu) <[email protected]>; Srinath Reddy Sadipiralla <[email protected]>; Mihail Nikalayeu <[email protected]>; Matthias van de Meent <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

On Tuesday, May 26, 2026 11:32 PM Alvaro Herrera <[email protected]> wrote:
> On 2026-May-25, Zhijie Hou (Fujitsu) wrote:
> 
> > After listening to the REPACK talk at pgconf.dev this year, I
> > understand that WAL accumulation during REPACK CONCURRENTLY is not
> > intended behavior. I think we can consider fixing this in the current
> > release. Attached is the rebased patch, with comments adjusted based on
> Chao Li's comments.
> 
> You're right, this is a thinko.  I'll look at your patch hoping to get it pushed
> shortly.  I wonder if we should add a TAP test to verify that WAL files are
> actually removed?  Sounds a bit excessive TBH, but maybe it isn't really.

I tried a bit, and the test complexity and speed (< 1s) appear to be within
acceptable limits. I'm attaching 0002 as a reference test.

0001 remains unchanged.

Best Regards,
Hou zj


Attachments:

  [application/octet-stream] v3-0001-Allow-old-WAL-recycling-during-REPACK-CONCURRENTL.patch (3.0K, 2-v3-0001-Allow-old-WAL-recycling-during-REPACK-CONCURRENTL.patch)
  download | inline diff:
From 698d6e2e475669ffce16e6b4de04b836130e0ed4 Mon Sep 17 00:00:00 2001
From: Zhijie Hou <[email protected]>
Date: Fri, 10 Apr 2026 16:24:55 +0800
Subject: [PATCH v3 1/2] Allow old WAL recycling during REPACK CONCURRENTLY

During REPACK CONCURRENTLY, logical decoding can keep replication
slot.restart_lsn pinned behind the oldest running transaction, which is often
the long-lived REPACK transaction itself. As a result, old WAL segments are
retained longer than necessary.

This commit advances the replication slot each time WAL insertion crosses a
segment boundary, so obsolete WAL files can be recycled while REPACK is still
running.

This change does not advance catalog_xmin. REPACK already holds a snapshot that
prevents catalog dead tuple removal, so catalog_xmin handling can be addressed
independently.
---
 src/backend/commands/repack_worker.c      | 14 +++++++++++++-
 src/backend/replication/logical/logical.c |  8 +++++++-
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/src/backend/commands/repack_worker.c b/src/backend/commands/repack_worker.c
index b84041372b8..eed69d36508 100644
--- a/src/backend/commands/repack_worker.c
+++ b/src/backend/commands/repack_worker.c
@@ -397,12 +397,24 @@ decode_concurrent_changes(LogicalDecodingContext *ctx,
 
 			/*
 			 * If WAL segment boundary has been crossed, inform the decoding
-			 * system that the catalog_xmin can advance.
+			 * system that the slot can advance.
+			 *
+			 * Once REPACK begins copying data to the new table, the logical
+			 * decoding machinery prevents the slot from advancing beyond the
+			 * oldest running transaction (which is the REPACK transaction
+			 * itself). As a result, restart_lsn and catalog_xmin can no
+			 * longer advance automatically.
+			 *
+			 * To allow old WAL files to be recycled, we manually advance the
+			 * slot each time a WAL segment boundary is crossed. We do not
+			 * advance catalog_xmin here because the REPACK transaction anyway
+			 * holds a snapshot that prevents catalog dead tuple removal.
 			 */
 			end_lsn = ctx->reader->EndRecPtr;
 			XLByteToSeg(end_lsn, segno_new, wal_segment_size);
 			if (segno_new != repack_current_segment)
 			{
+				LogicalIncreaseRestartDecodingForSlot(end_lsn, end_lsn);
 				LogicalConfirmReceivedLocation(end_lsn);
 				elog(DEBUG1, "REPACK: confirmed receive location %X/%X",
 					 (uint32) (end_lsn >> 32), (uint32) end_lsn);
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index b969caae72e..8b8095bd5d8 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1910,8 +1910,14 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
 			SpinLockRelease(&MyReplicationSlot->mutex);
 
 			ReplicationSlotsComputeRequiredXmin(false);
-			ReplicationSlotsComputeRequiredLSN();
 		}
+
+		/*
+		 * Now the new restart_lsn is safely on disk, recompute the global WAL
+		 * retention requirement.
+		 */
+		if (updated_restart)
+			ReplicationSlotsComputeRequiredLSN();
 	}
 	else
 	{
-- 
2.43.0



  [application/octet-stream] v3-0002-Add-a-test-for-repack-concurrently.patch (3.2K, 3-v3-0002-Add-a-test-for-repack-concurrently.patch)
  download | inline diff:
From d97df7c2ecf815907b7e14322f6f8fde09951b9b Mon Sep 17 00:00:00 2001
From: Zhijie Hou <[email protected]>
Date: Wed, 27 May 2026 15:58:05 +0800
Subject: [PATCH v3 2/2] Add a test for repack concurrently

---
 .../recovery/t/046_checkpoint_logical_slot.pl | 74 +++++++++++++++++++
 1 file changed, 74 insertions(+)

diff --git a/src/test/recovery/t/046_checkpoint_logical_slot.pl b/src/test/recovery/t/046_checkpoint_logical_slot.pl
index 66761bf56c1..aa32859dd15 100644
--- a/src/test/recovery/t/046_checkpoint_logical_slot.pl
+++ b/src/test/recovery/t/046_checkpoint_logical_slot.pl
@@ -226,4 +226,78 @@ is( $standby->safe_psql(
 	"t",
 	'logical slot is not invalidated');
 
+# Verify that the REPACK slot's restart_lsn can advance while REPACK
+# CONCURRENTLY is still running, allowing WAL files to be recycled during this
+# period.
+
+# Create the table to be repacked and populate it with some data.
+$node->safe_psql(
+	'postgres',
+	q{
+CREATE TABLE repack_test(i int PRIMARY KEY, t text);
+INSERT INTO repack_test
+SELECT g, md5(g::text)
+FROM generate_series(1, 100) g;
+});
+
+# Pause the REPACK command in the middle of its execution so that the decoding
+# worker continues running, allowing us to test slot restart_lsn advancement
+# later.
+$node->safe_psql('postgres',
+	q(select injection_points_attach('repack-concurrently-before-lock','wait'))
+);
+
+my $repack = $node->background_psql('postgres');
+$repack->query_until(
+	qr/repack_started/,
+	q(
+\echo repack_started
+REPACK (CONCURRENTLY) repack_test;
+\q
+));
+
+# Wait until REPACK reaches the injection point.
+$node->wait_for_event('client backend', 'repack-concurrently-before-lock');
+
+my $restart_lsn_before = $node->safe_psql('postgres',
+	"SELECT restart_lsn FROM pg_replication_slots WHERE slot_name ~ '^repack_[0-9]+' AND slot_type = 'logical' AND temporary;");
+
+# Verify that the replication slot created by the subscription exists and has a
+# valid restart_lsn.
+ok(defined($restart_lsn_before) && $restart_lsn_before ne '',
+	'REPACK slot has restart_lsn');
+
+my $segment_before = $node->safe_psql('postgres',
+	"SELECT pg_walfile_name('$restart_lsn_before')");
+my $segment_before_path = $node->data_dir . "/pg_wal/$segment_before";
+ok(-f $segment_before_path,
+	"segment for initial restart_lsn exists: $segment_before");
+
+# Switch WAL file on the primary while REPACK is still running and then force
+# WAL removal/recycling with a checkpoint.
+$node->advance_wal(1);
+
+# Wait until the REPACK slot's restart_lsn advances
+ok( $node->poll_query_until(
+	'postgres', qq[
+    SELECT count(*) > 0
+	FROM pg_replication_slots
+	WHERE slot_name ~ '^repack_[0-9]+'
+	  AND slot_type = 'logical'
+	  AND temporary
+	  AND restart_lsn IS NOT NULL
+	  AND restart_lsn <> '$restart_lsn_before'::pg_lsn]),
+	'REPACK slot restart_lsn advances while command is still running');
+
+$node->safe_psql('postgres', 'CHECKPOINT');
+
+# Test that the old WAL segment was recycled
+ok(!-f $segment_before_path,
+	'old WAL segment was recycled while REPACK CONCURRENTLY was running');
+
+$node->safe_psql('postgres',
+	"SELECT injection_points_wakeup('repack-concurrently-before-lock')");
+
+$repack->quit;
+
 done_testing();
-- 
2.43.0



^ permalink  raw  reply  [nested|flat] 18+ messages in thread

* Re: Adding REPACK [concurrently]
  2026-05-25 06:26 RE: Adding REPACK [concurrently] Zhijie Hou (Fujitsu) <[email protected]>
  2026-05-26 15:31 ` Re: Adding REPACK [concurrently] Alvaro Herrera <[email protected]>
  2026-05-27 08:08   ` RE: Adding REPACK [concurrently] Zhijie Hou (Fujitsu) <[email protected]>
@ 2026-05-28 00:31     ` Amit Kapila <[email protected]>
  2026-05-28 03:34       ` Re: Adding REPACK [concurrently] Amit Kapila <[email protected]>
  1 sibling, 1 reply; 18+ messages in thread

From: Amit Kapila @ 2026-05-28 00:31 UTC (permalink / raw)
  To: Zhijie Hou (Fujitsu) <[email protected]>; +Cc: Alvaro Herrera <[email protected]>; Antonin Houska <[email protected]>; Hayato Kuroda (Fujitsu) <[email protected]>; Srinath Reddy Sadipiralla <[email protected]>; Mihail Nikalayeu <[email protected]>; Matthias van de Meent <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

On Wed, May 27, 2026 at 1:08 AM Zhijie Hou (Fujitsu)
<[email protected]> wrote:
>
> 0001 remains unchanged.
>

Few minor comments:
=================
*
+ * To allow old WAL files to be recycled, we manually advance the
+ * slot each time a WAL segment boundary is crossed.

This is  safe only because REPACK creates a temporary slot that is
dropped if REPACK fails — there's no scenario where this slot needs to
restart decoding from
an earlier position while still alive. I feel that is worth a mention.

*
@@ -1910,8 +1910,14 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
  SpinLockRelease(&MyReplicationSlot->mutex);

  ReplicationSlotsComputeRequiredXmin(false);
- ReplicationSlotsComputeRequiredLSN();
  }
+
+ /*
+ * Now the new restart_lsn is safely on disk, recompute the global WAL
+ * retention requirement.
+ */
+ if (updated_restart)
+ ReplicationSlotsComputeRequiredLSN();

This change is not related to this patch, rather we need it even
without this patch, is it worth mentioning in the commit message?

-- 
With Regards,
Amit Kapila.






^ permalink  raw  reply  [nested|flat] 18+ messages in thread

* Re: Adding REPACK [concurrently]
  2026-05-25 06:26 RE: Adding REPACK [concurrently] Zhijie Hou (Fujitsu) <[email protected]>
  2026-05-26 15:31 ` Re: Adding REPACK [concurrently] Alvaro Herrera <[email protected]>
  2026-05-27 08:08   ` RE: Adding REPACK [concurrently] Zhijie Hou (Fujitsu) <[email protected]>
  2026-05-28 00:31     ` Re: Adding REPACK [concurrently] Amit Kapila <[email protected]>
@ 2026-05-28 03:34       ` Amit Kapila <[email protected]>
  2026-05-28 05:18         ` RE: Adding REPACK [concurrently] Zhijie Hou (Fujitsu) <[email protected]>
  0 siblings, 1 reply; 18+ messages in thread

From: Amit Kapila @ 2026-05-28 03:34 UTC (permalink / raw)
  To: Zhijie Hou (Fujitsu) <[email protected]>; +Cc: Alvaro Herrera <[email protected]>; Antonin Houska <[email protected]>; Hayato Kuroda (Fujitsu) <[email protected]>; Srinath Reddy Sadipiralla <[email protected]>; Mihail Nikalayeu <[email protected]>; Matthias van de Meent <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

On Wed, May 27, 2026 at 5:31 PM Amit Kapila <[email protected]> wrote:
>
> On Wed, May 27, 2026 at 1:08 AM Zhijie Hou (Fujitsu)
> <[email protected]> wrote:
> >
> > 0001 remains unchanged.
> >
>
> Few minor comments:
> =================

Commit message says: "This change does not advance catalog_xmin.
REPACK already holds a snapshot that prevents catalog dead tuple
removal, so catalog_xmin handling can be addressed independently.".
Isn't it equally important to advance this, otherwise, for long
running REPACKs dead tuples will be accumulated needlessly? If so, do
we have any ideas to avoid this?

-- 
With Regards,
Amit Kapila.






^ permalink  raw  reply  [nested|flat] 18+ messages in thread

* RE: Adding REPACK [concurrently]
  2026-05-25 06:26 RE: Adding REPACK [concurrently] Zhijie Hou (Fujitsu) <[email protected]>
  2026-05-26 15:31 ` Re: Adding REPACK [concurrently] Alvaro Herrera <[email protected]>
  2026-05-27 08:08   ` RE: Adding REPACK [concurrently] Zhijie Hou (Fujitsu) <[email protected]>
  2026-05-28 00:31     ` Re: Adding REPACK [concurrently] Amit Kapila <[email protected]>
  2026-05-28 03:34       ` Re: Adding REPACK [concurrently] Amit Kapila <[email protected]>
@ 2026-05-28 05:18         ` Zhijie Hou (Fujitsu) <[email protected]>
  2026-05-29 15:39           ` Re: Adding REPACK [concurrently] Amit Kapila <[email protected]>
  0 siblings, 1 reply; 18+ messages in thread

From: Zhijie Hou (Fujitsu) @ 2026-05-28 05:18 UTC (permalink / raw)
  To: Amit Kapila <[email protected]>; +Cc: Alvaro Herrera <[email protected]>; Antonin Houska <[email protected]>; Hayato Kuroda (Fujitsu) <[email protected]>; Srinath Reddy Sadipiralla <[email protected]>; Mihail Nikalayeu <[email protected]>; Matthias van de Meent <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

On Thursday, May 28, 2026 11:34 AM Amit Kapila <[email protected]> wrote:
> On Wed, May 27, 2026 at 5:31 PM Amit Kapila <[email protected]>
> wrote:
> >
> > On Wed, May 27, 2026 at 1:08 AM Zhijie Hou (Fujitsu)
> > <[email protected]> wrote:
> > >
> > > 0001 remains unchanged.
> > >
> >
> > Few minor comments:
> > =================
> 
> Commit message says: "This change does not advance catalog_xmin.
> REPACK already holds a snapshot that prevents catalog dead tuple removal,
> so catalog_xmin handling can be addressed independently.".
> Isn't it equally important to advance this, otherwise, for long running REPACKs
> dead tuples will be accumulated needlessly? If so, do we have any ideas to
> avoid this?

My understanding is that dead tuple accumulation is common to all long-running
commands (including CLUSTER, VACUUM FULL, and REPACK without CONCURRENTLY). As
long as a command holds a snapshot for a long time while scanning and copying
data, the backend xmin will cause similar accumulation. So, this doesn't seem
like a new issue to me, and given that catalog_xmin only affect tuples in system
catalog which is less harmful, I thought it could be handled independently.
There was a proposal to improve this case in [1]. Sorry if I've missed something.

Attaching the v4 patch which improved the comments and commit message as
suggested.

[1] https://www.postgresql.org/message-id/125085.1775827305%40localhost

Best Regards,
Hou zj


Attachments:

  [application/octet-stream] v4-0001-Allow-old-WAL-recycling-during-REPACK-CONCURRENTL.patch (3.6K, 2-v4-0001-Allow-old-WAL-recycling-during-REPACK-CONCURRENTL.patch)
  download | inline diff:
From 74881e1bf03da8a4772b3bc5a24542b6dfb51042 Mon Sep 17 00:00:00 2001
From: Zhijie Hou <[email protected]>
Date: Fri, 10 Apr 2026 16:24:55 +0800
Subject: [PATCH v4 1/2] Allow old WAL recycling during REPACK CONCURRENTLY

During REPACK CONCURRENTLY, logical decoding can keep replication
slot.restart_lsn pinned behind the oldest running transaction, which is often
the long-lived REPACK transaction itself. As a result, old WAL segments are
retained longer than necessary.

This commit advances the replication slot each time WAL insertion crosses a
segment boundary, so obsolete WAL files can be recycled while REPACK is still
running.

This change does not advance catalog_xmin. REPACK already holds a snapshot that
prevents catalog dead tuple removal, so catalog_xmin handling can be addressed
independently.

Additionally, this commit improves LogicalConfirmReceivedLocation to compute the
oldest restart LSN whenever slot.restart_lsn is updated. Previously, this
function performed the computation only when catalog_xmin was updated, which was
less problematic because catalog_xmin typically advances in most replication
cases, but not for REPACK.
---
 src/backend/commands/repack_worker.c      | 20 +++++++++++++++++++-
 src/backend/replication/logical/logical.c |  8 +++++++-
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/src/backend/commands/repack_worker.c b/src/backend/commands/repack_worker.c
index 4f82eb46bec..56974cbc1f5 100644
--- a/src/backend/commands/repack_worker.c
+++ b/src/backend/commands/repack_worker.c
@@ -397,12 +397,30 @@ decode_concurrent_changes(LogicalDecodingContext *ctx,
 
 			/*
 			 * If WAL segment boundary has been crossed, inform the decoding
-			 * system that the catalog_xmin can advance.
+			 * system that the slot can advance.
+			 *
+			 * Once REPACK begins copying data to the new table, the logical
+			 * decoding machinery prevents the slot from advancing beyond the
+			 * oldest running transaction (which is the REPACK transaction
+			 * itself). As a result, restart_lsn and catalog_xmin can no
+			 * longer advance automatically.
+			 *
+			 * To allow old WAL files to be recycled, we manually advance the
+			 * slot each time a WAL segment boundary is crossed. This is safe
+			 * because the REPACK slot is temporary and will be dropped
+			 * automatically if the REPACK command fails. There is no scenario
+			 * where this slot needs to restart decoding from an earlier
+			 * position while still alive.
+			 *
+			 * We do not advance catalog_xmin here because the REPACK
+			 * transaction anyway holds a snapshot that prevents catalog dead
+			 * tuple removal.
 			 */
 			end_lsn = ctx->reader->EndRecPtr;
 			XLByteToSeg(end_lsn, segno_new, wal_segment_size);
 			if (segno_new != repack_current_segment)
 			{
+				LogicalIncreaseRestartDecodingForSlot(end_lsn, end_lsn);
 				LogicalConfirmReceivedLocation(end_lsn);
 				elog(DEBUG1, "REPACK: confirmed receive location %X/%X",
 					 (uint32) (end_lsn >> 32), (uint32) end_lsn);
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index b969caae72e..8b8095bd5d8 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1910,8 +1910,14 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
 			SpinLockRelease(&MyReplicationSlot->mutex);
 
 			ReplicationSlotsComputeRequiredXmin(false);
-			ReplicationSlotsComputeRequiredLSN();
 		}
+
+		/*
+		 * Now the new restart_lsn is safely on disk, recompute the global WAL
+		 * retention requirement.
+		 */
+		if (updated_restart)
+			ReplicationSlotsComputeRequiredLSN();
 	}
 	else
 	{
-- 
2.43.0



  [application/octet-stream] v4-0002-Add-a-test-for-repack-concurrently.patch (3.2K, 3-v4-0002-Add-a-test-for-repack-concurrently.patch)
  download | inline diff:
From 3a82ef6b68280efb1ed008cf436b5a33eed1488f Mon Sep 17 00:00:00 2001
From: Zhijie Hou <[email protected]>
Date: Wed, 27 May 2026 15:58:05 +0800
Subject: [PATCH v4 2/2] Add a test for repack concurrently

---
 .../recovery/t/046_checkpoint_logical_slot.pl | 74 +++++++++++++++++++
 1 file changed, 74 insertions(+)

diff --git a/src/test/recovery/t/046_checkpoint_logical_slot.pl b/src/test/recovery/t/046_checkpoint_logical_slot.pl
index 66761bf56c1..aa32859dd15 100644
--- a/src/test/recovery/t/046_checkpoint_logical_slot.pl
+++ b/src/test/recovery/t/046_checkpoint_logical_slot.pl
@@ -226,4 +226,78 @@ is( $standby->safe_psql(
 	"t",
 	'logical slot is not invalidated');
 
+# Verify that the REPACK slot's restart_lsn can advance while REPACK
+# CONCURRENTLY is still running, allowing WAL files to be recycled during this
+# period.
+
+# Create the table to be repacked and populate it with some data.
+$node->safe_psql(
+	'postgres',
+	q{
+CREATE TABLE repack_test(i int PRIMARY KEY, t text);
+INSERT INTO repack_test
+SELECT g, md5(g::text)
+FROM generate_series(1, 100) g;
+});
+
+# Pause the REPACK command in the middle of its execution so that the decoding
+# worker continues running, allowing us to test slot restart_lsn advancement
+# later.
+$node->safe_psql('postgres',
+	q(select injection_points_attach('repack-concurrently-before-lock','wait'))
+);
+
+my $repack = $node->background_psql('postgres');
+$repack->query_until(
+	qr/repack_started/,
+	q(
+\echo repack_started
+REPACK (CONCURRENTLY) repack_test;
+\q
+));
+
+# Wait until REPACK reaches the injection point.
+$node->wait_for_event('client backend', 'repack-concurrently-before-lock');
+
+my $restart_lsn_before = $node->safe_psql('postgres',
+	"SELECT restart_lsn FROM pg_replication_slots WHERE slot_name ~ '^repack_[0-9]+' AND slot_type = 'logical' AND temporary;");
+
+# Verify that the replication slot created by the subscription exists and has a
+# valid restart_lsn.
+ok(defined($restart_lsn_before) && $restart_lsn_before ne '',
+	'REPACK slot has restart_lsn');
+
+my $segment_before = $node->safe_psql('postgres',
+	"SELECT pg_walfile_name('$restart_lsn_before')");
+my $segment_before_path = $node->data_dir . "/pg_wal/$segment_before";
+ok(-f $segment_before_path,
+	"segment for initial restart_lsn exists: $segment_before");
+
+# Switch WAL file on the primary while REPACK is still running and then force
+# WAL removal/recycling with a checkpoint.
+$node->advance_wal(1);
+
+# Wait until the REPACK slot's restart_lsn advances
+ok( $node->poll_query_until(
+	'postgres', qq[
+    SELECT count(*) > 0
+	FROM pg_replication_slots
+	WHERE slot_name ~ '^repack_[0-9]+'
+	  AND slot_type = 'logical'
+	  AND temporary
+	  AND restart_lsn IS NOT NULL
+	  AND restart_lsn <> '$restart_lsn_before'::pg_lsn]),
+	'REPACK slot restart_lsn advances while command is still running');
+
+$node->safe_psql('postgres', 'CHECKPOINT');
+
+# Test that the old WAL segment was recycled
+ok(!-f $segment_before_path,
+	'old WAL segment was recycled while REPACK CONCURRENTLY was running');
+
+$node->safe_psql('postgres',
+	"SELECT injection_points_wakeup('repack-concurrently-before-lock')");
+
+$repack->quit;
+
 done_testing();
-- 
2.43.0



^ permalink  raw  reply  [nested|flat] 18+ messages in thread

* Re: Adding REPACK [concurrently]
  2026-05-25 06:26 RE: Adding REPACK [concurrently] Zhijie Hou (Fujitsu) <[email protected]>
  2026-05-26 15:31 ` Re: Adding REPACK [concurrently] Alvaro Herrera <[email protected]>
  2026-05-27 08:08   ` RE: Adding REPACK [concurrently] Zhijie Hou (Fujitsu) <[email protected]>
  2026-05-28 00:31     ` Re: Adding REPACK [concurrently] Amit Kapila <[email protected]>
  2026-05-28 03:34       ` Re: Adding REPACK [concurrently] Amit Kapila <[email protected]>
  2026-05-28 05:18         ` RE: Adding REPACK [concurrently] Zhijie Hou (Fujitsu) <[email protected]>
@ 2026-05-29 15:39           ` Amit Kapila <[email protected]>
  0 siblings, 0 replies; 18+ messages in thread

From: Amit Kapila @ 2026-05-29 15:39 UTC (permalink / raw)
  To: Zhijie Hou (Fujitsu) <[email protected]>; +Cc: Alvaro Herrera <[email protected]>; Antonin Houska <[email protected]>; Hayato Kuroda (Fujitsu) <[email protected]>; Srinath Reddy Sadipiralla <[email protected]>; Mihail Nikalayeu <[email protected]>; Matthias van de Meent <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

On Wed, May 27, 2026 at 10:18 PM Zhijie Hou (Fujitsu)
<[email protected]> wrote:
>
> On Thursday, May 28, 2026 11:34 AM Amit Kapila <[email protected]> wrote:
> > On Wed, May 27, 2026 at 5:31 PM Amit Kapila <[email protected]>
> > wrote:
> > >
> > > On Wed, May 27, 2026 at 1:08 AM Zhijie Hou (Fujitsu)
> > > <[email protected]> wrote:
> > > >
> > > > 0001 remains unchanged.
> > > >
> > >
> > > Few minor comments:
> > > =================
> >
> > Commit message says: "This change does not advance catalog_xmin.
> > REPACK already holds a snapshot that prevents catalog dead tuple removal,
> > so catalog_xmin handling can be addressed independently.".
> > Isn't it equally important to advance this, otherwise, for long running REPACKs
> > dead tuples will be accumulated needlessly? If so, do we have any ideas to
> > avoid this?
>
> My understanding is that dead tuple accumulation is common to all long-running
> commands (including CLUSTER, VACUUM FULL, and REPACK without CONCURRENTLY). As
> long as a command holds a snapshot for a long time while scanning and copying
> data, the backend xmin will cause similar accumulation. So, this doesn't seem
> like a new issue to me, and given that catalog_xmin only affect tuples in system
> catalog which is less harmful, I thought it could be handled independently.
> There was a proposal to improve this case in [1].
>

Fair enough. It makes sense to deal with catalog_xmin separately.

> Attaching the v4 patch which improved the comments and commit message as
> suggested.
>

I haven't tested it but otherwise the code changes looks good to me.

-- 
With Regards,
Amit Kapila.






^ permalink  raw  reply  [nested|flat] 18+ messages in thread

* Re: Adding REPACK [concurrently]
  2026-05-25 06:26 RE: Adding REPACK [concurrently] Zhijie Hou (Fujitsu) <[email protected]>
  2026-05-26 15:31 ` Re: Adding REPACK [concurrently] Alvaro Herrera <[email protected]>
  2026-05-27 08:08   ` RE: Adding REPACK [concurrently] Zhijie Hou (Fujitsu) <[email protected]>
@ 2026-05-29 22:25     ` Alvaro Herrera <[email protected]>
  2026-06-01 02:25       ` RE: Adding REPACK [concurrently] Zhijie Hou (Fujitsu) <[email protected]>
  1 sibling, 1 reply; 18+ messages in thread

From: Alvaro Herrera @ 2026-05-29 22:25 UTC (permalink / raw)
  To: Zhijie Hou (Fujitsu) <[email protected]>; +Cc: Antonin Houska <[email protected]>; Amit Kapila <[email protected]>; Hayato Kuroda (Fujitsu) <[email protected]>; Srinath Reddy Sadipiralla <[email protected]>; Mihail Nikalayeu <[email protected]>; Matthias van de Meent <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

On 2026-May-27, Zhijie Hou (Fujitsu) wrote:

> I tried a bit, and the test complexity and speed (< 1s) appear to be within
> acceptable limits. I'm attaching 0002 as a reference test.
> 
> 0001 remains unchanged.

Pushed 0001, in two parts.  I rewrote the comment in
decode_concurrent_changes() though, hope it ended up okay.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Los cuentos de hadas no dan al niño su primera idea sobre los monstruos.
Lo que le dan es su primera idea de la posible derrota del monstruo."
                                                   (G. K. Chesterton)






^ permalink  raw  reply  [nested|flat] 18+ messages in thread

* RE: Adding REPACK [concurrently]
  2026-05-25 06:26 RE: Adding REPACK [concurrently] Zhijie Hou (Fujitsu) <[email protected]>
  2026-05-26 15:31 ` Re: Adding REPACK [concurrently] Alvaro Herrera <[email protected]>
  2026-05-27 08:08   ` RE: Adding REPACK [concurrently] Zhijie Hou (Fujitsu) <[email protected]>
  2026-05-29 22:25     ` Re: Adding REPACK [concurrently] Alvaro Herrera <[email protected]>
@ 2026-06-01 02:25       ` Zhijie Hou (Fujitsu) <[email protected]>
  0 siblings, 0 replies; 18+ messages in thread

From: Zhijie Hou (Fujitsu) @ 2026-06-01 02:25 UTC (permalink / raw)
  To: Alvaro Herrera <[email protected]>; +Cc: Antonin Houska <[email protected]>; Amit Kapila <[email protected]>; Hayato Kuroda (Fujitsu) <[email protected]>; Srinath Reddy Sadipiralla <[email protected]>; Mihail Nikalayeu <[email protected]>; Matthias van de Meent <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

On Saturday, May 30, 2026 6:25 AM Alvaro Herrera <[email protected]> wrote:
> 
> On 2026-May-27, Zhijie Hou (Fujitsu) wrote:
> 
> > I tried a bit, and the test complexity and speed (< 1s) appear to be
> > within acceptable limits. I'm attaching 0002 as a reference test.
> >
> > 0001 remains unchanged.
> 
> Pushed 0001, in two parts.  I rewrote the comment in
> decode_concurrent_changes() though, hope it ended up okay.

Thank you for pushing the patches ! The comments look good to me.

In the original 0002 test patch, I realized that I missed to drop existing slots
from previous tests (I did locally but missed to merge into the patch), which
also prevented WAL removal, sorry for that. Attached is the fixed test patch.

Best Regards,
Hou zj


Attachments:

  [application/octet-stream] v5-0001-Add-TAP-test-for-WAL-recycling-during-REPACK-CONC.patch (3.7K, 2-v5-0001-Add-TAP-test-for-WAL-recycling-during-REPACK-CONC.patch)
  download | inline diff:
From 92537befabccbbd8e5e16f14f7cf3f25b4562399 Mon Sep 17 00:00:00 2001
From: Zhijie Hou <[email protected]>
Date: Wed, 27 May 2026 15:58:05 +0800
Subject: [PATCH v5] Add TAP test for WAL recycling during REPACK CONCURRENTLY

Following 45b0298 which allows old WAL recycling during REPACK CONCURRENTLY,
this commit adds a TAP test to verify the intended behavior and helps prevent
future regressions in WAL retention during long-running REPACK CONCURRENTLY
operations.
---
 .../recovery/t/046_checkpoint_logical_slot.pl | 83 +++++++++++++++++++
 1 file changed, 83 insertions(+)

diff --git a/src/test/recovery/t/046_checkpoint_logical_slot.pl b/src/test/recovery/t/046_checkpoint_logical_slot.pl
index 66761bf56c1..778bdf42406 100644
--- a/src/test/recovery/t/046_checkpoint_logical_slot.pl
+++ b/src/test/recovery/t/046_checkpoint_logical_slot.pl
@@ -226,4 +226,87 @@ is( $standby->safe_psql(
 	"t",
 	'logical slot is not invalidated');
 
+# Clean up old replication slots that might interfere with later WAL removal
+# tests.
+$standby->stop;
+$primary->safe_psql('postgres',
+	q{SELECT pg_drop_replication_slot('slot_logical');
+	SELECT pg_drop_replication_slot('slot_physical');
+	SELECT pg_drop_replication_slot('failover_slot');
+	SELECT pg_drop_replication_slot('phys_slot');});
+
+# Verify that the REPACK slot's restart_lsn can advance while REPACK
+# CONCURRENTLY is still running, allowing WAL files to be recycled during this
+# period.
+
+# Create the table to be repacked and populate it with some data.
+$node->safe_psql(
+	'postgres',
+	q{
+CREATE TABLE repack_test(i int PRIMARY KEY, t text);
+INSERT INTO repack_test
+SELECT g, md5(g::text)
+FROM generate_series(1, 100) g;
+});
+
+# Pause the REPACK command in the middle of its execution so that the decoding
+# worker continues running, allowing us to test slot restart_lsn advancement
+# later.
+$node->safe_psql('postgres',
+	q(select injection_points_attach('repack-concurrently-before-lock','wait'))
+);
+
+my $repack = $node->background_psql('postgres');
+$repack->query_until(
+	qr/repack_started/,
+	q(
+\echo repack_started
+REPACK (CONCURRENTLY) repack_test;
+\q
+));
+
+# Wait until REPACK reaches the injection point.
+$node->wait_for_event('client backend', 'repack-concurrently-before-lock');
+
+my $restart_lsn_before = $node->safe_psql('postgres',
+	"SELECT restart_lsn FROM pg_replication_slots WHERE slot_name ~ '^repack_[0-9]+' AND slot_type = 'logical' AND temporary;");
+
+# Verify that the replication slot created by the subscription exists and has a
+# valid restart_lsn.
+ok(defined($restart_lsn_before) && $restart_lsn_before ne '',
+	'REPACK slot has restart_lsn');
+
+my $segment_before = $node->safe_psql('postgres',
+	"SELECT pg_walfile_name('$restart_lsn_before')");
+my $segment_before_path = $node->data_dir . "/pg_wal/$segment_before";
+ok(-f $segment_before_path,
+	"segment for initial restart_lsn exists: $segment_before");
+
+# Switch WAL file on the primary while REPACK is still running and then force
+# WAL removal/recycling with a checkpoint.
+$node->advance_wal(1);
+
+# Wait until the REPACK slot's restart_lsn advances
+ok( $node->poll_query_until(
+	'postgres', qq[
+    SELECT count(*) > 0
+	FROM pg_replication_slots
+	WHERE slot_name ~ '^repack_[0-9]+'
+	  AND slot_type = 'logical'
+	  AND temporary
+	  AND restart_lsn IS NOT NULL
+	  AND restart_lsn <> '$restart_lsn_before'::pg_lsn]),
+	'REPACK slot restart_lsn advances while command is still running');
+
+$node->safe_psql('postgres', 'CHECKPOINT');
+
+# Test that the old WAL segment was recycled
+ok(!-f $segment_before_path,
+	'old WAL segment was recycled while REPACK CONCURRENTLY was running');
+
+$node->safe_psql('postgres',
+	"SELECT injection_points_wakeup('repack-concurrently-before-lock')");
+
+$repack->quit;
+
 done_testing();
-- 
2.43.0



^ permalink  raw  reply  [nested|flat] 18+ messages in thread

end of thread, other threads:[~2026-06-03 17:27 UTC | newest]

Thread overview: 18+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-12-04 15:17 Re: Adding REPACK [concurrently] David Klika <[email protected]>
2025-12-04 15:43 ` Álvaro Herrera <[email protected]>
2025-12-04 17:47   ` Marcos Pegoraro <[email protected]>
2025-12-06 10:58     ` Álvaro Herrera <[email protected]>
2025-12-08 09:13   ` David Klika <[email protected]>
2026-05-19 18:52 Re: Adding REPACK [concurrently] Alvaro Herrera <[email protected]>
2026-05-23 15:29 ` Amit Kapila <[email protected]>
2026-05-24 11:29   ` Alvaro Herrera <[email protected]>
2026-06-03 17:27     ` Mihail Nikalayeu <[email protected]>
2026-05-25 06:26 RE: Adding REPACK [concurrently] Zhijie Hou (Fujitsu) <[email protected]>
2026-05-26 15:31 ` Alvaro Herrera <[email protected]>
2026-05-27 08:08   ` Zhijie Hou (Fujitsu) <[email protected]>
2026-05-28 00:31     ` Amit Kapila <[email protected]>
2026-05-28 03:34       ` Amit Kapila <[email protected]>
2026-05-28 05:18         ` Zhijie Hou (Fujitsu) <[email protected]>
2026-05-29 15:39           ` Amit Kapila <[email protected]>
2026-05-29 22:25     ` Alvaro Herrera <[email protected]>
2026-06-01 02:25       ` Zhijie Hou (Fujitsu) <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox