Re: Adding REPACK [concurrently]

public inbox for [email protected]  
help / color / mirror / Atom feed

Re: Adding REPACK [concurrently]
19+ messages / 5 participants
[nested] [flat]

* Re: Adding REPACK [concurrently]
@ 2026-03-17 10:53  Antonin Houska <[email protected]>
  0 siblings, 1 reply; 19+ messages in thread

From: Antonin Houska @ 2026-03-17 10:53 UTC (permalink / raw)
  To: Alvaro Herrera <[email protected]>; +Cc: Matthias van de Meent <[email protected]>; Mihail Nikalayeu <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

Alvaro Herrera <[email protected]> wrote:

> On 2026-Mar-16, Matthias van de Meent wrote:
> 
> > On Mon, 16 Mar 2026 at 21:15, Antonin Houska <[email protected]> wrote:
> 
> > > Anyway (fortunately?), the concurrent use of slots by REPACK is limited
> > > because, during the initialization of logical decoding, the backend needs to
> > > wait for all the transactions having XID assigned to finish, and these include
> > > the already running REPACK commands. See SnapBuildWaitSnapshot() and callers
> > > if you're interested in details.
> > 
> > Huh, so would you be able to run more than one Repack Concurrently in
> > the same database? ISTM that would not be possible, apart from
> > possibly a mechanism comparable to the SAFE_IN_IC flag (to not wait on
> > those backends).
> 
> Yeah, this sounds kind of bad news ...

Admittedly, it is a problem. I tried to address this in pg_squeeze by
pre-allocating slots when it's clear (due to scheduling) that more than one
table needs to be processed. This was an effort to achieve the best possible
performance rather than a response to complaints of users about low
throughput. Nevertheless, I'm glad I happened to mention it before it's too
late.

Regarding solution, a flag like SAFE_IN_IC alone does not help. The
information that particular transaction is used by REPACK (and therefore it
does not have to be decoded) would need to be propagated to the
xl_running_xacts WAL record too.

The enhancements I wrote for PG 20 (not all of them posted yet) that aim at
eliminating the impact of REPACK on VACUUM xmin horizon should fix this
problem: due to the MVCC-safety (i.e. preserving xmin/xmax of the tuples),
REPACK will not need XID assigned (except for catalog changes, which will
happen in separate transactions), so it won't block the logical decoding setup
of other backends.

So the question is whether we should implement a workaround for PG 19, that
won't be needed in v20.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: Adding REPACK [concurrently]
@ 2026-03-17 19:57  Antonin Houska <[email protected]>
  parent: Antonin Houska <[email protected]>
  0 siblings, 1 reply; 19+ messages in thread

From: Antonin Houska @ 2026-03-17 19:57 UTC (permalink / raw)
  To: Alvaro Herrera <[email protected]>; +Cc: Matthias van de Meent <[email protected]>; Mihail Nikalayeu <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

Antonin Houska <[email protected]> wrote:

> Alvaro Herrera <[email protected]> wrote:
> 
> > On 2026-Mar-16, Matthias van de Meent wrote:
> > 
> > > On Mon, 16 Mar 2026 at 21:15, Antonin Houska <[email protected]> wrote:
> > 
> > > > Anyway (fortunately?), the concurrent use of slots by REPACK is limited
> > > > because, during the initialization of logical decoding, the backend needs to
> > > > wait for all the transactions having XID assigned to finish, and these include
> > > > the already running REPACK commands. See SnapBuildWaitSnapshot() and callers
> > > > if you're interested in details.
> > > 
> > > Huh, so would you be able to run more than one Repack Concurrently in
> > > the same database? ISTM that would not be possible, apart from
> > > possibly a mechanism comparable to the SAFE_IN_IC flag (to not wait on
> > > those backends).
> > 
> > Yeah, this sounds kind of bad news ...
> 
> Admittedly, it is a problem. I tried to address this in pg_squeeze by
> pre-allocating slots when it's clear (due to scheduling) that more than one
> table needs to be processed. This was an effort to achieve the best possible
> performance rather than a response to complaints of users about low
> throughput. Nevertheless, I'm glad I happened to mention it before it's too
> late.
> 
> Regarding solution, a flag like SAFE_IN_IC alone does not help. The
> information that particular transaction is used by REPACK (and therefore it
> does not have to be decoded) would need to be propagated to the
> xl_running_xacts WAL record too.

0007 in the next version tries to implement that.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com



^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: Adding REPACK [concurrently]
@ 2026-03-18 19:12  Srinath Reddy Sadipiralla <[email protected]>
  parent: Antonin Houska <[email protected]>
  0 siblings, 2 replies; 19+ messages in thread

From: Srinath Reddy Sadipiralla @ 2026-03-18 19:12 UTC (permalink / raw)
  To: Antonin Houska <[email protected]>; +Cc: Alvaro Herrera <[email protected]>; Matthias van de Meent <[email protected]>; Mihail Nikalayeu <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

Hello,

While i was doing concurrency test onn V41 patches ,i found this crash
because of the assert failure,

TRAP: failed Assert("RelationGetRelid(relation) == ((RepackDecodingState *)
ctx->output_writer_private)->relid"), File: "pgoutput_repack.c", Line: 97,
PID: 397007
postgres: REPACK decoding worker for relation "stress_victim"
(ExceptionalCondition+0x98)[0xaaaad9361698]
/home/srinath/Desktop/pgbuild/lib/postgresql/pgoutput_repack.so(+0xfe8)[0xffff90e00fe8]
postgres: REPACK decoding worker for relation "stress_victim"
(+0x679e14)[0xaaaad9049e14]
postgres: REPACK decoding worker for relation "stress_victim"
(+0x689cd0)[0xaaaad9059cd0]
postgres: REPACK decoding worker for relation "stress_victim"
(+0x68a65c)[0xaaaad905a65c]
postgres: REPACK decoding worker for relation "stress_victim"
(+0x68b2f0)[0xaaaad905b2f0]
postgres: REPACK decoding worker for relation "stress_victim"
(ReorderBufferCommit+0x74)[0xaaaad905b374]
postgres: REPACK decoding worker for relation "stress_victim"
(+0x671ec4)[0xaaaad9041ec4]
postgres: REPACK decoding worker for relation "stress_victim"
(xact_decode+0x1a0)[0xaaaad9040edc]
postgres: REPACK decoding worker for relation "stress_victim"
(LogicalDecodingProcessRecord+0xd4)[0xaaaad9040a80]
postgres: REPACK decoding worker for relation "stress_victim"
(+0x33f558)[0xaaaad8d0f558]
postgres: REPACK decoding worker for relation "stress_victim"
(+0x341ccc)[0xaaaad8d11ccc]
postgres: REPACK decoding worker for relation "stress_victim"
(RepackWorkerMain+0x1ac)[0xaaaad8d11bd4]
postgres: REPACK decoding worker for relation "stress_victim"
(BackgroundWorkerMain+0x2b0)[0xaaaad900d21c]
postgres: REPACK decoding worker for relation "stress_victim"
(postmaster_child_launch+0x1f0)[0xaaaad9012070]
postgres: REPACK decoding worker for relation "stress_victim"
(+0x64b974)[0xaaaad901b974]
postgres: REPACK decoding worker for relation "stress_victim"
(+0x64bc64)[0xaaaad901bc64]
postgres: REPACK decoding worker for relation "stress_victim"
(+0x64a3e4)[0xaaaad901a3e4]
postgres: REPACK decoding worker for relation "stress_victim"
(+0x647648)[0xaaaad9017648]
postgres: REPACK decoding worker for relation "stress_victim"
(PostmasterMain+0x160c)[0xaaaad9016d98]
postgres: REPACK decoding worker for relation "stress_victim"
(main+0x3dc)[0xaaaad8ea7a38]
/lib/aarch64-linux-gnu/libc.so.6(+0x284c4)[0xffff9c5c84c4]
/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0x98)[0xffff9c5c8598]
postgres: REPACK decoding worker for relation "stress_victim"
(_start+0x30)[0xaaaad8abc970]
2026-03-16 19:40:21.622 IST [393820] LOG:  background worker "REPACK
decoding worker" (PID 397007) was terminated by signal 6: Aborted
2026-03-16 19:40:21.622 IST [393820] LOG:  terminating any other active
server processes
2026-03-16 19:40:21.632 IST [397036] FATAL:  the database system is in
recovery mode

This crash happens if we run REPACK (concurrently) on a table while a heavy
pgbench workload is concurrently executing multi-table(setup.sql)
transactions(dual_chaos.sql).
It triggers after a few back to back REPACK (concurrently) runs.

i think i found the cause for this crash , because there were some changes
which
slipped under the nose of the change_useless_for_repack filter , which led
some
changes which are not related to the relation which we are currently doing
REPACK (concurrently)
got decoded and added into the reorderbuffer queue, the reason for this
is repacked_rel_locator.relNumber
is by default set to InvalidOid, this is actually set to the target
relation during setup_logical_decoding
but this done after DecodingContextFindStartpoint, in
DecodingContextFindStartpoint changes are not
filtered even if its not related to the target relation , because
rm_decode->change_useless_for_repack->am_decoding_for_repack
where repacked_rel_locator.relNumber is still InvalidOid, which makes it
skip the filtering even its not the target relation,
this makes it to be added to reorder buffer queue, so during the processing
of reorder buffer plugin_change is called
where assert fails, i have attached a diff patch to solve this.

thoughts?

-- 
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/


Attachments:

  [application/octet-stream] fix_diff.norobots (1.6K, 3-fix_diff.norobots)
  download

  [application/octet-stream] dual_chaos.sql (680B, 4-dual_chaos.sql)
  download

  [application/octet-stream] setup.sql (466B, 5-setup.sql)
  download

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: Adding REPACK [concurrently]
@ 2026-03-18 20:07  Antonin Houska <[email protected]>
  parent: Srinath Reddy Sadipiralla <[email protected]>
  1 sibling, 0 replies; 19+ messages in thread

From: Antonin Houska @ 2026-03-18 20:07 UTC (permalink / raw)
  To: Srinath Reddy Sadipiralla <[email protected]>; +Cc: Alvaro Herrera <[email protected]>; Matthias van de Meent <[email protected]>; Mihail Nikalayeu <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

Srinath Reddy Sadipiralla <[email protected]> wrote:

> TRAP: failed Assert("RelationGetRelid(relation) == ((RepackDecodingState *) ctx->output_writer_private)->relid"), File: "pgoutput_repack.c",
> Line: 97, PID: 397007

> This crash happens if we run REPACK (concurrently) on a table while a heavy
> pgbench workload is concurrently executing multi-table(setup.sql) transactions(dual_chaos.sql).
> It triggers after a few back to back REPACK (concurrently) runs.
> 
> i think i found the cause for this crash , because there were some changes which
> slipped under the nose of the change_useless_for_repack filter , which led some
> changes which are not related to the relation which we are currently doing REPACK (concurrently)
> got decoded and added into the reorderbuffer queue, the reason for this is repacked_rel_locator.relNumber
> is by default set to InvalidOid, this is actually set to the target relation during setup_logical_decoding
> but this done after DecodingContextFindStartpoint, in DecodingContextFindStartpoint changes are not
> filtered even if its not related to the target relation , because rm_decode->change_useless_for_repack->am_decoding_for_repack
> where repacked_rel_locator.relNumber is still InvalidOid, which makes it skip the filtering even its not the target relation,
> this makes it to be added to reorder buffer queue, so during the processing of reorder buffer plugin_change is called
> where assert fails, i have attached a diff patch to solve this.

Thanks a lot! Yes, your explanation makes sense. I'll include the fix in the
next version. I think it might also explain the other crash [1] you reported
earlier. I'll try to reproduce that.

[1] https://www.postgresql.org/message-id/CAFC%2Bb6o2yzA80YmfEhmMO9puN8qvGRvr-15BBLn3UmJxPfpr2w%40mail.g...

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com





^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: Adding REPACK [concurrently]
@ 2026-03-20 12:35  Mihail Nikalayeu <[email protected]>
  parent: Srinath Reddy Sadipiralla <[email protected]>
  1 sibling, 2 replies; 19+ messages in thread

From: Mihail Nikalayeu @ 2026-03-20 12:35 UTC (permalink / raw)
  To: Alvaro Herrera <[email protected]>; +Cc: Srinath Reddy Sadipiralla <[email protected]>; Antonin Houska <[email protected]>; Matthias van de Meent <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

Hello, everyone!

Some comments for v43:

 ---
src/backend/commands/cluster.c:3211

slot_attisnull(dest, i) uses a 0-based loop index but slot_attisnull
starts with "Assert(attnum > 0);".
Also, it proves it has not been tested in any way yet.

------------------------------

src/backend/storage/ipc/standby.c:1376

"if (xlrec.xcnt > 0)"
looks like it should be "xlrec.xcnt + xlrec.xcnt_repack > 0" because
of "CurrentRunningXacts->xcnt = count - subcount - count_repack;"

------------------------------

src/backend/storage/ipc/procarray.c:2673-2681

"nrepack = logical_decoding_enabled ? MAX_REPACK_XIDS : 0"
not sure it is a real issue, but if EnableLogicalDecoding is called
somehow after allocating the array - it makes it possible to overrun
the array later by MAX_REPACK_XIDS.

------------------------------

src/backend/commands/cluster.c:3006

"apply_cxt = AllocSetContextCreate(TopTransactionContext,
                           "REPACK - apply",
                           ALLOCSET_DEFAULT_SIZES);"

Should we do it before the "MakeSingleTupleTableSlot" calls?
Because MakeSingleTupleTableSlot stored CurrentMemoryContext in tts_mcxt.

------------------------------

src/backend/commands/cluster.c:3187

if (natt_ext != 0)
    elog(WARNING, "have natt_ext %d, weird", natt_ext);

Should we make that ERROR?

------------------------------

src/backend/access/rmgrdesc/standbydesc.c:24

appendStringInfo(buf, "nextXid %u latestCompletedXid %u
oldestRunningXid %u oldestRunningXid %u",

"oldestRunningXidLogical" at the end?

------------------------------

src/backend/catalog/index.c:766,1464-1469

"bool      progress = (flags & INDEX_CREATE_REPORT_PROGRESS) != 0;"

AFAIU gin, hash and btree (at least) still just unconditionally write
PROGRESS_CREATEIDX_* progress.
------------------------------

src/backend/catalog/toasting.c:334

"INDEX_CREATE_IS_PRIMARY | INDEX_CREATE_REPORT_PROGRESS, 0,"

Should we add the "progress" flag too here and move it from make_new_heap?

-------------------------------

Also just, gently remind you about [1] and a simple way to avoid any
unnoticed MVCC-related issues with REPACK proposed here [2].

Best regards,
Mikhail.

[1]: https://www.postgresql.org/message-id/flat/5k2dfckyp6zv2fiovosvtbya5onvplgviz5n4kdamxupff4vi2%40yytz...
[2]: https://www.postgresql.org/message-id/[email protected]...





^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: Adding REPACK [concurrently]
@ 2026-03-25 02:57  Srinath Reddy Sadipiralla <[email protected]>
  parent: Mihail Nikalayeu <[email protected]>
  1 sibling, 1 reply; 19+ messages in thread

From: Srinath Reddy Sadipiralla @ 2026-03-25 02:57 UTC (permalink / raw)
  To: Alvaro Herrera <[email protected]>; +Cc: Mihail Nikalayeu <[email protected]>; Antonin Houska <[email protected]>; Matthias van de Meent <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

Hallo Alvaro,

On Wed, Mar 25, 2026 at 4:02 AM Alvaro Herrera <[email protected]>
wrote:

>
> Many thanks for the review.  I have applied fixes for these, so here's
> v44.
>

Thanks for the patches.

- 0004 is Antonin's bugfix from the crash reported by Srinath.
>

I think it's "0004 is Srinath's bugfix from the crash reported by Srinath."
;-)
after i provided the analysis and fix for the crash[1], Antonin tried to
reproduce
this crash using isolation tester , for this he even proposed changes to
isolation tester (so cool ... btw i reviewed it) [2] .

i have done another round of stress testing on V43 , this time with more
tests,
as i did previously [3] did concurrency test - went well,

crash test:
after crash i observed that repack worker files are cleaned during server
restart
using RemovePgTempFiles but the transient table relation files remains
not cleaned up, maybe we can do cleanup for this as well during server
restart,
I will think about this more.

physical replication test where I did REPACK (concurrently) on primary and
checked if data is intact using the 4 verifications I did here [3] on
replica - went well.

Then as suggested by Alvaro off-list I checked the lock upgrade behavior
during the table swap phase. I observed that if another transaction holds a
conflicting lock on the table when the swap is attempted, it can lead to
“transient table” data loss during a manual or timeout abort.
when a REPACK (concurrent) waits for a conflicting lock to be released and
eventually hits a
lock_timeout (or is cancelled via ctrl+c), the transaction aborts. During
this abort,
the cleanup process triggers smgrDoPendingDeletes. This results in the
removal
of all transient table relfiles and decoder worker files created during the
process.
This effectively wipes out the work done by the transient table creation
before
the swap could successfully complete, this happens because during transient
table creation we add the table to the PendingRelDelete list.

rebuild_relation→make_new_heap->heap_create_with_catalog→heap_create→table_relation_set_new_filelocator→RelationCreateStorage
/*
* Add the relation to the list of stuff to delete at abort, if we are
* asked to do so.
*/
if (register_delete)
{
PendingRelDelete *pending;
pending = (PendingRelDelete *)
MemoryContextAlloc(TopMemoryContext, sizeof(PendingRelDelete));
pending->rlocator = rlocator;
pending->procNumber = procNumber;
pending->atCommit = false; /* delete if abort */
pending->nestLevel = GetCurrentTransactionNestLevel();
pending->next = pendingDeletes;
pendingDeletes = pending;
}
and the base/5/pgsql_tmp/ files also gets unlinked during the decoding
worker cleanup,
I think this cleanup of transient table relfiles and decoder files makes
sense because
we don’t have any resume operation in which we can re-use the transient
table’s files,
please correct me if I am not getting your point here.

test scenario:

session 1:
postgres=# repack (concurrently) stress_victim;
had a breakpoint rebuild_relation_finish_concurrent->
LockRelationOid(old_table_oid, AccessExclusiveLock); just before getting
the exclusive lock.
with lock_timeout = 5s

session 2:
postgres=# BEGIN;
SELECT * FROM stress_victim LIMIT 1;
-- left it open
BEGIN
 id  | balance |
           payload
-----+---------+---------------------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
--------------
 170 |      65 | d12f400c4d0d3c49818f88597e16cf29
d12f400c4d0d3c49818f88597e16cf29d12f400c4d0d3c498
18f88597e16cf29d12f400c4d0d3c49818f88597e16cf29d1
2f400c4d0d3c49818f88597e16cf29d12f400c4d0d3c49818
f88597e16cf29
(1 row)
-- this gets us a conflicting lock (AccessShareLock) on the same table,
REPACK (concurrently) is running on.

session 1:
release the breakpoint and now the backend waits for the conflicting lock
to be released.
in between if lock_timeout occurs then transaction aborts.
postgres=# repack (concurrently) stress_victim;
ERROR:  canceling statement due to lock timeout
CONTEXT:  waiting for AccessExclusiveLock on relation 16637 of database 5

Now we can see the transient table relfiles and decoder worker files
getting cleaned up.

[1] -
https://www.postgresql.org/message-id/CAFC%2Bb6qk3-DQTi43QMqvVLP%2BsudPV4vsLQm5iHfcCeObrNaVyA%40mail...
[2] - https://www.postgresql.org/message-id/flat/4703.1774250534%40localhost
[3] -
https://www.postgresql.org/message-id/CAFC%2Bb6o2yzA80YmfEhmMO9puN8qvGRvr-15BBLn3UmJxPfpr2w%40mail.g...

-- 
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: Adding REPACK [concurrently]
@ 2026-03-27 16:12  Antonin Houska <[email protected]>
  parent: Mihail Nikalayeu <[email protected]>
  1 sibling, 1 reply; 19+ messages in thread

From: Antonin Houska @ 2026-03-27 16:12 UTC (permalink / raw)
  To: Mihail Nikalayeu <[email protected]>; +Cc: Alvaro Herrera <[email protected]>; Srinath Reddy Sadipiralla <[email protected]>; Matthias van de Meent <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

Antonin Houska <[email protected]> wrote:

> Mihail Nikalayeu <[email protected]> wrote:

> > src/backend/catalog/index.c:766,1464-1469
> > 
> > "bool      progress = (flags & INDEX_CREATE_REPORT_PROGRESS) != 0;"
> > 
> > AFAIU gin, hash and btree (at least) still just unconditionally write
> > PROGRESS_CREATEIDX_* progress.
> 
> I think you're right, I'll check it.

I concluded this is a problem that exists for quite a while and should be
fixed separately. Currently I don't see conflicts of parameter indexes between
PROGRESS_COMMAND_REPACK and PROGRESS_COMMAND_CREATE_INDEX. There would be some
if the index was built with the CONCURRENTLY option, but REPACK uses normal
index build.

I tried to write a patch that allows progress tracking of two commands at the
same time (a "main command" and a "subcommand"), but regression tests revealed
that contrib/file_fdw is broken in a way that I could not fix easily: during
execution of a join, two COPY FROM commands are executed at the same time and
they overwrite the status of each other. Unlike the concept of a sub-command,
we cannot assume here that the command that started the reporting as the
second will stop as the first. Thus in pgstat_progress_end_command() we cannot
figure out which node is trying to stop the reporting.

It needs more work, I can get back to it after PG 19 feature freeze.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: Adding REPACK [concurrently]
@ 2026-03-31 10:47  Amit Kapila <[email protected]>
  parent: Antonin Houska <[email protected]>
  0 siblings, 1 reply; 19+ messages in thread

From: Amit Kapila @ 2026-03-31 10:47 UTC (permalink / raw)
  To: Alvaro Herrera <[email protected]>; +Cc: Antonin Houska <[email protected]>; Mihail Nikalayeu <[email protected]>; Srinath Reddy Sadipiralla <[email protected]>; Matthias van de Meent <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

On Fri, Mar 27, 2026 at 10:31 PM Alvaro Herrera <[email protected]> wrote:
>
> Lastly, 0008 is "Teach snapshot builder to skip transactions running
> REPACK (CONCURRENTLY)" which I see as the least mature of the pack.  I
> would really like to be able to squash it with 0003, but I'm not yet
> trusting it enough.
>

Few comments/questions by looking 0008 alone:

1.
+ TransactionId *xids_repack = NULL;
+ bool logical_decoding_enabled = IsLogicalDecodingEnabled();

  Assert(!RecoveryInProgress());

...
...

  /*
  * Allocating space for maxProcs xids is usually overkill; numProcs would
  * be sufficient.  But it seems better to do the malloc while not holding
@@ -2663,11 +2672,14 @@ GetRunningTransactionData(void)
  */
  if (CurrentRunningXacts->xids == NULL)
  {
+ /* FIXME probably fails if logical decoding is enable on-the-fly */
+ int nrepack = logical_decoding_enabled ? MAX_REPACK_XIDS : 0;

This FIXME is important to fix for this patch, otherwise, we can't
rely on transactions remembered as repack_concurrently.

2.
*
+ /*
+ * TODO Consider a GUC to reserve certain amount of replication slots for
+ * REPACK (CONCURRENTLY) and using it here.
+ */
+#define MAX_REPACK_XIDS 16
+

This sounds a bit scary as reserving replication slots for REPACK
(CONCURRENTLY) may not be what users expect. But it is not clear why
replication slots need to be reserved for this.

IIUC, two reasons why logical decoding can ignore REPACK
(CONCURRENTLY) are (a) does not perform any catalog changes relevant
to logical decoding, (b) neither walsenders nor backends performing
logical decoding needs to care sending the WAL generated by REPACK
(CONCURRENTLY). Is that understanding correct? If so, we may want to
clarify why we want to ignore this command's WAL while sending changes
downstream in the commit message or give some reference of the patch
where the same is mentioned. This can help reviewing this patch
independently.

BTW, are we intending to commit this patch series for PG19?

-- 
With Regards,
Amit Kapila.

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: Adding REPACK [concurrently]
@ 2026-03-31 17:37  Srinath Reddy Sadipiralla <[email protected]>
  parent: Srinath Reddy Sadipiralla <[email protected]>
  0 siblings, 1 reply; 19+ messages in thread

From: Srinath Reddy Sadipiralla @ 2026-03-31 17:37 UTC (permalink / raw)
  To: Alvaro Herrera <[email protected]>; +Cc: Mihail Nikalayeu <[email protected]>; Antonin Houska <[email protected]>; Matthias van de Meent <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

Hi Alvaro,

On Thu, Mar 26, 2026 at 1:42 AM Alvaro Herrera <[email protected]>
wrote:

>
> As for lock upgrade, I wonder if the best way to handle this isn't to
> hack the deadlock detector so that it causes any *other* process to die,
> if they detect that they would block on REPACK.  Arguably there's
> nothing that you can do to a table while its undergoing REPACK
> CONCURRENTLY; any alterations would have to wait until the repacking is
> compelted.  We can implement that idea simply enough, as shown in this
> crude prototype.
>

After testing this, I observed that it solves the scenario where a query is
waiting
on REPACK. For example, if a DROP TABLE requests an AEL and queues
behind REPACK's ShareUpdateExclusiveLock, the deadlock detector comes
when REPACK tries to upgrade to AEL, killing the DROP to prevent the
circular
queue deadlock, But the case I originally mentioned [1] was the reverse:
what
happens if a transaction already holds a lock that conflicts with the
upcoming
AEL upgrade (e.g., an analytical SELECT or an idle-in-transaction holding
an AccessShareLock),
but isn't waiting on REPACK at all?

In this case, there's no circular wait. The deadlock detector never fires.
REPACK
simply queues behind the SELECT, eventually hits its lock_timeout, aborts
and
cleans up.Initially, I thought this cleanup was expected behavior. But
after seeing
your solution to protect REPACK from losing its transient table work, I
thought it's "not expected".
If the goal is to prevent REPACK's work from being wasted, should we error
out
the backend that is making REPACK wait during the final swap phase? I am
thinking
of something conceptually similar to
ResolveRecoveryConflictWithLock,actively
cancelling the conflicting session to allow the AEL upgrade to proceed.
Thoughts?

test scenario:

session 1:
postgres=# repack (concurrently) stress_victim;
had a breakpoint rebuild_relation_finish_concurrent->
LockRelationOid(old_table_oid, AccessExclusiveLock); just before getting
the exclusive lock.
with lock_timeout = 5s

session 2:
postgres=# BEGIN;
SELECT * FROM stress_victim LIMIT 1;
-- left it open
BEGIN
 id  | balance |
           payload
-----+---------+---------------------------------
-------------------------------------------------
-------------------------------------------------
-------------------------------------------------
--------------
 170 |      65 | d12f400c4d0d3c49818f88597e16cf29
d12f400c4d0d3c49818f88597e16cf29d12f400c4d0d3c498
18f88597e16cf29d12f400c4d0d3c49818f88597e16cf29d1
2f400c4d0d3c49818f88597e16cf29d12f400c4d0d3c49818
f88597e16cf29
(1 row)
-- this gets us a conflicting lock (AccessShareLock) on the same table,
REPACK (concurrently) is running on.

session 1:
release the breakpoint and now the backend waits for the conflicting lock
to be released.
in between if lock_timeout occurs then transaction aborts.
postgres=# repack (concurrently) stress_victim;
ERROR:  canceling statement due to lock timeout
CONTEXT:  waiting for AccessExclusiveLock on relation 16637 of database 5

[1] -
https://www.postgresql.org/message-id/CAFC%2Bb6pK9ogeSpMA8hg18XhC1eNPcsKWBwoC5OySXi4iTxwtRw%40mail.g...

-- 
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: Adding REPACK [concurrently]
@ 2026-04-01 08:42  Amit Kapila <[email protected]>
  parent: Amit Kapila <[email protected]>
  0 siblings, 1 reply; 19+ messages in thread

From: Amit Kapila @ 2026-04-01 08:42 UTC (permalink / raw)
  To: Antonin Houska <[email protected]>; +Cc: Alvaro Herrera <[email protected]>; Mihail Nikalayeu <[email protected]>; Srinath Reddy Sadipiralla <[email protected]>; Matthias van de Meent <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

On Tue, Mar 31, 2026 at 8:06 PM Antonin Houska <[email protected]> wrote:
>
> Amit Kapila <[email protected]> wrote:
> > 2.
> > *
> > + /*
> > + * TODO Consider a GUC to reserve certain amount of replication slots for
> > + * REPACK (CONCURRENTLY) and using it here.
> > + */
> > +#define MAX_REPACK_XIDS 16
> > +
> >
> > This sounds a bit scary as reserving replication slots for REPACK
> > (CONCURRENTLY) may not be what users expect. But it is not clear why
> > replication slots need to be reserved for this.
>
> The point is that REPACK should not block replication [1]. Thus reserving
> slots for non-REPACK users is probably more precise statement.
>

oh, so shouldn't be a separate patch than this and an important for
this functionality to get committed? I don't see why we need to make
such a GUC or knob as part of this patch if we need the same.

> > IIUC, two reasons why logical decoding can ignore REPACK
> > (CONCURRENTLY) are (a) does not perform any catalog changes relevant
> > to logical decoding, (b) neither walsenders nor backends performing
> > logical decoding needs to care sending the WAL generated by REPACK
> > (CONCURRENTLY). Is that understanding correct? If so, we may want to
> > clarify why we want to ignore this command's WAL while sending changes
> > downstream in the commit message or give some reference of the patch
> > where the same is mentioned. This can help reviewing this patch
> > independently.
>
> Correct, but in fact this diff only affects the setup of the logical decoding
> rather than the decoding itself. On the other hand, if REPACK (CONCURRENTLY)
> starts when the decoding backend's snapshot builder is already in the
> SNAPBUILD_FULL_SNAPSHOT state, reorderbuffer.c processes the transaction
> normally, and another part of the series (v46-0002) ensures that the table
> rewriting is not decoded: REPACK simply tells heap_insert(), heap_update(),
> heap_delete() not to put the extra (replication specific) information into the
> corresponding WAL records. I suppose this is what you mean in (b).
>
> Regarding (a), yes, the absence of catalog changes in the REPACK's transaction
> is the reason that even the logical decoding setup does not have to wait for
> the transaction to finish.
>

Hmm, but we don't do any catalog changes for transactions that have
DML say only INSERT but we don't have separate logic like REPACK in
snapbuild machinery. Same is probably true for dml operations on an
unlogged table which doesn't have WAL to send nor any catalog
operations involved but we don't have any special path for that in
snapbuild code path. So, why do we need special handling for this
operation?

-- 
With Regards,
Amit Kapila.





^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: Adding REPACK [concurrently]
@ 2026-04-01 09:39  Amit Kapila <[email protected]>
  parent: Srinath Reddy Sadipiralla <[email protected]>
  0 siblings, 1 reply; 19+ messages in thread

From: Amit Kapila @ 2026-04-01 09:39 UTC (permalink / raw)
  To: Alvaro Herrera <[email protected]>; +Cc: Srinath Reddy Sadipiralla <[email protected]>; Mihail Nikalayeu <[email protected]>; Antonin Houska <[email protected]>; Matthias van de Meent <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

On Tue, Mar 31, 2026 at 11:54 PM Alvaro Herrera <[email protected]> wrote:
>
> On 2026-Mar-31, Srinath Reddy Sadipiralla wrote:
>
> > In this case, there's no circular wait. The deadlock detector never
> > fires.  REPACK simply queues behind the SELECT, eventually hits its
> > lock_timeout, aborts and cleans up.Initially, I thought this cleanup
> > was expected behavior. But after seeing your solution to protect
> > REPACK from losing its transient table work, I thought it's "not
> > expected".
>
> Yeah.  Keep in mind that REPACK could have been running for many hours
> or even days before it reaches the point of acquiring its AEL lock to do
> the final swap; and it may well be critical work.  We do not want to
> lose it.  So whatever is waiting to obtain a lock on the table, or
> already has a lock on the table, has to yield.
>
> > If the goal is to prevent REPACK's work from being wasted, should we
> > error out the backend that is making REPACK wait during the final swap
> > phase? I am thinking of something conceptually similar to
> > ResolveRecoveryConflictWithLock, actively cancelling the conflicting
> > session to allow the AEL upgrade to proceed.
>
> Something like that might be appropriate, yeah.
>

What about if the blocking process is an autovacumm that is working to
prevent XID wraparound? I think we already avoid killing it in such
cases. BTW, one can say that cancelling a long-running report query
also wastes a lot of effort of the user generating such a report. Why
can't REPACK wait for such a select to finish instead of cancelling
it?

-- 
With Regards,
Amit Kapila.





^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: Adding REPACK [concurrently]
@ 2026-04-01 09:43  Antonin Houska <[email protected]>
  parent: Amit Kapila <[email protected]>
  0 siblings, 1 reply; 19+ messages in thread

From: Antonin Houska @ 2026-04-01 09:43 UTC (permalink / raw)
  To: Amit Kapila <[email protected]>; +Cc: Alvaro Herrera <[email protected]>; Mihail Nikalayeu <[email protected]>; Srinath Reddy Sadipiralla <[email protected]>; Matthias van de Meent <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

Amit Kapila <[email protected]> wrote:

> On Tue, Mar 31, 2026 at 8:06 PM Antonin Houska <[email protected]> wrote:
> >
> > Amit Kapila <[email protected]> wrote:
> > > 2.
> > > *
> > > + /*
> > > + * TODO Consider a GUC to reserve certain amount of replication slots for
> > > + * REPACK (CONCURRENTLY) and using it here.
> > > + */
> > > +#define MAX_REPACK_XIDS 16
> > > +
> > >
> > > This sounds a bit scary as reserving replication slots for REPACK
> > > (CONCURRENTLY) may not be what users expect. But it is not clear why
> > > replication slots need to be reserved for this.
> >
> > The point is that REPACK should not block replication [1]. Thus reserving
> > slots for non-REPACK users is probably more precise statement.
> >
> 
> oh, so shouldn't be a separate patch than this and an important for
> this functionality to get committed? I don't see why we need to make
> such a GUC or knob as part of this patch if we need the same.

REPACK is a new user of replication slots. Without that, there is no other way
to "steal" the slots from the replication users.

> > > IIUC, two reasons why logical decoding can ignore REPACK
> > > (CONCURRENTLY) are (a) does not perform any catalog changes relevant
> > > to logical decoding, (b) neither walsenders nor backends performing
> > > logical decoding needs to care sending the WAL generated by REPACK
> > > (CONCURRENTLY). Is that understanding correct? If so, we may want to
> > > clarify why we want to ignore this command's WAL while sending changes
> > > downstream in the commit message or give some reference of the patch
> > > where the same is mentioned. This can help reviewing this patch
> > > independently.
> >
> > Correct, but in fact this diff only affects the setup of the logical decoding
> > rather than the decoding itself. On the other hand, if REPACK (CONCURRENTLY)
> > starts when the decoding backend's snapshot builder is already in the
> > SNAPBUILD_FULL_SNAPSHOT state, reorderbuffer.c processes the transaction
> > normally, and another part of the series (v46-0002) ensures that the table
> > rewriting is not decoded: REPACK simply tells heap_insert(), heap_update(),
> > heap_delete() not to put the extra (replication specific) information into the
> > corresponding WAL records. I suppose this is what you mean in (b).
> >
> > Regarding (a), yes, the absence of catalog changes in the REPACK's transaction
> > is the reason that even the logical decoding setup does not have to wait for
> > the transaction to finish.
> >
> 
> Hmm, but we don't do any catalog changes for transactions that have
> DML say only INSERT but we don't have separate logic like REPACK in
> snapbuild machinery. Same is probably true for dml operations on an
> unlogged table which doesn't have WAL to send nor any catalog
> operations involved but we don't have any special path for that in
> snapbuild code path. So, why do we need special handling for this
> operation?

If an "ordinary" transaction, which had started before the snapshot builder
reached the SNAPBUILD_FULL_SNAPSHOT state, runs DML, the snapshot builder does
not know if the same transaction changed something in the catalog earlier. So
it needs to wait for this transaction to finish before it advances to
SNAPBUILD_CONSISTENT. For REPACK, we know that it cannot happen because it
cannot run in transaction block for other reasons. So the snapshot builder
does not have to wait.

Nevertheless, I'm not sure it's a good idea for snapbuild.c to handle special
cases like REPACK. Instead, I'm thinking if snapbuild.c can safely ignore XIDs
of backends connected to databases other than the one we're decoding. Thus the
restriction would be one backend running REPACK per database rather than
cluster.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com





^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: Adding REPACK [concurrently]
@ 2026-04-01 10:22  Amit Kapila <[email protected]>
  parent: Antonin Houska <[email protected]>
  0 siblings, 2 replies; 19+ messages in thread

From: Amit Kapila @ 2026-04-01 10:22 UTC (permalink / raw)
  To: Antonin Houska <[email protected]>; +Cc: Alvaro Herrera <[email protected]>; Mihail Nikalayeu <[email protected]>; Srinath Reddy Sadipiralla <[email protected]>; Matthias van de Meent <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

On Wed, Apr 1, 2026 at 3:13 PM Antonin Houska <[email protected]> wrote:
>
> Amit Kapila <[email protected]> wrote:
>
> > On Tue, Mar 31, 2026 at 8:06 PM Antonin Houska <[email protected]> wrote:
> > >
> > > Amit Kapila <[email protected]> wrote:
> > > > 2.
> > > > *
> > > > + /*
> > > > + * TODO Consider a GUC to reserve certain amount of replication slots for
> > > > + * REPACK (CONCURRENTLY) and using it here.
> > > > + */
> > > > +#define MAX_REPACK_XIDS 16
> > > > +
> > > >
> > > > This sounds a bit scary as reserving replication slots for REPACK
> > > > (CONCURRENTLY) may not be what users expect. But it is not clear why
> > > > replication slots need to be reserved for this.
> > >
> > > The point is that REPACK should not block replication [1]. Thus reserving
> > > slots for non-REPACK users is probably more precise statement.
> > >
> >
> > oh, so shouldn't be a separate patch than this and an important for
> > this functionality to get committed? I don't see why we need to make
> > such a GUC or knob as part of this patch if we need the same.
>
> REPACK is a new user of replication slots. Without that, there is no other way
> to "steal" the slots from the replication users.
>
> > > > IIUC, two reasons why logical decoding can ignore REPACK
> > > > (CONCURRENTLY) are (a) does not perform any catalog changes relevant
> > > > to logical decoding, (b) neither walsenders nor backends performing
> > > > logical decoding needs to care sending the WAL generated by REPACK
> > > > (CONCURRENTLY). Is that understanding correct? If so, we may want to
> > > > clarify why we want to ignore this command's WAL while sending changes
> > > > downstream in the commit message or give some reference of the patch
> > > > where the same is mentioned. This can help reviewing this patch
> > > > independently.
> > >
> > > Correct, but in fact this diff only affects the setup of the logical decoding
> > > rather than the decoding itself. On the other hand, if REPACK (CONCURRENTLY)
> > > starts when the decoding backend's snapshot builder is already in the
> > > SNAPBUILD_FULL_SNAPSHOT state, reorderbuffer.c processes the transaction
> > > normally, and another part of the series (v46-0002) ensures that the table
> > > rewriting is not decoded: REPACK simply tells heap_insert(), heap_update(),
> > > heap_delete() not to put the extra (replication specific) information into the
> > > corresponding WAL records. I suppose this is what you mean in (b).
> > >
> > > Regarding (a), yes, the absence of catalog changes in the REPACK's transaction
> > > is the reason that even the logical decoding setup does not have to wait for
> > > the transaction to finish.
> > >
> >
> > Hmm, but we don't do any catalog changes for transactions that have
> > DML say only INSERT but we don't have separate logic like REPACK in
> > snapbuild machinery. Same is probably true for dml operations on an
> > unlogged table which doesn't have WAL to send nor any catalog
> > operations involved but we don't have any special path for that in
> > snapbuild code path. So, why do we need special handling for this
> > operation?
>
> If an "ordinary" transaction, which had started before the snapshot builder
> reached the SNAPBUILD_FULL_SNAPSHOT state, runs DML, the snapshot builder does
> not know if the same transaction changed something in the catalog earlier. So
> it needs to wait for this transaction to finish before it advances to
> SNAPBUILD_CONSISTENT. For REPACK, we know that it cannot happen because it
> cannot run in transaction block for other reasons. So the snapshot builder
> does not have to wait.
>
> Nevertheless, I'm not sure it's a good idea for snapbuild.c to handle special
> cases like REPACK. Instead, I'm thinking if snapbuild.c can safely ignore XIDs
> of backends connected to databases other than the one we're decoding.
>

What if such transactions have made changes in the global catalog?
Even if that won't matter, I feel such a change would be quite
fundamental to snapbuild machinery and changing at this point would be
risky.

BTW, is the reason to skip REPACK while building a snapshot is that it
can take a long time to finish?

-- 
With Regards,
Amit Kapila.





^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: Adding REPACK [concurrently]
@ 2026-04-01 11:38  Alvaro Herrera <[email protected]>
  parent: Amit Kapila <[email protected]>
  0 siblings, 1 reply; 19+ messages in thread

From: Alvaro Herrera @ 2026-04-01 11:38 UTC (permalink / raw)
  To: Amit Kapila <[email protected]>; +Cc: Srinath Reddy Sadipiralla <[email protected]>; Mihail Nikalayeu <[email protected]>; Antonin Houska <[email protected]>; Matthias van de Meent <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

On 2026-Apr-01, Amit Kapila wrote:

> What about if the blocking process is an autovacumm that is working to
> prevent XID wraparound? I think we already avoid killing it in such
> cases.

If we just let REPACK finish, it will also renew the table's XID, so
autovacuum is not needed in that case.  I mean, there's no reason to let
autovacuum process the contents of a table that is going to be replaced
completely.

One potentially problematic case would be that an emergency autovacuum
has been running for a long time and about to finish, and REPACK is
started.  But in that case, autovacuum already has ShareUpdateExclusive,
so REPACK wouldn't be able to start at all, which means it won't kill
autovacuum.  And in the case where autovacuum is doing emergency
vacuuming, then it won't commit suicide, so it will be able to complete
before repack starts.

> BTW, one can say that cancelling a long-running report query also
> wastes a lot of effort of the user generating such a report. Why can't
> REPACK wait for such a select to finish instead of cancelling it?

I don't understand exactly which scenario you're concerned about.  Is
there a long-running query which, after spending a lot of time running a
report, tries to upgrade its lock level on the table?  Keep in mind that
this check only runs when the affected session runs the deadlock
checker, which means it's been waiting to acquire a lock for
deadlock_timeout seconds.  It's not repack that kills the query.

[ ... reflects ...]  Oh, actually what Srinath proposed does exactly
that -- kill other queries.  Hmm, yeah, I'm less sure about that
particular bit.  Here I'm only talking about my proposal to have the
deadlock detector handle the case of somebody waiting to lock the table.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: Adding REPACK [concurrently]
@ 2026-04-01 11:42  Alvaro Herrera <[email protected]>
  parent: Amit Kapila <[email protected]>
  1 sibling, 2 replies; 19+ messages in thread

From: Alvaro Herrera @ 2026-04-01 11:42 UTC (permalink / raw)
  To: Amit Kapila <[email protected]>; +Cc: Antonin Houska <[email protected]>; Mihail Nikalayeu <[email protected]>; Srinath Reddy Sadipiralla <[email protected]>; Matthias van de Meent <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

On 2026-Apr-01, Amit Kapila wrote:

> BTW, is the reason to skip REPACK while building a snapshot is that it
> can take a long time to finish?

As I understand the issue, yes, that's precisely the problem: if you
have one REPACK running, then starting a second REPACK (which requires
building a new snapshot) would have to wait until the first REPACK is
over.  In other words, you wouldn't be able to have two repacks running
concurrently.  This sounds like a problematic requirement.  So having
snapbuild ignore REPACK is there to allow the second REPACK to work at
all.  But more generally,  *all* users of snapbuild would be prevented
from starting until REPACK is done; so if you have a very very large
table that takes a long time to repack, then everything would be blocked
behind it until it completes, which sounds extremely unpleasant.

So, if we're unable to get this particular patch in, we would have to
have a big fat warning in the docs, telling people to be careful about
other load if they choose to run concurrent repack -- it could have
serious consequences.  But on the other hand, it's better to *have* the
tool, even if it has problems, than not have it.  Keep in mind that
pg_repack and pg_squeeze also have all these problems/limitations (and
others), and still people use them.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: Adding REPACK [concurrently]
@ 2026-04-01 12:21  Amit Kapila <[email protected]>
  parent: Alvaro Herrera <[email protected]>
  0 siblings, 0 replies; 19+ messages in thread

From: Amit Kapila @ 2026-04-01 12:21 UTC (permalink / raw)
  To: Alvaro Herrera <[email protected]>; +Cc: Srinath Reddy Sadipiralla <[email protected]>; Mihail Nikalayeu <[email protected]>; Antonin Houska <[email protected]>; Matthias van de Meent <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

On Wed, Apr 1, 2026 at 5:08 PM Alvaro Herrera <[email protected]> wrote:
>
> On 2026-Apr-01, Amit Kapila wrote:
>
> > What about if the blocking process is an autovacumm that is working to
> > prevent XID wraparound? I think we already avoid killing it in such
> > cases.
>
> If we just let REPACK finish, it will also renew the table's XID, so
> autovacuum is not needed in that case.  I mean, there's no reason to let
> autovacuum process the contents of a table that is going to be replaced
> completely.
>
> One potentially problematic case would be that an emergency autovacuum
> has been running for a long time and about to finish, and REPACK is
> started.  But in that case, autovacuum already has ShareUpdateExclusive,
> so REPACK wouldn't be able to start at all, which means it won't kill
> autovacuum.  And in the case where autovacuum is doing emergency
> vacuuming, then it won't commit suicide, so it will be able to complete
> before repack starts.
>
> > BTW, one can say that cancelling a long-running report query also
> > wastes a lot of effort of the user generating such a report. Why can't
> > REPACK wait for such a select to finish instead of cancelling it?
>
> I don't understand exactly which scenario you're concerned about.  Is
> there a long-running query which, after spending a lot of time running a
> report, tries to upgrade its lock level on the table?  Keep in mind that
> this check only runs when the affected session runs the deadlock
> checker, which means it's been waiting to acquire a lock for
> deadlock_timeout seconds.  It's not repack that kills the query.
>
> [ ... reflects ...]  Oh, actually what Srinath proposed does exactly
> that -- kill other queries.  Hmm, yeah, I'm less sure about that
> particular bit.
>

Yes, I was talking about Srinath's proposed solution. Do we need to do
anything about it?

-- 
With Regards,
Amit Kapila.





^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: Adding REPACK [concurrently]
@ 2026-04-01 12:31  Antonin Houska <[email protected]>
  parent: Amit Kapila <[email protected]>
  1 sibling, 0 replies; 19+ messages in thread

From: Antonin Houska @ 2026-04-01 12:31 UTC (permalink / raw)
  To: Amit Kapila <[email protected]>; +Cc: Alvaro Herrera <[email protected]>; Mihail Nikalayeu <[email protected]>; Srinath Reddy Sadipiralla <[email protected]>; Matthias van de Meent <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

Amit Kapila <[email protected]> wrote:

> On Wed, Apr 1, 2026 at 3:13 PM Antonin Houska <[email protected]> wrote:
> >

> > Nevertheless, I'm not sure it's a good idea for snapbuild.c to handle special
> > cases like REPACK. Instead, I'm thinking if snapbuild.c can safely ignore XIDs
> > of backends connected to databases other than the one we're decoding.
> >
> 
> What if such transactions have made changes in the global catalog?
> Even if that won't matter, I feel such a change would be quite
> fundamental to snapbuild machinery and changing at this point would be
> risky.

I had thought that catalog is usually needed only to retrieve the tuple
descriptor, but regression tests with some Assert() statements prove now that
shared catalogs can be accessed too. So that idea does not seem to be useful.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com





^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: Adding REPACK [concurrently]
@ 2026-04-01 12:35  Amit Kapila <[email protected]>
  parent: Alvaro Herrera <[email protected]>
  1 sibling, 0 replies; 19+ messages in thread

From: Amit Kapila @ 2026-04-01 12:35 UTC (permalink / raw)
  To: Alvaro Herrera <[email protected]>; +Cc: Antonin Houska <[email protected]>; Mihail Nikalayeu <[email protected]>; Srinath Reddy Sadipiralla <[email protected]>; Matthias van de Meent <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

On Wed, Apr 1, 2026 at 5:12 PM Alvaro Herrera <[email protected]> wrote:
>
> On 2026-Apr-01, Amit Kapila wrote:
>
> > BTW, is the reason to skip REPACK while building a snapshot is that it
> > can take a long time to finish?
>
> As I understand the issue, yes, that's precisely the problem: if you
> have one REPACK running, then starting a second REPACK (which requires
> building a new snapshot) would have to wait until the first REPACK is
> over.  In other words, you wouldn't be able to have two repacks running
> concurrently.  This sounds like a problematic requirement.  So having
> snapbuild ignore REPACK is there to allow the second REPACK to work at
> all.  But more generally,  *all* users of snapbuild would be prevented
> from starting until REPACK is done; so if you have a very very large
> table that takes a long time to repack, then everything would be blocked
> behind it until it completes, which sounds extremely unpleasant.
>
> So, if we're unable to get this particular patch in, we would have to
> have a big fat warning in the docs, telling people to be careful about
> other load if they choose to run concurrent repack -- it could have
> serious consequences.
>

Right, I think during this time logical workers will keep timing out
and restarting without any progress because during this wait, we won't
be sending keep_alive messages.

-- 
With Regards,
Amit Kapila.





^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: Adding REPACK [concurrently]
@ 2026-04-01 13:20  Antonin Houska <[email protected]>
  parent: Alvaro Herrera <[email protected]>
  1 sibling, 0 replies; 19+ messages in thread

From: Antonin Houska @ 2026-04-01 13:20 UTC (permalink / raw)
  To: Alvaro Herrera <[email protected]>; +Cc: Amit Kapila <[email protected]>; Mihail Nikalayeu <[email protected]>; Srinath Reddy Sadipiralla <[email protected]>; Matthias van de Meent <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

Alvaro Herrera <[email protected]> wrote:

> So, if we're unable to get this particular patch in, we would have to
> have a big fat warning in the docs, telling people to be careful about
> other load if they choose to run concurrent repack -- it could have
> serious consequences.  But on the other hand, it's better to *have* the
> tool, even if it has problems, than not have it.  Keep in mind that
> pg_repack and pg_squeeze also have all these problems/limitations (and
> others), and still people use them.

1. To be precise, pg_squeeze has this limitation. pg_repack does not use
logical replication.

2. I expect the limitation of PG core to be relaxed in versions > 19, as long
as we integrate the MVCC safety feature. REPACK will then run w/o XID most of
the time (XID will only be needed for catalog changes), so other decoding
backends won't need to wait for its completion.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com





^ permalink  raw  reply  [nested|flat] 19+ messages in thread

end of thread, other threads:[~2026-04-01 13:20 UTC | newest]

Thread overview: 19+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-03-17 10:53 Re: Adding REPACK [concurrently] Antonin Houska <[email protected]>
2026-03-17 19:57 ` Antonin Houska <[email protected]>
2026-03-18 19:12   ` Srinath Reddy Sadipiralla <[email protected]>
2026-03-18 20:07     ` Antonin Houska <[email protected]>
2026-03-20 12:35     ` Mihail Nikalayeu <[email protected]>
2026-03-25 02:57       ` Srinath Reddy Sadipiralla <[email protected]>
2026-03-31 17:37         ` Srinath Reddy Sadipiralla <[email protected]>
2026-04-01 09:39           ` Amit Kapila <[email protected]>
2026-04-01 11:38             ` Alvaro Herrera <[email protected]>
2026-04-01 12:21               ` Amit Kapila <[email protected]>
2026-03-27 16:12       ` Antonin Houska <[email protected]>
2026-03-31 10:47         ` Amit Kapila <[email protected]>
2026-04-01 08:42           ` Amit Kapila <[email protected]>
2026-04-01 09:43             ` Antonin Houska <[email protected]>
2026-04-01 10:22               ` Amit Kapila <[email protected]>
2026-04-01 11:42                 ` Alvaro Herrera <[email protected]>
2026-04-01 12:35                   ` Amit Kapila <[email protected]>
2026-04-01 13:20                   ` Antonin Houska <[email protected]>
2026-04-01 12:31                 ` Antonin Houska <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox