public inbox for [email protected]  
help / color / mirror / Atom feed
Re: Adding REPACK [concurrently]
6+ messages / 3 participants
[nested] [flat]

* Re: Adding REPACK [concurrently]
@ 2026-03-16 14:21  Matthias van de Meent <[email protected]>
  0 siblings, 2 replies; 6+ messages in thread

From: Matthias van de Meent @ 2026-03-16 14:21 UTC (permalink / raw)
  To: Alvaro Herrera <[email protected]>; +Cc: Antonin Houska <[email protected]>; Mihail Nikalayeu <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

On Mon, 16 Mar 2026 at 13:21, Alvaro Herrera <[email protected]> wrote:
>
> On 2026-Mar-16, Antonin Houska wrote:
>
> > One more problem related to the replication slot is that, due to the call of
> > CheckSlotPermissions() in setup_logical_decoding(), REPLICATION privilege is
> > required for REPACK (CONCURRENTLY) to run. That's not too user-friendly.
> >
> > I think the reason to require the REPLICATION privilege is that, in generic
> > case, the output plugin can access data of any table in the database. However
> > REPACK uses one particular plugin and that plugin only decodes changes of one
> > particular table. Thus I think we don't really need to call
> > CheckSlotPermissions(). Do I seem to miss something?
>
> Yeah, I don't think it makes sense to require REPLICATION privilege to
> run REPACK.

I don't think it makes sense to allow just any table owner to modify
the effective_wal_level GUC; which is what the effect would be of
removing the REPLICATION requirement from roles that want to REPACK
CONCURRENTLY -- and in doing so create logical slots, which increase
effective_wal_level to logical.

Creating a replication slot requires REPLICATION privilege, because it
consumes a (possibly very) limited server resource that can't be
increased without restart and because it impacts other backends' WAL
performance through effective_wal_level. Allowing users to consume
said resource without first having the appropriate permissions makes
this flag practically meaningless.

Note that most of my argument hinges on the impact on other, unrelated
databases/tables/sessions. Replication slots have a hard cap defined
at startup, and effective_wal_level increases the WAL generated by
practically all backends. If replication slot counts were SIGHUP, and
RelationIsLogicallyLogged() returned true only in databases with LR
slots, and only for the tables actually present in publications, I'd
consider this much less problematic. However, we don't live in that
world, so I am opposed to allowing table owners without REPLICATION to
take any/all replication slots.

Matthias van de Meent
Databricks (https://www.databricks.com)





^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: Adding REPACK [concurrently]
@ 2026-03-16 15:49  Robert Treat <[email protected]>
  parent: Matthias van de Meent <[email protected]>
  1 sibling, 1 reply; 6+ messages in thread

From: Robert Treat @ 2026-03-16 15:49 UTC (permalink / raw)
  To: Alvaro Herrera <[email protected]>; +Cc: Matthias van de Meent <[email protected]>; Antonin Houska <[email protected]>; Mihail Nikalayeu <[email protected]>; Pg Hackers <[email protected]>

On Mon, Mar 16, 2026 at 11:11 AM Alvaro Herrera <[email protected]> wrote:
> On 2026-Mar-16, Matthias van de Meent wrote:
>

> > However, we don't live in that world, so I am opposed to allowing
> > table owners without REPLICATION to take any/all replication slots.
>
> I think requiring REPACK users to have the REPLICATION priv is rather
> user unfriendly.  Some potential REPACK users might not have any other
> use for replication at all.
>

For many users, I feel like repack concurrently making use of
replication machinery is an implementation detail that some will find
quite surprising (pg_repack doesn't use it), so I'd agree requiring
REPLICATION priv is both unfriendly and a bit counter intuitive,
especially if you need to run repack concurrently on a stand alone
server.

That said, I think Matthias is right that we can't allow "repackers"
to block "replicators"...

> > Note that most of my argument hinges on the impact on other, unrelated
> > databases/tables/sessions. Replication slots have a hard cap defined
> > at startup, and effective_wal_level increases the WAL generated by
> > practically all backends.
>
> I'd rather have a new GUC to declare a bunch of additional slots that
> are reserved exclusively for repack, set its default to something like
> 3, and call it a day.  If all repack slots are in use, you don't get to
> run repack, period.
>
> A slot costs nothing if unused, and we really don't want to make the
> interaction with regular replication more complicated than it is today.
>

I'm never excited about adding GUCs, but at first thought this seems
like a decent work-around; most people are unlikely to run multiple
repack concurrently's, but they can if needed. (I think the most
likely use case is on clusters using the "database per customer"
pattern, but if we have the guc, people will have a means to deal with
it).

Robert Treat
https://xzilla.net





^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: Adding REPACK [concurrently]
@ 2026-03-16 17:55  Mahendra Singh Thalor <[email protected]>
  parent: Robert Treat <[email protected]>
  0 siblings, 0 replies; 6+ messages in thread

From: Mahendra Singh Thalor @ 2026-03-16 17:55 UTC (permalink / raw)
  To: Alvaro Herrera <[email protected]>; +Cc: Robert Treat <[email protected]>; Matthias van de Meent <[email protected]>; Antonin Houska <[email protected]>; Mihail Nikalayeu <[email protected]>; Pg Hackers <[email protected]>

On Mon, 16 Mar 2026 at 21:33, Alvaro Herrera <[email protected]>
wrote:
>
> On 2026-Mar-16, Robert Treat wrote:
>
> > I'm never excited about adding GUCs, but at first thought this seems
> > like a decent work-around; most people are unlikely to run multiple
> > repack concurrently's, but they can if needed. (I think the most
> > likely use case is on clusters using the "database per customer"
> > pattern, but if we have the guc, people will have a means to deal with
> > it).
>
> I wonder if, longer term, it would make sense to do away with the
> max_replication_slots GUC (and this new one) altogether, and use dynamic
> shared memory for slots instead.  There's of course always the danger
> that people would accumulate arbitrary numbers of slots since they would
> never be forced to check.  But that may be a lesser problem than having
> to gauge these GUCs with any care.
>
> --
> Álvaro Herrera               48°01'N 7°57'E  —
https://www.EnterpriseDB.com/
> "You're _really_ hosed if the person doing the hiring doesn't understand
> relational systems: you end up with a whole raft of programmers, none of
> whom has had a Date with the clue stick."              (Andrew Sullivan)
> https://postgr.es/m/[email protected]
>
>

Hi all,
I was reading this thread and was doing some tests.

postgres=# create table test1(a int);
CREATE TABLE
postgres=# create table test2(a int);
CREATE TABLE
postgres=# *vacuum full test1 , test2;*
VACUUM
postgres=# repack test1;
REPACK
postgres=# repack test2;
REPACK
postgres=#* repack test1, test2;*
ERROR:  syntax error at or near ","
LINE 1: repack test1, test2;
                    ^

I was not expecting any error but maybe I am missing something (some patch
needs to be applied to test this query?).

Thanks and Regards
Mahendra


^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: Adding REPACK [concurrently]
@ 2026-03-16 19:24  Matthias van de Meent <[email protected]>
  parent: Matthias van de Meent <[email protected]>
  1 sibling, 2 replies; 6+ messages in thread

From: Matthias van de Meent @ 2026-03-16 19:24 UTC (permalink / raw)
  To: Alvaro Herrera <[email protected]>; +Cc: Antonin Houska <[email protected]>; Mihail Nikalayeu <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

On Mon, 16 Mar 2026 at 16:10, Alvaro Herrera <[email protected]> wrote:
>
> On 2026-Mar-16, Matthias van de Meent wrote:
>
> > Note that most of my argument hinges on the impact on other, unrelated
> > databases/tables/sessions. Replication slots have a hard cap defined
> > at startup, and effective_wal_level increases the WAL generated by
> > practically all backends.
>
> I'd rather have a new GUC to declare a bunch of additional slots that
> are reserved exclusively for repack, set its default to something like
> 3, and call it a day.  If all repack slots are in use, you don't get to
> run repack, period.
>
> A slot costs nothing if unused, and we really don't want to make the
> interaction with regular replication more complicated than it is today.

There are a few overheads even for unused slots: each slot uses some
shared memory, and many (most) functions that operate on the shared
slot state have O(n_slots) overhead through e.g.
ReplicationSlotsComputeRequiredXmin(); so having a 10k slot limit
whilst using only 1 will be slower for that one active slot than if
the limit was just 10. Granted, that's not a lot of overhead, but it's
not free.

> > However, we don't live in that world, so I am opposed to allowing
> > table owners without REPLICATION to take any/all replication slots.
>
> I think requiring REPACK users to have the REPLICATION priv is rather
> user unfriendly.  Some potential REPACK users might not have any other
> use for replication at all.

I agree it's not user-friendly, but that's the point of limiting
permissions. Users can't install c-functions without SUPERUSER,
because it can cause cluster instability and crashes. Users can't
create slots without REPLICATION, because they'll be able to
negatively impact the whole cluster's performance, and possibly,
stability, when taking up replication slots that otherwise would be
used for critical HA purposes.

Kind regards,

Matthias van de Meent
Databricks (https://www.databricks.com)





^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: Adding REPACK [concurrently]
@ 2026-03-16 21:45  Matthias van de Meent <[email protected]>
  parent: Matthias van de Meent <[email protected]>
  1 sibling, 0 replies; 6+ messages in thread

From: Matthias van de Meent @ 2026-03-16 21:45 UTC (permalink / raw)
  To: Alvaro Herrera <[email protected]>; +Cc: Antonin Houska <[email protected]>; Mihail Nikalayeu <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

On Mon, 16 Mar 2026 at 20:34, Alvaro Herrera <[email protected]> wrote:
> On 2026-Mar-16, Matthias van de Meent wrote:
> > I agree it's not user-friendly, but that's the point of limiting
> > permissions. Users can't install c-functions without SUPERUSER,
> > because it can cause cluster instability and crashes. Users can't
> > create slots without REPLICATION, because they'll be able to
> > negatively impact the whole cluster's performance, and possibly,
> > stability, when taking up replication slots that otherwise would be
> > used for critical HA purposes.
>
> Well, as I said, these repack slots would be separate from the regular
> ones for replication and available only for repack, so they would not
> interfere with the count of slots used for replication, so the second
> point is moot, isn't it?

Sorry, I misread your response as "if at all, then at least like
this", instead of "let's do this alternative in this patch".

But, yes, a pool of slots used exclusively by REPACK CONCURRENTLY
would also solve the slot availability issue.

Kind regards,

Matthias van de Meent
Databricks (https://www.databricks.com)





^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: Adding REPACK [concurrently]
@ 2026-03-16 21:50  Matthias van de Meent <[email protected]>
  parent: Matthias van de Meent <[email protected]>
  1 sibling, 0 replies; 6+ messages in thread

From: Matthias van de Meent @ 2026-03-16 21:50 UTC (permalink / raw)
  To: Antonin Houska <[email protected]>; +Cc: Alvaro Herrera <[email protected]>; Mihail Nikalayeu <[email protected]>; Pg Hackers <[email protected]>; Robert Treat <[email protected]>

On Mon, 16 Mar 2026 at 21:15, Antonin Houska <[email protected]> wrote:
>
> Matthias van de Meent <[email protected]> wrote:
>
> > I agree it's not user-friendly, but that's the point of limiting
> > permissions. Users can't install c-functions without SUPERUSER,
> > because it can cause cluster instability and crashes. Users can't
> > create slots without REPLICATION, because they'll be able to
> > negatively impact the whole cluster's performance, and possibly,
> > stability, when taking up replication slots that otherwise would be
> > used for critical HA purposes.
>
> I thought these attributes exist primarily for security purposes.

Well, yes, but operational security is also security, right?

> If
> non-SUPERUSER user could install C-functions, it'd be easy to install code
> that leaks data.

Yes, as long as they're able to find the right primitives in the
available binaries. That's certainly possible, but a bit more work
than just none.

> REPLICATION is currently the only way to limit access to the
> the publisher's data as there is no ACL for publications.
> And regarding resources, the REPLICATION attribute alone does not pose a limit
> on resource consumption unless you limit the total number of sessions of all
> the REPLICATION users at the same time.

True. It's not great. And we also don't really have a (good)
distinction for logical/physical replication permissions either...

> Anyway (fortunately?), the concurrent use of slots by REPACK is limited
> because, during the initialization of logical decoding, the backend needs to
> wait for all the transactions having XID assigned to finish, and these include
> the already running REPACK commands. See SnapBuildWaitSnapshot() and callers
> if you're interested in details.

Huh, so would you be able to run more than one Repack Concurrently in
the same database? ISTM that would not be possible, apart from
possibly a mechanism comparable to the SAFE_IN_IC flag (to not wait on
those backends).

Kind regards,

Matthias van de Meent
Databricks (https://www.databricks.com)





^ permalink  raw  reply  [nested|flat] 6+ messages in thread


end of thread, other threads:[~2026-03-16 21:50 UTC | newest]

Thread overview: 6+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-03-16 14:21 Re: Adding REPACK [concurrently] Matthias van de Meent <[email protected]>
2026-03-16 15:49 ` Robert Treat <[email protected]>
2026-03-16 17:55   ` Mahendra Singh Thalor <[email protected]>
2026-03-16 19:24 ` Matthias van de Meent <[email protected]>
2026-03-16 21:45   ` Matthias van de Meent <[email protected]>
2026-03-16 21:50   ` Matthias van de Meent <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox