public inbox for [email protected]  
help / color / mirror / Atom feed
From: Antonin Houska <[email protected]>
To: Alvaro Herrera <[email protected]>
Cc: Srinath Reddy Sadipiralla <[email protected]>
Cc: Amit Kapila <[email protected]>
Cc: Mihail Nikalayeu <[email protected]>
Cc: Matthias van de Meent <[email protected]>
Cc: Pg Hackers <[email protected]>
Cc: Robert Treat <[email protected]>
Subject: Re: Adding REPACK [concurrently]
Date: Mon, 06 Apr 2026 11:03:44 +0200
Message-ID: <82004.1775466224@localhost> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>

Alvaro Herrera <[email protected]> wrote:

> So I've been trying to understand the "Introduce an option to make
> logical replication database specific." patch and I have to confess I
> just cannot.
> 
> As far as I can read, the point is that if we reach
> SnapBuildProcessRunningXacts() when db_specific is true (which means
> standby_decode is called in an output plugin that has set
> need_shared_catalogs to false), _and_ we've not reached consistent state
> yet, then we'll call LogStandbySnapshot with our DB oid to emit a new
> xl_running_xacts message.

Right.

> So the WAL-decoding process emits WAL.  I don't know if in normal
> conditions logical decoding processes emit WAL.  If this is exceptional,
> I think we should add a comment.

Emitting WAL in logical decoding is not exceptional: SnapBuildWaitSnapshot()
already calls LogStandbySnapshot(), in order to get to the next phase.

> Now, this additional WAL message will be processed by all other
> processes decoding WAL.  Perhaps it will ignored by most of them.

Right. That's one thing that I realized late yesterday, after having posted
the latest version of the patch. In SnapBuildProcessRunningXacts(), we need

	if (OidIsValid(running->dbid))
		return;

rather than

	if (db_specific)
		return;

because other backends can also generate their database-specific records.

> But
> most importantly, it will also reach back to ourselves, at which point
> we can hopefully use it to see that we might have reached consistent
> state within our database.  Then we know our snapshot is ready to be
> used.
> 
> Is this correct?

Yes.

> I think the reason it's safe to skip a lot of the processing caused by
> this additional process, is that xl_running_xacts messages are also
> emitted in other places in a non-database specific manner.  So all the
> other placecs that are emitting that message continue to exist and
> cause logical-decoders operate in the same way as before.

Yes. I'm still thinking though if, after having used the database-specific
record to reach CONSITENT, we sould enforce one cluster-wide record, so that
the cleanup (in "our backend") takes place sooner rather than later. Not sure
about that.

> I think we should sprinkle lots of comments in several places about
> this.  For example, I propose that standby_redo() should have something
> like
> 
>   * If 'dbid' is valid, only gather transactions running in that database.
> + * Such records should not be the only ones emitted, because this has
> + * potentially dangerous side-effects which makes some places ignore them:
> + *
> + * 1. SnapBuildProcessRunningXacts will skip computing the xmin and restart
> + * point from its input record if the record's xmin is older that the
> + * snapbuilder's current xmin; this should normally be fine because that
> + * information will be updated from other xl_running_xacts records.
> + * 2. standby_redo will likewise skip processing such a record
>   *
> (are there other things that should be mentioned?)

I added something like that, but - due to the reference to
SnapBuildProcessRunningXacts() - less verbose about snapbuild.c.

> Also, LogStandbySnapshot() should have a comment explaining that passing
> a valid dboid is a weird corner case which is to be used with care, and
> that functions X Y and Z are going to ignore snapshots carrying a valid
> dbid.

One more check added in standby_decode() (and mentioned in that function in
the comment).

> Why do we call SnapBuildFindSnapshot() to do this, instead of doing it
> directly in SnapBuildProcessRunningXacts?  Seems like it would be more
> straightforward.

Yes, fixed.


One more problem I found when testing w/o background worker
(contrib/test_decoding) was that accessSharedCatalogsInDecoding was not set
back to true. Both AllocateSnapshotBuilder() FreeSnapshotBuilder() do that
now.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com



reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Adding REPACK [concurrently]
  In-Reply-To: <82004.1775466224@localhost>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox