public inbox for [email protected]  
help / color / mirror / Atom feed
Fix pg_upgrade to detect invalid logical replication slots on PG19
5+ messages / 4 participants
[nested] [flat]

* Fix pg_upgrade to detect invalid logical replication slots on PG19
@ 2026-04-20 08:57 Lakshmi N <[email protected]>
  2026-04-22 04:27 ` Re: Fix pg_upgrade to detect invalid logical replication slots on PG19 shveta malik <[email protected]>
  2026-04-22 09:13 ` Re: Fix pg_upgrade to detect invalid logical replication slots on PG19 Chao Li <[email protected]>
  0 siblings, 2 replies; 5+ messages in thread

From: Lakshmi N @ 2026-04-20 08:57 UTC (permalink / raw)
  To: [email protected] <[email protected]>

Hi Hackers,

The PG19-optimized slot catchup query uses a CTE that filters on
invalidation_reason IS NULL, then cross-joins it with the main slot
query.  When ALL logical slots in a database are invalid, the CTE
returns zero rows, and the cross join produces an empty result set.
This causes pg_upgrade to silently skip those slots entirely --
neither detecting them as invalid (which should block the upgrade)
nor attempting to migrate them.

The pre-PG19 query path does not have this problem because it queries
pg_replication_slots directly without a cross join. This may not impact
upgrade to PG19 but will change the behavior for PG20 upgrade.

Fix by changing the cross join to a LEFT JOIN,
so that invalid slots still appear in the result set with NULL
caught_up values.

Regards,
Lakshmi


Attachments:

  [application/octet-stream] 0001-Fix-pg_upgrade-to-detect-invalid-logical-replication.patch (965B, 3-0001-Fix-pg_upgrade-to-detect-invalid-logical-replication.patch)
  download | inline diff:
From 6a80958204364c94b511e7728f83281013cdbf06 Mon Sep 17 00:00:00 2001
From: Lakshmi N <[email protected]>
Date: Mon, 20 Apr 2026 01:47:50 -0700
Subject: [PATCH] Fix pg_upgrade to detect invalid logical replication slots on
  PG19

---
 src/bin/pg_upgrade/info.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index 8c5679b8097..37fff93892f 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -743,7 +743,8 @@ get_old_cluster_logical_slot_infos_query(ClusterInfo *cluster)
 			"  confirmed_flush_lsn > last_pending_wal "
 			"END as caught_up, "
 			"invalidation_reason IS NOT NULL as invalid "
-			"FROM pg_catalog.pg_replication_slots, check_caught_up "
+			"FROM pg_catalog.pg_replication_slots "
+			"LEFT JOIN check_caught_up ON true "
 			"WHERE slot_type = 'logical' AND "
 			"database = current_database() AND "
 			"temporary IS FALSE ";
-- 
2.43.0



^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Fix pg_upgrade to detect invalid logical replication slots on PG19
  2026-04-20 08:57 Fix pg_upgrade to detect invalid logical replication slots on PG19 Lakshmi N <[email protected]>
@ 2026-04-22 04:27 ` shveta malik <[email protected]>
  2026-04-22 08:02   ` Re: Fix pg_upgrade to detect invalid logical replication slots on PG19 Masahiko Sawada <[email protected]>
  1 sibling, 1 reply; 5+ messages in thread

From: shveta malik @ 2026-04-22 04:27 UTC (permalink / raw)
  To: Lakshmi N <[email protected]>; Sawada Masahiko <[email protected]>; +Cc: [email protected] <[email protected]>; shveta malik <[email protected]>

On Mon, Apr 20, 2026 at 2:28 PM Lakshmi N <[email protected]> wrote:
>
> Hi Hackers,
>
> The PG19-optimized slot catchup query uses a CTE that filters on
> invalidation_reason IS NULL, then cross-joins it with the main slot
> query.  When ALL logical slots in a database are invalid, the CTE
> returns zero rows, and the cross join produces an empty result set.
> This causes pg_upgrade to silently skip those slots entirely --
> neither detecting them as invalid (which should block the upgrade)
> nor attempting to migrate them.
>
> The pre-PG19 query path does not have this problem because it queries
> pg_replication_slots directly without a cross join. This may not impact
> upgrade to PG19 but will change the behavior for PG20 upgrade.
>
> Fix by changing the cross join to a LEFT JOIN,
> so that invalid slots still appear in the result set with NULL
> caught_up values.
>

I agree with the problem here.

Another way to solve this would be using a scalar subquery(see [1]),
but that would reduce readability. Thus, I prefer a LEFT OUTER JOIN on
TRUE here. There should also be no performance impact, since the
right-hand side query will always return at most one row due to the
LIMIT 1 clause. So IMO, the proposed solution is good. Copying
Sawada-san, as he was the author of the original patch.


[1]:
SELECT slot_name, plugin, two_phase, failover,
CASE
  WHEN invalidation_reason IS NOT NULL THEN FALSE
  ELSE (
    (SELECT last_pending_wal FROM check_caught_up) IS NULL
    OR confirmed_flush_lsn > (SELECT last_pending_wal FROM check_caught_up)
  )
END as caught_up,
invalidation_reason IS NOT NULL as invalid
FROM pg_catalog.pg_replication_slots
WHERE slot_type = 'logical'
  AND database = current_database()
  AND temporary IS FALSE;

thanks
Shveta





^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Fix pg_upgrade to detect invalid logical replication slots on PG19
  2026-04-20 08:57 Fix pg_upgrade to detect invalid logical replication slots on PG19 Lakshmi N <[email protected]>
  2026-04-22 04:27 ` Re: Fix pg_upgrade to detect invalid logical replication slots on PG19 shveta malik <[email protected]>
@ 2026-04-22 08:02   ` Masahiko Sawada <[email protected]>
  2026-04-22 17:01     ` Re: Fix pg_upgrade to detect invalid logical replication slots on PG19 Masahiko Sawada <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: Masahiko Sawada @ 2026-04-22 08:02 UTC (permalink / raw)
  To: shveta malik <[email protected]>; +Cc: Lakshmi N <[email protected]>; [email protected] <[email protected]>

On Tue, Apr 21, 2026 at 9:27 PM shveta malik <[email protected]> wrote:
>
> On Mon, Apr 20, 2026 at 2:28 PM Lakshmi N <[email protected]> wrote:
> >
> > Hi Hackers,
> >
> > The PG19-optimized slot catchup query uses a CTE that filters on
> > invalidation_reason IS NULL, then cross-joins it with the main slot
> > query.  When ALL logical slots in a database are invalid, the CTE
> > returns zero rows, and the cross join produces an empty result set.
> > This causes pg_upgrade to silently skip those slots entirely --
> > neither detecting them as invalid (which should block the upgrade)
> > nor attempting to migrate them.
> >
> > The pre-PG19 query path does not have this problem because it queries
> > pg_replication_slots directly without a cross join. This may not impact
> > upgrade to PG19 but will change the behavior for PG20 upgrade.

Thank you for the report.

> >
> > Fix by changing the cross join to a LEFT JOIN,
> > so that invalid slots still appear in the result set with NULL
> > caught_up values.
> >
>
> I agree with the problem here.

I've confirmed the bug and the patch fixes it. The patch looks good to
me, so I'm going to push it tomorrow barring any objections.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com





^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Fix pg_upgrade to detect invalid logical replication slots on PG19
  2026-04-20 08:57 Fix pg_upgrade to detect invalid logical replication slots on PG19 Lakshmi N <[email protected]>
  2026-04-22 04:27 ` Re: Fix pg_upgrade to detect invalid logical replication slots on PG19 shveta malik <[email protected]>
  2026-04-22 08:02   ` Re: Fix pg_upgrade to detect invalid logical replication slots on PG19 Masahiko Sawada <[email protected]>
@ 2026-04-22 17:01     ` Masahiko Sawada <[email protected]>
  0 siblings, 0 replies; 5+ messages in thread

From: Masahiko Sawada @ 2026-04-22 17:01 UTC (permalink / raw)
  To: shveta malik <[email protected]>; +Cc: Lakshmi N <[email protected]>; [email protected] <[email protected]>

On Wed, Apr 22, 2026 at 1:02 AM Masahiko Sawada <[email protected]> wrote:
>
> On Tue, Apr 21, 2026 at 9:27 PM shveta malik <[email protected]> wrote:
> >
> > On Mon, Apr 20, 2026 at 2:28 PM Lakshmi N <[email protected]> wrote:
> > >
> > > Hi Hackers,
> > >
> > > The PG19-optimized slot catchup query uses a CTE that filters on
> > > invalidation_reason IS NULL, then cross-joins it with the main slot
> > > query.  When ALL logical slots in a database are invalid, the CTE
> > > returns zero rows, and the cross join produces an empty result set.
> > > This causes pg_upgrade to silently skip those slots entirely --
> > > neither detecting them as invalid (which should block the upgrade)
> > > nor attempting to migrate them.
> > >
> > > The pre-PG19 query path does not have this problem because it queries
> > > pg_replication_slots directly without a cross join. This may not impact
> > > upgrade to PG19 but will change the behavior for PG20 upgrade.
>
> Thank you for the report.
>
> > >
> > > Fix by changing the cross join to a LEFT JOIN,
> > > so that invalid slots still appear in the result set with NULL
> > > caught_up values.
> > >
> >
> > I agree with the problem here.
>
> I've confirmed the bug and the patch fixes it. The patch looks good to
> me, so I'm going to push it tomorrow barring any objections.

Pushed.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com





^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Fix pg_upgrade to detect invalid logical replication slots on PG19
  2026-04-20 08:57 Fix pg_upgrade to detect invalid logical replication slots on PG19 Lakshmi N <[email protected]>
@ 2026-04-22 09:13 ` Chao Li <[email protected]>
  1 sibling, 0 replies; 5+ messages in thread

From: Chao Li @ 2026-04-22 09:13 UTC (permalink / raw)
  To: Lakshmi N <[email protected]>; +Cc: [email protected] <[email protected]>



> On Apr 20, 2026, at 16:57, Lakshmi N <[email protected]> wrote:
> 
> Hi Hackers,
> 
> The PG19-optimized slot catchup query uses a CTE that filters on
> invalidation_reason IS NULL, then cross-joins it with the main slot
> query.  When ALL logical slots in a database are invalid, the CTE
> returns zero rows, and the cross join produces an empty result set.
> This causes pg_upgrade to silently skip those slots entirely --
> neither detecting them as invalid (which should block the upgrade)
> nor attempting to migrate them.
> 
> The pre-PG19 query path does not have this problem because it queries
> pg_replication_slots directly without a cross join. This may not impact
> upgrade to PG19 but will change the behavior for PG20 upgrade.
> 
> Fix by changing the cross join to a LEFT JOIN,
> so that invalid slots still appear in the result set with NULL
> caught_up values. 
> 
> Regards,
> Lakshmi
> <0001-Fix-pg_upgrade-to-detect-invalid-logical-replication.patch>

This changes the inner join to a left join, which preserves the rows from pg_replication_slots. As check_caught_up is intentionally a singleton CTE because ORDER BY … LIMIT 1, so there is no row multiplication risk.

So, the patch looks good to me.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/









^ permalink  raw  reply  [nested|flat] 5+ messages in thread


end of thread, other threads:[~2026-04-22 17:01 UTC | newest]

Thread overview: 5+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-04-20 08:57 Fix pg_upgrade to detect invalid logical replication slots on PG19 Lakshmi N <[email protected]>
2026-04-22 04:27 ` shveta malik <[email protected]>
2026-04-22 08:02   ` Masahiko Sawada <[email protected]>
2026-04-22 17:01     ` Masahiko Sawada <[email protected]>
2026-04-22 09:13 ` Chao Li <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox