public inbox for [email protected]  
help / color / mirror / Atom feed
Re: [PATCH] Release replication slot on error in SQL-callable slot functions
8+ messages / 3 participants
[nested] [flat]

* Re: [PATCH] Release replication slot on error in SQL-callable slot functions
@ 2026-05-21 06:49 vignesh C <[email protected]>
  2026-05-21 14:38 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions SATYANARAYANA NARLAPURAM <[email protected]>
  0 siblings, 1 reply; 8+ messages in thread

From: vignesh C @ 2026-05-21 06:49 UTC (permalink / raw)
  To: Fujii Masao <[email protected]>; +Cc: SATYANARAYANA NARLAPURAM <[email protected]>; PostgreSQL Hackers <[email protected]>

On Mon, 11 May 2026 at 08:31, Fujii Masao <[email protected]> wrote:
>
> On Sun, May 10, 2026 at 5:45 AM SATYANARAYANA NARLAPURAM
> <[email protected]> wrote:
> >
> > Hi Hackers,
> >
> > SQL-callable replication slot functions acquire a slot (setting
> > the process-global MyReplicationSlot) but can then ERROR before reaching
> > ReplicationSlotRelease(). If such an error is caught by a PL/pgSQL
> > EXCEPTION block (which uses a subtransaction), MyReplicationSlot remains
> > set because there is no subtransaction-level cleanup hook for replication
> > slots.
> >
> > Any subsequent slot operation in the same session then hits
> > Assert(MyReplicationSlot == NULL) and crashes the backend on assert
> > enabled builds. In release builds the stale MyReplicationSlot is silently overwritten,
> > permanently orphaning the old slot as "active." The orphaned slot blocks any other
> > session from acquiring it, vacuum and WAL deletion.
> >
> > Repro:
> >
> > SELECT pg_create_logical_replication_slot('adv_test', 'test_decoding');
> >
> > DO $$ BEGIN
> >     PERFORM pg_replication_slot_advance('adv_test', '0/1'::pg_lsn);
> > EXCEPTION WHEN others THEN
> >     RAISE NOTICE 'caught: %', SQLERRM;
> > END $$;
> >
> > SELECT count(*) FROM pg_logical_slot_get_changes('adv_test', NULL, NULL);
> >
> > 2026-05-09 19:45:06.619 UTC [1096805] STATEMENT:  SELECT pg_create_logical_replication_slot('adv_test', 'test_decoding');
> > TRAP: failed Assert("MyReplicationSlot == NULL"), File: "slot.c", Line: 638, PID: 1096805
> >
> >
> > Attached a patch to address this by wrapping error-prone paths in PG_TRY/PG_CATCH blocks
> > and call ReplicationSlotRelease().
>
> Thanks for the report and the patch!
>
> I think wrapping the slot-processing code with PG_TRY()/PG_CATCH() seems
> a good direction for addressing the issue you reported.
>
>
> +   PG_CATCH();
> +   {
> +       ReplicationSlotRelease();
>
> When create_logical_replication_slot() is called with temporary = true,
> the created logical replication slot has RS_TEMPORARY persistency. Such a slot
> is not dropped by ReplicationSlotRelease(), whereas an RS_EPHEMERAL slot is
> dropped via ReplicationSlotDropAcquired().
>
> So even with the v1 patch, a temporary logical replication slot can remain
> unexpectedly if pg_create_logical_replication_slot() throws an error.
> In this case, should create_logical_replication_slot() explicitly drop the slot
> with ReplicationSlotDropAcquired(), or temporarily change the slot persistency
> to RS_EPHEMERAL before calling ReplicationSlotRelease()?
>
>
> Does a newly created logical replication slot created by
> pg_copy_logical_replication_slot() have the same issue?

Additionally pg_logical_slot_get_changes also has the same issue, it
can be reproduced by the following:
SELECT pg_create_logical_replication_slot('test_slot_1', 'test_decoding');

DO $$
BEGIN
    -- This will ERROR if the slot_get changes fails for the slot.
    PERFORM 1 FROM pg_logical_slot_get_changes('test_slot_1', NULL,
NULL, 'nonexistent-option', 'val');
EXCEPTION WHEN others THEN
    RAISE NOTICE 'caught: %', SQLERRM;
END $$;

SELECT count(*) FROM pg_logical_slot_get_changes('test_slot_1', NULL, NULL);

TRAP: failed Assert("MyReplicationSlot == NULL"), File: "slot.c",
Line: 638, PID: 80308
postgres: vignesh postgres [local] SELECT(ExceptionalCondition+0xba)
[0x642e7b2ebae1]
postgres: vignesh postgres [local] SELECT(ReplicationSlotAcquire+0x6e)
[0x642e7b00d732]

Regards,
Vignesh






^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: [PATCH] Release replication slot on error in SQL-callable slot functions
  2026-05-21 06:49 Re: [PATCH] Release replication slot on error in SQL-callable slot functions vignesh C <[email protected]>
@ 2026-05-21 14:38 ` SATYANARAYANA NARLAPURAM <[email protected]>
  2026-05-22 09:15   ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions shveta malik <[email protected]>
  0 siblings, 1 reply; 8+ messages in thread

From: SATYANARAYANA NARLAPURAM @ 2026-05-21 14:38 UTC (permalink / raw)
  To: vignesh C <[email protected]>; +Cc: Fujii Masao <[email protected]>; PostgreSQL Hackers <[email protected]>

Hi

On Wed, May 20, 2026 at 11:49 PM vignesh C <[email protected]> wrote:

> On Mon, 11 May 2026 at 08:31, Fujii Masao <[email protected]> wrote:
> >
> > On Sun, May 10, 2026 at 5:45 AM SATYANARAYANA NARLAPURAM
> > <[email protected]> wrote:
> > >
> > > Hi Hackers,
> > >
> > > SQL-callable replication slot functions acquire a slot (setting
> > > the process-global MyReplicationSlot) but can then ERROR before
> reaching
> > > ReplicationSlotRelease(). If such an error is caught by a PL/pgSQL
> > > EXCEPTION block (which uses a subtransaction), MyReplicationSlot
> remains
> > > set because there is no subtransaction-level cleanup hook for
> replication
> > > slots.
> > >
> > > Any subsequent slot operation in the same session then hits
> > > Assert(MyReplicationSlot == NULL) and crashes the backend on assert
> > > enabled builds. In release builds the stale MyReplicationSlot is
> silently overwritten,
> > > permanently orphaning the old slot as "active." The orphaned slot
> blocks any other
> > > session from acquiring it, vacuum and WAL deletion.
> > >
> > > Repro:
> > >
> > > SELECT pg_create_logical_replication_slot('adv_test', 'test_decoding');
> > >
> > > DO $$ BEGIN
> > >     PERFORM pg_replication_slot_advance('adv_test', '0/1'::pg_lsn);
> > > EXCEPTION WHEN others THEN
> > >     RAISE NOTICE 'caught: %', SQLERRM;
> > > END $$;
> > >
> > > SELECT count(*) FROM pg_logical_slot_get_changes('adv_test', NULL,
> NULL);
> > >
> > > 2026-05-09 19:45:06.619 UTC [1096805] STATEMENT:  SELECT
> pg_create_logical_replication_slot('adv_test', 'test_decoding');
> > > TRAP: failed Assert("MyReplicationSlot == NULL"), File: "slot.c",
> Line: 638, PID: 1096805
> > >
> > >
> > > Attached a patch to address this by wrapping error-prone paths in
> PG_TRY/PG_CATCH blocks
> > > and call ReplicationSlotRelease().
> >
> > Thanks for the report and the patch!
> >
> > I think wrapping the slot-processing code with PG_TRY()/PG_CATCH() seems
> > a good direction for addressing the issue you reported.
> >
> >
> > +   PG_CATCH();
> > +   {
> > +       ReplicationSlotRelease();
> >
> > When create_logical_replication_slot() is called with temporary = true,
> > the created logical replication slot has RS_TEMPORARY persistency. Such
> a slot
> > is not dropped by ReplicationSlotRelease(), whereas an RS_EPHEMERAL slot
> is
> > dropped via ReplicationSlotDropAcquired().
> >
> > So even with the v1 patch, a temporary logical replication slot can
> remain
> > unexpectedly if pg_create_logical_replication_slot() throws an error.
> > In this case, should create_logical_replication_slot() explicitly drop
> the slot
> > with ReplicationSlotDropAcquired(), or temporarily change the slot
> persistency
> > to RS_EPHEMERAL before calling ReplicationSlotRelease()?
> >
> >
> > Does a newly created logical replication slot created by
> > pg_copy_logical_replication_slot() have the same issue?
>
> Additionally pg_logical_slot_get_changes also has the same issue, it
> can be reproduced by the following:
> SELECT pg_create_logical_replication_slot('test_slot_1', 'test_decoding');
>
> DO $$
> BEGIN
>     -- This will ERROR if the slot_get changes fails for the slot.
>     PERFORM 1 FROM pg_logical_slot_get_changes('test_slot_1', NULL,
> NULL, 'nonexistent-option', 'val');
> EXCEPTION WHEN others THEN
>     RAISE NOTICE 'caught: %', SQLERRM;
> END $$;
>
> SELECT count(*) FROM pg_logical_slot_get_changes('test_slot_1', NULL,
> NULL);
>
> TRAP: failed Assert("MyReplicationSlot == NULL"), File: "slot.c",
> Line: 638, PID: 80308
> postgres: vignesh postgres [local] SELECT(ExceptionalCondition+0xba)
> [0x642e7b2ebae1]
> postgres: vignesh postgres [local] SELECT(ReplicationSlotAcquire+0x6e)
> [0x642e7b00d732]


Thank you for letting me know. Fixing these cases in the next update, will
send it shortly.

Thanks,
Satya

>
>
>


^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: [PATCH] Release replication slot on error in SQL-callable slot functions
  2026-05-21 06:49 Re: [PATCH] Release replication slot on error in SQL-callable slot functions vignesh C <[email protected]>
  2026-05-21 14:38 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions SATYANARAYANA NARLAPURAM <[email protected]>
@ 2026-05-22 09:15   ` shveta malik <[email protected]>
  2026-05-25 07:12     ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions SATYANARAYANA NARLAPURAM <[email protected]>
  0 siblings, 1 reply; 8+ messages in thread

From: shveta malik @ 2026-05-22 09:15 UTC (permalink / raw)
  To: SATYANARAYANA NARLAPURAM <[email protected]>; +Cc: vignesh C <[email protected]>; Fujii Masao <[email protected]>; PostgreSQL Hackers <[email protected]>; shveta malik <[email protected]>

Thanks for reporting the issue. I could reproduce the same issue with
all these as well:

pg_logical_slot_peek_changes
pg_logical_slot_get_binary_changes
pg_logical_slot_peek_binary_changes

thanks
Shveta






^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: [PATCH] Release replication slot on error in SQL-callable slot functions
  2026-05-21 06:49 Re: [PATCH] Release replication slot on error in SQL-callable slot functions vignesh C <[email protected]>
  2026-05-21 14:38 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions SATYANARAYANA NARLAPURAM <[email protected]>
  2026-05-22 09:15   ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions shveta malik <[email protected]>
@ 2026-05-25 07:12     ` SATYANARAYANA NARLAPURAM <[email protected]>
  2026-05-25 09:58       ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions shveta malik <[email protected]>
  0 siblings, 1 reply; 8+ messages in thread

From: SATYANARAYANA NARLAPURAM @ 2026-05-25 07:12 UTC (permalink / raw)
  To: shveta malik <[email protected]>; +Cc: vignesh C <[email protected]>; Fujii Masao <[email protected]>; PostgreSQL Hackers <[email protected]>; shveta malik <[email protected]>

Hi

On Fri, May 22, 2026 at 2:16 AM shveta malik <[email protected]> wrote:

> Thanks for reporting the issue. I could reproduce the same issue with
> all these as well:
>
> pg_logical_slot_peek_changes
> pg_logical_slot_get_binary_changes
> pg_logical_slot_peek_binary_changes
>

Please find the attached v2 patch that addressed these three cases as well.

Thanks,
Satya


Attachments:

  [application/octet-stream] v2-0001-Release-replication-slot-on-error-in-slot-SQL-functions.patch (14.4K, 3-v2-0001-Release-replication-slot-on-error-in-slot-SQL-functions.patch)
  download | inline diff:
diff --git a/contrib/test_decoding/expected/slot.out b/contrib/test_decoding/expected/slot.out
index 7de03c79f6f..22ee8fdc0b8 100644
--- a/contrib/test_decoding/expected/slot.out
+++ b/contrib/test_decoding/expected/slot.out
@@ -466,3 +466,80 @@ SELECT pg_drop_replication_slot('physical_slot');
  
 (1 row)
 
+--
+-- Test that temporary slots are properly dropped on error, even when
+-- the error is caught by a PL/pgSQL EXCEPTION handler (which doesn't
+-- terminate the session).
+-- Test 1: create_logical_replication_slot with invalid plugin (temporary=true)
+DO $$
+BEGIN
+    PERFORM pg_create_logical_replication_slot('regression_slot_error', 'nonexistent_plugin_xyz', true);
+EXCEPTION WHEN OTHERS THEN
+    RAISE NOTICE 'caught: %', SQLERRM;
+END;
+$$;
+NOTICE:  caught: could not access file "nonexistent_plugin_xyz": No such file or directory
+SELECT count(*) = 0 AS slot_was_dropped FROM pg_replication_slots
+    WHERE slot_name = 'regression_slot_error';
+ slot_was_dropped 
+------------------
+ t
+(1 row)
+
+-- Test 2: session remains usable after the error (MyReplicationSlot cleared)
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot_t3', 'test_decoding', true);
+ ?column? 
+----------
+ init
+(1 row)
+
+SELECT count(*) = 1 AS slot_exists FROM pg_replication_slots
+    WHERE slot_name = 'regression_slot_t3';
+ slot_exists 
+-------------
+ t
+(1 row)
+
+-- Test 3: pg_replication_slot_advance with a non-existent slot inside
+-- an EXCEPTION block — must not leave MyReplicationSlot dangling.
+DO $$
+BEGIN
+    PERFORM pg_replication_slot_advance('regression_slot_nonexist', '0/1');
+EXCEPTION WHEN OTHERS THEN
+    RAISE NOTICE 'caught: %', SQLERRM;
+END;
+$$;
+NOTICE:  caught: replication slot "regression_slot_nonexist" does not exist
+-- Session is still healthy: we can advance the real slot
+SELECT slot_name FROM pg_replication_slot_advance('regression_slot_t3', pg_current_wal_lsn());
+     slot_name      
+--------------------
+ regression_slot_t3
+(1 row)
+
+-- Test 4: copy_replication_slot with max_replication_slots exceeded.
+-- We reduce max_replication_slots artificially by filling all remaining slots.
+-- Instead, trigger an error by copying to an already-existing name.
+DO $$
+BEGIN
+    PERFORM pg_copy_logical_replication_slot('regression_slot_t3', 'regression_slot_t3');
+EXCEPTION WHEN OTHERS THEN
+    RAISE NOTICE 'caught: %', SQLERRM;
+END;
+$$;
+NOTICE:  caught: replication slot "regression_slot_t3" already exists
+-- The original slot must still exist and be usable
+SELECT count(*) = 1 AS orig_slot_ok FROM pg_replication_slots
+    WHERE slot_name = 'regression_slot_t3';
+ orig_slot_ok 
+--------------
+ t
+(1 row)
+
+-- Cleanup (slot is temporary, will drop on disconnect anyway, but be explicit)
+SELECT pg_drop_replication_slot('regression_slot_t3');
+ pg_drop_replication_slot 
+--------------------------
+ 
+(1 row)
+
diff --git a/contrib/test_decoding/sql/slot.sql b/contrib/test_decoding/sql/slot.sql
index 580e3ae3bef..29abf302413 100644
--- a/contrib/test_decoding/sql/slot.sql
+++ b/contrib/test_decoding/sql/slot.sql
@@ -190,3 +190,52 @@ SELECT pg_drop_replication_slot('failover_true_slot');
 SELECT pg_drop_replication_slot('failover_false_slot');
 SELECT pg_drop_replication_slot('failover_default_slot');
 SELECT pg_drop_replication_slot('physical_slot');
+
+--
+-- Test that temporary slots are properly dropped on error, even when
+-- the error is caught by a PL/pgSQL EXCEPTION handler (which doesn't
+-- terminate the session).
+-- Test 1: create_logical_replication_slot with invalid plugin (temporary=true)
+DO $$
+BEGIN
+    PERFORM pg_create_logical_replication_slot('regression_slot_error', 'nonexistent_plugin_xyz', true);
+EXCEPTION WHEN OTHERS THEN
+    RAISE NOTICE 'caught: %', SQLERRM;
+END;
+$$;
+SELECT count(*) = 0 AS slot_was_dropped FROM pg_replication_slots
+    WHERE slot_name = 'regression_slot_error';
+
+-- Test 2: session remains usable after the error (MyReplicationSlot cleared)
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot_t3', 'test_decoding', true);
+SELECT count(*) = 1 AS slot_exists FROM pg_replication_slots
+    WHERE slot_name = 'regression_slot_t3';
+
+-- Test 3: pg_replication_slot_advance with a non-existent slot inside
+-- an EXCEPTION block — must not leave MyReplicationSlot dangling.
+DO $$
+BEGIN
+    PERFORM pg_replication_slot_advance('regression_slot_nonexist', '0/1');
+EXCEPTION WHEN OTHERS THEN
+    RAISE NOTICE 'caught: %', SQLERRM;
+END;
+$$;
+-- Session is still healthy: we can advance the real slot
+SELECT slot_name FROM pg_replication_slot_advance('regression_slot_t3', pg_current_wal_lsn());
+
+-- Test 4: copy_replication_slot with max_replication_slots exceeded.
+-- We reduce max_replication_slots artificially by filling all remaining slots.
+-- Instead, trigger an error by copying to an already-existing name.
+DO $$
+BEGIN
+    PERFORM pg_copy_logical_replication_slot('regression_slot_t3', 'regression_slot_t3');
+EXCEPTION WHEN OTHERS THEN
+    RAISE NOTICE 'caught: %', SQLERRM;
+END;
+$$;
+-- The original slot must still exist and be usable
+SELECT count(*) = 1 AS orig_slot_ok FROM pg_replication_slots
+    WHERE slot_name = 'regression_slot_t3';
+
+-- Cleanup (slot is temporary, will drop on disconnect anyway, but be explicit)
+SELECT pg_drop_replication_slot('regression_slot_t3');
diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
index 71fbaf72269..aa56e90bfab 100644
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -197,10 +197,10 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin
 	else
 		end_of_wal = GetXLogReplayRecPtr(NULL);
 
-	ReplicationSlotAcquire(NameStr(*name), true, true);
-
 	PG_TRY();
 	{
+		ReplicationSlotAcquire(NameStr(*name), true, true);
+
 		/* restart at slot's confirmed_flush */
 		ctx = CreateDecodingContext(InvalidXLogRecPtr,
 									options,
@@ -320,6 +320,9 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin
 		/* clear all timetravel entries */
 		InvalidateSystemCaches();
 
+		if (MyReplicationSlot != NULL)
+			ReplicationSlotRelease();
+
 		PG_RE_THROW();
 	}
 	PG_END_TRY();
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 16fbd383735..acc643ac749 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -148,38 +148,55 @@ create_logical_replication_slot(char *name, char *plugin,
 						  temporary ? RS_TEMPORARY : RS_EPHEMERAL, two_phase,
 						  false, failover, false);
 
-	/*
-	 * Ensure the logical decoding is enabled before initializing the logical
-	 * decoding context.
-	 */
-	EnsureLogicalDecodingEnabled();
-	Assert(IsLogicalDecodingEnabled());
+	PG_TRY();
+	{
+		/*
+		 * Ensure the logical decoding is enabled before initializing the logical
+		 * decoding context.
+		 */
+		EnsureLogicalDecodingEnabled();
+		Assert(IsLogicalDecodingEnabled());
 
-	/*
-	 * Create logical decoding context to find start point or, if we don't
-	 * need it, to 1) bump slot's restart_lsn and xmin 2) check plugin sanity.
-	 *
-	 * Note: when !find_startpoint this is still important, because it's at
-	 * this point that the output plugin is validated.
-	 */
-	ctx = CreateInitDecodingContext(plugin, NIL,
-									false,	/* just catalogs is OK */
-									false,	/* not repack */
-									restart_lsn,
-									XL_ROUTINE(.page_read = read_local_xlog_page,
-											   .segment_open = wal_segment_open,
-											   .segment_close = wal_segment_close),
-									NULL, NULL, NULL);
+		/*
+		 * Create logical decoding context to find start point or, if we don't
+		 * need it, to 1) bump slot's restart_lsn and xmin 2) check plugin sanity.
+		 *
+		 * Note: when !find_startpoint this is still important, because it's at
+		 * this point that the output plugin is validated.
+		 */
+		ctx = CreateInitDecodingContext(plugin, NIL,
+										false,	/* just catalogs is OK */
+										false,	/* not repack */
+										restart_lsn,
+										XL_ROUTINE(.page_read = read_local_xlog_page,
+												   .segment_open = wal_segment_open,
+												   .segment_close = wal_segment_close),
+										NULL, NULL, NULL);
 
-	/*
-	 * If caller needs us to determine the decoding start point, do so now.
-	 * This might take a while.
-	 */
-	if (find_startpoint)
-		DecodingContextFindStartpoint(ctx);
+		/*
+		 * If caller needs us to determine the decoding start point, do so now.
+		 * This might take a while.
+		 */
+		if (find_startpoint)
+			DecodingContextFindStartpoint(ctx);
 
-	/* don't need the decoding context anymore */
-	FreeDecodingContext(ctx);
+		/* don't need the decoding context anymore */
+		FreeDecodingContext(ctx);
+	}
+	PG_CATCH();
+	{
+		/*
+		 * Drop the slot on error.  ReplicationSlotRelease() only drops
+		 * RS_EPHEMERAL slots, so for RS_TEMPORARY slots we must explicitly
+		 * call ReplicationSlotDropAcquired() to avoid leaving the slot
+		 * behind (e.g. when the error is caught by a PL/pgSQL EXCEPTION
+		 * handler that doesn't terminate the session).
+		 */
+		if (MyReplicationSlot != NULL)
+			ReplicationSlotDropAcquired();
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
 }
 
 /*
@@ -566,49 +583,58 @@ pg_replication_slot_advance(PG_FUNCTION_ARGS)
 	else
 		moveto = Min(moveto, GetXLogReplayRecPtr(NULL));
 
-	/* Acquire the slot so we "own" it */
-	ReplicationSlotAcquire(NameStr(*slotname), true, true);
-
-	/* A slot whose restart_lsn has never been reserved cannot be advanced */
-	if (!XLogRecPtrIsValid(MyReplicationSlot->data.restart_lsn))
-		ereport(ERROR,
-				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-				 errmsg("replication slot \"%s\" cannot be advanced",
-						NameStr(*slotname)),
-				 errdetail("This slot has never previously reserved WAL, or it has been invalidated.")));
+	PG_TRY();
+	{
+		/* Acquire the slot so we "own" it */
+		ReplicationSlotAcquire(NameStr(*slotname), true, true);
+		/* A slot whose restart_lsn has never been reserved cannot be advanced */
+		if (!XLogRecPtrIsValid(MyReplicationSlot->data.restart_lsn))
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("replication slot \"%s\" cannot be advanced",
+							NameStr(*slotname)),
+					 errdetail("This slot has never previously reserved WAL, or it has been invalidated.")));
 
-	/*
-	 * Check if the slot is not moving backwards.  Physical slots rely simply
-	 * on restart_lsn as a minimum point, while logical slots have confirmed
-	 * consumption up to confirmed_flush, meaning that in both cases data
-	 * older than that is not available anymore.
-	 */
-	if (OidIsValid(MyReplicationSlot->data.database))
-		minlsn = MyReplicationSlot->data.confirmed_flush;
-	else
-		minlsn = MyReplicationSlot->data.restart_lsn;
+		/*
+		 * Check if the slot is not moving backwards.  Physical slots rely simply
+		 * on restart_lsn as a minimum point, while logical slots have confirmed
+		 * consumption up to confirmed_flush, meaning that in both cases data
+		 * older than that is not available anymore.
+		 */
+		if (OidIsValid(MyReplicationSlot->data.database))
+			minlsn = MyReplicationSlot->data.confirmed_flush;
+		else
+			minlsn = MyReplicationSlot->data.restart_lsn;
 
-	if (moveto < minlsn)
-		ereport(ERROR,
-				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-				 errmsg("cannot advance replication slot to %X/%08X, minimum is %X/%08X",
-						LSN_FORMAT_ARGS(moveto), LSN_FORMAT_ARGS(minlsn))));
+		if (moveto < minlsn)
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("cannot advance replication slot to %X/%08X, minimum is %X/%08X",
+							LSN_FORMAT_ARGS(moveto), LSN_FORMAT_ARGS(minlsn))));
 
-	/* Do the actual slot update, depending on the slot type */
-	if (OidIsValid(MyReplicationSlot->data.database))
-		endlsn = pg_logical_replication_slot_advance(moveto);
-	else
-		endlsn = pg_physical_replication_slot_advance(moveto);
+		/* Do the actual slot update, depending on the slot type */
+		if (OidIsValid(MyReplicationSlot->data.database))
+			endlsn = pg_logical_replication_slot_advance(moveto);
+		else
+			endlsn = pg_physical_replication_slot_advance(moveto);
 
-	values[0] = NameGetDatum(&MyReplicationSlot->data.name);
-	nulls[0] = false;
+		values[0] = NameGetDatum(&MyReplicationSlot->data.name);
+		nulls[0] = false;
 
-	/*
-	 * Recompute the minimum LSN and xmin across all slots to adjust with the
-	 * advancing potentially done.
-	 */
-	ReplicationSlotsComputeRequiredXmin(false);
-	ReplicationSlotsComputeRequiredLSN();
+		/*
+		 * Recompute the minimum LSN and xmin across all slots to adjust with the
+		 * advancing potentially done.
+		 */
+		ReplicationSlotsComputeRequiredXmin(false);
+		ReplicationSlotsComputeRequiredLSN();
+	}
+	PG_CATCH();
+	{
+		if (MyReplicationSlot != NULL)
+			ReplicationSlotRelease();
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
 
 	ReplicationSlotRelease();
 
@@ -763,7 +789,13 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
 	/*
 	 * Update the destination slot to current values of the source slot;
 	 * recheck that the source slot is still the one we saw previously.
+	 *
+	 * Use PG_TRY to ensure we drop the destination slot if any validation
+	 * error occurs.  Without this, an error caught by a PL/pgSQL EXCEPTION
+	 * handler would leave MyReplicationSlot set, crashing on the next slot
+	 * operation.
 	 */
+	PG_TRY();
 	{
 		TransactionId copy_effective_xmin;
 		TransactionId copy_effective_catalog_xmin;
@@ -797,9 +829,6 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
 		 * or the restart_lsn either is invalid or has gone backward. (The
 		 * restart_lsn could go backwards if the source slot is dropped and
 		 * copied from an older slot during installation.)
-		 *
-		 * Since erroring out will release and drop the destination slot we
-		 * don't need to release it here.
 		 */
 		if (copy_restart_lsn < src_restart_lsn ||
 			src_islogical != copy_islogical ||
@@ -857,6 +886,18 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
 		}
 #endif
 	}
+	PG_CATCH();
+	{
+		/*
+		 * Drop the newly-created destination slot on error. Same as in
+		 * create_logical_replication_slot(): use ReplicationSlotDropAcquired()
+		 * to handle both RS_EPHEMERAL and RS_TEMPORARY slots.
+		 */
+		if (MyReplicationSlot != NULL)
+			ReplicationSlotDropAcquired();
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
 
 	/* target slot fully created, mark as persistent if needed */
 	if (logical_slot && !temporary)


^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: [PATCH] Release replication slot on error in SQL-callable slot functions
  2026-05-21 06:49 Re: [PATCH] Release replication slot on error in SQL-callable slot functions vignesh C <[email protected]>
  2026-05-21 14:38 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions SATYANARAYANA NARLAPURAM <[email protected]>
  2026-05-22 09:15   ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions shveta malik <[email protected]>
  2026-05-25 07:12     ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions SATYANARAYANA NARLAPURAM <[email protected]>
@ 2026-05-25 09:58       ` shveta malik <[email protected]>
  2026-05-25 19:01         ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions SATYANARAYANA NARLAPURAM <[email protected]>
  0 siblings, 1 reply; 8+ messages in thread

From: shveta malik @ 2026-05-25 09:58 UTC (permalink / raw)
  To: SATYANARAYANA NARLAPURAM <[email protected]>; +Cc: vignesh C <[email protected]>; Fujii Masao <[email protected]>; PostgreSQL Hackers <[email protected]>; shveta malik <[email protected]>

On Mon, May 25, 2026 at 12:42 PM SATYANARAYANA NARLAPURAM
<[email protected]> wrote:
>
> Hi
>
> On Fri, May 22, 2026 at 2:16 AM shveta malik <[email protected]> wrote:
>>
>> Thanks for reporting the issue. I could reproduce the same issue with
>> all these as well:
>>
>> pg_logical_slot_peek_changes
>> pg_logical_slot_get_binary_changes
>> pg_logical_slot_peek_binary_changes
>
>
> Please find the attached v2 patch that addressed these three cases as well.
>

Thank You for addressuing these cases. A few comments:

1)

+-- Test 2: session remains usable after the error (MyReplicationSlot cleared)

It shoudl be part of 'Test 1' itself and thus should not be named as 'Test 2'

2)
--------
+-- Test 4: copy_replication_slot with max_replication_slots exceeded.
+-- We reduce max_replication_slots artificially by filling all remaining slots.
+-- Instead, trigger an error by copying to an already-existing name.
+DO $$
+BEGIN
+    PERFORM pg_copy_logical_replication_slot('regression_slot_t3',
'regression_slot_t3');
+EXCEPTION WHEN OTHERS THEN
+    RAISE NOTICE 'caught: %', SQLERRM;
+END;
+$$;
+-- The original slot must still exist and be usable
+SELECT count(*) = 1 AS orig_slot_ok FROM pg_replication_slots
+    WHERE slot_name = 'regression_slot_t3';
-----------

I don't think we can hit the Assert with above test (at-least I could
not). Since creation of slot itself will fail as the slot with
same-name already exists, MyReplicationSlot will never be set and thus
Assert will not be hit.  A better testcase will be below which fails
during LoadOutputPlugin() after slot-creation and MyReplicationSlot is
set already.

SELECT pg_create_logical_replication_slot('src_slot', 'test_decoding');

DO $$
BEGIN
PERFORM pg_copy_logical_replication_slot('src_slot', 'dst_slot',
false, 'nonexistent_plugin');
EXCEPTION WHEN others THEN
RAISE NOTICE 'caught: %', SQLERRM;
END $$;

SELECT count(*) FROM pg_logical_slot_get_changes('src_slot', NULL, NULL);

3)
So overall these are the problematic APIs:

pg_create_logical_replication_slot
pg_replication_slot_advance
pg_copy_logical_replication_slot
pg_logical_slot_peek_binary_changes
pg_logical_slot_peek_changes
pg_logical_slot_get_changes
pg_logical_slot_get_binary_changes

First 3 are are mutually exclusive fixes fow which we have added
testcases. Last 4 are addressed by fixing common function
pg_logical_slot_get_changes_guts(). I think we should add a test case
for at-least any one of these APIs to cover
pg_logical_slot_get_changes_guts().

Thanks.
Shveta





^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: [PATCH] Release replication slot on error in SQL-callable slot functions
  2026-05-21 06:49 Re: [PATCH] Release replication slot on error in SQL-callable slot functions vignesh C <[email protected]>
  2026-05-21 14:38 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions SATYANARAYANA NARLAPURAM <[email protected]>
  2026-05-22 09:15   ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions shveta malik <[email protected]>
  2026-05-25 07:12     ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions SATYANARAYANA NARLAPURAM <[email protected]>
  2026-05-25 09:58       ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions shveta malik <[email protected]>
@ 2026-05-25 19:01         ` SATYANARAYANA NARLAPURAM <[email protected]>
  2026-05-26 04:36           ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions shveta malik <[email protected]>
  0 siblings, 1 reply; 8+ messages in thread

From: SATYANARAYANA NARLAPURAM @ 2026-05-25 19:01 UTC (permalink / raw)
  To: shveta malik <[email protected]>; +Cc: vignesh C <[email protected]>; Fujii Masao <[email protected]>; PostgreSQL Hackers <[email protected]>; shveta malik <[email protected]>

Hi,

On Mon, May 25, 2026 at 2:58 AM shveta malik <[email protected]> wrote:

> On Mon, May 25, 2026 at 12:42 PM SATYANARAYANA NARLAPURAM
> <[email protected]> wrote:
> >
> > Hi
> >
> > On Fri, May 22, 2026 at 2:16 AM shveta malik <[email protected]>
> wrote:
> >>
> >> Thanks for reporting the issue. I could reproduce the same issue with
> >> all these as well:
> >>
> >> pg_logical_slot_peek_changes
> >> pg_logical_slot_get_binary_changes
> >> pg_logical_slot_peek_binary_changes
> >
> >
> > Please find the attached v2 patch that addressed these three cases as
> well.
> >
>
> Thank You for addressuing these cases. A few comments:
>
> 1)
>
> +-- Test 2: session remains usable after the error (MyReplicationSlot
> cleared)
>
> It shoudl be part of 'Test 1' itself and thus should not be named as 'Test
> 2'
>
> 2)
> --------
> +-- Test 4: copy_replication_slot with max_replication_slots exceeded.
> +-- We reduce max_replication_slots artificially by filling all remaining
> slots.
> +-- Instead, trigger an error by copying to an already-existing name.
> +DO $$
> +BEGIN
> +    PERFORM pg_copy_logical_replication_slot('regression_slot_t3',
> 'regression_slot_t3');
> +EXCEPTION WHEN OTHERS THEN
> +    RAISE NOTICE 'caught: %', SQLERRM;
> +END;
> +$$;
> +-- The original slot must still exist and be usable
> +SELECT count(*) = 1 AS orig_slot_ok FROM pg_replication_slots
> +    WHERE slot_name = 'regression_slot_t3';
> -----------
>
> I don't think we can hit the Assert with above test (at-least I could
> not). Since creation of slot itself will fail as the slot with
> same-name already exists, MyReplicationSlot will never be set and thus
> Assert will not be hit.  A better testcase will be below which fails
> during LoadOutputPlugin() after slot-creation and MyReplicationSlot is
> set already.
>
> SELECT pg_create_logical_replication_slot('src_slot', 'test_decoding');
>
> DO $$
> BEGIN
> PERFORM pg_copy_logical_replication_slot('src_slot', 'dst_slot',
> false, 'nonexistent_plugin');
> EXCEPTION WHEN others THEN
> RAISE NOTICE 'caught: %', SQLERRM;
> END $$;
>
> SELECT count(*) FROM pg_logical_slot_get_changes('src_slot', NULL, NULL);
>
> 3)
> So overall these are the problematic APIs:
>
> pg_create_logical_replication_slot
> pg_replication_slot_advance
> pg_copy_logical_replication_slot
> pg_logical_slot_peek_binary_changes
> pg_logical_slot_peek_changes
> pg_logical_slot_get_changes
> pg_logical_slot_get_binary_changes
>
> First 3 are are mutually exclusive fixes fow which we have added
> testcases. Last 4 are addressed by fixing common function
> pg_logical_slot_get_changes_guts(). I think we should add a test case
> for at-least any one of these APIs to cover
> pg_logical_slot_get_changes_guts().
>

Thanks for reviewing. Please review the attached v3 patch.

Thanks,
Satya


Attachments:

  [application/octet-stream] v3-0001-Release-replication-slot-on-error-in-slot-SQL-functions.patch (15.7K, 3-v3-0001-Release-replication-slot-on-error-in-slot-SQL-functions.patch)
  download | inline diff:
diff --git a/contrib/test_decoding/expected/slot.out b/contrib/test_decoding/expected/slot.out
index 7de03c79f6f..962a4277b75 100644
--- a/contrib/test_decoding/expected/slot.out
+++ b/contrib/test_decoding/expected/slot.out
@@ -466,3 +466,122 @@ SELECT pg_drop_replication_slot('physical_slot');
  
 (1 row)
 
+--
+-- Test that replication slots are properly released or dropped on error,
+-- even when the error is caught by a PL/pgSQL EXCEPTION handler (which
+-- doesn't terminate the session).
+--
+-- pg_create_logical_replication_slot: error during slot creation should
+-- drop the slot.
+DO $$
+BEGIN
+    PERFORM pg_create_logical_replication_slot('regression_slot_error', 'nonexistent_plugin_xyz', true);
+EXCEPTION WHEN OTHERS THEN
+    RAISE NOTICE 'caught: %', SQLERRM;
+END;
+$$;
+NOTICE:  caught: could not access file "nonexistent_plugin_xyz": No such file or directory
+SELECT count(*) = 0 AS slot_was_dropped FROM pg_replication_slots
+    WHERE slot_name = 'regression_slot_error';
+ slot_was_dropped 
+------------------
+ t
+(1 row)
+
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot_t3', 'test_decoding', true);
+ ?column? 
+----------
+ init
+(1 row)
+
+SELECT count(*) = 1 AS slot_exists FROM pg_replication_slots
+    WHERE slot_name = 'regression_slot_t3';
+ slot_exists 
+-------------
+ t
+(1 row)
+
+-- pg_replication_slot_advance: error after acquiring the slot should
+-- release it so the session stays usable.
+SELECT slot_name FROM pg_replication_slot_advance('regression_slot_t3', pg_current_wal_lsn());
+     slot_name      
+--------------------
+ regression_slot_t3
+(1 row)
+
+DO $$
+BEGIN
+    PERFORM pg_replication_slot_advance('regression_slot_t3', '0/1');
+EXCEPTION WHEN OTHERS THEN
+    RAISE NOTICE 'caught expected error';
+END;
+$$;
+NOTICE:  caught expected error
+-- the session is still healthy
+SELECT slot_name FROM pg_replication_slot_advance('regression_slot_t3', pg_current_wal_lsn());
+     slot_name      
+--------------------
+ regression_slot_t3
+(1 row)
+
+-- pg_copy_logical_replication_slot: error after creating the destination
+-- slot should drop it.
+DO $$
+BEGIN
+    PERFORM pg_copy_logical_replication_slot('regression_slot_t3', 'regression_slot_dst', false, 'nonexistent_plugin_xyz');
+EXCEPTION WHEN OTHERS THEN
+    RAISE NOTICE 'caught: %', SQLERRM;
+END;
+$$;
+NOTICE:  caught: could not access file "nonexistent_plugin_xyz": No such file or directory
+-- the destination slot must not exist (it was dropped on error)
+SELECT count(*) = 0 AS dst_slot_dropped FROM pg_replication_slots
+    WHERE slot_name = 'regression_slot_dst';
+ dst_slot_dropped 
+------------------
+ t
+(1 row)
+
+-- the session is still usable
+SELECT count(*) >= 0 AS changes_ok FROM pg_logical_slot_get_changes('regression_slot_t3', NULL, NULL);
+ changes_ok 
+------------
+ t
+(1 row)
+
+-- pg_logical_slot_get_changes: error after acquiring the slot should
+-- release it.
+SELECT 'init' FROM pg_create_physical_replication_slot('regression_slot_phy', true);
+ ?column? 
+----------
+ init
+(1 row)
+
+DO $$
+BEGIN
+    PERFORM pg_logical_slot_get_changes('regression_slot_phy', NULL, NULL);
+EXCEPTION WHEN OTHERS THEN
+    RAISE NOTICE 'caught: %', SQLERRM;
+END;
+$$;
+NOTICE:  caught: cannot use physical replication slot for logical decoding
+-- the session is still usable
+SELECT slot_name FROM pg_replication_slot_advance('regression_slot_t3', pg_current_wal_lsn());
+     slot_name      
+--------------------
+ regression_slot_t3
+(1 row)
+
+SELECT pg_drop_replication_slot('regression_slot_phy');
+ pg_drop_replication_slot 
+--------------------------
+ 
+(1 row)
+
+-- cleanup
+SELECT pg_drop_replication_slot('regression_slot_t3');
+ pg_drop_replication_slot 
+--------------------------
+ 
+(1 row)
+
diff --git a/contrib/test_decoding/sql/slot.sql b/contrib/test_decoding/sql/slot.sql
index 580e3ae3bef..1b327f518aa 100644
--- a/contrib/test_decoding/sql/slot.sql
+++ b/contrib/test_decoding/sql/slot.sql
@@ -190,3 +190,70 @@ SELECT pg_drop_replication_slot('failover_true_slot');
 SELECT pg_drop_replication_slot('failover_false_slot');
 SELECT pg_drop_replication_slot('failover_default_slot');
 SELECT pg_drop_replication_slot('physical_slot');
+
+--
+-- Test that replication slots are properly released or dropped on error,
+-- even when the error is caught by a PL/pgSQL EXCEPTION handler (which
+-- doesn't terminate the session).
+--
+
+-- pg_create_logical_replication_slot: error during slot creation should
+-- drop the slot.
+DO $$
+BEGIN
+    PERFORM pg_create_logical_replication_slot('regression_slot_error', 'nonexistent_plugin_xyz', true);
+EXCEPTION WHEN OTHERS THEN
+    RAISE NOTICE 'caught: %', SQLERRM;
+END;
+$$;
+SELECT count(*) = 0 AS slot_was_dropped FROM pg_replication_slots
+    WHERE slot_name = 'regression_slot_error';
+
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot_t3', 'test_decoding', true);
+SELECT count(*) = 1 AS slot_exists FROM pg_replication_slots
+    WHERE slot_name = 'regression_slot_t3';
+
+-- pg_replication_slot_advance: error after acquiring the slot should
+-- release it so the session stays usable.
+SELECT slot_name FROM pg_replication_slot_advance('regression_slot_t3', pg_current_wal_lsn());
+DO $$
+BEGIN
+    PERFORM pg_replication_slot_advance('regression_slot_t3', '0/1');
+EXCEPTION WHEN OTHERS THEN
+    RAISE NOTICE 'caught expected error';
+END;
+$$;
+-- the session is still healthy
+SELECT slot_name FROM pg_replication_slot_advance('regression_slot_t3', pg_current_wal_lsn());
+
+-- pg_copy_logical_replication_slot: error after creating the destination
+-- slot should drop it.
+DO $$
+BEGIN
+    PERFORM pg_copy_logical_replication_slot('regression_slot_t3', 'regression_slot_dst', false, 'nonexistent_plugin_xyz');
+EXCEPTION WHEN OTHERS THEN
+    RAISE NOTICE 'caught: %', SQLERRM;
+END;
+$$;
+-- the destination slot must not exist (it was dropped on error)
+SELECT count(*) = 0 AS dst_slot_dropped FROM pg_replication_slots
+    WHERE slot_name = 'regression_slot_dst';
+-- the session is still usable
+SELECT count(*) >= 0 AS changes_ok FROM pg_logical_slot_get_changes('regression_slot_t3', NULL, NULL);
+
+-- pg_logical_slot_get_changes: error after acquiring the slot should
+-- release it.
+SELECT 'init' FROM pg_create_physical_replication_slot('regression_slot_phy', true);
+DO $$
+BEGIN
+    PERFORM pg_logical_slot_get_changes('regression_slot_phy', NULL, NULL);
+EXCEPTION WHEN OTHERS THEN
+    RAISE NOTICE 'caught: %', SQLERRM;
+END;
+$$;
+-- the session is still usable
+SELECT slot_name FROM pg_replication_slot_advance('regression_slot_t3', pg_current_wal_lsn());
+SELECT pg_drop_replication_slot('regression_slot_phy');
+
+-- cleanup
+SELECT pg_drop_replication_slot('regression_slot_t3');
diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
index 71fbaf72269..aa56e90bfab 100644
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -197,10 +197,10 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin
 	else
 		end_of_wal = GetXLogReplayRecPtr(NULL);
 
-	ReplicationSlotAcquire(NameStr(*name), true, true);
-
 	PG_TRY();
 	{
+		ReplicationSlotAcquire(NameStr(*name), true, true);
+
 		/* restart at slot's confirmed_flush */
 		ctx = CreateDecodingContext(InvalidXLogRecPtr,
 									options,
@@ -320,6 +320,9 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin
 		/* clear all timetravel entries */
 		InvalidateSystemCaches();
 
+		if (MyReplicationSlot != NULL)
+			ReplicationSlotRelease();
+
 		PG_RE_THROW();
 	}
 	PG_END_TRY();
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 16fbd383735..acc643ac749 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -148,38 +148,55 @@ create_logical_replication_slot(char *name, char *plugin,
 						  temporary ? RS_TEMPORARY : RS_EPHEMERAL, two_phase,
 						  false, failover, false);
 
-	/*
-	 * Ensure the logical decoding is enabled before initializing the logical
-	 * decoding context.
-	 */
-	EnsureLogicalDecodingEnabled();
-	Assert(IsLogicalDecodingEnabled());
+	PG_TRY();
+	{
+		/*
+		 * Ensure the logical decoding is enabled before initializing the logical
+		 * decoding context.
+		 */
+		EnsureLogicalDecodingEnabled();
+		Assert(IsLogicalDecodingEnabled());
 
-	/*
-	 * Create logical decoding context to find start point or, if we don't
-	 * need it, to 1) bump slot's restart_lsn and xmin 2) check plugin sanity.
-	 *
-	 * Note: when !find_startpoint this is still important, because it's at
-	 * this point that the output plugin is validated.
-	 */
-	ctx = CreateInitDecodingContext(plugin, NIL,
-									false,	/* just catalogs is OK */
-									false,	/* not repack */
-									restart_lsn,
-									XL_ROUTINE(.page_read = read_local_xlog_page,
-											   .segment_open = wal_segment_open,
-											   .segment_close = wal_segment_close),
-									NULL, NULL, NULL);
+		/*
+		 * Create logical decoding context to find start point or, if we don't
+		 * need it, to 1) bump slot's restart_lsn and xmin 2) check plugin sanity.
+		 *
+		 * Note: when !find_startpoint this is still important, because it's at
+		 * this point that the output plugin is validated.
+		 */
+		ctx = CreateInitDecodingContext(plugin, NIL,
+										false,	/* just catalogs is OK */
+										false,	/* not repack */
+										restart_lsn,
+										XL_ROUTINE(.page_read = read_local_xlog_page,
+												   .segment_open = wal_segment_open,
+												   .segment_close = wal_segment_close),
+										NULL, NULL, NULL);
 
-	/*
-	 * If caller needs us to determine the decoding start point, do so now.
-	 * This might take a while.
-	 */
-	if (find_startpoint)
-		DecodingContextFindStartpoint(ctx);
+		/*
+		 * If caller needs us to determine the decoding start point, do so now.
+		 * This might take a while.
+		 */
+		if (find_startpoint)
+			DecodingContextFindStartpoint(ctx);
 
-	/* don't need the decoding context anymore */
-	FreeDecodingContext(ctx);
+		/* don't need the decoding context anymore */
+		FreeDecodingContext(ctx);
+	}
+	PG_CATCH();
+	{
+		/*
+		 * Drop the slot on error.  ReplicationSlotRelease() only drops
+		 * RS_EPHEMERAL slots, so for RS_TEMPORARY slots we must explicitly
+		 * call ReplicationSlotDropAcquired() to avoid leaving the slot
+		 * behind (e.g. when the error is caught by a PL/pgSQL EXCEPTION
+		 * handler that doesn't terminate the session).
+		 */
+		if (MyReplicationSlot != NULL)
+			ReplicationSlotDropAcquired();
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
 }
 
 /*
@@ -566,49 +583,58 @@ pg_replication_slot_advance(PG_FUNCTION_ARGS)
 	else
 		moveto = Min(moveto, GetXLogReplayRecPtr(NULL));
 
-	/* Acquire the slot so we "own" it */
-	ReplicationSlotAcquire(NameStr(*slotname), true, true);
-
-	/* A slot whose restart_lsn has never been reserved cannot be advanced */
-	if (!XLogRecPtrIsValid(MyReplicationSlot->data.restart_lsn))
-		ereport(ERROR,
-				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-				 errmsg("replication slot \"%s\" cannot be advanced",
-						NameStr(*slotname)),
-				 errdetail("This slot has never previously reserved WAL, or it has been invalidated.")));
+	PG_TRY();
+	{
+		/* Acquire the slot so we "own" it */
+		ReplicationSlotAcquire(NameStr(*slotname), true, true);
+		/* A slot whose restart_lsn has never been reserved cannot be advanced */
+		if (!XLogRecPtrIsValid(MyReplicationSlot->data.restart_lsn))
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("replication slot \"%s\" cannot be advanced",
+							NameStr(*slotname)),
+					 errdetail("This slot has never previously reserved WAL, or it has been invalidated.")));
 
-	/*
-	 * Check if the slot is not moving backwards.  Physical slots rely simply
-	 * on restart_lsn as a minimum point, while logical slots have confirmed
-	 * consumption up to confirmed_flush, meaning that in both cases data
-	 * older than that is not available anymore.
-	 */
-	if (OidIsValid(MyReplicationSlot->data.database))
-		minlsn = MyReplicationSlot->data.confirmed_flush;
-	else
-		minlsn = MyReplicationSlot->data.restart_lsn;
+		/*
+		 * Check if the slot is not moving backwards.  Physical slots rely simply
+		 * on restart_lsn as a minimum point, while logical slots have confirmed
+		 * consumption up to confirmed_flush, meaning that in both cases data
+		 * older than that is not available anymore.
+		 */
+		if (OidIsValid(MyReplicationSlot->data.database))
+			minlsn = MyReplicationSlot->data.confirmed_flush;
+		else
+			minlsn = MyReplicationSlot->data.restart_lsn;
 
-	if (moveto < minlsn)
-		ereport(ERROR,
-				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-				 errmsg("cannot advance replication slot to %X/%08X, minimum is %X/%08X",
-						LSN_FORMAT_ARGS(moveto), LSN_FORMAT_ARGS(minlsn))));
+		if (moveto < minlsn)
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("cannot advance replication slot to %X/%08X, minimum is %X/%08X",
+							LSN_FORMAT_ARGS(moveto), LSN_FORMAT_ARGS(minlsn))));
 
-	/* Do the actual slot update, depending on the slot type */
-	if (OidIsValid(MyReplicationSlot->data.database))
-		endlsn = pg_logical_replication_slot_advance(moveto);
-	else
-		endlsn = pg_physical_replication_slot_advance(moveto);
+		/* Do the actual slot update, depending on the slot type */
+		if (OidIsValid(MyReplicationSlot->data.database))
+			endlsn = pg_logical_replication_slot_advance(moveto);
+		else
+			endlsn = pg_physical_replication_slot_advance(moveto);
 
-	values[0] = NameGetDatum(&MyReplicationSlot->data.name);
-	nulls[0] = false;
+		values[0] = NameGetDatum(&MyReplicationSlot->data.name);
+		nulls[0] = false;
 
-	/*
-	 * Recompute the minimum LSN and xmin across all slots to adjust with the
-	 * advancing potentially done.
-	 */
-	ReplicationSlotsComputeRequiredXmin(false);
-	ReplicationSlotsComputeRequiredLSN();
+		/*
+		 * Recompute the minimum LSN and xmin across all slots to adjust with the
+		 * advancing potentially done.
+		 */
+		ReplicationSlotsComputeRequiredXmin(false);
+		ReplicationSlotsComputeRequiredLSN();
+	}
+	PG_CATCH();
+	{
+		if (MyReplicationSlot != NULL)
+			ReplicationSlotRelease();
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
 
 	ReplicationSlotRelease();
 
@@ -763,7 +789,13 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
 	/*
 	 * Update the destination slot to current values of the source slot;
 	 * recheck that the source slot is still the one we saw previously.
+	 *
+	 * Use PG_TRY to ensure we drop the destination slot if any validation
+	 * error occurs.  Without this, an error caught by a PL/pgSQL EXCEPTION
+	 * handler would leave MyReplicationSlot set, crashing on the next slot
+	 * operation.
 	 */
+	PG_TRY();
 	{
 		TransactionId copy_effective_xmin;
 		TransactionId copy_effective_catalog_xmin;
@@ -797,9 +829,6 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
 		 * or the restart_lsn either is invalid or has gone backward. (The
 		 * restart_lsn could go backwards if the source slot is dropped and
 		 * copied from an older slot during installation.)
-		 *
-		 * Since erroring out will release and drop the destination slot we
-		 * don't need to release it here.
 		 */
 		if (copy_restart_lsn < src_restart_lsn ||
 			src_islogical != copy_islogical ||
@@ -857,6 +886,18 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
 		}
 #endif
 	}
+	PG_CATCH();
+	{
+		/*
+		 * Drop the newly-created destination slot on error. Same as in
+		 * create_logical_replication_slot(): use ReplicationSlotDropAcquired()
+		 * to handle both RS_EPHEMERAL and RS_TEMPORARY slots.
+		 */
+		if (MyReplicationSlot != NULL)
+			ReplicationSlotDropAcquired();
+		PG_RE_THROW();
+	}
+	PG_END_TRY();
 
 	/* target slot fully created, mark as persistent if needed */
 	if (logical_slot && !temporary)


^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: [PATCH] Release replication slot on error in SQL-callable slot functions
  2026-05-21 06:49 Re: [PATCH] Release replication slot on error in SQL-callable slot functions vignesh C <[email protected]>
  2026-05-21 14:38 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions SATYANARAYANA NARLAPURAM <[email protected]>
  2026-05-22 09:15   ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions shveta malik <[email protected]>
  2026-05-25 07:12     ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions SATYANARAYANA NARLAPURAM <[email protected]>
  2026-05-25 09:58       ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions shveta malik <[email protected]>
  2026-05-25 19:01         ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions SATYANARAYANA NARLAPURAM <[email protected]>
@ 2026-05-26 04:36           ` shveta malik <[email protected]>
  2026-05-26 05:50             ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions shveta malik <[email protected]>
  0 siblings, 1 reply; 8+ messages in thread

From: shveta malik @ 2026-05-26 04:36 UTC (permalink / raw)
  To: SATYANARAYANA NARLAPURAM <[email protected]>; +Cc: vignesh C <[email protected]>; Fujii Masao <[email protected]>; PostgreSQL Hackers <[email protected]>; shveta malik <[email protected]>

On Tue, May 26, 2026 at 12:31 AM SATYANARAYANA NARLAPURAM
<[email protected]> wrote:
>
> Hi,
>
> On Mon, May 25, 2026 at 2:58 AM shveta malik <[email protected]> wrote:
>>
>> On Mon, May 25, 2026 at 12:42 PM SATYANARAYANA NARLAPURAM
>> <[email protected]> wrote:
>> >
>> > Hi
>> >
>> > On Fri, May 22, 2026 at 2:16 AM shveta malik <[email protected]> wrote:
>> >>
>> >> Thanks for reporting the issue. I could reproduce the same issue with
>> >> all these as well:
>> >>
>> >> pg_logical_slot_peek_changes
>> >> pg_logical_slot_get_binary_changes
>> >> pg_logical_slot_peek_binary_changes
>> >
>> >
>> > Please find the attached v2 patch that addressed these three cases as well.
>> >
>>
>> Thank You for addressuing these cases. A few comments:
>>
>> 1)
>>
>> +-- Test 2: session remains usable after the error (MyReplicationSlot cleared)
>>
>> It shoudl be part of 'Test 1' itself and thus should not be named as 'Test 2'
>>
>> 2)
>> --------
>> +-- Test 4: copy_replication_slot with max_replication_slots exceeded.
>> +-- We reduce max_replication_slots artificially by filling all remaining slots.
>> +-- Instead, trigger an error by copying to an already-existing name.
>> +DO $$
>> +BEGIN
>> +    PERFORM pg_copy_logical_replication_slot('regression_slot_t3',
>> 'regression_slot_t3');
>> +EXCEPTION WHEN OTHERS THEN
>> +    RAISE NOTICE 'caught: %', SQLERRM;
>> +END;
>> +$$;
>> +-- The original slot must still exist and be usable
>> +SELECT count(*) = 1 AS orig_slot_ok FROM pg_replication_slots
>> +    WHERE slot_name = 'regression_slot_t3';
>> -----------
>>
>> I don't think we can hit the Assert with above test (at-least I could
>> not). Since creation of slot itself will fail as the slot with
>> same-name already exists, MyReplicationSlot will never be set and thus
>> Assert will not be hit.  A better testcase will be below which fails
>> during LoadOutputPlugin() after slot-creation and MyReplicationSlot is
>> set already.
>>
>> SELECT pg_create_logical_replication_slot('src_slot', 'test_decoding');
>>
>> DO $$
>> BEGIN
>> PERFORM pg_copy_logical_replication_slot('src_slot', 'dst_slot',
>> false, 'nonexistent_plugin');
>> EXCEPTION WHEN others THEN
>> RAISE NOTICE 'caught: %', SQLERRM;
>> END $$;
>>
>> SELECT count(*) FROM pg_logical_slot_get_changes('src_slot', NULL, NULL);
>>
>> 3)
>> So overall these are the problematic APIs:
>>
>> pg_create_logical_replication_slot
>> pg_replication_slot_advance
>> pg_copy_logical_replication_slot
>> pg_logical_slot_peek_binary_changes
>> pg_logical_slot_peek_changes
>> pg_logical_slot_get_changes
>> pg_logical_slot_get_binary_changes
>>
>> First 3 are are mutually exclusive fixes fow which we have added
>> testcases. Last 4 are addressed by fixing common function
>> pg_logical_slot_get_changes_guts(). I think we should add a test case
>> for at-least any one of these APIs to cover
>> pg_logical_slot_get_changes_guts().
>
>
> Thanks for reviewing. Please review the attached v3 patch.
>

A few trivial things:

1)
pg_replication_slot_advance:
+ PG_TRY();
+ {
+ /* Acquire the slot so we "own" it */
+ ReplicationSlotAcquire(NameStr(*slotname), true, true);
+ /* A slot whose restart_lsn has never been reserved cannot be advanced */
+ if (!XLogRecPtrIsValid(MyReplicationSlot->data.restart_lsn))


We can have a blank line after ReplicationSlotAcquire for better readability.

2)

+SELECT 'init' FROM
pg_create_logical_replication_slot('regression_slot_t3',
'test_decoding', true);
+SELECT count(*) = 1 AS slot_exists FROM pg_replication_slots
+    WHERE slot_name = 'regression_slot_t3';

The intent is not clear why are we checking existence of
regression_slot_t3? I think we can skip it (or else add a comment if
really needed). The success of previous
pg_create_logical_replication_slot is enough to confirm that session
is healthy to run other slot related queries.

3)
+SELECT pg_drop_replication_slot('regression_slot_phy');
+
+-- cleanup
+SELECT pg_drop_replication_slot('regression_slot_t3');

We can move drop of 'regression_slot_phy' too under '-- cleanup'

~~

I have no further comments other than the trivial things mentioned above.

thanks
Shveta






^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: [PATCH] Release replication slot on error in SQL-callable slot functions
  2026-05-21 06:49 Re: [PATCH] Release replication slot on error in SQL-callable slot functions vignesh C <[email protected]>
  2026-05-21 14:38 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions SATYANARAYANA NARLAPURAM <[email protected]>
  2026-05-22 09:15   ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions shveta malik <[email protected]>
  2026-05-25 07:12     ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions SATYANARAYANA NARLAPURAM <[email protected]>
  2026-05-25 09:58       ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions shveta malik <[email protected]>
  2026-05-25 19:01         ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions SATYANARAYANA NARLAPURAM <[email protected]>
  2026-05-26 04:36           ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions shveta malik <[email protected]>
@ 2026-05-26 05:50             ` shveta malik <[email protected]>
  0 siblings, 0 replies; 8+ messages in thread

From: shveta malik @ 2026-05-26 05:50 UTC (permalink / raw)
  To: SATYANARAYANA NARLAPURAM <[email protected]>; +Cc: vignesh C <[email protected]>; Fujii Masao <[email protected]>; PostgreSQL Hackers <[email protected]>; shveta malik <[email protected]>

On Tue, May 26, 2026 at 10:06 AM shveta malik <[email protected]> wrote:
>
> On Tue, May 26, 2026 at 12:31 AM SATYANARAYANA NARLAPURAM
> <[email protected]> wrote:
> >
> > Hi,
> >
> > On Mon, May 25, 2026 at 2:58 AM shveta malik <[email protected]> wrote:
> >>
> >> On Mon, May 25, 2026 at 12:42 PM SATYANARAYANA NARLAPURAM
> >> <[email protected]> wrote:
> >> >
> >> > Hi
> >> >
> >> > On Fri, May 22, 2026 at 2:16 AM shveta malik <[email protected]> wrote:
> >> >>
> >> >> Thanks for reporting the issue. I could reproduce the same issue with
> >> >> all these as well:
> >> >>
> >> >> pg_logical_slot_peek_changes
> >> >> pg_logical_slot_get_binary_changes
> >> >> pg_logical_slot_peek_binary_changes
> >> >
> >> >
> >> > Please find the attached v2 patch that addressed these three cases as well.
> >> >
> >>
> >> Thank You for addressuing these cases. A few comments:
> >>
> >> 1)
> >>
> >> +-- Test 2: session remains usable after the error (MyReplicationSlot cleared)
> >>
> >> It shoudl be part of 'Test 1' itself and thus should not be named as 'Test 2'
> >>
> >> 2)
> >> --------
> >> +-- Test 4: copy_replication_slot with max_replication_slots exceeded.
> >> +-- We reduce max_replication_slots artificially by filling all remaining slots.
> >> +-- Instead, trigger an error by copying to an already-existing name.
> >> +DO $$
> >> +BEGIN
> >> +    PERFORM pg_copy_logical_replication_slot('regression_slot_t3',
> >> 'regression_slot_t3');
> >> +EXCEPTION WHEN OTHERS THEN
> >> +    RAISE NOTICE 'caught: %', SQLERRM;
> >> +END;
> >> +$$;
> >> +-- The original slot must still exist and be usable
> >> +SELECT count(*) = 1 AS orig_slot_ok FROM pg_replication_slots
> >> +    WHERE slot_name = 'regression_slot_t3';
> >> -----------
> >>
> >> I don't think we can hit the Assert with above test (at-least I could
> >> not). Since creation of slot itself will fail as the slot with
> >> same-name already exists, MyReplicationSlot will never be set and thus
> >> Assert will not be hit.  A better testcase will be below which fails
> >> during LoadOutputPlugin() after slot-creation and MyReplicationSlot is
> >> set already.
> >>
> >> SELECT pg_create_logical_replication_slot('src_slot', 'test_decoding');
> >>
> >> DO $$
> >> BEGIN
> >> PERFORM pg_copy_logical_replication_slot('src_slot', 'dst_slot',
> >> false, 'nonexistent_plugin');
> >> EXCEPTION WHEN others THEN
> >> RAISE NOTICE 'caught: %', SQLERRM;
> >> END $$;
> >>
> >> SELECT count(*) FROM pg_logical_slot_get_changes('src_slot', NULL, NULL);
> >>
> >> 3)
> >> So overall these are the problematic APIs:
> >>
> >> pg_create_logical_replication_slot
> >> pg_replication_slot_advance
> >> pg_copy_logical_replication_slot
> >> pg_logical_slot_peek_binary_changes
> >> pg_logical_slot_peek_changes
> >> pg_logical_slot_get_changes
> >> pg_logical_slot_get_binary_changes
> >>
> >> First 3 are are mutually exclusive fixes fow which we have added
> >> testcases. Last 4 are addressed by fixing common function
> >> pg_logical_slot_get_changes_guts(). I think we should add a test case
> >> for at-least any one of these APIs to cover
> >> pg_logical_slot_get_changes_guts().
> >
> >
> > Thanks for reviewing. Please review the attached v3 patch.
> >
>
> A few trivial things:
>
> 1)
> pg_replication_slot_advance:
> + PG_TRY();
> + {
> + /* Acquire the slot so we "own" it */
> + ReplicationSlotAcquire(NameStr(*slotname), true, true);
> + /* A slot whose restart_lsn has never been reserved cannot be advanced */
> + if (!XLogRecPtrIsValid(MyReplicationSlot->data.restart_lsn))
>
>
> We can have a blank line after ReplicationSlotAcquire for better readability.
>
> 2)
>
> +SELECT 'init' FROM
> pg_create_logical_replication_slot('regression_slot_t3',
> 'test_decoding', true);
> +SELECT count(*) = 1 AS slot_exists FROM pg_replication_slots
> +    WHERE slot_name = 'regression_slot_t3';
>
> The intent is not clear why are we checking existence of
> regression_slot_t3? I think we can skip it (or else add a comment if
> really needed). The success of previous
> pg_create_logical_replication_slot is enough to confirm that session
> is healthy to run other slot related queries.
>
> 3)
> +SELECT pg_drop_replication_slot('regression_slot_phy');
> +
> +-- cleanup
> +SELECT pg_drop_replication_slot('regression_slot_t3');
>
> We can move drop of 'regression_slot_phy' too under '-- cleanup'
>
> ~~
>
> I have no further comments other than the trivial things mentioned above.
>

Missed to inform this earlier, I am not able to apply any version of
the patches shared so far with 'git am'. It gives error, 'patch -p1'
works.

git am v3-0001-Release-replication-slot-on-error-in-slot-SQL-functions.patch
Patch format detection failed.

thanks
Shveta






^ permalink  raw  reply  [nested|flat] 8+ messages in thread


end of thread, other threads:[~2026-05-26 05:50 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-05-21 06:49 Re: [PATCH] Release replication slot on error in SQL-callable slot functions vignesh C <[email protected]>
2026-05-21 14:38 ` SATYANARAYANA NARLAPURAM <[email protected]>
2026-05-22 09:15   ` shveta malik <[email protected]>
2026-05-25 07:12     ` SATYANARAYANA NARLAPURAM <[email protected]>
2026-05-25 09:58       ` shveta malik <[email protected]>
2026-05-25 19:01         ` SATYANARAYANA NARLAPURAM <[email protected]>
2026-05-26 04:36           ` shveta malik <[email protected]>
2026-05-26 05:50             ` shveta malik <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox