public inbox for [email protected]help / color / mirror / Atom feed
Re: [PATCH] Release replication slot on error in SQL-callable slot functions 8+ messages / 3 participants [nested] [flat]
* Re: [PATCH] Release replication slot on error in SQL-callable slot functions @ 2026-05-21 06:49 vignesh C <[email protected]> 2026-05-21 14:38 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions SATYANARAYANA NARLAPURAM <[email protected]> 0 siblings, 1 reply; 8+ messages in thread From: vignesh C @ 2026-05-21 06:49 UTC (permalink / raw) To: Fujii Masao <[email protected]>; +Cc: SATYANARAYANA NARLAPURAM <[email protected]>; PostgreSQL Hackers <[email protected]> On Mon, 11 May 2026 at 08:31, Fujii Masao <[email protected]> wrote: > > On Sun, May 10, 2026 at 5:45 AM SATYANARAYANA NARLAPURAM > <[email protected]> wrote: > > > > Hi Hackers, > > > > SQL-callable replication slot functions acquire a slot (setting > > the process-global MyReplicationSlot) but can then ERROR before reaching > > ReplicationSlotRelease(). If such an error is caught by a PL/pgSQL > > EXCEPTION block (which uses a subtransaction), MyReplicationSlot remains > > set because there is no subtransaction-level cleanup hook for replication > > slots. > > > > Any subsequent slot operation in the same session then hits > > Assert(MyReplicationSlot == NULL) and crashes the backend on assert > > enabled builds. In release builds the stale MyReplicationSlot is silently overwritten, > > permanently orphaning the old slot as "active." The orphaned slot blocks any other > > session from acquiring it, vacuum and WAL deletion. > > > > Repro: > > > > SELECT pg_create_logical_replication_slot('adv_test', 'test_decoding'); > > > > DO $$ BEGIN > > PERFORM pg_replication_slot_advance('adv_test', '0/1'::pg_lsn); > > EXCEPTION WHEN others THEN > > RAISE NOTICE 'caught: %', SQLERRM; > > END $$; > > > > SELECT count(*) FROM pg_logical_slot_get_changes('adv_test', NULL, NULL); > > > > 2026-05-09 19:45:06.619 UTC [1096805] STATEMENT: SELECT pg_create_logical_replication_slot('adv_test', 'test_decoding'); > > TRAP: failed Assert("MyReplicationSlot == NULL"), File: "slot.c", Line: 638, PID: 1096805 > > > > > > Attached a patch to address this by wrapping error-prone paths in PG_TRY/PG_CATCH blocks > > and call ReplicationSlotRelease(). > > Thanks for the report and the patch! > > I think wrapping the slot-processing code with PG_TRY()/PG_CATCH() seems > a good direction for addressing the issue you reported. > > > + PG_CATCH(); > + { > + ReplicationSlotRelease(); > > When create_logical_replication_slot() is called with temporary = true, > the created logical replication slot has RS_TEMPORARY persistency. Such a slot > is not dropped by ReplicationSlotRelease(), whereas an RS_EPHEMERAL slot is > dropped via ReplicationSlotDropAcquired(). > > So even with the v1 patch, a temporary logical replication slot can remain > unexpectedly if pg_create_logical_replication_slot() throws an error. > In this case, should create_logical_replication_slot() explicitly drop the slot > with ReplicationSlotDropAcquired(), or temporarily change the slot persistency > to RS_EPHEMERAL before calling ReplicationSlotRelease()? > > > Does a newly created logical replication slot created by > pg_copy_logical_replication_slot() have the same issue? Additionally pg_logical_slot_get_changes also has the same issue, it can be reproduced by the following: SELECT pg_create_logical_replication_slot('test_slot_1', 'test_decoding'); DO $$ BEGIN -- This will ERROR if the slot_get changes fails for the slot. PERFORM 1 FROM pg_logical_slot_get_changes('test_slot_1', NULL, NULL, 'nonexistent-option', 'val'); EXCEPTION WHEN others THEN RAISE NOTICE 'caught: %', SQLERRM; END $$; SELECT count(*) FROM pg_logical_slot_get_changes('test_slot_1', NULL, NULL); TRAP: failed Assert("MyReplicationSlot == NULL"), File: "slot.c", Line: 638, PID: 80308 postgres: vignesh postgres [local] SELECT(ExceptionalCondition+0xba) [0x642e7b2ebae1] postgres: vignesh postgres [local] SELECT(ReplicationSlotAcquire+0x6e) [0x642e7b00d732] Regards, Vignesh ^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: [PATCH] Release replication slot on error in SQL-callable slot functions 2026-05-21 06:49 Re: [PATCH] Release replication slot on error in SQL-callable slot functions vignesh C <[email protected]> @ 2026-05-21 14:38 ` SATYANARAYANA NARLAPURAM <[email protected]> 2026-05-22 09:15 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions shveta malik <[email protected]> 0 siblings, 1 reply; 8+ messages in thread From: SATYANARAYANA NARLAPURAM @ 2026-05-21 14:38 UTC (permalink / raw) To: vignesh C <[email protected]>; +Cc: Fujii Masao <[email protected]>; PostgreSQL Hackers <[email protected]> Hi On Wed, May 20, 2026 at 11:49 PM vignesh C <[email protected]> wrote: > On Mon, 11 May 2026 at 08:31, Fujii Masao <[email protected]> wrote: > > > > On Sun, May 10, 2026 at 5:45 AM SATYANARAYANA NARLAPURAM > > <[email protected]> wrote: > > > > > > Hi Hackers, > > > > > > SQL-callable replication slot functions acquire a slot (setting > > > the process-global MyReplicationSlot) but can then ERROR before > reaching > > > ReplicationSlotRelease(). If such an error is caught by a PL/pgSQL > > > EXCEPTION block (which uses a subtransaction), MyReplicationSlot > remains > > > set because there is no subtransaction-level cleanup hook for > replication > > > slots. > > > > > > Any subsequent slot operation in the same session then hits > > > Assert(MyReplicationSlot == NULL) and crashes the backend on assert > > > enabled builds. In release builds the stale MyReplicationSlot is > silently overwritten, > > > permanently orphaning the old slot as "active." The orphaned slot > blocks any other > > > session from acquiring it, vacuum and WAL deletion. > > > > > > Repro: > > > > > > SELECT pg_create_logical_replication_slot('adv_test', 'test_decoding'); > > > > > > DO $$ BEGIN > > > PERFORM pg_replication_slot_advance('adv_test', '0/1'::pg_lsn); > > > EXCEPTION WHEN others THEN > > > RAISE NOTICE 'caught: %', SQLERRM; > > > END $$; > > > > > > SELECT count(*) FROM pg_logical_slot_get_changes('adv_test', NULL, > NULL); > > > > > > 2026-05-09 19:45:06.619 UTC [1096805] STATEMENT: SELECT > pg_create_logical_replication_slot('adv_test', 'test_decoding'); > > > TRAP: failed Assert("MyReplicationSlot == NULL"), File: "slot.c", > Line: 638, PID: 1096805 > > > > > > > > > Attached a patch to address this by wrapping error-prone paths in > PG_TRY/PG_CATCH blocks > > > and call ReplicationSlotRelease(). > > > > Thanks for the report and the patch! > > > > I think wrapping the slot-processing code with PG_TRY()/PG_CATCH() seems > > a good direction for addressing the issue you reported. > > > > > > + PG_CATCH(); > > + { > > + ReplicationSlotRelease(); > > > > When create_logical_replication_slot() is called with temporary = true, > > the created logical replication slot has RS_TEMPORARY persistency. Such > a slot > > is not dropped by ReplicationSlotRelease(), whereas an RS_EPHEMERAL slot > is > > dropped via ReplicationSlotDropAcquired(). > > > > So even with the v1 patch, a temporary logical replication slot can > remain > > unexpectedly if pg_create_logical_replication_slot() throws an error. > > In this case, should create_logical_replication_slot() explicitly drop > the slot > > with ReplicationSlotDropAcquired(), or temporarily change the slot > persistency > > to RS_EPHEMERAL before calling ReplicationSlotRelease()? > > > > > > Does a newly created logical replication slot created by > > pg_copy_logical_replication_slot() have the same issue? > > Additionally pg_logical_slot_get_changes also has the same issue, it > can be reproduced by the following: > SELECT pg_create_logical_replication_slot('test_slot_1', 'test_decoding'); > > DO $$ > BEGIN > -- This will ERROR if the slot_get changes fails for the slot. > PERFORM 1 FROM pg_logical_slot_get_changes('test_slot_1', NULL, > NULL, 'nonexistent-option', 'val'); > EXCEPTION WHEN others THEN > RAISE NOTICE 'caught: %', SQLERRM; > END $$; > > SELECT count(*) FROM pg_logical_slot_get_changes('test_slot_1', NULL, > NULL); > > TRAP: failed Assert("MyReplicationSlot == NULL"), File: "slot.c", > Line: 638, PID: 80308 > postgres: vignesh postgres [local] SELECT(ExceptionalCondition+0xba) > [0x642e7b2ebae1] > postgres: vignesh postgres [local] SELECT(ReplicationSlotAcquire+0x6e) > [0x642e7b00d732] Thank you for letting me know. Fixing these cases in the next update, will send it shortly. Thanks, Satya > > > ^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: [PATCH] Release replication slot on error in SQL-callable slot functions 2026-05-21 06:49 Re: [PATCH] Release replication slot on error in SQL-callable slot functions vignesh C <[email protected]> 2026-05-21 14:38 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions SATYANARAYANA NARLAPURAM <[email protected]> @ 2026-05-22 09:15 ` shveta malik <[email protected]> 2026-05-25 07:12 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions SATYANARAYANA NARLAPURAM <[email protected]> 0 siblings, 1 reply; 8+ messages in thread From: shveta malik @ 2026-05-22 09:15 UTC (permalink / raw) To: SATYANARAYANA NARLAPURAM <[email protected]>; +Cc: vignesh C <[email protected]>; Fujii Masao <[email protected]>; PostgreSQL Hackers <[email protected]>; shveta malik <[email protected]> Thanks for reporting the issue. I could reproduce the same issue with all these as well: pg_logical_slot_peek_changes pg_logical_slot_get_binary_changes pg_logical_slot_peek_binary_changes thanks Shveta ^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: [PATCH] Release replication slot on error in SQL-callable slot functions 2026-05-21 06:49 Re: [PATCH] Release replication slot on error in SQL-callable slot functions vignesh C <[email protected]> 2026-05-21 14:38 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions SATYANARAYANA NARLAPURAM <[email protected]> 2026-05-22 09:15 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions shveta malik <[email protected]> @ 2026-05-25 07:12 ` SATYANARAYANA NARLAPURAM <[email protected]> 2026-05-25 09:58 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions shveta malik <[email protected]> 0 siblings, 1 reply; 8+ messages in thread From: SATYANARAYANA NARLAPURAM @ 2026-05-25 07:12 UTC (permalink / raw) To: shveta malik <[email protected]>; +Cc: vignesh C <[email protected]>; Fujii Masao <[email protected]>; PostgreSQL Hackers <[email protected]>; shveta malik <[email protected]> Hi On Fri, May 22, 2026 at 2:16 AM shveta malik <[email protected]> wrote: > Thanks for reporting the issue. I could reproduce the same issue with > all these as well: > > pg_logical_slot_peek_changes > pg_logical_slot_get_binary_changes > pg_logical_slot_peek_binary_changes > Please find the attached v2 patch that addressed these three cases as well. Thanks, Satya Attachments: [application/octet-stream] v2-0001-Release-replication-slot-on-error-in-slot-SQL-functions.patch (14.4K, 3-v2-0001-Release-replication-slot-on-error-in-slot-SQL-functions.patch) download | inline diff: diff --git a/contrib/test_decoding/expected/slot.out b/contrib/test_decoding/expected/slot.out index 7de03c79f6f..22ee8fdc0b8 100644 --- a/contrib/test_decoding/expected/slot.out +++ b/contrib/test_decoding/expected/slot.out @@ -466,3 +466,80 @@ SELECT pg_drop_replication_slot('physical_slot'); (1 row) +-- +-- Test that temporary slots are properly dropped on error, even when +-- the error is caught by a PL/pgSQL EXCEPTION handler (which doesn't +-- terminate the session). +-- Test 1: create_logical_replication_slot with invalid plugin (temporary=true) +DO $$ +BEGIN + PERFORM pg_create_logical_replication_slot('regression_slot_error', 'nonexistent_plugin_xyz', true); +EXCEPTION WHEN OTHERS THEN + RAISE NOTICE 'caught: %', SQLERRM; +END; +$$; +NOTICE: caught: could not access file "nonexistent_plugin_xyz": No such file or directory +SELECT count(*) = 0 AS slot_was_dropped FROM pg_replication_slots + WHERE slot_name = 'regression_slot_error'; + slot_was_dropped +------------------ + t +(1 row) + +-- Test 2: session remains usable after the error (MyReplicationSlot cleared) +SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot_t3', 'test_decoding', true); + ?column? +---------- + init +(1 row) + +SELECT count(*) = 1 AS slot_exists FROM pg_replication_slots + WHERE slot_name = 'regression_slot_t3'; + slot_exists +------------- + t +(1 row) + +-- Test 3: pg_replication_slot_advance with a non-existent slot inside +-- an EXCEPTION block — must not leave MyReplicationSlot dangling. +DO $$ +BEGIN + PERFORM pg_replication_slot_advance('regression_slot_nonexist', '0/1'); +EXCEPTION WHEN OTHERS THEN + RAISE NOTICE 'caught: %', SQLERRM; +END; +$$; +NOTICE: caught: replication slot "regression_slot_nonexist" does not exist +-- Session is still healthy: we can advance the real slot +SELECT slot_name FROM pg_replication_slot_advance('regression_slot_t3', pg_current_wal_lsn()); + slot_name +-------------------- + regression_slot_t3 +(1 row) + +-- Test 4: copy_replication_slot with max_replication_slots exceeded. +-- We reduce max_replication_slots artificially by filling all remaining slots. +-- Instead, trigger an error by copying to an already-existing name. +DO $$ +BEGIN + PERFORM pg_copy_logical_replication_slot('regression_slot_t3', 'regression_slot_t3'); +EXCEPTION WHEN OTHERS THEN + RAISE NOTICE 'caught: %', SQLERRM; +END; +$$; +NOTICE: caught: replication slot "regression_slot_t3" already exists +-- The original slot must still exist and be usable +SELECT count(*) = 1 AS orig_slot_ok FROM pg_replication_slots + WHERE slot_name = 'regression_slot_t3'; + orig_slot_ok +-------------- + t +(1 row) + +-- Cleanup (slot is temporary, will drop on disconnect anyway, but be explicit) +SELECT pg_drop_replication_slot('regression_slot_t3'); + pg_drop_replication_slot +-------------------------- + +(1 row) + diff --git a/contrib/test_decoding/sql/slot.sql b/contrib/test_decoding/sql/slot.sql index 580e3ae3bef..29abf302413 100644 --- a/contrib/test_decoding/sql/slot.sql +++ b/contrib/test_decoding/sql/slot.sql @@ -190,3 +190,52 @@ SELECT pg_drop_replication_slot('failover_true_slot'); SELECT pg_drop_replication_slot('failover_false_slot'); SELECT pg_drop_replication_slot('failover_default_slot'); SELECT pg_drop_replication_slot('physical_slot'); + +-- +-- Test that temporary slots are properly dropped on error, even when +-- the error is caught by a PL/pgSQL EXCEPTION handler (which doesn't +-- terminate the session). +-- Test 1: create_logical_replication_slot with invalid plugin (temporary=true) +DO $$ +BEGIN + PERFORM pg_create_logical_replication_slot('regression_slot_error', 'nonexistent_plugin_xyz', true); +EXCEPTION WHEN OTHERS THEN + RAISE NOTICE 'caught: %', SQLERRM; +END; +$$; +SELECT count(*) = 0 AS slot_was_dropped FROM pg_replication_slots + WHERE slot_name = 'regression_slot_error'; + +-- Test 2: session remains usable after the error (MyReplicationSlot cleared) +SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot_t3', 'test_decoding', true); +SELECT count(*) = 1 AS slot_exists FROM pg_replication_slots + WHERE slot_name = 'regression_slot_t3'; + +-- Test 3: pg_replication_slot_advance with a non-existent slot inside +-- an EXCEPTION block — must not leave MyReplicationSlot dangling. +DO $$ +BEGIN + PERFORM pg_replication_slot_advance('regression_slot_nonexist', '0/1'); +EXCEPTION WHEN OTHERS THEN + RAISE NOTICE 'caught: %', SQLERRM; +END; +$$; +-- Session is still healthy: we can advance the real slot +SELECT slot_name FROM pg_replication_slot_advance('regression_slot_t3', pg_current_wal_lsn()); + +-- Test 4: copy_replication_slot with max_replication_slots exceeded. +-- We reduce max_replication_slots artificially by filling all remaining slots. +-- Instead, trigger an error by copying to an already-existing name. +DO $$ +BEGIN + PERFORM pg_copy_logical_replication_slot('regression_slot_t3', 'regression_slot_t3'); +EXCEPTION WHEN OTHERS THEN + RAISE NOTICE 'caught: %', SQLERRM; +END; +$$; +-- The original slot must still exist and be usable +SELECT count(*) = 1 AS orig_slot_ok FROM pg_replication_slots + WHERE slot_name = 'regression_slot_t3'; + +-- Cleanup (slot is temporary, will drop on disconnect anyway, but be explicit) +SELECT pg_drop_replication_slot('regression_slot_t3'); diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c index 71fbaf72269..aa56e90bfab 100644 --- a/src/backend/replication/logical/logicalfuncs.c +++ b/src/backend/replication/logical/logicalfuncs.c @@ -197,10 +197,10 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin else end_of_wal = GetXLogReplayRecPtr(NULL); - ReplicationSlotAcquire(NameStr(*name), true, true); - PG_TRY(); { + ReplicationSlotAcquire(NameStr(*name), true, true); + /* restart at slot's confirmed_flush */ ctx = CreateDecodingContext(InvalidXLogRecPtr, options, @@ -320,6 +320,9 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin /* clear all timetravel entries */ InvalidateSystemCaches(); + if (MyReplicationSlot != NULL) + ReplicationSlotRelease(); + PG_RE_THROW(); } PG_END_TRY(); diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c index 16fbd383735..acc643ac749 100644 --- a/src/backend/replication/slotfuncs.c +++ b/src/backend/replication/slotfuncs.c @@ -148,38 +148,55 @@ create_logical_replication_slot(char *name, char *plugin, temporary ? RS_TEMPORARY : RS_EPHEMERAL, two_phase, false, failover, false); - /* - * Ensure the logical decoding is enabled before initializing the logical - * decoding context. - */ - EnsureLogicalDecodingEnabled(); - Assert(IsLogicalDecodingEnabled()); + PG_TRY(); + { + /* + * Ensure the logical decoding is enabled before initializing the logical + * decoding context. + */ + EnsureLogicalDecodingEnabled(); + Assert(IsLogicalDecodingEnabled()); - /* - * Create logical decoding context to find start point or, if we don't - * need it, to 1) bump slot's restart_lsn and xmin 2) check plugin sanity. - * - * Note: when !find_startpoint this is still important, because it's at - * this point that the output plugin is validated. - */ - ctx = CreateInitDecodingContext(plugin, NIL, - false, /* just catalogs is OK */ - false, /* not repack */ - restart_lsn, - XL_ROUTINE(.page_read = read_local_xlog_page, - .segment_open = wal_segment_open, - .segment_close = wal_segment_close), - NULL, NULL, NULL); + /* + * Create logical decoding context to find start point or, if we don't + * need it, to 1) bump slot's restart_lsn and xmin 2) check plugin sanity. + * + * Note: when !find_startpoint this is still important, because it's at + * this point that the output plugin is validated. + */ + ctx = CreateInitDecodingContext(plugin, NIL, + false, /* just catalogs is OK */ + false, /* not repack */ + restart_lsn, + XL_ROUTINE(.page_read = read_local_xlog_page, + .segment_open = wal_segment_open, + .segment_close = wal_segment_close), + NULL, NULL, NULL); - /* - * If caller needs us to determine the decoding start point, do so now. - * This might take a while. - */ - if (find_startpoint) - DecodingContextFindStartpoint(ctx); + /* + * If caller needs us to determine the decoding start point, do so now. + * This might take a while. + */ + if (find_startpoint) + DecodingContextFindStartpoint(ctx); - /* don't need the decoding context anymore */ - FreeDecodingContext(ctx); + /* don't need the decoding context anymore */ + FreeDecodingContext(ctx); + } + PG_CATCH(); + { + /* + * Drop the slot on error. ReplicationSlotRelease() only drops + * RS_EPHEMERAL slots, so for RS_TEMPORARY slots we must explicitly + * call ReplicationSlotDropAcquired() to avoid leaving the slot + * behind (e.g. when the error is caught by a PL/pgSQL EXCEPTION + * handler that doesn't terminate the session). + */ + if (MyReplicationSlot != NULL) + ReplicationSlotDropAcquired(); + PG_RE_THROW(); + } + PG_END_TRY(); } /* @@ -566,49 +583,58 @@ pg_replication_slot_advance(PG_FUNCTION_ARGS) else moveto = Min(moveto, GetXLogReplayRecPtr(NULL)); - /* Acquire the slot so we "own" it */ - ReplicationSlotAcquire(NameStr(*slotname), true, true); - - /* A slot whose restart_lsn has never been reserved cannot be advanced */ - if (!XLogRecPtrIsValid(MyReplicationSlot->data.restart_lsn)) - ereport(ERROR, - (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), - errmsg("replication slot \"%s\" cannot be advanced", - NameStr(*slotname)), - errdetail("This slot has never previously reserved WAL, or it has been invalidated."))); + PG_TRY(); + { + /* Acquire the slot so we "own" it */ + ReplicationSlotAcquire(NameStr(*slotname), true, true); + /* A slot whose restart_lsn has never been reserved cannot be advanced */ + if (!XLogRecPtrIsValid(MyReplicationSlot->data.restart_lsn)) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("replication slot \"%s\" cannot be advanced", + NameStr(*slotname)), + errdetail("This slot has never previously reserved WAL, or it has been invalidated."))); - /* - * Check if the slot is not moving backwards. Physical slots rely simply - * on restart_lsn as a minimum point, while logical slots have confirmed - * consumption up to confirmed_flush, meaning that in both cases data - * older than that is not available anymore. - */ - if (OidIsValid(MyReplicationSlot->data.database)) - minlsn = MyReplicationSlot->data.confirmed_flush; - else - minlsn = MyReplicationSlot->data.restart_lsn; + /* + * Check if the slot is not moving backwards. Physical slots rely simply + * on restart_lsn as a minimum point, while logical slots have confirmed + * consumption up to confirmed_flush, meaning that in both cases data + * older than that is not available anymore. + */ + if (OidIsValid(MyReplicationSlot->data.database)) + minlsn = MyReplicationSlot->data.confirmed_flush; + else + minlsn = MyReplicationSlot->data.restart_lsn; - if (moveto < minlsn) - ereport(ERROR, - (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), - errmsg("cannot advance replication slot to %X/%08X, minimum is %X/%08X", - LSN_FORMAT_ARGS(moveto), LSN_FORMAT_ARGS(minlsn)))); + if (moveto < minlsn) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("cannot advance replication slot to %X/%08X, minimum is %X/%08X", + LSN_FORMAT_ARGS(moveto), LSN_FORMAT_ARGS(minlsn)))); - /* Do the actual slot update, depending on the slot type */ - if (OidIsValid(MyReplicationSlot->data.database)) - endlsn = pg_logical_replication_slot_advance(moveto); - else - endlsn = pg_physical_replication_slot_advance(moveto); + /* Do the actual slot update, depending on the slot type */ + if (OidIsValid(MyReplicationSlot->data.database)) + endlsn = pg_logical_replication_slot_advance(moveto); + else + endlsn = pg_physical_replication_slot_advance(moveto); - values[0] = NameGetDatum(&MyReplicationSlot->data.name); - nulls[0] = false; + values[0] = NameGetDatum(&MyReplicationSlot->data.name); + nulls[0] = false; - /* - * Recompute the minimum LSN and xmin across all slots to adjust with the - * advancing potentially done. - */ - ReplicationSlotsComputeRequiredXmin(false); - ReplicationSlotsComputeRequiredLSN(); + /* + * Recompute the minimum LSN and xmin across all slots to adjust with the + * advancing potentially done. + */ + ReplicationSlotsComputeRequiredXmin(false); + ReplicationSlotsComputeRequiredLSN(); + } + PG_CATCH(); + { + if (MyReplicationSlot != NULL) + ReplicationSlotRelease(); + PG_RE_THROW(); + } + PG_END_TRY(); ReplicationSlotRelease(); @@ -763,7 +789,13 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot) /* * Update the destination slot to current values of the source slot; * recheck that the source slot is still the one we saw previously. + * + * Use PG_TRY to ensure we drop the destination slot if any validation + * error occurs. Without this, an error caught by a PL/pgSQL EXCEPTION + * handler would leave MyReplicationSlot set, crashing on the next slot + * operation. */ + PG_TRY(); { TransactionId copy_effective_xmin; TransactionId copy_effective_catalog_xmin; @@ -797,9 +829,6 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot) * or the restart_lsn either is invalid or has gone backward. (The * restart_lsn could go backwards if the source slot is dropped and * copied from an older slot during installation.) - * - * Since erroring out will release and drop the destination slot we - * don't need to release it here. */ if (copy_restart_lsn < src_restart_lsn || src_islogical != copy_islogical || @@ -857,6 +886,18 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot) } #endif } + PG_CATCH(); + { + /* + * Drop the newly-created destination slot on error. Same as in + * create_logical_replication_slot(): use ReplicationSlotDropAcquired() + * to handle both RS_EPHEMERAL and RS_TEMPORARY slots. + */ + if (MyReplicationSlot != NULL) + ReplicationSlotDropAcquired(); + PG_RE_THROW(); + } + PG_END_TRY(); /* target slot fully created, mark as persistent if needed */ if (logical_slot && !temporary) ^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: [PATCH] Release replication slot on error in SQL-callable slot functions 2026-05-21 06:49 Re: [PATCH] Release replication slot on error in SQL-callable slot functions vignesh C <[email protected]> 2026-05-21 14:38 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions SATYANARAYANA NARLAPURAM <[email protected]> 2026-05-22 09:15 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions shveta malik <[email protected]> 2026-05-25 07:12 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions SATYANARAYANA NARLAPURAM <[email protected]> @ 2026-05-25 09:58 ` shveta malik <[email protected]> 2026-05-25 19:01 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions SATYANARAYANA NARLAPURAM <[email protected]> 0 siblings, 1 reply; 8+ messages in thread From: shveta malik @ 2026-05-25 09:58 UTC (permalink / raw) To: SATYANARAYANA NARLAPURAM <[email protected]>; +Cc: vignesh C <[email protected]>; Fujii Masao <[email protected]>; PostgreSQL Hackers <[email protected]>; shveta malik <[email protected]> On Mon, May 25, 2026 at 12:42 PM SATYANARAYANA NARLAPURAM <[email protected]> wrote: > > Hi > > On Fri, May 22, 2026 at 2:16 AM shveta malik <[email protected]> wrote: >> >> Thanks for reporting the issue. I could reproduce the same issue with >> all these as well: >> >> pg_logical_slot_peek_changes >> pg_logical_slot_get_binary_changes >> pg_logical_slot_peek_binary_changes > > > Please find the attached v2 patch that addressed these three cases as well. > Thank You for addressuing these cases. A few comments: 1) +-- Test 2: session remains usable after the error (MyReplicationSlot cleared) It shoudl be part of 'Test 1' itself and thus should not be named as 'Test 2' 2) -------- +-- Test 4: copy_replication_slot with max_replication_slots exceeded. +-- We reduce max_replication_slots artificially by filling all remaining slots. +-- Instead, trigger an error by copying to an already-existing name. +DO $$ +BEGIN + PERFORM pg_copy_logical_replication_slot('regression_slot_t3', 'regression_slot_t3'); +EXCEPTION WHEN OTHERS THEN + RAISE NOTICE 'caught: %', SQLERRM; +END; +$$; +-- The original slot must still exist and be usable +SELECT count(*) = 1 AS orig_slot_ok FROM pg_replication_slots + WHERE slot_name = 'regression_slot_t3'; ----------- I don't think we can hit the Assert with above test (at-least I could not). Since creation of slot itself will fail as the slot with same-name already exists, MyReplicationSlot will never be set and thus Assert will not be hit. A better testcase will be below which fails during LoadOutputPlugin() after slot-creation and MyReplicationSlot is set already. SELECT pg_create_logical_replication_slot('src_slot', 'test_decoding'); DO $$ BEGIN PERFORM pg_copy_logical_replication_slot('src_slot', 'dst_slot', false, 'nonexistent_plugin'); EXCEPTION WHEN others THEN RAISE NOTICE 'caught: %', SQLERRM; END $$; SELECT count(*) FROM pg_logical_slot_get_changes('src_slot', NULL, NULL); 3) So overall these are the problematic APIs: pg_create_logical_replication_slot pg_replication_slot_advance pg_copy_logical_replication_slot pg_logical_slot_peek_binary_changes pg_logical_slot_peek_changes pg_logical_slot_get_changes pg_logical_slot_get_binary_changes First 3 are are mutually exclusive fixes fow which we have added testcases. Last 4 are addressed by fixing common function pg_logical_slot_get_changes_guts(). I think we should add a test case for at-least any one of these APIs to cover pg_logical_slot_get_changes_guts(). Thanks. Shveta ^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: [PATCH] Release replication slot on error in SQL-callable slot functions 2026-05-21 06:49 Re: [PATCH] Release replication slot on error in SQL-callable slot functions vignesh C <[email protected]> 2026-05-21 14:38 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions SATYANARAYANA NARLAPURAM <[email protected]> 2026-05-22 09:15 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions shveta malik <[email protected]> 2026-05-25 07:12 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions SATYANARAYANA NARLAPURAM <[email protected]> 2026-05-25 09:58 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions shveta malik <[email protected]> @ 2026-05-25 19:01 ` SATYANARAYANA NARLAPURAM <[email protected]> 2026-05-26 04:36 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions shveta malik <[email protected]> 0 siblings, 1 reply; 8+ messages in thread From: SATYANARAYANA NARLAPURAM @ 2026-05-25 19:01 UTC (permalink / raw) To: shveta malik <[email protected]>; +Cc: vignesh C <[email protected]>; Fujii Masao <[email protected]>; PostgreSQL Hackers <[email protected]>; shveta malik <[email protected]> Hi, On Mon, May 25, 2026 at 2:58 AM shveta malik <[email protected]> wrote: > On Mon, May 25, 2026 at 12:42 PM SATYANARAYANA NARLAPURAM > <[email protected]> wrote: > > > > Hi > > > > On Fri, May 22, 2026 at 2:16 AM shveta malik <[email protected]> > wrote: > >> > >> Thanks for reporting the issue. I could reproduce the same issue with > >> all these as well: > >> > >> pg_logical_slot_peek_changes > >> pg_logical_slot_get_binary_changes > >> pg_logical_slot_peek_binary_changes > > > > > > Please find the attached v2 patch that addressed these three cases as > well. > > > > Thank You for addressuing these cases. A few comments: > > 1) > > +-- Test 2: session remains usable after the error (MyReplicationSlot > cleared) > > It shoudl be part of 'Test 1' itself and thus should not be named as 'Test > 2' > > 2) > -------- > +-- Test 4: copy_replication_slot with max_replication_slots exceeded. > +-- We reduce max_replication_slots artificially by filling all remaining > slots. > +-- Instead, trigger an error by copying to an already-existing name. > +DO $$ > +BEGIN > + PERFORM pg_copy_logical_replication_slot('regression_slot_t3', > 'regression_slot_t3'); > +EXCEPTION WHEN OTHERS THEN > + RAISE NOTICE 'caught: %', SQLERRM; > +END; > +$$; > +-- The original slot must still exist and be usable > +SELECT count(*) = 1 AS orig_slot_ok FROM pg_replication_slots > + WHERE slot_name = 'regression_slot_t3'; > ----------- > > I don't think we can hit the Assert with above test (at-least I could > not). Since creation of slot itself will fail as the slot with > same-name already exists, MyReplicationSlot will never be set and thus > Assert will not be hit. A better testcase will be below which fails > during LoadOutputPlugin() after slot-creation and MyReplicationSlot is > set already. > > SELECT pg_create_logical_replication_slot('src_slot', 'test_decoding'); > > DO $$ > BEGIN > PERFORM pg_copy_logical_replication_slot('src_slot', 'dst_slot', > false, 'nonexistent_plugin'); > EXCEPTION WHEN others THEN > RAISE NOTICE 'caught: %', SQLERRM; > END $$; > > SELECT count(*) FROM pg_logical_slot_get_changes('src_slot', NULL, NULL); > > 3) > So overall these are the problematic APIs: > > pg_create_logical_replication_slot > pg_replication_slot_advance > pg_copy_logical_replication_slot > pg_logical_slot_peek_binary_changes > pg_logical_slot_peek_changes > pg_logical_slot_get_changes > pg_logical_slot_get_binary_changes > > First 3 are are mutually exclusive fixes fow which we have added > testcases. Last 4 are addressed by fixing common function > pg_logical_slot_get_changes_guts(). I think we should add a test case > for at-least any one of these APIs to cover > pg_logical_slot_get_changes_guts(). > Thanks for reviewing. Please review the attached v3 patch. Thanks, Satya Attachments: [application/octet-stream] v3-0001-Release-replication-slot-on-error-in-slot-SQL-functions.patch (15.7K, 3-v3-0001-Release-replication-slot-on-error-in-slot-SQL-functions.patch) download | inline diff: diff --git a/contrib/test_decoding/expected/slot.out b/contrib/test_decoding/expected/slot.out index 7de03c79f6f..962a4277b75 100644 --- a/contrib/test_decoding/expected/slot.out +++ b/contrib/test_decoding/expected/slot.out @@ -466,3 +466,122 @@ SELECT pg_drop_replication_slot('physical_slot'); (1 row) +-- +-- Test that replication slots are properly released or dropped on error, +-- even when the error is caught by a PL/pgSQL EXCEPTION handler (which +-- doesn't terminate the session). +-- +-- pg_create_logical_replication_slot: error during slot creation should +-- drop the slot. +DO $$ +BEGIN + PERFORM pg_create_logical_replication_slot('regression_slot_error', 'nonexistent_plugin_xyz', true); +EXCEPTION WHEN OTHERS THEN + RAISE NOTICE 'caught: %', SQLERRM; +END; +$$; +NOTICE: caught: could not access file "nonexistent_plugin_xyz": No such file or directory +SELECT count(*) = 0 AS slot_was_dropped FROM pg_replication_slots + WHERE slot_name = 'regression_slot_error'; + slot_was_dropped +------------------ + t +(1 row) + +SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot_t3', 'test_decoding', true); + ?column? +---------- + init +(1 row) + +SELECT count(*) = 1 AS slot_exists FROM pg_replication_slots + WHERE slot_name = 'regression_slot_t3'; + slot_exists +------------- + t +(1 row) + +-- pg_replication_slot_advance: error after acquiring the slot should +-- release it so the session stays usable. +SELECT slot_name FROM pg_replication_slot_advance('regression_slot_t3', pg_current_wal_lsn()); + slot_name +-------------------- + regression_slot_t3 +(1 row) + +DO $$ +BEGIN + PERFORM pg_replication_slot_advance('regression_slot_t3', '0/1'); +EXCEPTION WHEN OTHERS THEN + RAISE NOTICE 'caught expected error'; +END; +$$; +NOTICE: caught expected error +-- the session is still healthy +SELECT slot_name FROM pg_replication_slot_advance('regression_slot_t3', pg_current_wal_lsn()); + slot_name +-------------------- + regression_slot_t3 +(1 row) + +-- pg_copy_logical_replication_slot: error after creating the destination +-- slot should drop it. +DO $$ +BEGIN + PERFORM pg_copy_logical_replication_slot('regression_slot_t3', 'regression_slot_dst', false, 'nonexistent_plugin_xyz'); +EXCEPTION WHEN OTHERS THEN + RAISE NOTICE 'caught: %', SQLERRM; +END; +$$; +NOTICE: caught: could not access file "nonexistent_plugin_xyz": No such file or directory +-- the destination slot must not exist (it was dropped on error) +SELECT count(*) = 0 AS dst_slot_dropped FROM pg_replication_slots + WHERE slot_name = 'regression_slot_dst'; + dst_slot_dropped +------------------ + t +(1 row) + +-- the session is still usable +SELECT count(*) >= 0 AS changes_ok FROM pg_logical_slot_get_changes('regression_slot_t3', NULL, NULL); + changes_ok +------------ + t +(1 row) + +-- pg_logical_slot_get_changes: error after acquiring the slot should +-- release it. +SELECT 'init' FROM pg_create_physical_replication_slot('regression_slot_phy', true); + ?column? +---------- + init +(1 row) + +DO $$ +BEGIN + PERFORM pg_logical_slot_get_changes('regression_slot_phy', NULL, NULL); +EXCEPTION WHEN OTHERS THEN + RAISE NOTICE 'caught: %', SQLERRM; +END; +$$; +NOTICE: caught: cannot use physical replication slot for logical decoding +-- the session is still usable +SELECT slot_name FROM pg_replication_slot_advance('regression_slot_t3', pg_current_wal_lsn()); + slot_name +-------------------- + regression_slot_t3 +(1 row) + +SELECT pg_drop_replication_slot('regression_slot_phy'); + pg_drop_replication_slot +-------------------------- + +(1 row) + +-- cleanup +SELECT pg_drop_replication_slot('regression_slot_t3'); + pg_drop_replication_slot +-------------------------- + +(1 row) + diff --git a/contrib/test_decoding/sql/slot.sql b/contrib/test_decoding/sql/slot.sql index 580e3ae3bef..1b327f518aa 100644 --- a/contrib/test_decoding/sql/slot.sql +++ b/contrib/test_decoding/sql/slot.sql @@ -190,3 +190,70 @@ SELECT pg_drop_replication_slot('failover_true_slot'); SELECT pg_drop_replication_slot('failover_false_slot'); SELECT pg_drop_replication_slot('failover_default_slot'); SELECT pg_drop_replication_slot('physical_slot'); + +-- +-- Test that replication slots are properly released or dropped on error, +-- even when the error is caught by a PL/pgSQL EXCEPTION handler (which +-- doesn't terminate the session). +-- + +-- pg_create_logical_replication_slot: error during slot creation should +-- drop the slot. +DO $$ +BEGIN + PERFORM pg_create_logical_replication_slot('regression_slot_error', 'nonexistent_plugin_xyz', true); +EXCEPTION WHEN OTHERS THEN + RAISE NOTICE 'caught: %', SQLERRM; +END; +$$; +SELECT count(*) = 0 AS slot_was_dropped FROM pg_replication_slots + WHERE slot_name = 'regression_slot_error'; + +SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot_t3', 'test_decoding', true); +SELECT count(*) = 1 AS slot_exists FROM pg_replication_slots + WHERE slot_name = 'regression_slot_t3'; + +-- pg_replication_slot_advance: error after acquiring the slot should +-- release it so the session stays usable. +SELECT slot_name FROM pg_replication_slot_advance('regression_slot_t3', pg_current_wal_lsn()); +DO $$ +BEGIN + PERFORM pg_replication_slot_advance('regression_slot_t3', '0/1'); +EXCEPTION WHEN OTHERS THEN + RAISE NOTICE 'caught expected error'; +END; +$$; +-- the session is still healthy +SELECT slot_name FROM pg_replication_slot_advance('regression_slot_t3', pg_current_wal_lsn()); + +-- pg_copy_logical_replication_slot: error after creating the destination +-- slot should drop it. +DO $$ +BEGIN + PERFORM pg_copy_logical_replication_slot('regression_slot_t3', 'regression_slot_dst', false, 'nonexistent_plugin_xyz'); +EXCEPTION WHEN OTHERS THEN + RAISE NOTICE 'caught: %', SQLERRM; +END; +$$; +-- the destination slot must not exist (it was dropped on error) +SELECT count(*) = 0 AS dst_slot_dropped FROM pg_replication_slots + WHERE slot_name = 'regression_slot_dst'; +-- the session is still usable +SELECT count(*) >= 0 AS changes_ok FROM pg_logical_slot_get_changes('regression_slot_t3', NULL, NULL); + +-- pg_logical_slot_get_changes: error after acquiring the slot should +-- release it. +SELECT 'init' FROM pg_create_physical_replication_slot('regression_slot_phy', true); +DO $$ +BEGIN + PERFORM pg_logical_slot_get_changes('regression_slot_phy', NULL, NULL); +EXCEPTION WHEN OTHERS THEN + RAISE NOTICE 'caught: %', SQLERRM; +END; +$$; +-- the session is still usable +SELECT slot_name FROM pg_replication_slot_advance('regression_slot_t3', pg_current_wal_lsn()); +SELECT pg_drop_replication_slot('regression_slot_phy'); + +-- cleanup +SELECT pg_drop_replication_slot('regression_slot_t3'); diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c index 71fbaf72269..aa56e90bfab 100644 --- a/src/backend/replication/logical/logicalfuncs.c +++ b/src/backend/replication/logical/logicalfuncs.c @@ -197,10 +197,10 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin else end_of_wal = GetXLogReplayRecPtr(NULL); - ReplicationSlotAcquire(NameStr(*name), true, true); - PG_TRY(); { + ReplicationSlotAcquire(NameStr(*name), true, true); + /* restart at slot's confirmed_flush */ ctx = CreateDecodingContext(InvalidXLogRecPtr, options, @@ -320,6 +320,9 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin /* clear all timetravel entries */ InvalidateSystemCaches(); + if (MyReplicationSlot != NULL) + ReplicationSlotRelease(); + PG_RE_THROW(); } PG_END_TRY(); diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c index 16fbd383735..acc643ac749 100644 --- a/src/backend/replication/slotfuncs.c +++ b/src/backend/replication/slotfuncs.c @@ -148,38 +148,55 @@ create_logical_replication_slot(char *name, char *plugin, temporary ? RS_TEMPORARY : RS_EPHEMERAL, two_phase, false, failover, false); - /* - * Ensure the logical decoding is enabled before initializing the logical - * decoding context. - */ - EnsureLogicalDecodingEnabled(); - Assert(IsLogicalDecodingEnabled()); + PG_TRY(); + { + /* + * Ensure the logical decoding is enabled before initializing the logical + * decoding context. + */ + EnsureLogicalDecodingEnabled(); + Assert(IsLogicalDecodingEnabled()); - /* - * Create logical decoding context to find start point or, if we don't - * need it, to 1) bump slot's restart_lsn and xmin 2) check plugin sanity. - * - * Note: when !find_startpoint this is still important, because it's at - * this point that the output plugin is validated. - */ - ctx = CreateInitDecodingContext(plugin, NIL, - false, /* just catalogs is OK */ - false, /* not repack */ - restart_lsn, - XL_ROUTINE(.page_read = read_local_xlog_page, - .segment_open = wal_segment_open, - .segment_close = wal_segment_close), - NULL, NULL, NULL); + /* + * Create logical decoding context to find start point or, if we don't + * need it, to 1) bump slot's restart_lsn and xmin 2) check plugin sanity. + * + * Note: when !find_startpoint this is still important, because it's at + * this point that the output plugin is validated. + */ + ctx = CreateInitDecodingContext(plugin, NIL, + false, /* just catalogs is OK */ + false, /* not repack */ + restart_lsn, + XL_ROUTINE(.page_read = read_local_xlog_page, + .segment_open = wal_segment_open, + .segment_close = wal_segment_close), + NULL, NULL, NULL); - /* - * If caller needs us to determine the decoding start point, do so now. - * This might take a while. - */ - if (find_startpoint) - DecodingContextFindStartpoint(ctx); + /* + * If caller needs us to determine the decoding start point, do so now. + * This might take a while. + */ + if (find_startpoint) + DecodingContextFindStartpoint(ctx); - /* don't need the decoding context anymore */ - FreeDecodingContext(ctx); + /* don't need the decoding context anymore */ + FreeDecodingContext(ctx); + } + PG_CATCH(); + { + /* + * Drop the slot on error. ReplicationSlotRelease() only drops + * RS_EPHEMERAL slots, so for RS_TEMPORARY slots we must explicitly + * call ReplicationSlotDropAcquired() to avoid leaving the slot + * behind (e.g. when the error is caught by a PL/pgSQL EXCEPTION + * handler that doesn't terminate the session). + */ + if (MyReplicationSlot != NULL) + ReplicationSlotDropAcquired(); + PG_RE_THROW(); + } + PG_END_TRY(); } /* @@ -566,49 +583,58 @@ pg_replication_slot_advance(PG_FUNCTION_ARGS) else moveto = Min(moveto, GetXLogReplayRecPtr(NULL)); - /* Acquire the slot so we "own" it */ - ReplicationSlotAcquire(NameStr(*slotname), true, true); - - /* A slot whose restart_lsn has never been reserved cannot be advanced */ - if (!XLogRecPtrIsValid(MyReplicationSlot->data.restart_lsn)) - ereport(ERROR, - (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), - errmsg("replication slot \"%s\" cannot be advanced", - NameStr(*slotname)), - errdetail("This slot has never previously reserved WAL, or it has been invalidated."))); + PG_TRY(); + { + /* Acquire the slot so we "own" it */ + ReplicationSlotAcquire(NameStr(*slotname), true, true); + /* A slot whose restart_lsn has never been reserved cannot be advanced */ + if (!XLogRecPtrIsValid(MyReplicationSlot->data.restart_lsn)) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("replication slot \"%s\" cannot be advanced", + NameStr(*slotname)), + errdetail("This slot has never previously reserved WAL, or it has been invalidated."))); - /* - * Check if the slot is not moving backwards. Physical slots rely simply - * on restart_lsn as a minimum point, while logical slots have confirmed - * consumption up to confirmed_flush, meaning that in both cases data - * older than that is not available anymore. - */ - if (OidIsValid(MyReplicationSlot->data.database)) - minlsn = MyReplicationSlot->data.confirmed_flush; - else - minlsn = MyReplicationSlot->data.restart_lsn; + /* + * Check if the slot is not moving backwards. Physical slots rely simply + * on restart_lsn as a minimum point, while logical slots have confirmed + * consumption up to confirmed_flush, meaning that in both cases data + * older than that is not available anymore. + */ + if (OidIsValid(MyReplicationSlot->data.database)) + minlsn = MyReplicationSlot->data.confirmed_flush; + else + minlsn = MyReplicationSlot->data.restart_lsn; - if (moveto < minlsn) - ereport(ERROR, - (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), - errmsg("cannot advance replication slot to %X/%08X, minimum is %X/%08X", - LSN_FORMAT_ARGS(moveto), LSN_FORMAT_ARGS(minlsn)))); + if (moveto < minlsn) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("cannot advance replication slot to %X/%08X, minimum is %X/%08X", + LSN_FORMAT_ARGS(moveto), LSN_FORMAT_ARGS(minlsn)))); - /* Do the actual slot update, depending on the slot type */ - if (OidIsValid(MyReplicationSlot->data.database)) - endlsn = pg_logical_replication_slot_advance(moveto); - else - endlsn = pg_physical_replication_slot_advance(moveto); + /* Do the actual slot update, depending on the slot type */ + if (OidIsValid(MyReplicationSlot->data.database)) + endlsn = pg_logical_replication_slot_advance(moveto); + else + endlsn = pg_physical_replication_slot_advance(moveto); - values[0] = NameGetDatum(&MyReplicationSlot->data.name); - nulls[0] = false; + values[0] = NameGetDatum(&MyReplicationSlot->data.name); + nulls[0] = false; - /* - * Recompute the minimum LSN and xmin across all slots to adjust with the - * advancing potentially done. - */ - ReplicationSlotsComputeRequiredXmin(false); - ReplicationSlotsComputeRequiredLSN(); + /* + * Recompute the minimum LSN and xmin across all slots to adjust with the + * advancing potentially done. + */ + ReplicationSlotsComputeRequiredXmin(false); + ReplicationSlotsComputeRequiredLSN(); + } + PG_CATCH(); + { + if (MyReplicationSlot != NULL) + ReplicationSlotRelease(); + PG_RE_THROW(); + } + PG_END_TRY(); ReplicationSlotRelease(); @@ -763,7 +789,13 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot) /* * Update the destination slot to current values of the source slot; * recheck that the source slot is still the one we saw previously. + * + * Use PG_TRY to ensure we drop the destination slot if any validation + * error occurs. Without this, an error caught by a PL/pgSQL EXCEPTION + * handler would leave MyReplicationSlot set, crashing on the next slot + * operation. */ + PG_TRY(); { TransactionId copy_effective_xmin; TransactionId copy_effective_catalog_xmin; @@ -797,9 +829,6 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot) * or the restart_lsn either is invalid or has gone backward. (The * restart_lsn could go backwards if the source slot is dropped and * copied from an older slot during installation.) - * - * Since erroring out will release and drop the destination slot we - * don't need to release it here. */ if (copy_restart_lsn < src_restart_lsn || src_islogical != copy_islogical || @@ -857,6 +886,18 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot) } #endif } + PG_CATCH(); + { + /* + * Drop the newly-created destination slot on error. Same as in + * create_logical_replication_slot(): use ReplicationSlotDropAcquired() + * to handle both RS_EPHEMERAL and RS_TEMPORARY slots. + */ + if (MyReplicationSlot != NULL) + ReplicationSlotDropAcquired(); + PG_RE_THROW(); + } + PG_END_TRY(); /* target slot fully created, mark as persistent if needed */ if (logical_slot && !temporary) ^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: [PATCH] Release replication slot on error in SQL-callable slot functions 2026-05-21 06:49 Re: [PATCH] Release replication slot on error in SQL-callable slot functions vignesh C <[email protected]> 2026-05-21 14:38 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions SATYANARAYANA NARLAPURAM <[email protected]> 2026-05-22 09:15 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions shveta malik <[email protected]> 2026-05-25 07:12 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions SATYANARAYANA NARLAPURAM <[email protected]> 2026-05-25 09:58 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions shveta malik <[email protected]> 2026-05-25 19:01 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions SATYANARAYANA NARLAPURAM <[email protected]> @ 2026-05-26 04:36 ` shveta malik <[email protected]> 2026-05-26 05:50 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions shveta malik <[email protected]> 0 siblings, 1 reply; 8+ messages in thread From: shveta malik @ 2026-05-26 04:36 UTC (permalink / raw) To: SATYANARAYANA NARLAPURAM <[email protected]>; +Cc: vignesh C <[email protected]>; Fujii Masao <[email protected]>; PostgreSQL Hackers <[email protected]>; shveta malik <[email protected]> On Tue, May 26, 2026 at 12:31 AM SATYANARAYANA NARLAPURAM <[email protected]> wrote: > > Hi, > > On Mon, May 25, 2026 at 2:58 AM shveta malik <[email protected]> wrote: >> >> On Mon, May 25, 2026 at 12:42 PM SATYANARAYANA NARLAPURAM >> <[email protected]> wrote: >> > >> > Hi >> > >> > On Fri, May 22, 2026 at 2:16 AM shveta malik <[email protected]> wrote: >> >> >> >> Thanks for reporting the issue. I could reproduce the same issue with >> >> all these as well: >> >> >> >> pg_logical_slot_peek_changes >> >> pg_logical_slot_get_binary_changes >> >> pg_logical_slot_peek_binary_changes >> > >> > >> > Please find the attached v2 patch that addressed these three cases as well. >> > >> >> Thank You for addressuing these cases. A few comments: >> >> 1) >> >> +-- Test 2: session remains usable after the error (MyReplicationSlot cleared) >> >> It shoudl be part of 'Test 1' itself and thus should not be named as 'Test 2' >> >> 2) >> -------- >> +-- Test 4: copy_replication_slot with max_replication_slots exceeded. >> +-- We reduce max_replication_slots artificially by filling all remaining slots. >> +-- Instead, trigger an error by copying to an already-existing name. >> +DO $$ >> +BEGIN >> + PERFORM pg_copy_logical_replication_slot('regression_slot_t3', >> 'regression_slot_t3'); >> +EXCEPTION WHEN OTHERS THEN >> + RAISE NOTICE 'caught: %', SQLERRM; >> +END; >> +$$; >> +-- The original slot must still exist and be usable >> +SELECT count(*) = 1 AS orig_slot_ok FROM pg_replication_slots >> + WHERE slot_name = 'regression_slot_t3'; >> ----------- >> >> I don't think we can hit the Assert with above test (at-least I could >> not). Since creation of slot itself will fail as the slot with >> same-name already exists, MyReplicationSlot will never be set and thus >> Assert will not be hit. A better testcase will be below which fails >> during LoadOutputPlugin() after slot-creation and MyReplicationSlot is >> set already. >> >> SELECT pg_create_logical_replication_slot('src_slot', 'test_decoding'); >> >> DO $$ >> BEGIN >> PERFORM pg_copy_logical_replication_slot('src_slot', 'dst_slot', >> false, 'nonexistent_plugin'); >> EXCEPTION WHEN others THEN >> RAISE NOTICE 'caught: %', SQLERRM; >> END $$; >> >> SELECT count(*) FROM pg_logical_slot_get_changes('src_slot', NULL, NULL); >> >> 3) >> So overall these are the problematic APIs: >> >> pg_create_logical_replication_slot >> pg_replication_slot_advance >> pg_copy_logical_replication_slot >> pg_logical_slot_peek_binary_changes >> pg_logical_slot_peek_changes >> pg_logical_slot_get_changes >> pg_logical_slot_get_binary_changes >> >> First 3 are are mutually exclusive fixes fow which we have added >> testcases. Last 4 are addressed by fixing common function >> pg_logical_slot_get_changes_guts(). I think we should add a test case >> for at-least any one of these APIs to cover >> pg_logical_slot_get_changes_guts(). > > > Thanks for reviewing. Please review the attached v3 patch. > A few trivial things: 1) pg_replication_slot_advance: + PG_TRY(); + { + /* Acquire the slot so we "own" it */ + ReplicationSlotAcquire(NameStr(*slotname), true, true); + /* A slot whose restart_lsn has never been reserved cannot be advanced */ + if (!XLogRecPtrIsValid(MyReplicationSlot->data.restart_lsn)) We can have a blank line after ReplicationSlotAcquire for better readability. 2) +SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot_t3', 'test_decoding', true); +SELECT count(*) = 1 AS slot_exists FROM pg_replication_slots + WHERE slot_name = 'regression_slot_t3'; The intent is not clear why are we checking existence of regression_slot_t3? I think we can skip it (or else add a comment if really needed). The success of previous pg_create_logical_replication_slot is enough to confirm that session is healthy to run other slot related queries. 3) +SELECT pg_drop_replication_slot('regression_slot_phy'); + +-- cleanup +SELECT pg_drop_replication_slot('regression_slot_t3'); We can move drop of 'regression_slot_phy' too under '-- cleanup' ~~ I have no further comments other than the trivial things mentioned above. thanks Shveta ^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: [PATCH] Release replication slot on error in SQL-callable slot functions 2026-05-21 06:49 Re: [PATCH] Release replication slot on error in SQL-callable slot functions vignesh C <[email protected]> 2026-05-21 14:38 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions SATYANARAYANA NARLAPURAM <[email protected]> 2026-05-22 09:15 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions shveta malik <[email protected]> 2026-05-25 07:12 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions SATYANARAYANA NARLAPURAM <[email protected]> 2026-05-25 09:58 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions shveta malik <[email protected]> 2026-05-25 19:01 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions SATYANARAYANA NARLAPURAM <[email protected]> 2026-05-26 04:36 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions shveta malik <[email protected]> @ 2026-05-26 05:50 ` shveta malik <[email protected]> 0 siblings, 0 replies; 8+ messages in thread From: shveta malik @ 2026-05-26 05:50 UTC (permalink / raw) To: SATYANARAYANA NARLAPURAM <[email protected]>; +Cc: vignesh C <[email protected]>; Fujii Masao <[email protected]>; PostgreSQL Hackers <[email protected]>; shveta malik <[email protected]> On Tue, May 26, 2026 at 10:06 AM shveta malik <[email protected]> wrote: > > On Tue, May 26, 2026 at 12:31 AM SATYANARAYANA NARLAPURAM > <[email protected]> wrote: > > > > Hi, > > > > On Mon, May 25, 2026 at 2:58 AM shveta malik <[email protected]> wrote: > >> > >> On Mon, May 25, 2026 at 12:42 PM SATYANARAYANA NARLAPURAM > >> <[email protected]> wrote: > >> > > >> > Hi > >> > > >> > On Fri, May 22, 2026 at 2:16 AM shveta malik <[email protected]> wrote: > >> >> > >> >> Thanks for reporting the issue. I could reproduce the same issue with > >> >> all these as well: > >> >> > >> >> pg_logical_slot_peek_changes > >> >> pg_logical_slot_get_binary_changes > >> >> pg_logical_slot_peek_binary_changes > >> > > >> > > >> > Please find the attached v2 patch that addressed these three cases as well. > >> > > >> > >> Thank You for addressuing these cases. A few comments: > >> > >> 1) > >> > >> +-- Test 2: session remains usable after the error (MyReplicationSlot cleared) > >> > >> It shoudl be part of 'Test 1' itself and thus should not be named as 'Test 2' > >> > >> 2) > >> -------- > >> +-- Test 4: copy_replication_slot with max_replication_slots exceeded. > >> +-- We reduce max_replication_slots artificially by filling all remaining slots. > >> +-- Instead, trigger an error by copying to an already-existing name. > >> +DO $$ > >> +BEGIN > >> + PERFORM pg_copy_logical_replication_slot('regression_slot_t3', > >> 'regression_slot_t3'); > >> +EXCEPTION WHEN OTHERS THEN > >> + RAISE NOTICE 'caught: %', SQLERRM; > >> +END; > >> +$$; > >> +-- The original slot must still exist and be usable > >> +SELECT count(*) = 1 AS orig_slot_ok FROM pg_replication_slots > >> + WHERE slot_name = 'regression_slot_t3'; > >> ----------- > >> > >> I don't think we can hit the Assert with above test (at-least I could > >> not). Since creation of slot itself will fail as the slot with > >> same-name already exists, MyReplicationSlot will never be set and thus > >> Assert will not be hit. A better testcase will be below which fails > >> during LoadOutputPlugin() after slot-creation and MyReplicationSlot is > >> set already. > >> > >> SELECT pg_create_logical_replication_slot('src_slot', 'test_decoding'); > >> > >> DO $$ > >> BEGIN > >> PERFORM pg_copy_logical_replication_slot('src_slot', 'dst_slot', > >> false, 'nonexistent_plugin'); > >> EXCEPTION WHEN others THEN > >> RAISE NOTICE 'caught: %', SQLERRM; > >> END $$; > >> > >> SELECT count(*) FROM pg_logical_slot_get_changes('src_slot', NULL, NULL); > >> > >> 3) > >> So overall these are the problematic APIs: > >> > >> pg_create_logical_replication_slot > >> pg_replication_slot_advance > >> pg_copy_logical_replication_slot > >> pg_logical_slot_peek_binary_changes > >> pg_logical_slot_peek_changes > >> pg_logical_slot_get_changes > >> pg_logical_slot_get_binary_changes > >> > >> First 3 are are mutually exclusive fixes fow which we have added > >> testcases. Last 4 are addressed by fixing common function > >> pg_logical_slot_get_changes_guts(). I think we should add a test case > >> for at-least any one of these APIs to cover > >> pg_logical_slot_get_changes_guts(). > > > > > > Thanks for reviewing. Please review the attached v3 patch. > > > > A few trivial things: > > 1) > pg_replication_slot_advance: > + PG_TRY(); > + { > + /* Acquire the slot so we "own" it */ > + ReplicationSlotAcquire(NameStr(*slotname), true, true); > + /* A slot whose restart_lsn has never been reserved cannot be advanced */ > + if (!XLogRecPtrIsValid(MyReplicationSlot->data.restart_lsn)) > > > We can have a blank line after ReplicationSlotAcquire for better readability. > > 2) > > +SELECT 'init' FROM > pg_create_logical_replication_slot('regression_slot_t3', > 'test_decoding', true); > +SELECT count(*) = 1 AS slot_exists FROM pg_replication_slots > + WHERE slot_name = 'regression_slot_t3'; > > The intent is not clear why are we checking existence of > regression_slot_t3? I think we can skip it (or else add a comment if > really needed). The success of previous > pg_create_logical_replication_slot is enough to confirm that session > is healthy to run other slot related queries. > > 3) > +SELECT pg_drop_replication_slot('regression_slot_phy'); > + > +-- cleanup > +SELECT pg_drop_replication_slot('regression_slot_t3'); > > We can move drop of 'regression_slot_phy' too under '-- cleanup' > > ~~ > > I have no further comments other than the trivial things mentioned above. > Missed to inform this earlier, I am not able to apply any version of the patches shared so far with 'git am'. It gives error, 'patch -p1' works. git am v3-0001-Release-replication-slot-on-error-in-slot-SQL-functions.patch Patch format detection failed. thanks Shveta ^ permalink raw reply [nested|flat] 8+ messages in thread
end of thread, other threads:[~2026-05-26 05:50 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2026-05-21 06:49 Re: [PATCH] Release replication slot on error in SQL-callable slot functions vignesh C <[email protected]> 2026-05-21 14:38 ` SATYANARAYANA NARLAPURAM <[email protected]> 2026-05-22 09:15 ` shveta malik <[email protected]> 2026-05-25 07:12 ` SATYANARAYANA NARLAPURAM <[email protected]> 2026-05-25 09:58 ` shveta malik <[email protected]> 2026-05-25 19:01 ` SATYANARAYANA NARLAPURAM <[email protected]> 2026-05-26 04:36 ` shveta malik <[email protected]> 2026-05-26 05:50 ` shveta malik <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox