public inbox for [email protected]  
help / color / mirror / Atom feed
Re: BUG #18354: Aborted transaction aborted during cleanup when temp_file_limit exceeded
2+ messages / 2 participants
[nested] [flat]

* Re: BUG #18354: Aborted transaction aborted during cleanup when temp_file_limit exceeded
@ 2024-02-22 07:00  Kyotaro Horiguchi <[email protected]>
  0 siblings, 1 reply; 2+ messages in thread

From: Kyotaro Horiguchi @ 2024-02-22 07:00 UTC (permalink / raw)
  To: [email protected]; +Cc: [email protected]; [email protected]

At Thu, 22 Feb 2024 14:38:15 +0900 (JST), Kyotaro Horiguchi <[email protected]> wrote in 
> At Thu, 22 Feb 2024 11:49:29 +0800, Tender Wang <[email protected]> wrote in 
> > Changing the behavior of tuplestore can work for this issue,  but I'm not
> > sure if this change will affect other components which depend on
> > BufFile(like nodeMaterial)

In the case of nodeMaterial, tuplestore is simply discarded and
underlying files are removed without flushing by the resource owner
mechanism. On the other hand, in the case of this bug report, the
files in the problem tuplestore are not managed by a resource
owner. As a result, PortalDrop have to explicitly call tuplestore_end
to release resources used by the tuplestore.

> 	/*
> 	 * Delete tuplestore if present.  We should do this even under error
> 	 * conditions; since the tuplestore would have been using cross-
> 	 * transaction storage, its temp files need to be explicitly deleted.
> 	 */
> 	if (portal->holdStore)
> 	{
> 		MemoryContext oldcontext;
> 
> 		oldcontext = MemoryContextSwitchTo(portal->holdContext);
> 		tuplestore_end(portal->holdStore);

As far as I can see, there are no caller sites to tuplestore_end in
the abort path or any error handling path.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






^ permalink  raw  reply  [nested|flat] 2+ messages in thread

* Re: BUG #18354: Aborted transaction aborted during cleanup when temp_file_limit exceeded
@ 2026-05-06 11:26  Alex Masterov <[email protected]>
  parent: Kyotaro Horiguchi <[email protected]>
  0 siblings, 0 replies; 2+ messages in thread

From: Alex Masterov @ 2026-05-06 11:26 UTC (permalink / raw)
  To: Kyotaro Horiguchi <[email protected]>; +Cc: [email protected]; [email protected]; [email protected]

Hi,

 While running tests with Neon, we discovered an assertion failure that can
 occur during re-entrant AbortTransaction() calls.

 The issue arises when an error occurs during AbortTransaction() after
 ProcArrayEndTransaction() has cleared MyProc->xid. If another error is
raised
 during cleanup (e.g., in AtEOXact_Inval()), the PostgresMain error handler
 invokes AbortCurrentTransaction() again. The second AbortTransaction() call
 reads a still-valid s->transactionId (CleanupTransaction() hasn't run yet)
 and passes it to ProcArrayEndTransaction(), which then hits:

   Assert(TransactionIdIsValid(proc->xid))

 because MyProc->xid was already cleared by the first call.

 The attached patch fixes this by checking MyProc->xid validity before
calling
 RecordTransactionAbort() and only passing a valid latestXid when
appropriate.

 **Reproduction:**
 This can be reproduced reliably using the injection_points extension:

 1. Attach the injection point:
   SELECT injection_points_attach('transaction-end-process-inval', 'error');
 2. Create invalidation messages: CREATE TABLE test(id int);
 3. Trigger abort: ROLLBACK;

 Without the fix: assertion crash on ProcArrayEndTransaction()
 With the fix applied: the script will panic with "ERRORDATA_STACK_SIZE
exceeded"
 due to re-entrant error handling, demonstrating that the assertion is
resolved.

 I've included a reproduction script and the fix that clearly shows both
behaviors.

 **Files attached:**
 - 0001-xact-Prevent-assertion-failure-in-re-entrant-Abort.patch
 - repro_minimal_panic_if_fixed.sh

 Thoughts?

 Best regards,
 Alexey

>
>
>
>
>
>
>


Attachments:

  [application/octet-stream] 0001-xact-Prevent-assertion-failure-in-re-entrant-Abort.patch (1.6K, 3-0001-xact-Prevent-assertion-failure-in-re-entrant-Abort.patch)
  download | inline diff:
commit d0a862d09016b2c02811821644688cee2c33089d
Author: Alexey Masterov <[email protected]>
Date:   Wed May 6 11:01:31 2026 +0200

    xact: Prevent assertion failure in re-entrant AbortTransaction calls
    
    When an error occurs during AbortTransaction() after ProcArrayEndTransaction()
    has cleared MyProc->xid, subsequent abort attempts could pass a valid
    latestXid to ProcArrayEndTransaction() while proc->xid is already invalid,
    triggering Assert(TransactionIdIsValid(proc->xid)).
    
    Avoid this by checking MyProc->xid validity before calling
    RecordTransactionAbort() and only passing a valid latestXid when appropriate.

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 48bc90c9673..c7819d7d9d4 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2979,7 +2979,20 @@ AbortTransaction(void)
 	 * record.
 	 */
 	if (!is_parallel_worker)
-		latestXid = RecordTransactionAbort(false);
+	{
+		/*
+		 * Re-entrant abort: if another ERROR is raised during AbortTransaction()
+		 * after ProcArrayEndTransaction() cleared our advertised XID, local
+		 * transaction state can still report an XID until CleanupTransaction().
+		 * Do not run RecordTransactionAbort again (duplicate WAL/clog) and do
+		 * not pass a valid latestXid to ProcArrayEndTransaction with proc->xid
+		 * already cleared.
+		 */
+		if (TransactionIdIsValid(MyProc->xid))
+			latestXid = RecordTransactionAbort(false);
+		else
+			latestXid = InvalidTransactionId;
+	}
 	else
 	{
 		latestXid = InvalidTransactionId;


  [text/x-sh] repro_minimal_panic_if_fixed.sh (1.1K, 4-repro_minimal_panic_if_fixed.sh)
  download | inline:
#!/usr/bin/env bash
#
# repro_minimal_panic_if_fixed.sh -- Minimal reproduction with raw behavior
#
# This script demonstrates the pure difference between original bug and fixed version:
#   โ€ข Original bug: TRAP assertion failure (immediate crash)
#   โ€ข Bug fixed: PANIC ERRORDATA_STACK_SIZE exceeded (infinite loop)
#
# No mitigation, no auto-detach - shows the raw behavior difference.

set -euo pipefail

PSQL=${PSQL:-psql}

echo "โšก Minimal Re-entrant AbortTransaction Reproduction"
echo "================================================"
echo ""
echo "Expected behavior:"
echo "  ๐Ÿ› Original bug: TRAP assertion failure"
echo "  ๐Ÿ”ง Bug fixed: PANIC ERRORDATA_STACK_SIZE exceeded"
echo ""

$PSQL -c "CREATE EXTENSION IF NOT EXISTS injection_points;" >/dev/null

echo "๐Ÿงช Executing minimal reproduction (raw behavior)..."

$PSQL <<'SQL'
SELECT injection_points_set_local();
SELECT injection_points_attach('transaction-end-process-inval', 'error');

BEGIN;
CREATE TABLE _minimal_repro (id int);
ROLLBACK;
SQL

echo "๐Ÿ˜ Clean completion - bug conditions not met or injection point didn't fire"

^ permalink  raw  reply  [nested|flat] 2+ messages in thread


end of thread, other threads:[~2026-05-06 11:26 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-02-22 07:00 Re: BUG #18354: Aborted transaction aborted during cleanup when temp_file_limit exceeded Kyotaro Horiguchi <[email protected]>
2026-05-06 11:26 ` Alex Masterov <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox