Re: BUG #18354: Aborted transaction aborted during cleanup when temp_file_limit exceeded

From: Alex Masterov <[email protected]>
To: Kyotaro Horiguchi <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Subject: Re: BUG #18354: Aborted transaction aborted during cleanup when temp_file_limit exceeded
Date: Wed, 6 May 2026 13:26:51 +0200
Message-ID: <CA+8z=zu-mxXuBtnTVNcOzaGJhgeKSTyeAsK+AdRMr5_XZM1SDw@mail.gmail.com>
In-Reply-To: <[email protected]>
References: <[email protected]>
	<CAHewXN=ZEL1Rvw2fJmt8k3sO_3fb58gijv7GR-JnY8ujgGP0UA@mail.gmail.com>
	<[email protected]>
	<[email protected]>

Hi,

 While running tests with Neon, we discovered an assertion failure that can
 occur during re-entrant AbortTransaction() calls.

 The issue arises when an error occurs during AbortTransaction() after
 ProcArrayEndTransaction() has cleared MyProc->xid. If another error is raised
 during cleanup (e.g., in AtEOXact_Inval()), the PostgresMain error handler
 invokes AbortCurrentTransaction() again. The second AbortTransaction() call
 reads a still-valid s->transactionId (CleanupTransaction() hasn't run yet)
 and passes it to ProcArrayEndTransaction(), which then hits:

   Assert(TransactionIdIsValid(proc->xid))

 because MyProc->xid was already cleared by the first call.

 The attached patch fixes this by checking MyProc->xid validity before calling
 RecordTransactionAbort() and only passing a valid latestXid when appropriate.

 **Reproduction:**
 This can be reproduced reliably using the injection_points extension:

 1. Attach the injection point:
    SELECT injection_points_attach('transaction-end-process-inval', 'error');
 2. Create invalidation messages inside a transaction block:
    BEGIN; CREATE TABLE test(id int);
 3. Trigger abort: ROLLBACK;

 Without the fix: assertion failure in ProcArrayEndTransaction().
 With the fix applied: the server PANICs with "ERRORDATA_STACK_SIZE exceeded"
 due to re-entrant error handling, which shows that the assertion is no
 longer hit.

 I've attached the fix and a reproduction script that shows both behaviors.

 **Files attached:**
 - 0001-xact-Prevent-assertion-failure-in-re-entrant-Abort.patch
 - repro_minimal_panic_if_fixed.sh

 Thoughts?

 Best regards,
 Alexey

Attachments:

  [application/octet-stream] 0001-xact-Prevent-assertion-failure-in-re-entrant-Abort.patch (1.6K, 3-0001-xact-Prevent-assertion-failure-in-re-entrant-Abort.patch)
commit d0a862d09016b2c02811821644688cee2c33089d
Author: Alexey Masterov <[email protected]>
Date:   Wed May 6 11:01:31 2026 +0200

    xact: Prevent assertion failure in re-entrant AbortTransaction calls

    When an error occurs during AbortTransaction() after ProcArrayEndTransaction()
    has cleared MyProc->xid, subsequent abort attempts could pass a valid
    latestXid to ProcArrayEndTransaction() while proc->xid is already invalid,
    triggering Assert(TransactionIdIsValid(proc->xid)).

    Avoid this by checking MyProc->xid validity before calling
    RecordTransactionAbort() and only passing a valid latestXid when appropriate.

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 48bc90c9673..c7819d7d9d4 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2979,7 +2979,20 @@ AbortTransaction(void)
 	 * record.
 	 */
 	if (!is_parallel_worker)
-		latestXid = RecordTransactionAbort(false);
+	{
+		/*
+		 * Re-entrant abort: if another ERROR is raised during AbortTransaction()
+		 * after ProcArrayEndTransaction() cleared our advertised XID, local
+		 * transaction state can still report an XID until CleanupTransaction().
+		 * Do not run RecordTransactionAbort again (duplicate WAL/clog) and do
+		 * not pass a valid latestXid to ProcArrayEndTransaction with proc->xid
+		 * already cleared.
+		 */
+		if (TransactionIdIsValid(MyProc->xid))
+			latestXid = RecordTransactionAbort(false);
+		else
+			latestXid = InvalidTransactionId;
+	}
 	else
 	{
 		latestXid = InvalidTransactionId;

  [text/x-sh] repro_minimal_panic_if_fixed.sh (1.1K, 4-repro_minimal_panic_if_fixed.sh)
#!/usr/bin/env bash
#
# repro_minimal_panic_if_fixed.sh -- Minimal reproduction with raw behavior
#
# This script demonstrates the pure difference between original bug and fixed version:
#   • Original bug: TRAP assertion failure (immediate crash)
#   • Bug fixed: PANIC ERRORDATA_STACK_SIZE exceeded (infinite loop)
#
# No mitigation, no auto-detach - shows the raw behavior difference.

set -euo pipefail

PSQL=${PSQL:-psql}

echo "⚡ Minimal Re-entrant AbortTransaction Reproduction"
echo "================================================"
echo ""
echo "Expected behavior:"
echo "  🐛 Original bug: TRAP assertion failure"
echo "  🔧 Bug fixed: PANIC ERRORDATA_STACK_SIZE exceeded"
echo ""

$PSQL -c "CREATE EXTENSION IF NOT EXISTS injection_points;" >/dev/null

echo "🧪 Executing minimal reproduction (raw behavior)..."

$PSQL <<'SQL'
SELECT injection_points_set_local();
SELECT injection_points_attach('transaction-end-process-inval', 'error');

BEGIN;
CREATE TABLE _minimal_repro (id int);
ROLLBACK;
SQL

echo "😐 Clean completion - bug conditions not met or injection point didn't fire"
