public inbox for [email protected]  
help / color / mirror / Atom feed
From: Antonin Houska <[email protected]>
To: Alvaro Herrera <[email protected]>
Cc: Baji Shaik <[email protected]>
Cc: [email protected]
Subject: Re: [PATCH] Fix REPACK decoding worker not cleaned up on FATAL exit
Date: Wed, 20 May 2026 09:54:06 +0200
Message-ID: <8656.1779263646@localhost> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>

Alvaro Herrera <[email protected]> wrote:

> On 2026-May-17, Baji Shaik wrote:
> 
> > v3 uses PG_ENSURE_ERROR_CLEANUP which:
> > - Handles both ERROR and FATAL exits
> > - Automatically cancels the callback on normal completion
> >   (no slot leak)
> > - Runs before memory contexts are destroyed (no use-after-free)
> 
> Yeah, looks good.  I have pushed it, with some comment wordsmithing and
> other cosmetic changes.
> 
> While looking at it, I realized that I didn't like the way
> stop_repack_decoding_worker() works, mainly because if there's no
> handle, we leak everything else -- and the way we initialize things
> means we leak the shared memory segment.  This is maybe a rare case and
> just a small memory leak, but it seems better to do it nicely.  So
> here's a followup patch that reworks that code.  This also forced me to
> understand more clearly what is going on, so I rewrote the comments.

The call of shm_mq_detach() got lost, or do you rely on dsm_detach() to call
shm_mq_detach_callback() ? The latter does not free ->mqh_buffer. Since each
REPACK runs in a separate transaction, I wouldn't consider that a leak, but I
still think that explicit call of shm_mq_detach() makes the code a bit easier
to read (i.e. no need for the developer to check if the detaching happens
automatically).


 	/*
-	 * If we could not cancel the current sleep due to ERROR, do that before
-	 * we detach from the shared memory the condition variable is located in.
-	 * If we did not, the bgworker ERROR handling code would try and fail
-	 * badly.
+	 * Now detach from our shared memory segment.  In error cases there might
+	 * still be messages from the worker in the queue, which ProcessInterrupts
+	 * would try to read; this is pointless (and causes an assertion failure),
+	 * so set the global pointer to NULL to have ProcessRepackMessages ignore
+	 * them.
 	 */
-	ConditionVariableCancelSleep();
-
-	dsm_detach(decoding_worker->seg);
+	dsmseg = decoding_worker->seg;
 	pfree(decoding_worker);
 	decoding_worker = NULL;
+
+	/* We must also cancel the current sleep, if one is still set up */
+	ConditionVariableCancelSleep();
+
+	if (dsmseg != NULL)
+		dsm_detach(dsmseg);


I suppose the reason for the assertion failure was reading from the queue
after the backend had detached from it? Thanks for fixing that.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com






reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected]
  Subject: Re: [PATCH] Fix REPACK decoding worker not cleaned up on FATAL exit
  In-Reply-To: <8656.1779263646@localhost>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox