Hi,
On Mon, Apr 6, 2026 at 1:45 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > I took a look at the v10 patch and it LGTM. I tested it - make
> > check-world passes, pgindent doesn't complain.
>
> While reviewing the patch, I found that with this patch, backend
> processes and autovacuum workers can simultaneously attempt to
> invalidate the same slot for the same reason. When invalidating a
> slot, we send a signal to the process owning the slot and wait for it
> to exit and release the slot. If the process takes a long time to exit
> for some reason, subsequent autovacuum workers attempting to
> invalidate the same slot will also send a SIGTERM and get stuck at
> InvalidatePossiblyObsoleteSlot(). In the worst case, this could result
> in all autovacuum activity being blocked. I think we need to address
> this problem.
Thank you!
You're right that multiple autovacuum workers can wait on the same
slot for SIGTERM to take effect on the process (mainly walsenders)
holding the slot. Once the process holding the slot exits, one worker
finishes the invalidation and the others see it's done and move on.
However, IMHO, this is unlikely to be a problem in practice.
I was able to reproduce this by running pg_recvlogical on a slot and pausing
the walsender in a debugger, plus some hacky changes around the GUCs (just for
testing). In production, IIUC, the walsender can get stuck for "some" time
while decoding a large transaction or during a network delay, so backends and
autovacuum workers wait until it exits and then resume their work.
Correct me if I am wrong :)
If needed, we could add a flag to skip extra invalidation attempts
based on field experience.
+1, yeah, this would keep other backends and autovacuum workers from
retrying the same invalidation and getting stuck. Instead, they can check
the flag, be assured that the slot invalidation is already being taken care
of, and move on.
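
To make the flag idea concrete, here is a rough standalone sketch. It is not
taken from the patch: DemoSlot, its fields and the demo_* functions are all
hypothetical names, and C11 atomics are used only to keep the example
self-contained. In the real code I would expect such a flag to live in the
shared slot data and be protected by the slot's existing locking.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a slot; this is NOT PostgreSQL's ReplicationSlot. */
typedef struct DemoSlot
{
	atomic_bool invalidation_in_progress;	/* hypothetical "someone is on it" flag */
	atomic_bool invalidated;				/* set once invalidation has completed */
} DemoSlot;

/*
 * Returns true only for the one caller that should signal the slot owner
 * and wait for it to exit. Everyone else gets false and can move on.
 */
static bool
demo_try_begin_invalidation(DemoSlot *slot)
{
	bool		expected = false;

	/* Nothing to do if the slot has already been invalidated. */
	if (atomic_load(&slot->invalidated))
		return false;

	/* Atomically claim the flag; this fails if another process holds it. */
	return atomic_compare_exchange_strong(&slot->invalidation_in_progress,
										  &expected, true);
}

/* Called by the claiming process once the slot owner has exited. */
static void
demo_finish_invalidation(DemoSlot *slot)
{
	assert(atomic_load(&slot->invalidation_in_progress));
	atomic_store(&slot->invalidated, true);
	atomic_store(&slot->invalidation_in_progress, false);
}
```

With something along these lines, a second autovacuum worker reaching the
same slot would get false from demo_try_begin_invalidation() and skip the
slot instead of sending another SIGTERM and waiting.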