From: Srinath Reddy Sadipiralla
Date: Tue, 7 Apr 2026 20:09:45 +0530
Subject: Re: Introduce XID age based replication slot invalidation
To: Bharath Rupireddy
Cc: Masahiko Sawada, SATYANARAYANA NARLAPURAM, "Hayato Kuroda (Fujitsu)", John H, PostgreSQL-development

Hi,

On Mon, Apr 6, 2026 at 11:12 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> Hi,
>
> On Mon, Apr 6, 2026 at 1:45 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > > I took a look at the v10 patch and it LGTM. I tested it - make
> > > check-world passes, pgindent doesn't complain.
> >
> > While reviewing the patch, I found that with this patch, backend
> > processes and autovacuum workers can simultaneously attempt to
> > invalidate the same slot for the same reason. When invalidating a
> > slot, we send a signal to the process owning the slot and wait for it
> > to exit and release the slot. If the process takes a long time to exit
> > for some reason, subsequent autovacuum workers attempting to
> > invalidate the same slot will also send a SIGTERM and get stuck at
> > InvalidatePossiblyObsoleteSlot(). In the worst case, this could result
> > in all autovacuum activity being blocked. I think we need to address
> > this problem.
>
> Thank you!
>
> You're right that multiple autovacuum workers can wait on the same
> slot for SIGTERM to take effect on the process (mainly walsenders)
> holding the slot. Once the process holding the slot exits, one worker
> finishes the invalidation and the others see it's done and move on.
>
> However, IMHO, this is unlikely to be a problem in practice.

I was able to reproduce this by running pg_recvlogical on a slot and pausing
the walsender with a debugger, plus some hacky changes around the GUCs (just
to test). In production, IIUC, the walsender could get stuck for "some" time
while decoding a large transaction or during a network delay, so backends and
autovacuum workers would be stuck until then and resume their work afterwards.
Correct me if I am wrong :)

> If needed, we could add a flag to skip extra invalidation attempts
> based on field experience.

+1, yeah, this would keep other backends and autovacuum workers from retrying
the same invalidation and getting stuck. Instead, they can check the flag, be
assured that the slot invalidation is already being taken care of, and
move on.
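
To make the flag idea a bit more concrete, here is a toy Python model of it.
This is only a sketch and not PostgreSQL code: Slot, try_invalidate,
invalidation_in_progress and run_demo are made-up names for illustration, and
the threading.Event merely stands in for "the process holding the slot has
exited". The first caller claims the invalidation and waits; later callers see
the flag and return immediately instead of waiting on the same slot.

```python
import threading
import time


class Slot:
    """Toy stand-in for a replication slot (hypothetical, not PostgreSQL's struct)."""

    def __init__(self):
        self.lock = threading.Lock()           # protects the two booleans below
        self.invalidation_in_progress = False  # hypothetical "someone is on it" flag
        self.invalidated = False
        self.owner_exited = threading.Event()  # set once the slot owner has exited


def try_invalidate(slot):
    """Invalidate the slot unless another caller already claimed the work."""
    with slot.lock:
        if slot.invalidated or slot.invalidation_in_progress:
            return "skipped"                   # do not wait on the same slot again
        slot.invalidation_in_progress = True   # claim the invalidation

    # Only the claiming caller waits for the (possibly slow) owner to exit.
    slot.owner_exited.wait()

    with slot.lock:
        slot.invalidated = True
        slot.invalidation_in_progress = False
    return "invalidated"


def run_demo(n_late_workers=3):
    """One worker claims the slot; later workers arrive while the owner is stuck."""
    slot = Slot()
    outcome = {}

    def first_worker():
        outcome["first"] = try_invalidate(slot)

    first = threading.Thread(target=first_worker)
    first.start()

    # Wait until the first worker has actually claimed the invalidation.
    while True:
        with slot.lock:
            if slot.invalidation_in_progress:
                break
        time.sleep(0.001)

    # These calls return right away because the flag is already set.
    late = [try_invalidate(slot) for _ in range(n_late_workers)]

    slot.owner_exited.set()                    # the stuck owner finally exits
    first.join()
    return outcome["first"], late, slot.invalidated


if __name__ == "__main__":
    print(run_demo())
```

One thing the toy model glosses over: if the claiming process itself goes away
before finishing, something has to clear the flag, otherwise nobody would ever
retry the invalidation.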


--
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/