From: Aleš Zelený
Date: Mon, 1 Dec 2025 14:39:38 +0100
Subject: After upgrade from Pg11.2 to 17.7 logical replication prevents database instance shutdown
To: pgsql-general@lists.postgresql.org

Hello,

We have recently upgraded from PostgreSQL 11.2 to PostgreSQL 17.7. We have logical replication between two database instances; no third-party CDC consumers are used.

During periods of low traffic on the publisher database there are no issues, and the publisher instance shuts down smoothly, as expected.

If we request a shutdown while there is replication lag from the publisher to the subscriber instance (systemctl stop ...., where the systemd unit defines

ExecStop=/usr/bin/pg_ctlcluster --skip-systemctl-redirect -m fast %i stop

), the shutdown hangs for exactly 30 minutes. The 30 minutes are counted from the "received fast shutdown request" message in the database log until this message appears:

... 0 5029/2736 sub_xxx_usd START_REPLICATION [57P01]:FATAL: terminating connection due to administrator command

In the corresponding logs from PG 11.2, the same step took exactly 60 seconds.

We have also tried setting checkpoint_timeout = 27min and archive_timeout = 23min to rule out these parameters as the cause of the delay. The shutdown was still blocked for the same 30 minutes.

If we disable the subscription, the shutdown is smooth. That is why we suspect either a change in logical replication, or a new configuration parameter we have missed. Such a parameter would let the publisher instance shut down cleanly, without the long delay and without finally having to terminate the sender process on the publisher instance.

PostgreSQL version:
PostgreSQL 17.7 (Ubuntu 17.7-3.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0, 64-bit

Timeouts:

publisher instance:
powa=# show wal_sender_timeout;
 wal_sender_timeout
--------------------
 10min
(1 row)

subscriber instance:
powa=# show wal_receiver_timeout;
 wal_receiver_timeout
----------------------
 10min
(1 row)

OS version:
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.5 LTS
Release: 22.04
Codename: jammy

We have found https://github.com/postgres/postgres/commit/5231ed8262c94936a69bce41f64076630bbd99a2, but we are not sure whether it applies to the behavior change described above.
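
The hang only occurs while the subscriber is behind, so it can help to measure that lag before requesting a restart. The following is a minimal sketch under stated assumptions, not part of the setup described here. It assumes a POSIX shell and psql on the host. The helpers lsn_to_bytes, lsn_diff and slot_lag_query are hypothetical names, and PUB_DSN is an assumed placeholder for a publisher connection string. The first two helpers redo the byte arithmetic of pg_wal_lsn_diff() for LSNs written in the usual hex "high/low" form. The query asks the publisher, for each logical slot, how far confirmed_flush_lsn trails pg_current_wal_lsn(). That distance is what the subscriber has not yet acknowledged, so it only approximates the WAL the walsender still has to process.

```shell
#!/bin/sh
# Sketch only: helpers to estimate logical replication lag on the publisher.
# PUB_DSN is a hypothetical placeholder for a publisher connection string.

# Convert an LSN in PostgreSQL's "XXXXXXXX/YYYYYYYY" hex notation to a byte
# position: the part before the slash is the high 32 bits, the rest the low 32.
lsn_to_bytes() {
  local hi="${1%/*}" lo="${1#*/}"
  echo $(( (0x$hi << 32) + 0x$lo ))
}

# Number of WAL bytes between two LSNs, the same arithmetic that the
# server-side function pg_wal_lsn_diff(lsn1, lsn2) performs.
lsn_diff() {
  echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}

# Ask the publisher how far each logical slot's confirmed position trails
# the current WAL write position (an approximation of the remaining work).
slot_lag_query() {
  psql "${PUB_DSN:?set PUB_DSN to a publisher connection string}" -X -A -t -c "
    SELECT slot_name,
           pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn))
      FROM pg_replication_slots
     WHERE slot_type = 'logical';"
}
```

As a worked example, a lag of about 80 GiB corresponds to an LSN distance of 0x14 in the high word: lsn_diff 14/0 0/0 prints 85899345920, which is 20 x 2^32 bytes.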

Also, the comment in walsender.c seems to explain that the shutdown is intentionally postponed until all pending WAL has been processed. That can take a very long time. In our case the lag is caused by ETLs and can be about 80GB, so postponing the shutdown until all of it has been replicated costs a lot of time. The comment also does not explain the change from 60 seconds to 30 minutes, since no timeout is mentioned:

 * If the server is shut down, checkpointer sends us
 * PROCSIG_WALSND_INIT_STOPPING after all regular backends have exited. If
 * the backend is idle or runs an SQL query this causes the backend to
 * shutdown, if logical replication is in progress all existing WAL records
 * are processed followed by a shutdown. Otherwise, this causes the walsender
 * to switch to the "stopping" state. In this state, the walsender will reject
 * any further replication commands. The checkpointer begins the shutdown
 * checkpoint once all walsenders are confirmed as stopping. When the shutdown
 * checkpoint finishes, the postmaster sends us SIGUSR2. This instructs
 * walsender to send any outstanding WAL, including the shutdown checkpoint
 * record, wait for it to be replicated to the standby, and then exit.

Our pipeline requires the instance restart. So far, the only workaround we have found is to explicitly disable the subscription before initiating the shutdown. However, that is considered somewhat fragile compared to the smooth behavior on Pg11.

Is there a way to shorten the 30-minute shutdown so that it comes closer to the Pg11 behavior?

Thanks in advance
Ales Zeleny
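
The shutdown is smooth once the subscription is disabled. One way to make that workaround less fragile is to script the whole sequence so that no step is skipped. The following is a minimal sketch under stated assumptions, not a tested procedure. It assumes a POSIX shell with psql and systemctl available on the host that drives the restart. These are all hypothetical placeholders: the connection strings, the unit name postgresql@17-main, the DRY_RUN switch, and the helper names run and restart_publisher. The subscription name is the redacted one from the log line above. Before restarting, the script waits until the publisher no longer lists a walsender for the subscription in pg_stat_replication. It relies on the walsender's application_name defaulting to the subscription name, which holds unless the subscription's connection string sets application_name explicitly.

```shell
#!/bin/sh
# Sketch only: the "disable the subscription first" workaround, scripted.
# Every default below is a hypothetical placeholder; adjust before use.
SUB_DSN=${SUB_DSN:-"host=subscriber.example dbname=postgres"}  # subscriber connection
PUB_DSN=${PUB_DSN:-"host=publisher.example dbname=postgres"}   # publisher connection
SUB_NAME=${SUB_NAME:-sub_xxx_usd}         # subscription name (redacted in the log)
PUB_UNIT=${PUB_UNIT:-postgresql@17-main}  # publisher's systemd unit

# With DRY_RUN=1, print each command instead of executing it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

restart_publisher() {
  # 1. Disabling the subscription stops its apply worker on the subscriber,
  #    which closes the connection served by the publisher's walsender.
  run psql "$SUB_DSN" -X -c "ALTER SUBSCRIPTION $SUB_NAME DISABLE"

  # 2. Wait until the publisher no longer shows a walsender for this
  #    subscription (its application_name defaults to the subscription name).
  if [ "${DRY_RUN:-0}" != 1 ]; then
    until [ "$(psql "$PUB_DSN" -X -A -t -c "SELECT count(*) FROM pg_stat_replication WHERE application_name = '$SUB_NAME'")" = 0 ]; do
      sleep 1
    done
  fi

  # 3. Restart the publisher through the same systemd unit as before.
  run systemctl restart "$PUB_UNIT"

  # 4. Re-enable the subscription; replication resumes from the slot's
  #    confirmed position, so the pending WAL is still delivered afterwards.
  run psql "$SUB_DSN" -X -c "ALTER SUBSCRIPTION $SUB_NAME ENABLE"
}
```

Running DRY_RUN=1 restart_publisher only prints the three commands (DISABLE, systemctl restart, ENABLE), which makes it easy to review them before use. The wait loop in step 2 has no upper bound, and it also keeps looping if psql cannot connect. A real pipeline would probably want a timeout there.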