MIME-Version: 1.0
From: =?UTF-8?B?QWxlxaEgWmVsZW7DvQ==?= <zeleny.ales@gmail.com>
Date: Mon, 1 Dec 2025 14:39:38 +0100
Message-ID: <CAODqTUZXgywhJXGK1UmaWtJDVuzXXUYG4-DTCuV0VkB--+SCWA@mail.gmail.com>
Subject: After upgrade from Pg11.2 to 17.7 logical replication prevents
 database instance shutdown
To: "pgsql-generallists.postgresql.org" <pgsql-general@lists.postgresql.org>
Content-Type: multipart/alternative; boundary="0000000000000016ad0644e41d90"
Archived-At: <https://www.postgresql.org/message-id/CAODqTUZXgywhJXGK1UmaWtJDVuzXXUYG4-DTCuV0VkB--%2BSCWA%40mail.gmail.com>
Precedence: bulk

--0000000000000016ad0644e41d90
Content-Type: text/plain; charset="UTF-8"

Hello,

We have recently upgraded from PostgreSQL 11.2 to PostgreSQL 17.7. We have
logical replication between two database instances; no third-party CDC
consumers are used.

During low traffic on the publisher database, there are no issues, and the
publisher instance shutdown is smooth, as expected.

If we request a shutdown while there is replication lag from the publisher
to the subscriber instance (systemctl stop ...., which the systemd unit
defines as
ExecStop=/usr/bin/pg_ctlcluster --skip-systemctl-redirect -m fast %i stop
), the shutdown hangs for exactly 30 minutes, counted from the "received
fast shutdown request" message in the database log, until this message is
logged (
... 0 5029/2736 sub_xxx_usd START_REPLICATION [57P01]:FATAL: terminating
connection due to administrator command
).
We have checked the corresponding logs from PG 11.2: there it took exactly
60 seconds.

We have also tried setting checkpoint_timeout = 27min and archive_timeout =
23min to make sure the delayed shutdown is not related to these parameters,
but the shutdown is still blocked for exactly 30 minutes.

If we disable the subscription, the shutdown is smooth. That is why we
suspect either a change in logical replication or a new configuration
parameter that we have missed, one that would let the publisher instance
shut down cleanly without the long delay that currently ends with the
sender process on the publisher instance being terminated.

PostgreSQL version:
PostgreSQL 17.7 (Ubuntu 17.7-3.pgdg22.04+1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0, 64-bit

Timeouts:

publisher instance:
powa=# show wal_sender_timeout;
 wal_sender_timeout
--------------------
 10min
(1 row)

subscriber instance:
powa=# show wal_receiver_timeout;
 wal_receiver_timeout
----------------------
 10min
(1 row)

OS version:
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.5 LTS
Release: 22.04
Codename: jammy

We have found
https://github.com/postgres/postgres/commit/5231ed8262c94936a69bce41f64076630bbd99a2,
but we are not sure whether it applies to the behavior change described above.

Also, the comment in "walsender.c" seems to explain that the shutdown is
intentionally postponed (possibly for a very long time; in our case the lag
is caused by ETLs and can be about 80GB, so postponing the shutdown until
the whole lag is processed costs a lot of time). It does not, however,
explain the change from 60 seconds to 30 minutes (no timeout is mentioned):

 * If the server is shut down, checkpointer sends us
 * PROCSIG_WALSND_INIT_STOPPING after all regular backends have exited. If
 * the backend is idle or runs an SQL query this causes the backend to
 * shutdown, if logical replication is in progress all existing WAL records
 * are processed followed by a shutdown. Otherwise, this causes the walsender
 * to switch to the "stopping" state. In this state, the walsender will reject
 * any further replication commands. The checkpointer begins the shutdown
 * checkpoint once all walsenders are confirmed as stopping. When the shutdown
 * checkpoint finishes, the postmaster sends us SIGUSR2. This instructs
 * walsender to send any outstanding WAL, including the shutdown checkpoint
 * record, wait for it to be replicated to the standby, and then exit.
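
To put a number on "all existing WAL records" before a shutdown is
requested, the backlog can be estimated from two LSNs read on the publisher.
The following is only a minimal sketch: the helper names and the 16 MiB
threshold are made up for illustration, and it assumes the LSNs come from
pg_current_wal_lsn() and the confirmed_flush_lsn column of
pg_replication_slots.

```python
# Sketch: estimate how much WAL a logical walsender still has to process.
# The two LSNs would be read on the publisher, for example with
#   SELECT pg_current_wal_lsn(), confirmed_flush_lsn
#   FROM pg_replication_slots WHERE slot_type = 'logical';
# All function names and the default threshold are illustrative assumptions.


def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(current_lsn: str, confirmed_flush_lsn: str) -> int:
    """Bytes of WAL between the current position and the slot's flush position."""
    return lsn_to_int(current_lsn) - lsn_to_int(confirmed_flush_lsn)


def lag_is_small(current_lsn: str, confirmed_flush_lsn: str,
                 max_lag_bytes: int = 16 * 1024 * 1024) -> bool:
    """True when the backlog is at or below the (arbitrary) threshold."""
    return lag_bytes(current_lsn, confirmed_flush_lsn) <= max_lag_bytes
```

A restart script could call lag_is_small() with the two values from the
query and postpone the stop request while it returns False.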

Our pipeline requires an instance restart. So far the only workaround we
have found is to explicitly disable the subscription before initiating the
shutdown, but this is considered a bit fragile compared to the smooth
behavior on Pg11.
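
For reference, the order of steps in that workaround can be sketched as
below. This is an illustration under assumptions, not our actual pipeline:
the host name, the systemd unit name and the helper functions are
hypothetical, the subscription and database names are just the ones visible
in the output above, and the psql steps are meant to run against the
subscriber while the systemctl steps run on the publisher host.

```python
import shlex

# Sketch of the workaround order: disable the subscription on the subscriber,
# restart the publisher, then enable the subscription again.
# Host, unit and helper names are hypothetical placeholders.


def quote_ident(name: str) -> str:
    """Quote an SQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def psql_cmd(host: str, dbname: str, sql: str) -> list[str]:
    """Build a psql invocation that runs a single statement and fails on error."""
    return ["psql", "-X", "-v", "ON_ERROR_STOP=1",
            "-h", host, "-d", dbname, "-c", sql]


def restart_plan(subscription: str, subscriber_host: str,
                 subscriber_db: str, publisher_unit: str) -> list[list[str]]:
    """Return the commands in the order they would be executed."""
    sub = quote_ident(subscription)
    return [
        psql_cmd(subscriber_host, subscriber_db,
                 f"ALTER SUBSCRIPTION {sub} DISABLE"),
        ["systemctl", "stop", publisher_unit],   # on the publisher host
        ["systemctl", "start", publisher_unit],  # on the publisher host
        psql_cmd(subscriber_host, subscriber_db,
                 f"ALTER SUBSCRIPTION {sub} ENABLE"),
    ]


if __name__ == "__main__":
    plan = restart_plan("sub_xxx_usd", "subscriber.example",
                        "powa", "postgresql@17-main")
    for argv in plan:
        print(shlex.join(argv))
```

The sketch only prints the commands. It does not check that the walsender
on the publisher has actually exited before the stop is requested, which is
part of why we consider this approach fragile.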

Is there a way to shorten the 30-minute shutdown so that it is closer to
the Pg11 behavior?

Thanks in advance
Ales Zeleny

--0000000000000016ad0644e41d90--