Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1vFeg1-00FTzO-TY for pgsql-pkg-debian@arkaria.postgresql.org; Sun, 02 Nov 2025 20:27:49 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1vFeg0-00DiPD-Ba for pgsql-pkg-debian@arkaria.postgresql.org; Sun, 02 Nov 2025 20:27:47 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1vBbW6-0008Ts-MD for pgsql-pkg-debian@lists.postgresql.org; Wed, 22 Oct 2025 16:16:50 +0000 Received: from cantus.hateo.de ([2a03:4000:36:db:c8c7:36ff:fe3d:24a3]) by makus.postgresql.org with smtp (Exim 4.96) (envelope-from ) id 1vBbW3-003BfC-0Y for pgsql-pkg-debian@lists.postgresql.org; Wed, 22 Oct 2025 16:16:48 +0000 Received: from [127.0.0.1] (localhost [127.0.0.1]) by localhost (Mailerdaemon) with ESMTPSA id 2DC7A22186A for ; Wed, 22 Oct 2025 18:16:43 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=hauke.ninja; s=dkim; t=1761149804; h=from:reply-to:subject:date:message-id:to:mime-version:content-type; bh=PR2yE/sgy0nehLssDC0altun0Qo0iNWeC9ANRjTlrUs=; b=h2Js2HdSUgqcasfRrCWvyysWw3UPZ3cqmzj9LUC2aIRlPDvgCpQCXbVQnaLhukc5w++xU1 Hs45DL4ucs9Yhtc4ghuBCqI79L0I1P+XwjsHl3BrKmpU2VCuIAhZ1CO5nDfRkkd9UCDO3j HtuqCLawhmjaF4CiqYmMAiAAjxGNkPPT93GdiFE8MpO+6PeAdK8szW3SmHaXoStK+bGAqg JSzI+2AqX4e+fbHWQbWS98kY6uqF34rdTc+j+J2Mvadtcd9kqN+WP5CQdxVT6kpWpNqqmF Ag7qMxQi3vBkQ+7TTVCyAD5lVHkUIrdx9cLR5TrsymeFHXH++a1A8LuQtx1lSQ== From: "Hauke Bruno Wollentin" To: pgsql-pkg-debian@lists.postgresql.org Subject: Patroni 4.1.0 causes issues because of systemd notify Date: Wed, 22 Oct 2025 16:16:42 +0000 Message-Id: Reply-To: "Hauke Bruno Wollentin" User-Agent: eM_Client/10.3.2622.0 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="------=_MB1BF518F5-4DC5-46C2-A7EE-8DEB8D84FAB8" X-Last-TLS-Session-Version: TLSv1.2 List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --------=_MB1BF518F5-4DC5-46C2-A7EE-8DEB8D84FAB8 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: quoted-printable Hi all, while spinning up new Patroni cluster with version 4.1.0 (which has been=20 added to the Debian repo 2 days ago), my clusters won=E2=80=99t bootstrap= =20 properly anymore. At first, logs looks clear but anyways, all systemd operations like=20 start/stop/restart will run into a timeout, causing the service to stop=20 which then will break the cluster at all. It seems like the new systemd notify feature is causing this, since=20 using Patroni 4.0.7 with `Type=3Dsimple` in the systemd unit file works=20 pretty fine. I=E2=80=99m on Debian 12 and psql 17. This is the patroni log after bootstrapping. Everything works fine until=20 the 30s timeout of the systemd unit kicks in: Oct 22 17:43:09 hetzner-lab01 patroni[13066]: Success. You can now start=20 the database server using: Oct 22 17:43:09 hetzner-lab01 patroni[13066]: =20 /usr/lib/postgresql/17/bin/pg_ctl -D /var/lib/postgresql/patroni/data -l=20 logfile start Oct 22 17:43:09 hetzner-lab01 patroni[13059]: 2025-10-22 17:43:09,768=20 INFO: postmaster pid=3D13083 Oct 22 17:43:09 hetzner-lab01 patroni[13084]: localhost:5432 - no=20 response Oct 22 17:43:09 hetzner-lab01 patroni[13083]: 2025-10-22 17:43:09.780=20 CEST [13083] LOG: starting PostgreSQL 17.6 (Debian 17.6-2.pgdg12+1) on=20 x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14+deb12u1) 12.2.0,=20 64-bit Oct 22 17:43:09 hetzner-lab01 patroni[13083]: 2025-10-22 17:43:09.780=20 CEST [13083] LOG: listening on IPv4 address "0.0.0.0", port 5432 Oct 22 17:43:09 hetzner-lab01 patroni[13083]: 2025-10-22 17:43:09.780=20 CEST [13083] LOG: listening on IPv6 address "::", port 5432 Oct 22 17:43:09 hetzner-lab01 patroni[13083]: 2025-10-22 17:43:09.782=20 CEST [13083] LOG: listening on Unix socket=20 "/var/run/postgresql/.s.PGSQL.5432=E2=80=9D Oct 22 17:43:09 hetzner-lab01 patroni[13087]: 2025-10-22 17:43:09.789=20 CEST [13087] LOG: database system was shut down at 2025-10-22 17:43:09=20 CEST Oct 22 17:43:09 hetzner-lab01 patroni[13083]: 2025-10-22 17:43:09.798=20 CEST [13083] LOG: database system is ready to accept connections Oct 22 17:43:09 hetzner-lab01 systemd[1]: patroni.service: Got=20 notification message from PID 13083, but reception only permitted for=20 main PID 13059 Oct 22 17:43:10 hetzner-lab01 patroni[13091]: localhost:5432 - accepting=20 connections Oct 22 17:43:10 hetzner-lab01 patroni[13093]: localhost:5432 - accepting=20 connections Oct 22 17:43:10 hetzner-lab01 patroni[13059]: 2025-10-22 17:43:10,814=20 INFO: establishing a new patroni heartbeat connection to postgres Oct 22 17:43:10 hetzner-lab01 patroni[13059]: 2025-10-22 17:43:10,918=20 INFO: running post_bootstrap Oct 22 17:43:11 hetzner-lab01 patroni[13059]: 2025-10-22 17:43:11,225=20 INFO: initialized a new cluster Oct 22 17:43:11 hetzner-lab01 patroni[13059]: 2025-10-22 17:43:11,373=20 INFO: no action. I am (hetzner-lab01), the leader with the lock Oct 22 17:43:21 hetzner-lab01 patroni[13059]: 2025-10-22 17:43:21,276=20 INFO: no action. I am (hetzner-lab01), the leader with the lock Oct 22 17:43:31 hetzner-lab01 patroni[13059]: 2025-10-22 17:43:31,368=20 INFO: no action. I am (hetzner-lab01), the leader with the lock Oct 22 17:43:37 hetzner-lab01 systemd[1]: patroni.service: start=20 operation timed out. Terminating. Oct 22 17:43:37 hetzner-lab01 patroni[13083]: 2025-10-22 17:43:37.987=20 CEST [13083] LOG: received fast shutdown request Oct 22 17:43:37 hetzner-lab01 patroni[13083]: 2025-10-22 17:43:37.989=20 CEST [13083] LOG: aborting any active transactions Oct 22 17:43:37 hetzner-lab01 patroni[13095]: 2025-10-22 17:43:37.990=20 CEST [13095] FATAL: terminating connection due to administrator command Oct 22 17:43:37 hetzner-lab01 systemd[1]: patroni.service: Got=20 notification message from PID 13083, but reception only permitted for=20 main PID 13059 Oct 22 17:43:37 hetzner-lab01 patroni[13083]: 2025-10-22 17:43:37.992=20 CEST [13083] LOG: background worker "logical replication launcher" (PID=20 13090) exited with exit code 1 Oct 22 17:43:37 hetzner-lab01 patroni[13085]: 2025-10-22 17:43:37.994=20 CEST [13085] LOG: shutting down Oct 22 17:43:37 hetzner-lab01 patroni[13085]: 2025-10-22 17:43:37.996=20 CEST [13085] LOG: checkpoint starting: shutdown immediate Oct 22 17:43:38 hetzner-lab01 patroni[13085]: 2025-10-22 17:43:38.014=20 CEST [13085] LOG: checkpoint complete: wrote 17 buffers (0.1%); 0 WAL=20 file(s) added, 0 removed, 0 recycled; write=3D0.005 s, sync=3D0.004 s,=20 total=3D0.020 s; sync files=3D13, longest=3D0.002 s, average=3D0.001 s;=20 distance=3D36 kB, estimate=3D36 kB; lsn=3D0/152D6F0, redo lsn=3D0/152D6F0 Oct 22 17:43:38 hetzner-lab01 patroni[13083]: 2025-10-22 17:43:38.020=20 CEST [13083] LOG: database system is shut down Oct 22 17:43:39 hetzner-lab01 systemd[1]: patroni.service: Failed with=20 result =E2=80=98timeout=E2=80=99. Oct 22 17:43:39 hetzner-lab01 systemd[1]: Failed to start=20 patroni.service - Runners to orchestrate a high-availability PostgreSQL. Any recommendations how to fix/workaround this? cheers, Hauke =E2=80=94 =F0=9F=A5=B7 Hauke Bruno Wollentin --------=_MB1BF518F5-4DC5-46C2-A7EE-8DEB8D84FAB8 Content-Type: text/html; charset=utf-8 Content-Transfer-Encoding: quoted-printable
Hi all,

w= hile spinning up new Patroni cluster with version 4.1.0 (which has been add= ed to the Debian repo 2 days ago), my clusters won=E2=80=99t bootstrap prop= erly anymore.

A= t first, logs looks clear but anyways, all systemd operations like start/st= op/restart will run into a timeout, causing the service to stop which then= will break the cluster at all.

=
It seems like the new systemd notify feature is causing thi= s, since using Patroni 4.0.7 with `Type=3Dsimple` in the systemd unit file= works pretty fine.

I=E2=80=99m on Debian 12 and psql 17.

This is the patroni log after bootstrapping.= Everything works fine until the 30s timeout of the systemd unit kicks in:

Oct 22 17:43:09= hetzner-lab01 patroni[13066]: Success. You can now start the database serve= r using:
Oct 22 17:43:09 hetzner-lab01 patroni[13066]:= =C2=A0 =C2=A0=C2=A0 /usr/lib/postgresql/17/bin/pg_ctl -D /var/lib/postgresq= l/patroni/data -l logfile start
Oct 22 17:43:09 hetzne= r-lab01 patroni[13059]: 2025-10-22 17:43:09,768 INFO: postmaster pid=3D1308= 3
Oct 22 17:43:09 hetzner-lab01 patroni[13084]: localh= ost:5432 - no response
Oct 22 17:43:09 hetzner-lab01 p= atroni[13083]: 2025-10-22 17:43:09.780 CEST [13083] LOG:=C2=A0 starting Pos= tgreSQL 17.6 (Debian 17.6-2.pgdg12+1) on x86_64-pc-linux-gnu, compiled by g= cc (Debian 12.2.0-14+deb12u1) 12.2.0, 64-bit
Oct 22 17= :43:09 hetzner-lab01 patroni[13083]: 2025-10-22 17:43:09.780 CEST [13083] L= OG:=C2=A0 listening on IPv4 address "0.0.0.0", port 5432
Oct 22 17:43:09 hetzner-lab01 patroni[13083]: 2025-10-22 17:43:09.780 CE= ST [13083] LOG:=C2=A0 listening on IPv6 address "::", port 5432
Oct 22 17:43:09 hetzner-lab01 patroni[13083]: 2025-10-22 17:43:09= .782 CEST [13083] LOG:=C2=A0 listening on Unix socket "/var/run/postgresql/= .s.PGSQL.5432=E2=80=9D
Oct 22 17:43:09 hetzner-lab01 p= atroni[13087]: 2025-10-22 17:43:09.789 CEST [13087] LOG:=C2=A0 database sys= tem was shut down at 2025-10-22 17:43:09 CEST
Oct 22 1= 7:43:09 hetzner-lab01 patroni[13083]: 2025-10-22 17:43:09.798 CEST [13083]= LOG:=C2=A0 database system is ready to accept connections
Oct 22 17:43:09 hetzner-lab01 systemd[1]: patroni.service: Got notifi= cation message from PID 13083, but reception only permitted for main PID 13= 059
Oct 22 17:43:10 hetzner-lab01 patroni[13091]: loca= lhost:5432 - accepting connections
Oct 22 17:43:10 het= zner-lab01 patroni[13093]: localhost:5432 - accepting connections
Oct 22 17:43:10 hetzner-lab01 patroni[13059]: 2025-10-22 17:43:= 10,814 INFO: establishing a new patroni heartbeat connection to postgres
Oct 22 17:43:10 hetzner-lab01 patroni[13059]: 2025-10-22 = 17:43:10,918 INFO: running post_bootstrap
Oct 22 17:4= 3:11 hetzner-lab01 patroni[13059]: 2025-10-22 17:43:11,225 INFO: initialize= d a new cluster
Oct 22 17:43:11 hetzner-lab01 patroni[= 13059]: 2025-10-22 17:43:11,373 INFO: no action. I am (hetzner-lab01), the= leader with the lock
Oct 22 17:43:21 hetzner-lab01 pat= roni[13059]: 2025-10-22 17:43:21,276 INFO: no action. I am (hetzner-lab01), = the leader with the lock
Oct 22 17:43:31 hetzner-lab0= 1 patroni[13059]: 2025-10-22 17:43:31,368 INFO: no action. I am (hetzner-la= b01), the leader with the lock
Oct 22 17:43:37 hetzner= -lab01 systemd[1]: patroni.service: start operation timed out. Terminating.=
Oct 22 17:43:37 hetzner-lab01 patroni[13083]: 2025-10= -22 17:43:37.987 CEST [13083] LOG:=C2=A0 received fast shutdown request
Oct 22 17:43:37 hetzner-lab01 patroni[13083]: 2025-10-22= 17:43:37.989 CEST [13083] LOG:=C2=A0 aborting any active transactions
=
Oct 22 17:43:37 hetzner-lab01 patroni[13095]: 2025-10-22 17= :43:37.990 CEST [13095] FATAL:=C2=A0 terminating connection due to administ= rator command
Oct 22 17:43:37 hetzner-lab01 systemd[1]= : patroni.service: Got notification message from PID 13083, but reception o= nly permitted for main PID 13059
Oct 22 17:43:37 hetzn= er-lab01 patroni[13083]: 2025-10-22 17:43:37.992 CEST [13083] LOG:=C2=A0 ba= ckground worker "logical replication launcher" (PID 13090) exited with exit = code 1
Oct 22 17:43:37 hetzner-lab01 patroni[13085]:= 2025-10-22 17:43:37.994 CEST [13085] LOG:=C2=A0 shutting down
Oct 22 17:43:37 hetzner-lab01 patroni[13085]: 2025-10-22 17:43:37.9= 96 CEST [13085] LOG:=C2=A0 checkpoint starting: shutdown immediate
Oct 22 17:43:38 hetzner-lab01 patroni[13085]: 2025-10-22 17:43= :38.014 CEST [13085] LOG:=C2=A0 checkpoint complete: wrote 17 buffers (0.1%= ); 0 WAL file(s) added, 0 removed, 0 recycled; write=3D0.005 s, sync=3D0.00= 4 s, total=3D0.020 s; sync files=3D13, longest=3D0.002 s, average=3D0.001 s= ; distance=3D36 kB, estimate=3D36 kB; lsn=3D0/152D6F0, redo lsn=3D0/152D6F0=
Oct 22 17:43:38 hetzner-lab01 patroni[13083]: 2025-10= -22 17:43:38.020 CEST [13083] LOG:=C2=A0 database system is shut down
=
Oct 22 17:43:39 hetzner-lab01 systemd[1]: patroni.service:= Failed with result =E2=80=98timeout=E2=80=99.
Oct 22 1= 7:43:39 hetzner-lab01 systemd[1]: Failed to start patroni.service - Runners = to orchestrate a high-availability PostgreSQL.

Any recommendations how to fix/workaround this?

cheers,
Hauke

=E2=80=94
=F0=9F=A5=B7 Hauke Bruno Wollentin

--------=_MB1BF518F5-4DC5-46C2-A7EE-8DEB8D84FAB8--