From: Raphael Salguero Aragón
Date: Fri, 7 Feb 2025 08:21:50 +0100
Subject: Re: Postgresql replication failed in Patroni
To: Mendbayar Alzakhgui
Cc: "pgsql-admin@lists.postgresql.org"

Hi Mendbayar,

On Fri, 7 Feb 2025 at 07:04, Mendbayar Alzakhgui <mendbayar.alz@unitel.mn> wrote:

> Hello everybody,
> I urgently need help with my Patroni-managed Postgres cluster.
>
> The main Patroni-managed leader Postgres crashed and is down. When we try to start PostgreSQL, it shows us this error log:
>
> 2025-02-07 12:31:18 +08 [2354332]: [4-1] user=,db=,app=,client=LOG:  listening on IPv4 address "ip_address", port 5432
> 2025-02-07 12:31:18 +08 [2354332]: [5-1] user=,db=,app=,client=LOG:  listening on Unix socket "./.s.PGSQL.5432"
> 2025-02-07 12:31:18 +08 [2354337]: [1-1] user=,db=,app=,client=LOG:  database system was shut down in recovery at 2025-02-07 11:56:50 +08
> 2025-02-07 12:31:18 +08 [2354337]: [2-1] user=,db=,app=,client=LOG:  entering standby mode
> 2025-02-07 12:31:18 +08 [2354337]: [3-1] user=,db=,app=,client=FATAL:  requested timeline 20 is not a child of this server's history
> 2025-02-07 12:31:18 +08 [2354337]: [4-1] user=,db=,app=,client=DETAIL:  Latest checkpoint is at 71/4D8BB8C0 on timeline 19, but in the history of the requested timeline, the server forked off from that timeline at 71/4D793220.
> 2025-02-07 12:31:18 +08 [2354332]: [6-1] user=,db=,app=,client=LOG:  startup process (PID 2354337) exited with exit code 1
> 2025-02-07 12:31:18 +08 [2354332]: [7-1] user=,db=,app=,client=LOG:  aborting startup due to startup process failure
> 2025-02-07 12:31:18 +08 [2354332]: [8-1] user=,db=,app=,client=LOG:  database system is shut down
>
> What should we check? Is this because the leader node already deleted the WAL it needs to start? Also, we had Debezium connected to this node; once we recover it, will Debezium resume automatically from the disconnected sessions? Please help me.

You're right, the crashed DB cannot recover because it lacks the required transactional (WAL) information.
What is your DB size?
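
The DETAIL line in the log already contains the numbers that explain the failure. As a rough illustration (a minimal Python sketch, not part of Patroni or PostgreSQL tooling; the helper name parse_lsn is made up here), the two LSNs can be compared as 64-bit integers:

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '71/4D8BB8C0' to a 64-bit integer.

    An LSN is printed as two hexadecimal numbers separated by a slash:
    the high 32 bits, then the low 32 bits.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


# Values copied from the DETAIL line of the quoted log.
latest_checkpoint = parse_lsn("71/4D8BB8C0")  # on timeline 19
fork_point = parse_lsn("71/4D793220")         # where timeline 20 forked off

# WAL bytes between the fork point and this node's latest checkpoint.
bytes_past_fork = latest_checkpoint - fork_point
diverged = bytes_past_fork > 0

print(f"diverged={diverged}, bytes past fork point={bytes_past_fork}")
```

A positive difference means this node's latest checkpoint on timeline 19 lies past the point where timeline 20 forked off, which is why it cannot simply follow the new timeline.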

The easiest way is to stop Patroni on the crashed instance (systemctl stop patroni), then remove and recreate the data directory (also take care of any tablespaces, if they're in use).
Afterwards, you can restart the Patroni service on the crashed instance and run a reinit from the current leader:

patronictl -c /etc/patroni.yml reinit your_cluster_name replica_node

That should do the trick :)
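
The steps above, in order, could be written down as a small dry-run helper. This is only a sketch in Python that prints the commands instead of executing them; the function name reinit_plan and the data directory path are assumptions for illustration, so substitute the values from your own patroni.yml:

```python
import shlex


def reinit_plan(config: str, cluster: str, member: str, data_dir: str) -> list[str]:
    """Return the rebuild commands in order, as strings, without running them."""
    q = shlex.quote
    return [
        # 1. On the crashed node: stop Patroni.
        "systemctl stop patroni",
        # 2. On the crashed node: remove and recreate the data directory.
        #    Destructive! data_dir is a hypothetical path; ownership and
        #    permissions must match what your PostgreSQL setup expects,
        #    and tablespaces (if any) need the same treatment.
        f"rm -rf {q(data_dir)}",
        f"mkdir -p {q(data_dir)}",
        # 3. On the crashed node: start the Patroni service again.
        "systemctl start patroni",
        # 4. From the current leader: reinitialise the member.
        f"patronictl -c {q(config)} reinit {q(cluster)} {q(member)}",
    ]


plan = reinit_plan(
    config="/etc/patroni.yml",
    cluster="your_cluster_name",
    member="replica_node",
    data_dir="/var/lib/postgresql/data",  # assumption, not from the thread
)
for command in plan:
    print(command)
```

Printing the plan first makes it easy to double-check the cluster name, member name and data directory before anything is deleted.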

> Sincerely,
>
> Mendbayar A. | Database Administrator
> Information technology department
> +976 8611-2165
> mendbayar.alz@unitel.mn
> Central Tower, 11th floor
> www.unitel.mn

Best regards
Raphael