Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tbHjr-009Tt1-N2 for pgsql-general@arkaria.postgresql.org; Fri, 24 Jan 2025 11:20:40 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tbHjq-00DeIM-8x for pgsql-general@arkaria.postgresql.org; Fri, 24 Jan 2025 11:20:38 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tbHjp-00DeIE-QM for pgsql-general@lists.postgresql.org; Fri, 24 Jan 2025 11:20:37 +0000 Received: from mail-ed1-x52a.google.com ([2a00:1450:4864:20::52a]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1tbHjn-001Fy6-0M for pgsql-general@postgresql.org; Fri, 24 Jan 2025 11:20:36 +0000 Received: by mail-ed1-x52a.google.com with SMTP id 4fb4d7f45d1cf-5d3f28a4fccso2951780a12.2 for ; Fri, 24 Jan 2025 03:20:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1737717634; x=1738322434; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=xWoyiy5krfEO2jmHEDrPfRMLYYlzCXrZgvsMNmbY/xI=; b=e5wnPFcCZMmZC8mHZgVnbBrWt/zXUt73RFi0nfsaUPz10AVs/LxjjGnSsPqtdLrhDJ 0Tp1w2ueWKauJ1DMvC4yzuZzN4sgt6hq9hgdIxLfIHDAlkSvATkqqrK1UPXHvnCwN4AZ ffp1KG4diK7a9DuT9jMvXa/ZelXkzGoPcR3PZCSnBz0tW2XSl+/HTJmf0rHocCyFJC+f RfbIbEf8uA4OWGHBmt3GOO+Th+BwEUiJtkwvdoZXxK/iYOAK2S2uwntfk7Wo/DE1SfZ4 VmNWOC1kShlrlJD3DO7MCBBW9OELI/QniV4mnM6qOo90iPsbwAe0x9K1MUML10b5saHp Pnug== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1737717634; x=1738322434; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=xWoyiy5krfEO2jmHEDrPfRMLYYlzCXrZgvsMNmbY/xI=; b=PQ21vwwHYJntErq91/CRxqMrUGXMtcjF4BeZ1yH8FI4+X4rPo8OVM2gfob+hpfYx1e mM+lFrFjO0HKYRZCk6Z90BIRBwECZ0OaOldOj6GmjUYfELSeT7eQuNchkK3vEiSwDCU9 aCKnPugya9T2Q3qhcnaZiodXU8AcLjnPGyeq8Bwx/GkFowHTI83OE1GhpXI9Giw50UJY hXlLj+qiltTm2K3Z10BwiSkb/JYHkattLcnvVBz+O8raVx5PdTugXyyShMHDqijuiw1v DtDp3VCJdxgq5fssBXS8oDO5VYjPaOIMOCZ+i2hhU87VBc7QYYjVBaqeehTwZpmllSEN ELRA== X-Forwarded-Encrypted: i=1; AJvYcCWPsJVvC8wkdZmdji9D+yy2IYARD9tICZtmCoxoWO39/HqAFpc8F2BgtlgUT/+mDyNZpwfmk01Uac9T3WKf@postgresql.org X-Gm-Message-State: AOJu0Yz+WdIjCdkYhT2Hz1+iMI2sz1WtvYUsz+YHBjaCeL2rRx5gtz+R Z6q7AkiNgqITJLFGDmD2wAleEM0aFk35F4qihhSoedndTqLmzxKiLn9cMwsEmO/SEP8NTc7weB/ BEqj54MbyBg0Ng1kndHjSsx/ufzQ= X-Gm-Gg: ASbGncv9k+D5+sWg2t7tQQv1hynr17/VftV/NlKG7FZK/MDdeQwz+/cpCTi15gAvB9n NB6hAq1kdGZAC38i/3HhPczocfsg+7OdwwMy6vckjC1sbGlo0tvPRDBNNaht0 X-Google-Smtp-Source: AGHT+IGQRuO0q/hQB2RJBUSI3OtUkKr9Gl3a4kpK0O1+WXQEnkoD8vqljx4rlqrC+3KiIBw9SDjzPYc8F6wbPS62WuE= X-Received: by 2002:a17:907:1c26:b0:ab2:d721:ed92 with SMTP id a640c23a62f3a-ab38b3afaecmr2669719266b.45.1737717633554; Fri, 24 Jan 2025 03:20:33 -0800 (PST) MIME-Version: 1.0 References: <0ba329ef-62aa-4ab3-aefd-141baabced3b@aklaver.com> In-Reply-To: From: Paul Brindusa Date: Fri, 24 Jan 2025 11:20:22 +0000 X-Gm-Features: AWEUYZmgTOn8Zu9li6pTs1cRs5oFiJRRRZBC1FLKqtAqa0ujUw7tPQYNitmuRyI Message-ID: Subject: Re: Return of the pg_wal issue.. To: Saul Perdomo Cc: Adrian Klaver , pgsql-general Content-Type: multipart/alternative; boundary="000000000000449efa062c71ea5a" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000449efa062c71ea5a Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Good morning everyone, Following some troubleshooting last night we have managed to resolve the issue. Plowing through the entire thing we have actually came to the conclusion that the cluster is running but not replicating. So the number one lesson learned is to always check *replication* in the cluster, for the sake of data safety and not having to go through a million things. This cluster is set up without a VIP, therefore db01 will always be the master. Having it set up this way we have found that pb_hba.conf had this line: host all postgres db01/cidr trust From my understanding this means that the master was trying to replicate to itself and not trusting the other nodes? To fix we have put down the entire network: host all postgres network/cidr trust Following the config amendment we have restarted the replicas with patronictl -c /path/patroni/config reinit Happy to say that clean up of wal files kicked in and now are down 4% usage of /var volume from 96%. Now then, there is still the bit with the actual postgres logs not rotating properly? lol, but ill leave that for another email. Massive thank you to all of you for the support. On Thu, Jan 23, 2025 at 7:02=E2=80=AFPM Saul Perdomo wrote: > Thanks for the correction Adrian - my oversimplification went too far, an= d > into "plain wrong" territory. > > (The detail that I felt was too much for this explanation was: "and the > way to simply get rid of them would be to set your archive command to > '/bin/true', say".. but didn't want to make it seem like I was suggesting > Paul do that) > > On Thu, Jan 23, 2025, 11:07=E2=80=AFa.m. Adrian Klaver > wrote: > >> On 1/23/25 06:51, Saul Perdomo wrote: >> >> > This is why everybody will tell you "don't just delete these files, >> > archive them properly!" Again, for operational purposes, you could jus= t >> > delete them. But you really want to make a /copy /of them before you >> > do... you know, /just in case /something bad happens to your DB that >> > makes you want to roll it back in time. >> >> No you can't just delete them for operational purposes without knowledge >> of whether they are still needed or not. >> >> Per: >> >> https://www.postgresql.org/docs/current/wal-intro.html >> >> and >> >> https://www.postgresql.org/docs/current/wal-configuration.html >> >> Short version, a WAL file must remain until a checkpoint is done that >> makes it's content no longer needed. >> >> > Cheers >> > Saul >> > >> >> -- >> Adrian Klaver >> adrian.klaver@aklaver.com >> >> --=20 Kind Regards, Paul Brindusa paulbrindusa88@gmail.com --000000000000449efa062c71ea5a Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Good morning everyone,

Following some t= roubleshooting last night we have managed to resolve the issue.
P= lowing through the entire thing=C2=A0 we have actually came to the conclusi= on that the cluster is running but not replicating.
So the number= one lesson learned is to always check replic= ation=C2=A0in the cluster, for the sake of data=C2=A0 safety and not ha= ving to go through a million things.=C2=A0

This cl= uster is set up without a VIP, therefore db01 will always be the master. Ha= ving it set up this way we have found that pb_hba.conf had this line:
=

=C2=A0host =C2=A0 =C2=A0all =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 postgres=C2=A0 =C2=A0 =C2=A0 =C2=A0 db01/cidr=C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 trust

From my understanding t= his means that the master was trying to replicate to itself and not trustin= g the other nodes?=C2=A0

To fix we have put down t= he entire network:=C2=A0

=C2=A0host =C2=A0 = =C2=A0all =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 postgres=C2=A0 =C2=A0 = =C2=A0 =C2=A0 network/cidr=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 trust

Following the config amendment we have restarted the= replicas with patronictl -c /path/patroni/config reinit <cluster> &l= t;host>

Happy to say that clean up of wal files= kicked in and now are down 4% usage of /var volume from 96%.
Now then, there is still the bit with the actual postgres logs = not rotating properly? lol, but ill leave that for another email.

Massive thank you to all of you for the support.
=


On Thu, Jan 23, 2025 at 7:02= =E2=80=AFPM Saul Perdomo <saul= .perdomo@gmail.com> wrote:
Thanks for the correction Adrian - my o= versimplification went too far, and into "plain wrong" territory.=

(The detail that I felt was t= oo much for this explanation was: "and the way to simply get rid of th= em would be to set your archive command to '/bin/true', say"..= but didn't want to make it seem like I was suggesting Paul do that)=C2= =A0

On Thu, Jan 23, 2025, 11:07=E2=80=AFa.m. Adrian Klaver <adrian.klaver@aklav= er.com> wrote:
On 1/23/25 06:51, Saul Perdomo wrote:

> This is why everybody will tell you "don't just delete these = files,
> archive them properly!" Again, for operational purposes, you coul= d just
> delete them. But you really want to make a /copy /of them before=C2=A0= you
> do... you know, /just in case /something bad happens to your DB that <= br> > makes you want to roll it back in time.

No you can't just delete them for operational purposes without knowledg= e
of whether they are still needed or not.

Per:

https://www.postgresql.org/docs/cur= rent/wal-intro.html

and

https://www.postgresql.org/= docs/current/wal-configuration.html

Short version, a WAL file must remain until a checkpoint is done that
makes it's content no longer needed.

> Cheers
> Saul
>

--
Adrian Klaver
adrian.klaver@aklaver.com



--
Kind Regards,
Paul Brindusa
=

--000000000000449efa062c71ea5a--