Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1um1a9-008k4j-IU for pgsql-general@arkaria.postgresql.org; Wed, 13 Aug 2025 02:51:17 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1um1a6-00BNyc-Nj for pgsql-general@arkaria.postgresql.org; Wed, 13 Aug 2025 02:51:14 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1um1a6-00BNyT-92 for pgsql-general@lists.postgresql.org; Wed, 13 Aug 2025 02:51:14 +0000 Received: from mail-ed1-x52a.google.com ([2a00:1450:4864:20::52a]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1um1a0-000I1I-1F for pgsql-general@lists.postgresql.org; Wed, 13 Aug 2025 02:51:13 +0000 Received: by mail-ed1-x52a.google.com with SMTP id 4fb4d7f45d1cf-617b36cc489so12019002a12.0 for ; Tue, 12 Aug 2025 19:51:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1755053466; x=1755658266; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=jBu0jFv0ZdFwDnKam8fVdGOvWz4nnlkDAHHg55EMCGg=; b=k0QgA5iuDOpxJfu4p6atY5A3JhttXI2c9zBDFxTr6qLBjAVUJeZN2Vx8CZ/m94K+gF wNldm38UVKY4Qq29Ty/dw1vPlp7qrzQOAwhqWSa+kTSQ26xUpvmguFmRHSQPTKzVZlT2 P9XA4QXqkE4mIjLKMnOlHoNCycHYvl4YCUgD7miLOn6gwzt9YoHrvhYXf1WolAfrFBOL xJDlpO6bTN/PmXZ9oH4Vr23aGkmLC6iOvOZSyX+m37KXYxENibUT6994HrrH4SMbsp+O z0XgE8JKewrXY368CJRA0ffsOE8hU0+4IBLMSmddJwdq1nZg6ZvCg/oRs9PjvidJCe7l MGUg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1755053466; x=1755658266; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=jBu0jFv0ZdFwDnKam8fVdGOvWz4nnlkDAHHg55EMCGg=; b=fU3EFfeQbZPLdlwXyDBhs+1YG+ufsA/vzgprsv2yFH8IvaIwFducDzzPeRIIztLNDR LTNvHeIkFCw2qtyswEsy0Q6LI8rXEQ7K5ptk88Z0ngHqQvOvhCCTk4rNv8rcvEZiISDM NrVeQhFI+cnHCCzBP1nkVUJhbAz4cIECFZjZlHPgpkqGg0GGgD1VVOk8KkLksJXzAfYB lkOdt3R/mCGKUXtxBvETbsVoRz2TEa8F7yEJAqxAzMroglGMOuu9Rki57GAQL8P/wLdE /hPP3mc6yDzVfkjT5GcpKaEj0EmLFxhzGtRpHZFq6jCjKZ27O/jXgSk9w1s8nxCffMgX HbCg== X-Forwarded-Encrypted: i=1; AJvYcCVciJxbSpHj9MYvlAA2jkqld996u5yw4jHyTHVmYVZH6HPg4hZWrUjSZTugyuaoBq7jKtl/Y2DWqWedNSD2@lists.postgresql.org X-Gm-Message-State: AOJu0YxZXN9rz6/18CCO7zoj6dUgxitiHhXnskwNDVoOCxmU6FLRe8O4 ErHSqJ5ANjBe0HbgDbzbUjxHpFwdtgKCtwVtmjhzZ+TzDUjj6W11itMuTqnrOdw93UiPrttg+YV /0rJTSCGkMyJUrN63jco3vqJ2WTqT/MaZVijd X-Gm-Gg: ASbGncsWvP1Yo5r9rIKt/AT5Yl4nhBsLqNsLCQAg5ijam1YjLv9K2y68CMktDf0PD3r 2fr79JpY2e3IjJ6Qmw532UJJqu1I2E0qgmstcg0xpsHGIHISeF/sCknXQmZMNfC7L5iNGi0HxAs wcA7XWWIctTa1nUSDonNRWQsgdhH7Q+juirDlaId1a9Xv9BFSpByt6nbSEWu/yQ/SKgAcBB1T3D lLhbR0= X-Google-Smtp-Source: AGHT+IFxxzy9Ks8sdVPASOCCzBXLmZNfjnxf3+76m55+1cQkRceNLc20Xf2/VAW5RQnaruZAi/NURUGjVOTbua73Yc0= X-Received: by 2002:a05:6402:524c:b0:615:bab3:9ee9 with SMTP id 4fb4d7f45d1cf-6186bfe282fmr1039437a12.26.1755053465893; Tue, 12 Aug 2025 19:51:05 -0700 (PDT) MIME-Version: 1.0 References: <73f3723b-f279-43c6-884d-d12b3151ec9e@aklaver.com> In-Reply-To: From: Justin Date: Tue, 12 Aug 2025 22:50:54 -0400 X-Gm-Features: Ac12FXyrEM4aY0UOQJS6qU6FEA4mAo5_rWEkXaFtQh54u19SIZJy48Hu2YWnEFM Message-ID: Subject: Re: Questions about the continuity of WAL archiving To: px shi Cc: Ron Johnson , "pgsql-generallists.postgresql.org" Content-Type: multipart/alternative; boundary="000000000000658232063c363a2a" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000658232063c363a2a Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Tue, Aug 12, 2025 at 10:24=E2=80=AFPM px shi wrote= : > How often does your primary node crash, and then not recover due to WALs >> corruption or WALs not existing? >> >> If it's _ever_ happened, you should _fix that_ instead of rolling your >> own WAL archival.process. >> > > I once encountered a case where the recovery process failed to restore t= o > the latest LSN due to missing WAL files in the archive. The root cause wa= s > multiple failovers between primary and standby. During one of the > switchovers, the primary crashed before completing the archiving of all W= AL > files. When the standby was promoted to primary, it began archiving WAL > files for the new timeline, resulting in a gap between the WAL files of t= he > two timelines. Moreover, no base backup was taken during this period. > > I am not sure what the problem is here either, other than something seriously wrong with configuration with PostgreSQL and PgBackrest. The replica should be receiving the WAL via a replication slot using Streaming. Meaning the primary will keep the WAL until the replica is caught up, if the replica becomes disconnected due to max_slot_wal_keep_size aka wal_keep_segments is exceeded the replicas recovery_command can take offer and fetch from the WAL Archive to catch the replica up. This assumes hot_feedback is on so the WAL replay won't become delayed due to snapshot locks on the replica. If all the above is true the replica should never lag behind unless the disk IO layer is way undersized compared to the Primary. S3 is being talked about so it makes me wonder about DISK IO configuration on the primary vs the replica. I see this causing lag under high load where the replica IO layer is the bottleneck. If PgBackrest can't keep up with WAL archiving, as others have stated you need to configure Asynchronous Archiving. The number of workers depends on the load. I have a server running 8 parallel workers to archive 1TB of WAL daily.... And another server during maintenance tasks generates around 10,000 WAL files in about 2 hours using 6 PgBAckrest workers All to S3 buckets. The above statement makes me wonder if there is some kind of High Availability monitor running like pg_autofailover, that is promoting a replica then converting the former primary to a replica of the recently "promoted replica" If the above matches to what is happening, it is very easy to mess up the configuration for WAL archiving and backups. Part of the process of promoting a replica is to make sure WAL archiving is working. The replica after being promoted immediately kicks of autovacuum to rebuild things like FSM which generates a lot of WAL files. If you are losing WAL files the configuration is wrong somewhere.. Just not enough information on the series of events and the configuration to tell what the root cause is other than miss-configuration. Thanks Justin --000000000000658232063c363a2a Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


On Tue, Aug 12,= 2025 at 10:24=E2=80=AFPM px shi <spxlyy123@gmail.com> wrote:
How often does your primary node crash, and then not recover du= e to WALs corruption or WALs not existing?

If it's _ever_ happen= ed, you should _fix that_ instead of rolling your own WAL archival.process.=

=C2=A0I once encountered a case where = the recovery process failed to restore to the latest LSN due to missing WAL= files in the archive. The root cause was multiple failovers between primar= y and standby. During one of the switchovers, the primary crashed before co= mpleting the archiving of all WAL files. When the standby was promoted to p= rimary, it began archiving WAL files for the new timeline, resulting in a g= ap between the WAL files of the two timelines. Moreover, no base backup was= taken during this period.


=
I am not sure what the problem is=C2=A0 here either, other than s= omething seriously wrong with configuration with=C2=A0PostgreSQL and PgBack= rest.

The replica should be receiving the WAL=C2=A0via a replication= =C2=A0slot using Streaming.=C2=A0 Meaning the primary will keep the WAL unt= il the replica is caught up,=C2=A0 if the replica becomes disconnected due = to=C2=A0 max_slot_wal_keep_size aka=C2=A0wal_keep_segments=C2=A0 is exceede= d the replicas recovery_command can take offer and fetch from the WAL Archi= ve to catch the replica up.=C2=A0 This assumes hot_feedback is on so the WA= L replay won't become delayed due to snapshot locks on the replica.=C2= =A0

If=C2=A0 all the above is true the replica should never lag= behind unless the disk IO layer is way undersized compared to the Primary.= =C2=A0 S3 is being talked about=C2=A0 so it makes me wonder about DISK IO c= onfiguration on the primary vs the replica.=C2=A0 I see this causing lag un= der high load where the replica IO layer is the bottleneck.

If PgBac= krest can't keep up with WAL archiving, as others have stated=C2=A0 you= need to configure Asynchronous Archiving. The number of workers depends on= the load. I have a server running 8 parallel=C2=A0workers to archive 1TB o= f WAL daily....=C2=A0 And another server=C2=A0 during maintenance tasks gen= erates=C2=A0around 10,000 WAL files in about 2 hours using 6 PgBAckrest wor= kers=C2=A0 All=C2=A0to S3 buckets.

The above statement makes me wond= er if there is some kind of High Availability monitor running like pg_autof= ailover, that is promoting a replica then=C2=A0 converting the former prima= ry to a replica of the recently "promoted replica"=C2=A0=C2=A0
If the above matches to what=C2=A0is happening, it is very easy to mes= s up the configuration for WAL archiving and backups. Part of the process o= f promoting a replica is to make sure WAL archiving is working.=C2=A0 The r= eplica after being promoted immediately kicks of autovacuum to rebuild thin= gs like FSM which generates a lot of WAL files.

If you are=C2=A0losi= ng=C2=A0 WAL files the configuration is wrong somewhere..

Just not e= nough information on the series of events and the=C2=A0configuration to tel= l what the root cause is other than miss-configuration.


ThanksJustin


=C2=A0
--000000000000658232063c363a2a--