From: Jaroslav Novikov
To: Andrey Borodin
Cc: hlinnaka@iki.fi, Michael Paquier, Robert Haas, Venkata Balaji N, Andres Freund, Fujii Masao, Borodin Vladimir, PostgreSQL-development, nkak@vmware.com, Roman Khapov, Kirill Reshke, ShirishaRao@vmware.com
Subject: Re: Streaming replication and WAL archive interactions
Date: Tue, 3 Mar 2026 13:43:13 +0300
References: <548AF1CB.80702@vmware.com> <689EB259-44C2-4820-B901-4F6B1C55A1E4@simply.name> <549083D6.1000301@vmware.com> <54949108.3030109@vmware.com> <552FA38F.9060005@iki.fi> <5535FE71.1010905@iki.fi> <55362CAD.2000207@iki.fi> <553741FE.1080403@iki.fi> <554CB84E.3070406@iki.fi> <5550D20D.6090703@iki.fi>

> On 12 Feb 2026, at 09:56, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> Hi Heikki,
>
> There’s a nearby thread [0] (about 10 years later) where I’m working on a problem your patch from this thread helps solve.
>
> In large datacenter outages, 1–2% of clusters end up with gaps in their PITR timeline.
> In HA setups, when the primary is lost, some WAL can be missing from the archive even though it was streamed to the standby. Many HA tools (PGConsul, Patroni, etc.) try to re-archive from the standby, but those WAL files may already have been removed.
>
> Your “shared” archive mode addresses this: the standby keeps WAL until it’s archived. archive_mode=always plus an archive tool can work, but it’s expensive. In WAL-G, for example, the archive command does a GET on the standby’s WAL, then decrypts and compares. Switching to HEAD would reduce cost in some clouds but still adds cost.
>
> Another option is coordinating archiving outside Postgres, but that would mean building distributed coordination into the archive tool.
>
> Shared archive mode tackles this in Postgres itself.
>
> I’ve retrofitted your patch, incorporated ideas from the Greenplum work [1], and made some improvements.
>
> The patchset has three parts:
> * Rebase + tests: your original patch, rebased, with tests added.
> * Timeline switching: correct handling of timeline switches in archive status updates.
> * Avoid directory scans: skip scanning archive_status when possible, which was costly in WAL-G setups.
>
> What do you think?
>
> Best regards, Andrey Borodin.
>
> <v4-0001-Add-archive_mode-shared-for-coordinated-WAL-archi.patch>
> <v4-0002-Mark-ancestor-timeline-WAL-segments-as-archived.patch>
> <v4-0003-Optimize-ProcessArchivalReport-to-avoid-directory.patch>

Hi Andrey,

Adding the missing references [0] and [1].

[0] https://www.postgresql.org/message-id/5550D20D.6090703%40iki.fi
[1] https://github.com/open-gpdb/gpdb/commit/4f2db1929df1b5eed28f33505955636096bb4e8b

Best, Jaroslav Novikov.
