Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1txBom-004PM2-4j for pgsql-general@arkaria.postgresql.org; Tue, 25 Mar 2025 21:28:16 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1txBok-00AGs7-CE for pgsql-general@arkaria.postgresql.org; Tue, 25 Mar 2025 21:28:14 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1txBok-00AGrz-0G for pgsql-general@lists.postgresql.org; Tue, 25 Mar 2025 21:28:14 +0000 Received: from mail-pj1-x102e.google.com ([2607:f8b0:4864:20::102e]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1txBoh-0019aR-39 for pgsql-general@postgresql.org; Tue, 25 Mar 2025 21:28:13 +0000 Received: by mail-pj1-x102e.google.com with SMTP id 98e67ed59e1d1-30185d00446so310455a91.0 for ; Tue, 25 Mar 2025 14:28:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1742938090; x=1743542890; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=ElqSg3hc/ca9nKTpqZqkb+CiaiBmVAlCmzkcgoEH5J4=; b=XkRwU8oHO95n9wUgc0IMoswSoiq8KsMTwOhO7Vi7QYX18tZ1ZiKr1RLUx6xSCEUUfC XV8jBx1kbIdQ4q2FPiBqEx+IgeG0FA51JubZiagSAGGXt5zwXxeWDBC/hxeJ7M0ncxgU 7L+t02c2zlKaifXjTfYqOyw76pwI/eYQUyPZFSVGPvCIYfHM3zpLyzkjcii2+AjnKRs1 gK9D3mQb1XfBA8AaXpbJteHehlD4aiH41O6JrBicCO60CW3OFTAOlVNF2m/g3027Z60i wHyh15ZmZ5zlGz8u0Qew/62CmOuU/5Xe5aWld7jr3PS2DOONQicOrWvKsCWVN6vCxM+M 3Ifg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1742938090; x=1743542890; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=ElqSg3hc/ca9nKTpqZqkb+CiaiBmVAlCmzkcgoEH5J4=; b=ZidEHTm6VHa5AHrql5jJABXT9QCt2c4CchPuCZRl5ZjqtG5VETBkCveiH7onr8hC2z m5lbb6rmu16OHIFDwdXVUg19oETU5AIZDBFBjbwXq443xAvPmYJUGUm+jwV13KQXM47L 0A13TgrQYoSUFfhqmmWMMMktMXHEKi5EX7wqRz1AqhnLFsflfBiBJ8UvhGWn4xYKECkQ /kRtmJcsm6U18ZQ+AYsvUA/HXeAtKMnUkRvwoASl3HyvM3p1Oh6X/ta6SFu5OE5FSnTD B+iYWXJCOCZPkjlRAcC65IsLflYqEKqZUUYQTK3Xbs7seCNz7qN5vUzVGs6teHrcp8AR tByQ== X-Gm-Message-State: AOJu0Yx654/nwGDb+yQr1JEIPr4AlvrjZqllCVPZILY7bx/uB/dz3JH3 V0eISOV+nC01NB0PUls4hge24XJYxtUz65PxrYhhKT1CO6mLPk2xvFR68V6XNT+fNuRANiRL7ax 8RDrKRiVtiu4D2w9dVHDgAz+mXQU= X-Gm-Gg: ASbGncsXg3XZP4I/1j1n+8whvXfJKrQm24s5eWPp/VN5vhGzREnqaZ9W9adF20nFxd/ sEy78fvEryqhD2NSg6iXiB4Tm9cTs8gR36bey6Dpq++u6BhDI4YQ8yZrwk+qgnaO1foDrY55FB8 6GI5FVt2DD6KtwdYRiU7EYD1N9 X-Google-Smtp-Source: AGHT+IEyBxcC9Eg4mD2h2Jl60/pJESIPJMvxP3oOCQFjGJ9hZnh4UpaDXG/mtuxJchfzXIyk+aGiyxl7Xi25pWDs1Zg= X-Received: by 2002:a17:90b:274e:b0:2ff:6e58:89f5 with SMTP id 98e67ed59e1d1-303788f2f32mr1974093a91.6.1742938089516; Tue, 25 Mar 2025 14:28:09 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Phillip Diffley Date: Tue, 25 Mar 2025 16:27:58 -0500 X-Gm-Features: AQ5f1JqG0xqd8PzYdf-ybXqIzoMiwGIrZJNfhzmAP1WnyJf8t7zC9_xVBIJZOXE Message-ID: Subject: Re: Replication slot WAL reservation To: Christophe Pettus Cc: pgsql-general@postgresql.org Content-Type: multipart/alternative; boundary="000000000000b0f8f206313165d0" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000b0f8f206313165d0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Oh I see! I was conflating the data I see coming out of a replication slot with the internal organization of the WAL. I think the more specific question I am trying to answer is, as a consumer of a replication slot, how do I reason about what replication records will be made unavailable when I confirm an LSN? Here I am worried about situations where the replication connection is interrupted or the program processing the records crashes, and we need to replay records that may have been previously sent but were not fully processed. For example, are the records sent by a replication slot always sent in the same order such that if I advance the confirmed_flush_lsn of a slot to the LSN of record "A", I will know that any records that had been streamed after record "A" will be replayable? On Tue, Mar 25, 2025 at 12:14=E2=80=AFPM Christophe Pettus wrote: > Missed this question! > > > On Mar 25, 2025, at 09:56, Phillip Diffley > wrote: > > But when processing data from a replication slot, we confirm rows that > have been processed and can be deleted from the WAL based on the LSN (eg. > with pg_replication_slot_advance). How does postgres identify what parts = of > the WAL can be freed? > > Basically, if no part of the system "needs" a particular LSN position, th= e > segments that include that LSN position and earlier can be free. > > The various things that can "need" a particular LSN point are: > > 1. Replication slots, if the other side has not confirmed that it has > received it (under whatever synchronous commit rules that slot is operati= ng > under). > 2. The wal_keep_size setting. > 3. The max_wal_size setting. > 4. The archive_command, if a WAL segment hasn't been successfully archive= d > yet. > > One thing to remember is that the WAL does *not* contain contiguous block= s > of operations for a single transaction. The operations are written to th= e > WAL by every session as they do operations, so the WAL is a jumble of > different transactions. One of the jobs of the logical replication > framework is to sort that out so it can present only the operations that > belong to committed transactions to the output plugin. (This is why > there's an internal structure called the "reorder buffer": it reorders WA= L > operations into transaction blocks.) --000000000000b0f8f206313165d0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Oh I see! I was conflating the data = I see coming out of a replication slot with the internal organization of the WAL. I=20 think the more specific question I am trying to answer is, as a=20 consumer of a replication slot, how do I reason about what replication=20 records will be made unavailable when I confirm an LSN? Here I am=20 worried about situations where the replication connection is interrupted or the program processing the records crashes, and we need to replay=20 records that may have been previously sent but were not fully processed.

For example, are the records sent by a replication slot always sent in the=20 same order such that if I advance the=C2=A0confirmed_flush_lsn of a slot to= =20 the LSN of record "A", I will know that any records that had been= =20 streamed after record "A" will be replayable?


On Tue, Mar 25, 2025 at 12:14=E2=80=AFPM Christophe Pettu= s <xof@thebuild.com> wrote:
Missed this quest= ion!

> On Mar 25, 2025, at 09:56, Phillip Diffley <phillip6402@gmail.com> wrote: > But when processing data from a replication slot, we confirm rows that= have been processed and can be deleted from the WAL based on the LSN (eg. = with pg_replication_slot_advance). How does postgres identify what parts of= the WAL can be freed?

Basically, if no part of the system "needs" a particular LSN posi= tion, the segments that include that LSN position and earlier can be free.<= br>
The various things that can "need" a particular LSN point are:
1. Replication slots, if the other side has not confirmed that it has recei= ved it (under whatever synchronous commit rules that slot is operating unde= r).
2. The wal_keep_size setting.
3. The max_wal_size setting.
4. The archive_command, if a WAL segment hasn't been successfully archi= ved yet.

One thing to remember is that the WAL does *not* contain contiguous blocks = of operations for a single transaction.=C2=A0 The operations are written to= the WAL by every session as they do operations, so the WAL is a jumble of = different transactions.=C2=A0 One of the jobs of the logical replication fr= amework is to sort that out so it can present only the operations that belo= ng to committed transactions to the output plugin.=C2=A0 (This is why there= 's an internal structure called the "reorder buffer": it reor= ders WAL operations into transaction blocks.)
--000000000000b0f8f206313165d0--