Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1txHsw-0063kC-In for pgsql-general@arkaria.postgresql.org; Wed, 26 Mar 2025 03:56:58 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1txHsv-00EG1J-AI for pgsql-general@arkaria.postgresql.org; Wed, 26 Mar 2025 03:56:57 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1txHsu-00EG1B-UI for pgsql-general@lists.postgresql.org; Wed, 26 Mar 2025 03:56:56 +0000 Received: from mail-pl1-x62b.google.com ([2607:f8b0:4864:20::62b]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1txHss-001CnC-17 for pgsql-general@postgresql.org; Wed, 26 Mar 2025 03:56:56 +0000 Received: by mail-pl1-x62b.google.com with SMTP id d9443c01a7336-227aaa82fafso74925385ad.2 for ; Tue, 25 Mar 2025 20:56:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1742961411; x=1743566211; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=lMz115/IS/p2uF6KeeDTOE3vinl4SbnJG2T9ZtfccNg=; b=lBgNI+XsIxcExgFAEDBowgmixfV7ebxu6AyKUKi/oDfUXWf9lIf9zpjeWJrz+frcSz DLWPEj7grWBAgXsbgqESMQ077AyQ1L/1LSkEZC/uKmQkETCaBB/ZWoaHrOClQpSxr+lo c3bvlemiTHhE5IPw+7jNaVY1YkQYVsD3WG8Vdk5vqBx1PvMhGpypIugIyyestcMJVVxY 5/3DvCy3WxgUBmAD+24snubnVtJw3iT4O7mj5umMVhLI9Qb9eEgV8cTsrrZASBEdAJEU SGCNSDKxWs7TT6lgDCPazCRnJwo2tULIg+N1BSiGZ/Ql7K2O63QYLurWa+U/KVUbWy6s DZFQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1742961411; x=1743566211; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=lMz115/IS/p2uF6KeeDTOE3vinl4SbnJG2T9ZtfccNg=; b=U7h83YaEzhhmXxVoJnuCoN3uFc7adCwL6RUDIUlPbgQkM/WF/ZCzVaHr8h9XimpD50 uZNBLzB3fKqo/MX6iePsEWR1A73u9tRbC/EGC4GRJN7nfZ7gTozm9LNDwrBK2EWj0Rlt vvJDsqOsdt6qcNCqJ2esEfaWTykluWfCpoF1aGiRQAw+pxt1SsAUZRMUwrwkSQ6zccXj UH/Qz3EsaIe58BHEIyQ1oKri/H+c9uCaPh/34zX49GC6oDTdQ92QIsUtxPSBz5UYIjqv 37irmqEiUtU/yrpBcxnHeLsfHx4RMGRtD9TWcxTFlzifBPybQvWXpYmYIpdZne1kOoMV OU1w== X-Gm-Message-State: AOJu0YzBdOih7AN4cJ9Y7A1VViXjuZt+tOXAG47K4NOWOKk3Tjl0VfC1 jtbupdNBMGPgpyvwzn501F55BXflbls+Sj7HRI1Atw98yCpGvqnNwQCqG19yTis8xjP04Kvo2yU BGc+gfDPBut/TDr9RSTixBdUwerg= X-Gm-Gg: ASbGncsCpKsMlQvgwANYlpjed070NmYRn4wmQbqj8Pw5cuFng1wg7V8uRPzjYyWAln6 0qIV0ZxIiR3DQDBtii244nXXdUGQgo2lmZZQ8A/7K1dOD2TxNFEcbhrl6GkT7rDBoctPSJAkn/k P+TAj+LujIk13IBu5FZ74duWD8 X-Google-Smtp-Source: AGHT+IHo/8MSmI6g+g2d6BVgldxj5JxraDRT65g7pM8Uu6yPJ/JyXl+vx3shnwkkjhIaCG0/sYM1yl9npF9Wy8HI91c= X-Received: by 2002:a17:903:22c4:b0:224:e33:889b with SMTP id d9443c01a7336-22780c7abb5mr302267305ad.12.1742961410950; Tue, 25 Mar 2025 20:56:50 -0700 (PDT) MIME-Version: 1.0 References: <270FB587-2E83-4EB0-9FD6-07541F2A6A17@thebuild.com> In-Reply-To: <270FB587-2E83-4EB0-9FD6-07541F2A6A17@thebuild.com> From: Phillip Diffley Date: Tue, 25 Mar 2025 22:56:38 -0500 X-Gm-Features: AQ5f1JqEBZe9Gaxu6397HFTL3Sm_sQHMLj7ANX3TKOJwLcU0WT7iro1g3Fg4QRw Message-ID: Subject: Re: Replication slot WAL reservation To: Christophe Pettus Cc: pgsql-general@postgresql.org Content-Type: multipart/alternative; boundary="000000000000c1d722063136d3f5" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000c1d722063136d3f5 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable > You shouldn't need to manually advance the replication slot. > The client is also expected to send back regular messages letting the publisher / primary know that it has successfully consumed up to a particular point I was thinking of these as the same thing, but it sounds like they are different. At the moment, the only method I know for letting the publisher/primary know what has been successfully consumed is pg_replication_slot_advance. I looked at the message formats and logical replication message formats pages, but I did not see a message type for updating confirmed_flush_lsn or otherwise letting the publisher/primary know what logs have been successfully consumed. There is the flush message, but it looks like it only passes a 4 byte int instead of the 8 bytes required for an LSN. Is there a message type that is used to confirm what logs have been successfully consumed? > You know that any WAL generated after `confirmed_flush_lsn` is available for replay. The part I am uncertain about is what "after" means here, since LSNs are not presented in order, and the order of data streamed over the replication slot does not match the order of the data in the WAL. I initially (and incorrectly) thought the confirmation order was based on LSN. So if you confirmed an LSN "x" then all logs with LSN less than "x" could be released by the publisher/primary. That can't work though since LSNs are not presented in order by the replication slot. Is there a monotonically increasing identifier that can be used to identify which logs come "after" another? Or do you just keep track of the order the replication slot delivers logs in and not confirm a log until it and all the logs received before it are processed to the point of being crash-proof= ? On Tue, Mar 25, 2025 at 4:32=E2=80=AFPM Christophe Pettus wrote: > > > > On Mar 25, 2025, at 13:58, Phillip Diffley > wrote: > > > > Oh I see! I was conflating the data I see coming out of a replication > slot with the internal organization of the WAL. I think the more specific > question I am trying to answer is, as a consumer of a replication slot, h= ow > do I reason about what replication records will be made unavailable when = I > confirm an LSN? Here I am worried about situations where the replication > connection is interrupted or the program processing the records crashes, > and we need to replay records that may have been previously sent but were > not fully processed. > > It's up to the consuming client to keep track of where it is in the WAL > (using an LSN). When the client connects, it specifies what LSN to start > streaming at. If that LSN is no longer available, the publisher / primar= y > returns an error. > > The client shouldn't confirm the flush of an LSN unless it is crash-proof > to that point, since any WAL before that should be assumed to be > unavailable. > > > For example, are the records sent by a replication slot always sent in > the same order such that if I advance the confirmed_flush_lsn of a slot t= o > the LSN of record "A", I will know that any records that had been streame= d > after record "A" will be replayable? > > You know that any WAL generated after `confirmed_flush_lsn` is available > for replay. That's the oldest LSN that the client can specify on > connection (although it can specify a later one, if it exists). You > shouldn't need to manually advance the replication slot. Instead, the > client specifies where it wants to start when it connects. The client is > also expected to send back regular messages letting the publisher / prima= ry > know that it has successfully consumed up to a particular point in the WA= L, > so the publisher / primary knows it can release that WAL information. --000000000000c1d722063136d3f5 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
> You shouldn't need to manually a= dvance the replication slot.
> The client is also expected to = send back regular messages letting the=20 publisher / primary know that it has successfully consumed up to a=20 particular point

I was thinking of these as the sa= me thing, but it sounds like they are different. At the moment, the only me= thod I know for letting the publisher/primary know what has been successful= ly consumed is pg_replication_slot_advance. I looked at the message formats and logical replicat= ion message formats pages, but I did not see a message type for updatin= g confirmed_flush_lsn or otherwise letting the publisher/primary know what = logs have been successfully consumed. There is the flush message, but it looks like it only passes a 4 byte = int instead of the 8 bytes required for an LSN. Is there a message type tha= t is used to confirm what logs have been successfully consumed?
<= br>
> You know that any WAL generated after `confirmed_flush_l= sn` is available for replay.

The part I am uncerta= in about is what "after" means here, since LSNs are not presente= d in order, and the order of data streamed over the replication slot does n= ot match the order of the data in the WAL.=C2=A0

I= initially (and incorrectly) thought the confirmation order was based on LS= N. So if you confirmed an LSN "x" then all logs with LSN less tha= n "x" could be released by the publisher/primary. That can't = work though since LSNs are not presented in order by the replication slot. = Is there a monotonically increasing identifier that can be used to identify= which logs come "after" another? Or do you just keep track of th= e order the replication slot delivers logs in and not confirm a log until = it and all the logs received before it are processed to the point of being = crash-proof?

On Tue, Mar 25, 2025 at 4:32= =E2=80=AFPM Christophe Pettus <xof@t= hebuild.com> wrote:


> On Mar 25, 2025, at 13:58, Phillip Diffley <phillip6402@gmail.com> wrote: >
> Oh I see! I was conflating the data I see coming out of a replication = slot with the internal organization of the WAL. I think the more specific q= uestion I am trying to answer is, as a consumer of a replication slot, how = do I reason about what replication records will be made unavailable when I = confirm an LSN? Here I am worried about situations where the replication co= nnection is interrupted or the program processing the records crashes, and = we need to replay records that may have been previously sent but were not f= ully processed.

It's up to the consuming client to keep track of where it is in the WAL= (using an LSN).=C2=A0 When the client connects, it specifies what LSN to s= tart streaming at.=C2=A0 If that LSN is no longer available, the publisher = / primary returns an error.

The client shouldn't confirm the flush of an LSN unless it is crash-pro= of to that point, since any WAL before that should be assumed to be unavail= able.

> For example, are the records sent by a replication slot always sent in= the same order such that if I advance the confirmed_flush_lsn of a slot to= the LSN of record "A", I will know that any records that had bee= n streamed after record "A" will be replayable?

You know that any WAL generated after `confirmed_flush_lsn` is available fo= r replay.=C2=A0 That's the oldest LSN that the client can specify on co= nnection (although it can specify a later one, if it exists).=C2=A0 You sho= uldn't need to manually advance the replication slot.=C2=A0 Instead, th= e client specifies where it wants to start when it connects.=C2=A0 The clie= nt is also expected to send back regular messages letting the publisher / p= rimary know that it has successfully consumed up to a particular point in t= he WAL, so the publisher / primary knows it can release that WAL informatio= n.
--000000000000c1d722063136d3f5--