Subject: Re: Replication slot WAL reservation
From: Christophe Pettus
Date: Wed, 26 Mar 2025 09:18:00 -0700
To: Phillip Diffley
Cc: pgsql-general@postgresql.org

> On Mar 26, 2025, at 07:55, Phillip Diffley wrote:
> Just to confirm, it
> sounds like the order messages are sent from the output plugin is what matters here. When you update confirmed_flush_lsn to LSN "A", any messages that were sent by the output plugin after the message with LSN "A" will be replayable. Any messages sent by the output plugin before the message with LSN "A" will most likely not be replayed, since their data is freed for deletion. Is that correct?

The terminology is shifting around a bit here, so to be specific: When the primary (or publisher) receives a message from the secondary (or replica) that a particular LSN has been flushed, the primary at that point feels free to recycle any WAL segments that only contain WAL entries whose LSN is less than that flush point (whether or not it actually does depends on a lot of other factors). The actual horizon that the primary needs to retain can be farther back than that, because there's no requirement that the secondary send an LSN as confirmed_flush_lsn that is at a transaction boundary, so the flush LSN might land in the middle of a transaction. The actual point before which the primary can recycle WAL is restart_lsn, which the primary determines based on the flush LSN.

When the secondary connects, it provides an LSN from which the primary should start sending WAL (if a binary replica) or decoded WAL via the plugin (if a logical replica). For a logical replica, that can be confirmed_flush_lsn or any point after, but it can't be before. (Even if the WAL exists, the primary will not start before confirmed_flush_lsn for a logical replication slot: if the start point provided in START_REPLICATION is earlier, the primary starts from confirmed_flush_lsn instead.) Of course, you'll get an error if START_REPLICATION supplies an LSN that doesn't actually exist yet.

The behavior that the primary is expecting from the secondary is that the secondary never sends back a confirmed_flush_lsn until everything up to that point is crash / disconnection-safe.
What "safe" means in this case depends on the client behavior. It might be just spooling the incoming stream to disk and processing it later, or it might be processing it completely on the fly as it comes in.

The most important point here is that the client consuming the logical replication messages must keep track of the flush point (defined however the client implements processing the messages), and provide the right one back to the primary when it connects. (Another option is that the client is written so that each transaction is idempotent, and even if transactions that it has already processed are sent again, the result is the same.)

One more note is that if the client supplies an LSN (for logical replication) that lands in the middle of a transaction, the primary will send over the complete transaction, so the actual start point may be earlier than the supplied start point. Generally, this means that the client should respect transaction boundaries, and be able to deal with getting a partial transaction but discarding it if it doesn't get a commit record for it.
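
To make the client-side bookkeeping concrete, here is a minimal sketch in Python of a consumer that respects transaction boundaries. It is only an illustration under simplifying assumptions: the message shape (kind, lsn, payload), the Consumer class, and its method names are all hypothetical, LSNs are plain integers, and "durable" is modeled by an in-memory field rather than a real write to disk. It does not talk to PostgreSQL at all.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple

# Hypothetical decoded message: (kind, lsn, payload).
# kind is one of "begin", "change", "commit"; lsn is a plain int here.
Message = Tuple[str, int, str]


@dataclass
class Consumer:
    # Commit LSN of the last transaction that was fully applied.
    # A real client would persist this together with the applied data.
    flushed_lsn: int = 0
    applied: List[str] = field(default_factory=list)
    _pending: List[str] = field(default_factory=list)
    _in_txn: bool = False

    def start_lsn(self) -> int:
        """LSN to hand back to the primary when (re)connecting."""
        return self.flushed_lsn

    def feed(self, messages: Iterable[Message]) -> None:
        for kind, lsn, payload in messages:
            if kind == "begin":
                # A new transaction starts; drop any leftover partial one.
                self._pending = []
                self._in_txn = True
            elif kind == "change":
                if self._in_txn:
                    self._pending.append(payload)
            elif kind == "commit":
                # Apply only at commit, and only if not already applied,
                # so a transaction that is sent again has no effect.
                if self._in_txn and lsn > self.flushed_lsn:
                    self.applied.extend(self._pending)
                    self.flushed_lsn = lsn
                self._pending = []
                self._in_txn = False

    def disconnect(self) -> None:
        """Connection lost: discard a transaction that never committed."""
        self._pending = []
        self._in_txn = False
```

In this sketch the flush point only ever advances at a commit, so whatever start_lsn() returns after a crash or disconnect is a position the client has fully dealt with, and a transaction that gets sent a second time is skipped rather than applied twice.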