Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1s55W2-0035Pl-Vj for pgsql-general@arkaria.postgresql.org; Thu, 09 May 2024 15:17:03 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1s55W1-008WRk-AS for pgsql-general@arkaria.postgresql.org; Thu, 09 May 2024 15:17:01 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1s55W0-008WRc-V2 for pgsql-general@lists.postgresql.org; Thu, 09 May 2024 15:17:01 +0000 Received: from mail-lj1-x230.google.com ([2a00:1450:4864:20::230]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1s55Vx-000CbQ-EQ for pgsql-general@postgresql.org; Thu, 09 May 2024 15:17:00 +0000 Received: by mail-lj1-x230.google.com with SMTP id 38308e7fff4ca-2dfb4ea2bbfso10436661fa.2 for ; Thu, 09 May 2024 08:16:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1715267817; x=1715872617; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=b8oIfnU4Ynzfn9Jj0LnmIU98J2u9uIwx3Fym4tJm4P0=; b=I/8YG1A/vM+kneHrZKOy6OyOFv9jGACxIk+eIMgHsmPFM/E/BiZ5vzv3/Upy8HVuRv OHEwa8uVmSdKnRrVWWdTikk21STDiZvsIVUsjPVM8CpJUJIN7Qa6skpnUZg9MY4DSlUv qfCzFWLgvD0AzOTMeedLR9cQGU+1TgXHmXhw+mmy3OI2A0akeK1mUSFPu0nBwMm523AW 3IKNjqxVwRrjArPG2eIL+gSQdf6EA2RyULne8RqpSJnFSNqTaDkuBuocuSTGKxMZLCJZ rhZvXBnPCvAFYmBGeSxR2EoPVz9dh8Bo+dQb18oAayTiTczha5rImyDYlJEJoSe54tJa ndHA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1715267817; x=1715872617; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=b8oIfnU4Ynzfn9Jj0LnmIU98J2u9uIwx3Fym4tJm4P0=; b=nDxlZH+ePKXkr5Cqf6nlLT3J8T7vEZJzPHtfT2+768hOO4r6QxrCQFXXvayfl6+Ksh RB4tmp/mDwdQbWqA8u308YoVj5WPO5NN4bdSJrl4A++eVejuQoJY+CQrxDQfjK+9QRRk uHNjQGRc4LgAQFuKemzGow0SYdTUgDcHF4JJyXpJwzMfE/aOYrwFFNHXGz370yJz6ANI JldlS9JGDJXzCTdKg/bXxwmb5dNKFpl1L/1E/HuzNfcuXBtovHfw2j/HeTe8zVudsvkI uWYNPZbgQG1DnuRTU3STbghUaQwvE8LO/GbnpxBlg13IZhvEhlA9fYAIj6N+R518aHoK CMDg== X-Forwarded-Encrypted: i=1; AJvYcCUD3nTvPcK7ItsGpiZOvuD/qsp1vdDys4Vrp8DCKiV4DZR5wn/L/NZI2TcK9rH73Wz9FK91aJhgGJs2RrTTA7KqY5ub4xTbaBTdiSRi X-Gm-Message-State: AOJu0YwB4wrn2JihpYWP28Cda0euqyjMSt8VBZL/bY4vSJE7RIqZd8L3 m9k8P9+0wy6fN+CeC2e38E2fIc3G9JRWiSFarDSn5i+2pMAUm2N8hHBYaKxDQK6Vn1/oU5f/YG7 na7oT0eVAVwfYzAvAccau7Ke3tQ== X-Google-Smtp-Source: AGHT+IHSrXI2/CuCvbfKZ9c7aswOOMOvdEeCPUaoyEfit/w52+mHflrccqvbPhu2AafL1JeAGlfb+vlAkpmz79i1/E8= X-Received: by 2002:a2e:be09:0:b0:2e1:ebec:1ded with SMTP id 38308e7fff4ca-2e447698ccemr39486931fa.25.1715267816952; Thu, 09 May 2024 08:16:56 -0700 (PDT) MIME-Version: 1.0 References: <8a534c5f-e400-4bb5-b39e-2017d259ff06@aklaver.com> <10bcc03d-fa3e-40b6-bdd2-cac0acd046f4@aklaver.com> In-Reply-To: From: =?UTF-8?Q?Torsten_F=C3=B6rtsch?= Date: Thu, 9 May 2024 17:16:45 +0200 Message-ID: Subject: Re: Unexpected data when subscribing to logical replication slot To: Daniel McKenzie Cc: Adrian Klaver , tomas.vondra@enterprisedb.com, pgsql-general@postgresql.org Content-Type: multipart/alternative; boundary="000000000000ec9a32061806e838" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000ec9a32061806e838 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Sorry, to correct myself. The pg_xact bit is written with the next checkpoint. But the COMMIT record in the WAL is there. On Thu, May 9, 2024 at 5:14=E2=80=AFPM Torsten F=C3=B6rtsch wrote: > I would not find this behavior surprising in particular if you have a > synchronous replica. According to the documentation of synchronous_commit= : > > The local behavior of all non-off modes is to wait for local flush of > WAL to disk. > > This is when the logical decoder sees the item. But that does not mean th= e > change is visible to other transactions in the MVCC sense. So, if wal2jso= n > and the rest of your stuff is fast enough, then the enrichment query may > very well read old data. > > A transaction being committed means all the WAL has been written (and > usually synced) to disk including the bit in the pg_xact directory. > > On Thu, May 9, 2024 at 9:32=E2=80=AFAM Daniel McKenzie < > daniel.mckenzie@curvedental.com> wrote: > >> Asynchronous commit introduces the risk of data loss. There is a short >>> time window between the report of transaction completion to the client >>> and the time that the transaction is truly committed. >> >> >> The documentation speaks about synchronous_commit changing how >> transactions change behaviour for the client. So in this case, my psql >> terminal is the client, and I would expect a faster commit (from its >> perspective) and then a period of risk (as a process usually done as par= t >> of the commit is now being done in the background) but it's not clear ho= w >> that affects a replication slot subscriber. >> >> What we're struggling to understand is: why are we seeing any updates in >> the replication slot before they have been "truly committed"? >> >> There appears to be a state of limbo between updating data and that data >> being available to query (and our subscriber is picking up changes in th= is >> period of time) but I can't pin down any documentation which describes i= t. >> >> We've had this running in live now for years without a hiccup so we are >> surprised to learn that we have this massive race condition and it just = so >> happens that the hardware is fast enough to process the transaction befo= re >> the .NET application can react to replication slot changes. >> >> Daniel McKenzie >> Software Developer >> >> Office: +1 403.910.5927 x 251 >> Mobile: +44 7712 159045 >> Website: www.curvedental.com >> >> *Curve Dental Confidentiality Notice* >> This message is intended exclusively for the individual or entity to >> which it is addressed. This communication may contain information that >> is proprietary, privileged, confidential, or otherwise legally exempt fr= om >> disclosure. If you are not the named addressee, you are not authorized >> to read, print, retain, copy, or disseminate this message or any part of >> it. If you have received this message in error, please notify the sender >> immediately by replying to this e-mail and delete all copies of this >> message. >> >> >> On Wed, May 8, 2024 at 5:28=E2=80=AFPM Adrian Klaver >> wrote: >> >>> On 5/8/24 08:24, Daniel McKenzie wrote: >>> > It's running both (in docker containers) and also quite a few more >>> > docker containers running various .NET applications. >>> >>> I think what you found is that the r7a.medium instance is not capable >>> enough to do all that it is asked without introducing lag under load. >>> Answering the questions posed by Tomas Vondra would help get to the >>> actual cause of the lag. >>> >>> In meantime my suspicion is this part: >>> >>> "For example, when I use a psql terminal to update a user's last name >>> from "Jones" to "Smith" then I would expect the enrichment query to fin= d >>> "Smith" but it will sometimes still find "Jones". It finds the old data >>> perhaps 1 in 50 times." >>> >>> If this is being run against the Postgres server my guess is that >>> synchronous_commit=3Don is causing the commit on the server to wait for >>> the WAL records to be flushed to disk and this is not happening in a >>> timely manner in the '... 1 in 50 times' you mention. In that case you >>> see the old values not the new committed values. This seems to be >>> confirmed when you set synchronous_commit=3Doff and don't see old value= s. >>> For completeness per: >>> >>> https://www.postgresql.org/docs/current/wal-async-commit.html >>> >>> "However, for short transactions this delay is a major component of the >>> total transaction time. Selecting asynchronous commit mode means that >>> the server returns success as soon as the transaction is logically >>> completed, before the WAL records it generated have actually made their >>> way to disk. This can provide a significant boost in throughput for >>> small transactions. >>> >>> Asynchronous commit introduces the risk of data loss. There is a short >>> time window between the report of transaction completion to the client >>> and the time that the transaction is truly committed (that is, it is >>> guaranteed not to be lost if the server crashes). ... >>> " >>> >>> > >>> > Daniel McKenzie >>> > Software Developer >>> >>> -- >>> Adrian Klaver >>> adrian.klaver@aklaver.com >>> >>> --000000000000ec9a32061806e838 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Sorry, to correct myself. The pg_xact bit is written with = the next checkpoint. But the COMMIT record in the WAL is there.

On Thu, May = 9, 2024 at 5:14=E2=80=AFPM Torsten F=C3=B6rtsch <tfoertsch123@gmail.com> wrote:
I would not find = this behavior surprising in particular if you have a synchronous replica. A= ccording to the documentation of synchronous_commit:

=C2= =A0 =C2=A0The local behavior of all non-off modes is to wait for local flus= h of WAL to disk.

This is when the logical decoder= sees the item. But that does not mean the change is visible to other trans= actions in the MVCC sense. So, if wal2json and the rest of your stuff is fa= st enough, then the enrichment query may very well read old data.

A transaction being committed means all the WAL has been wr= itten (and usually synced) to disk including the bit in the pg_xact directo= ry.

On Thu, May 9, 2024 at 9:32=E2=80=AFAM Daniel McKenzie <daniel.mckenz= ie@curvedental.com> wrote:
Asynchronous commit introduces the risk of data loss. There is a= short
time window between the report of transaction completion to the c= lient
and the time that the transaction is truly committed.
=

The documentation speaks about=C2=A0synchronous_co= mmit changing how transactions change behaviour for the client. So in this = case, my psql terminal is the client, and I would expect a faster commit (f= rom its perspective) and then a period of risk (as a process usually done a= s part of the commit is now being done in the background) but it's not = clear how that affects a replication slot subscriber.

<= /div>
What we're struggling to understand is: why are we seein= g any updates in the replication slot before they have been "truly com= mitted"?

There appears to be a state of l= imbo between updating data and that data being available to query (and our = subscriber is picking up changes in this period of time) but I can't pi= n down any documentation which describes it.

We've had this running in live now for years without a hiccup so we ar= e surprised to learn that we have this massive race condition and it just s= o happens that the hardware is fast enough to process the transaction befor= e the .NET application can react to replication slot changes.
Daniel McKenzie
Software Developer

Office: +1 403.9= 10.5927 x 251
Mobile: +44 7712 1590= 45
Website:=C2=A0www.curvedental.com

Curve Dental Confidentiality Notice
This message is intended exclusive= ly for the individual or entity to which it is addressed. This communicatio= n may=C2=A0contain information tha= t is proprietary, privileged, confidential, or otherwise legally exempt fro= m disclosure. If you are=C2=A0not = the named addressee, you are not authorized to read, print, retain, copy, o= r disseminate this message or any=C2=A0part of it. If you have received this message in error, please notify = the sender immediately by replying to this e-mail=C2=A0and delete all copies of this message.
<= /div>


On Wed, May 8, 2024 at 5:28=E2=80=AFPM Adr= ian Klaver <adrian.klaver@aklaver.com> wrote:
On 5/8/24 08:24, Daniel McKenzie wrote:
> It's running both (in docker containers) and also quite a few more=
> docker containers running various .NET applications.

I think what you found is that the r7a.medium instance is not capable
enough to do all that it is asked without introducing lag under load.
Answering the questions posed by Tomas Vondra would help get to the
actual cause of the lag.

In meantime my suspicion is this part:

"For example, when I use a psql terminal to update a user's last n= ame
from "Jones" to "Smith" then I would expect the enrichm= ent query to find
"Smith" but it will sometimes still find "Jones". It fi= nds the old data
perhaps 1 in 50 times."

If this is being run against the Postgres server my guess is that
synchronous_commit=3Don is causing the commit on the server to wait for the WAL records to be flushed to disk and this is not happening in a
timely manner in the '... 1 in 50 times' you mention. In that case = you
see the old values not the new committed values. This seems to be
confirmed when you set synchronous_commit=3Doff and don't see old value= s.
For completeness per:

https://www.postgresql.org/docs/current= /wal-async-commit.html

"However, for short transactions this delay is a major component of th= e
total transaction time. Selecting asynchronous commit mode means that
the server returns success as soon as the transaction is logically
completed, before the WAL records it generated have actually made their way to disk. This can provide a significant boost in throughput for
small transactions.

Asynchronous commit introduces the risk of data loss. There is a short
time window between the report of transaction completion to the client
and the time that the transaction is truly committed (that is, it is
guaranteed not to be lost if the server crashes).=C2=A0 ...
"

>
> Daniel McKenzie
> Software Developer

--
Adrian Klaver
adrian.klave= r@aklaver.com

--000000000000ec9a32061806e838--