public inbox for [email protected]  
help / color / mirror / Atom feed
From: Torsten Förtsch <[email protected]>
To: Daniel McKenzie <[email protected]>
Cc: Adrian Klaver <[email protected]>
Cc: [email protected]
Cc: [email protected]
Subject: Re: Unexpected data when subscribing to logical replication slot
Date: Thu, 9 May 2024 17:16:45 +0200
Message-ID: <CAKkG4_nXXXauGatuEgb1xKJZ7omc5HgCnwEfY4b=0zmwLv3stw@mail.gmail.com> (raw)
In-Reply-To: <CAKkG4_koC7dhj54ynLUEL-aNoh-U+P_ts6NSS9LG1MzJz3ZEfg@mail.gmail.com>
References: <CACKLY6hwMnSjjipZ57TTz_FC9aWBkg7OpNLJ1tu+GvfEFc4hJA@mail.gmail.com>
	<[email protected]>
	<CACKLY6hYp+9W9xijXFh_UEpDuoo7bxg-A3deBKcEzEub3K+Kfg@mail.gmail.com>
	<[email protected]>
	<CACKLY6hNLT8qQWjmMChFnALr3UFEgzCxbTFbckfWd6GjHyn7Sw@mail.gmail.com>
	<CAKkG4_koC7dhj54ynLUEL-aNoh-U+P_ts6NSS9LG1MzJz3ZEfg@mail.gmail.com>

Sorry, to correct myself. The pg_xact bit is written with the next
checkpoint. But the COMMIT record in the WAL is there.

On Thu, May 9, 2024 at 5:14 PM Torsten Förtsch <[email protected]>
wrote:

> I would not find this behavior surprising in particular if you have a
> synchronous replica. According to the documentation of synchronous_commit:
>
>    The local behavior of all non-off modes is to wait for local flush of
> WAL to disk.
>
> This is when the logical decoder sees the item. But that does not mean the
> change is visible to other transactions in the MVCC sense. So, if wal2json
> and the rest of your stuff is fast enough, then the enrichment query may
> very well read old data.
>
> A transaction being committed means all the WAL has been written (and
> usually synced) to disk including the bit in the pg_xact directory.
>
> On Thu, May 9, 2024 at 9:32 AM Daniel McKenzie <
> [email protected]> wrote:
>
>> Asynchronous commit introduces the risk of data loss. There is a short
>>> time window between the report of transaction completion to the client
>>> and the time that the transaction is truly committed.
>>
>>
>> The documentation speaks about synchronous_commit changing how
>> transactions change behaviour for the client. So in this case, my psql
>> terminal is the client, and I would expect a faster commit (from its
>> perspective) and then a period of risk (as a process usually done as part
>> of the commit is now being done in the background) but it's not clear how
>> that affects a replication slot subscriber.
>>
>> What we're struggling to understand is: why are we seeing any updates in
>> the replication slot before they have been "truly committed"?
>>
>> There appears to be a state of limbo between updating data and that data
>> being available to query (and our subscriber is picking up changes in this
>> period of time) but I can't pin down any documentation which describes it.
>>
>> We've had this running in live now for years without a hiccup so we are
>> surprised to learn that we have this massive race condition and it just so
>> happens that the hardware is fast enough to process the transaction before
>> the .NET application can react to replication slot changes.
>>
>> Daniel McKenzie
>> Software Developer
>>
>> Office: +1 403.910.5927 x 251
>> Mobile: +44 7712 159045
>> Website: www.curvedental.com
>>
>> *Curve Dental Confidentiality Notice*
>> This message is intended exclusively for the individual or entity to
>> which it is addressed. This communication may contain information that
>> is proprietary, privileged, confidential, or otherwise legally exempt from
>> disclosure. If you are not the named addressee, you are not authorized
>> to read, print, retain, copy, or disseminate this message or any part of
>> it. If you have received this message in error, please notify the sender
>> immediately by replying to this e-mail and delete all copies of this
>> message.
>>
>>
>> On Wed, May 8, 2024 at 5:28 PM Adrian Klaver <[email protected]>
>> wrote:
>>
>>> On 5/8/24 08:24, Daniel McKenzie wrote:
>>> > It's running both (in docker containers) and also quite a few more
>>> > docker containers running various .NET applications.
>>>
>>> I think what you found is that the r7a.medium instance is not capable
>>> enough to do all that it is asked without introducing lag under load.
>>> Answering the questions posed by Tomas Vondra would help get to the
>>> actual cause of the lag.
>>>
>>> In meantime my suspicion is this part:
>>>
>>> "For example, when I use a psql terminal to update a user's last name
>>> from "Jones" to "Smith" then I would expect the enrichment query to find
>>> "Smith" but it will sometimes still find "Jones". It finds the old data
>>> perhaps 1 in 50 times."
>>>
>>> If this is being run against the Postgres server my guess is that
>>> synchronous_commit=on is causing the commit on the server to wait for
>>> the WAL records to be flushed to disk and this is not happening in a
>>> timely manner in the '... 1 in 50 times' you mention. In that case you
>>> see the old values not the new committed values. This seems to be
>>> confirmed when you set synchronous_commit=off and don't see old values.
>>> For completeness per:
>>>
>>> https://www.postgresql.org/docs/current/wal-async-commit.html
>>>
>>> "However, for short transactions this delay is a major component of the
>>> total transaction time. Selecting asynchronous commit mode means that
>>> the server returns success as soon as the transaction is logically
>>> completed, before the WAL records it generated have actually made their
>>> way to disk. This can provide a significant boost in throughput for
>>> small transactions.
>>>
>>> Asynchronous commit introduces the risk of data loss. There is a short
>>> time window between the report of transaction completion to the client
>>> and the time that the transaction is truly committed (that is, it is
>>> guaranteed not to be lost if the server crashes).  ...
>>> "
>>>
>>> >
>>> > Daniel McKenzie
>>> > Software Developer
>>>
>>> --
>>> Adrian Klaver
>>> [email protected]
>>>
>>>


view thread (7+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Unexpected data when subscribing to logical replication slot
  In-Reply-To: <CAKkG4_nXXXauGatuEgb1xKJZ7omc5HgCnwEfY4b=0zmwLv3stw@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox