Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1s5P6T-005D35-U6 for pgsql-general@arkaria.postgresql.org; Fri, 10 May 2024 12:11:58 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1s5P6S-00ELQx-3P for pgsql-general@arkaria.postgresql.org; Fri, 10 May 2024 12:11:56 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1s5P6R-00ELQh-JX for pgsql-general@lists.postgresql.org; Fri, 10 May 2024 12:11:56 +0000 Received: from mail-yb1-xb32.google.com ([2607:f8b0:4864:20::b32]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1s5P6O-000KdZ-Vl for pgsql-general@postgresql.org; Fri, 10 May 2024 12:11:54 +0000 Received: by mail-yb1-xb32.google.com with SMTP id 3f1490d57ef6-de462979e00so2063187276.3 for ; Fri, 10 May 2024 05:11:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=curvedental.com; s=google; t=1715343111; x=1715947911; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=FxMBYnfOaw9Ebh4gqEXsT+kqpPyQdFq8wq6aBpHOEn8=; b=LLRDFDnHfhMXFi4lY2/oY1PG0fvXa4FahVNHu4fPFm39+P51Kq/lO19qkUHYmGZ6S5 pLhCGTulduKUI/p35vyw6wQw94fsreMny+ffY1pNHU799dnz54a95qcF9+pH4cLEsc5l NcgYXXk1O936J0ECQZBipTz3g5zS3DGlJbye8= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1715343111; x=1715947911; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=FxMBYnfOaw9Ebh4gqEXsT+kqpPyQdFq8wq6aBpHOEn8=; b=cY7qXiVLWs4AYB/XLJXPp6ZiB6XdMj2viOdiprHCc2pnytHAJhTOdTs31y3jpWXm2S aY9bq1lZxvmar79lViQSchgiM2I1nvy/77uPS/7ckOa4mkOQO2gH/ku3ViDaDPGQ/0Ui cYDp41k7xdK1/43N93jak2ml62N1eJ3TkKDgWoytXIg3X/XxF55mfW/ExerDVbzWQN6u VqJ0ZC4975LC8E6lGRVfAOeb/RplTirCo7SIs/7W2iYr0Czml/mvAkV824ItNoJ6XLmr nVPtoY0f+HJ5812PmxyBRQLpj+UPKxN1cBtOOCxv2QwqBftaFL3VQdFfUBugELI5IiFx BXYQ== X-Gm-Message-State: AOJu0YyeBnrc/OkdAqDoB+hmlSb25KuDoV3dpgbHdEGHzPPAitGJTWVM neRTFVU4N992zuf531+dFYfDOXpST/QZPwiJB483Njq4J9SmRUQiV+bMOVNYoQakbAZYmkl0/8T MryGROE+BaXvB3vKYifU8qTCEBamQ0BDgmu9h X-Google-Smtp-Source: AGHT+IFYdKIxR9TX8ygSdA+X9xGPSQHdU0qzaykS01Ypylysn5msYlQvfxVvESSlYbCfe9gYH3ab0sIImuwQKTlyNak= X-Received: by 2002:a5b:14e:0:b0:de5:d2d5:8d46 with SMTP id 3f1490d57ef6-dee4f4bf9fcmr2274120276.58.1715343111522; Fri, 10 May 2024 05:11:51 -0700 (PDT) MIME-Version: 1.0 References: <8a534c5f-e400-4bb5-b39e-2017d259ff06@aklaver.com> <10bcc03d-fa3e-40b6-bdd2-cac0acd046f4@aklaver.com> <654ded11-6223-462c-ab3f-ee3a17975fd6@aklaver.com> In-Reply-To: <654ded11-6223-462c-ab3f-ee3a17975fd6@aklaver.com> From: Daniel McKenzie Date: Fri, 10 May 2024 13:11:42 +0100 Message-ID: Subject: Re: Unexpected data when subscribing to logical replication slot To: Adrian Klaver , tomas.vondra@enterprisedb.com, tfoertsch123@gmail.com Cc: pgsql-general@postgresql.org Content-Type: multipart/alternative; boundary="000000000000d4b24d06181870d4" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000d4b24d06181870d4 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Thank you all for your input. We have solved the problem by - 1. Configuring wal2json to include xids . 2. Updating our enrichment queries to return the xmin . 3. Retrying our enrichment queries after a brief sleep if xmin < xid. This seems to be working very well so far. Daniel McKenzie Software Developer Office: +1 403.910.5927 x 251 Mobile: +44 7712 159045 Website: www.curvedental.com *Curve Dental Confidentiality Notice* This message is intended exclusively for the individual or entity to which it is addressed. This communication may contain information that is proprietary, privileged, confidential, or otherwise legally exempt from disclosure. If you are not the named addressee, you are not authorized to read, print, retain, copy, or disseminate this message or any part of it. If you have received this message in error, please notify the sender immediately by replying to this e-mail and delete all copies of this message. On Thu, May 9, 2024 at 5:45=E2=80=AFPM Adrian Klaver wrote: > On 5/9/24 00:32, Daniel McKenzie wrote: > > > We've had this running in live now for years without a hiccup so we are > > surprised to learn that we have this massive race condition and it just > > so happens that the hardware is fast enough to process the transaction > > before the .NET application can react to replication slot changes. > > On broad scale I'm going to say that over 'for years' there has been an > increase in load on the Postgres server as well as the I/0 system of the > machine it is running on. What you are seeing now is the canary in the > mine giving you the heads up that more trouble lies ahead as the > hardware and software is reaching load limits. > > On finer scale my guess is that the following is happening when > synchronous_commit =3D on: > > 1) Postgres session #1 does data change. > > 2) This is replicated out and picked up by wal2json, which sees the new > data. > > 3) The Postgres server waits for the confirmation that the WAL record > has been written out to disk. Upon confirmation it commits on the > server. This is the part that I am not sure of in relation to wal2json. > > 4) Postgres session #2 queries the database for the record. In the case > where 3) has not completed it sees the old values as the data change in > session #1 has not committed and therefore the new values are not seen > by other sessions. > > > > > Daniel McKenzie > > Software Developer > > -- > Adrian Klaver > adrian.klaver@aklaver.com > > --000000000000d4b24d06181870d4 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Thank you all for your input.

We have solved the problem by -
  1. Configuring wal2json to= incl= ude xids.
  2. Updating our enrichment queries=C2=A0to return the xmin<= /a>.
  3. Retrying our enrichment queries after a brief sleep if xmin &l= t; xid.
This seems to be working very well so far.

Danie= l McKenzie
Software Developer

Office: +1 403.910.5927 x 251
Mobile: +44 7712 159045
Website:=C2=A0= www.curvedental.com
=
Cu= rve Dental Confidentiality Notice
This message is intended exclusively for the individual or entity to which= it is addressed. This communication may=C2=A0contain information that is proprietary, privileged, confidenti= al, or otherwise legally exempt from disclosure. If you are=C2=A0not the named addressee, you are not authori= zed to read, print, retain, copy, or disseminate this message or any=C2=A0<= /span>part of it. If you have received th= is message in error, please notify the sender immediately by replying to th= is e-mail=C2=A0and delete all copi= es of this message.

On Thu, M= ay 9, 2024 at 5:45=E2=80=AFPM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 5/9/24 00:32, Daniel McKenzie= wrote:

> We've had this running in live now for years without a hiccup so w= e are
> surprised to learn that we have this massive race condition and it jus= t
> so happens that the hardware is fast enough to process the transaction=
> before the .NET application can react to replication slot changes.

On broad scale I'm going to say that over 'for years' there has= been an
increase in load on the Postgres server as well as the I/0 system of the machine it is running on. What you are seeing now is the canary in the
mine giving you the heads up that more trouble lies ahead as the
hardware and software is reaching load limits.

On finer scale my guess is that the following is happening when
synchronous_commit =3D on:

1) Postgres session #1 does data change.

2) This is replicated out and picked up by wal2json, which sees the new data.

3) The Postgres server waits for the confirmation that the WAL record
has been written out to disk. Upon confirmation it commits on the
server. This is the part that I am not sure of in relation to wal2json.

4) Postgres session #2 queries the database for the record. In the case where 3) has not completed it sees the old values as the data change in session #1 has not committed and therefore the new values are not seen
by other sessions.

>
> Daniel McKenzie
> Software Developer

--
Adrian Klaver
adrian.klave= r@aklaver.com

--000000000000d4b24d06181870d4--