Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sIFCe-00E8xf-G1 for pgsql-bugs@arkaria.postgresql.org; Fri, 14 Jun 2024 22:15:24 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sIFCc-006n0U-8b for pgsql-bugs@arkaria.postgresql.org; Fri, 14 Jun 2024 22:15:23 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sIFCb-006n0M-SI for pgsql-bugs@lists.postgresql.org; Fri, 14 Jun 2024 22:15:22 +0000 Received: from mail-yb1-xb2d.google.com ([2607:f8b0:4864:20::b2d]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1sIFCa-001hAg-1Z for pgsql-bugs@lists.postgresql.org; Fri, 14 Jun 2024 22:15:22 +0000 Received: by mail-yb1-xb2d.google.com with SMTP id 3f1490d57ef6-dff2df7de4aso7813276.2 for ; Fri, 14 Jun 2024 15:15:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1718403318; x=1719008118; darn=lists.postgresql.org; h=to:subject:message-id:date:from:mime-version:from:to:cc:subject :date:message-id:reply-to; bh=EUbEIYHoFF61n5OC0cCaqnbwObRNwtYFAzyftuVjqQ8=; b=CgAMMy3wHbU9rOsM56a+wXN3p7UNJrmeVNzNtUmLf49+xFrow/S4YcG2CigkyN9EOb vuB9J8WsXz6kZB9pd0dDu4n2qsbwGoUHReYgLVTrowZSejmsT4XrjDUIcb/4eQQBSK15 +dcpaaP6epYUJOkHvBC4w3iWHXEr9phepPHFhssE5oBufdRM6f4XjMVaWoiW9Vf4E9g3 1ZSMOFyO9r3AoyPlG067fFksHAdkkOkxkJufcN1DE66PZ5rIae6UymMBUPZmZbqIo27z u34u2PNXL/HyKsjQW3hBi119yztU4idqwMWNQX2F4YqLokQYcWEpE410kc+v8px0WMNK pl3Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1718403318; x=1719008118; h=to:subject:message-id:date:from:mime-version:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=EUbEIYHoFF61n5OC0cCaqnbwObRNwtYFAzyftuVjqQ8=; b=rwGGdivbvTcGz10+ujWcnjyb4fwOCZMpyt+1enVXoUVjdp4PF+kvbnVPNUeAtUha0q ko64HbEq37G1KBIyLhoT7fgpSIQgxtlQsPX1R1RJ4VsCGrOwNduilYI9vCqlTAMCFyNd zMnbzjoVJi+zaVBO0TKgF0WfH1xX+d1cDGJRXnKt0vH2oiTZxUE+PuAiVKRUEKQ5DLMU rLB3mWXEfUmc7s03YXUcSTUmJHbOdtp+NWVategY2exrr8O2sAqf28mFt5jjyN2DvQLb RbCdBZzeauzLzMavAxzlGSkNqfTsBpg6a1P7poOvtrYnWWR/A6oGq0Lp6O5+Qu+sAc7B UTEg== X-Gm-Message-State: AOJu0YwelFU1S5hmKb1Z5efxAlOFcvXxH9Z48sxtum6cfABVTIcswomk tRvREj9GsZ46WjRdZr91SpGoCQeKrlJ/c0UxcBJaNjWLsCDSYaBqEi3FQzOg8pveq8vbiw1o1gB aXj47ktn9vfx5uTwi4WOb1KMlC7iFZ+6B X-Google-Smtp-Source: AGHT+IEC34BFJuqgQNMWZKdpm2zhgXPvfbEBievvG936wCimgx8yZ93yOClgr6IOz2siekpjKzEqoyMdx37+45fvQ14= X-Received: by 2002:a25:fc1a:0:b0:dfb:6ff:403e with SMTP id 3f1490d57ef6-dff15342a1cmr3883265276.13.1718403317774; Fri, 14 Jun 2024 15:15:17 -0700 (PDT) MIME-Version: 1.0 From: Rafael Thofehrn Castro Date: Fri, 14 Jun 2024 19:15:06 -0300 Message-ID: Subject: Inconsistent increment of pg_stat_database.xact_rollback with logical replication To: pgsql-bugs@lists.postgresql.org Content-Type: multipart/alternative; boundary="00000000000056410c061ae0f367" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000056410c061ae0f367 Content-Type: text/plain; charset="UTF-8" Column xact_rollback from pg_stat_database gets inconsistently incremented when logical replication is being used (on publisher side). This can be easily reproduced in latest code from master branch: - Publisher postgres=# select xact_commit, xact_rollback from pg_stat_database where datname = 'postgres'; -[ RECORD 1 ]-+--- xact_commit | 20 xact_rollback | 0 postgres=# insert into t1 values (1); INSERT 0 1 postgres=# insert into t1 values (2); INSERT 0 1 postgres=# insert into t1 values (3); INSERT 0 1 postgres=# insert into t1 values (4); INSERT 0 1 postgres=# insert into t1 values (5); INSERT 0 1 postgres=# insert into t1 values (6); INSERT 0 1 postgres=# insert into t1 values (7); INSERT 0 1 postgres=# insert into t1 values (8); INSERT 0 1 postgres=# insert into t1 values (9); INSERT 0 1 postgres=# insert into t1 values (10); INSERT 0 1 postgres=# select xact_commit, xact_rollback from pg_stat_database where datname = 'postgres'; -[ RECORD 1 ]-+--- xact_commit | 33 xact_rollback | 0 - Subscriber postgres=# alter subscription sub disable; ALTER SUBSCRIPTION - Publisher postgres=# select xact_commit, xact_rollback from pg_stat_database where datname = 'postgres'; -[ RECORD 1 ]-+--- xact_commit | 36 xact_rollback | 10 What seems to be happening is that the amount of transactions decoded by the walsender are being added in pg_stat_database.xact_rollback. But these changes are only flushed to global stats when the walsender gets terminated. On a quick look look at the source I would suspect that the issue starts here: https://github.com/postgres/postgres/blob/master/src/backend/replication/logical/reorderbuffer.c#L2545 All decoded transactions are aborted for cleanup purposes. Following the source code flow after calling AbortCurrentTransaction() we eventually reach the part that increments rollback stats here: https://github.com/postgres/postgres/blob/master/src/backend/utils/activity/pgstat_database.c#L249 This is causing inconsistency in monitoring TPS metric of a database where we eventually see sudden spikes of TPS in the order of millions. Regards, Rafael Castro. --00000000000056410c061ae0f367 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Column xact_rollback from= pg_stat_database gets inconsistently incremented when logical replication = is being used (on publisher side).

This can be easi= ly reproduced in latest code from master branch:

- Publisher

postgres=3D# select xact_co= mmit, xact_rollback from pg_stat_database where datname =3D 'postgres&#= 39;;

-[ RECORD 1 ]-+---

xact_commit =C2=A0 | 20

xact_rollback | 0


<= /p>

postgres=3D# insert into t1 values (1);

=

INSERT 0 1

postgres=3D# inse= rt into t1 values (2);

INSERT 0 1

postgres=3D# insert into t1 values (3);

INSERT 0 1

postgres= =3D# insert into t1 values (4);

INSERT 0 1<= /font>

postgres=3D# insert into t1 values (5);

INSERT 0 1

pos= tgres=3D# insert into t1 values (6);

INSERT= 0 1

postgres=3D# insert into t1 values (7)= ;

INSERT 0 1

I= NSERT 0 1

postgres=3D# insert into t1 valu= es (9);

INSERT 0 1

postgres=3D# insert into t1 values (10);

INSERT 0 1

<= font color=3D"#000000" face=3D"arial, sans-serif">

postgres=3D# select xact_commit, xact_rollback from pg_stat_dat= abase where datname =3D 'postgres';

-[ RECORD 1 ]-+---

xact_commit =C2=A0 | 33

xact_roll= back | 0

<= br>

- Subscriber

<= font color=3D"#000000" face=3D"arial, sans-serif">

postg= res=3D# alter subscription sub disable;

ALTER SUBSCRIPTION


- Publisher


postgres=3D# select xact_commit, xact_rollback from pg_stat_= database where datname =3D 'postgres';

-[ RECORD 1 ]-+---

xact_commit =C2=A0 | 36

xact_rollb= ack | 10

What seems to be happe= ning is that the amount of transactions decoded by the walsender are being = added in pg_stat_database.xact_rollback. But these changes are only flushed= to global stats when the walsender gets terminated.

On a quick look look at the source I would suspect that the issue starts= here:=C2=A0https://github.c= om/postgres/postgres/blob/master/src/backend/replication/logical/reorderbuf= fer.c#L2545

All decoded transactions are abort= ed for cleanup purposes. Following the source code flow after calling=C2=A0= AbortCurrentTransaction() we eventually reach the part that increments roll= back stats here:=C2=A0https://github.co= m/postgres/postgres/blob/master/src/backend/utils/activity/pgstat_database.= c#L249

This is causing inconsistency in monito= ring TPS metric of a database where we eventually see sudden spikes of TPS = in the order of millions.

Regards,

<= /div>
Rafael Castro.
--00000000000056410c061ae0f367--