public inbox for [email protected]
help / color / mirror / Atom feedBUG #19439: pg_stat_xact_user_tables stat not currect during the transaction
8+ messages / 3 participants
[nested] [flat]
* BUG #19439: pg_stat_xact_user_tables stat not currect during the transaction
@ 2026-03-25 14:06 PG Bug reporting form <[email protected]>
0 siblings, 1 reply; 8+ messages in thread
From: PG Bug reporting form @ 2026-03-25 14:06 UTC (permalink / raw)
To: [email protected]; +Cc: [email protected]
The following bug has been logged on the website:
Bug reference: 19439
Logged by: klemen kobau
Email address: [email protected]
PostgreSQL version: 18.0
Operating system: Linux (EndeavorOS)
Description:
I am running postgres:18.0 in a docker container, the configuration is as
follows:
postgres:
image: postgres:18.0
command: [
"postgres",
"-N", "200",
"-c", "max_prepared_transactions=100",
"-c", "wal_level=logical",
"-c", "max_wal_senders=10",
"-c", "max_replication_slots=20",
"-c", "wal_keep_size=10",
"-c", "max_slot_wal_keep_size=1024"
]
environment:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
POSTGRES_DB: postgres
TZ: UTC
PGTZ: UTC
ports:
- 5432:5432
volumes:
- postgres_data:/var/lib/postgresql
I use psql version 18.3.
I run the following:
psql -h localhost -p 5432 -U postgres -d postgres -c "
-- Transaction 1: insert 1 row, check stats, commit
BEGIN;
CREATE TABLE IF NOT EXISTS temp.xact_test (id serial PRIMARY KEY, val text);
INSERT INTO temp.xact_test (val) VALUES ('a');
SELECT relname, n_tup_ins FROM pg_stat_xact_user_tables WHERE relname =
'xact_test';
COMMIT;
-- Transaction 2: insert 1 row, check stats
BEGIN;
INSERT INTO temp.xact_test (val) VALUES ('b');
SELECT relname, n_tup_ins FROM pg_stat_xact_user_tables WHERE relname =
'xact_test';
ROLLBACK;
-- Cleanup
DROP TABLE temp.xact_test;
"
And the output is
BEGIN
CREATE TABLE
INSERT 0 1
relname | n_tup_ins
-----------+-----------
xact_test | 1
(1 row)
COMMIT
BEGIN
INSERT 0 1
relname | n_tup_ins
-----------+-----------
xact_test | 2
(1 row)
ROLLBACK
DROP TABLE
This is not what I would expect from reading
https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-VIEWS,
where it states
> A transaction can also see its own statistics (not yet flushed out to the
shared memory statistics) in the views pg_stat_xact_all_tables,
pg_stat_xact_sys_tables, pg_stat_xact_user_tables, and
pg_stat_xact_user_functions. These numbers do not act as stated above;
instead they update continuously throughout the transaction.
based on this, I would expect that the numbers are updated each time an
insert happens and that they are isolated per transaction.
Kind regards
Klemen Kobau
^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: BUG #19439: pg_stat_xact_user_tables stat not currect during the transaction
@ 2026-03-27 02:15 Xuneng Zhou <[email protected]>
parent: PG Bug reporting form <[email protected]>
0 siblings, 1 reply; 8+ messages in thread
From: Xuneng Zhou @ 2026-03-27 02:15 UTC (permalink / raw)
To: [email protected]; [email protected]
Hi klemen,
Thanks for the report.
On Fri, Mar 27, 2026 at 5:36 AM PG Bug reporting form
<[email protected]> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference: 19439
> Logged by: klemen kobau
> Email address: [email protected]
> PostgreSQL version: 18.0
> Operating system: Linux (EndeavorOS)
> Description:
>
> I am running postgres:18.0 in a docker container, the configuration is as
> follows:
>
> postgres:
> image: postgres:18.0
> command: [
> "postgres",
> "-N", "200",
> "-c", "max_prepared_transactions=100",
> "-c", "wal_level=logical",
> "-c", "max_wal_senders=10",
> "-c", "max_replication_slots=20",
> "-c", "wal_keep_size=10",
> "-c", "max_slot_wal_keep_size=1024"
> ]
> environment:
> POSTGRES_USER: postgres
> POSTGRES_PASSWORD: postgres
> POSTGRES_DB: postgres
> TZ: UTC
> PGTZ: UTC
> ports:
> - 5432:5432
> volumes:
> - postgres_data:/var/lib/postgresql
>
> I use psql version 18.3.
>
> I run the following:
>
> psql -h localhost -p 5432 -U postgres -d postgres -c "
> -- Transaction 1: insert 1 row, check stats, commit
> BEGIN;
> CREATE TABLE IF NOT EXISTS temp.xact_test (id serial PRIMARY KEY, val text);
> INSERT INTO temp.xact_test (val) VALUES ('a');
> SELECT relname, n_tup_ins FROM pg_stat_xact_user_tables WHERE relname =
> 'xact_test';
> COMMIT;
>
> -- Transaction 2: insert 1 row, check stats
> BEGIN;
> INSERT INTO temp.xact_test (val) VALUES ('b');
> SELECT relname, n_tup_ins FROM pg_stat_xact_user_tables WHERE relname =
> 'xact_test';
> ROLLBACK;
>
> -- Cleanup
> DROP TABLE temp.xact_test;
> "
>
> And the output is
>
> BEGIN
> CREATE TABLE
> INSERT 0 1
> relname | n_tup_ins
> -----------+-----------
> xact_test | 1
> (1 row)
>
> COMMIT
> BEGIN
> INSERT 0 1
> relname | n_tup_ins
> -----------+-----------
> xact_test | 2
> (1 row)
>
> ROLLBACK
> DROP TABLE
I can also reproduce this behavior on HEAD.
> This is not what I would expect from reading
> https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-VIEWS,
> where it states
>
> > A transaction can also see its own statistics (not yet flushed out to the
> shared memory statistics) in the views pg_stat_xact_all_tables,
> pg_stat_xact_sys_tables, pg_stat_xact_user_tables, and
> pg_stat_xact_user_functions. These numbers do not act as stated above;
> instead they update continuously throughout the transaction.
>
> based on this, I would expect that the numbers are updated each time an
> insert happens and that they are isolated per transaction.
>
This snippet of the doc feels somewhat ambiguous. The current behavior
does not seem fully consistent with wording such as “throughout the
transaction.” It also seems more appropriate for these table views to
reflect stats within their own scopes, rather than cumulative values
spanning txn boundaries.
--
Best,
Xuneng
^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: BUG #19439: pg_stat_xact_user_tables stat not currect during the transaction
@ 2026-03-28 05:47 Xuneng Zhou <[email protected]>
parent: Xuneng Zhou <[email protected]>
0 siblings, 1 reply; 8+ messages in thread
From: Xuneng Zhou @ 2026-03-28 05:47 UTC (permalink / raw)
To: [email protected]; [email protected]
On Fri, Mar 27, 2026 at 10:15 AM Xuneng Zhou <[email protected]> wrote:
>
> Hi klemen,
>
> Thanks for the report.
>
> On Fri, Mar 27, 2026 at 5:36 AM PG Bug reporting form
> <[email protected]> wrote:
> >
> > The following bug has been logged on the website:
> >
> > Bug reference: 19439
> > Logged by: klemen kobau
> > Email address: [email protected]
> > PostgreSQL version: 18.0
> > Operating system: Linux (EndeavorOS)
> > Description:
> >
> > I am running postgres:18.0 in a docker container, the configuration is as
> > follows:
> >
> > postgres:
> > image: postgres:18.0
> > command: [
> > "postgres",
> > "-N", "200",
> > "-c", "max_prepared_transactions=100",
> > "-c", "wal_level=logical",
> > "-c", "max_wal_senders=10",
> > "-c", "max_replication_slots=20",
> > "-c", "wal_keep_size=10",
> > "-c", "max_slot_wal_keep_size=1024"
> > ]
> > environment:
> > POSTGRES_USER: postgres
> > POSTGRES_PASSWORD: postgres
> > POSTGRES_DB: postgres
> > TZ: UTC
> > PGTZ: UTC
> > ports:
> > - 5432:5432
> > volumes:
> > - postgres_data:/var/lib/postgresql
> >
> > I use psql version 18.3.
> >
> > I run the following:
> >
> > psql -h localhost -p 5432 -U postgres -d postgres -c "
> > -- Transaction 1: insert 1 row, check stats, commit
> > BEGIN;
> > CREATE TABLE IF NOT EXISTS temp.xact_test (id serial PRIMARY KEY, val text);
> > INSERT INTO temp.xact_test (val) VALUES ('a');
> > SELECT relname, n_tup_ins FROM pg_stat_xact_user_tables WHERE relname =
> > 'xact_test';
> > COMMIT;
> >
> > -- Transaction 2: insert 1 row, check stats
> > BEGIN;
> > INSERT INTO temp.xact_test (val) VALUES ('b');
> > SELECT relname, n_tup_ins FROM pg_stat_xact_user_tables WHERE relname =
> > 'xact_test';
> > ROLLBACK;
> >
> > -- Cleanup
> > DROP TABLE temp.xact_test;
> > "
> >
> > And the output is
> >
> > BEGIN
> > CREATE TABLE
> > INSERT 0 1
> > relname | n_tup_ins
> > -----------+-----------
> > xact_test | 1
> > (1 row)
> >
> > COMMIT
> > BEGIN
> > INSERT 0 1
> > relname | n_tup_ins
> > -----------+-----------
> > xact_test | 2
> > (1 row)
> >
> > ROLLBACK
> > DROP TABLE
>
> I can also reproduce this behavior on HEAD.
>
> > This is not what I would expect from reading
> > https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-VIEWS,
> > where it states
> >
> > > A transaction can also see its own statistics (not yet flushed out to the
> > shared memory statistics) in the views pg_stat_xact_all_tables,
> > pg_stat_xact_sys_tables, pg_stat_xact_user_tables, and
> > pg_stat_xact_user_functions. These numbers do not act as stated above;
> > instead they update continuously throughout the transaction.
> >
> > based on this, I would expect that the numbers are updated each time an
> > insert happens and that they are isolated per transaction.
> >
>
> This snippet of the doc feels somewhat ambiguous. The current behavior
> does not seem fully consistent with wording such as “throughout the
> transaction.” It also seems more appropriate for these table views to
> reflect stats within their own scopes, rather than cumulative values
> spanning txn boundaries.
>
I’ve looked into this issue and 'd like to propose a patch to address
it. Feedback is very welcome.
--- Root cause
Since the stats subsystem was rewritten (commit 5891c7a8), each
backend keeps a hash of pending stats entries that persist across
transaction boundaries. Entries that flush successfully are deleted
from the backend-local pending list; entries not flushed remain
pending with their accumulated counters. Flushing is driven by
pgstat_report_stat(), called from the ReadyForQuery path, and subject
to a rate limiter.
The pg_stat_xact_* views read these pending entries directly via
find_tabstat_entry() / find_funcstat_entry(). Both functions return
the raw accumulated counters without per-transaction scoping.
The bug is deterministic when multiple top-level transactions are
processed within a single simple-query message, because there is no
ReadyForQuery boundary between the transactions and therefore no
opportunity to flush and remove the pending entry.
--- Fix
The patch introduces a per-entry "transaction baseline" — a lazy
snapshot of the counters taken the first time an entry is touched in
each new top-level transaction. The accessor functions
(find_tabstat_entry, find_funcstat_entry) then subtract the baseline
from the current counters, yielding only the current transaction's
delta.
The baseline is keyed by MyProc->vxid.lxid. A static inline helper,
pgstat_ensure_xact_baseline(), is called at every nontransactional
counter-increment site (the 7 event-counter macros in pgstat.h and the
4 non-inline counter functions in pgstat_relation.c). After the first
call per entry per transaction, the check reduces to a single integer
comparison with a predictably not-taken branch.
For function stats, a new PgStat_FunctionPending wrapper struct embeds
PgStat_FunctionCounts at offset 0 with the baseline fields appended,
so the flush callback requires only a trivial cast change.
--- Testing
The regression test is a TAP test under src/bin/psql/t/ that uses psql
-c to send multi-statement strings as single simple-query messages.
src/bin/psql/t look like the right existing harness for the primary
regression because psql -c sends the whole multi-statement string as
one simple-query message via simple_action_list dispatch. By contrast,
ordinary pg_regress SQL files are executed by psql in file/stdin mode,
which runs through MainLoop() and dispatches top-level statements one
at a time via SendQuery(), creating ReadyForQuery boundaries that
could mask the bug.
The test covers three scenarios: table n_tup_ins and seq_scan
counters, and function calls — each verifying isolation across
consecutive top-level transactions within a single message.
--
Best,
Xuneng
Attachments:
[application/octet-stream] v1-0001-Fix-pg_stat_xact_-views-leaking-stats-across-tran.patch (17.3K, 2-v1-0001-Fix-pg_stat_xact_-views-leaking-stats-across-tran.patch)
download | inline diff:
From a8391d40266eaff6be24a2feb38eb849d56c2c4f Mon Sep 17 00:00:00 2001
From: alterego655 <[email protected]>
Date: Fri, 27 Mar 2026 20:05:56 +0800
Subject: [PATCH v1] Fix pg_stat_xact_* views leaking stats across transaction
boundaries
The pg_stat_xact_user_tables and pg_stat_xact_user_functions views are
supposed to report activity for the current transaction only. However,
because backend-local pending stats entries are now kept alive across
transactions for asynchronous flushing, these views returned the total
accumulated pending counters instead of just the current transaction's
contribution.
Fix by introducing a per-entry "transaction baseline" that lazily
snapshots the pending counters the first time each entry is touched in a
new top-level transaction. The xact-scoped accessor functions then
subtract the baseline, yielding only the current transaction's delta.
The baseline is keyed by MyProc->vxid.lxid and established via a static
inline helper, pgstat_ensure_xact_baseline(), called at every
nontransactional counter-increment site. After the first call per entry
per transaction, the check reduces to a single integer comparison.
For function stats, a new PgStat_FunctionPending wrapper struct embeds
the existing PgStat_FunctionCounts at offset 0, so the flush callback
requires only a trivial cast change.
---
src/backend/utils/activity/pgstat.c | 2 +-
src/backend/utils/activity/pgstat_function.c | 47 +++++--
src/backend/utils/activity/pgstat_relation.c | 36 ++++++
src/bin/psql/meson.build | 1 +
src/bin/psql/t/040_pgstat_xact.pl | 121 +++++++++++++++++++
src/include/pgstat.h | 51 ++++++++
6 files changed, 249 insertions(+), 9 deletions(-)
create mode 100644 src/bin/psql/t/040_pgstat_xact.pl
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index eb8ccbaa628..730d474c216 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -326,7 +326,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.shared_size = sizeof(PgStatShared_Function),
.shared_data_off = offsetof(PgStatShared_Function, stats),
.shared_data_len = sizeof(((PgStatShared_Function *) 0)->stats),
- .pending_size = sizeof(PgStat_FunctionCounts),
+ .pending_size = sizeof(PgStat_FunctionPending),
.flush_pending_cb = pgstat_function_flush_cb,
.reset_timestamp_cb = pgstat_function_reset_timestamp_cb,
diff --git a/src/backend/utils/activity/pgstat_function.c b/src/backend/utils/activity/pgstat_function.c
index e6b84283c6c..a8f1354d1ee 100644
--- a/src/backend/utils/activity/pgstat_function.c
+++ b/src/backend/utils/activity/pgstat_function.c
@@ -18,6 +18,7 @@
#include "postgres.h"
#include "fmgr.h"
+#include "storage/proc.h"
#include "utils/inval.h"
#include "utils/pgstat_internal.h"
#include "utils/syscache.h"
@@ -73,8 +74,9 @@ pgstat_init_function_usage(FunctionCallInfo fcinfo,
PgStat_FunctionCallUsage *fcu)
{
PgStat_EntryRef *entry_ref;
- PgStat_FunctionCounts *pending;
+ PgStat_FunctionPending *fpending;
bool created_entry;
+ LocalTransactionId curlxid = MyProc->vxid.lxid;
if (pgstat_track_functions <= fcinfo->flinfo->fn_stats)
{
@@ -119,12 +121,19 @@ pgstat_init_function_usage(FunctionCallInfo fcinfo,
}
}
- pending = entry_ref->pending;
+ fpending = (PgStat_FunctionPending *) entry_ref->pending;
- fcu->fs = pending;
+ /* Establish baseline on first call in this top-level transaction */
+ if (fpending->baseline_lxid != curlxid)
+ {
+ fpending->xact_baseline = fpending->counts;
+ fpending->baseline_lxid = curlxid;
+ }
+
+ fcu->fs = &fpending->counts;
/* save stats for this function, later used to compensate for recursion */
- fcu->save_f_total_time = pending->total_time;
+ fcu->save_f_total_time = fpending->counts.total_time;
/* save current backend-wide total time */
fcu->save_total = total_func_time;
@@ -192,10 +201,12 @@ pgstat_end_function_usage(PgStat_FunctionCallUsage *fcu, bool finalize)
bool
pgstat_function_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
{
+ PgStat_FunctionPending *fpending;
PgStat_FunctionCounts *localent;
PgStatShared_Function *shfuncent;
- localent = (PgStat_FunctionCounts *) entry_ref->pending;
+ fpending = (PgStat_FunctionPending *) entry_ref->pending;
+ localent = &fpending->counts;
shfuncent = (PgStatShared_Function *) entry_ref->shared_stats;
/* localent always has non-zero content */
@@ -229,12 +240,32 @@ PgStat_FunctionCounts *
find_funcstat_entry(Oid func_id)
{
PgStat_EntryRef *entry_ref;
+ PgStat_FunctionPending *fpending;
+ PgStat_FunctionCounts *result;
entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId, func_id);
- if (entry_ref)
- return entry_ref->pending;
- return NULL;
+ if (!entry_ref)
+ return NULL;
+
+ fpending = (PgStat_FunctionPending *) entry_ref->pending;
+
+ /* Not called in this transaction — preserve NULL semantics */
+ if (fpending->baseline_lxid != MyProc->vxid.lxid)
+ return NULL;
+
+ result = palloc(sizeof(PgStat_FunctionCounts));
+ result->numcalls = fpending->counts.numcalls
+ - fpending->xact_baseline.numcalls;
+ INSTR_TIME_SET_ZERO(result->total_time);
+ INSTR_TIME_ACCUM_DIFF(result->total_time,
+ fpending->counts.total_time,
+ fpending->xact_baseline.total_time);
+ INSTR_TIME_SET_ZERO(result->self_time);
+ INSTR_TIME_ACCUM_DIFF(result->self_time,
+ fpending->counts.self_time,
+ fpending->xact_baseline.self_time);
+ return result;
}
/*
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index bc8c43b96aa..f814ea69797 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -20,6 +20,7 @@
#include "access/twophase_rmgr.h"
#include "access/xact.h"
#include "catalog/catalog.h"
+#include "storage/proc.h"
#include "utils/memutils.h"
#include "utils/pgstat_internal.h"
#include "utils/rel.h"
@@ -376,6 +377,7 @@ pgstat_count_heap_insert(Relation rel, PgStat_Counter n)
{
PgStat_TableStatus *pgstat_info = rel->pgstat_info;
+ pgstat_ensure_xact_baseline(pgstat_info);
ensure_tabstat_xact_level(pgstat_info);
pgstat_info->trans->tuples_inserted += n;
}
@@ -393,6 +395,7 @@ pgstat_count_heap_update(Relation rel, bool hot, bool newpage)
{
PgStat_TableStatus *pgstat_info = rel->pgstat_info;
+ pgstat_ensure_xact_baseline(pgstat_info);
ensure_tabstat_xact_level(pgstat_info);
pgstat_info->trans->tuples_updated++;
@@ -417,6 +420,7 @@ pgstat_count_heap_delete(Relation rel)
{
PgStat_TableStatus *pgstat_info = rel->pgstat_info;
+ pgstat_ensure_xact_baseline(pgstat_info);
ensure_tabstat_xact_level(pgstat_info);
pgstat_info->trans->tuples_deleted++;
}
@@ -432,6 +436,7 @@ pgstat_count_truncate(Relation rel)
{
PgStat_TableStatus *pgstat_info = rel->pgstat_info;
+ pgstat_ensure_xact_baseline(pgstat_info);
ensure_tabstat_xact_level(pgstat_info);
save_truncdrop_counters(pgstat_info->trans, false);
pgstat_info->trans->tuples_inserted = 0;
@@ -523,6 +528,33 @@ find_tabstat_entry(Oid rel_id)
*/
tablestatus->trans = NULL;
+ /*
+ * Adjust counters to show only the current top-level transaction's
+ * activity. If the baseline was set in the current transaction, subtract
+ * it to remove prior-transaction data. Otherwise, the entry was not
+ * touched in the current transaction, so all counters are zero.
+ */
+ if (tabentry->baseline_lxid == MyProc->vxid.lxid)
+ {
+ PgStat_TableCounts *base = &tabentry->xact_baseline;
+
+ tablestatus->counts.numscans -= base->numscans;
+ tablestatus->counts.tuples_returned -= base->tuples_returned;
+ tablestatus->counts.tuples_fetched -= base->tuples_fetched;
+ tablestatus->counts.tuples_inserted -= base->tuples_inserted;
+ tablestatus->counts.tuples_updated -= base->tuples_updated;
+ tablestatus->counts.tuples_deleted -= base->tuples_deleted;
+ tablestatus->counts.tuples_hot_updated -= base->tuples_hot_updated;
+ tablestatus->counts.tuples_newpage_updated -= base->tuples_newpage_updated;
+ tablestatus->counts.blocks_fetched -= base->blocks_fetched;
+ tablestatus->counts.blocks_hit -= base->blocks_hit;
+ }
+ else
+ {
+ /* Entry not touched in this transaction; all xact counters are zero */
+ memset(&tablestatus->counts, 0, sizeof(PgStat_TableCounts));
+ }
+
/*
* Live subtransaction counts are not included yet. This is not a hot
* code path so reconcile tuples_inserted, tuples_updated and
@@ -752,6 +784,8 @@ pgstat_twophase_postcommit(FullTransactionId fxid, uint16 info,
/* Find or create a tabstat entry for the rel */
pgstat_info = pgstat_prep_relation_pending(rec->id, rec->shared);
+ pgstat_ensure_xact_baseline(pgstat_info);
+
/* Same math as in AtEOXact_PgStat, commit case */
pgstat_info->counts.tuples_inserted += rec->tuples_inserted;
pgstat_info->counts.tuples_updated += rec->tuples_updated;
@@ -788,6 +822,8 @@ pgstat_twophase_postabort(FullTransactionId fxid, uint16 info,
/* Find or create a tabstat entry for the rel */
pgstat_info = pgstat_prep_relation_pending(rec->id, rec->shared);
+ pgstat_ensure_xact_baseline(pgstat_info);
+
/* Same math as in AtEOXact_PgStat, abort case */
if (rec->truncdropped)
{
diff --git a/src/bin/psql/meson.build b/src/bin/psql/meson.build
index 922b2845267..25fbb817f1d 100644
--- a/src/bin/psql/meson.build
+++ b/src/bin/psql/meson.build
@@ -78,6 +78,7 @@ tests += {
't/010_tab_completion.pl',
't/020_cancel.pl',
't/030_pager.pl',
+ 't/040_pgstat_xact.pl',
],
},
}
diff --git a/src/bin/psql/t/040_pgstat_xact.pl b/src/bin/psql/t/040_pgstat_xact.pl
new file mode 100644
index 00000000000..d53a7424c4c
--- /dev/null
+++ b/src/bin/psql/t/040_pgstat_xact.pl
@@ -0,0 +1,121 @@
+# Copyright (c) 2021-2026, PostgreSQL Global Development Group
+
+# Test that pg_stat_xact_* views report only current-transaction activity,
+# not accumulated pending stats from prior transactions.
+#
+# This test uses "psql -c" to send multi-statement strings as a single
+# simple-query protocol message. This ensures there is no ReadyForQuery
+# boundary (and therefore no pgstat_report_stat() opportunity) between
+# consecutive top-level transactions within the same message, which is the
+# condition required to deterministically reproduce the cross-transaction
+# stats leakage bug.
+
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my $node = PostgreSQL::Test::Cluster->new('primary');
+$node->init;
+$node->append_conf('postgresql.conf', "track_functions = 'all'");
+$node->start;
+
+my $dbname = 'postgres';
+
+# Create test objects.
+$node->safe_psql(
+ $dbname, q{
+ CREATE TABLE test_xact_stats (x int);
+ CREATE FUNCTION test_xact_stats_func() RETURNS void
+ LANGUAGE plpgsql AS $$ BEGIN NULL; END; $$;
+});
+
+# Like psql_like in 001_basic.pl, but sends the SQL via "psql -c" so
+# that all statements go as a single simple-query message.
+sub psql_c_like
+{
+ local $Test::Builder::Level = $Test::Builder::Level + 1;
+
+ my ($node, $sql, $expected_stdout, $test_name) = @_;
+
+ my ($stdout, $stderr);
+ my @cmd = (
+ $node->installed_command('psql'),
+ '--no-psqlrc', '--no-align', '--tuples-only', '--quiet',
+ '-d', $node->connstr($dbname),
+ '-c', $sql);
+ my $result = IPC::Run::run \@cmd, '>', \$stdout, '2>', \$stderr;
+
+ is($result, 1, "$test_name: exit code 0");
+ is($stderr, '', "$test_name: no stderr");
+ like($stdout, $expected_stdout, "$test_name: matches");
+
+ return;
+}
+
+
+##
+## Test 1: Table n_tup_ins must not leak across top-level transactions.
+##
+## Two consecutive transactions each insert one row. Without the fix,
+## pg_stat_xact_user_tables in the second transaction would show
+## n_tup_ins = 2 (accumulated) instead of 1.
+##
+psql_c_like(
+ $node,
+ q{BEGIN;
+ INSERT INTO test_xact_stats VALUES (1);
+ COMMIT;
+ BEGIN;
+ INSERT INTO test_xact_stats VALUES (2);
+ SELECT n_tup_ins FROM pg_stat_xact_user_tables
+ WHERE relname = 'test_xact_stats';
+ COMMIT;},
+ qr/^1$/m,
+ "table n_tup_ins shows only current transaction's inserts");
+
+
+##
+## Test 2: seq_scan counter must not leak across top-level transactions.
+##
+psql_c_like(
+ $node,
+ q{BEGIN;
+ SELECT count(*) FROM test_xact_stats;
+ COMMIT;
+ BEGIN;
+ SELECT count(*) FROM test_xact_stats;
+ SELECT seq_scan FROM pg_stat_xact_user_tables
+ WHERE relname = 'test_xact_stats';
+ COMMIT;},
+ qr/1$/m,
+ "table seq_scan shows only current transaction's scans");
+
+
+##
+## Test 3: Function call stats must not leak across top-level transactions.
+##
+psql_c_like(
+ $node,
+ q{BEGIN;
+ SELECT test_xact_stats_func();
+ SELECT test_xact_stats_func();
+ COMMIT;
+ BEGIN;
+ SELECT test_xact_stats_func();
+ SELECT calls FROM pg_stat_xact_user_functions
+ WHERE funcname = 'test_xact_stats_func';
+ COMMIT;},
+ qr/^1$/m,
+ "function calls shows only current transaction's calls");
+
+# Cleanup
+$node->safe_psql(
+ $dbname, q{
+ DROP TABLE test_xact_stats;
+ DROP FUNCTION test_xact_stats_func();
+});
+
+$node->stop;
+done_testing();
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 8e3549c3752..e2776e7630c 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -16,6 +16,7 @@
#include "postmaster/pgarch.h" /* for MAX_XFN_CHARS */
#include "replication/conflict.h"
#include "storage/locktag.h"
+#include "storage/proc.h"
#include "utils/backend_progress.h" /* for backward compatibility */ /* IWYU pragma: export */
#include "utils/backend_status.h" /* for backward compatibility */ /* IWYU pragma: export */
#include "utils/pgstat_kind.h"
@@ -90,6 +91,17 @@ typedef struct PgStat_FunctionCounts
instr_time self_time;
} PgStat_FunctionCounts;
+/*
+ * Backend-local pending state for function stats, wrapping the accumulated
+ * counts with a per-transaction baseline for pg_stat_xact_* isolation.
+ */
+typedef struct PgStat_FunctionPending
+{
+ PgStat_FunctionCounts counts; /* accumulated counts */
+ PgStat_FunctionCounts xact_baseline; /* snapshot at top-level txn start */
+ LocalTransactionId baseline_lxid; /* lxid when baseline was set */
+} PgStat_FunctionPending;
+
/*
* Working state needed to accumulate per-function-call timing statistics.
*/
@@ -183,6 +195,8 @@ typedef struct PgStat_TableStatus
bool shared; /* is it a shared catalog? */
struct PgStat_TableXactStatus *trans; /* lowest subxact's counts */
PgStat_TableCounts counts; /* event counts to be sent */
+ PgStat_TableCounts xact_baseline; /* snapshot at top-level txn start */
+ LocalTransactionId baseline_lxid; /* lxid when baseline was set */
Relation relation; /* rel that is using this entry */
} PgStat_TableStatus;
@@ -721,39 +735,76 @@ extern void pgstat_report_analyze(Relation rel,
#define pgstat_count_heap_scan(rel) \
do { \
if (pgstat_should_count_relation(rel)) \
+ { \
+ pgstat_ensure_xact_baseline((rel)->pgstat_info); \
(rel)->pgstat_info->counts.numscans++; \
+ } \
} while (0)
#define pgstat_count_heap_getnext(rel) \
do { \
if (pgstat_should_count_relation(rel)) \
+ { \
+ pgstat_ensure_xact_baseline((rel)->pgstat_info); \
(rel)->pgstat_info->counts.tuples_returned++; \
+ } \
} while (0)
#define pgstat_count_heap_fetch(rel) \
do { \
if (pgstat_should_count_relation(rel)) \
+ { \
+ pgstat_ensure_xact_baseline((rel)->pgstat_info); \
(rel)->pgstat_info->counts.tuples_fetched++; \
+ } \
} while (0)
#define pgstat_count_index_scan(rel) \
do { \
if (pgstat_should_count_relation(rel)) \
+ { \
+ pgstat_ensure_xact_baseline((rel)->pgstat_info); \
(rel)->pgstat_info->counts.numscans++; \
+ } \
} while (0)
#define pgstat_count_index_tuples(rel, n) \
do { \
if (pgstat_should_count_relation(rel)) \
+ { \
+ pgstat_ensure_xact_baseline((rel)->pgstat_info); \
(rel)->pgstat_info->counts.tuples_returned += (n); \
+ } \
} while (0)
#define pgstat_count_buffer_read(rel) \
do { \
if (pgstat_should_count_relation(rel)) \
+ { \
+ pgstat_ensure_xact_baseline((rel)->pgstat_info); \
(rel)->pgstat_info->counts.blocks_fetched++; \
+ } \
} while (0)
#define pgstat_count_buffer_hit(rel) \
do { \
if (pgstat_should_count_relation(rel)) \
+ { \
+ pgstat_ensure_xact_baseline((rel)->pgstat_info); \
(rel)->pgstat_info->counts.blocks_hit++; \
+ } \
} while (0)
+/*
+ * Ensure the xact baseline is current for this top-level transaction.
+ * Must be called before any counter in PgStat_TableStatus.counts is modified.
+ */
+static inline void
+pgstat_ensure_xact_baseline(PgStat_TableStatus *pgstat_info)
+{
+ LocalTransactionId curlxid = MyProc->vxid.lxid;
+
+ if (unlikely(pgstat_info->baseline_lxid != curlxid))
+ {
+ pgstat_info->xact_baseline = pgstat_info->counts;
+ pgstat_info->baseline_lxid = curlxid;
+ }
+}
+
extern void pgstat_count_heap_insert(Relation rel, PgStat_Counter n);
extern void pgstat_count_heap_update(Relation rel, bool hot, bool newpage);
extern void pgstat_count_heap_delete(Relation rel);
--
2.51.0
^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: BUG #19439: pg_stat_xact_user_tables stat not currect during the transaction
@ 2026-03-29 03:17 Xuneng Zhou <[email protected]>
parent: Xuneng Zhou <[email protected]>
0 siblings, 1 reply; 8+ messages in thread
From: Xuneng Zhou @ 2026-03-29 03:17 UTC (permalink / raw)
To: [email protected]; [email protected]
On Sat, Mar 28, 2026 at 1:47 PM Xuneng Zhou <[email protected]> wrote:
>
> On Fri, Mar 27, 2026 at 10:15 AM Xuneng Zhou <[email protected]> wrote:
> >
> > Hi klemen,
> >
> > Thanks for the report.
> >
> > On Fri, Mar 27, 2026 at 5:36 AM PG Bug reporting form
> > <[email protected]> wrote:
> > >
> > > The following bug has been logged on the website:
> > >
> > > Bug reference: 19439
> > > Logged by: klemen kobau
> > > Email address: [email protected]
> > > PostgreSQL version: 18.0
> > > Operating system: Linux (EndeavorOS)
> > > Description:
> > >
> > > I am running postgres:18.0 in a docker container, the configuration is as
> > > follows:
> > >
> > > postgres:
> > > image: postgres:18.0
> > > command: [
> > > "postgres",
> > > "-N", "200",
> > > "-c", "max_prepared_transactions=100",
> > > "-c", "wal_level=logical",
> > > "-c", "max_wal_senders=10",
> > > "-c", "max_replication_slots=20",
> > > "-c", "wal_keep_size=10",
> > > "-c", "max_slot_wal_keep_size=1024"
> > > ]
> > > environment:
> > > POSTGRES_USER: postgres
> > > POSTGRES_PASSWORD: postgres
> > > POSTGRES_DB: postgres
> > > TZ: UTC
> > > PGTZ: UTC
> > > ports:
> > > - 5432:5432
> > > volumes:
> > > - postgres_data:/var/lib/postgresql
> > >
> > > I use psql version 18.3.
> > >
> > > I run the following:
> > >
> > > psql -h localhost -p 5432 -U postgres -d postgres -c "
> > > -- Transaction 1: insert 1 row, check stats, commit
> > > BEGIN;
> > > CREATE TABLE IF NOT EXISTS temp.xact_test (id serial PRIMARY KEY, val text);
> > > INSERT INTO temp.xact_test (val) VALUES ('a');
> > > SELECT relname, n_tup_ins FROM pg_stat_xact_user_tables WHERE relname =
> > > 'xact_test';
> > > COMMIT;
> > >
> > > -- Transaction 2: insert 1 row, check stats
> > > BEGIN;
> > > INSERT INTO temp.xact_test (val) VALUES ('b');
> > > SELECT relname, n_tup_ins FROM pg_stat_xact_user_tables WHERE relname =
> > > 'xact_test';
> > > ROLLBACK;
> > >
> > > -- Cleanup
> > > DROP TABLE temp.xact_test;
> > > "
> > >
> > > And the output is
> > >
> > > BEGIN
> > > CREATE TABLE
> > > INSERT 0 1
> > > relname | n_tup_ins
> > > -----------+-----------
> > > xact_test | 1
> > > (1 row)
> > >
> > > COMMIT
> > > BEGIN
> > > INSERT 0 1
> > > relname | n_tup_ins
> > > -----------+-----------
> > > xact_test | 2
> > > (1 row)
> > >
> > > ROLLBACK
> > > DROP TABLE
> >
> > I can also reproduce this behavior on HEAD.
> >
> > > This is not what I would expect from reading
> > > https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-VIEWS,
> > > where it states
> > >
> > > > A transaction can also see its own statistics (not yet flushed out to the
> > > shared memory statistics) in the views pg_stat_xact_all_tables,
> > > pg_stat_xact_sys_tables, pg_stat_xact_user_tables, and
> > > pg_stat_xact_user_functions. These numbers do not act as stated above;
> > > instead they update continuously throughout the transaction.
> > >
> > > based on this, I would expect that the numbers are updated each time an
> > > insert happens and that they are isolated per transaction.
> > >
> >
> > This snippet of the doc feels somewhat ambiguous. The current behavior
> > does not seem fully consistent with wording such as “throughout the
> > transaction.” It also seems more appropriate for these table views to
> > reflect stats within their own scopes, rather than cumulative values
> > spanning txn boundaries.
> >
>
> I’ve looked into this issue and 'd like to propose a patch to address
> it. Feedback is very welcome.
>
> --- Root cause
>
> Since the stats subsystem was rewritten (commit 5891c7a8), each
> backend keeps a hash of pending stats entries that persist across
> transaction boundaries. Entries that flush successfully are deleted
> from the backend-local pending list; entries not flushed remain
> pending with their accumulated counters. Flushing is driven by
> pgstat_report_stat(), called from the ReadyForQuery path, and subject
> to a rate limiter.
>
> The pg_stat_xact_* views read these pending entries directly via
> find_tabstat_entry() / find_funcstat_entry(). Both functions return
> the raw accumulated counters without per-transaction scoping.
>
> The bug is deterministic when multiple top-level transactions are
> processed within a single simple-query message, because there is no
> ReadyForQuery boundary between the transactions and therefore no
> opportunity to flush and remove the pending entry.
>
> --- Fix
>
> The patch introduces a per-entry "transaction baseline" — a lazy
> snapshot of the counters taken the first time an entry is touched in
> each new top-level transaction. The accessor functions
> (find_tabstat_entry, find_funcstat_entry) then subtract the baseline
> from the current counters, yielding only the current transaction's
> delta.
>
> The baseline is keyed by MyProc->vxid.lxid. A static inline helper,
> pgstat_ensure_xact_baseline(), is called at every nontransactional
> counter-increment site (the 7 event-counter macros in pgstat.h and the
> 4 non-inline counter functions in pgstat_relation.c). After the first
> call per entry per transaction, the check reduces to a single integer
> comparison with a predictably not-taken branch.
>
> For function stats, a new PgStat_FunctionPending wrapper struct embeds
> PgStat_FunctionCounts at offset 0 with the baseline fields appended,
> so the flush callback requires only a trivial cast change.
>
> --- Testing
>
> The regression test is a TAP test under src/bin/psql/t/ that uses psql
> -c to send multi-statement strings as single simple-query messages.
> src/bin/psql/t look like the right existing harness for the primary
> regression because psql -c sends the whole multi-statement string as
> one simple-query message via simple_action_list dispatch. By contrast,
> ordinary pg_regress SQL files are executed by psql in file/stdin mode,
> which runs through MainLoop() and dispatches top-level statements one
> at a time via SendQuery(), creating ReadyForQuery boundaries that
> could mask the bug.
>
> The test covers three scenarios: table n_tup_ins and seq_scan
> counters, and function calls — each verifying isolation across
> consecutive top-level transactions within a single message.
>
postgres % cat output.txt
BEGIN
CREATE TABLE
INSERT 0 1
relname | n_tup_ins
-----------+-----------
xact_test | 1
(1 row)
COMMIT
BEGIN
INSERT 0 1
relname | n_tup_ins
-----------+-----------
xact_test | 2
(1 row)
ROLLBACK
DROP TABLE
I can also reproduce this behavior in v13. So this issue pre-existed
the major refactor 5891c7a8.
--
Best,
Xuneng
^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: BUG #19439: pg_stat_xact_user_tables stat not currect during the transaction
@ 2026-03-30 03:14 Xuneng Zhou <[email protected]>
parent: Xuneng Zhou <[email protected]>
0 siblings, 1 reply; 8+ messages in thread
From: Xuneng Zhou @ 2026-03-30 03:14 UTC (permalink / raw)
To: [email protected]; [email protected]
On Sun, Mar 29, 2026 at 11:17 AM Xuneng Zhou <[email protected]> wrote:
>
> On Sat, Mar 28, 2026 at 1:47 PM Xuneng Zhou <[email protected]> wrote:
> >
> > On Fri, Mar 27, 2026 at 10:15 AM Xuneng Zhou <[email protected]> wrote:
> > >
> > > Hi klemen,
> > >
> > > Thanks for the report.
> > >
> > > On Fri, Mar 27, 2026 at 5:36 AM PG Bug reporting form
> > > <[email protected]> wrote:
> > > >
> > > > The following bug has been logged on the website:
> > > >
> > > > Bug reference: 19439
> > > > Logged by: klemen kobau
> > > > Email address: [email protected]
> > > > PostgreSQL version: 18.0
> > > > Operating system: Linux (EndeavorOS)
> > > > Description:
> > > >
> > > > I am running postgres:18.0 in a docker container, the configuration is as
> > > > follows:
> > > >
> > > > postgres:
> > > > image: postgres:18.0
> > > > command: [
> > > > "postgres",
> > > > "-N", "200",
> > > > "-c", "max_prepared_transactions=100",
> > > > "-c", "wal_level=logical",
> > > > "-c", "max_wal_senders=10",
> > > > "-c", "max_replication_slots=20",
> > > > "-c", "wal_keep_size=10",
> > > > "-c", "max_slot_wal_keep_size=1024"
> > > > ]
> > > > environment:
> > > > POSTGRES_USER: postgres
> > > > POSTGRES_PASSWORD: postgres
> > > > POSTGRES_DB: postgres
> > > > TZ: UTC
> > > > PGTZ: UTC
> > > > ports:
> > > > - 5432:5432
> > > > volumes:
> > > > - postgres_data:/var/lib/postgresql
> > > >
> > > > I use psql version 18.3.
> > > >
> > > > I run the following:
> > > >
> > > > psql -h localhost -p 5432 -U postgres -d postgres -c "
> > > > -- Transaction 1: insert 1 row, check stats, commit
> > > > BEGIN;
> > > > CREATE TABLE IF NOT EXISTS temp.xact_test (id serial PRIMARY KEY, val text);
> > > > INSERT INTO temp.xact_test (val) VALUES ('a');
> > > > SELECT relname, n_tup_ins FROM pg_stat_xact_user_tables WHERE relname =
> > > > 'xact_test';
> > > > COMMIT;
> > > >
> > > > -- Transaction 2: insert 1 row, check stats
> > > > BEGIN;
> > > > INSERT INTO temp.xact_test (val) VALUES ('b');
> > > > SELECT relname, n_tup_ins FROM pg_stat_xact_user_tables WHERE relname =
> > > > 'xact_test';
> > > > ROLLBACK;
> > > >
> > > > -- Cleanup
> > > > DROP TABLE temp.xact_test;
> > > > "
> > > >
> > > > And the output is
> > > >
> > > > BEGIN
> > > > CREATE TABLE
> > > > INSERT 0 1
> > > > relname | n_tup_ins
> > > > -----------+-----------
> > > > xact_test | 1
> > > > (1 row)
> > > >
> > > > COMMIT
> > > > BEGIN
> > > > INSERT 0 1
> > > > relname | n_tup_ins
> > > > -----------+-----------
> > > > xact_test | 2
> > > > (1 row)
> > > >
> > > > ROLLBACK
> > > > DROP TABLE
> > >
> > > I can also reproduce this behavior on HEAD.
> > >
> > > > This is not what I would expect from reading
> > > > https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-VIEWS,
> > > > where it states
> > > >
> > > > > A transaction can also see its own statistics (not yet flushed out to the
> > > > shared memory statistics) in the views pg_stat_xact_all_tables,
> > > > pg_stat_xact_sys_tables, pg_stat_xact_user_tables, and
> > > > pg_stat_xact_user_functions. These numbers do not act as stated above;
> > > > instead they update continuously throughout the transaction.
> > > >
> > > > based on this, I would expect that the numbers are updated each time an
> > > > insert happens and that they are isolated per transaction.
> > > >
> > >
> > > This snippet of the doc feels somewhat ambiguous. The current behavior
> > > does not seem fully consistent with wording such as “throughout the
> > > transaction.” It also seems more appropriate for these table views to
> > > reflect stats within their own scopes, rather than cumulative values
> > > spanning txn boundaries.
> > >
> >
> > I’ve looked into this issue and 'd like to propose a patch to address
> > it. Feedback is very welcome.
> >
> > --- Root cause
> >
> > Since the stats subsystem was rewritten (commit 5891c7a8), each
> > backend keeps a hash of pending stats entries that persist across
> > transaction boundaries. Entries that flush successfully are deleted
> > from the backend-local pending list; entries not flushed remain
> > pending with their accumulated counters. Flushing is driven by
> > pgstat_report_stat(), called from the ReadyForQuery path, and subject
> > to a rate limiter.
> >
> > The pg_stat_xact_* views read these pending entries directly via
> > find_tabstat_entry() / find_funcstat_entry(). Both functions return
> > the raw accumulated counters without per-transaction scoping.
> >
> > The bug is deterministic when multiple top-level transactions are
> > processed within a single simple-query message, because there is no
> > ReadyForQuery boundary between the transactions and therefore no
> > opportunity to flush and remove the pending entry.
> >
> > --- Fix
> >
> > The patch introduces a per-entry "transaction baseline" — a lazy
> > snapshot of the counters taken the first time an entry is touched in
> > each new top-level transaction. The accessor functions
> > (find_tabstat_entry, find_funcstat_entry) then subtract the baseline
> > from the current counters, yielding only the current transaction's
> > delta.
> >
> > The baseline is keyed by MyProc->vxid.lxid. A static inline helper,
> > pgstat_ensure_xact_baseline(), is called at every nontransactional
> > counter-increment site (the 7 event-counter macros in pgstat.h and the
> > 4 non-inline counter functions in pgstat_relation.c). After the first
> > call per entry per transaction, the check reduces to a single integer
> > comparison with a predictably not-taken branch.
> >
> > For function stats, a new PgStat_FunctionPending wrapper struct embeds
> > PgStat_FunctionCounts at offset 0 with the baseline fields appended,
> > so the flush callback requires only a trivial cast change.
> >
> > --- Testing
> >
> > The regression test is a TAP test under src/bin/psql/t/ that uses psql
> > -c to send multi-statement strings as single simple-query messages.
> > src/bin/psql/t look like the right existing harness for the primary
> > regression because psql -c sends the whole multi-statement string as
> > one simple-query message via simple_action_list dispatch. By contrast,
> > ordinary pg_regress SQL files are executed by psql in file/stdin mode,
> > which runs through MainLoop() and dispatches top-level statements one
> > at a time via SendQuery(), creating ReadyForQuery boundaries that
> > could mask the bug.
> >
> > The test covers three scenarios: table n_tup_ins and seq_scan
> > counters, and function calls — each verifying isolation across
> > consecutive top-level transactions within a single message.
> >
>
> postgres % cat output.txt
> BEGIN
> CREATE TABLE
> INSERT 0 1
> relname | n_tup_ins
> -----------+-----------
> xact_test | 1
> (1 row)
>
> COMMIT
> BEGIN
> INSERT 0 1
> relname | n_tup_ins
> -----------+-----------
> xact_test | 2
> (1 row)
>
> ROLLBACK
> DROP TABLE
>
> I can also reproduce this behavior in v13. So this issue pre-existed
> the major refactor 5891c7a8.
>
After further thought of v1, I'd like to also propose an alternative
approach. The root cause and testing rationale are the same as
described upthread; what changes is how the baseline is established.
--- Problems with the lazy (v1) approach
The v1 patch snapshots the baseline lazily -- on every
nontransactional counter increment, it checks whether baseline_lxid
matches the current
transaction and snapshots if not. This has two drawbacks:
1) Hot-path overhead
Every pgstat_count_heap_scan(), pgstat_count_buffer_read_block(), etc.
acquires a branch. After the first call per entry per transaction, the
branch is predictably not-taken, but it could still present on some of
the most frequently executed paths in the backend.
2) Maintenance burden
Every new counter added to PgStat_TableCounts requires a corresponding
baseline snapshot at its increment site. Miss one, and that counter
silently leaks across transactions
--- eager baseline sweep
The attached patch records the baseline eagerly at transaction
boundaries instead of lazily at counter-increment sites.
pgstat_set_pending_baselines() iterates the pgStatPending list and
snapshots each entry's current counts into an xact_baseline field via
struct assignment. It is called from AtEOXact_PgStat() (after folding
transactional counts and removing dropped entries) and from
PostPrepare_PgStat() (after relation cleanup), covering commit, abort,
and PREPARE TRANSACTION. The view accessors unconditionally subtract
the baseline. For entries created in the current transaction,
xact_baseline is zero-initialized, so the subtraction is a no-op.
I don’t have a clear preference between the two approaches; both are
presented for review.
--
Best,
Xuneng
Attachments:
[application/octet-stream] v1-0001-Fix-pg_stat_xact_-views-leaking-stats-across-tran.patch (18.2K, 2-v1-0001-Fix-pg_stat_xact_-views-leaking-stats-across-tran.patch)
download | inline diff:
From 9f0eec731a75f52bd8f3e423c1ae22dea69b0a5a Mon Sep 17 00:00:00 2001
From: alterego655 <[email protected]>
Date: Mon, 30 Mar 2026 10:47:53 +0800
Subject: [PATCH v1] Fix pg_stat_xact_* views leaking stats across transaction
boundaries
Currently, pgStatPending entries persist across transaction boundaries
within a backend. The pg_stat_xact_* view accessors (find_tabstat_entry,
find_funcstat_entry) return raw accumulated counters from these entries,
so they incorrectly report activity from prior transactions rather than
only the current one.
Fix this by recording a baseline snapshot of each pending entry's
counters at every top-level transaction boundary. A new function,
pgstat_set_pending_baselines(), sweeps the pgStatPending list at
commit, abort, and PREPARE TRANSACTION, copying each entry's current
counts into an xact_baseline field. The view accessors then subtract
the baseline, yielding only the current transaction's delta.
For entries first created in the current transaction, xact_baseline is
zero-initialized, so the subtraction is a no-op.
For function stats, introduce a PgStat_FunctionPending wrapper struct
that pairs the existing PgStat_FunctionCounts with a baseline copy.
find_funcstat_entry() now returns a palloc'd delta, or NULL when the
function was not called in the current transaction.
Add a TAP test under src/bin/psql/t/ that verifies counter isolation
across consecutive transactions within a single simple-query message.
---
src/backend/utils/activity/pgstat.c | 43 +++-
src/backend/utils/activity/pgstat_function.c | 42 +++-
src/backend/utils/activity/pgstat_relation.c | 21 ++
src/backend/utils/activity/pgstat_xact.c | 15 ++
src/bin/psql/meson.build | 1 +
src/bin/psql/t/040_pgstat_xact.pl | 203 +++++++++++++++++++
src/include/pgstat.h | 16 ++
src/include/utils/pgstat_internal.h | 1 +
8 files changed, 332 insertions(+), 10 deletions(-)
create mode 100644 src/bin/psql/t/040_pgstat_xact.pl
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index eb8ccbaa628..0d14e753bea 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -326,7 +326,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.shared_size = sizeof(PgStatShared_Function),
.shared_data_off = offsetof(PgStatShared_Function, stats),
.shared_data_len = sizeof(((PgStatShared_Function *) 0)->stats),
- .pending_size = sizeof(PgStat_FunctionCounts),
+ .pending_size = sizeof(PgStat_FunctionPending),
.flush_pending_cb = pgstat_function_flush_cb,
.reset_timestamp_cb = pgstat_function_reset_timestamp_cb,
@@ -1361,6 +1361,47 @@ pgstat_delete_pending_entry(PgStat_EntryRef *entry_ref)
dlist_delete(&entry_ref->pending_node);
}
+/*
+ * Set xact baselines for all pending relation and function entries.
+ *
+ * Called at top-level transaction commit, abort, and successful PREPARE
+ * to record the current counter state as the baseline. The next
+ * transaction on this backend will compute xact-scoped deltas by
+ * subtracting this baseline.
+ *
+ * This sweeps all unflushed pending entries, not just those touched in
+ * the current transaction, because nontransactional counters (scans,
+ * buffer hits, etc.) accumulate in the pending entry regardless of
+ * whether a PgStat_TableXactStatus was created.
+ */
+void
+pgstat_set_pending_baselines(void)
+{
+ dlist_iter iter;
+
+ dlist_foreach(iter, &pgStatPending)
+ {
+ PgStat_EntryRef *entry_ref =
+ dlist_container(PgStat_EntryRef, pending_node, iter.cur);
+ PgStat_Kind kind = entry_ref->shared_entry->key.kind;
+
+ if (kind == PGSTAT_KIND_RELATION)
+ {
+ PgStat_TableStatus *tabstat =
+ (PgStat_TableStatus *) entry_ref->pending;
+
+ tabstat->xact_baseline = tabstat->counts;
+ }
+ else if (kind == PGSTAT_KIND_FUNCTION)
+ {
+ PgStat_FunctionPending *fpending =
+ (PgStat_FunctionPending *) entry_ref->pending;
+
+ fpending->xact_baseline = fpending->counts;
+ }
+ }
+}
+
/*
* Flush out pending variable-numbered stats.
*/
diff --git a/src/backend/utils/activity/pgstat_function.c b/src/backend/utils/activity/pgstat_function.c
index e6b84283c6c..704cd1d8240 100644
--- a/src/backend/utils/activity/pgstat_function.c
+++ b/src/backend/utils/activity/pgstat_function.c
@@ -119,7 +119,7 @@ pgstat_init_function_usage(FunctionCallInfo fcinfo,
}
}
- pending = entry_ref->pending;
+ pending = &((PgStat_FunctionPending *) entry_ref->pending)->counts;
fcu->fs = pending;
@@ -192,10 +192,12 @@ pgstat_end_function_usage(PgStat_FunctionCallUsage *fcu, bool finalize)
bool
pgstat_function_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
{
+ PgStat_FunctionPending *fpending;
PgStat_FunctionCounts *localent;
PgStatShared_Function *shfuncent;
- localent = (PgStat_FunctionCounts *) entry_ref->pending;
+ fpending = (PgStat_FunctionPending *) entry_ref->pending;
+ localent = &fpending->counts;
shfuncent = (PgStatShared_Function *) entry_ref->shared_stats;
/* localent always has non-zero content */
@@ -223,18 +225,40 @@ pgstat_function_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts)
/*
* find any existing PgStat_FunctionCounts entry for specified function
*
- * If no entry, return NULL, don't create a new one
+ * Returns a palloc'd delta struct showing only the current transaction's
+ * contribution, or NULL if the function was not called in this transaction.
*/
PgStat_FunctionCounts *
find_funcstat_entry(Oid func_id)
{
PgStat_EntryRef *entry_ref;
-
- entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId, func_id);
-
- if (entry_ref)
- return entry_ref->pending;
- return NULL;
+ PgStat_FunctionPending *fpending;
+ PgStat_FunctionCounts *result;
+ PgStat_Counter delta_calls;
+
+ entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_FUNCTION,
+ MyDatabaseId, func_id);
+ if (!entry_ref)
+ return NULL;
+
+ fpending = (PgStat_FunctionPending *) entry_ref->pending;
+
+ /* Compute delta; return NULL if not called in this transaction */
+ delta_calls = fpending->counts.numcalls - fpending->xact_baseline.numcalls;
+ if (delta_calls == 0)
+ return NULL;
+
+ result = palloc(sizeof(PgStat_FunctionCounts));
+ result->numcalls = delta_calls;
+ INSTR_TIME_SET_ZERO(result->total_time);
+ INSTR_TIME_ACCUM_DIFF(result->total_time,
+ fpending->counts.total_time,
+ fpending->xact_baseline.total_time);
+ INSTR_TIME_SET_ZERO(result->self_time);
+ INSTR_TIME_ACCUM_DIFF(result->self_time,
+ fpending->counts.self_time,
+ fpending->xact_baseline.self_time);
+ return result;
}
/*
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index bc8c43b96aa..4411870ba5e 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -503,6 +503,7 @@ find_tabstat_entry(Oid rel_id)
PgStat_TableXactStatus *trans;
PgStat_TableStatus *tabentry = NULL;
PgStat_TableStatus *tablestatus = NULL;
+ PgStat_TableCounts *tabentry_baseline;
entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, MyDatabaseId, rel_id);
if (!entry_ref)
@@ -523,6 +524,26 @@ find_tabstat_entry(Oid rel_id)
*/
tablestatus->trans = NULL;
+ /*
+ * Subtract the xact baseline to show only the current top-level
+ * transaction's activity. The baseline is set at the end of each
+ * top-level transaction (commit/abort/prepare) by
+ * pgstat_set_pending_baselines(). For entries created in the current
+ * transaction, the baseline is zero-initialized, so subtracting it is a
+ * no-op.
+ */
+ tabentry_baseline = &tabentry->xact_baseline;
+ tablestatus->counts.numscans -= tabentry_baseline->numscans;
+ tablestatus->counts.tuples_returned -= tabentry_baseline->tuples_returned;
+ tablestatus->counts.tuples_fetched -= tabentry_baseline->tuples_fetched;
+ tablestatus->counts.tuples_inserted -= tabentry_baseline->tuples_inserted;
+ tablestatus->counts.tuples_updated -= tabentry_baseline->tuples_updated;
+ tablestatus->counts.tuples_deleted -= tabentry_baseline->tuples_deleted;
+ tablestatus->counts.tuples_hot_updated -= tabentry_baseline->tuples_hot_updated;
+ tablestatus->counts.tuples_newpage_updated -= tabentry_baseline->tuples_newpage_updated;
+ tablestatus->counts.blocks_fetched -= tabentry_baseline->blocks_fetched;
+ tablestatus->counts.blocks_hit -= tabentry_baseline->blocks_hit;
+
/*
* Live subtransaction counts are not included yet. This is not a hot
* code path so reconcile tuples_inserted, tuples_updated and
diff --git a/src/backend/utils/activity/pgstat_xact.c b/src/backend/utils/activity/pgstat_xact.c
index 5e2d69e6297..6a708beb05b 100644
--- a/src/backend/utils/activity/pgstat_xact.c
+++ b/src/backend/utils/activity/pgstat_xact.c
@@ -55,6 +55,14 @@ AtEOXact_PgStat(bool isCommit, bool parallel)
}
pgStatXactStack = NULL;
+ /*
+ * Record current counter values as the baseline for the next transaction.
+ * Must be after AtEOXact_PgStat_Relations() (which folds transactional
+ * counts into ->counts) and AtEOXact_PgStat_DroppedStats() (which removes
+ * dropped entries from pgStatPending).
+ */
+ pgstat_set_pending_baselines();
+
/* Make sure any stats snapshot is thrown away */
pgstat_clear_snapshot();
}
@@ -226,6 +234,13 @@ PostPrepare_PgStat(void)
}
pgStatXactStack = NULL;
+ /*
+ * Record baselines, same as at commit/abort. AtEOXact_PgStat() is not
+ * called during PREPARE, so without this, nontransactional counters from
+ * the prepared transaction would leak into later transactions.
+ */
+ pgstat_set_pending_baselines();
+
/* Make sure any stats snapshot is thrown away */
pgstat_clear_snapshot();
}
diff --git a/src/bin/psql/meson.build b/src/bin/psql/meson.build
index 922b2845267..25fbb817f1d 100644
--- a/src/bin/psql/meson.build
+++ b/src/bin/psql/meson.build
@@ -78,6 +78,7 @@ tests += {
't/010_tab_completion.pl',
't/020_cancel.pl',
't/030_pager.pl',
+ 't/040_pgstat_xact.pl',
],
},
}
diff --git a/src/bin/psql/t/040_pgstat_xact.pl b/src/bin/psql/t/040_pgstat_xact.pl
new file mode 100644
index 00000000000..cfb5cec3927
--- /dev/null
+++ b/src/bin/psql/t/040_pgstat_xact.pl
@@ -0,0 +1,203 @@
+# Copyright (c) 2021-2026, PostgreSQL Global Development Group
+
+# Test that pg_stat_xact_* views report only current-transaction activity,
+# not accumulated pending stats from prior transactions.
+#
+# Uses "psql -c" so all statements go as a single simple-query message,
+# ensuring no ReadyForQuery boundary (and no pgstat_report_stat() flush)
+# between consecutive top-level transactions.
+
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my $node = PostgreSQL::Test::Cluster->new('primary');
+$node->init;
+$node->append_conf('postgresql.conf', "track_functions = 'all'");
+$node->append_conf('postgresql.conf', "max_prepared_transactions = 5");
+$node->start;
+
+my $dbname = 'postgres';
+
+# Create test objects.
+$node->safe_psql(
+ $dbname, q{
+ CREATE TABLE test_xact_stats (x int);
+ CREATE FUNCTION test_xact_stats_func() RETURNS void
+ LANGUAGE plpgsql AS $$ BEGIN NULL; END; $$;
+});
+
+# Send SQL via "psql -c" so all statements are one simple-query message.
+sub psql_c_like
+{
+ local $Test::Builder::Level = $Test::Builder::Level + 1;
+
+ my ($node, $sql, $expected_stdout, $test_name) = @_;
+
+ my ($stdout, $stderr);
+ my @cmd = (
+ $node->installed_command('psql'),
+ '--no-psqlrc', '--no-align', '--tuples-only', '--quiet',
+ '-d', $node->connstr($dbname),
+ '-c', $sql);
+ my $result = IPC::Run::run \@cmd, '>', \$stdout, '2>', \$stderr;
+
+ is($result, 1, "$test_name: exit code 0");
+ is($stderr, '', "$test_name: no stderr");
+ like($stdout, $expected_stdout, "$test_name: matches");
+
+ return;
+}
+
+##
+## Test 1: Table n_tup_ins must not leak across top-level transactions.
+##
+psql_c_like(
+ $node,
+ q{BEGIN;
+ INSERT INTO test_xact_stats VALUES (1);
+ COMMIT;
+ BEGIN;
+ INSERT INTO test_xact_stats VALUES (2);
+ SELECT 'n_tup_ins:' || n_tup_ins FROM pg_stat_xact_user_tables
+ WHERE relname = 'test_xact_stats';
+ COMMIT;},
+ qr/^n_tup_ins:1$/m,
+ "table n_tup_ins shows only current transaction's inserts");
+
+##
+## Test 2: seq_scan counter must not leak across top-level transactions.
+## Uses labeled output to avoid false-positive matches from count(*) rows.
+##
+psql_c_like(
+ $node,
+ q{BEGIN;
+ SELECT count(*) FROM test_xact_stats;
+ COMMIT;
+ BEGIN;
+ SELECT count(*) FROM test_xact_stats;
+ SELECT 'seq_scan:' || seq_scan FROM pg_stat_xact_user_tables
+ WHERE relname = 'test_xact_stats';
+ COMMIT;},
+ qr/^seq_scan:1$/m,
+ "table seq_scan shows only current transaction's scans");
+
+##
+## Test 3: Function call stats must not leak across top-level transactions.
+##
+psql_c_like(
+ $node,
+ q{BEGIN;
+ SELECT test_xact_stats_func();
+ SELECT test_xact_stats_func();
+ COMMIT;
+ BEGIN;
+ SELECT test_xact_stats_func();
+ SELECT 'calls:' || calls FROM pg_stat_xact_user_functions
+ WHERE funcname = 'test_xact_stats_func';
+ COMMIT;},
+ qr/^calls:1$/m,
+ "function calls shows only current transaction's calls");
+
+##
+## Test 4: seq_scan must not leak across a PREPARE TRANSACTION boundary.
+##
+## COMMIT PREPARED cannot appear in a multi-statement psql -c string.
+## In a multi-statement simple-query message, exec_simple_query() calls
+## BeginImplicitTransactionBlock() before each statement. After PREPARE
+## TRANSACTION ends the explicit block, the loop re-enters the implicit-
+## block path, putting the backend into TBLOCK_IMPLICIT_INPROGRESS.
+## PreventInTransactionBlock() rejects COMMIT PREPARED in that state.
+##
+## A separate psql() call works because it sends a single-statement
+## simple-query message: use_implicit_block is false, so the backend
+## stays in TBLOCK_STARTED and PreventInTransactionBlock() passes.
+##
+psql_c_like(
+ $node,
+ q{BEGIN;
+ SELECT count(*) FROM test_xact_stats;
+ PREPARE TRANSACTION 'test_prep';
+ BEGIN;
+ SELECT count(*) FROM test_xact_stats;
+ SELECT 'seq_scan:' || seq_scan FROM pg_stat_xact_user_tables
+ WHERE relname = 'test_xact_stats';
+ COMMIT;},
+ qr/^seq_scan:1$/m,
+ "table seq_scan doesn't leak across PREPARE boundary");
+
+# Best-effort cleanup: use psql() instead of safe_psql() so that if the
+# preceding test failed before PREPARE TRANSACTION, the missing prepared
+# xact does not abort the rest of the test file.
+$node->psql($dbname, q{COMMIT PREPARED 'test_prep';});
+
+##
+## Test 5: Untouched relation entry must show zero, not prior-txn counts.
+##
+## If a pending entry exists from a prior transaction but the current
+## transaction has not touched the relation, counts - baseline = 0.
+##
+psql_c_like(
+ $node,
+ q{BEGIN;
+ SELECT count(*) FROM test_xact_stats;
+ INSERT INTO test_xact_stats VALUES (99);
+ COMMIT;
+ BEGIN;
+ SELECT 'seq_scan:' || seq_scan || ',n_tup_ins:' || n_tup_ins
+ FROM pg_stat_xact_user_tables
+ WHERE relname = 'test_xact_stats';
+ COMMIT;},
+ qr/^seq_scan:0,n_tup_ins:0$/m,
+ "untouched relation entry shows zeroes in current transaction");
+
+##
+## Test 6: Untouched function entry must not appear in xact view.
+##
+## pg_stat_xact_user_functions filters with IS NOT NULL on the accessor.
+## When delta_calls == 0, find_funcstat_entry() returns NULL, so the
+## function must not appear.
+##
+psql_c_like(
+ $node,
+ q{BEGIN;
+ SELECT test_xact_stats_func();
+ COMMIT;
+ BEGIN;
+ SELECT 'found:' || count(*) FROM pg_stat_xact_user_functions
+ WHERE funcname = 'test_xact_stats_func';
+ COMMIT;},
+ qr/^found:0$/m,
+ "uncalled function not visible in pg_stat_xact_user_functions");
+
+##
+## Test 7: Function call stats must not leak across PREPARE boundary.
+##
+psql_c_like(
+ $node,
+ q{BEGIN;
+ SELECT test_xact_stats_func();
+ SELECT test_xact_stats_func();
+ PREPARE TRANSACTION 'test_func_prep';
+ BEGIN;
+ SELECT test_xact_stats_func();
+ SELECT 'calls:' || calls FROM pg_stat_xact_user_functions
+ WHERE funcname = 'test_xact_stats_func';
+ COMMIT;},
+ qr/^calls:1$/m,
+ "function calls don't leak across PREPARE boundary");
+
+# Best-effort cleanup (see comment above).
+$node->psql($dbname, q{COMMIT PREPARED 'test_func_prep';});
+
+# Cleanup
+$node->safe_psql(
+ $dbname, q{
+ DROP TABLE test_xact_stats;
+ DROP FUNCTION test_xact_stats_func();
+});
+
+$node->stop;
+done_testing();
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 8e3549c3752..660b64e870a 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -90,6 +90,21 @@ typedef struct PgStat_FunctionCounts
instr_time self_time;
} PgStat_FunctionCounts;
+/* ----------
+ * PgStat_FunctionPending Backend-local pending state for function stats
+ *
+ * Wraps the accumulated counts with a per-transaction baseline so that
+ * pg_stat_xact_user_functions can report only the current transaction's
+ * contribution. The baseline is set at each top-level transaction boundary
+ * by pgstat_set_pending_baselines().
+ * ----------
+ */
+typedef struct PgStat_FunctionPending
+{
+ PgStat_FunctionCounts counts; /* accumulated counts */
+ PgStat_FunctionCounts xact_baseline; /* snapshot at txn boundary */
+} PgStat_FunctionPending;
+
/*
* Working state needed to accumulate per-function-call timing statistics.
*/
@@ -183,6 +198,7 @@ typedef struct PgStat_TableStatus
bool shared; /* is it a shared catalog? */
struct PgStat_TableXactStatus *trans; /* lowest subxact's counts */
PgStat_TableCounts counts; /* event counts to be sent */
+ PgStat_TableCounts xact_baseline; /* snapshot at txn boundary */
Relation relation; /* rel that is using this entry */
} PgStat_TableStatus;
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 97704421a92..292d9757c02 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -687,6 +687,7 @@ extern PgStat_EntryRef *pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid,
bool *created_entry);
extern PgStat_EntryRef *pgstat_fetch_pending_entry(PgStat_Kind kind,
Oid dboid, uint64 objid);
+extern void pgstat_set_pending_baselines(void);
extern void *pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, uint64 objid);
extern void pgstat_snapshot_fixed(PgStat_Kind kind);
--
2.51.0
^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: BUG #19439: pg_stat_xact_user_tables stat not currect during the transaction
@ 2026-05-06 08:03 klemen kobau <[email protected]>
parent: Xuneng Zhou <[email protected]>
0 siblings, 1 reply; 8+ messages in thread
From: klemen kobau @ 2026-05-06 08:03 UTC (permalink / raw)
To: Xuneng Zhou <[email protected]>; +Cc: [email protected]
Hi Xuneng,
Thanks for the analysis and the patch proposals. As the original reporter,
I wanted to follow up since the thread has been quiet for over a month.
I am new to this process, do I need to do anything to help the process?
Kind regards
Klemen Kobau
On Wed, 6 May 2026 at 09:56, Xuneng Zhou <[email protected]> wrote:
> On Sun, Mar 29, 2026 at 11:17 AM Xuneng Zhou <[email protected]> wrote:
> >
> > On Sat, Mar 28, 2026 at 1:47 PM Xuneng Zhou <[email protected]>
> wrote:
> > >
> > > On Fri, Mar 27, 2026 at 10:15 AM Xuneng Zhou <[email protected]>
> wrote:
> > > >
> > > > Hi klemen,
> > > >
> > > > Thanks for the report.
> > > >
> > > > On Fri, Mar 27, 2026 at 5:36 AM PG Bug reporting form
> > > > <[email protected]> wrote:
> > > > >
> > > > > The following bug has been logged on the website:
> > > > >
> > > > > Bug reference: 19439
> > > > > Logged by: klemen kobau
> > > > > Email address: [email protected]
> > > > > PostgreSQL version: 18.0
> > > > > Operating system: Linux (EndeavorOS)
> > > > > Description:
> > > > >
> > > > > I am running postgres:18.0 in a docker container, the
> configuration is as
> > > > > follows:
> > > > >
> > > > > postgres:
> > > > > image: postgres:18.0
> > > > > command: [
> > > > > "postgres",
> > > > > "-N", "200",
> > > > > "-c", "max_prepared_transactions=100",
> > > > > "-c", "wal_level=logical",
> > > > > "-c", "max_wal_senders=10",
> > > > > "-c", "max_replication_slots=20",
> > > > > "-c", "wal_keep_size=10",
> > > > > "-c", "max_slot_wal_keep_size=1024"
> > > > > ]
> > > > > environment:
> > > > > POSTGRES_USER: postgres
> > > > > POSTGRES_PASSWORD: postgres
> > > > > POSTGRES_DB: postgres
> > > > > TZ: UTC
> > > > > PGTZ: UTC
> > > > > ports:
> > > > > - 5432:5432
> > > > > volumes:
> > > > > - postgres_data:/var/lib/postgresql
> > > > >
> > > > > I use psql version 18.3.
> > > > >
> > > > > I run the following:
> > > > >
> > > > > psql -h localhost -p 5432 -U postgres -d postgres -c "
> > > > > -- Transaction 1: insert 1 row, check stats, commit
> > > > > BEGIN;
> > > > > CREATE TABLE IF NOT EXISTS temp.xact_test (id serial PRIMARY KEY,
> val text);
> > > > > INSERT INTO temp.xact_test (val) VALUES ('a');
> > > > > SELECT relname, n_tup_ins FROM pg_stat_xact_user_tables WHERE
> relname =
> > > > > 'xact_test';
> > > > > COMMIT;
> > > > >
> > > > > -- Transaction 2: insert 1 row, check stats
> > > > > BEGIN;
> > > > > INSERT INTO temp.xact_test (val) VALUES ('b');
> > > > > SELECT relname, n_tup_ins FROM pg_stat_xact_user_tables WHERE
> relname =
> > > > > 'xact_test';
> > > > > ROLLBACK;
> > > > >
> > > > > -- Cleanup
> > > > > DROP TABLE temp.xact_test;
> > > > > "
> > > > >
> > > > > And the output is
> > > > >
> > > > > BEGIN
> > > > > CREATE TABLE
> > > > > INSERT 0 1
> > > > > relname | n_tup_ins
> > > > > -----------+-----------
> > > > > xact_test | 1
> > > > > (1 row)
> > > > >
> > > > > COMMIT
> > > > > BEGIN
> > > > > INSERT 0 1
> > > > > relname | n_tup_ins
> > > > > -----------+-----------
> > > > > xact_test | 2
> > > > > (1 row)
> > > > >
> > > > > ROLLBACK
> > > > > DROP TABLE
> > > >
> > > > I can also reproduce this behavior on HEAD.
> > > >
> > > > > This is not what I would expect from reading
> > > > >
> https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-VIEWS
> ,
> > > > > where it states
> > > > >
> > > > > > A transaction can also see its own statistics (not yet flushed
> out to the
> > > > > shared memory statistics) in the views pg_stat_xact_all_tables,
> > > > > pg_stat_xact_sys_tables, pg_stat_xact_user_tables, and
> > > > > pg_stat_xact_user_functions. These numbers do not act as stated
> above;
> > > > > instead they update continuously throughout the transaction.
> > > > >
> > > > > based on this, I would expect that the numbers are updated each
> time an
> > > > > insert happens and that they are isolated per transaction.
> > > > >
> > > >
> > > > This snippet of the doc feels somewhat ambiguous. The current
> behavior
> > > > does not seem fully consistent with wording such as “throughout the
> > > > transaction.” It also seems more appropriate for these table views to
> > > > reflect stats within their own scopes, rather than cumulative values
> > > > spanning txn boundaries.
> > > >
> > >
> > > I’ve looked into this issue and 'd like to propose a patch to address
> > > it. Feedback is very welcome.
> > >
> > > --- Root cause
> > >
> > > Since the stats subsystem was rewritten (commit 5891c7a8), each
> > > backend keeps a hash of pending stats entries that persist across
> > > transaction boundaries. Entries that flush successfully are deleted
> > > from the backend-local pending list; entries not flushed remain
> > > pending with their accumulated counters. Flushing is driven by
> > > pgstat_report_stat(), called from the ReadyForQuery path, and subject
> > > to a rate limiter.
> > >
> > > The pg_stat_xact_* views read these pending entries directly via
> > > find_tabstat_entry() / find_funcstat_entry(). Both functions return
> > > the raw accumulated counters without per-transaction scoping.
> > >
> > > The bug is deterministic when multiple top-level transactions are
> > > processed within a single simple-query message, because there is no
> > > ReadyForQuery boundary between the transactions and therefore no
> > > opportunity to flush and remove the pending entry.
> > >
> > > --- Fix
> > >
> > > The patch introduces a per-entry "transaction baseline" — a lazy
> > > snapshot of the counters taken the first time an entry is touched in
> > > each new top-level transaction. The accessor functions
> > > (find_tabstat_entry, find_funcstat_entry) then subtract the baseline
> > > from the current counters, yielding only the current transaction's
> > > delta.
> > >
> > > The baseline is keyed by MyProc->vxid.lxid. A static inline helper,
> > > pgstat_ensure_xact_baseline(), is called at every nontransactional
> > > counter-increment site (the 7 event-counter macros in pgstat.h and the
> > > 4 non-inline counter functions in pgstat_relation.c). After the first
> > > call per entry per transaction, the check reduces to a single integer
> > > comparison with a predictably not-taken branch.
> > >
> > > For function stats, a new PgStat_FunctionPending wrapper struct embeds
> > > PgStat_FunctionCounts at offset 0 with the baseline fields appended,
> > > so the flush callback requires only a trivial cast change.
> > >
> > > --- Testing
> > >
> > > The regression test is a TAP test under src/bin/psql/t/ that uses psql
> > > -c to send multi-statement strings as single simple-query messages.
> > > src/bin/psql/t look like the right existing harness for the primary
> > > regression because psql -c sends the whole multi-statement string as
> > > one simple-query message via simple_action_list dispatch. By contrast,
> > > ordinary pg_regress SQL files are executed by psql in file/stdin mode,
> > > which runs through MainLoop() and dispatches top-level statements one
> > > at a time via SendQuery(), creating ReadyForQuery boundaries that
> > > could mask the bug.
> > >
> > > The test covers three scenarios: table n_tup_ins and seq_scan
> > > counters, and function calls — each verifying isolation across
> > > consecutive top-level transactions within a single message.
> > >
> >
> > postgres % cat output.txt
> > BEGIN
> > CREATE TABLE
> > INSERT 0 1
> > relname | n_tup_ins
> > -----------+-----------
> > xact_test | 1
> > (1 row)
> >
> > COMMIT
> > BEGIN
> > INSERT 0 1
> > relname | n_tup_ins
> > -----------+-----------
> > xact_test | 2
> > (1 row)
> >
> > ROLLBACK
> > DROP TABLE
> >
> > I can also reproduce this behavior in v13. So this issue pre-existed
> > the major refactor 5891c7a8.
> >
>
> After further thought of v1, I'd like to also propose an alternative
> approach. The root cause and testing rationale are the same as
> described upthread; what changes is how the baseline is established.
>
> --- Problems with the lazy (v1) approach
> The v1 patch snapshots the baseline lazily -- on every
> nontransactional counter increment, it checks whether baseline_lxid
> matches the current
> transaction and snapshots if not. This has two drawbacks:
>
> 1) Hot-path overhead
> Every pgstat_count_heap_scan(), pgstat_count_buffer_read_block(), etc.
> acquires a branch. After the first call per entry per transaction, the
> branch is predictably not-taken, but it could still present on some of
> the most frequently executed paths in the backend.
>
> 2) Maintenance burden
> Every new counter added to PgStat_TableCounts requires a corresponding
> baseline snapshot at its increment site. Miss one, and that counter
> silently leaks across transactions
>
> --- eager baseline sweep
> The attached patch records the baseline eagerly at transaction
> boundaries instead of lazily at counter-increment sites.
> pgstat_set_pending_baselines() iterates the pgStatPending list and
> snapshots each entry's current counts into an xact_baseline field via
> struct assignment. It is called from AtEOXact_PgStat() (after folding
> transactional counts and removing dropped entries) and from
> PostPrepare_PgStat() (after relation cleanup), covering commit, abort,
> and PREPARE TRANSACTION. The view accessors unconditionally subtract
> the baseline. For entries created in the current transaction,
> xact_baseline is zero-initialized, so the subtraction is a no-op.
>
> I don’t have a clear preference between the two approaches; both are
> presented for review.
>
> --
> Best,
> Xuneng
>
^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: BUG #19439: pg_stat_xact_user_tables stat not currect during the transaction
@ 2026-05-07 06:35 Xuneng Zhou <[email protected]>
parent: klemen kobau <[email protected]>
0 siblings, 1 reply; 8+ messages in thread
From: Xuneng Zhou @ 2026-05-07 06:35 UTC (permalink / raw)
To: klemen kobau <[email protected]>; +Cc: [email protected]
Hi klemen,
On Wed, May 6, 2026 at 4:03 PM klemen kobau <[email protected]> wrote:
>
> Hi Xuneng,
>
> Thanks for the analysis and the patch proposals. As the original reporter, I wanted to follow up since the thread has been quiet for over a month.
>
> I am new to this process, do I need to do anything to help the process?
>
> Kind regards
>
> Klemen Kobau
>
>
> On Wed, 6 May 2026 at 09:56, Xuneng Zhou <[email protected]> wrote:
>>
>> On Sun, Mar 29, 2026 at 11:17 AM Xuneng Zhou <[email protected]> wrote:
>> >
>> > On Sat, Mar 28, 2026 at 1:47 PM Xuneng Zhou <[email protected]> wrote:
>> > >
>> > > On Fri, Mar 27, 2026 at 10:15 AM Xuneng Zhou <[email protected]> wrote:
>> > > >
>> > > > Hi klemen,
>> > > >
>> > > > Thanks for the report.
>> > > >
>> > > > On Fri, Mar 27, 2026 at 5:36 AM PG Bug reporting form
>> > > > <[email protected]> wrote:
>> > > > >
>> > > > > The following bug has been logged on the website:
>> > > > >
>> > > > > Bug reference: 19439
>> > > > > Logged by: klemen kobau
>> > > > > Email address: [email protected]
>> > > > > PostgreSQL version: 18.0
>> > > > > Operating system: Linux (EndeavorOS)
>> > > > > Description:
>> > > > >
>> > > > > I am running postgres:18.0 in a docker container, the configuration is as
>> > > > > follows:
>> > > > >
>> > > > > postgres:
>> > > > > image: postgres:18.0
>> > > > > command: [
>> > > > > "postgres",
>> > > > > "-N", "200",
>> > > > > "-c", "max_prepared_transactions=100",
>> > > > > "-c", "wal_level=logical",
>> > > > > "-c", "max_wal_senders=10",
>> > > > > "-c", "max_replication_slots=20",
>> > > > > "-c", "wal_keep_size=10",
>> > > > > "-c", "max_slot_wal_keep_size=1024"
>> > > > > ]
>> > > > > environment:
>> > > > > POSTGRES_USER: postgres
>> > > > > POSTGRES_PASSWORD: postgres
>> > > > > POSTGRES_DB: postgres
>> > > > > TZ: UTC
>> > > > > PGTZ: UTC
>> > > > > ports:
>> > > > > - 5432:5432
>> > > > > volumes:
>> > > > > - postgres_data:/var/lib/postgresql
>> > > > >
>> > > > > I use psql version 18.3.
>> > > > >
>> > > > > I run the following:
>> > > > >
>> > > > > psql -h localhost -p 5432 -U postgres -d postgres -c "
>> > > > > -- Transaction 1: insert 1 row, check stats, commit
>> > > > > BEGIN;
>> > > > > CREATE TABLE IF NOT EXISTS temp.xact_test (id serial PRIMARY KEY, val text);
>> > > > > INSERT INTO temp.xact_test (val) VALUES ('a');
>> > > > > SELECT relname, n_tup_ins FROM pg_stat_xact_user_tables WHERE relname =
>> > > > > 'xact_test';
>> > > > > COMMIT;
>> > > > >
>> > > > > -- Transaction 2: insert 1 row, check stats
>> > > > > BEGIN;
>> > > > > INSERT INTO temp.xact_test (val) VALUES ('b');
>> > > > > SELECT relname, n_tup_ins FROM pg_stat_xact_user_tables WHERE relname =
>> > > > > 'xact_test';
>> > > > > ROLLBACK;
>> > > > >
>> > > > > -- Cleanup
>> > > > > DROP TABLE temp.xact_test;
>> > > > > "
>> > > > >
>> > > > > And the output is
>> > > > >
>> > > > > BEGIN
>> > > > > CREATE TABLE
>> > > > > INSERT 0 1
>> > > > > relname | n_tup_ins
>> > > > > -----------+-----------
>> > > > > xact_test | 1
>> > > > > (1 row)
>> > > > >
>> > > > > COMMIT
>> > > > > BEGIN
>> > > > > INSERT 0 1
>> > > > > relname | n_tup_ins
>> > > > > -----------+-----------
>> > > > > xact_test | 2
>> > > > > (1 row)
>> > > > >
>> > > > > ROLLBACK
>> > > > > DROP TABLE
>> > > >
>> > > > I can also reproduce this behavior on HEAD.
>> > > >
>> > > > > This is not what I would expect from reading
>> > > > > https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-VIEWS,
>> > > > > where it states
>> > > > >
>> > > > > > A transaction can also see its own statistics (not yet flushed out to the
>> > > > > shared memory statistics) in the views pg_stat_xact_all_tables,
>> > > > > pg_stat_xact_sys_tables, pg_stat_xact_user_tables, and
>> > > > > pg_stat_xact_user_functions. These numbers do not act as stated above;
>> > > > > instead they update continuously throughout the transaction.
>> > > > >
>> > > > > based on this, I would expect that the numbers are updated each time an
>> > > > > insert happens and that they are isolated per transaction.
>> > > > >
>> > > >
>> > > > This snippet of the doc feels somewhat ambiguous. The current behavior
>> > > > does not seem fully consistent with wording such as “throughout the
>> > > > transaction.” It also seems more appropriate for these table views to
>> > > > reflect stats within their own scopes, rather than cumulative values
>> > > > spanning txn boundaries.
>> > > >
>> > >
>> > > I’ve looked into this issue and 'd like to propose a patch to address
>> > > it. Feedback is very welcome.
>> > >
>> > > --- Root cause
>> > >
>> > > Since the stats subsystem was rewritten (commit 5891c7a8), each
>> > > backend keeps a hash of pending stats entries that persist across
>> > > transaction boundaries. Entries that flush successfully are deleted
>> > > from the backend-local pending list; entries not flushed remain
>> > > pending with their accumulated counters. Flushing is driven by
>> > > pgstat_report_stat(), called from the ReadyForQuery path, and subject
>> > > to a rate limiter.
>> > >
>> > > The pg_stat_xact_* views read these pending entries directly via
>> > > find_tabstat_entry() / find_funcstat_entry(). Both functions return
>> > > the raw accumulated counters without per-transaction scoping.
>> > >
>> > > The bug is deterministic when multiple top-level transactions are
>> > > processed within a single simple-query message, because there is no
>> > > ReadyForQuery boundary between the transactions and therefore no
>> > > opportunity to flush and remove the pending entry.
>> > >
>> > > --- Fix
>> > >
>> > > The patch introduces a per-entry "transaction baseline" — a lazy
>> > > snapshot of the counters taken the first time an entry is touched in
>> > > each new top-level transaction. The accessor functions
>> > > (find_tabstat_entry, find_funcstat_entry) then subtract the baseline
>> > > from the current counters, yielding only the current transaction's
>> > > delta.
>> > >
>> > > The baseline is keyed by MyProc->vxid.lxid. A static inline helper,
>> > > pgstat_ensure_xact_baseline(), is called at every nontransactional
>> > > counter-increment site (the 7 event-counter macros in pgstat.h and the
>> > > 4 non-inline counter functions in pgstat_relation.c). After the first
>> > > call per entry per transaction, the check reduces to a single integer
>> > > comparison with a predictably not-taken branch.
>> > >
>> > > For function stats, a new PgStat_FunctionPending wrapper struct embeds
>> > > PgStat_FunctionCounts at offset 0 with the baseline fields appended,
>> > > so the flush callback requires only a trivial cast change.
>> > >
>> > > --- Testing
>> > >
>> > > The regression test is a TAP test under src/bin/psql/t/ that uses psql
>> > > -c to send multi-statement strings as single simple-query messages.
>> > > src/bin/psql/t look like the right existing harness for the primary
>> > > regression because psql -c sends the whole multi-statement string as
>> > > one simple-query message via simple_action_list dispatch. By contrast,
>> > > ordinary pg_regress SQL files are executed by psql in file/stdin mode,
>> > > which runs through MainLoop() and dispatches top-level statements one
>> > > at a time via SendQuery(), creating ReadyForQuery boundaries that
>> > > could mask the bug.
>> > >
>> > > The test covers three scenarios: table n_tup_ins and seq_scan
>> > > counters, and function calls — each verifying isolation across
>> > > consecutive top-level transactions within a single message.
>> > >
>> >
>> > postgres % cat output.txt
>> > BEGIN
>> > CREATE TABLE
>> > INSERT 0 1
>> > relname | n_tup_ins
>> > -----------+-----------
>> > xact_test | 1
>> > (1 row)
>> >
>> > COMMIT
>> > BEGIN
>> > INSERT 0 1
>> > relname | n_tup_ins
>> > -----------+-----------
>> > xact_test | 2
>> > (1 row)
>> >
>> > ROLLBACK
>> > DROP TABLE
>> >
>> > I can also reproduce this behavior in v13. So this issue pre-existed
>> > the major refactor 5891c7a8.
>> >
>>
>> After further thought of v1, I'd like to also propose an alternative
>> approach. The root cause and testing rationale are the same as
>> described upthread; what changes is how the baseline is established.
>>
>> --- Problems with the lazy (v1) approach
>> The v1 patch snapshots the baseline lazily -- on every
>> nontransactional counter increment, it checks whether baseline_lxid
>> matches the current
>> transaction and snapshots if not. This has two drawbacks:
>>
>> 1) Hot-path overhead
>> Every pgstat_count_heap_scan(), pgstat_count_buffer_read_block(), etc.
>> acquires a branch. After the first call per entry per transaction, the
>> branch is predictably not-taken, but it could still present on some of
>> the most frequently executed paths in the backend.
>>
>> 2) Maintenance burden
>> Every new counter added to PgStat_TableCounts requires a corresponding
>> baseline snapshot at its increment site. Miss one, and that counter
>> silently leaks across transactions
>>
>> --- eager baseline sweep
>> The attached patch records the baseline eagerly at transaction
>> boundaries instead of lazily at counter-increment sites.
>> pgstat_set_pending_baselines() iterates the pgStatPending list and
>> snapshots each entry's current counts into an xact_baseline field via
>> struct assignment. It is called from AtEOXact_PgStat() (after folding
>> transactional counts and removing dropped entries) and from
>> PostPrepare_PgStat() (after relation cleanup), covering commit, abort,
>> and PREPARE TRANSACTION. The view accessors unconditionally subtract
>> the baseline. For entries created in the current transaction,
>> xact_baseline is zero-initialized, so the subtraction is a no-op.
>>
>> I don’t have a clear preference between the two approaches; both are
>> presented for review.
>>
It would be useful to verify the fix by manually applying the patch
and building the instance. Additionally, a few issues surfaced after
looking at it again, which I will update later.
--
Best,
Xuneng
^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: BUG #19439: pg_stat_xact_user_tables stat not currect during the transaction
@ 2026-05-13 03:30 Xuneng Zhou <[email protected]>
parent: Xuneng Zhou <[email protected]>
0 siblings, 0 replies; 8+ messages in thread
From: Xuneng Zhou @ 2026-05-13 03:30 UTC (permalink / raw)
To: klemen kobau <[email protected]>; +Cc: [email protected]
> >> --- eager baseline sweep
> >> The attached patch records the baseline eagerly at transaction
> >> boundaries instead of lazily at counter-increment sites.
> >> pgstat_set_pending_baselines() iterates the pgStatPending list and
> >> snapshots each entry's current counts into an xact_baseline field via
> >> struct assignment. It is called from AtEOXact_PgStat() (after folding
> >> transactional counts and removing dropped entries) and from
> >> PostPrepare_PgStat() (after relation cleanup), covering commit, abort,
> >> and PREPARE TRANSACTION. The view accessors unconditionally subtract
> >> the baseline. For entries created in the current transaction,
> >> xact_baseline is zero-initialized, so the subtraction is a no-op.
> >>
> >> I don’t have a clear preference between the two approaches; both are
> >> presented for review.
> >>
>
> It would be useful to verify the fix by manually applying the patch
> and building the instance. Additionally, a few issues surfaced after
> looking at it again, which I will update later.
>
Here is the updated version using eager baseline refresh, i.e. sweeping all
backend-local pending pgstat entries at each top-level transaction boundary.
I tested the eager-baseline approach at both micro and macro levels. The
results show the same cost shape in both cases: the patch cost scales with the
number of backend-local pending pgstat entries that must be swept at each
top-level transaction boundary.
The microbenchmark isolates the transaction-boundary cost. It first creates a
controlled number of pending pgstat entries in one backend, then times 10000
tiny BEGIN/COMMIT boundaries in one simple-query message:
BEGIN; COMMIT; BEGIN; COMMIT; ...
The results below use matching -O0 --enable-debug --enable-cassert builds for
both installs.
pending entries unpatched us/xact patched us/xact patch delta
---------------------------------------------------------------------------
0 713.526 703.357 -10.169
100 740.954 754.298 +13.344
1000 1183.302 / 1177.393 1211.533 / 1213.089 about +32 us
5000 2978.008 / 2967.674 3236.365 / 3081.945
about +186 us median
Machine: Mac mini, M4 Pro, 48GB mem
This is intentionally hostile to eager sweeping: one backend accumulates many
pending entries, then repeatedly crosses top-level transaction boundaries while
doing almost no useful work inside the transactions. The added cost is small at
100 pending entries, where noise matters, but becomes clear at 1000 and 5000
entries. In this debug/cassert build, the patch adds roughly 0.03-0.04 us per
pending entry per transaction in the 1000-5000 entry range. Absolute numbers
are inflated by the build profile, but the linear shape is the relevant signal.
The macro benchmark shows when that same boundary cost is visible in a more
query-shaped workload:
workload base tps patched tps change
------------------------------------------------------------------
5000 tables, 1 row/table 11216 6231 -44%
1000 tables, 1000 rows/table 7683 7113 -7%
100 tables, 10000 rows/table 2152 2162 ~0%
The latency breakdown explains the TPS pattern. For the 5000-table/1-row case:
base: avg_lat=0.089 ms, select_avg=0.051 ms
patched: avg_lat=0.160 ms, select_avg=0.062 ms
Machine: intel xeon server, 40 cores, 128GB mem
The SELECT itself barely changes. Most of the regression appears outside the
SELECT, where the patch does the baseline sweep at transaction end. This is
the worst case for the eager design: the transaction does very little real table
work, but the backend has thousands of pending relation stats entries.
As the number of pending entries drops, or as the query does more real scan
work, the same fixed boundary cost is diluted. With 1000 tables and 1000 rows
per table, the regression falls to about 7%. With 100 tables and 10000 rows per
table, the scan dominates and the sweep over about 100 pending entries is lost
in noise.
Taken together, the benchmarks confirm the expected implementation cost model:
eager baseline refresh cost ~= O(number of pending pgstat entries per backend)
Row count does not directly drive the cost; it only hides or exposes the fixed
transaction-boundary work.
These results suggest that the eager-sweeping approach has an unfavorable
cost model for long-lived sessions that accumulate many pending stats entries
and then execute small transactions. A lazy baseline appraoch, where
each pending
entry records the current transaction generation only when that entry is first
touched, should avoid the transaction-boundary sweep and make the cost scale
with the transaction's actual working set instead. However, it still
suffers from the
potential overhead of additional comparisons on hot paths, as well as increased
maintenance pain.
--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.
Attachments:
[application/octet-stream] v2-0001-Fix-pg_stat_xact_-views-leaking-across-xact-bound.patch (17.7K, 2-v2-0001-Fix-pg_stat_xact_-views-leaking-across-xact-bound.patch)
download | inline diff:
From 8fca00cec38f2459baea3f11a06df79b94572186 Mon Sep 17 00:00:00 2001
From: alterego655 <[email protected]>
Date: Tue, 12 May 2026 11:55:44 +0800
Subject: [PATCH v2] Fix pg_stat_xact_* views leaking across xact boundaries
pgStatPending entries can persist across transaction boundaries within a
backend. The pg_stat_xact_* view accessors read raw accumulated
counters from these entries, so they can report activity from prior
transactions instead of just the current one.
Fix this by recording a baseline snapshot of pending relation and
function counters at each top-level transaction boundary.
pgstat_set_pending_baselines() walks pgStatPending at commit, abort,
and PREPARE TRANSACTION. View accessors subtract the saved baseline
to produce the current transaction's delta.
Entries first created in the current transaction keep a zero baseline,
so their delta is unchanged.
For function stats, add PgStat_FunctionPending to pair
PgStat_FunctionCounts with a baseline copy. find_funcstat_entry() now
returns a palloc'd delta, or NULL when the function was not called in
the current transaction. Add the new typedef to pgindent's typedef
list.
Add a test_misc TAP test covering counter isolation across consecutive
transactions in a single simple-query message.
---
src/backend/utils/activity/pgstat.c | 42 +++-
src/backend/utils/activity/pgstat_function.c | 41 +++-
src/backend/utils/activity/pgstat_relation.c | 20 ++
src/backend/utils/activity/pgstat_xact.c | 15 ++
src/include/pgstat.h | 16 ++
src/include/utils/pgstat_internal.h | 1 +
src/test/modules/test_misc/meson.build | 1 +
.../modules/test_misc/t/013_pgstat_xact.pl | 211 ++++++++++++++++++
src/tools/pgindent/typedefs.list | 1 +
9 files changed, 338 insertions(+), 10 deletions(-)
create mode 100644 src/test/modules/test_misc/t/013_pgstat_xact.pl
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index b67da88c7dc..a414a488320 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -326,7 +326,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.shared_size = sizeof(PgStatShared_Function),
.shared_data_off = offsetof(PgStatShared_Function, stats),
.shared_data_len = sizeof(((PgStatShared_Function *) 0)->stats),
- .pending_size = sizeof(PgStat_FunctionCounts),
+ .pending_size = sizeof(PgStat_FunctionPending),
.flush_pending_cb = pgstat_function_flush_cb,
.reset_timestamp_cb = pgstat_function_reset_timestamp_cb,
@@ -1357,6 +1357,46 @@ pgstat_fetch_pending_entry(PgStat_Kind kind, Oid dboid, uint64 objid)
return entry_ref;
}
+/*
+ * Set xact baselines for all pending relation and function entries.
+ *
+ * Called at top-level transaction commit, abort, and successful PREPARE to
+ * record the current counter state as the baseline. The next transaction on
+ * this backend will compute xact-scoped deltas by subtracting this baseline.
+ *
+ * This sweeps all unflushed pending entries, not just those touched in the
+ * current transaction, because nontransactional counters (scans, buffer hits,
+ * etc.) accumulate in the pending entry regardless of whether a
+ * PgStat_TableXactStatus was created.
+ */
+void
+pgstat_set_pending_baselines(void)
+{
+ dlist_iter iter;
+
+ dlist_foreach(iter, &pgStatPending)
+ {
+ PgStat_EntryRef *entry_ref =
+ dlist_container(PgStat_EntryRef, pending_node, iter.cur);
+ PgStat_Kind kind = entry_ref->shared_entry->key.kind;
+
+ if (kind == PGSTAT_KIND_RELATION)
+ {
+ PgStat_TableStatus *tabstat =
+ (PgStat_TableStatus *) entry_ref->pending;
+
+ tabstat->xact_baseline = tabstat->counts;
+ }
+ else if (kind == PGSTAT_KIND_FUNCTION)
+ {
+ PgStat_FunctionPending *fpending =
+ (PgStat_FunctionPending *) entry_ref->pending;
+
+ fpending->xact_baseline = fpending->counts;
+ }
+ }
+}
+
void
pgstat_delete_pending_entry(PgStat_EntryRef *entry_ref)
{
diff --git a/src/backend/utils/activity/pgstat_function.c b/src/backend/utils/activity/pgstat_function.c
index d47d05e3d92..04bbe2e7cb6 100644
--- a/src/backend/utils/activity/pgstat_function.c
+++ b/src/backend/utils/activity/pgstat_function.c
@@ -119,7 +119,7 @@ pgstat_init_function_usage(FunctionCallInfo fcinfo,
}
}
- pending = entry_ref->pending;
+ pending = &((PgStat_FunctionPending *) entry_ref->pending)->counts;
fcu->fs = pending;
@@ -192,10 +192,12 @@ pgstat_end_function_usage(PgStat_FunctionCallUsage *fcu, bool finalize)
bool
pgstat_function_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
{
+ PgStat_FunctionPending *fpending;
PgStat_FunctionCounts *localent;
PgStatShared_Function *shfuncent;
- localent = (PgStat_FunctionCounts *) entry_ref->pending;
+ fpending = (PgStat_FunctionPending *) entry_ref->pending;
+ localent = &fpending->counts;
shfuncent = (PgStatShared_Function *) entry_ref->shared_stats;
/* localent always has non-zero content */
@@ -223,18 +225,39 @@ pgstat_function_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts)
/*
* find any existing PgStat_FunctionCounts entry for specified function
*
- * If no entry, return NULL, don't create a new one
+ * Returns a palloc'd delta struct showing only the current transaction's
+ * contribution, or NULL if the function was not called in this transaction.
*/
PgStat_FunctionCounts *
find_funcstat_entry(Oid func_id)
{
PgStat_EntryRef *entry_ref;
-
- entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId, func_id);
-
- if (entry_ref)
- return entry_ref->pending;
- return NULL;
+ PgStat_FunctionPending *fpending;
+ PgStat_FunctionCounts *result;
+ PgStat_Counter delta_calls;
+
+ entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_FUNCTION,
+ MyDatabaseId, func_id);
+ if (!entry_ref)
+ return NULL;
+
+ fpending = (PgStat_FunctionPending *) entry_ref->pending;
+ delta_calls = fpending->counts.numcalls - fpending->xact_baseline.numcalls;
+ if (delta_calls == 0)
+ return NULL;
+
+ result = palloc(sizeof(PgStat_FunctionCounts));
+ result->numcalls = delta_calls;
+ INSTR_TIME_SET_ZERO(result->total_time);
+ INSTR_TIME_ACCUM_DIFF(result->total_time,
+ fpending->counts.total_time,
+ fpending->xact_baseline.total_time);
+ INSTR_TIME_SET_ZERO(result->self_time);
+ INSTR_TIME_ACCUM_DIFF(result->self_time,
+ fpending->counts.self_time,
+ fpending->xact_baseline.self_time);
+
+ return result;
}
/*
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index b2ca28f83ba..0de69ef8b1d 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -505,6 +505,7 @@ find_tabstat_entry(Oid rel_id)
PgStat_TableXactStatus *trans;
PgStat_TableStatus *tabentry = NULL;
PgStat_TableStatus *tablestatus = NULL;
+ PgStat_TableCounts *baseline;
entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, MyDatabaseId, rel_id);
if (!entry_ref)
@@ -525,6 +526,25 @@ find_tabstat_entry(Oid rel_id)
*/
tablestatus->trans = NULL;
+ /*
+ * Subtract the xact baseline to show only the current top-level
+ * transaction's activity. Only subtract counters exposed through
+ * pg_stat_xact_* views. The remaining PgStat_TableCounts fields are
+ * cumulative relation-maintenance state, not xact-view output.
+ */
+ baseline = &tabentry->xact_baseline;
+ tablestatus->counts.numscans -= baseline->numscans;
+ tablestatus->counts.tuples_returned -= baseline->tuples_returned;
+ tablestatus->counts.tuples_fetched -= baseline->tuples_fetched;
+ tablestatus->counts.tuples_inserted -= baseline->tuples_inserted;
+ tablestatus->counts.tuples_updated -= baseline->tuples_updated;
+ tablestatus->counts.tuples_deleted -= baseline->tuples_deleted;
+ tablestatus->counts.tuples_hot_updated -= baseline->tuples_hot_updated;
+ tablestatus->counts.tuples_newpage_updated -=
+ baseline->tuples_newpage_updated;
+ tablestatus->counts.blocks_fetched -= baseline->blocks_fetched;
+ tablestatus->counts.blocks_hit -= baseline->blocks_hit;
+
/*
* Live subtransaction counts are not included yet. This is not a hot
* code path so reconcile tuples_inserted, tuples_updated and
diff --git a/src/backend/utils/activity/pgstat_xact.c b/src/backend/utils/activity/pgstat_xact.c
index 5e2d69e6297..ad8ae891113 100644
--- a/src/backend/utils/activity/pgstat_xact.c
+++ b/src/backend/utils/activity/pgstat_xact.c
@@ -55,6 +55,14 @@ AtEOXact_PgStat(bool isCommit, bool parallel)
}
pgStatXactStack = NULL;
+ /*
+ * Record current counter values as the baseline for the next transaction.
+ * This must be after AtEOXact_PgStat_Relations(), which folds
+ * transactional counts into ->counts, and AtEOXact_PgStat_DroppedStats(),
+ * which removes dropped entries from pgStatPending.
+ */
+ pgstat_set_pending_baselines();
+
/* Make sure any stats snapshot is thrown away */
pgstat_clear_snapshot();
}
@@ -226,6 +234,13 @@ PostPrepare_PgStat(void)
}
pgStatXactStack = NULL;
+ /*
+ * Record baselines, same as at commit/abort. AtEOXact_PgStat() is not
+ * called during PREPARE, so without this, counters from the prepared
+ * transaction would leak into later transactions.
+ */
+ pgstat_set_pending_baselines();
+
/* Make sure any stats snapshot is thrown away */
pgstat_clear_snapshot();
}
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index dfa2e837638..dec7fcec8e0 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -90,6 +90,21 @@ typedef struct PgStat_FunctionCounts
instr_time self_time;
} PgStat_FunctionCounts;
+/* ----------
+ * PgStat_FunctionPending Backend-local pending state for function stats
+ *
+ * Wraps the accumulated counts with a per-transaction baseline so that
+ * pg_stat_xact_user_functions can report only the current transaction's
+ * contribution. The baseline is set at each top-level transaction boundary
+ * by pgstat_set_pending_baselines().
+ * ----------
+ */
+typedef struct PgStat_FunctionPending
+{
+ PgStat_FunctionCounts counts; /* accumulated counts */
+ PgStat_FunctionCounts xact_baseline; /* snapshot at xact boundary */
+} PgStat_FunctionPending;
+
/*
* Working state needed to accumulate per-function-call timing statistics.
*/
@@ -183,6 +198,7 @@ typedef struct PgStat_TableStatus
bool shared; /* is it a shared catalog? */
struct PgStat_TableXactStatus *trans; /* lowest subxact's counts */
PgStat_TableCounts counts; /* event counts to be sent */
+ PgStat_TableCounts xact_baseline; /* snapshot at xact boundary */
Relation relation; /* rel that is using this entry */
} PgStat_TableStatus;
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index fe463faaf63..f8bb8a4f230 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -684,6 +684,7 @@ extern PgStat_EntryRef *pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid,
bool *created_entry);
extern PgStat_EntryRef *pgstat_fetch_pending_entry(PgStat_Kind kind,
Oid dboid, uint64 objid);
+extern void pgstat_set_pending_baselines(void);
extern void *pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, uint64 objid,
bool *may_free);
diff --git a/src/test/modules/test_misc/meson.build b/src/test/modules/test_misc/meson.build
index 356d8454b39..3bc56034fc0 100644
--- a/src/test/modules/test_misc/meson.build
+++ b/src/test/modules/test_misc/meson.build
@@ -21,6 +21,7 @@ tests += {
't/010_index_concurrently_upsert.pl',
't/011_lock_stats.pl',
't/012_ddlutils.pl',
+ 't/013_pgstat_xact.pl',
],
# The injection points are cluster-wide, so disable installcheck
'runningcheck': false,
diff --git a/src/test/modules/test_misc/t/013_pgstat_xact.pl b/src/test/modules/test_misc/t/013_pgstat_xact.pl
new file mode 100644
index 00000000000..03ad8ba2101
--- /dev/null
+++ b/src/test/modules/test_misc/t/013_pgstat_xact.pl
@@ -0,0 +1,211 @@
+# Copyright (c) 2021-2026, PostgreSQL Global Development Group
+
+# Test that pg_stat_xact_* views report only current-transaction activity,
+# not accumulated pending stats from prior transactions.
+#
+# Use psql -c so all statements in a command are sent as one simple-query
+# message. That preserves pgStatPending entries across consecutive top-level
+# transactions by avoiding a ReadyForQuery boundary, where PostgresMain() may
+# call pgstat_report_stat().
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my $node = PostgreSQL::Test::Cluster->new('primary');
+$node->init;
+$node->append_conf('postgresql.conf', "track_functions = 'all'");
+$node->append_conf('postgresql.conf', "max_prepared_transactions = 5");
+$node->start;
+
+my $dbname = 'postgres';
+
+$node->safe_psql(
+ $dbname, q{
+ CREATE TABLE test_xact_stats (x int);
+ CREATE FUNCTION test_xact_stats_func() RETURNS void
+ LANGUAGE plpgsql AS $$ BEGIN NULL; END; $$;
+});
+
+sub psql_c_sequence_like
+{
+ local $Test::Builder::Level = $Test::Builder::Level + 1;
+
+ my ($node, $commands, $expected_stdout, $test_name) = @_;
+ my ($stdout, $stderr);
+ my @cmd = (
+ $node->installed_command('psql'),
+ '--no-psqlrc',
+ '--no-align',
+ '--tuples-only',
+ '--quiet',
+ '--set' => 'ON_ERROR_STOP=1',
+ '-d' => $node->connstr($dbname));
+
+ foreach my $command (@{$commands})
+ {
+ push @cmd, '-c' => $command;
+ }
+
+ my $result = IPC::Run::run \@cmd, '>', \$stdout, '2>', \$stderr;
+
+ is($result, 1, "$test_name: exit code 0");
+ is($stderr, '', "$test_name: no stderr");
+ like($stdout, $expected_stdout, "$test_name: matches");
+}
+
+sub psql_c_like
+{
+ local $Test::Builder::Level = $Test::Builder::Level + 1;
+
+ my ($node, $sql, $expected_stdout, $test_name) = @_;
+
+ psql_c_sequence_like($node, [$sql], $expected_stdout, $test_name);
+}
+
+psql_c_like(
+ $node,
+ q{BEGIN;
+ INSERT INTO test_xact_stats VALUES (1);
+ COMMIT;
+ BEGIN;
+ INSERT INTO test_xact_stats VALUES (2);
+ SELECT 'n_tup_ins:' || n_tup_ins FROM pg_stat_xact_user_tables
+ WHERE relname = 'test_xact_stats';
+ COMMIT;},
+ qr/^n_tup_ins:1$/m,
+ "table n_tup_ins shows only current transaction's inserts");
+
+psql_c_like(
+ $node,
+ q{BEGIN;
+ SELECT count(*) FROM test_xact_stats;
+ COMMIT;
+ BEGIN;
+ SELECT count(*) FROM test_xact_stats;
+ SELECT 'seq_scan:' || seq_scan FROM pg_stat_xact_user_tables
+ WHERE relname = 'test_xact_stats';
+ COMMIT;},
+ qr/^seq_scan:1$/m,
+ "table seq_scan shows only current transaction's scans");
+
+psql_c_like(
+ $node,
+ q{BEGIN;
+ SELECT test_xact_stats_func();
+ SELECT test_xact_stats_func();
+ COMMIT;
+ BEGIN;
+ SELECT test_xact_stats_func();
+ SELECT 'calls:' || calls FROM pg_stat_xact_user_functions
+ WHERE funcname = 'test_xact_stats_func';
+ COMMIT;},
+ qr/^calls:1$/m,
+ "function calls shows only current transaction's calls");
+
+psql_c_like(
+ $node,
+ q{BEGIN;
+ SELECT count(*) FROM test_xact_stats;
+ PREPARE TRANSACTION 'test_prep';
+ BEGIN;
+ SELECT count(*) FROM test_xact_stats;
+ SELECT 'seq_scan:' || seq_scan FROM pg_stat_xact_user_tables
+ WHERE relname = 'test_xact_stats';
+ COMMIT;},
+ qr/^seq_scan:1$/m,
+ "table seq_scan does not leak across PREPARE boundary");
+
+$node->safe_psql($dbname, q{COMMIT PREPARED 'test_prep';});
+
+psql_c_like(
+ $node,
+ q{BEGIN;
+ SELECT count(*) FROM test_xact_stats;
+ INSERT INTO test_xact_stats VALUES (99);
+ COMMIT;
+ BEGIN;
+ SELECT 'seq_scan:' || seq_scan || ',n_tup_ins:' || n_tup_ins
+ FROM pg_stat_xact_user_tables
+ WHERE relname = 'test_xact_stats';
+ COMMIT;},
+ qr/^seq_scan:0,n_tup_ins:0$/m,
+ "untouched relation entry shows zeroes in current transaction");
+
+psql_c_like(
+ $node,
+ q{BEGIN;
+ SELECT test_xact_stats_func();
+ COMMIT;
+ BEGIN;
+ SELECT 'found:' || count(*) FROM pg_stat_xact_user_functions
+ WHERE funcname = 'test_xact_stats_func';
+ COMMIT;},
+ qr/^found:0$/m,
+ "uncalled function is not visible in pg_stat_xact_user_functions");
+
+psql_c_like(
+ $node,
+ q{BEGIN;
+ SELECT test_xact_stats_func();
+ SELECT test_xact_stats_func();
+ PREPARE TRANSACTION 'test_func_prep';
+ BEGIN;
+ SELECT test_xact_stats_func();
+ SELECT 'calls:' || calls FROM pg_stat_xact_user_functions
+ WHERE funcname = 'test_xact_stats_func';
+ COMMIT;},
+ qr/^calls:1$/m,
+ "function calls do not leak across PREPARE boundary");
+
+$node->safe_psql($dbname, q{COMMIT PREPARED 'test_func_prep';});
+
+psql_c_sequence_like(
+ $node,
+ [
+ q{BEGIN;
+ INSERT INTO test_xact_stats VALUES (300);
+ PREPARE TRANSACTION 'test_commit_prep';
+ SELECT pg_stat_force_next_flush();},
+ q{COMMIT PREPARED 'test_commit_prep';},
+ q{BEGIN;
+ SELECT 'n_tup_ins:' || n_tup_ins FROM pg_stat_xact_user_tables
+ WHERE relname = 'test_xact_stats';
+ COMMIT;}
+ ],
+ qr/^n_tup_ins:0$/m,
+ "table n_tup_ins does not leak after COMMIT PREPARED");
+
+psql_c_sequence_like(
+ $node,
+ [
+ q{BEGIN;
+ INSERT INTO test_xact_stats VALUES (400);
+ PREPARE TRANSACTION 'test_rollback_prep';
+ SELECT pg_stat_force_next_flush();},
+ q{ROLLBACK PREPARED 'test_rollback_prep';},
+ q{BEGIN;
+ SELECT 'n_tup_ins:' || n_tup_ins FROM pg_stat_xact_user_tables
+ WHERE relname = 'test_xact_stats';
+ COMMIT;}
+ ],
+ qr/^n_tup_ins:0$/m,
+ "table n_tup_ins does not leak after ROLLBACK PREPARED");
+
+for my $gid (
+ qw(test_prep test_func_prep test_commit_prep test_rollback_prep))
+{
+ eval { $node->safe_psql($dbname, "ROLLBACK PREPARED '$gid'"); };
+}
+
+$node->safe_psql(
+ $dbname, q{
+ DROP TABLE test_xact_stats;
+ DROP FUNCTION test_xact_stats_func();
+});
+
+$node->stop;
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 06532bf7152..eba58d860cf 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2330,6 +2330,7 @@ PgStat_EntryRefHashEntry
PgStat_FetchConsistency
PgStat_FunctionCallUsage
PgStat_FunctionCounts
+PgStat_FunctionPending
PgStat_HashKey
PgStat_IO
PgStat_KindInfo
--
2.51.0
[text/x-sh] pgstat_xact_macro_bench.sh (7.4K, 3-pgstat_xact_macro_bench.sh)
download | inline:
#!/usr/bin/env bash
set -euo pipefail
usage()
{
cat <<'USAGE'
Usage:
pgstat_xact_macro_bench.sh [options]
Options:
--pg-bin DIR Run one PostgreSQL install.
--patched-pg-bin DIR Run and label this install as patched.
--unpatched-pg-bin DIR Run and label this install as unpatched.
--tables 5000 Number of tenant tables to create.
--rows-per-table 1 Rows inserted into each tenant table.
--clients 1 pgbench clients. Use 1 for one long-lived backend.
--jobs 1 pgbench worker threads.
--time 60 Measured pgbench duration in seconds.
--warmup 10 Warmup duration in seconds.
--port-base 65440 First temporary server port.
--keep Keep temporary cluster directories.
Environment:
PG_BIN Same as --pg-bin.
Examples:
./pgstat_xact_macro_bench.sh --pg-bin ./inst/bin
./pgstat_xact_macro_bench.sh \
--unpatched-pg-bin /tmp/pg-unpatched/inst/bin \
--patched-pg-bin /tmp/pg-patched/inst/bin
When both --patched-pg-bin and --unpatched-pg-bin are provided, runs are
executed in the same order as the options appear on the command line. Use
that to check for order bias.
This benchmark models a realistic bad case for transaction-boundary pgstat
work: many tenant tables, one long-lived connection, many short transactions,
and pending stats entries retained in the backend while the workload keeps
running. The pgbench script is:
\set id random(1, N)
BEGIN;
SELECT count(*) FROM tenant_:id;
COMMIT;
USAGE
}
pg_bin="${PG_BIN:-}"
patched_pg_bin=""
unpatched_pg_bin=""
run_labels=()
run_bins=()
tables=5000
rows_per_table=1
clients=1
jobs=1
duration=60
warmup=10
port_base=65440
keep=0
while [ "$#" -gt 0 ]; do
case "$1" in
--pg-bin)
pg_bin="$2"
shift 2
;;
--patched-pg-bin)
patched_pg_bin="$2"
run_labels+=("patched")
run_bins+=("$2")
shift 2
;;
--unpatched-pg-bin)
unpatched_pg_bin="$2"
run_labels+=("unpatched")
run_bins+=("$2")
shift 2
;;
--tables)
tables="$2"
shift 2
;;
--rows-per-table)
rows_per_table="$2"
shift 2
;;
--clients)
clients="$2"
shift 2
;;
--jobs)
jobs="$2"
shift 2
;;
--time)
duration="$2"
shift 2
;;
--warmup)
warmup="$2"
shift 2
;;
--port-base)
port_base="$2"
shift 2
;;
--keep)
keep=1
shift
;;
--help|-h)
usage
exit 0
;;
*)
echo "unknown option: $1" >&2
usage >&2
exit 2
;;
esac
done
if [ -n "$pg_bin" ]; then
if [ -n "$patched_pg_bin" ] || [ -n "$unpatched_pg_bin" ]; then
echo "--pg-bin cannot be combined with --patched-pg-bin or --unpatched-pg-bin" >&2
exit 1
fi
run_labels=("build")
run_bins=("$pg_bin")
fi
if [ "${#run_bins[@]}" -eq 0 ]; then
echo "set PG_BIN or pass --pg-bin/--patched-pg-bin/--unpatched-pg-bin" >&2
exit 1
fi
for bin in "${run_bins[@]}"; do
for prog in postgres initdb pg_ctl psql pgbench; do
if [ ! -x "$bin/$prog" ]; then
echo "missing executable: $bin/$prog" >&2
exit 1
fi
done
done
pg_tool()
{
local bin="$1"
local prog="$2"
local libdir
shift 2
libdir="$(cd "$bin/../lib" 2>/dev/null && pwd || true)"
if [ -z "$libdir" ]; then
"$bin/$prog" "$@"
return
fi
case "$(uname -s)" in
Darwin)
DYLD_LIBRARY_PATH="$libdir${DYLD_LIBRARY_PATH:+:$DYLD_LIBRARY_PATH}" \
"$bin/$prog" "$@"
;;
*)
LD_LIBRARY_PATH="$libdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
"$bin/$prog" "$@"
;;
esac
}
tmpdirs=()
datadirs=()
pg_bins=()
created_tmpdir=""
created_script_file=""
created_conninfo=""
cleanup()
{
set +e
local i
for i in "${!datadirs[@]}"; do
if [ -d "${datadirs[$i]}" ]; then
pg_tool "${pg_bins[$i]}" pg_ctl -D "${datadirs[$i]}" -m fast -w stop \
>/dev/null 2>&1
fi
done
if [ "$keep" -eq 0 ]; then
for i in "${!tmpdirs[@]}"; do
rm -rf "${tmpdirs[$i]}"
done
else
for i in "${!tmpdirs[@]}"; do
echo "kept temporary directory: ${tmpdirs[$i]}"
done
fi
}
trap cleanup EXIT
trap 'trap - EXIT; cleanup; exit 130' INT
trap 'trap - EXIT; cleanup; exit 143' TERM
create_cluster()
{
local label="$1"
local bin="$2"
local port="$3"
local tmpdir datadir logfile setup_sql script_file conninfo
tmpdir="$(mktemp -d "${TMPDIR:-/tmp}/pgstat-xact-macro-${label}.XXXXXX")"
datadir="$tmpdir/data"
logfile="$tmpdir/postgres.log"
setup_sql="$tmpdir/setup.sql"
script_file="$tmpdir/tenant.sql"
tmpdirs+=("$tmpdir")
datadirs+=("$datadir")
pg_bins+=("$bin")
pg_tool "$bin" initdb -D "$datadir" --auth trust --no-sync --no-instructions \
--lc-messages=C >/dev/null
cat >>"$datadir/postgresql.conf" <<EOF
listen_addresses = ''
port = $port
unix_socket_directories = '$tmpdir'
fsync = off
synchronous_commit = off
autovacuum = off
track_functions = 'none'
max_prepared_transactions = 0
EOF
pg_tool "$bin" pg_ctl -D "$datadir" -l "$logfile" -w start >/dev/null
{
echo "SET client_min_messages = warning;"
echo "DROP SCHEMA IF EXISTS pgstat_xact_macro CASCADE;"
echo "CREATE SCHEMA pgstat_xact_macro;"
echo "SET search_path = pgstat_xact_macro;"
for i in $(seq 1 "$tables"); do
printf "CREATE TABLE tenant_%d (a int);\n" "$i"
printf "INSERT INTO tenant_%d SELECT generate_series(1, %d);\n" \
"$i" "$rows_per_table"
done
} >"$setup_sql"
pg_tool "$bin" psql -X -q -v ON_ERROR_STOP=1 \
-h "$tmpdir" -p "$port" -d postgres -f "$setup_sql" >/dev/null
cat >"$script_file" <<EOF
\\set id random(1, $tables)
BEGIN;
SELECT count(*) FROM pgstat_xact_macro.tenant_:id;
COMMIT;
EOF
conninfo="host=$tmpdir port=$port dbname=postgres"
created_tmpdir="$tmpdir"
created_script_file="$script_file"
created_conninfo="$conninfo"
}
run_pgbench()
{
local label="$1"
local bin="$2"
local conninfo="$3"
local script_file="$4"
local seconds="$5"
local out_file="$6"
pg_tool "$bin" pgbench -n -r -M simple -c "$clients" -j "$jobs" -T "$seconds" \
-f "$script_file" "$conninfo" >"$out_file"
}
extract_metric()
{
local file="$1"
local pattern="$2"
awk -v pat="$pattern" '
$0 ~ pat {
for (i = 1; i <= NF; i++)
{
if ($i ~ /^[0-9]+([.][0-9]+)?$/)
{
print $i;
exit;
}
}
}
' "$file"
}
printf "tables=%s rows_per_table=%s clients=%s jobs=%s warmup=%s time=%s\n" \
"$tables" "$rows_per_table" "$clients" "$jobs" "$warmup" "$duration"
printf "%-12s %12s %14s %14s\n" \
"label" "tps" "avg_lat_ms" "select_avg_ms"
for idx in "${!run_bins[@]}"; do
label="${run_labels[$idx]}"
bin="${run_bins[$idx]}"
port=$((port_base + idx))
create_cluster "$label" "$bin" "$port"
tmpdir="$created_tmpdir"
script_file="$created_script_file"
conninfo="$created_conninfo"
warmup_out="$tmpdir/warmup.out"
run_out="$tmpdir/pgbench.out"
if [ "$warmup" -gt 0 ]; then
run_pgbench "$label" "$bin" "$conninfo" "$script_file" "$warmup" \
"$warmup_out"
fi
run_pgbench "$label" "$bin" "$conninfo" "$script_file" "$duration" \
"$run_out"
tps="$(extract_metric "$run_out" "^tps =")"
avg_lat="$(extract_metric "$run_out" "^latency average =")"
select_avg="$(awk '
/SELECT count\(\*\) FROM pgstat_xact_macro[.]tenant_:id/ {
print $1;
exit;
}
' "$run_out")"
if [ -z "$tps" ]; then
tps="?"
fi
if [ -z "$avg_lat" ]; then
avg_lat="?"
fi
if [ -z "$select_avg" ]; then
select_avg="?"
fi
printf "%-12s %12s %14s %14s\n" \
"$label" "$tps" "$avg_lat" "$select_avg"
if [ "$keep" -eq 1 ]; then
echo "temporary directory for $label: $tmpdir"
echo "pgbench output for $label: $run_out"
fi
done
[text/x-sh] pgstat_xact_micro_bench.sh (6.5K, 4-pgstat_xact_micro_bench.sh)
download | inline:
#!/usr/bin/env bash
set -euo pipefail
usage()
{
cat <<'USAGE'
Usage:
pgstat_xact_baseline_bench.sh [options]
Options:
--cases "0 10 100 1000" Pending-entry counts to test.
--m 2000 Number of tiny BEGIN/COMMIT transactions.
--keep Keep the temporary cluster directory.
--pg-bin DIR Directory containing postgres/initdb/pg_ctl/psql/pg_config.
--port PORT Port for the temporary server.
Environment:
PG_BIN Same as --pg-bin.
CC C compiler for the tiny libpq PQexec helper.
The benchmark starts a temporary PostgreSQL cluster, creates the maximum
number of test tables requested, then sends each measured workload as one
simple-query message through libpq. That is intentional: psql -f would split
the workload into multiple protocol messages and would not pressure the
pgStatPending-across-transaction-boundaries path in the same way.
USAGE
}
cases="0 10 100 1000"
m=2000
keep=0
pg_bin="${PG_BIN:-}"
port=65432
while [ "$#" -gt 0 ]; do
case "$1" in
--cases)
cases="$2"
shift 2
;;
--m)
m="$2"
shift 2
;;
--keep)
keep=1
shift
;;
--pg-bin)
pg_bin="$2"
shift 2
;;
--port)
port="$2"
shift 2
;;
--help|-h)
usage
exit 0
;;
*)
echo "unknown option: $1" >&2
usage >&2
exit 2
;;
esac
done
if [ -z "$pg_bin" ]; then
echo "set PG_BIN or pass --pg-bin" >&2
exit 1
fi
for prog in postgres initdb pg_ctl psql pg_config; do
if [ ! -x "$pg_bin/$prog" ]; then
echo "missing executable: $pg_bin/$prog" >&2
exit 1
fi
done
pg_tool()
{
local prog="$1"
local libdir
shift
libdir="$(cd "$pg_bin/../lib" 2>/dev/null && pwd || true)"
if [ -z "$libdir" ]; then
"$pg_bin/$prog" "$@"
return
fi
case "$(uname -s)" in
Darwin)
DYLD_LIBRARY_PATH="$libdir${DYLD_LIBRARY_PATH:+:$DYLD_LIBRARY_PATH}" \
"$pg_bin/$prog" "$@"
;;
*)
LD_LIBRARY_PATH="$libdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
"$pg_bin/$prog" "$@"
;;
esac
}
max_n="$(printf '%s\n' $cases | sort -n | tail -1)"
tmpdir="$(mktemp -d "${TMPDIR:-/tmp}/pgstat-xact-bench.XXXXXX")"
datadir="$tmpdir/data"
logfile="$tmpdir/postgres.log"
helper_src="$tmpdir/pqexec_once.c"
helper="$tmpdir/pqexec_once"
cleanup()
{
set +e
if [ -d "$datadir" ]; then
pg_tool pg_ctl -D "$datadir" -m fast -w stop >/dev/null 2>&1
fi
if [ "$keep" -eq 0 ]; then
rm -rf "$tmpdir"
else
echo "kept temporary directory: $tmpdir"
fi
}
trap cleanup EXIT
trap 'trap - EXIT; cleanup; exit 130' INT
trap 'trap - EXIT; cleanup; exit 143' TERM
cat >"$helper_src" <<'C'
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "libpq-fe.h"
static char *
read_file(const char *path, long *len_out)
{
FILE *f = fopen(path, "rb");
long len;
char *buf;
if (f == NULL)
{
fprintf(stderr, "could not open %s: %s\n", path, strerror(errno));
exit(2);
}
if (fseek(f, 0, SEEK_END) != 0)
{
perror("fseek");
exit(2);
}
len = ftell(f);
if (len < 0)
{
perror("ftell");
exit(2);
}
rewind(f);
buf = malloc((size_t) len + 1);
if (buf == NULL)
{
perror("malloc");
exit(2);
}
if (fread(buf, 1, (size_t) len, f) != (size_t) len)
{
perror("fread");
exit(2);
}
fclose(f);
buf[len] = '\0';
*len_out = len;
return buf;
}
int
main(int argc, char **argv)
{
const char *conninfo;
const char *sql_path;
PGconn *conn;
PGresult *res;
char *sql;
long sql_len;
if (argc != 3)
{
fprintf(stderr, "usage: %s CONNINFO SQLFILE\n", argv[0]);
return 2;
}
conninfo = argv[1];
sql_path = argv[2];
sql = read_file(sql_path, &sql_len);
conn = PQconnectdb(conninfo);
if (PQstatus(conn) != CONNECTION_OK)
{
fprintf(stderr, "connection failed: %s\n", PQerrorMessage(conn));
return 2;
}
res = PQexec(conn, sql);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
fprintf(stderr, "query failed: %s\n", PQerrorMessage(conn));
PQclear(res);
PQfinish(conn);
free(sql);
return 1;
}
if (PQntuples(res) != 1 || PQnfields(res) != 1)
{
fprintf(stderr, "query returned %d rows and %d columns, expected 1x1\n",
PQntuples(res), PQnfields(res));
PQclear(res);
PQfinish(conn);
free(sql);
return 1;
}
printf("%s\n", PQgetvalue(res, 0, 0));
PQclear(res);
PQfinish(conn);
free(sql);
return 0;
}
C
includedir="$(pg_tool pg_config --includedir)"
libdir="$(pg_tool pg_config --libdir)"
"${CC:-cc}" -O2 -I"$includedir" -L"$libdir" -Wl,-rpath,"$libdir" \
-o "$helper" "$helper_src" -lpq
pg_tool initdb -D "$datadir" --auth trust --no-sync --no-instructions \
--lc-messages=C >/dev/null
cat >>"$datadir/postgresql.conf" <<EOF
listen_addresses = ''
port = $port
unix_socket_directories = '$tmpdir'
fsync = off
autovacuum = off
track_functions = 'none'
max_prepared_transactions = 0
EOF
pg_tool pg_ctl -D "$datadir" -l "$logfile" -w start >/dev/null
conninfo="host=$tmpdir port=$port dbname=postgres"
setup_sql="$tmpdir/setup.sql"
{
echo "DROP SCHEMA IF EXISTS pgstat_xact_bench CASCADE;"
echo "CREATE SCHEMA pgstat_xact_bench;"
echo "SET search_path = pgstat_xact_bench;"
for i in $(seq 1 "$max_n"); do
printf "CREATE TABLE t_%06d (a int);\n" "$i"
printf "INSERT INTO t_%06d VALUES (1);\n" "$i"
done
} >"$setup_sql"
pg_tool psql -X -q -v ON_ERROR_STOP=1 \
-h "$tmpdir" -p "$port" -d postgres -f "$setup_sql" >/dev/null
printf "pg_bin=%s\n" "$pg_bin"
printf "tmpdir=%s\n" "$tmpdir"
printf "m=%s\n" "$m"
printf "%10s %12s %12s %12s\n" "pending_n" "xacts_m" "seconds" "us_per_xact"
for n in $cases; do
sql_file="$tmpdir/run_${n}.sql"
{
echo "SET search_path = pgstat_xact_bench;"
echo "SET stats_fetch_consistency = none;"
echo "CREATE TEMP TABLE bench_timer (started_at timestamptz);"
echo "BEGIN;"
if [ "$n" -eq 0 ]; then
echo "SELECT 0;"
else
for i in $(seq 1 "$n"); do
printf "SELECT count(*) FROM t_%06d;\n" "$i"
done
fi
echo "COMMIT;"
echo "INSERT INTO bench_timer VALUES (clock_timestamp());"
awk -v m="$m" 'BEGIN { for (i = 1; i <= m; i++) print "BEGIN; COMMIT;" }'
echo "SELECT EXTRACT(epoch FROM clock_timestamp() - started_at)::float8"
echo "FROM bench_timer;"
} >"$sql_file"
case "$(uname -s)" in
Darwin)
seconds="$(DYLD_LIBRARY_PATH="$libdir${DYLD_LIBRARY_PATH:+:$DYLD_LIBRARY_PATH}" \
"$helper" "$conninfo" "$sql_file")"
;;
*)
seconds="$(LD_LIBRARY_PATH="$libdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
"$helper" "$conninfo" "$sql_file")"
;;
esac
us_per_xact="$(awk -v s="$seconds" -v m="$m" 'BEGIN { printf "%.3f", (s * 1000000.0) / m }')"
printf "%10s %12s %12s %12s\n" "$n" "$m" "$seconds" "$us_per_xact"
done
^ permalink raw reply [nested|flat] 8+ messages in thread
end of thread, other threads:[~2026-05-13 03:30 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-03-25 14:06 BUG #19439: pg_stat_xact_user_tables stat not currect during the transaction PG Bug reporting form <[email protected]>
2026-03-27 02:15 ` Xuneng Zhou <[email protected]>
2026-03-28 05:47 ` Xuneng Zhou <[email protected]>
2026-03-29 03:17 ` Xuneng Zhou <[email protected]>
2026-03-30 03:14 ` Xuneng Zhou <[email protected]>
2026-05-06 08:03 ` klemen kobau <[email protected]>
2026-05-07 06:35 ` Xuneng Zhou <[email protected]>
2026-05-13 03:30 ` Xuneng Zhou <[email protected]>
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox