public inbox for [email protected]
help / color / mirror / Atom feedFrom: Xuneng Zhou <[email protected]>
To: klemen kobau <[email protected]>
Cc: [email protected]
Subject: Re: BUG #19439: pg_stat_xact_user_tables stat not currect during the transaction
Date: Wed, 13 May 2026 11:30:41 +0800
Message-ID: <CABPTF7WUzwywrDmB+W=Sm=ynzQkcS83p2z-Ywt8W8XdKhXOPcQ@mail.gmail.com> (raw)
In-Reply-To: <CABPTF7XxssS5dptcp7_AxeOAXj8Vh7cKvPJR+4PayWhZhQzDqg@mail.gmail.com>
References: <[email protected]>
<CABPTF7Wx=WTHa67PJUnH8VaJf-svrcFjt1ZgOVrqWzfgwDuMCA@mail.gmail.com>
<CABPTF7Wb6VOac1EFopGzhAtiHOU8_XRkFvhC6BnSj=jACBB+Gw@mail.gmail.com>
<CABPTF7VSNTvHTpS4ga1MfLYO-=xt7mLQ0ONEFobRERnhzXSzxw@mail.gmail.com>
<CABPTF7XO101mBvA9u01nHkvt1gaQ3i60hypYOvNp6DdOFnY0jQ@mail.gmail.com>
<CAOGhw3iNj+v-08hsfQw+5eUitCicDAHJaaJ++b9Ym0EOrMn0Gg@mail.gmail.com>
<CABPTF7XxssS5dptcp7_AxeOAXj8Vh7cKvPJR+4PayWhZhQzDqg@mail.gmail.com>
> >> --- eager baseline sweep
> >> The attached patch records the baseline eagerly at transaction
> >> boundaries instead of lazily at counter-increment sites.
> >> pgstat_set_pending_baselines() iterates the pgStatPending list and
> >> snapshots each entry's current counts into an xact_baseline field via
> >> struct assignment. It is called from AtEOXact_PgStat() (after folding
> >> transactional counts and removing dropped entries) and from
> >> PostPrepare_PgStat() (after relation cleanup), covering commit, abort,
> >> and PREPARE TRANSACTION. The view accessors unconditionally subtract
> >> the baseline. For entries created in the current transaction,
> >> xact_baseline is zero-initialized, so the subtraction is a no-op.
> >>
> >> I don’t have a clear preference between the two approaches; both are
> >> presented for review.
> >>
>
> It would be useful to verify the fix by manually applying the patch
> and building the instance. Additionally, a few issues surfaced after
> looking at it again, which I will update later.
>
Here is the updated version using eager baseline refresh, i.e. sweeping all
backend-local pending pgstat entries at each top-level transaction boundary.
I tested the eager-baseline approach at both micro and macro levels. The
results show the same cost shape in both cases: the patch cost scales with the
number of backend-local pending pgstat entries that must be swept at each
top-level transaction boundary.
The microbenchmark isolates the transaction-boundary cost. It first creates a
controlled number of pending pgstat entries in one backend, then times 10000
tiny BEGIN/COMMIT boundaries in one simple-query message:
BEGIN; COMMIT; BEGIN; COMMIT; ...
The results below use matching -O0 --enable-debug --enable-cassert builds for
both installs.
pending entries unpatched us/xact patched us/xact patch delta
---------------------------------------------------------------------------
0 713.526 703.357 -10.169
100 740.954 754.298 +13.344
1000 1183.302 / 1177.393 1211.533 / 1213.089 about +32 us
5000 2978.008 / 2967.674 3236.365 / 3081.945
about +186 us median
Machine: Mac mini, M4 Pro, 48GB mem
This is intentionally hostile to eager sweeping: one backend accumulates many
pending entries, then repeatedly crosses top-level transaction boundaries while
doing almost no useful work inside the transactions. The added cost is small at
100 pending entries, where noise matters, but becomes clear at 1000 and 5000
entries. In this debug/cassert build, the patch adds roughly 0.03-0.04 us per
pending entry per transaction in the 1000-5000 entry range. Absolute numbers
are inflated by the build profile, but the linear shape is the relevant signal.
The macro benchmark shows when that same boundary cost is visible in a more
query-shaped workload:
workload base tps patched tps change
------------------------------------------------------------------
5000 tables, 1 row/table 11216 6231 -44%
1000 tables, 1000 rows/table 7683 7113 -7%
100 tables, 10000 rows/table 2152 2162 ~0%
The latency breakdown explains the TPS pattern. For the 5000-table/1-row case:
base: avg_lat=0.089 ms, select_avg=0.051 ms
patched: avg_lat=0.160 ms, select_avg=0.062 ms
Machine: intel xeon server, 40 cores, 128GB mem
The SELECT itself barely changes. Most of the regression appears outside the
SELECT, where the patch does the baseline sweep at transaction end. This is
the worst case for the eager design: the transaction does very little real table
work, but the backend has thousands of pending relation stats entries.
As the number of pending entries drops, or as the query does more real scan
work, the same fixed boundary cost is diluted. With 1000 tables and 1000 rows
per table, the regression falls to about 7%. With 100 tables and 10000 rows per
table, the scan dominates and the sweep over about 100 pending entries is lost
in noise.
Taken together, the benchmarks confirm the expected implementation cost model:
eager baseline refresh cost ~= O(number of pending pgstat entries per backend)
Row count does not directly drive the cost; it only hides or exposes the fixed
transaction-boundary work.
These results suggest that the eager-sweeping approach has an unfavorable
cost model for long-lived sessions that accumulate many pending stats entries
and then execute small transactions. A lazy baseline appraoch, where
each pending
entry records the current transaction generation only when that entry is first
touched, should avoid the transaction-boundary sweep and make the cost scale
with the transaction's actual working set instead. However, it still
suffers from the
potential overhead of additional comparisons on hot paths, as well as increased
maintenance pain.
--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.
Attachments:
[application/octet-stream] v2-0001-Fix-pg_stat_xact_-views-leaking-across-xact-bound.patch (17.7K, 2-v2-0001-Fix-pg_stat_xact_-views-leaking-across-xact-bound.patch)
download | inline diff:
From 8fca00cec38f2459baea3f11a06df79b94572186 Mon Sep 17 00:00:00 2001
From: alterego655 <[email protected]>
Date: Tue, 12 May 2026 11:55:44 +0800
Subject: [PATCH v2] Fix pg_stat_xact_* views leaking across xact boundaries
pgStatPending entries can persist across transaction boundaries within a
backend. The pg_stat_xact_* view accessors read raw accumulated
counters from these entries, so they can report activity from prior
transactions instead of just the current one.
Fix this by recording a baseline snapshot of pending relation and
function counters at each top-level transaction boundary.
pgstat_set_pending_baselines() walks pgStatPending at commit, abort,
and PREPARE TRANSACTION. View accessors subtract the saved baseline
to produce the current transaction's delta.
Entries first created in the current transaction keep a zero baseline,
so their delta is unchanged.
For function stats, add PgStat_FunctionPending to pair
PgStat_FunctionCounts with a baseline copy. find_funcstat_entry() now
returns a palloc'd delta, or NULL when the function was not called in
the current transaction. Add the new typedef to pgindent's typedef
list.
Add a test_misc TAP test covering counter isolation across consecutive
transactions in a single simple-query message.
---
src/backend/utils/activity/pgstat.c | 42 +++-
src/backend/utils/activity/pgstat_function.c | 41 +++-
src/backend/utils/activity/pgstat_relation.c | 20 ++
src/backend/utils/activity/pgstat_xact.c | 15 ++
src/include/pgstat.h | 16 ++
src/include/utils/pgstat_internal.h | 1 +
src/test/modules/test_misc/meson.build | 1 +
.../modules/test_misc/t/013_pgstat_xact.pl | 211 ++++++++++++++++++
src/tools/pgindent/typedefs.list | 1 +
9 files changed, 338 insertions(+), 10 deletions(-)
create mode 100644 src/test/modules/test_misc/t/013_pgstat_xact.pl
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index b67da88c7dc..a414a488320 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -326,7 +326,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.shared_size = sizeof(PgStatShared_Function),
.shared_data_off = offsetof(PgStatShared_Function, stats),
.shared_data_len = sizeof(((PgStatShared_Function *) 0)->stats),
- .pending_size = sizeof(PgStat_FunctionCounts),
+ .pending_size = sizeof(PgStat_FunctionPending),
.flush_pending_cb = pgstat_function_flush_cb,
.reset_timestamp_cb = pgstat_function_reset_timestamp_cb,
@@ -1357,6 +1357,46 @@ pgstat_fetch_pending_entry(PgStat_Kind kind, Oid dboid, uint64 objid)
return entry_ref;
}
+/*
+ * Set xact baselines for all pending relation and function entries.
+ *
+ * Called at top-level transaction commit, abort, and successful PREPARE to
+ * record the current counter state as the baseline. The next transaction on
+ * this backend will compute xact-scoped deltas by subtracting this baseline.
+ *
+ * This sweeps all unflushed pending entries, not just those touched in the
+ * current transaction, because nontransactional counters (scans, buffer hits,
+ * etc.) accumulate in the pending entry regardless of whether a
+ * PgStat_TableXactStatus was created.
+ */
+void
+pgstat_set_pending_baselines(void)
+{
+ dlist_iter iter;
+
+ dlist_foreach(iter, &pgStatPending)
+ {
+ PgStat_EntryRef *entry_ref =
+ dlist_container(PgStat_EntryRef, pending_node, iter.cur);
+ PgStat_Kind kind = entry_ref->shared_entry->key.kind;
+
+ if (kind == PGSTAT_KIND_RELATION)
+ {
+ PgStat_TableStatus *tabstat =
+ (PgStat_TableStatus *) entry_ref->pending;
+
+ tabstat->xact_baseline = tabstat->counts;
+ }
+ else if (kind == PGSTAT_KIND_FUNCTION)
+ {
+ PgStat_FunctionPending *fpending =
+ (PgStat_FunctionPending *) entry_ref->pending;
+
+ fpending->xact_baseline = fpending->counts;
+ }
+ }
+}
+
void
pgstat_delete_pending_entry(PgStat_EntryRef *entry_ref)
{
diff --git a/src/backend/utils/activity/pgstat_function.c b/src/backend/utils/activity/pgstat_function.c
index d47d05e3d92..04bbe2e7cb6 100644
--- a/src/backend/utils/activity/pgstat_function.c
+++ b/src/backend/utils/activity/pgstat_function.c
@@ -119,7 +119,7 @@ pgstat_init_function_usage(FunctionCallInfo fcinfo,
}
}
- pending = entry_ref->pending;
+ pending = &((PgStat_FunctionPending *) entry_ref->pending)->counts;
fcu->fs = pending;
@@ -192,10 +192,12 @@ pgstat_end_function_usage(PgStat_FunctionCallUsage *fcu, bool finalize)
bool
pgstat_function_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
{
+ PgStat_FunctionPending *fpending;
PgStat_FunctionCounts *localent;
PgStatShared_Function *shfuncent;
- localent = (PgStat_FunctionCounts *) entry_ref->pending;
+ fpending = (PgStat_FunctionPending *) entry_ref->pending;
+ localent = &fpending->counts;
shfuncent = (PgStatShared_Function *) entry_ref->shared_stats;
/* localent always has non-zero content */
@@ -223,18 +225,39 @@ pgstat_function_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts)
/*
* find any existing PgStat_FunctionCounts entry for specified function
*
- * If no entry, return NULL, don't create a new one
+ * Returns a palloc'd delta struct showing only the current transaction's
+ * contribution, or NULL if the function was not called in this transaction.
*/
PgStat_FunctionCounts *
find_funcstat_entry(Oid func_id)
{
PgStat_EntryRef *entry_ref;
-
- entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId, func_id);
-
- if (entry_ref)
- return entry_ref->pending;
- return NULL;
+ PgStat_FunctionPending *fpending;
+ PgStat_FunctionCounts *result;
+ PgStat_Counter delta_calls;
+
+ entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_FUNCTION,
+ MyDatabaseId, func_id);
+ if (!entry_ref)
+ return NULL;
+
+ fpending = (PgStat_FunctionPending *) entry_ref->pending;
+ delta_calls = fpending->counts.numcalls - fpending->xact_baseline.numcalls;
+ if (delta_calls == 0)
+ return NULL;
+
+ result = palloc(sizeof(PgStat_FunctionCounts));
+ result->numcalls = delta_calls;
+ INSTR_TIME_SET_ZERO(result->total_time);
+ INSTR_TIME_ACCUM_DIFF(result->total_time,
+ fpending->counts.total_time,
+ fpending->xact_baseline.total_time);
+ INSTR_TIME_SET_ZERO(result->self_time);
+ INSTR_TIME_ACCUM_DIFF(result->self_time,
+ fpending->counts.self_time,
+ fpending->xact_baseline.self_time);
+
+ return result;
}
/*
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index b2ca28f83ba..0de69ef8b1d 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -505,6 +505,7 @@ find_tabstat_entry(Oid rel_id)
PgStat_TableXactStatus *trans;
PgStat_TableStatus *tabentry = NULL;
PgStat_TableStatus *tablestatus = NULL;
+ PgStat_TableCounts *baseline;
entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, MyDatabaseId, rel_id);
if (!entry_ref)
@@ -525,6 +526,25 @@ find_tabstat_entry(Oid rel_id)
*/
tablestatus->trans = NULL;
+ /*
+ * Subtract the xact baseline to show only the current top-level
+ * transaction's activity. Only subtract counters exposed through
+ * pg_stat_xact_* views. The remaining PgStat_TableCounts fields are
+ * cumulative relation-maintenance state, not xact-view output.
+ */
+ baseline = &tabentry->xact_baseline;
+ tablestatus->counts.numscans -= baseline->numscans;
+ tablestatus->counts.tuples_returned -= baseline->tuples_returned;
+ tablestatus->counts.tuples_fetched -= baseline->tuples_fetched;
+ tablestatus->counts.tuples_inserted -= baseline->tuples_inserted;
+ tablestatus->counts.tuples_updated -= baseline->tuples_updated;
+ tablestatus->counts.tuples_deleted -= baseline->tuples_deleted;
+ tablestatus->counts.tuples_hot_updated -= baseline->tuples_hot_updated;
+ tablestatus->counts.tuples_newpage_updated -=
+ baseline->tuples_newpage_updated;
+ tablestatus->counts.blocks_fetched -= baseline->blocks_fetched;
+ tablestatus->counts.blocks_hit -= baseline->blocks_hit;
+
/*
* Live subtransaction counts are not included yet. This is not a hot
* code path so reconcile tuples_inserted, tuples_updated and
diff --git a/src/backend/utils/activity/pgstat_xact.c b/src/backend/utils/activity/pgstat_xact.c
index 5e2d69e6297..ad8ae891113 100644
--- a/src/backend/utils/activity/pgstat_xact.c
+++ b/src/backend/utils/activity/pgstat_xact.c
@@ -55,6 +55,14 @@ AtEOXact_PgStat(bool isCommit, bool parallel)
}
pgStatXactStack = NULL;
+ /*
+ * Record current counter values as the baseline for the next transaction.
+ * This must be after AtEOXact_PgStat_Relations(), which folds
+ * transactional counts into ->counts, and AtEOXact_PgStat_DroppedStats(),
+ * which removes dropped entries from pgStatPending.
+ */
+ pgstat_set_pending_baselines();
+
/* Make sure any stats snapshot is thrown away */
pgstat_clear_snapshot();
}
@@ -226,6 +234,13 @@ PostPrepare_PgStat(void)
}
pgStatXactStack = NULL;
+ /*
+ * Record baselines, same as at commit/abort. AtEOXact_PgStat() is not
+ * called during PREPARE, so without this, counters from the prepared
+ * transaction would leak into later transactions.
+ */
+ pgstat_set_pending_baselines();
+
/* Make sure any stats snapshot is thrown away */
pgstat_clear_snapshot();
}
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index dfa2e837638..dec7fcec8e0 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -90,6 +90,21 @@ typedef struct PgStat_FunctionCounts
instr_time self_time;
} PgStat_FunctionCounts;
+/* ----------
+ * PgStat_FunctionPending Backend-local pending state for function stats
+ *
+ * Wraps the accumulated counts with a per-transaction baseline so that
+ * pg_stat_xact_user_functions can report only the current transaction's
+ * contribution. The baseline is set at each top-level transaction boundary
+ * by pgstat_set_pending_baselines().
+ * ----------
+ */
+typedef struct PgStat_FunctionPending
+{
+ PgStat_FunctionCounts counts; /* accumulated counts */
+ PgStat_FunctionCounts xact_baseline; /* snapshot at xact boundary */
+} PgStat_FunctionPending;
+
/*
* Working state needed to accumulate per-function-call timing statistics.
*/
@@ -183,6 +198,7 @@ typedef struct PgStat_TableStatus
bool shared; /* is it a shared catalog? */
struct PgStat_TableXactStatus *trans; /* lowest subxact's counts */
PgStat_TableCounts counts; /* event counts to be sent */
+ PgStat_TableCounts xact_baseline; /* snapshot at xact boundary */
Relation relation; /* rel that is using this entry */
} PgStat_TableStatus;
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index fe463faaf63..f8bb8a4f230 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -684,6 +684,7 @@ extern PgStat_EntryRef *pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid,
bool *created_entry);
extern PgStat_EntryRef *pgstat_fetch_pending_entry(PgStat_Kind kind,
Oid dboid, uint64 objid);
+extern void pgstat_set_pending_baselines(void);
extern void *pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, uint64 objid,
bool *may_free);
diff --git a/src/test/modules/test_misc/meson.build b/src/test/modules/test_misc/meson.build
index 356d8454b39..3bc56034fc0 100644
--- a/src/test/modules/test_misc/meson.build
+++ b/src/test/modules/test_misc/meson.build
@@ -21,6 +21,7 @@ tests += {
't/010_index_concurrently_upsert.pl',
't/011_lock_stats.pl',
't/012_ddlutils.pl',
+ 't/013_pgstat_xact.pl',
],
# The injection points are cluster-wide, so disable installcheck
'runningcheck': false,
diff --git a/src/test/modules/test_misc/t/013_pgstat_xact.pl b/src/test/modules/test_misc/t/013_pgstat_xact.pl
new file mode 100644
index 00000000000..03ad8ba2101
--- /dev/null
+++ b/src/test/modules/test_misc/t/013_pgstat_xact.pl
@@ -0,0 +1,211 @@
+# Copyright (c) 2021-2026, PostgreSQL Global Development Group
+
+# Test that pg_stat_xact_* views report only current-transaction activity,
+# not accumulated pending stats from prior transactions.
+#
+# Use psql -c so all statements in a command are sent as one simple-query
+# message. That preserves pgStatPending entries across consecutive top-level
+# transactions by avoiding a ReadyForQuery boundary, where PostgresMain() may
+# call pgstat_report_stat().
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my $node = PostgreSQL::Test::Cluster->new('primary');
+$node->init;
+$node->append_conf('postgresql.conf', "track_functions = 'all'");
+$node->append_conf('postgresql.conf', "max_prepared_transactions = 5");
+$node->start;
+
+my $dbname = 'postgres';
+
+$node->safe_psql(
+ $dbname, q{
+ CREATE TABLE test_xact_stats (x int);
+ CREATE FUNCTION test_xact_stats_func() RETURNS void
+ LANGUAGE plpgsql AS $$ BEGIN NULL; END; $$;
+});
+
+sub psql_c_sequence_like
+{
+ local $Test::Builder::Level = $Test::Builder::Level + 1;
+
+ my ($node, $commands, $expected_stdout, $test_name) = @_;
+ my ($stdout, $stderr);
+ my @cmd = (
+ $node->installed_command('psql'),
+ '--no-psqlrc',
+ '--no-align',
+ '--tuples-only',
+ '--quiet',
+ '--set' => 'ON_ERROR_STOP=1',
+ '-d' => $node->connstr($dbname));
+
+ foreach my $command (@{$commands})
+ {
+ push @cmd, '-c' => $command;
+ }
+
+ my $result = IPC::Run::run \@cmd, '>', \$stdout, '2>', \$stderr;
+
+ is($result, 1, "$test_name: exit code 0");
+ is($stderr, '', "$test_name: no stderr");
+ like($stdout, $expected_stdout, "$test_name: matches");
+}
+
+sub psql_c_like
+{
+ local $Test::Builder::Level = $Test::Builder::Level + 1;
+
+ my ($node, $sql, $expected_stdout, $test_name) = @_;
+
+ psql_c_sequence_like($node, [$sql], $expected_stdout, $test_name);
+}
+
+psql_c_like(
+ $node,
+ q{BEGIN;
+ INSERT INTO test_xact_stats VALUES (1);
+ COMMIT;
+ BEGIN;
+ INSERT INTO test_xact_stats VALUES (2);
+ SELECT 'n_tup_ins:' || n_tup_ins FROM pg_stat_xact_user_tables
+ WHERE relname = 'test_xact_stats';
+ COMMIT;},
+ qr/^n_tup_ins:1$/m,
+ "table n_tup_ins shows only current transaction's inserts");
+
+psql_c_like(
+ $node,
+ q{BEGIN;
+ SELECT count(*) FROM test_xact_stats;
+ COMMIT;
+ BEGIN;
+ SELECT count(*) FROM test_xact_stats;
+ SELECT 'seq_scan:' || seq_scan FROM pg_stat_xact_user_tables
+ WHERE relname = 'test_xact_stats';
+ COMMIT;},
+ qr/^seq_scan:1$/m,
+ "table seq_scan shows only current transaction's scans");
+
+psql_c_like(
+ $node,
+ q{BEGIN;
+ SELECT test_xact_stats_func();
+ SELECT test_xact_stats_func();
+ COMMIT;
+ BEGIN;
+ SELECT test_xact_stats_func();
+ SELECT 'calls:' || calls FROM pg_stat_xact_user_functions
+ WHERE funcname = 'test_xact_stats_func';
+ COMMIT;},
+ qr/^calls:1$/m,
+ "function calls shows only current transaction's calls");
+
+psql_c_like(
+ $node,
+ q{BEGIN;
+ SELECT count(*) FROM test_xact_stats;
+ PREPARE TRANSACTION 'test_prep';
+ BEGIN;
+ SELECT count(*) FROM test_xact_stats;
+ SELECT 'seq_scan:' || seq_scan FROM pg_stat_xact_user_tables
+ WHERE relname = 'test_xact_stats';
+ COMMIT;},
+ qr/^seq_scan:1$/m,
+ "table seq_scan does not leak across PREPARE boundary");
+
+$node->safe_psql($dbname, q{COMMIT PREPARED 'test_prep';});
+
+psql_c_like(
+ $node,
+ q{BEGIN;
+ SELECT count(*) FROM test_xact_stats;
+ INSERT INTO test_xact_stats VALUES (99);
+ COMMIT;
+ BEGIN;
+ SELECT 'seq_scan:' || seq_scan || ',n_tup_ins:' || n_tup_ins
+ FROM pg_stat_xact_user_tables
+ WHERE relname = 'test_xact_stats';
+ COMMIT;},
+ qr/^seq_scan:0,n_tup_ins:0$/m,
+ "untouched relation entry shows zeroes in current transaction");
+
+psql_c_like(
+ $node,
+ q{BEGIN;
+ SELECT test_xact_stats_func();
+ COMMIT;
+ BEGIN;
+ SELECT 'found:' || count(*) FROM pg_stat_xact_user_functions
+ WHERE funcname = 'test_xact_stats_func';
+ COMMIT;},
+ qr/^found:0$/m,
+ "uncalled function is not visible in pg_stat_xact_user_functions");
+
+psql_c_like(
+ $node,
+ q{BEGIN;
+ SELECT test_xact_stats_func();
+ SELECT test_xact_stats_func();
+ PREPARE TRANSACTION 'test_func_prep';
+ BEGIN;
+ SELECT test_xact_stats_func();
+ SELECT 'calls:' || calls FROM pg_stat_xact_user_functions
+ WHERE funcname = 'test_xact_stats_func';
+ COMMIT;},
+ qr/^calls:1$/m,
+ "function calls do not leak across PREPARE boundary");
+
+$node->safe_psql($dbname, q{COMMIT PREPARED 'test_func_prep';});
+
+psql_c_sequence_like(
+ $node,
+ [
+ q{BEGIN;
+ INSERT INTO test_xact_stats VALUES (300);
+ PREPARE TRANSACTION 'test_commit_prep';
+ SELECT pg_stat_force_next_flush();},
+ q{COMMIT PREPARED 'test_commit_prep';},
+ q{BEGIN;
+ SELECT 'n_tup_ins:' || n_tup_ins FROM pg_stat_xact_user_tables
+ WHERE relname = 'test_xact_stats';
+ COMMIT;}
+ ],
+ qr/^n_tup_ins:0$/m,
+ "table n_tup_ins does not leak after COMMIT PREPARED");
+
+psql_c_sequence_like(
+ $node,
+ [
+ q{BEGIN;
+ INSERT INTO test_xact_stats VALUES (400);
+ PREPARE TRANSACTION 'test_rollback_prep';
+ SELECT pg_stat_force_next_flush();},
+ q{ROLLBACK PREPARED 'test_rollback_prep';},
+ q{BEGIN;
+ SELECT 'n_tup_ins:' || n_tup_ins FROM pg_stat_xact_user_tables
+ WHERE relname = 'test_xact_stats';
+ COMMIT;}
+ ],
+ qr/^n_tup_ins:0$/m,
+ "table n_tup_ins does not leak after ROLLBACK PREPARED");
+
+for my $gid (
+ qw(test_prep test_func_prep test_commit_prep test_rollback_prep))
+{
+ eval { $node->safe_psql($dbname, "ROLLBACK PREPARED '$gid'"); };
+}
+
+$node->safe_psql(
+ $dbname, q{
+ DROP TABLE test_xact_stats;
+ DROP FUNCTION test_xact_stats_func();
+});
+
+$node->stop;
+done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 06532bf7152..eba58d860cf 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2330,6 +2330,7 @@ PgStat_EntryRefHashEntry
PgStat_FetchConsistency
PgStat_FunctionCallUsage
PgStat_FunctionCounts
+PgStat_FunctionPending
PgStat_HashKey
PgStat_IO
PgStat_KindInfo
--
2.51.0
[text/x-sh] pgstat_xact_macro_bench.sh (7.4K, 3-pgstat_xact_macro_bench.sh)
download | inline:
#!/usr/bin/env bash
set -euo pipefail
usage()
{
cat <<'USAGE'
Usage:
pgstat_xact_macro_bench.sh [options]
Options:
--pg-bin DIR Run one PostgreSQL install.
--patched-pg-bin DIR Run and label this install as patched.
--unpatched-pg-bin DIR Run and label this install as unpatched.
--tables 5000 Number of tenant tables to create.
--rows-per-table 1 Rows inserted into each tenant table.
--clients 1 pgbench clients. Use 1 for one long-lived backend.
--jobs 1 pgbench worker threads.
--time 60 Measured pgbench duration in seconds.
--warmup 10 Warmup duration in seconds.
--port-base 65440 First temporary server port.
--keep Keep temporary cluster directories.
Environment:
PG_BIN Same as --pg-bin.
Examples:
./pgstat_xact_macro_bench.sh --pg-bin ./inst/bin
./pgstat_xact_macro_bench.sh \
--unpatched-pg-bin /tmp/pg-unpatched/inst/bin \
--patched-pg-bin /tmp/pg-patched/inst/bin
When both --patched-pg-bin and --unpatched-pg-bin are provided, runs are
executed in the same order as the options appear on the command line. Use
that to check for order bias.
This benchmark models a realistic bad case for transaction-boundary pgstat
work: many tenant tables, one long-lived connection, many short transactions,
and pending stats entries retained in the backend while the workload keeps
running. The pgbench script is:
\set id random(1, N)
BEGIN;
SELECT count(*) FROM tenant_:id;
COMMIT;
USAGE
}
pg_bin="${PG_BIN:-}"
patched_pg_bin=""
unpatched_pg_bin=""
run_labels=()
run_bins=()
tables=5000
rows_per_table=1
clients=1
jobs=1
duration=60
warmup=10
port_base=65440
keep=0
while [ "$#" -gt 0 ]; do
case "$1" in
--pg-bin)
pg_bin="$2"
shift 2
;;
--patched-pg-bin)
patched_pg_bin="$2"
run_labels+=("patched")
run_bins+=("$2")
shift 2
;;
--unpatched-pg-bin)
unpatched_pg_bin="$2"
run_labels+=("unpatched")
run_bins+=("$2")
shift 2
;;
--tables)
tables="$2"
shift 2
;;
--rows-per-table)
rows_per_table="$2"
shift 2
;;
--clients)
clients="$2"
shift 2
;;
--jobs)
jobs="$2"
shift 2
;;
--time)
duration="$2"
shift 2
;;
--warmup)
warmup="$2"
shift 2
;;
--port-base)
port_base="$2"
shift 2
;;
--keep)
keep=1
shift
;;
--help|-h)
usage
exit 0
;;
*)
echo "unknown option: $1" >&2
usage >&2
exit 2
;;
esac
done
if [ -n "$pg_bin" ]; then
if [ -n "$patched_pg_bin" ] || [ -n "$unpatched_pg_bin" ]; then
echo "--pg-bin cannot be combined with --patched-pg-bin or --unpatched-pg-bin" >&2
exit 1
fi
run_labels=("build")
run_bins=("$pg_bin")
fi
if [ "${#run_bins[@]}" -eq 0 ]; then
echo "set PG_BIN or pass --pg-bin/--patched-pg-bin/--unpatched-pg-bin" >&2
exit 1
fi
for bin in "${run_bins[@]}"; do
for prog in postgres initdb pg_ctl psql pgbench; do
if [ ! -x "$bin/$prog" ]; then
echo "missing executable: $bin/$prog" >&2
exit 1
fi
done
done
pg_tool()
{
local bin="$1"
local prog="$2"
local libdir
shift 2
libdir="$(cd "$bin/../lib" 2>/dev/null && pwd || true)"
if [ -z "$libdir" ]; then
"$bin/$prog" "$@"
return
fi
case "$(uname -s)" in
Darwin)
DYLD_LIBRARY_PATH="$libdir${DYLD_LIBRARY_PATH:+:$DYLD_LIBRARY_PATH}" \
"$bin/$prog" "$@"
;;
*)
LD_LIBRARY_PATH="$libdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
"$bin/$prog" "$@"
;;
esac
}
tmpdirs=()
datadirs=()
pg_bins=()
created_tmpdir=""
created_script_file=""
created_conninfo=""
cleanup()
{
set +e
local i
for i in "${!datadirs[@]}"; do
if [ -d "${datadirs[$i]}" ]; then
pg_tool "${pg_bins[$i]}" pg_ctl -D "${datadirs[$i]}" -m fast -w stop \
>/dev/null 2>&1
fi
done
if [ "$keep" -eq 0 ]; then
for i in "${!tmpdirs[@]}"; do
rm -rf "${tmpdirs[$i]}"
done
else
for i in "${!tmpdirs[@]}"; do
echo "kept temporary directory: ${tmpdirs[$i]}"
done
fi
}
trap cleanup EXIT
trap 'trap - EXIT; cleanup; exit 130' INT
trap 'trap - EXIT; cleanup; exit 143' TERM
create_cluster()
{
local label="$1"
local bin="$2"
local port="$3"
local tmpdir datadir logfile setup_sql script_file conninfo
tmpdir="$(mktemp -d "${TMPDIR:-/tmp}/pgstat-xact-macro-${label}.XXXXXX")"
datadir="$tmpdir/data"
logfile="$tmpdir/postgres.log"
setup_sql="$tmpdir/setup.sql"
script_file="$tmpdir/tenant.sql"
tmpdirs+=("$tmpdir")
datadirs+=("$datadir")
pg_bins+=("$bin")
pg_tool "$bin" initdb -D "$datadir" --auth trust --no-sync --no-instructions \
--lc-messages=C >/dev/null
cat >>"$datadir/postgresql.conf" <<EOF
listen_addresses = ''
port = $port
unix_socket_directories = '$tmpdir'
fsync = off
synchronous_commit = off
autovacuum = off
track_functions = 'none'
max_prepared_transactions = 0
EOF
pg_tool "$bin" pg_ctl -D "$datadir" -l "$logfile" -w start >/dev/null
{
echo "SET client_min_messages = warning;"
echo "DROP SCHEMA IF EXISTS pgstat_xact_macro CASCADE;"
echo "CREATE SCHEMA pgstat_xact_macro;"
echo "SET search_path = pgstat_xact_macro;"
for i in $(seq 1 "$tables"); do
printf "CREATE TABLE tenant_%d (a int);\n" "$i"
printf "INSERT INTO tenant_%d SELECT generate_series(1, %d);\n" \
"$i" "$rows_per_table"
done
} >"$setup_sql"
pg_tool "$bin" psql -X -q -v ON_ERROR_STOP=1 \
-h "$tmpdir" -p "$port" -d postgres -f "$setup_sql" >/dev/null
cat >"$script_file" <<EOF
\\set id random(1, $tables)
BEGIN;
SELECT count(*) FROM pgstat_xact_macro.tenant_:id;
COMMIT;
EOF
conninfo="host=$tmpdir port=$port dbname=postgres"
created_tmpdir="$tmpdir"
created_script_file="$script_file"
created_conninfo="$conninfo"
}
run_pgbench()
{
local label="$1"
local bin="$2"
local conninfo="$3"
local script_file="$4"
local seconds="$5"
local out_file="$6"
pg_tool "$bin" pgbench -n -r -M simple -c "$clients" -j "$jobs" -T "$seconds" \
-f "$script_file" "$conninfo" >"$out_file"
}
extract_metric()
{
local file="$1"
local pattern="$2"
awk -v pat="$pattern" '
$0 ~ pat {
for (i = 1; i <= NF; i++)
{
if ($i ~ /^[0-9]+([.][0-9]+)?$/)
{
print $i;
exit;
}
}
}
' "$file"
}
printf "tables=%s rows_per_table=%s clients=%s jobs=%s warmup=%s time=%s\n" \
"$tables" "$rows_per_table" "$clients" "$jobs" "$warmup" "$duration"
printf "%-12s %12s %14s %14s\n" \
"label" "tps" "avg_lat_ms" "select_avg_ms"
for idx in "${!run_bins[@]}"; do
label="${run_labels[$idx]}"
bin="${run_bins[$idx]}"
port=$((port_base + idx))
create_cluster "$label" "$bin" "$port"
tmpdir="$created_tmpdir"
script_file="$created_script_file"
conninfo="$created_conninfo"
warmup_out="$tmpdir/warmup.out"
run_out="$tmpdir/pgbench.out"
if [ "$warmup" -gt 0 ]; then
run_pgbench "$label" "$bin" "$conninfo" "$script_file" "$warmup" \
"$warmup_out"
fi
run_pgbench "$label" "$bin" "$conninfo" "$script_file" "$duration" \
"$run_out"
tps="$(extract_metric "$run_out" "^tps =")"
avg_lat="$(extract_metric "$run_out" "^latency average =")"
select_avg="$(awk '
/SELECT count\(\*\) FROM pgstat_xact_macro[.]tenant_:id/ {
print $1;
exit;
}
' "$run_out")"
if [ -z "$tps" ]; then
tps="?"
fi
if [ -z "$avg_lat" ]; then
avg_lat="?"
fi
if [ -z "$select_avg" ]; then
select_avg="?"
fi
printf "%-12s %12s %14s %14s\n" \
"$label" "$tps" "$avg_lat" "$select_avg"
if [ "$keep" -eq 1 ]; then
echo "temporary directory for $label: $tmpdir"
echo "pgbench output for $label: $run_out"
fi
done
[text/x-sh] pgstat_xact_micro_bench.sh (6.5K, 4-pgstat_xact_micro_bench.sh)
download | inline:
#!/usr/bin/env bash
set -euo pipefail
usage()
{
cat <<'USAGE'
Usage:
pgstat_xact_baseline_bench.sh [options]
Options:
--cases "0 10 100 1000" Pending-entry counts to test.
--m 2000 Number of tiny BEGIN/COMMIT transactions.
--keep Keep the temporary cluster directory.
--pg-bin DIR Directory containing postgres/initdb/pg_ctl/psql/pg_config.
--port PORT Port for the temporary server.
Environment:
PG_BIN Same as --pg-bin.
CC C compiler for the tiny libpq PQexec helper.
The benchmark starts a temporary PostgreSQL cluster, creates the maximum
number of test tables requested, then sends each measured workload as one
simple-query message through libpq. That is intentional: psql -f would split
the workload into multiple protocol messages and would not pressure the
pgStatPending-across-transaction-boundaries path in the same way.
USAGE
}
cases="0 10 100 1000"
m=2000
keep=0
pg_bin="${PG_BIN:-}"
port=65432
while [ "$#" -gt 0 ]; do
case "$1" in
--cases)
cases="$2"
shift 2
;;
--m)
m="$2"
shift 2
;;
--keep)
keep=1
shift
;;
--pg-bin)
pg_bin="$2"
shift 2
;;
--port)
port="$2"
shift 2
;;
--help|-h)
usage
exit 0
;;
*)
echo "unknown option: $1" >&2
usage >&2
exit 2
;;
esac
done
if [ -z "$pg_bin" ]; then
echo "set PG_BIN or pass --pg-bin" >&2
exit 1
fi
for prog in postgres initdb pg_ctl psql pg_config; do
if [ ! -x "$pg_bin/$prog" ]; then
echo "missing executable: $pg_bin/$prog" >&2
exit 1
fi
done
pg_tool()
{
local prog="$1"
local libdir
shift
libdir="$(cd "$pg_bin/../lib" 2>/dev/null && pwd || true)"
if [ -z "$libdir" ]; then
"$pg_bin/$prog" "$@"
return
fi
case "$(uname -s)" in
Darwin)
DYLD_LIBRARY_PATH="$libdir${DYLD_LIBRARY_PATH:+:$DYLD_LIBRARY_PATH}" \
"$pg_bin/$prog" "$@"
;;
*)
LD_LIBRARY_PATH="$libdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
"$pg_bin/$prog" "$@"
;;
esac
}
max_n="$(printf '%s\n' $cases | sort -n | tail -1)"
tmpdir="$(mktemp -d "${TMPDIR:-/tmp}/pgstat-xact-bench.XXXXXX")"
datadir="$tmpdir/data"
logfile="$tmpdir/postgres.log"
helper_src="$tmpdir/pqexec_once.c"
helper="$tmpdir/pqexec_once"
cleanup()
{
set +e
if [ -d "$datadir" ]; then
pg_tool pg_ctl -D "$datadir" -m fast -w stop >/dev/null 2>&1
fi
if [ "$keep" -eq 0 ]; then
rm -rf "$tmpdir"
else
echo "kept temporary directory: $tmpdir"
fi
}
trap cleanup EXIT
trap 'trap - EXIT; cleanup; exit 130' INT
trap 'trap - EXIT; cleanup; exit 143' TERM
cat >"$helper_src" <<'C'
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "libpq-fe.h"
static char *
read_file(const char *path, long *len_out)
{
FILE *f = fopen(path, "rb");
long len;
char *buf;
if (f == NULL)
{
fprintf(stderr, "could not open %s: %s\n", path, strerror(errno));
exit(2);
}
if (fseek(f, 0, SEEK_END) != 0)
{
perror("fseek");
exit(2);
}
len = ftell(f);
if (len < 0)
{
perror("ftell");
exit(2);
}
rewind(f);
buf = malloc((size_t) len + 1);
if (buf == NULL)
{
perror("malloc");
exit(2);
}
if (fread(buf, 1, (size_t) len, f) != (size_t) len)
{
perror("fread");
exit(2);
}
fclose(f);
buf[len] = '\0';
*len_out = len;
return buf;
}
int
main(int argc, char **argv)
{
const char *conninfo;
const char *sql_path;
PGconn *conn;
PGresult *res;
char *sql;
long sql_len;
if (argc != 3)
{
fprintf(stderr, "usage: %s CONNINFO SQLFILE\n", argv[0]);
return 2;
}
conninfo = argv[1];
sql_path = argv[2];
sql = read_file(sql_path, &sql_len);
conn = PQconnectdb(conninfo);
if (PQstatus(conn) != CONNECTION_OK)
{
fprintf(stderr, "connection failed: %s\n", PQerrorMessage(conn));
return 2;
}
res = PQexec(conn, sql);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
fprintf(stderr, "query failed: %s\n", PQerrorMessage(conn));
PQclear(res);
PQfinish(conn);
free(sql);
return 1;
}
if (PQntuples(res) != 1 || PQnfields(res) != 1)
{
fprintf(stderr, "query returned %d rows and %d columns, expected 1x1\n",
PQntuples(res), PQnfields(res));
PQclear(res);
PQfinish(conn);
free(sql);
return 1;
}
printf("%s\n", PQgetvalue(res, 0, 0));
PQclear(res);
PQfinish(conn);
free(sql);
return 0;
}
C
includedir="$(pg_tool pg_config --includedir)"
libdir="$(pg_tool pg_config --libdir)"
"${CC:-cc}" -O2 -I"$includedir" -L"$libdir" -Wl,-rpath,"$libdir" \
-o "$helper" "$helper_src" -lpq
pg_tool initdb -D "$datadir" --auth trust --no-sync --no-instructions \
--lc-messages=C >/dev/null
cat >>"$datadir/postgresql.conf" <<EOF
listen_addresses = ''
port = $port
unix_socket_directories = '$tmpdir'
fsync = off
autovacuum = off
track_functions = 'none'
max_prepared_transactions = 0
EOF
pg_tool pg_ctl -D "$datadir" -l "$logfile" -w start >/dev/null
conninfo="host=$tmpdir port=$port dbname=postgres"
setup_sql="$tmpdir/setup.sql"
{
echo "DROP SCHEMA IF EXISTS pgstat_xact_bench CASCADE;"
echo "CREATE SCHEMA pgstat_xact_bench;"
echo "SET search_path = pgstat_xact_bench;"
for i in $(seq 1 "$max_n"); do
printf "CREATE TABLE t_%06d (a int);\n" "$i"
printf "INSERT INTO t_%06d VALUES (1);\n" "$i"
done
} >"$setup_sql"
pg_tool psql -X -q -v ON_ERROR_STOP=1 \
-h "$tmpdir" -p "$port" -d postgres -f "$setup_sql" >/dev/null
printf "pg_bin=%s\n" "$pg_bin"
printf "tmpdir=%s\n" "$tmpdir"
printf "m=%s\n" "$m"
printf "%10s %12s %12s %12s\n" "pending_n" "xacts_m" "seconds" "us_per_xact"
for n in $cases; do
sql_file="$tmpdir/run_${n}.sql"
{
echo "SET search_path = pgstat_xact_bench;"
echo "SET stats_fetch_consistency = none;"
echo "CREATE TEMP TABLE bench_timer (started_at timestamptz);"
echo "BEGIN;"
if [ "$n" -eq 0 ]; then
echo "SELECT 0;"
else
for i in $(seq 1 "$n"); do
printf "SELECT count(*) FROM t_%06d;\n" "$i"
done
fi
echo "COMMIT;"
echo "INSERT INTO bench_timer VALUES (clock_timestamp());"
awk -v m="$m" 'BEGIN { for (i = 1; i <= m; i++) print "BEGIN; COMMIT;" }'
echo "SELECT EXTRACT(epoch FROM clock_timestamp() - started_at)::float8"
echo "FROM bench_timer;"
} >"$sql_file"
case "$(uname -s)" in
Darwin)
seconds="$(DYLD_LIBRARY_PATH="$libdir${DYLD_LIBRARY_PATH:+:$DYLD_LIBRARY_PATH}" \
"$helper" "$conninfo" "$sql_file")"
;;
*)
seconds="$(LD_LIBRARY_PATH="$libdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
"$helper" "$conninfo" "$sql_file")"
;;
esac
us_per_xact="$(awk -v s="$seconds" -v m="$m" 'BEGIN { printf "%.3f", (s * 1000000.0) / m }')"
printf "%10s %12s %12s %12s\n" "$n" "$m" "$seconds" "$us_per_xact"
done
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected]
Subject: Re: BUG #19439: pg_stat_xact_user_tables stat not currect during the transaction
In-Reply-To: <CABPTF7WUzwywrDmB+W=Sm=ynzQkcS83p2z-Ywt8W8XdKhXOPcQ@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox