From 883781d4da9c133aa1c2408379276bb4f52bf3a8 Mon Sep 17 00:00:00 2001 From: Naga Appani Date: Mon, 18 Aug 2025 20:51:25 +0000 Subject: [PATCH v5] Add pg_get_multixact_stats() function for monitoring MultiXact usage Expose multixact state via a new SQL-callable function pg_get_multixact_stats(), returning: - num_mxids : number of MultiXact IDs in use - num_members : number of member entries in use - members_bytes : bytes used by num_members in pg_multixact/members directory - oldest_multixact : oldest MultiXact ID still needed - oldest_offset : oldest member offset still in use This patch: 1. Renames ReadMultiXactCounts() to GetMultiXactInfo() and makes it public - Provides a single accessor for MultiXact state - Returns counts and horizons in one call 2. Adds pg_get_multixact_stats() function - SQL-callable interface to GetMultiXactInfo() - Returns NULLs if MultiXact system not initialized - Includes isolation tests for monitoring invariants Documentation updates: - func-info.sgml: add function entry - maintenance.sgml: mention monitoring multixact usage Build and catalog: - Add function to existing multixactfuncs.c - pg_proc.dat entry Author: Naga Appani Reviewed-by: Ashutosh Bapat Reviewed-by: Michael Paquier Discussion: https://www.postgresql.org/message-id/flat/CA%2BQeY%2BAAsYK6WvBW4qYzHz4bahHycDAY_q5ECmHkEV_eB9ckzg%40mail.gmail.com --- doc/src/sgml/func/func-info.sgml | 31 +++++ doc/src/sgml/maintenance.sgml | 54 +++++++- src/backend/access/transam/multixact.c | 43 ++++-- src/backend/utils/adt/multixactfuncs.c | 52 +++++++ src/include/access/multixact.h | 1 + src/include/catalog/pg_proc.dat | 15 +++ .../isolation/expected/multixact_stats.out | 94 +++++++++++++ src/test/isolation/isolation_schedule | 1 + src/test/isolation/specs/multixact_stats.spec | 127 ++++++++++++++++++ 9 files changed, 399 insertions(+), 19 deletions(-) create mode 100644 src/test/isolation/expected/multixact_stats.out create mode 100644 src/test/isolation/specs/multixact_stats.spec diff --git a/doc/src/sgml/func/func-info.sgml b/doc/src/sgml/func/func-info.sgml index c393832d94c..9dedc3715d7 100644 --- a/doc/src/sgml/func/func-info.sgml +++ b/doc/src/sgml/func/func-info.sgml @@ -2975,6 +2975,37 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres} modify key columns. + + + + + pg_get_multixact_stats + + pg_get_multixact_stats () + record + ( num_mxids integer, + num_members bigint, + members_bytes bigint, + oldest_multixact xid, + oldest_offset bigint ) + + + Returns statistics about current multixact usage: + num_mxids is the number of multixact IDs assigned, + num_members is the number of multixact member entries created, + members_bytes is the storage occupied by num_members + in pg_multixact/members directory, + oldest_multixact is the oldest multixact ID still in use, and + oldest_offset is the oldest member offset still in use. + These values can be used to monitor multixact consumption and anticipate + autovacuum behavior. See + for further details on multixact wraparound. + + + This is a live snapshot of shared counters; the numbers can change between calls, + even within the same transaction. + + diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml index e7a9f58c015..badd3392c4f 100644 --- a/doc/src/sgml/maintenance.sgml +++ b/doc/src/sgml/maintenance.sgml @@ -813,12 +813,56 @@ HINT: Execute a database-wide VACUUM in that database. As a safety device, an aggressive vacuum scan will occur for any table whose multixact-age is greater than . Also, if the - storage occupied by multixacts members exceeds about 10GB, aggressive vacuum + linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the number + of members created exceeds approximately 2 billion entries, aggressive vacuum scans will occur more often for all tables, starting with those that - have the oldest multixact-age. Both of these kinds of aggressive - scans will occur even if autovacuum is nominally disabled. The members storage - area can grow up to about 20GB before reaching wraparound. + have the oldest multixact-age. Both of these kinds of aggressive + scans will occur even if autovacuum is nominally disabled. The members can grow + up to approximately 4 billion entries before reaching wraparound. + + + + The pg_get_multixact_stats() function, described in + , provides a way to monitor + multixact allocation and usage patterns in real time. For example: + +postgres=# SELECT num_mxids,num_members,oldest_multixact,oldest_offset FROM pg_catalog.pg_get_multixact_stats(); + num_mxids | num_members | oldest_multixact | oldest_offset +-----------+-------------+------------------+--------------- + 99883849 | 773468747 | 39974368 | 351952978 +(1 row) + + This output shows a system with significant multixact activity: about ~100 million + multixact IDs and ~773 million member entries have been created since the oldest + surviving multixact (ID 39974368). By leveraging this information, the function helps: + + + + Identify unusual multixact activity from concurrent row-level locks + or foreign key operations. For example, a spike in num_mxids might indicate + multiple sessions running UPDATE statements with foreign key checks, + concurrent SELECT FOR SHARE operations, or frequent use of savepoints + causing lock contention. + + + + + Track multixact cleanup efficiency by monitoring oldest_multixact. + If this value remains unchanged while num_members grows, it could indicate + that long-running transactions are preventing cleanup, or autovacuum is + not keeping up with the workload. + + + + + Detect potential performance impacts before they become critical. + For instance, high multixact usage from frequent row-level locking or + foreign key operations can lead to increased I/O and CPU overhead during + vacuum operations. Monitoring these stats helps tune autovacuum frequency + and transaction patterns. + + + diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c index 886740d2d55..5fb7c12fdce 100644 --- a/src/backend/access/transam/multixact.c +++ b/src/backend/access/transam/multixact.c @@ -2857,31 +2857,44 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result) } /* - * Determine how many multixacts, and how many multixact members, currently - * exist. Return false if unable to determine. + * GetMultiXactInfo + * + * Returns information about current MultiXact state: + * - multixacts: Number of MultiXacts (nextMultiXactId - oldestMultiXactId) + * - members: Number of member entries (nextOffset - oldestOffset) + * - oldestMultiXactId: Oldest MultiXact ID still in use + * - oldestOffset: Oldest offset still in use + * + * Returns false if the oldest offset is not known, in which case all output + * parameters are set to 0/invalid values for consistency. */ -static bool -ReadMultiXactCounts(uint32 *multixacts, MultiXactOffset *members) +bool +GetMultiXactInfo(uint32 *multixacts, MultiXactOffset *members, + MultiXactId *oldestMultiXactId, MultiXactOffset *oldestOffset) { - MultiXactOffset nextOffset; - MultiXactOffset oldestOffset; - MultiXactId oldestMultiXactId; - MultiXactId nextMultiXactId; - bool oldestOffsetKnown; + MultiXactOffset nextOffset; + MultiXactId nextMultiXactId; + bool oldestOffsetKnown; LWLockAcquire(MultiXactGenLock, LW_SHARED); nextOffset = MultiXactState->nextOffset; - oldestMultiXactId = MultiXactState->oldestMultiXactId; + *oldestMultiXactId = MultiXactState->oldestMultiXactId; nextMultiXactId = MultiXactState->nextMXact; - oldestOffset = MultiXactState->oldestOffset; + *oldestOffset = MultiXactState->oldestOffset; oldestOffsetKnown = MultiXactState->oldestOffsetKnown; LWLockRelease(MultiXactGenLock); if (!oldestOffsetKnown) + { + *members = 0; + *multixacts = 0; + *oldestMultiXactId = InvalidMultiXactId; + *oldestOffset = 0; return false; + } - *members = nextOffset - oldestOffset; - *multixacts = nextMultiXactId - oldestMultiXactId; + *members = nextOffset - *oldestOffset; + *multixacts = nextMultiXactId - *oldestMultiXactId; return true; } @@ -2920,9 +2933,11 @@ MultiXactMemberFreezeThreshold(void) uint32 victim_multixacts; double fraction; int result; + MultiXactId oldestMultiXactId; + MultiXactOffset oldestOffset; /* If we can't determine member space utilization, assume the worst. */ - if (!ReadMultiXactCounts(&multixacts, &members)) + if (!GetMultiXactInfo(&multixacts, &members, &oldestMultiXactId, &oldestOffset)) return 0; /* If member space utilization is low, no special action is required. */ diff --git a/src/backend/utils/adt/multixactfuncs.c b/src/backend/utils/adt/multixactfuncs.c index e74ea938348..ba9e8313ab4 100644 --- a/src/backend/utils/adt/multixactfuncs.c +++ b/src/backend/utils/adt/multixactfuncs.c @@ -85,3 +85,55 @@ pg_get_multixact_members(PG_FUNCTION_ARGS) SRF_RETURN_DONE(funccxt); } + +/* + * pg_get_multixact_stats + * + * Returns statistics about current MultiXact usage: + * - num_mxids: Number of MultiXact IDs in use + * - num_members: Total number of member entries + * - oldest_multixact: Oldest MultiXact ID still needed + * - oldest_offset: Oldest offset still in use + * + * Returns a row of NULLs if the MultiXact system is not yet initialized. + */ +Datum +pg_get_multixact_stats(PG_FUNCTION_ARGS) +{ + TupleDesc tupdesc; + Datum values[5]; + bool nulls[5] = {true, true, true, true, true}; + MultiXactOffset members; + MultiXactId oldestMultiXactId; + uint32 multixacts; + MultiXactOffset oldestOffset; + int64 membersBytes; + HeapTuple tuple; + + if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("return type must be a row type"))); + + if (GetMultiXactInfo(&multixacts, &members, &oldestMultiXactId, &oldestOffset)) + { + /* + * Calculate approximate storage space: + * - Members are stored in groups of 4 + * - Each group takes 20 bytes (5 bytes per member) + * Note: This ignores small page overhead (12 bytes per 8KB) + */ + membersBytes = (int64) members * 5; + + values[0] = Int32GetDatum(multixacts); + values[1] = Int64GetDatum(members); + values[2] = Int64GetDatum(membersBytes); + values[3] = UInt32GetDatum(oldestMultiXactId); + values[4] = Int64GetDatum(oldestOffset); + nulls[0] = nulls[1] = nulls[2] = nulls[3] = nulls[4] = false; + } + + tuple = heap_form_tuple(tupdesc, values, nulls); + + PG_RETURN_DATUM(HeapTupleGetDatum(tuple)); +} diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h index 6607b645a18..19de74950cb 100644 --- a/src/include/access/multixact.h +++ b/src/include/access/multixact.h @@ -159,5 +159,6 @@ extern const char *multixact_identify(uint8 info); extern char *mxid_to_string(MultiXactId multi, int nmembers, MultiXactMember *members); extern char *mxstatus_to_string(MultiXactStatus status); +extern bool GetMultiXactInfo(uint32 *multixacts, MultiXactOffset *members, MultiXactId *oldestMultiXactId, MultiXactOffset *oldestOffset); #endif /* MULTIXACT_H */ diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 118d6da1ace..985364e0cd6 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -12576,4 +12576,19 @@ proargnames => '{pid,io_id,io_generation,state,operation,off,length,target,handle_data_len,raw_result,result,target_desc,f_sync,f_localmem,f_buffered}', prosrc => 'pg_get_aios' }, +# Get MultiXact state +{ + oid => '9001', + descr => 'get current multixact member and multixact ID counts and oldest values', + proname => 'pg_get_multixact_stats', + prorettype => 'record', + proargtypes => '', + proallargtypes => '{int4,int8,int8,xid,int8}', + proargmodes => '{o,o,o,o,o}', + proargnames => '{num_mxids,num_members,members_bytes,oldest_multixact,oldest_offset}', + provolatile => 'v', + proparallel => 's', + prosrc => 'pg_get_multixact_stats' +}, + ] diff --git a/src/test/isolation/expected/multixact_stats.out b/src/test/isolation/expected/multixact_stats.out new file mode 100644 index 00000000000..2893c4d9f36 --- /dev/null +++ b/src/test/isolation/expected/multixact_stats.out @@ -0,0 +1,94 @@ +Parsed test spec with 3 sessions + +starting permutation: d_begin snap0 s1_begin s1_lock snap1 s2_begin s2_lock snap2 check_while_pinned d_commit s1_commit s2_commit +step d_begin: BEGIN; SET client_min_messages = warning; +step snap0: + CREATE TEMP TABLE snap0 AS + SELECT num_mxids, num_members, oldest_multixact, oldest_offset + FROM pg_get_multixact_stats(); + +step s1_begin: BEGIN; +step s1_lock: SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE; +?column? +-------- + 1 +(1 row) + +step snap1: + CREATE TEMP TABLE snap1 AS + SELECT num_mxids, num_members, oldest_multixact, oldest_offset + FROM pg_get_multixact_stats(); + +step s2_begin: BEGIN; +step s2_lock: SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE; +?column? +-------- + 1 +(1 row) + +step snap2: + CREATE TEMP TABLE snap2 AS + SELECT num_mxids, num_members, oldest_multixact, oldest_offset + FROM pg_get_multixact_stats(); + +step check_while_pinned: + SELECT r.assertion, r.ok + FROM snap0 s0 + JOIN snap1 s1 ON TRUE + JOIN snap2 s2 ON TRUE, + LATERAL unnest( + ARRAY[ + 'is_init_mxids', + 'is_init_members', + 'is_init_oldest_mxid', + 'is_init_oldest_off', + 'is_oldest_mxid_nondec_01', + 'is_oldest_mxid_nondec_12', + 'is_oldest_off_nondec_01', + 'is_oldest_off_nondec_12', + 'is_members_increased_ge1', + 'is_mxids_nondec_01', + 'is_mxids_nondec_12', + 'is_members_nondec_01', + 'is_members_nondec_12' + ], + ARRAY[ + (s2.num_mxids IS NOT NULL), + (s2.num_members IS NOT NULL), + (s2.oldest_multixact IS NOT NULL), + (s2.oldest_offset IS NOT NULL), + + (s1.oldest_multixact::text::bigint >= COALESCE(s0.oldest_multixact::text::bigint, 0)), + (s2.oldest_multixact::text::bigint >= COALESCE(s1.oldest_multixact::text::bigint, 0)), + (s1.oldest_offset >= COALESCE(s0.oldest_offset, 0)), + (s2.oldest_offset >= COALESCE(s1.oldest_offset, 0)), + + (s2.num_members >= COALESCE(s1.num_members, 0) + 1), + + (s1.num_mxids >= COALESCE(s0.num_mxids, 0)), + (s2.num_mxids >= COALESCE(s1.num_mxids, 0)), + (s1.num_members >= COALESCE(s0.num_members, 0)), + (s2.num_members >= COALESCE(s1.num_members, 0)) + ] + ) AS r(assertion, ok); + +assertion |ok +------------------------+-- +is_init_mxids |t +is_init_members |t +is_init_oldest_mxid |t +is_init_oldest_off |t +is_oldest_mxid_nondec_01|t +is_oldest_mxid_nondec_12|t +is_oldest_off_nondec_01 |t +is_oldest_off_nondec_12 |t +is_members_increased_ge1|t +is_mxids_nondec_01 |t +is_mxids_nondec_12 |t +is_members_nondec_01 |t +is_members_nondec_12 |t +(13 rows) + +step d_commit: COMMIT; +step s1_commit: COMMIT; +step s2_commit: COMMIT; diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule index 9f1e997d81b..4d94fc94e77 100644 --- a/src/test/isolation/isolation_schedule +++ b/src/test/isolation/isolation_schedule @@ -118,3 +118,4 @@ test: serializable-parallel-2 test: serializable-parallel-3 test: matview-write-skew test: lock-nowait +test: multixact_stats diff --git a/src/test/isolation/specs/multixact_stats.spec b/src/test/isolation/specs/multixact_stats.spec new file mode 100644 index 00000000000..9098b6f5c5d --- /dev/null +++ b/src/test/isolation/specs/multixact_stats.spec @@ -0,0 +1,127 @@ +# High-signal invariants for pg_get_multixact_stats() +# We create exactly one fresh MultiXact on a brand-new table. While it is pinned +# by two open transactions, we assert only invariants that background VACUUM/FREEZE +# cannot violate: +# • members increased by ≥ 1 when the second locker arrived, +# • num_mxids / num_members did not decrease vs earlier snapshots, +# • oldest_* never decreases. +# We make NO assertions after releasing locks (freezing/truncation may shrink deltas). +# NOTE: Snapshots snap0 and subsequent checks are taken inside an open driver +# transaction to narrow the window for unrelated truncation between snapshots. +# +# Terminology (global counters): +# num_mxids, num_members : “in-use” deltas derived from global horizons +# oldest_multixact, offset : oldest horizons; they move forward, never backward +# +# All assertions execute while our multixact is pinned by open txns, which protects +# the truncation horizon (VACUUM can’t advance past our pinned multi). + +setup +{ + CREATE TABLE mxq(id int PRIMARY KEY, v int); + INSERT INTO mxq VALUES (1, 42); +} + +teardown +{ + DROP TABLE mxq; +} + +# Two lockers on the SAME tuple -> one MultiXact with >= 2 members. +session "s1" +setup { SET client_min_messages = warning; SET lock_timeout = '5s'; } +step s1_begin { BEGIN; } +step s1_lock { SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE; } +step s1_commit { COMMIT; } + +session "s2" +setup { SET client_min_messages = warning; SET lock_timeout = '5s'; } +step s2_begin { BEGIN; } +step s2_lock { SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE; } +step s2_commit { COMMIT; } + +# Driver session: keep a transaction open while we take snapshots and check. +session "driver" +step d_begin { BEGIN; SET client_min_messages = warning; } + +# Baseline BEFORE any locking; may be NULLs if multixact isn't initialized yet. +step snap0 { + CREATE TEMP TABLE snap0 AS + SELECT num_mxids, num_members, oldest_multixact, oldest_offset + FROM pg_get_multixact_stats(); +} + +# After s1 has locked the row (still in driver xact). +step snap1 { + CREATE TEMP TABLE snap1 AS + SELECT num_mxids, num_members, oldest_multixact, oldest_offset + FROM pg_get_multixact_stats(); +} + +# After s2 joins on the SAME tuple -> multixact with >= 2 members (still in driver xact). +step snap2 { + CREATE TEMP TABLE snap2 AS + SELECT num_mxids, num_members, oldest_multixact, oldest_offset + FROM pg_get_multixact_stats(); +} + +# Pretty, deterministic key/value output of boolean checks. +# Keys: +# is_init_mxids : num_mxids is non-NULL +# is_init_members : num_members is non-NULL +# is_init_oldest_mxid : oldest_multixact is non-NULL +# is_init_oldest_off : oldest_offset is non-NULL +# is_oldest_mxid_nondec_01 : oldest_multixact did not decrease (snap0→snap1) +# is_oldest_mxid_nondec_12 : oldest_multixact did not decrease (snap1→snap2) +# is_oldest_off_nondec_01 : oldest_offset did not decrease (snap0→snap1) +# is_oldest_off_nondec_12 : oldest_offset did not decrease (snap1→snap2) +# is_members_increased_ge1 : members increased by at least 1 when s2 joined +# is_mxids_nondec_01 : num_mxids did not decrease (snap0→snap1) +# is_mxids_nondec_12 : num_mxids did not decrease (snap1→snap2) +# is_members_nondec_01 : num_members did not decrease (snap0→snap1) +# is_members_nondec_12 : num_members did not decrease (snap1→snap2) +step check_while_pinned { + SELECT r.assertion, r.ok + FROM snap0 s0 + JOIN snap1 s1 ON TRUE + JOIN snap2 s2 ON TRUE, + LATERAL unnest( + ARRAY[ + 'is_init_mxids', + 'is_init_members', + 'is_init_oldest_mxid', + 'is_init_oldest_off', + 'is_oldest_mxid_nondec_01', + 'is_oldest_mxid_nondec_12', + 'is_oldest_off_nondec_01', + 'is_oldest_off_nondec_12', + 'is_members_increased_ge1', + 'is_mxids_nondec_01', + 'is_mxids_nondec_12', + 'is_members_nondec_01', + 'is_members_nondec_12' + ], + ARRAY[ + (s2.num_mxids IS NOT NULL), + (s2.num_members IS NOT NULL), + (s2.oldest_multixact IS NOT NULL), + (s2.oldest_offset IS NOT NULL), + + (s1.oldest_multixact::text::bigint >= COALESCE(s0.oldest_multixact::text::bigint, 0)), + (s2.oldest_multixact::text::bigint >= COALESCE(s1.oldest_multixact::text::bigint, 0)), + (s1.oldest_offset >= COALESCE(s0.oldest_offset, 0)), + (s2.oldest_offset >= COALESCE(s1.oldest_offset, 0)), + + (s2.num_members >= COALESCE(s1.num_members, 0) + 1), + + (s1.num_mxids >= COALESCE(s0.num_mxids, 0)), + (s2.num_mxids >= COALESCE(s1.num_mxids, 0)), + (s1.num_members >= COALESCE(s0.num_members, 0)), + (s2.num_members >= COALESCE(s1.num_members, 0)) + ] + ) AS r(assertion, ok); +} + +step d_commit { COMMIT; } + +permutation d_begin snap0 s1_begin s1_lock snap1 s2_begin s2_lock snap2 check_while_pinned d_commit s1_commit s2_commit -- 2.47.3