public inbox for [email protected]  
help / color / mirror / Atom feed
From: Naga Appani <[email protected]>
To: Ashutosh Bapat <[email protected]>
Cc: torikoshia <[email protected]>
Cc: Michael Paquier <[email protected]>
Cc: Kirill Reshke <[email protected]>
Cc: [email protected]
Subject: Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring
Date: Wed, 3 Sep 2025 16:10:56 -0500
Message-ID: <CA+QeY+DLsjj4RxY-Z2wRC-sYCeFxuPSd8zSkt4Nvs==YV6GP6w@mail.gmail.com> (raw)
In-Reply-To: <CAExHW5sUU1+Vq9oG6x=u85W7Obr-dPfAc7rQtaN-LJw5uufuMA@mail.gmail.com>
References: <CALdSSPi3Gh08NtcCn44uVeUAYGOT74sU6uei_06qUTa5rMK43g@mail.gmail.com>
	<CA+QeY+DTggHskCXOa39nag2sFds9BD-7k__zPbvL-_VVyJw7Sg@mail.gmail.com>
	<CAExHW5uxBnodmpXygUQUxZAyxYvbXczadxGOVeL=0DxpkXUR-g@mail.gmail.com>
	<[email protected]>
	<CAExHW5vba_ecL2c27giAOOaRxoNdVX2BNQVCN4PFOZJC17NrqQ@mail.gmail.com>
	<CA+QeY+CFLjh0gb-j9p+0eJK=r19izhEPCBY3ns4+HCgOQojh8Q@mail.gmail.com>
	<CAExHW5s5hnxKK_VE=q94sN8uU5BJCRpAescGuamheuM-Ua559A@mail.gmail.com>
	<CA+QeY+C3tc12wWX56yHozLYK=MmCkn1fODzFWC6-dSY+TZghTQ@mail.gmail.com>
	<CAExHW5s-StFwDpZBeiikvdnrBBr0s+2jJTZRv3Hh914T4C6Euw@mail.gmail.com>
	<CA+QeY+AmRSSJU8GKJ1uNVe2ps8GTONXpioYR7gEz91_prKR6sw@mail.gmail.com>
	<[email protected]>
	<CA+QeY+AwWV=g8z1eDE7OGHxKBer18gVER7nD+SnfhqLmiL7NOA@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<CAExHW5sUU1+Vq9oG6x=u85W7Obr-dPfAc7rQtaN-LJw5uufuMA@mail.gmail.com>

Hi Atsushi and Ashutosh,

Thank you for reviewing the patch. Attached is v7, incorporating the
feedback. Please see my responses in-line below.

On Fri, Aug 22, 2025 at 6:45 AM Ashutosh Bapat
<[email protected]> wrote:
>
> On Fri, Aug 22, 2025 at 7:37 AM torikoshia <[email protected]> wrote:
> >
> > | Now that pg_get_multixact_stats() exposes num_members, the HEAD branch
> > | docs can describe the thresholds in terms of counts directly.
> >
> > Personally, I think it might be fine to keep the gigabyte-based
> > description, and perhaps we could show both the number of members and
> > gigabytes, since it'd be also helpful to have a sense of the scale.
> >
>
> Those who have grown their own utilities to monitor the on-disk usage
> will not be able to use the count based thresholds and might take some
> time for them to starting using pg_get_multixact_stats(). It makes
> sense to mention both the count and the corresponding disk usage
> threshold. Something like "Also, if the number of multixact members
> exceeds approximately 2^31 entries (occupying roughly more than 10GB
> in storage) ... ". Users can choose which threshold they want to use.
> Adding disk storage threshold in parenthesis indicates that the count
> is more accurate and more useful.
Updated docs to include both counts and approximate storage.


> I thought mentioning bytes, a unit, in column name members_bytes would
> not be appropriate in case we start reporting it in a different unit
> like kB in future. But we already have
> pg_stat_replication_slots::spill_bytes with similar naming. So may be
> it's okay. But I would prefer members_size or members_storage or some
> such units-free name.
Good point! Adjusted to a units-free name: members_size.

> + in <literal>pg_multixact/members</literal> directory,
> + <literal>oldest_multixact</literal> is the oldest multixact ID still
> in use, and
> + <literal>oldest_offset</literal> is the oldest member offset still in use.
>
> I am not sure whether oldest_offset is worth exposing. It is an
> implementation detail. Upthread, Michael suggested to expose oldest
> offset from GetMultiXactInfo(), but I don't see him explicitly saying
> that we should expose it through this function as well. Michael what
> do you think?
IMHO, exposing oldest_offset gives a full picture of multixact state.
It complements oldest_multixact,
and including it won’t hurt. That said, if consensus is against it,
I’m happy to drop it.

> + These values can be used to monitor multixact consumption and anticipate
> + autovacuum behavior. See <xref linkend="vacuum-for-multixact-wraparound"/>
> + for further details on multixact wraparound.
>
> I still think that this is not needed. There is no reason to restrict
> how users want to use this function. We usually don't do that unless
> there is a hazard associated with it.
Removed.

> + <para>
> + This is a live snapshot of shared counters; the numbers can change
> between calls,
> + even within the same transaction.
> + </para></entry>
>
> I have not seen the phrase "live snapshot" being used in the
> documentation before. How about "The function reports the statistics
> at the time of invoking the function. They may vary between calls even
> within the same transaction."?
Updated wording.

> + linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the number
> + of members created exceeds approximately 2^31 entries, aggressive vacuum
>
> a member means the transaction participating in a multixact. What you
> intend to say is "if the number of multixacts member entries created
> ...", right?
Correct, updated.

> + <para>
> + The <function>pg_get_multixact_stats()</function> function, described in
>
> unnecessary pair of commas.
Fixed.

> + This output shows a system with significant multixact activity:
> about ~100 million
> + multixact IDs and ~773 million member entries have been created
> since the oldest
> + surviving multixact (ID 39974368). By leveraging this information,
> the function helps:
> + <orderedlist>
> + <listitem>
> ... snip ...
> + Detect potential performance impacts before they become critical.
> + For instance, high multixact usage from frequent row-level locking or
> + foreign key operations can lead to increased I/O and CPU overhead during
> + vacuum operations. Monitoring these stats helps tune autovacuum frequency
> + and transaction patterns.
> + </simpara>
> + </listitem>
> + </orderedlist>
>
> I am unsure whether we should be mentioning use cases in such detail.
> Users may find other ways to use those counts. I think the following
> paragraph should be placed here.
>
> + These values can be used to monitor multixact consumption and anticipate
> + autovacuum behavior. See <xref linkend="vacuum-for-multixact-wraparound"/>
> + for further details on multixact wraparound.
>
>  But others may have different opinions.
>
> Maybe you could further write in your example that an aggressive
> autovacuum will be triggered in another 10 seconds (or so) if the
> number of member entries continues to double every 5 seconds. Or some
> practical "usage example" like that.
Ack. I believe keeping the example with a short list is helpful for
users to navigate
and interpret the stats. If preferred, I can trim it to a brief
paragraph with just the query in
the next rev.

> + * Returns statistics about current MultiXact usage:
> + * - num_mxids: Number of MultiXact IDs in use
> + * - num_members: Total number of member entries
> + * - oldest_multixact: Oldest MultiXact ID still needed
> + * - oldest_offset: Oldest offset still in use
>
> We don't need to mention each column here, it's evident from the
> function body and also from the user facing documentation. Just the
> first line is ok.
Updated - kept only the high-level description.

> + *
> + * Returns a row of NULLs if the MultiXact system is not yet initialized.
>
> tuple or record instead of row.
>
> In the earlier patch you were calling PG_RETURN_NULL(), which I
> thought was better. It would get converted into a record of NULLs if
> someone is to do SELECT * FROM pg_get_multixact_stats().
>
> I don't think "the MultiXact system is not yet initialized" is the
> right description of that condition. GetMultiXactInfo() prologue says
> "
> Returns false if unable to determine, the oldest offset being
> unknown." MultiXactStatData has following comment for oldest offset.
> /*
> * Oldest multixact offset that is potentially referenced by a multixact
> * referenced by a relation. We don't always know this value, so there's
> * a flag here to indicate whether or not we currently do.
> */
Switched to PG_RETURN_NULL() and rephrased both code comment and docs.


> + * Calculate approximate storage space:
> + * - Members are stored in groups of 4
> + * - Each group takes 20 bytes (5 bytes per member)
> + * Note: This ignores small page overhead (12 bytes per 8KB)
> + */
> + membersBytes = (int64) members * 5;
>
> Do we have some constant macros or sizeof(some structure) defined for
> 5 and 4? That way this computation will be self maintaining and self
> documenting.
Those macros are already defined in multixact.c - for example,
MULTIXACT_MEMBERS_PER_MEMBERGROUP
and MULTIXACT_MEMBERGROUP_SIZE encode the 4-per-group and 20-byte
layout. They are local today,
and I’m not sure why they were never exposed. Rather than moving them
into a header and creating wider changes,
v7 retains the explicit 5-bytes/member estimate with an explanatory
comment to stay consistent with existing guidance.
If we feel these macros should be promoted to a header, I think that
would be best handled as a small, separate patch,
and I’d be happy to help with that.

> + nulls[0] = nulls[1] = nulls[2] = nulls[3] = nulls[4] = false;
>
> memset(nulls, false, sizeof(nulls)); is better and used everywhere.
>
> In fact, instead of initializing it all to true first and then setting
> all to false here, we should memset here and set it to true in else
> block.
Updated. v7 uses memset(false) and only sets true where needed.

> +++ b/src/test/isolation/specs/multixact_stats.spec
>
> I have not an seen an isolation test being used for testing a stats
> function. But I find it useful. Let's see what others think.
>
> @@ -0,0 +1,127 @@
> +# High-signal invariants for pg_get_multixact_stats()
> +# We create exactly one fresh MultiXact on a brand-new table. While
> it is pinned
> +# by two open transactions, we assert only invariants that background
> VACUUM/FREEZE
> +# cannot violate:
> +# • members increased by ≥ 1 when the second locker arrived,
> +# • num_mxids / num_members did not decrease vs earlier snapshots,
> +# • oldest_* never decreases.
> +# We make NO assertions after releasing locks (freezing/truncation
> may shrink deltas).
> +# NOTE: Snapshots snap0 and subsequent checks are taken inside an open driver
> +# transaction to narrow the window for unrelated truncation between snapshots.
>
> What's a driver transaction?
A driver transaction is simply the controlling session that stays open
while snapshots are taken.

> +#
> +# Terminology (global counters):
> +# num_mxids, num_members : “in-use” deltas derived from global horizons
> +# oldest_multixact, offset : oldest horizons; they move forward, never backward
> +#
> +# All assertions execute while our multixact is pinned by open txns,
> which protects
> +# the truncation horizon (VACUUM can’t advance past our pinned multi).
>
> Probably this comment is not needed. But from the sequence of steps
> executed, the data is collected when multixact is pinned (what does
> that mean?) but the assertions are executed at the end when all the
> transactions are committed. Am I correct?
You are correct — the assertions are executed at the end, after the commits.
The key point is that all snapshots are taken while the multixact is
pinned by open transactions,
so the invariants hold despite the final check happening later.

> +step snap0 {
> + CREATE TEMP TABLE snap0 AS
> + SELECT num_mxids, num_members, oldest_multixact, oldest_offset
> + FROM pg_get_multixact_stats();
> +}
>
> You could use a single table with a primary key column to distinguish
> snaps which can be used for joining the rows. Why use a temporary
> table? Just setup and tear down the snap table as well?
I kept separate temp tables to keep each snapshot isolated and easy to
read in the spec.
A single table with a PK would work too, but I felt temp tables made
the sequence clearer.

> +
> +# Pretty, deterministic key/value output of boolean checks.
> +# Keys:
> ... snip ...
> + (s1.num_mxids >= COALESCE(s0.num_mxids, 0)),
> + (s2.num_mxids >= COALESCE(s1.num_mxids, 0)),
> + (s1.num_members >= COALESCE(s0.num_members, 0)),
> + (s2.num_members >= COALESCE(s1.num_members, 0))
> + ]
>
> This is getting too complex to follow. It produces pretty output but
> the query is complex. Instead just let keys as the columns in the
> query. Maybe you could print expanded output if that's possible in an
> isolation test.
>
I used the labeled key/value array to mimic \x-style readability while
keeping the output
deterministic for isolation’s text diffs. It clearly names each
invariant and avoids
formatter-dependent width/spacing.

Thanks again for the thoughtful reviews. I really appreciate the
guidance and will be glad to adjust
further if needed.

Best regards,
Naga


Attachments:

  [application/octet-stream] v7-0001-Add-pg_get_multixact_stats-function-for-monitorin.patch (18.9K, 2-v7-0001-Add-pg_get_multixact_stats-function-for-monitorin.patch)
  download | inline diff:
From 79dc6ffd50ad0a2dc84eeb2da3353d823b0ced91 Mon Sep 17 00:00:00 2001
From: Naga Appani <[email protected]>
Date: Wed, 3 Sep 2025 04:09:56 +0000
Subject: [PATCH v7] Add pg_get_multixact_stats() function for monitoring
 MultiXact usage

Expose multixact state via a new SQL-callable function pg_get_multixact_stats(),
returning:
- num_mxids          : number of MultiXact IDs in use
- num_members        : number of member entries in use
- members_size       : bytes used by num_members in pg_multixact/members directory
- oldest_multixact   : oldest MultiXact ID still needed
- oldest_offset      : oldest member offset still in use

This patch adds pg_get_multixact_stats() function
   - SQL-callable interface to GetMultiXactInfo()
   - Returns NULLs if MultiXact system not initialized
   - Includes isolation tests for monitoring invariants

Documentation updates:
- func-info.sgml: add function entry
- maintenance.sgml: mention monitoring multixact usage

Build and catalog:
- Add function to existing multixactfuncs.c
- pg_proc.dat entry

Author: Naga Appani <[email protected]>
Reviewed-by: Ashutosh Bapat <[email protected]>
Reviewed-by: Michael Paquier <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/CA%2BQeY%2BAAsYK6WvBW4qYzHz4bahHycDAY_q5ECmHkEV_eB9ckzg%40mail.gmail.com
---
 doc/src/sgml/func/func-info.sgml              |  34 +++++
 doc/src/sgml/maintenance.sgml                 |  57 +++++++-
 src/backend/utils/adt/multixactfuncs.c        |  47 +++++++
 src/include/catalog/pg_proc.dat               |  15 +++
 .../isolation/expected/multixact_stats.out    |  94 +++++++++++++
 src/test/isolation/isolation_schedule         |   1 +
 src/test/isolation/specs/multixact_stats.spec | 127 ++++++++++++++++++
 7 files changed, 369 insertions(+), 6 deletions(-)
 create mode 100644 src/test/isolation/expected/multixact_stats.out
 create mode 100644 src/test/isolation/specs/multixact_stats.spec

diff --git a/doc/src/sgml/func/func-info.sgml b/doc/src/sgml/func/func-info.sgml
index c393832d94c..fcc5e4bde38 100644
--- a/doc/src/sgml/func/func-info.sgml
+++ b/doc/src/sgml/func/func-info.sgml
@@ -2975,6 +2975,40 @@ acl      | {postgres=arwdDxtm/postgres,foo=r/postgres}
         modify key columns.
        </para></entry>
       </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_get_multixact_stats</primary>
+        </indexterm>
+        <function>pg_get_multixact_stats</function> ()
+        <returnvalue>record</returnvalue>
+        ( <parameter>num_mxids</parameter> <type>integer</type>,
+          <parameter>num_members</parameter> <type>bigint</type>,
+          <parameter>members_size</parameter> <type>bigint</type>,
+          <parameter>oldest_multixact</parameter> <type>xid</type>,
+          <parameter>oldest_offset</parameter> <type>bigint</type> )
+       </para>
+       <para>
+        Returns statistics about current multixact usage:
+        <literal>num_mxids</literal> is the number of multixact IDs assigned,
+        <literal>num_members</literal> is the number of multixact member entries created,
+        <literal>members_size</literal> is the storage occupied by <literal>num_members</literal>
+        in <literal>pg_multixact/members</literal> directory,
+        <literal>oldest_multixact</literal> is the oldest multixact ID still in use, and
+        <literal>oldest_offset</literal> is the oldest member offset still in use.
+        See <xref linkend="vacuum-for-multixact-wraparound"/> for further details on multixact wraparound.
+       </para>
+       <para>
+        The function reports statistics at the time it is invoked. Values may vary between calls,
+        even within a single transaction.
+       </para>
+       <para>
+        The function returns <literal>NULL</literal> when multixact statistics are unavailable.
+        For example, during startup before multixact initialization completes or when
+        the oldest member offset cannot be determined.
+       </para></entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index e7a9f58c015..8b265e2348f 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -813,14 +813,59 @@ HINT:  Execute a database-wide VACUUM in that database.
     <para>
      As a safety device, an aggressive vacuum scan will
      occur for any table whose multixact-age is greater than <xref
-     linkend="guc-autovacuum-multixact-freeze-max-age"/>.  Also, if the
-     storage occupied by multixacts members exceeds about 10GB, aggressive vacuum
-     scans will occur more often for all tables, starting with those that
-     have the oldest multixact-age.  Both of these kinds of aggressive
-     scans will occur even if autovacuum is nominally disabled. The members storage
-     area can grow up to about 20GB before reaching wraparound.
+     linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if the number
+     of multixact member entries created exceeds approximately 2^31 entries
+     (occupying roughly 10GB in the <literal>pg_multixact/members</literal> directory),
+     aggressive vacuum scans will occur more often for all tables, starting with those that
+     have the oldest multixact-age. Both of these kinds of aggressive
+     scans will occur even if autovacuum is nominally disabled. The members can grow
+     up to approximately 2^32 entries before reaching wraparound.
     </para>
 
+    <para>
+     The <function>pg_get_multixact_stats()</function> function described in
+     <xref linkend="functions-pg-snapshot"/> provides a way to monitor
+     multixact allocation and usage patterns in real time. For example:
+     <programlisting>
+postgres=# SELECT *,pg_size_pretty(members_size) members_size_pretty FROM pg_catalog.pg_get_multixact_stats();
+ num_mxids | num_members | members_size | oldest_multixact | oldest_offset | members_size_pretty
+-----------+-------------+--------------+------------------+---------------+---------------------
+ 311740299 |  2785241176 |  13926205880 |                2 |             3 | 13 GB
+(1 row)
+     </programlisting>
+     This output shows a system with significant multixact activity: about ~312 million
+     multixact IDs and ~2.8 billion member entries consuming 13 GB of storage space.
+     By leveraging this information, the function helps:
+     <orderedlist>
+      <listitem>
+       <simpara>
+        Identify unusual multixact activity from concurrent row-level locks
+        or foreign key operations. For example, a spike in <literal>num_mxids</literal> might indicate
+        multiple sessions running <literal>UPDATE</literal> statements with foreign key checks,
+        concurrent <literal>SELECT FOR SHARE</literal> operations, or frequent use of savepoints
+        causing lock contention.
+       </simpara>
+      </listitem>
+      <listitem>
+       <simpara>
+        Track multixact cleanup efficiency by monitoring oldest_multixact.
+        If this value remains unchanged while <literal>num_members</literal> grows, it could indicate
+        that long-running transactions are preventing cleanup, or autovacuum is
+        not keeping up with the workload.
+       </simpara>
+      </listitem>
+      <listitem>
+       <simpara>
+        Detect potential performance impacts before they become critical.
+        For instance, high multixact usage from frequent row-level locking or
+        foreign key operations can lead to increased I/O and CPU overhead during
+        vacuum operations. Monitoring these stats helps tune autovacuum frequency
+        and transaction patterns.
+       </simpara>
+      </listitem>
+     </orderedlist>
+     </para>
+
     <para>
      Similar to the XID case, if autovacuum fails to clear old MXIDs from a table, the
      system will begin to emit warning messages when the database's oldest MXIDs reach forty
diff --git a/src/backend/utils/adt/multixactfuncs.c b/src/backend/utils/adt/multixactfuncs.c
index e74ea938348..c393f11a38c 100644
--- a/src/backend/utils/adt/multixactfuncs.c
+++ b/src/backend/utils/adt/multixactfuncs.c
@@ -85,3 +85,50 @@ pg_get_multixact_members(PG_FUNCTION_ARGS)
 
 	SRF_RETURN_DONE(funccxt);
 }
+
+/*
+ * pg_get_multixact_stats
+ *
+ * Returns statistics about current MultiXact usage.
+ *
+ * Returns NULL if the oldest referenced offset is unknown, which happens during
+ * system startup or when no MultiXact references exist in any relation.
+ */
+Datum
+pg_get_multixact_stats(PG_FUNCTION_ARGS)
+{
+	TupleDesc	tupdesc;
+	Datum		values[5];
+	bool		nulls[5];
+	MultiXactOffset	members;
+	MultiXactId	oldestMultiXactId;
+	uint32		multixacts;
+	MultiXactOffset	oldestOffset;
+	int64		membersBytes;
+
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("return type must be a row type")));
+
+	if (GetMultiXactInfo(&multixacts, &members, &oldestMultiXactId, &oldestOffset))
+	{
+		/*
+		 * Calculate storage space for members. Members are stored in groups of 4,
+		 * with each group taking 20 bytes, resulting in 5 bytes per member.
+		 * Note: This ignores small page overhead (12 bytes per 8KB)
+		 */
+		membersBytes = (int64) members * 5;
+
+		values[0] = Int32GetDatum(multixacts);
+		values[1] = Int64GetDatum(members);
+		values[2] = Int64GetDatum(membersBytes);
+		values[3] = UInt32GetDatum(oldestMultiXactId);
+		values[4] = Int64GetDatum(oldestOffset);
+		memset(nulls, false, sizeof(nulls));
+
+		return HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls));
+	}
+
+	PG_RETURN_NULL();
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 118d6da1ace..f5f224b8a14 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12576,4 +12576,19 @@
   proargnames => '{pid,io_id,io_generation,state,operation,off,length,target,handle_data_len,raw_result,result,target_desc,f_sync,f_localmem,f_buffered}',
   prosrc => 'pg_get_aios' },
 
+# Get MultiXact state
+{
+  oid => '9001',
+  descr => 'get current multixact member and multixact ID counts and oldest values',
+  proname => 'pg_get_multixact_stats',
+  prorettype => 'record',
+  proargtypes => '',
+  proallargtypes => '{int4,int8,int8,xid,int8}',
+  proargmodes => '{o,o,o,o,o}',
+  proargnames => '{num_mxids,num_members,members_size,oldest_multixact,oldest_offset}',
+  provolatile => 'v',
+  proparallel => 's',
+  prosrc => 'pg_get_multixact_stats'
+},
+
 ]
diff --git a/src/test/isolation/expected/multixact_stats.out b/src/test/isolation/expected/multixact_stats.out
new file mode 100644
index 00000000000..2893c4d9f36
--- /dev/null
+++ b/src/test/isolation/expected/multixact_stats.out
@@ -0,0 +1,94 @@
+Parsed test spec with 3 sessions
+
+starting permutation: d_begin snap0 s1_begin s1_lock snap1 s2_begin s2_lock snap2 check_while_pinned d_commit s1_commit s2_commit
+step d_begin: BEGIN; SET client_min_messages = warning;
+step snap0: 
+  CREATE TEMP TABLE snap0 AS
+  SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+  FROM pg_get_multixact_stats();
+
+step s1_begin: BEGIN;
+step s1_lock: SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE;
+?column?
+--------
+       1
+(1 row)
+
+step snap1: 
+  CREATE TEMP TABLE snap1 AS
+  SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+  FROM pg_get_multixact_stats();
+
+step s2_begin: BEGIN;
+step s2_lock: SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE;
+?column?
+--------
+       1
+(1 row)
+
+step snap2: 
+  CREATE TEMP TABLE snap2 AS
+  SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+  FROM pg_get_multixact_stats();
+
+step check_while_pinned: 
+  SELECT r.assertion, r.ok
+  FROM snap0 s0
+  JOIN snap1 s1 ON TRUE
+  JOIN snap2 s2 ON TRUE,
+  LATERAL unnest(
+    ARRAY[
+      'is_init_mxids',
+      'is_init_members',
+      'is_init_oldest_mxid',
+      'is_init_oldest_off',
+      'is_oldest_mxid_nondec_01',
+      'is_oldest_mxid_nondec_12',
+      'is_oldest_off_nondec_01',
+      'is_oldest_off_nondec_12',
+      'is_members_increased_ge1',
+      'is_mxids_nondec_01',
+      'is_mxids_nondec_12',
+      'is_members_nondec_01',
+      'is_members_nondec_12'
+    ],
+    ARRAY[
+      (s2.num_mxids        IS NOT NULL),
+      (s2.num_members      IS NOT NULL),
+      (s2.oldest_multixact IS NOT NULL),
+      (s2.oldest_offset    IS NOT NULL),
+
+      (s1.oldest_multixact::text::bigint >= COALESCE(s0.oldest_multixact::text::bigint, 0)),
+      (s2.oldest_multixact::text::bigint >= COALESCE(s1.oldest_multixact::text::bigint, 0)),
+      (s1.oldest_offset    >= COALESCE(s0.oldest_offset,    0)),
+      (s2.oldest_offset    >= COALESCE(s1.oldest_offset,    0)),
+
+      (s2.num_members >= COALESCE(s1.num_members, 0) + 1),
+
+      (s1.num_mxids   >= COALESCE(s0.num_mxids,   0)),
+      (s2.num_mxids   >= COALESCE(s1.num_mxids,   0)),
+      (s1.num_members >= COALESCE(s0.num_members, 0)),
+      (s2.num_members >= COALESCE(s1.num_members, 0))
+    ]
+  ) AS r(assertion, ok);
+
+assertion               |ok
+------------------------+--
+is_init_mxids           |t 
+is_init_members         |t 
+is_init_oldest_mxid     |t 
+is_init_oldest_off      |t 
+is_oldest_mxid_nondec_01|t 
+is_oldest_mxid_nondec_12|t 
+is_oldest_off_nondec_01 |t 
+is_oldest_off_nondec_12 |t 
+is_members_increased_ge1|t 
+is_mxids_nondec_01      |t 
+is_mxids_nondec_12      |t 
+is_members_nondec_01    |t 
+is_members_nondec_12    |t 
+(13 rows)
+
+step d_commit: COMMIT;
+step s1_commit: COMMIT;
+step s2_commit: COMMIT;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 9f1e997d81b..4d94fc94e77 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -118,3 +118,4 @@ test: serializable-parallel-2
 test: serializable-parallel-3
 test: matview-write-skew
 test: lock-nowait
+test: multixact_stats
diff --git a/src/test/isolation/specs/multixact_stats.spec b/src/test/isolation/specs/multixact_stats.spec
new file mode 100644
index 00000000000..9098b6f5c5d
--- /dev/null
+++ b/src/test/isolation/specs/multixact_stats.spec
@@ -0,0 +1,127 @@
+# High-signal invariants for pg_get_multixact_stats()
+# We create exactly one fresh MultiXact on a brand-new table. While it is pinned
+# by two open transactions, we assert only invariants that background VACUUM/FREEZE
+# cannot violate:
+#   • members increased by ≥ 1 when the second locker arrived,
+#   • num_mxids / num_members did not decrease vs earlier snapshots,
+#   • oldest_* never decreases.
+# We make NO assertions after releasing locks (freezing/truncation may shrink deltas).
+# NOTE: Snapshots snap0 and subsequent checks are taken inside an open driver
+# transaction to narrow the window for unrelated truncation between snapshots.
+#
+# Terminology (global counters):
+#   num_mxids, num_members    : “in-use” deltas derived from global horizons
+#   oldest_multixact, offset  : oldest horizons; they move forward, never backward
+#
+# All assertions execute while our multixact is pinned by open txns, which protects
+# the truncation horizon (VACUUM can’t advance past our pinned multi).
+
+setup
+{
+    CREATE TABLE mxq(id int PRIMARY KEY, v int);
+    INSERT INTO mxq VALUES (1, 42);
+}
+
+teardown
+{
+    DROP TABLE mxq;
+}
+
+# Two lockers on the SAME tuple -> one MultiXact with >= 2 members.
+session "s1"
+setup { SET client_min_messages = warning; SET lock_timeout = '5s'; }
+step s1_begin  { BEGIN; }
+step s1_lock   { SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE; }
+step s1_commit { COMMIT; }
+
+session "s2"
+setup { SET client_min_messages = warning; SET lock_timeout = '5s'; }
+step s2_begin  { BEGIN; }
+step s2_lock   { SELECT 1 FROM mxq WHERE id=1 FOR KEY SHARE; }
+step s2_commit { COMMIT; }
+
+# Driver session: keep a transaction open while we take snapshots and check.
+session "driver"
+step d_begin { BEGIN; SET client_min_messages = warning; }
+
+# Baseline BEFORE any locking; may be NULLs if multixact isn't initialized yet.
+step snap0 {
+  CREATE TEMP TABLE snap0 AS
+  SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+  FROM pg_get_multixact_stats();
+}
+
+# After s1 has locked the row (still in driver xact).
+step snap1 {
+  CREATE TEMP TABLE snap1 AS
+  SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+  FROM pg_get_multixact_stats();
+}
+
+# After s2 joins on the SAME tuple -> multixact with >= 2 members (still in driver xact).
+step snap2 {
+  CREATE TEMP TABLE snap2 AS
+  SELECT num_mxids, num_members, oldest_multixact, oldest_offset
+  FROM pg_get_multixact_stats();
+}
+
+# Pretty, deterministic key/value output of boolean checks.
+# Keys:
+#   is_init_mxids            : num_mxids is non-NULL
+#   is_init_members          : num_members is non-NULL
+#   is_init_oldest_mxid      : oldest_multixact is non-NULL
+#   is_init_oldest_off       : oldest_offset is non-NULL
+#   is_oldest_mxid_nondec_01 : oldest_multixact did not decrease (snap0→snap1)
+#   is_oldest_mxid_nondec_12 : oldest_multixact did not decrease (snap1→snap2)
+#   is_oldest_off_nondec_01  : oldest_offset did not decrease (snap0→snap1)
+#   is_oldest_off_nondec_12  : oldest_offset did not decrease (snap1→snap2)
+#   is_members_increased_ge1 : members increased by at least 1 when s2 joined
+#   is_mxids_nondec_01       : num_mxids did not decrease (snap0→snap1)
+#   is_mxids_nondec_12       : num_mxids did not decrease (snap1→snap2)
+#   is_members_nondec_01     : num_members did not decrease (snap0→snap1)
+#   is_members_nondec_12     : num_members did not decrease (snap1→snap2)
+step check_while_pinned {
+  SELECT r.assertion, r.ok
+  FROM snap0 s0
+  JOIN snap1 s1 ON TRUE
+  JOIN snap2 s2 ON TRUE,
+  LATERAL unnest(
+    ARRAY[
+      'is_init_mxids',
+      'is_init_members',
+      'is_init_oldest_mxid',
+      'is_init_oldest_off',
+      'is_oldest_mxid_nondec_01',
+      'is_oldest_mxid_nondec_12',
+      'is_oldest_off_nondec_01',
+      'is_oldest_off_nondec_12',
+      'is_members_increased_ge1',
+      'is_mxids_nondec_01',
+      'is_mxids_nondec_12',
+      'is_members_nondec_01',
+      'is_members_nondec_12'
+    ],
+    ARRAY[
+      (s2.num_mxids        IS NOT NULL),
+      (s2.num_members      IS NOT NULL),
+      (s2.oldest_multixact IS NOT NULL),
+      (s2.oldest_offset    IS NOT NULL),
+
+      (s1.oldest_multixact::text::bigint >= COALESCE(s0.oldest_multixact::text::bigint, 0)),
+      (s2.oldest_multixact::text::bigint >= COALESCE(s1.oldest_multixact::text::bigint, 0)),
+      (s1.oldest_offset    >= COALESCE(s0.oldest_offset,    0)),
+      (s2.oldest_offset    >= COALESCE(s1.oldest_offset,    0)),
+
+      (s2.num_members >= COALESCE(s1.num_members, 0) + 1),
+
+      (s1.num_mxids   >= COALESCE(s0.num_mxids,   0)),
+      (s2.num_mxids   >= COALESCE(s1.num_mxids,   0)),
+      (s1.num_members >= COALESCE(s0.num_members, 0)),
+      (s2.num_members >= COALESCE(s1.num_members, 0))
+    ]
+  ) AS r(assertion, ok);
+}
+
+step d_commit { COMMIT; }
+
+permutation d_begin snap0 s1_begin s1_lock snap1 s2_begin s2_lock snap2 check_while_pinned d_commit s1_commit s2_commit
-- 
2.47.3



view thread (42+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring
  In-Reply-To: <CA+QeY+DLsjj4RxY-Z2wRC-sYCeFxuPSd8zSkt4Nvs==YV6GP6w@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox