public inbox for [email protected]  
help / color / mirror / Atom feed
 [Proposal] pg_stat_wal_records – per-record-type WAL generation statistics
4+ messages / 3 participants
[nested] [flat]

*  [Proposal] pg_stat_wal_records – per-record-type WAL generation statistics
@ 2026-03-27 00:29  SATYANARAYANA NARLAPURAM <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: SATYANARAYANA NARLAPURAM @ 2026-03-27 00:29 UTC (permalink / raw)
  To: PostgreSQL Hackers <[email protected]>

Hi Hackers,

I'd like to propose a new system view, pg_stat_wal_records, that exposes
per-resource-manager, per-record-type WAL generation counts.

*Sample Output:*
postgres=# SELECT * FROM pg_stat_wal_records ORDER BY count DESC LIMIT 10;
 resource_manager |  record_type   | count  |          stats_reset
------------------+----------------+--------+-------------------------------
 Heap             | INSERT         | 500000 | 2026-03-26 22:15:00.12345+00
 Transaction      | COMMIT         | 500000 |
 Btree            | INSERT_LEAF    |  53821 |
 Heap             | HOT_UPDATE     |  12744 |
 XLOG             | FPI            |   8923 |

*The Gap:*

Postgre already has pg_stat_wal for aggregate WAL volume (bytes, full-page
images, buffers), and pg_walinspect (superuser access required) for
post-hoc forensic analysis of individual WAL segments. But I don't see a
lightweight, observability tool that answers in real time which record
types are responsible for the WAL. Additionally, pg_walinspect runs against
on-disk WAL files, which is expensive. This view will be useful for
monitoring systems to poll cheaply.

*Use cases:*
WAL volume investigation: see which record types dominate WAL generation in
real time without touching disk.
Monitoring integration: Prometheus/Grafana can poll the view to track WAL
composition over time and alert on anomalies.
Replication tuning: identify whether WAL volume is dominated by data
changes, index maintenance, FPIs, or vacuum activity to guide tuning.
Extension debugging: custom WAL resource managers get visibility
automatically.

*Key design decisions*
*Counting mechanism:*
The counting mechanism is a single backend-local array increment in
XLogInsert():
pgstat_pending_wal_records[rmid][(info >> 4) & 0x0F]++;

This indexes into a uint64[256][16] array (32 KB per backend) using the
rmgr ID and the 4-bit record-type subfield of the WAL info byte. Counters
are flushed to shared memory via the standard pgstat infrastructure.
I am using per-backend pending array instead of direct shared-memory
writes. The counter is incremented in backend-local memory and flushed to
shared memory by the existing pgstat flush cycle. Don't expect to see any
contention in the hot path (please see perf results below).
Fixed 256×16 matrix. All 256 possible rmgr IDs × 16 possible record types.
This accommodates core resource managers and any custom WAL resource
managers from extensions without configuration. The 32 KB per-backend cost
is modest. Uses rm_identify() for human-readable names. The SRF calls each
resource manager's rm_identify callback to translate the info byte into a
readable record type name (for example INSERT, COMMIT, VACUUM, HOT_UPDATE).
Added the reset functionality via pg_stat_reset_shared('wal_records'),
consistent with the existing pattern for wal, bgwriter, archiver, etc.
View skips zero-count entries, keeping output clean.

*Performance overhead*
Benchmarked with pgbench (scale 50, 16 clients, 16 threads, 30s,
synchronous_commit=off) on 64 vCPU machine with data and WAL on NVMe:

*Configuration Avg TPS*
With patch 42,266
Without patch 42,053
The overhead is within measurement noise (~0.5%). The increment hits a
backend-local, L1-hot array and is dwarfed by XLogInsert's existing CRC,
locking, and memcpy work.

Attached a draft patch, please share your thoughts.


Thanks,
Satya


Attachments:

  [application/octet-stream] v1-0001-pg-stat-wal-records.patch (18.5K, 3-v1-0001-pg-stat-wal-records.patch)
  download | inline diff:
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index bb75ed1..c8dd257 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -551,6 +551,15 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_wal_records</structname><indexterm><primary>pg_stat_wal_records</primary></indexterm></entry>
+      <entry>One row per WAL record type, showing how many WAL records of
+       each type have been generated. See
+       <link linkend="monitoring-pg-stat-wal-records-view">
+       <structname>pg_stat_wal_records</structname></link> for details.
+      </entry>
+     </row>
+
      <!-- all "stat" for schema objects, by "importance" -->
 
      <row>
@@ -3689,6 +3698,92 @@ description | Waiting for a newly initialized WAL file to reach durable storage
 
 </sect2>
 
+ <sect2 id="monitoring-pg-stat-wal-records-view">
+  <title><structname>pg_stat_wal_records</structname></title>
+
+  <indexterm>
+   <primary>pg_stat_wal_records</primary>
+  </indexterm>
+
+  <para>
+   The <structname>pg_stat_wal_records</structname> view contains one row
+   for each WAL record type that has been generated since the last statistics
+   reset.  It provides a breakdown of WAL generation by resource manager and
+   record type, which can be used to identify which operations are
+   responsible for WAL volume.  Record types with a count of zero are omitted.
+  </para>
+
+  <para>
+   Unlike <structname>pg_stat_wal</structname>, which provides aggregate
+   totals, this view shows the composition of WAL traffic.  Unlike
+   <function>pg_get_wal_stats()</function> from
+   <xref linkend="pgwalinspect"/>, this view tracks live cumulative
+   counters without needing to read WAL files from disk.
+  </para>
+
+  <table id="pg-stat-wal-records-view" xreflabel="pg_stat_wal_records">
+   <title><structname>pg_stat_wal_records</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       Column Type
+      </para>
+      <para>
+       Description
+      </para></entry>
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>resource_manager</structfield> <type>text</type>
+      </para>
+      <para>
+       Name of the WAL resource manager (e.g.
+       <literal>Heap</literal>, <literal>Btree</literal>,
+       <literal>Transaction</literal>, <literal>XLOG</literal>).
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>record_type</structfield> <type>text</type>
+      </para>
+      <para>
+       Name of the WAL record type within the resource manager (e.g.
+       <literal>INSERT</literal>, <literal>HOT_UPDATE</literal>,
+       <literal>COMMIT</literal>, <literal>INSERT_LEAF</literal>).
+       These names are provided by each resource manager's
+       <function>rm_identify</function> callback.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>count</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of WAL records of this type generated since the last
+       statistics reset.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
+      </para>
+      <para>
+       Time at which WAL record statistics were last reset.
+      </para></entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+ </sect2>
+
  <sect2 id="monitoring-pg-stat-database-view">
   <title><structname>pg_stat_database</structname></title>
 
@@ -5500,6 +5595,12 @@ description | Waiting for a newly initialized WAL file to reach durable storage
           <structname>pg_stat_wal</structname> view.
          </para>
         </listitem>
+        <listitem>
+         <para>
+          <literal>wal_records</literal>: Reset all the counters shown in the
+          <structname>pg_stat_wal_records</structname> view.
+         </para>
+        </listitem>
         <listitem>
          <para>
           <literal>NULL</literal> or not specified: All the counters from the
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index e4a819e..f2556f5 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -36,6 +36,7 @@
 #include "executor/instrument.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
+#include "pgstat.h"
 #include "replication/origin.h"
 #include "storage/bufmgr.h"
 #include "storage/proc.h"
@@ -531,6 +532,8 @@ XLogInsert(RmgrId rmid, uint8 info)
 								  fpi_bytes, topxid_included);
 	} while (!XLogRecPtrIsValid(EndPos));
 
+	pgstat_count_wal_record(rmid, info);
+
 	XLogResetInsertion();
 
 	return EndPos;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index e540180..a8b7d5c 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1274,6 +1274,14 @@ CREATE VIEW pg_stat_wal AS
         w.stats_reset
     FROM pg_stat_get_wal() w;
 
+CREATE VIEW pg_stat_wal_records AS
+    SELECT
+        w.resource_manager,
+        w.record_type,
+        w.count,
+        w.stats_reset
+    FROM pg_stat_get_wal_records() w;
+
 CREATE VIEW pg_stat_progress_analyze AS
     SELECT
         S.pid AS pid, S.datid AS datid, D.datname AS datname,
diff --git a/src/backend/utils/activity/meson.build b/src/backend/utils/activity/meson.build
index 1aa7ece..23c6c6c 100644
--- a/src/backend/utils/activity/meson.build
+++ b/src/backend/utils/activity/meson.build
@@ -18,6 +18,7 @@ backend_sources += files(
   'pgstat_slru.c',
   'pgstat_subscription.c',
   'pgstat_wal.c',
+  'pgstat_walrecords.c',
   'pgstat_xact.c',
 )
 
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index eb8ccba..dde41b8 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -500,6 +500,23 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.reset_all_cb = pgstat_wal_reset_all_cb,
 		.snapshot_cb = pgstat_wal_snapshot_cb,
 	},
+
+	[PGSTAT_KIND_WAL_RECORDS] = {
+		.name = "wal_records",
+
+		.fixed_amount = true,
+		.write_to_file = true,
+
+		.snapshot_ctl_off = offsetof(PgStat_Snapshot, wal_records),
+		.shared_ctl_off = offsetof(PgStat_ShmemControl, wal_records),
+		.shared_data_off = offsetof(PgStatShared_WalRecords, stats),
+		.shared_data_len = sizeof(((PgStatShared_WalRecords *) 0)->stats),
+
+		.flush_static_cb = pgstat_walrecords_flush_cb,
+		.init_shmem_cb = pgstat_walrecords_init_shmem_cb,
+		.reset_all_cb = pgstat_walrecords_reset_all_cb,
+		.snapshot_cb = pgstat_walrecords_snapshot_cb,
+	},
 };
 
 /*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 9185a8e..49951d3 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -16,6 +16,7 @@
 
 #include "access/htup_details.h"
 #include "access/xlog.h"
+#include "access/xlog_internal.h"
 #include "access/xlogprefetcher.h"
 #include "catalog/catalog.h"
 #include "catalog/pg_authid.h"
@@ -1741,6 +1742,75 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
 									wal_stats->stat_reset_timestamp));
 }
 
+/*
+ * Returns per-resource-manager WAL record type statistics.
+ */
+Datum
+pg_stat_get_wal_records(PG_FUNCTION_ARGS)
+{
+#define PG_STAT_GET_WAL_RECORDS_COLS	4
+	ReturnSetInfo *rsinfo;
+	PgStat_WalRecordStats *stats;
+
+	InitMaterializedSRF(fcinfo, 0);
+	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+	stats = pgstat_fetch_stat_wal_records();
+
+	for (int rmid = 0; rmid < PGSTAT_NUM_WAL_RMGRS; rmid++)
+	{
+		const char *rm_name;
+
+		/* Skip resource managers that don't exist */
+		if (!RmgrIdExists((RmgrId) rmid))
+			continue;
+
+		rm_name = RmgrTable[rmid].rm_name;
+
+		for (int info_idx = 0; info_idx < PGSTAT_NUM_WAL_REC_TYPES; info_idx++)
+		{
+			Datum		values[PG_STAT_GET_WAL_RECORDS_COLS] = {0};
+			bool		nulls[PG_STAT_GET_WAL_RECORDS_COLS] = {0};
+			const char *record_type = NULL;
+			char		record_type_buf[32];
+
+			/* Skip entries with no recorded activity */
+			if (stats->counts[rmid][info_idx] == 0)
+				continue;
+
+			/* Resource manager name */
+			values[0] = CStringGetTextDatum(rm_name);
+
+			/* Record type name from rm_identify, with numeric fallback */
+			if (RmgrTable[rmid].rm_identify)
+				record_type = RmgrTable[rmid].rm_identify((uint8) (info_idx << 4));
+
+			if (record_type)
+				values[1] = CStringGetTextDatum(record_type);
+			else
+			{
+				snprintf(record_type_buf, sizeof(record_type_buf),
+						 "0x%02X", info_idx << 4);
+				values[1] = CStringGetTextDatum(record_type_buf);
+			}
+
+			/* Count */
+			values[2] = Int64GetDatum(stats->counts[rmid][info_idx]);
+
+			/* Stats reset timestamp */
+			if (stats->stat_reset_timestamp != 0)
+				values[3] = TimestampTzGetDatum(stats->stat_reset_timestamp);
+			else
+				nulls[3] = true;
+
+			tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+								values, nulls);
+		}
+	}
+
+	return (Datum) 0;
+}
+
 Datum
 pg_stat_get_lock(PG_FUNCTION_ARGS)
 {
@@ -1965,6 +2035,7 @@ pg_stat_reset_shared(PG_FUNCTION_ARGS)
 		XLogPrefetchResetStats();
 		pgstat_reset_of_kind(PGSTAT_KIND_SLRU);
 		pgstat_reset_of_kind(PGSTAT_KIND_WAL);
+		pgstat_reset_of_kind(PGSTAT_KIND_WAL_RECORDS);
 
 		PG_RETURN_VOID();
 	}
@@ -1987,11 +2058,13 @@ pg_stat_reset_shared(PG_FUNCTION_ARGS)
 		pgstat_reset_of_kind(PGSTAT_KIND_SLRU);
 	else if (strcmp(target, "wal") == 0)
 		pgstat_reset_of_kind(PGSTAT_KIND_WAL);
+	else if (strcmp(target, "wal_records") == 0)
+		pgstat_reset_of_kind(PGSTAT_KIND_WAL_RECORDS);
 	else
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("unrecognized reset target: \"%s\"", target),
-				 errhint("Target must be \"archiver\", \"bgwriter\", \"checkpointer\", \"io\", \"recovery_prefetch\", \"slru\", or \"wal\".")));
+				 errhint("Target must be \"archiver\", \"bgwriter\", \"checkpointer\", \"io\", \"recovery_prefetch\", \"slru\", \"wal\", or \"wal_records\".")));
 
 	PG_RETURN_VOID();
 }
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 0118e97..ecc9fb2 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6079,6 +6079,15 @@
   proargmodes => '{i,o,o,o,o,o,o}',
   proargnames => '{backend_pid,wal_records,wal_fpi,wal_bytes,wal_fpi_bytes,wal_buffers_full,stats_reset}',
   prosrc => 'pg_stat_get_backend_wal' },
+{ oid => '6',
+  descr => 'statistics: per-resource-manager WAL record type counts',
+  proname => 'pg_stat_get_wal_records', prorows => '100', proisstrict => 'f',
+  proretset => 't', provolatile => 's', proparallel => 'r',
+  prorettype => 'record', proargtypes => '',
+  proallargtypes => '{text,text,int8,timestamptz}',
+  proargmodes => '{o,o,o,o}',
+  proargnames => '{resource_manager,record_type,count,stats_reset}',
+  prosrc => 'pg_stat_get_wal_records' },
 { oid => '6248', descr => 'statistics: information about WAL prefetching',
   proname => 'pg_stat_get_recovery_prefetch', prorows => '1', proretset => 't',
   provolatile => 'v', prorettype => 'record', proargtypes => '',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 8e3549c..15a1d63 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -514,6 +514,23 @@ typedef struct PgStat_WalStats
 	TimestampTz stat_reset_timestamp;
 } PgStat_WalStats;
 
+/*
+ * Number of WAL resource managers (must match RM_N_IDS from rmgr.h) and
+ * record info types per resource manager (top 4 bits of info byte).
+ */
+#define PGSTAT_NUM_WAL_RMGRS		256
+#define PGSTAT_NUM_WAL_REC_TYPES	16
+
+/* -------
+ * PgStat_WalRecordStats		Per-resource-manager WAL record type counts
+ * -------
+ */
+typedef struct PgStat_WalRecordStats
+{
+	PgStat_Counter counts[PGSTAT_NUM_WAL_RMGRS][PGSTAT_NUM_WAL_REC_TYPES];
+	TimestampTz stat_reset_timestamp;
+} PgStat_WalRecordStats;
+
 /* -------
  * PgStat_Backend		Backend statistics
  * -------
@@ -834,6 +851,24 @@ extern void pgstat_report_wal(bool force);
 extern PgStat_WalStats *pgstat_fetch_stat_wal(void);
 
 
+/*
+ * Functions in pgstat_walrecords.c
+ */
+
+extern PgStat_WalRecordStats *pgstat_fetch_stat_wal_records(void);
+
+/*
+ * Variables in pgstat_walrecords.c
+ */
+
+extern PGDLLIMPORT PgStat_Counter pgstat_pending_wal_records[PGSTAT_NUM_WAL_RMGRS][PGSTAT_NUM_WAL_REC_TYPES];
+
+#define pgstat_count_wal_record(rmid, info) \
+	do { \
+		pgstat_pending_wal_records[(rmid)][((info) >> 4) & 0x0F]++; \
+	} while (0)
+
+
 /*
  * Variables in pgstat.c
  */
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 9770442..060a44a 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -488,6 +488,13 @@ typedef struct PgStatShared_Wal
 	PgStat_WalStats stats;
 } PgStatShared_Wal;
 
+typedef struct PgStatShared_WalRecords
+{
+	/* lock protects ->stats */
+	LWLock		lock;
+	PgStat_WalRecordStats stats;
+} PgStatShared_WalRecords;
+
 
 
 /* ----------
@@ -583,6 +590,7 @@ typedef struct PgStat_ShmemControl
 	PgStatShared_Lock lock;
 	PgStatShared_SLRU slru;
 	PgStatShared_Wal wal;
+	PgStatShared_WalRecords wal_records;
 
 	/*
 	 * Custom stats data with fixed-numbered objects, indexed by (PgStat_Kind
@@ -619,6 +627,8 @@ typedef struct PgStat_Snapshot
 
 	PgStat_WalStats wal;
 
+	PgStat_WalRecordStats wal_records;
+
 	/*
 	 * Data in snapshot for custom fixed-numbered statistics, indexed by
 	 * (PgStat_Kind - PGSTAT_KIND_CUSTOM_MIN).  Each entry is allocated in
@@ -847,6 +857,16 @@ extern void pgstat_wal_reset_all_cb(TimestampTz ts);
 extern void pgstat_wal_snapshot_cb(void);
 
 
+/*
+ * Functions in pgstat_walrecords.c
+ */
+
+extern bool pgstat_walrecords_flush_cb(bool nowait);
+extern void pgstat_walrecords_init_shmem_cb(void *stats);
+extern void pgstat_walrecords_reset_all_cb(TimestampTz ts);
+extern void pgstat_walrecords_snapshot_cb(void);
+
+
 /*
  * Functions in pgstat_subscription.c
  */
diff --git a/src/include/utils/pgstat_kind.h b/src/include/utils/pgstat_kind.h
index 2d78a02..85fc070 100644
--- a/src/include/utils/pgstat_kind.h
+++ b/src/include/utils/pgstat_kind.h
@@ -39,9 +39,10 @@
 #define PGSTAT_KIND_LOCK	11
 #define PGSTAT_KIND_SLRU	12
 #define PGSTAT_KIND_WAL	13
+#define PGSTAT_KIND_WAL_RECORDS	14
 
 #define PGSTAT_KIND_BUILTIN_MIN PGSTAT_KIND_DATABASE
-#define PGSTAT_KIND_BUILTIN_MAX PGSTAT_KIND_WAL
+#define PGSTAT_KIND_BUILTIN_MAX PGSTAT_KIND_WAL_RECORDS
 #define PGSTAT_KIND_BUILTIN_SIZE (PGSTAT_KIND_BUILTIN_MAX + 1)
 
 /* Custom stats kinds */
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index ea7f784..279d7be 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -982,6 +982,38 @@ SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
  t
 (1 row)
 
+-- Test pg_stat_wal_records
+-- WAL-generating activity above should have produced records
+SELECT count(*) > 0 AS has_records FROM pg_stat_wal_records;
+ has_records 
+-------------
+ t
+(1 row)
+
+SELECT count(DISTINCT resource_manager) > 0 AS has_rmgrs FROM pg_stat_wal_records;
+ has_rmgrs 
+-----------
+ t
+(1 row)
+
+-- Every row should have non-null resource_manager and record_type
+SELECT count(*) = 0 AS no_nulls
+  FROM pg_stat_wal_records
+  WHERE resource_manager IS NULL OR record_type IS NULL;
+ no_nulls 
+----------
+ t
+(1 row)
+
+-- Counts should all be positive
+SELECT count(*) = 0 AS all_positive
+  FROM pg_stat_wal_records
+  WHERE count <= 0;
+ all_positive 
+--------------
+ t
+(1 row)
+
 SELECT pg_stat_force_next_flush();
  pg_stat_force_next_flush 
 --------------------------
@@ -1127,10 +1159,25 @@ SELECT stats_reset > :'wal_reset_ts'::timestamptz FROM pg_stat_wal;
  t
 (1 row)
 
+-- Test that reset_shared with wal_records specified as the stats type works
+SELECT stats_reset AS wal_records_reset_ts FROM pg_stat_wal_records LIMIT 1 \gset
+SELECT pg_stat_reset_shared('wal_records');
+ pg_stat_reset_shared 
+----------------------
+ 
+(1 row)
+
+SELECT count(*) AS wal_records_post_reset FROM pg_stat_wal_records \gset
+SELECT :wal_records_post_reset = 0 AS reset_cleared_counts;
+ reset_cleared_counts 
+----------------------
+ t
+(1 row)
+
 -- Test error case for reset_shared with unknown stats type
 SELECT pg_stat_reset_shared('unknown');
 ERROR:  unrecognized reset target: "unknown"
-HINT:  Target must be "archiver", "bgwriter", "checkpointer", "io", "recovery_prefetch", "slru", or "wal".
+HINT:  Target must be "archiver", "bgwriter", "checkpointer", "io", "recovery_prefetch", "slru", "wal", or "wal_records".
 -- Test that reset works for pg_stat_database and pg_stat_database_conflicts
 -- Since pg_stat_database stats_reset starts out as NULL, reset it once first so that we
 -- have a baseline for comparison. The same for pg_stat_database_conflicts as it shares
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 65d8968..a3378ef 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -464,6 +464,19 @@ CHECKPOINT (FLUSH_UNLOGGED);
 SELECT num_requested > :rqst_ckpts_before FROM pg_stat_checkpointer;
 SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
 
+-- Test pg_stat_wal_records
+-- WAL-generating activity above should have produced records
+SELECT count(*) > 0 AS has_records FROM pg_stat_wal_records;
+SELECT count(DISTINCT resource_manager) > 0 AS has_rmgrs FROM pg_stat_wal_records;
+-- Every row should have non-null resource_manager and record_type
+SELECT count(*) = 0 AS no_nulls
+  FROM pg_stat_wal_records
+  WHERE resource_manager IS NULL OR record_type IS NULL;
+-- Counts should all be positive
+SELECT count(*) = 0 AS all_positive
+  FROM pg_stat_wal_records
+  WHERE count <= 0;
+
 SELECT pg_stat_force_next_flush();
 SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
 
@@ -520,6 +533,12 @@ SELECT stats_reset AS wal_reset_ts FROM pg_stat_wal \gset
 SELECT pg_stat_reset_shared('wal');
 SELECT stats_reset > :'wal_reset_ts'::timestamptz FROM pg_stat_wal;
 
+-- Test that reset_shared with wal_records specified as the stats type works
+SELECT stats_reset AS wal_records_reset_ts FROM pg_stat_wal_records LIMIT 1 \gset
+SELECT pg_stat_reset_shared('wal_records');
+SELECT count(*) AS wal_records_post_reset FROM pg_stat_wal_records \gset
+SELECT :wal_records_post_reset = 0 AS reset_cleared_counts;
+
 -- Test error case for reset_shared with unknown stats type
 SELECT pg_stat_reset_shared('unknown');
 


^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: [Proposal] pg_stat_wal_records – per-record-type WAL generation statistics
@ 2026-03-27 02:59  Michael Paquier <[email protected]>
  parent: SATYANARAYANA NARLAPURAM <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Michael Paquier @ 2026-03-27 02:59 UTC (permalink / raw)
  To: SATYANARAYANA NARLAPURAM <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>

On Thu, Mar 26, 2026 at 05:29:25PM -0700, SATYANARAYANA NARLAPURAM wrote:
> Postgre already has pg_stat_wal for aggregate WAL volume (bytes, full-page
> images, buffers), and pg_walinspect (superuser access required) for
> post-hoc forensic analysis of individual WAL segments. But I don't see a
> lightweight, observability tool that answers in real time which record
> types are responsible for the WAL. Additionally, pg_walinspect runs against
> on-disk WAL files, which is expensive. This view will be useful for
> monitoring systems to poll cheaply.
> 
> *Use cases:*
> WAL volume investigation: see which record types dominate WAL generation in
> real time without touching disk.
> Monitoring integration: Prometheus/Grafana can poll the view to track WAL
> composition over time and alert on anomalies.
> Replication tuning: identify whether WAL volume is dominated by data
> changes, index maintenance, FPIs, or vacuum activity to guide tuning.
> Extension debugging: custom WAL resource managers get visibility
> automatically.

Why is it useful to have access to this information in live for
monitoring purposes, divided by RMGR?  What do you define as an
anomaly in this context and what can be changed on the server side to
get out of an anomaly, based on the definition you would give to it?

The current WAL and IO stats are directly useful because they provide
numbers about flush, read and write quantity and timings.  These are
useful metrics for benchmarking.

This proposal only informs about the number of records, and we have a
various number of record types that have a variable length, that can
influence the distribution of the data written on disk.

As a whole, I am doubtful that this information is worth counting in
live, eating a stats kind ID.  One could also implement a background
worker that provides more advanced aggregate stats outside the WAL
insert path, with a custom stats kind.  No need to have that into
core, especially if the code that increments the stats eats more and
more cycles.
--
Michael


Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 4+ messages in thread

*  Re: [Proposal] pg_stat_wal_records – per-record-type WAL generation statistics
@ 2026-03-27 18:26  SATYANARAYANA NARLAPURAM <[email protected]>
  parent: Michael Paquier <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: SATYANARAYANA NARLAPURAM @ 2026-03-27 18:26 UTC (permalink / raw)
  To: Michael Paquier <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>

Hi  Michael,

Thank you for your feedback!

On Thu, Mar 26, 2026 at 7:59 PM Michael Paquier <[email protected]> wrote:

> On Thu, Mar 26, 2026 at 05:29:25PM -0700, SATYANARAYANA NARLAPURAM wrote:
> > Postgre already has pg_stat_wal for aggregate WAL volume (bytes,
> full-page
> > images, buffers), and pg_walinspect (superuser access required) for
> > post-hoc forensic analysis of individual WAL segments. But I don't see a
> > lightweight, observability tool that answers in real time which record
> > types are responsible for the WAL. Additionally, pg_walinspect runs
> against
> > on-disk WAL files, which is expensive. This view will be useful for
> > monitoring systems to poll cheaply.
> >
> > *Use cases:*
> > WAL volume investigation: see which record types dominate WAL generation
> in
> > real time without touching disk.
> > Monitoring integration: Prometheus/Grafana can poll the view to track WAL
> > composition over time and alert on anomalies.
> > Replication tuning: identify whether WAL volume is dominated by data
> > changes, index maintenance, FPIs, or vacuum activity to guide tuning.
> > Extension debugging: custom WAL resource managers get visibility
> > automatically.
>
> Why is it useful to have access to this information in live for
> monitoring purposes, divided by RMGR?


Per-RMGR breakdown is essentially a real-time X-ray of what the database
is actually doing. Existing view help us understand what queries were
submitted,
but what physical operations resulted from them can be seen with this view.



> What do you define as an
> anomaly in this context and what can be changed on the server side to
> get out of an anomaly, based on the definition you would give to it?
>

A few examples, HOT ratio dropped, BTree page splits, some app adding
logical decoding messages, GIN generates more WAL than the corresponding
heap modifications, high lock counts etc.


>
> The current WAL and IO stats are directly useful because they provide
> numbers about flush, read and write quantity and timings.  These are
> useful metrics for benchmarking.
>
> This proposal only informs about the number of records, and we have a
> various number of record types that have a variable length, that can
> influence the distribution of the data written on disk.
>

yeah, that was a downside, didn't add sizes to keep the overhead less.

As a whole, I am doubtful that this information is worth counting in
> live, eating a stats kind ID.  One could also implement a background
> worker that provides more advanced aggregate stats outside the WAL
> insert path, with a custom stats kind.  No need to have that into
> core, especially if the code that increments the stats eats more and
> more cycles.


Your argument makes sense to me, I was a bit hesitant on
writing a background worker because of the potential side effects of tailing
 the WAL. Let me try a different approach by implementing an ondemand
sniffing of the WAL, which can be implemented as an extension without
changes to core. Do you have thoughts on making it a contrib module or
modify existing pg_walinspect to accommodate these requirements?

Thanks,
Satya


^ permalink  raw  reply  [nested|flat] 4+ messages in thread

*  Re: [Proposal] pg_stat_wal_records – per-record-type WAL generation statistics
@ 2026-04-28 01:30  Bharath Rupireddy <[email protected]>
  parent: SATYANARAYANA NARLAPURAM <[email protected]>
  0 siblings, 0 replies; 4+ messages in thread

From: Bharath Rupireddy @ 2026-04-28 01:30 UTC (permalink / raw)
  To: SATYANARAYANA NARLAPURAM <[email protected]>; +Cc: Michael Paquier <[email protected]>; PostgreSQL Hackers <[email protected]>

Hi,

On Fri, Mar 27, 2026 at 11:27 AM SATYANARAYANA NARLAPURAM
<[email protected]> wrote:
>
>> As a whole, I am doubtful that this information is worth counting in
>> live, eating a stats kind ID.  One could also implement a background
>> worker that provides more advanced aggregate stats outside the WAL
>> insert path, with a custom stats kind.  No need to have that into
>> core, especially if the code that increments the stats eats more and
>> more cycles.
>
> Your argument makes sense to me, I was a bit hesitant on
> writing a background worker because of the potential side effects of tailing
>  the WAL. Let me try a different approach by implementing an ondemand
> sniffing of the WAL, which can be implemented as an extension without
> changes to core. Do you have thoughts on making it a contrib module or
> modify existing pg_walinspect to accommodate these requirements?

Hi,

Please find an attached patch that implements a page_read callback
which reads WAL directly from WAL buffers using WALReadFromBuffers.
I've included a test module as a demo to show how one can collect WAL
record statistics without hitting WAL files.

The only core change needed is a fix in ReadPageInternal.
ReadPageInternal validates the first page of a WAL segment whenever it
switches to a new segment. When reading from WAL buffers, that first
page may already be overwritten by newer WAL, so we need to skip this
validation when no segment has been previously loaded (ws_segno == 0).

I understand that this will not give exact stats like the other
approaches discussed; however, it requires fewer changes to core, and
WALReadFromBuffers was designed to work without any locks, so it
should have minimal impact.

Appreciate any feedback. Thank you!

--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com


Attachments:

  [application/x-patch] v1-0001-Add-test-extension-for-WAL-stats-using-WALReadFro.patch (15.5K, 2-v1-0001-Add-test-extension-for-WAL-stats-using-WALReadFro.patch)
  download | inline diff:
From 2fda0987b2a38fda22d70ebd019dc102ddd0b11d Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Wed, 22 Apr 2026 03:50:25 +0000
Subject: [PATCH v1] Add test extension for WAL stats using WALReadFromBuffers

Add a test module that collects per-resource-manager WAL record
statistics by reading directly from in-memory WAL buffers via
WALReadFromBuffers, avoiding WAL file I/O.

This requires a small fix in ReadPageInternal to handle
buffer-based page_read callbacks where the first page of a segment
may no longer be available in the WAL buffers.
---
 src/backend/access/transam/xlog.c             |   4 +-
 src/backend/access/transam/xlogreader.c       |  11 +-
 src/include/access/xlog.h                     |   1 +
 src/test/modules/Makefile                     |   1 +
 src/test/modules/meson.build                  |   1 +
 .../modules/test_wal_record_stats/.gitignore  |   4 +
 .../modules/test_wal_record_stats/Makefile    |  23 ++
 .../modules/test_wal_record_stats/meson.build |  23 ++
 .../test_wal_record_stats--1.0.sql            |  18 ++
 .../test_wal_record_stats.c                   | 236 ++++++++++++++++++
 .../test_wal_record_stats.control             |   4 +
 src/tools/pgindent/typedefs.list              |   1 +
 12 files changed, 324 insertions(+), 3 deletions(-)
 create mode 100644 src/test/modules/test_wal_record_stats/.gitignore
 create mode 100644 src/test/modules/test_wal_record_stats/Makefile
 create mode 100644 src/test/modules/test_wal_record_stats/meson.build
 create mode 100644 src/test/modules/test_wal_record_stats/test_wal_record_stats--1.0.sql
 create mode 100644 src/test/modules/test_wal_record_stats/test_wal_record_stats.c
 create mode 100644 src/test/modules/test_wal_record_stats/test_wal_record_stats.control

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e39af79c03b..2731c1fe0f2 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -738,7 +738,7 @@ static void ReserveXLogInsertLocation(int size, XLogRecPtr *StartPos,
 									  XLogRecPtr *EndPos, XLogRecPtr *PrevPtr);
 static bool ReserveXLogSwitch(XLogRecPtr *StartPos, XLogRecPtr *EndPos,
 							  XLogRecPtr *PrevPtr);
-static XLogRecPtr WaitXLogInsertionsToFinish(XLogRecPtr upto);
+XLogRecPtr	WaitXLogInsertionsToFinish(XLogRecPtr upto);
 static char *GetXLogBuffer(XLogRecPtr ptr, TimeLineID tli);
 static XLogRecPtr XLogBytePosToRecPtr(uint64 bytepos);
 static XLogRecPtr XLogBytePosToEndRecPtr(uint64 bytepos);
@@ -1541,7 +1541,7 @@ WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt)
  * uninitialized page), and the inserter might need to evict an old WAL buffer
  * to make room for a new one, which in turn requires WALWriteLock.
  */
-static XLogRecPtr
+XLogRecPtr
 WaitXLogInsertionsToFinish(XLogRecPtr upto)
 {
 	uint64		bytepos;
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 8849610db00..a559fcc2fa9 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -1044,8 +1044,17 @@ ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr, int reqLen)
 	 * file and validate its header, even if that's not where the target
 	 * record is.  This is so that we can check the additional identification
 	 * info that is present in the first page's "long" header.
+	 *
+	 * When reading WAL from the in-memory WAL buffers via WALReadFromBuffers,
+	 * the first page of a segment may have already been overwritten by newer
+	 * WAL in the WAL buffers by the time we need to read a later page from
+	 * that same segment. In this case, we skip this first page validation.
+	 *
+	 * XXX: Consider adding a flag in XLogReaderState that callers reading
+	 * from WAL buffers can set, rather than relying on ws_segno == 0.
 	 */
-	if (targetSegNo != state->seg.ws_segno && targetPageOff != 0)
+	if (state->seg.ws_segno != 0 &&
+		targetSegNo != state->seg.ws_segno && targetPageOff != 0)
 	{
 		XLogRecPtr	targetSegmentPtr = pageptr - targetPageOff;
 
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 437b4f32349..ad31a9cb618 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -284,6 +284,7 @@ extern void SetWalWriterSleeping(bool sleeping);
 
 extern void WakeupCheckpointer(void);
 
+extern XLogRecPtr WaitXLogInsertionsToFinish(XLogRecPtr upto);
 extern Size WALReadFromBuffers(char *dstbuf, XLogRecPtr startptr, Size count,
 							   TimeLineID tli);
 
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 0a74ab5c86f..c3bb366b090 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -15,6 +15,7 @@ SUBDIRS = \
 		  oauth_validator \
 		  plsample \
 		  spgist_name_ops \
+		  test_wal_record_stats \
 		  test_aio \
 		  test_autovacuum \
 		  test_binaryheap \
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index 4bca42bb370..f3133193705 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -14,6 +14,7 @@ subdir('nbtree')
 subdir('oauth_validator')
 subdir('plsample')
 subdir('spgist_name_ops')
+subdir('test_wal_record_stats')
 subdir('ssl_passphrase_callback')
 subdir('test_aio')
 subdir('test_autovacuum')
diff --git a/src/test/modules/test_wal_record_stats/.gitignore b/src/test/modules/test_wal_record_stats/.gitignore
new file mode 100644
index 00000000000..5dcb3ff9723
--- /dev/null
+++ b/src/test/modules/test_wal_record_stats/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_wal_record_stats/Makefile b/src/test/modules/test_wal_record_stats/Makefile
new file mode 100644
index 00000000000..bdde4eb595e
--- /dev/null
+++ b/src/test/modules/test_wal_record_stats/Makefile
@@ -0,0 +1,23 @@
+# src/test/modules/test_wal_record_stats/Makefile
+
+MODULE_big = test_wal_record_stats
+OBJS = \
+	$(WIN32RES) \
+	test_wal_record_stats.o
+PGFILEDESC = "test_wal_record_stats - test WAL record stats from WAL buffers"
+
+EXTENSION = test_wal_record_stats
+DATA = test_wal_record_stats--1.0.sql
+
+TAP_TESTS = 0
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_wal_record_stats
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_wal_record_stats/meson.build b/src/test/modules/test_wal_record_stats/meson.build
new file mode 100644
index 00000000000..d135f4b2721
--- /dev/null
+++ b/src/test/modules/test_wal_record_stats/meson.build
@@ -0,0 +1,23 @@
+# Copyright (c) 2024-2026, PostgreSQL Global Development Group
+
+test_wal_record_stats_sources = files(
+  'test_wal_record_stats.c',
+)
+
+if host_system == 'windows'
+  test_wal_record_stats_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'test_wal_record_stats',
+    '--FILEDESC', 'test_wal_record_stats - test WAL record stats from WAL buffers',])
+endif
+
+test_wal_record_stats = shared_module('test_wal_record_stats',
+  test_wal_record_stats_sources,
+  kwargs: pg_test_mod_args,
+)
+test_install_libs += test_wal_record_stats
+
+test_install_data += files(
+  'test_wal_record_stats.control',
+  'test_wal_record_stats--1.0.sql',
+)
+
diff --git a/src/test/modules/test_wal_record_stats/test_wal_record_stats--1.0.sql b/src/test/modules/test_wal_record_stats/test_wal_record_stats--1.0.sql
new file mode 100644
index 00000000000..cb8e3d07d00
--- /dev/null
+++ b/src/test/modules/test_wal_record_stats/test_wal_record_stats--1.0.sql
@@ -0,0 +1,18 @@
+/* src/test/modules/test_wal_record_stats/test_wal_record_stats--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_wal_record_stats" to load this file. \quit
+
+CREATE FUNCTION get_wal_record_stats_from_buffers(
+    IN start_lsn pg_lsn,
+    IN end_lsn pg_lsn,
+    OUT resource_manager text,
+    OUT record_type text,
+    OUT count int8,
+    OUT total_record_length int8,
+    OUT total_main_data_length int8,
+    OUT total_fpi_length int8
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'get_wal_record_stats_from_buffers'
+LANGUAGE C STRICT PARALLEL SAFE;
diff --git a/src/test/modules/test_wal_record_stats/test_wal_record_stats.c b/src/test/modules/test_wal_record_stats/test_wal_record_stats.c
new file mode 100644
index 00000000000..adee5dc2b60
--- /dev/null
+++ b/src/test/modules/test_wal_record_stats/test_wal_record_stats.c
@@ -0,0 +1,236 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_wal_record_stats.c
+ *		Test module exercising WALReadFromBuffers() to read WAL records
+ *		directly from WAL buffers (shared memory, no disk I/O).
+ *
+ * Copyright (c) 1996-2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		src/test/modules/test_wal_record_stats/test_wal_record_stats.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/xlog.h"
+#include "access/xlog_internal.h"
+#include "access/xlogreader.h"
+#include "access/xlogrecovery.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "utils/builtins.h"
+#include "utils/pg_lsn.h"
+#include "utils/tuplestore.h"
+
+PG_MODULE_MAGIC;
+
+/*
+ * page_read callback that reads WAL directly from WAL buffers.
+ */
+static int
+wal_buffer_page_read(XLogReaderState *state, XLogRecPtr targetPagePtr,
+					 int reqLen, XLogRecPtr targetRecPtr,
+					 char *cur_page)
+{
+	XLogRecPtr	read_upto;
+	XLogRecPtr	loc;
+	TimeLineID	tli = GetWALInsertionTimeLine();
+	Size		count;
+	Size		nbytes;
+
+	loc = targetPagePtr + reqLen;
+
+	read_upto = GetXLogInsertRecPtr();
+
+	/*
+	 * If the requested WAL hasn't been inserted yet, return -1 rather than
+	 * waiting.  The WAL between start_lsn and end_lsn should already be
+	 * inserted by the time we're called.
+	 */
+	if (loc > read_upto)
+		return -1;
+
+	/* Ensure any in-progress insertions up to this point are visible */
+	WaitXLogInsertionsToFinish(loc);
+
+	if (targetPagePtr + XLOG_BLCKSZ <= read_upto)
+		count = XLOG_BLCKSZ;
+	else if (targetPagePtr + reqLen > read_upto)
+		return -1;
+	else
+		count = read_upto - targetPagePtr;
+
+	nbytes = WALReadFromBuffers(cur_page, targetPagePtr, count, tli);
+
+	if (nbytes <= 0)
+		return -1;		/* data evicted from circular WAL buffer */
+
+	return nbytes;
+}
+
+/* Per-rmgr/record_type accumulation entry. */
+typedef struct WalRecordStat
+{
+	char		resource_manager[64];
+	char		record_type[64];
+	int64		count;
+	int64		total_record_length;
+	int64		total_main_data_length;
+	int64		total_fpi_length;
+} WalRecordStat;
+
+#define MAX_WAL_STAT_ENTRIES 256
+
+/*
+ * get_wal_record_stats_from_buffers(start_lsn, end_lsn)
+ *
+ * Returns per-resource_manager/record_type WAL record statistics by reading
+ * directly from WAL buffers via WALReadFromBuffers().
+ */
+PG_FUNCTION_INFO_V1(get_wal_record_stats_from_buffers);
+Datum
+get_wal_record_stats_from_buffers(PG_FUNCTION_ARGS)
+{
+#define WAL_RECORD_STATS_COLS 6
+	XLogRecPtr	start_lsn = PG_GETARG_LSN(0);
+	XLogRecPtr	end_lsn = PG_GETARG_LSN(1);
+	XLogReaderState *xlogreader;
+	XLogRecPtr	first_valid_record;
+	char	   *errormsg;
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	WalRecordStat stats[MAX_WAL_STAT_ENTRIES];
+	int			nstats = 0;
+	int			i;
+
+	if (RecoveryInProgress())
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("WAL buffers can only be read on a primary server")));
+
+	if (start_lsn < XLOG_BLCKSZ)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("could not read WAL at LSN %X/%08X",
+						LSN_FORMAT_ARGS(start_lsn))));
+
+	if (start_lsn > end_lsn)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("WAL start LSN must be less than end LSN")));
+
+	memset(stats, 0, sizeof(stats));
+
+	InitMaterializedSRF(fcinfo, 0);
+
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
+									XL_ROUTINE(.page_read = &wal_buffer_page_read,
+											   .segment_open = NULL,
+											   .segment_close = NULL),
+									NULL);
+
+	if (xlogreader == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory"),
+				 errdetail("Failed while allocating a WAL reading processor.")));
+
+	first_valid_record = XLogFindNextRecord(xlogreader, start_lsn, &errormsg);
+
+	if (XLogRecPtrIsInvalid(first_valid_record))
+	{
+		XLogReaderFree(xlogreader);
+		PG_RETURN_VOID();
+	}
+
+	/* Scan WAL records and accumulate stats */
+	while (true)
+	{
+		XLogRecord *record;
+		RmgrData	desc;
+		const char *rmgr_name;
+		const char *rec_type;
+		int			found = -1;
+
+		record = XLogReadRecord(xlogreader, &errormsg);
+		if (record == NULL)
+			break;
+
+		if (xlogreader->EndRecPtr > end_lsn)
+			break;
+
+		desc = GetRmgr(XLogRecGetRmid(xlogreader));
+		rmgr_name = desc.rm_name;
+		rec_type = desc.rm_identify(XLogRecGetInfo(xlogreader));
+		if (rec_type == NULL)
+			rec_type = "UNKNOWN";
+
+		/* Find existing entry or create new one */
+		for (i = 0; i < nstats; i++)
+		{
+			if (strcmp(stats[i].resource_manager, rmgr_name) == 0 &&
+				strcmp(stats[i].record_type, rec_type) == 0)
+			{
+				found = i;
+				break;
+			}
+		}
+
+		if (found < 0)
+		{
+			if (nstats >= MAX_WAL_STAT_ENTRIES)
+			{
+				ereport(WARNING,
+						(errmsg("WAL record stat entries limit reached (%d)",
+								MAX_WAL_STAT_ENTRIES)));
+				break;
+			}
+			found = nstats++;
+			strlcpy(stats[found].resource_manager, rmgr_name,
+					sizeof(stats[found].resource_manager));
+			strlcpy(stats[found].record_type, rec_type,
+					sizeof(stats[found].record_type));
+		}
+
+		stats[found].count++;
+		stats[found].total_record_length += XLogRecGetTotalLen(xlogreader);
+		stats[found].total_main_data_length += XLogRecGetDataLen(xlogreader);
+
+		if (XLogRecHasAnyBlockRefs(xlogreader))
+		{
+			uint32		fpi_len = 0;
+			StringInfoData dummy;
+
+			initStringInfo(&dummy);
+			XLogRecGetBlockRefInfo(xlogreader, false, false, &dummy, &fpi_len);
+			pfree(dummy.data);
+			stats[found].total_fpi_length += fpi_len;
+		}
+
+		CHECK_FOR_INTERRUPTS();
+	}
+
+	XLogReaderFree(xlogreader);
+
+	/* Emit result rows */
+	for (i = 0; i < nstats; i++)
+	{
+		Datum		values[WAL_RECORD_STATS_COLS] = {0};
+		bool		nulls[WAL_RECORD_STATS_COLS] = {0};
+		int			col = 0;
+
+		values[col++] = CStringGetTextDatum(stats[i].resource_manager);
+		values[col++] = CStringGetTextDatum(stats[i].record_type);
+		values[col++] = Int64GetDatum(stats[i].count);
+		values[col++] = Int64GetDatum(stats[i].total_record_length);
+		values[col++] = Int64GetDatum(stats[i].total_main_data_length);
+		values[col++] = Int64GetDatum(stats[i].total_fpi_length);
+
+		tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+							 values, nulls);
+	}
+
+	PG_RETURN_VOID();
+
+#undef WAL_RECORD_STATS_COLS
+}
diff --git a/src/test/modules/test_wal_record_stats/test_wal_record_stats.control b/src/test/modules/test_wal_record_stats/test_wal_record_stats.control
new file mode 100644
index 00000000000..85a33766c71
--- /dev/null
+++ b/src/test/modules/test_wal_record_stats/test_wal_record_stats.control
@@ -0,0 +1,4 @@
+comment = 'Test WAL record stats reading from WAL buffers via WALReadFromBuffers'
+default_version = '1.0'
+module_pathname = '$libdir/test_wal_record_stats'
+relocatable = true
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 49dfb662abc..fd283b7e88b 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3419,6 +3419,7 @@ WalRcvStreamOptions
 WalRcvWakeupReason
 WalReceiverConn
 WalReceiverFunctionsType
+WalRecordStat
 WalSnd
 WalSndCtlData
 WalSndSendDataCallback
-- 
2.47.3



^ permalink  raw  reply  [nested|flat] 4+ messages in thread


end of thread, other threads:[~2026-04-28 01:30 UTC | newest]

Thread overview: 4+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-03-27 00:29  [Proposal] pg_stat_wal_records – per-record-type WAL generation statistics SATYANARAYANA NARLAPURAM <[email protected]>
2026-03-27 02:59 ` Michael Paquier <[email protected]>
2026-03-27 18:26   ` SATYANARAYANA NARLAPURAM <[email protected]>
2026-04-28 01:30     ` Bharath Rupireddy <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox