public inbox for [email protected]  
help / color / mirror / Atom feed
Allow to collect statistics on virtual generated columns
21+ messages / 3 participants
[nested] [flat]

* Allow to collect statistics on virtual generated columns
@ 2025-04-22 09:10  Yugo Nagata <[email protected]>
  0 siblings, 1 reply; 21+ messages in thread

From: Yugo Nagata @ 2025-04-22 09:10 UTC (permalink / raw)
  To: pgsql-hackers

Hi hackers,

Hi hackers,

Now we can create a table with a virtual generated column, but
when a condition in WHERE clause contains virtual generated column,
estimated rows are not correct since no statistics on this is
collectef.

[Ex.1]

 test=# CREATE TABLE t (i int, v int GENERATED ALWAYS AS (i+1) VIRTUAL);
 CREATE TABLE

 test=# INSERT INTO t SELECT generate_series(1,1000);
 INSERT 0 1000

 test=# INSERT INTO t SELECT 1 FROM generate_series(1,1000);
 INSERT 0 1000

 test=# EXPLAIN ANALYZE SELECT * FROM t WHERE v = 2;
                                             QUERY PLAN                                            
 --------------------------------------------------------------------------------------------------
  Seq Scan on t  (cost=0.00..36.02 rows=9 width=8) (actual time=0.093..3.059 rows=1001.00 loops=1)
    Filter: ((i + 1) = 2)
    Rows Removed by Filter: 999
    Buffers: shared hit=9
  Planning:
    Buffers: shared hit=26
  Planning Time: 1.142 ms
  Execution Time: 3.434 ms
 (8 rows)

Therefore, I would like to allow to collect statistics on virtual enerated columns.

I think there are at least three approaches for this.

(1) Allow the normal ANALYZE to collect statistics on virtual generated columns

ANALYZE expands virtual generated columns' expression, and collects statistics
on evaluated values. In this approach, the statistics on virtual generated columns
are collected in default, but ANALYZE on the table would become a bit expensive.

(2) Allow to create an index on virtual generated column

This is proposed in [1]. This proposal itself would be useful, I believe it is better
to provide a way to collect statistics without cost of creating an index.

[1] https://www.postgresql.org/message-id/flat/CACJufxGao-cypdNhifHAdt8jHfK6-HX=tRBovBkgRuxw063GaA@mail....

(3) Allow to create extended statistics on virtual generated columns

In this approach, ANALYZE processes virtual generated columns only if corresponding
extended statistics are defined. Although we can create extended statistics on
expressions of virtual generated columns even in the current implementation, this enables
that users to create a useful statistics this just by specifying a column name without
specifying complex expression.

I can also think of two variations for this approach.

(3a)
At the timing when an extended statistics is created, virtual generated columns are
expanded, and the statistics is defined on this expression.

(3b)
At the timing when an extended statistics is created, virtual generated columns are
NOT expanded. The statistics is defined on the virtual generated column itself and,
the expression is expanded when ANALYZE processes the extended statistics.

I've attached a draft patch based on (3a).  However, if it is possible we could change
the definition of generated columns in future (as proposed in [2]), (3b) might be preferred.  

[2] https://www.postgresql.org/message-id/flat/CACJufxH3VETr7orF5rW29GnDk3n1wWbOE3WdkHYd3iPGrQ9E_A@mail....

Here is an example of how the patch works.

[Ex.2]

 test=# CREATE STATISTICS exstat ON v FROM t;
 CREATE STATISTICS
 test=# ANALYZE t;
 ANALYZE
 test=# EXPLAIN ANALYZE SELECT * FROM t WHERE v = 2;
                                              QUERY PLAN                                              
 -----------------------------------------------------------------------------------------------------
  Seq Scan on t  (cost=0.00..41.50 rows=1001 width=8) (actual time=0.067..2.422 rows=1001.00 loops=1)
    Filter: ((i + 1) = 2)
    Rows Removed by Filter: 999
    Buffers: shared hit=9
  Planning:
    Buffers: shared hit=14
  Planning Time: 0.785 ms
  Execution Time: 2.744 ms
 (8 rows)


What do you think of this?  Which approach of (1), (3a), or (3b) is good?
Or, completely different approach is better? 
With your feedback, I would like to progress or rework the patch.

Regards,
Yugo Nagata

-- 
Yugo Nagata <[email protected]>


Attachments:

  [text/x-diff] v1-0001-Allow-to-create-extended-statistics-on-virtual-ge.patch (5.5K, 2-v1-0001-Allow-to-create-extended-statistics-on-virtual-ge.patch)
  download | inline diff:
From a6b0be714f6d4e4e0e7423f07432d3135c807a63 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <[email protected]>
Date: Tue, 22 Apr 2025 17:03:50 +0900
Subject: [PATCH v1] Allow to create extended statistics on virtual generated
 columns

---
 src/backend/commands/statscmds.c | 86 +++++++++++++++-----------------
 1 file changed, 40 insertions(+), 46 deletions(-)

diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index e24d540cd45..9b7f27fec28 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -29,6 +29,7 @@
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/optimizer.h"
+#include "rewrite/rewriteHandler.h"
 #include "statistics/statistics.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
@@ -240,28 +241,27 @@ CreateStatistics(CreateStatsStmt *stmt)
 								attname)));
 			attForm = (Form_pg_attribute) GETSTRUCT(atttuple);
 
-			/* Disallow use of system attributes in extended stats */
-			if (attForm->attnum <= 0)
-				ereport(ERROR,
-						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						 errmsg("statistics creation on system columns is not supported")));
-
-			/* Disallow use of virtual generated columns in extended stats */
 			if (attForm->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
-				ereport(ERROR,
-						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						 errmsg("statistics creation on virtual generated columns is not supported")));
+				selem->expr = build_generation_expression(rel, attForm->attnum);
+			else
+			{
+				/* Disallow use of system attributes in extended stats */
+				if (attForm->attnum <= 0)
+					ereport(ERROR,
+							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+							 errmsg("statistics creation on system columns is not supported")));
 
-			/* Disallow data types without a less-than operator */
-			type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR);
-			if (type->lt_opr == InvalidOid)
-				ereport(ERROR,
-						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						 errmsg("column \"%s\" cannot be used in statistics because its type %s has no default btree operator class",
-								attname, format_type_be(attForm->atttypid))));
+				/* Disallow data types without a less-than operator */
+				type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR);
+				if (type->lt_opr == InvalidOid)
+					ereport(ERROR,
+							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+							 errmsg("column \"%s\" cannot be used in statistics because its type %s has no default btree operator class",
+									attname, format_type_be(attForm->atttypid))));
 
-			attnums[nattnums] = attForm->attnum;
-			nattnums++;
+				attnums[nattnums] = attForm->attnum;
+				nattnums++;
+			}
 			ReleaseSysCache(atttuple);
 		}
 		else if (IsA(selem->expr, Var)) /* column reference in parens */
@@ -269,30 +269,28 @@ CreateStatistics(CreateStatsStmt *stmt)
 			Var		   *var = (Var *) selem->expr;
 			TypeCacheEntry *type;
 
-			/* Disallow use of system attributes in extended stats */
-			if (var->varattno <= 0)
-				ereport(ERROR,
-						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						 errmsg("statistics creation on system columns is not supported")));
-
-			/* Disallow use of virtual generated columns in extended stats */
-			if (get_attgenerated(relid, var->varattno) == ATTRIBUTE_GENERATED_VIRTUAL)
-				ereport(ERROR,
-						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						 errmsg("statistics creation on virtual generated columns is not supported")));
+			if (get_attgenerated(relid, var->varattno) != ATTRIBUTE_GENERATED_VIRTUAL)
+			{
+				/* Disallow use of system attributes in extended stats */
+				if (var->varattno <= 0)
+					ereport(ERROR,
+							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+							 errmsg("statistics creation on system columns is not supported")));
 
-			/* Disallow data types without a less-than operator */
-			type = lookup_type_cache(var->vartype, TYPECACHE_LT_OPR);
-			if (type->lt_opr == InvalidOid)
-				ereport(ERROR,
-						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						 errmsg("column \"%s\" cannot be used in statistics because its type %s has no default btree operator class",
-								get_attname(relid, var->varattno, false), format_type_be(var->vartype))));
+				/* Disallow data types without a less-than operator */
+				type = lookup_type_cache(var->vartype, TYPECACHE_LT_OPR);
+				if (type->lt_opr == InvalidOid)
+					ereport(ERROR,
+							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+							 errmsg("column \"%s\" cannot be used in statistics because its type %s has no default btree operator class",
+									get_attname(relid, var->varattno, false), format_type_be(var->vartype))));
 
-			attnums[nattnums] = var->varattno;
-			nattnums++;
+				attnums[nattnums] = var->varattno;
+				nattnums++;
+			}
 		}
-		else					/* expression */
+
+		if (selem->expr)
 		{
 			Node	   *expr = selem->expr;
 			Oid			atttype;
@@ -302,6 +300,8 @@ CreateStatistics(CreateStatsStmt *stmt)
 
 			Assert(expr != NULL);
 
+			expr = expand_generated_columns_in_expr(expr, rel, 1);
+
 			pull_varattnos(expr, 1, &attnums);
 
 			k = -1;
@@ -314,12 +314,6 @@ CreateStatistics(CreateStatsStmt *stmt)
 					ereport(ERROR,
 							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 							 errmsg("statistics creation on system columns is not supported")));
-
-				/* Disallow use of virtual generated columns in extended stats */
-				if (get_attgenerated(relid, attnum) == ATTRIBUTE_GENERATED_VIRTUAL)
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("statistics creation on virtual generated columns is not supported")));
 			}
 
 			/*
-- 
2.34.1



^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: Allow to collect statistics on virtual generated columns
@ 2025-06-17 14:43  Andres Freund <[email protected]>
  parent: Yugo Nagata <[email protected]>
  0 siblings, 1 reply; 21+ messages in thread

From: Andres Freund @ 2025-06-17 14:43 UTC (permalink / raw)
  To: Yugo Nagata <[email protected]>; +Cc: pgsql-hackers

Hi,

On 2025-04-22 18:10:06 +0900, Yugo Nagata wrote:
> With your feedback, I would like to progress or rework the patch.

Right now the tests seem to always fail:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5714

Fails e.g. with:
https://api.cirrus-ci.com/v1/artifact/task/5921189782093824/testrun/build/testrun/regress/regress/re...

diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/stats_ext.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/stats_ext.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/stats_ext.out	2025-05-26 00:59:01.813042000 +0000
+++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/stats_ext.out	2025-05-26 01:02:20.350387000 +0000
@@ -56,7 +56,6 @@
 ERROR:  unrecognized statistics kind "unrecognized"
 -- incorrect expressions
 CREATE STATISTICS tst ON (y) FROM ext_stats_test; -- single column reference
-ERROR:  extended statistics require at least 2 columns
 CREATE STATISTICS tst ON y + z FROM ext_stats_test; -- missing parentheses
 ERROR:  syntax error at or near "+"
 LINE 1: CREATE STATISTICS tst ON y + z FROM ext_stats_test;
@@ -69,25 +68,24 @@
 -- statistics on virtual generated column not allowed
 CREATE TABLE ext_stats_test1 (x int, y int, z int GENERATED ALWAYS AS (x+y) VIRTUAL, w xid);
 CREATE STATISTICS tst on z from ext_stats_test1;
-ERROR:  statistics creation on virtual generated columns is not supported
 CREATE STATISTICS tst on (z) from ext_stats_test1;
-ERROR:  statistics creation on virtual generated columns is not supported
+ERROR:  statistics object "tst" already exists
 CREATE STATISTICS tst on (z+1) from ext_stats_test1;
-ERROR:  statistics creation on virtual generated columns is not supported
+ERROR:  statistics object "tst" already exists
 CREATE STATISTICS tst (ndistinct) ON z from ext_stats_test1;
-ERROR:  statistics creation on virtual generated columns is not supported
+ERROR:  statistics object "tst" already exists
 -- statistics on system column not allowed
 CREATE STATISTICS tst on tableoid from ext_stats_test1;
-ERROR:  statistics creation on system columns is not supported
+ERROR:  statistics object "tst" already exists
 CREATE STATISTICS tst on (tableoid) from ext_stats_test1;
-ERROR:  statistics creation on system columns is not supported
+ERROR:  statistics object "tst" already exists
 CREATE STATISTICS tst on (tableoid::int+1) from ext_stats_test1;
-ERROR:  statistics creation on system columns is not supported
+ERROR:  statistics object "tst" already exists
 CREATE STATISTICS tst (ndistinct) ON xmin from ext_stats_test1;
-ERROR:  statistics creation on system columns is not supported
+ERROR:  statistics object "tst" already exists
 -- statistics without a less-than operator not supported
 CREATE STATISTICS tst (ndistinct) ON w from ext_stats_test1;
-ERROR:  column "w" cannot be used in statistics because its type xid has no default btree operator class
+ERROR:  statistics object "tst" already exists
 DROP TABLE ext_stats_test1;
 -- Ensure stats are dropped sanely, and test IF NOT EXISTS while at it
 CREATE TABLE ab1 (a INTEGER, b INTEGER, c INTEGER);

Greetings,

Andres





^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: Allow to collect statistics on virtual generated columns
@ 2025-06-24 08:05  Yugo Nagata <[email protected]>
  parent: Andres Freund <[email protected]>
  0 siblings, 1 reply; 21+ messages in thread

From: Yugo Nagata @ 2025-06-24 08:05 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: pgsql-hackers

On Tue, 17 Jun 2025 10:43:41 -0400
Andres Freund <[email protected]> wrote:

> Hi,
> 
> On 2025-04-22 18:10:06 +0900, Yugo Nagata wrote:
> > With your feedback, I would like to progress or rework the patch.
> 
> Right now the tests seem to always fail:
> https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F571

Thank you for letting me know it.

I've attached an updated patch to fix the test failure.

However, I'm now reconsidering the current approach, where the expression
of a virtual generated column is expanded at the time of creating extended
statistics. This seems not be ideal, as the statistics would become useless
if the expression is later modified.

Instead, I'm thinking of an alternative approach: expanding the expression
at the time statistics are collected.

Best regards,
Yugo Nagata

> 
> Fails e.g. with:
> https://api.cirrus-ci.com/v1/artifact/task/5921189782093824/testrun/build/testrun/regress/regress/re...
> 
> diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/stats_ext.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/stats_ext.out
> --- /tmp/cirrus-ci-build/src/test/regress/expected/stats_ext.out	2025-05-26 00:59:01.813042000 +0000
> +++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/stats_ext.out	2025-05-26 01:02:20.350387000 +0000
> @@ -56,7 +56,6 @@
>  ERROR:  unrecognized statistics kind "unrecognized"
>  -- incorrect expressions
>  CREATE STATISTICS tst ON (y) FROM ext_stats_test; -- single column reference
> -ERROR:  extended statistics require at least 2 columns
>  CREATE STATISTICS tst ON y + z FROM ext_stats_test; -- missing parentheses
>  ERROR:  syntax error at or near "+"
>  LINE 1: CREATE STATISTICS tst ON y + z FROM ext_stats_test;
> @@ -69,25 +68,24 @@
>  -- statistics on virtual generated column not allowed
>  CREATE TABLE ext_stats_test1 (x int, y int, z int GENERATED ALWAYS AS (x+y) VIRTUAL, w xid);
>  CREATE STATISTICS tst on z from ext_stats_test1;
> -ERROR:  statistics creation on virtual generated columns is not supported
>  CREATE STATISTICS tst on (z) from ext_stats_test1;
> -ERROR:  statistics creation on virtual generated columns is not supported
> +ERROR:  statistics object "tst" already exists
>  CREATE STATISTICS tst on (z+1) from ext_stats_test1;
> -ERROR:  statistics creation on virtual generated columns is not supported
> +ERROR:  statistics object "tst" already exists
>  CREATE STATISTICS tst (ndistinct) ON z from ext_stats_test1;
> -ERROR:  statistics creation on virtual generated columns is not supported
> +ERROR:  statistics object "tst" already exists
>  -- statistics on system column not allowed
>  CREATE STATISTICS tst on tableoid from ext_stats_test1;
> -ERROR:  statistics creation on system columns is not supported
> +ERROR:  statistics object "tst" already exists
>  CREATE STATISTICS tst on (tableoid) from ext_stats_test1;
> -ERROR:  statistics creation on system columns is not supported
> +ERROR:  statistics object "tst" already exists
>  CREATE STATISTICS tst on (tableoid::int+1) from ext_stats_test1;
> -ERROR:  statistics creation on system columns is not supported
> +ERROR:  statistics object "tst" already exists
>  CREATE STATISTICS tst (ndistinct) ON xmin from ext_stats_test1;
> -ERROR:  statistics creation on system columns is not supported
> +ERROR:  statistics object "tst" already exists
>  -- statistics without a less-than operator not supported
>  CREATE STATISTICS tst (ndistinct) ON w from ext_stats_test1;
> -ERROR:  column "w" cannot be used in statistics because its type xid has no default btree operator class
> +ERROR:  statistics object "tst" already exists
>  DROP TABLE ext_stats_test1;
>  -- Ensure stats are dropped sanely, and test IF NOT EXISTS while at it
>  CREATE TABLE ab1 (a INTEGER, b INTEGER, c INTEGER);
> 
> Greetings,
> 
> Andres
> 
> 


-- 
Yugo Nagata <[email protected]>


^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* [PATCH v5] Allow to collect statistics on virtual generated columns
@ 2025-07-18 00:58  Yugo Nagata <[email protected]>
  0 siblings, 0 replies; 21+ messages in thread

From: Yugo Nagata @ 2025-07-18 00:58 UTC (permalink / raw)

During ANALYZE, generation expressions are expanded, and statistics are
computed using compute_expr_stats(). To support this, both compute_expr_stats()
and AnlExprData are now exported from extended_stats.c.

To enable the optimizer to make use of these statistics, a new field named
virtual_gencols is added to RelOptInfo. This field holds the expressions of
virtual generated columns in the table. In examine_variable(), if an
expression in a WHERE clause matches a virtual generated column, the
corresponding statistics are used for that expression.
---
 doc/src/sgml/ref/alter_table.sgml             |  9 +--
 src/backend/commands/analyze.c                | 32 +++++++---
 src/backend/optimizer/util/plancat.c          | 61 +++++++++++++++++++
 src/backend/statistics/extended_stats.c       | 11 +---
 src/backend/utils/adt/selfuncs.c              | 44 ++++++++++++-
 src/include/nodes/pathnodes.h                 | 19 ++++++
 .../statistics/extended_stats_internal.h      |  9 +++
 .../regress/expected/generated_virtual.out    |  7 +++
 src/test/regress/sql/generated_virtual.sql    |  3 +
 9 files changed, 167 insertions(+), 28 deletions(-)

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 1bd479c917a..5caea32e52f 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -216,8 +216,6 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
       When this form is used, the column's statistics are removed,
       so running <link linkend="sql-analyze"><command>ANALYZE</command></link>
       on the table afterwards is recommended.
-      For a virtual generated column, <command>ANALYZE</command>
-      is not necessary because such columns never have statistics.
      </para>
     </listitem>
    </varlistentry>
@@ -281,12 +279,9 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
      </para>
 
      <para>
-      When this form is used on a stored generated column, its statistics
-      are removed, so running
-      <link linkend="sql-analyze"><command>ANALYZE</command></link>
+      When this form is used, the column's statistics are removed,
+      so running <link linkend="sql-analyze"><command>ANALYZE</command></link>
       on the table afterwards is recommended.
-      For a virtual generated column, <command>ANALYZE</command>
-      is not necessary because such columns never have statistics.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 5e2a7a8234e..2404e205748 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -40,6 +40,7 @@
 #include "parser/parse_oper.h"
 #include "parser/parse_relation.h"
 #include "pgstat.h"
+#include "rewrite/rewriteHandler.h"
 #include "statistics/extended_stats_internal.h"
 #include "statistics/statistics.h"
 #include "storage/bufmgr.h"
@@ -564,13 +565,28 @@ do_analyze_rel(Relation onerel, const VacuumParams params,
 		{
 			VacAttrStats *stats = vacattrstats[i];
 			AttributeOpts *aopt;
+			Form_pg_attribute attr = TupleDescAttr(onerel->rd_att, stats->tupattnum - 1);
 
-			stats->rows = rows;
-			stats->tupDesc = onerel->rd_att;
-			stats->compute_stats(stats,
-								 std_fetch_func,
-								 numrows,
-								 totalrows);
+			/*
+			 * For a virtual generated column, compute statistics for the expression value.
+			 */
+			if (attr->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
+			{
+				AnlExprData *exprdata = (AnlExprData *) palloc0(sizeof(AnlExprData));
+
+				exprdata->expr = build_generation_expression(onerel, stats->tupattnum);
+				exprdata->vacattrstat = stats;
+				compute_expr_stats(onerel, exprdata, 1, rows, numrows);
+			}
+			else
+			{
+				stats->rows = rows;
+				stats->tupDesc = onerel->rd_att;
+				stats->compute_stats(stats,
+									 std_fetch_func,
+									 numrows,
+									 totalrows);
+			}
 
 			/*
 			 * If the appropriate flavor of the n_distinct option is
@@ -1055,10 +1071,6 @@ examine_attribute(Relation onerel, int attnum, Node *index_expr)
 	if (attr->attisdropped)
 		return NULL;
 
-	/* Don't analyze virtual generated columns */
-	if (attr->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
-		return NULL;
-
 	/*
 	 * Get attstattarget value.  Set to -1 if null.  (Analyze functions expect
 	 * -1 to mean use default_statistics_target; see for example
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 43ca5fd0213..f1bdd0c975d 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -78,6 +78,8 @@ static List *get_relation_constraints(PlannerInfo *root,
 									  bool include_partition);
 static List *build_index_tlist(PlannerInfo *root, IndexOptInfo *index,
 							   Relation heapRelation);
+static List *get_relation_virtual_gencols(PlannerInfo *root, RelOptInfo *rel,
+										  Relation relation);
 static List *get_relation_statistics(PlannerInfo *root, RelOptInfo *rel,
 									 Relation relation);
 static void set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
@@ -529,6 +531,9 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 
 	rel->indexlist = indexinfos;
 
+	/* Make list of virtual generated columns */
+	rel->virtual_gencols = get_relation_virtual_gencols(root, rel, relation);
+
 	rel->statlist = get_relation_statistics(root, rel, relation);
 
 	/* Grab foreign-table info using the relcache, while we have it */
@@ -1647,6 +1652,62 @@ get_relation_constraints(PlannerInfo *root,
 	return result;
 }
 
+/*
+ * get_relation_virtual_gencols
+ *		Retrieve virtual generated columns defined on the table.
+ *
+ * Returns a List (possibly empty) of VirtualGeneratedColumnInfoInfo objects
+ * containing the generation expressions. Each one has been processed by
+ * eval_const_expressions(), and its Vars are changed to have the varno
+ * indicated by rel->relid.  This allows the expressions to be easily
+ * compared to expressions taken from WHERE.
+ */
+static List *get_relation_virtual_gencols(PlannerInfo *root, RelOptInfo *rel,
+						Relation relation)
+{
+	TupleDesc	tupdesc = RelationGetDescr(relation);
+	Index		varno = rel->relid;
+	List	   *virtual_gencols = NIL;
+
+	if (tupdesc->constr && tupdesc->constr->has_generated_virtual)
+	{
+		for (int i = 0; i < tupdesc->natts; i++)
+		{
+			Form_pg_attribute attr = TupleDescAttr(tupdesc, i);
+
+			if (attr->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
+			{
+				VirtualGeneratedColumnInfo *info;
+
+				info = makeNode(VirtualGeneratedColumnInfo);
+				info->attno = attr->attnum;
+				info->expr = build_generation_expression(relation, attr->attnum);
+
+				/*
+				 * Run the expressions through eval_const_expressions. This is
+				 * not just an optimization, but is necessary, because the
+				 * planner will be comparing them to similarly-processed qual
+				 * clauses, and may fail to detect valid matches without this.
+				 * We must not use canonicalize_qual, however, since these
+				 * aren't qual expressions.
+				 */
+				info->expr = eval_const_expressions(NULL, info->expr);
+
+				/* May as well fix opfuncids too */
+				fix_opfuncids(info->expr);
+
+				/* Fix Vars to have the desired varno */
+				if (varno != 1)
+					ChangeVarNodes((Node *) info->expr, 1, varno, 0);
+
+				virtual_gencols = lappend(virtual_gencols, info);
+			}
+		}
+	}
+
+	return virtual_gencols;
+}
+
 /*
  * Try loading data for the statistics object.
  *
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index 19778b773d2..ca46a134f26 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -82,15 +82,6 @@ static void statext_store(Oid statOid, bool inh,
 static int	statext_compute_stattarget(int stattarget,
 									   int nattrs, VacAttrStats **stats);
 
-/* Information needed to analyze a single simple expression. */
-typedef struct AnlExprData
-{
-	Node	   *expr;			/* expression to analyze */
-	VacAttrStats *vacattrstat;	/* statistics attrs to analyze */
-} AnlExprData;
-
-static void compute_expr_stats(Relation onerel, AnlExprData *exprdata,
-							   int nexprs, HeapTuple *rows, int numrows);
 static Datum serialize_expr_stats(AnlExprData *exprdata, int nexprs);
 static Datum expr_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
 static AnlExprData *build_expr_data(List *exprs, int stattarget);
@@ -2083,7 +2074,7 @@ examine_opclause_args(List *args, Node **exprp, Const **cstp,
 /*
  * Compute statistics about expressions of a relation.
  */
-static void
+void
 compute_expr_stats(Relation onerel, AnlExprData *exprdata, int nexprs,
 				   HeapTuple *rows, int numrows)
 {
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index a996f0c4939..f5e53056de7 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -5755,6 +5755,7 @@ examine_variable(PlannerInfo *root, Node *node, int varRelid,
 		 */
 		ListCell   *ilist;
 		ListCell   *slist;
+		ListCell   *vlist;
 
 		/*
 		 * The nullingrels bits within the expression could prevent us from
@@ -5874,6 +5875,46 @@ examine_variable(PlannerInfo *root, Node *node, int varRelid,
 				break;
 		}
 
+		/*
+		 * Search virtual generated columns for one with a matching expression
+		 * and use the statistics collected for it if we have.
+		 */
+		foreach(vlist, onerel->virtual_gencols)
+		{
+			VirtualGeneratedColumnInfo *info = (VirtualGeneratedColumnInfo *) lfirst(vlist);
+			Node *expr = info->expr;
+
+			/*
+			 * Stop once we've found statistics for the expression (either
+			 * for a virtual generated columns or an index in the preceding
+			 * loop).
+			 */
+			if (vardata->statsTuple)
+				break;
+
+			/* strip RelabelType before comparing it */
+			if (expr && IsA(expr, RelabelType))
+				expr = (Node *) ((RelabelType *) expr)->arg;
+
+			if (equal(node, expr))
+			{
+				Var	*var = makeVar(onerel->relid,
+								   info->attno,
+								   vardata->atttype,
+								   vardata->atttypmod,
+								   exprCollation(node),
+								   0);
+				/*
+				 * There cannot be a unique constraint on a virtual generated column.
+				 * Other fields other than the stats tuple must be already set.
+				 */
+				vardata->isunique = false;
+
+				/* Try to locate some stats */
+				examine_simple_variable(root, var, vardata);
+			}
+		}
+
 		/*
 		 * Search extended statistics for one with a matching expression.
 		 * There might be multiple ones, so just grab the first one. In the
@@ -5889,7 +5930,8 @@ examine_variable(PlannerInfo *root, Node *node, int varRelid,
 
 			/*
 			 * Stop once we've found statistics for the expression (either
-			 * from extended stats, or for an index in the preceding loop).
+			 * from extended stats, or for an index or a virtual generated
+			 * column in the preceding loop).
 			 */
 			if (vardata->statsTuple)
 				break;
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index b5ff456ef7f..901d9ed742d 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -995,6 +995,8 @@ typedef struct RelOptInfo
 	List	   *indexlist;
 	/* list of StatisticExtInfo */
 	List	   *statlist;
+	/* list of VirtualGeneratedColumnInfo */
+	List	   *virtual_gencols;
 	/* size estimates derived from pg_class */
 	BlockNumber pages;
 	Cardinality tuples;
@@ -1434,6 +1436,23 @@ typedef struct StatisticExtInfo
 	List	   *exprs;
 } StatisticExtInfo;
 
+/*
+ * VirtualGeneratedColumnInfo
+ *		Information about virtual generated columns for planning/optimization
+ */
+typedef struct VirtualGeneratedColumnInfo
+{
+	pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+	NodeTag		type;
+
+	/* attribute number of virtual generated column */
+	AttrNumber	attno;
+
+	/* generation expression */
+	Node	   *expr;
+} VirtualGeneratedColumnInfo;
+
 /*
  * JoinDomains
  *
diff --git a/src/include/statistics/extended_stats_internal.h b/src/include/statistics/extended_stats_internal.h
index efcb7dc3546..8eaea3b7566 100644
--- a/src/include/statistics/extended_stats_internal.h
+++ b/src/include/statistics/extended_stats_internal.h
@@ -68,6 +68,12 @@ typedef struct StatsBuildData
 	bool	  **nulls;
 } StatsBuildData;
 
+/* Information needed to analyze a single simple expression. */
+typedef struct AnlExprData
+{
+	Node	   *expr;			/* expression to analyze */
+	VacAttrStats *vacattrstat;	/* statistics attrs to analyze */
+} AnlExprData;
 
 extern MVNDistinct *statext_ndistinct_build(double totalrows, StatsBuildData *data);
 extern bytea *statext_ndistinct_serialize(MVNDistinct *ndistinct);
@@ -127,4 +133,7 @@ extern Selectivity mcv_clause_selectivity_or(PlannerInfo *root,
 											 Selectivity *overlap_basesel,
 											 Selectivity *totalsel);
 
+extern void
+compute_expr_stats(Relation onerel, AnlExprData *exprdata, int nexprs,
+				   HeapTuple *rows, int numrows);
 #endif							/* EXTENDED_STATS_INTERNAL_H */
diff --git a/src/test/regress/expected/generated_virtual.out b/src/test/regress/expected/generated_virtual.out
index 249e68be654..61b858d4774 100644
--- a/src/test/regress/expected/generated_virtual.out
+++ b/src/test/regress/expected/generated_virtual.out
@@ -1515,6 +1515,13 @@ create table gtest32 (
 );
 insert into gtest32 (a, f) values (1, 1), (2, 2);
 analyze gtest32;
+-- Ensure that statistics on virtual generated column are available
+select count(*) from pg_stats where tablename = 'gtest32';
+ count 
+-------
+     6
+(1 row)
+
 -- Ensure that nullingrel bits are propagated into the generation expressions
 explain (costs off)
 select sum(t2.b) over (partition by t2.a),
diff --git a/src/test/regress/sql/generated_virtual.sql b/src/test/regress/sql/generated_virtual.sql
index 81152b39a79..1c5881fb98e 100644
--- a/src/test/regress/sql/generated_virtual.sql
+++ b/src/test/regress/sql/generated_virtual.sql
@@ -825,6 +825,9 @@ create table gtest32 (
 insert into gtest32 (a, f) values (1, 1), (2, 2);
 analyze gtest32;
 
+-- Ensure that statistics on virtual generated column are available
+select count(*) from pg_stats where tablename = 'gtest32';
+
 -- Ensure that nullingrel bits are propagated into the generation expressions
 explain (costs off)
 select sum(t2.b) over (partition by t2.a),
-- 
2.43.0


--Multipart=_Wed__31_Dec_2025_19_44_55_+0900_Ok0z3oC/bht_7Z=e--





^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* [PATCH v2] Allow to collect statistics on virtual generated columns
@ 2025-07-18 00:58  Yugo Nagata <[email protected]>
  0 siblings, 0 replies; 21+ messages in thread

From: Yugo Nagata @ 2025-07-18 00:58 UTC (permalink / raw)

During ANALYZE, generation expressions are expanded, and statistics are
computed using compute_expr_stats(). To support this, both compute_expr_stats()
and AnlExprData are now exported from extended_stats.c.

To enable the optimizer to make use of these statistics, a new field named
virtual_gencols is added to RelOptInfo. This field holds the expressions of
virtual generated columns in the table. In examine_variable(), if an
expression in a WHERE clause matches a virtual generated column, the
corresponding statistics are used for that expression.
---
 doc/src/sgml/ref/alter_table.sgml             |  9 +--
 src/backend/commands/analyze.c                | 32 ++++++----
 src/backend/optimizer/util/plancat.c          | 60 +++++++++++++++++++
 src/backend/statistics/extended_stats.c       | 11 +---
 src/backend/utils/adt/selfuncs.c              | 44 +++++++++++++-
 src/include/nodes/pathnodes.h                 | 19 ++++++
 .../statistics/extended_stats_internal.h      |  9 +++
 .../regress/expected/generated_virtual.out    |  7 +++
 src/test/regress/sql/generated_virtual.sql    |  3 +
 9 files changed, 166 insertions(+), 28 deletions(-)

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 8867da6c693..bffb07775f7 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -210,8 +210,6 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
       When this form is used, the column's statistics are removed,
       so running <link linkend="sql-analyze"><command>ANALYZE</command></link>
       on the table afterwards is recommended.
-      For a virtual generated column, <command>ANALYZE</command>
-      is not necessary because such columns never have statistics.
      </para>
     </listitem>
    </varlistentry>
@@ -275,12 +273,9 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
      </para>
 
      <para>
-      When this form is used on a stored generated column, its statistics
-      are removed, so running
-      <link linkend="sql-analyze"><command>ANALYZE</command></link>
+      When this form is used, the column's statistics are removed,
+      so running <link linkend="sql-analyze"><command>ANALYZE</command></link>
       on the table afterwards is recommended.
-      For a virtual generated column, <command>ANALYZE</command>
-      is not necessary because such columns never have statistics.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 40d66537ad7..438f9595e8e 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -41,6 +41,7 @@
 #include "parser/parse_oper.h"
 #include "parser/parse_relation.h"
 #include "pgstat.h"
+#include "rewrite/rewriteHandler.h"
 #include "statistics/extended_stats_internal.h"
 #include "statistics/statistics.h"
 #include "storage/bufmgr.h"
@@ -559,13 +560,28 @@ do_analyze_rel(Relation onerel, const VacuumParams params,
 		{
 			VacAttrStats *stats = vacattrstats[i];
 			AttributeOpts *aopt;
+			Form_pg_attribute attr = TupleDescAttr(onerel->rd_att, stats->tupattnum - 1);
 
-			stats->rows = rows;
-			stats->tupDesc = onerel->rd_att;
-			stats->compute_stats(stats,
-								 std_fetch_func,
-								 numrows,
-								 totalrows);
+			/*
+			 * For a virtual generated column, compute statistics for the expression value.
+			 */
+			if (attr->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
+			{
+				AnlExprData *exprdata = (AnlExprData *) palloc0(sizeof(AnlExprData));
+
+				exprdata->expr = build_generation_expression(onerel, stats->tupattnum);
+				exprdata->vacattrstat = stats;
+				compute_expr_stats(onerel, exprdata, 1, rows, numrows);
+			}
+			else
+			{
+				stats->rows = rows;
+				stats->tupDesc = onerel->rd_att;
+				stats->compute_stats(stats,
+									 std_fetch_func,
+									 numrows,
+									 totalrows);
+			}
 
 			/*
 			 * If the appropriate flavor of the n_distinct option is
@@ -1049,10 +1065,6 @@ examine_attribute(Relation onerel, int attnum, Node *index_expr)
 	if (attr->attisdropped)
 		return NULL;
 
-	/* Don't analyze virtual generated columns */
-	if (attr->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
-		return NULL;
-
 	/*
 	 * Get attstattarget value.  Set to -1 if null.  (Analyze functions expect
 	 * -1 to mean use default_statistics_target; see for example
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index c6a58afc5e5..68f7349ffc8 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -42,6 +42,7 @@
 #include "parser/parse_relation.h"
 #include "parser/parsetree.h"
 #include "partitioning/partdesc.h"
+#include "rewrite/rewriteHandler.h"
 #include "rewrite/rewriteManip.h"
 #include "statistics/statistics.h"
 #include "storage/bufmgr.h"
@@ -77,6 +78,7 @@ static List *get_relation_constraints(PlannerInfo *root,
 									  bool include_partition);
 static List *build_index_tlist(PlannerInfo *root, IndexOptInfo *index,
 							   Relation heapRelation);
+static List *get_relation_virtual_gencols(RelOptInfo *rel, Relation relation);
 static List *get_relation_statistics(RelOptInfo *rel, Relation relation);
 static void set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
 										Relation relation);
@@ -508,6 +510,9 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 
 	rel->indexlist = indexinfos;
 
+	/* Make list of virtual generated columns */
+	rel->virtual_gencols = get_relation_virtual_gencols(rel, relation);
+
 	rel->statlist = get_relation_statistics(rel, relation);
 
 	/* Grab foreign-table info using the relcache, while we have it */
@@ -1482,6 +1487,61 @@ get_relation_constraints(PlannerInfo *root,
 	return result;
 }
 
+/*
+ * get_relation_virtual_gencols
+ *		Retrieve virtual generated columns defined on the table.
+ *
+ * Returns a List (possibly empty) of VirtualGeneratedColumnInfoInfo objects
+ * containing the generation expressions. Each one has been processed by
+ * eval_const_expressions(), and its Vars are changed to have the varno
+ * indicated by rel->relid.  This allows the expressions to be easily
+ * compared to expressions taken from WHERE.
+ */
+static List *get_relation_virtual_gencols(RelOptInfo *rel, Relation relation)
+{
+	TupleDesc	tupdesc = RelationGetDescr(relation);
+	Index		varno = rel->relid;
+	List	   *virtual_gencols = NIL;
+
+	if (tupdesc->constr && tupdesc->constr->has_generated_virtual)
+	{
+		for (int i = 0; i < tupdesc->natts; i++)
+		{
+			Form_pg_attribute attr = TupleDescAttr(tupdesc, i);
+
+			if (attr->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
+			{
+				VirtualGeneratedColumnInfo *info;
+
+				info = makeNode(VirtualGeneratedColumnInfo);
+				info->attno = attr->attnum;
+				info->expr = build_generation_expression(relation, attr->attnum);
+
+				/*
+				 * Run the expressions through eval_const_expressions. This is
+				 * not just an optimization, but is necessary, because the
+				 * planner will be comparing them to similarly-processed qual
+				 * clauses, and may fail to detect valid matches without this.
+				 * We must not use canonicalize_qual, however, since these
+				 * aren't qual expressions.
+				 */
+				info->expr = eval_const_expressions(NULL, info->expr);
+
+				/* May as well fix opfuncids too */
+				fix_opfuncids(info->expr);
+
+				/* Fix Vars to have the desired varno */
+				if (varno != 1)
+					ChangeVarNodes((Node *) info->expr, 1, varno, 0);
+
+				virtual_gencols = lappend(virtual_gencols, info);
+			}
+		}
+	}
+
+	return virtual_gencols;
+}
+
 /*
  * Try loading data for the statistics object.
  *
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index a8b63ec0884..e26352cacec 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -82,15 +82,6 @@ static void statext_store(Oid statOid, bool inh,
 static int	statext_compute_stattarget(int stattarget,
 									   int nattrs, VacAttrStats **stats);
 
-/* Information needed to analyze a single simple expression. */
-typedef struct AnlExprData
-{
-	Node	   *expr;			/* expression to analyze */
-	VacAttrStats *vacattrstat;	/* statistics attrs to analyze */
-} AnlExprData;
-
-static void compute_expr_stats(Relation onerel, AnlExprData *exprdata,
-							   int nexprs, HeapTuple *rows, int numrows);
 static Datum serialize_expr_stats(AnlExprData *exprdata, int nexprs);
 static Datum expr_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
 static AnlExprData *build_expr_data(List *exprs, int stattarget);
@@ -2103,7 +2094,7 @@ examine_opclause_args(List *args, Node **exprp, Const **cstp,
 /*
  * Compute statistics about expressions of a relation.
  */
-static void
+void
 compute_expr_stats(Relation onerel, AnlExprData *exprdata, int nexprs,
 				   HeapTuple *rows, int numrows)
 {
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 17fbfa9b410..00e3972ff4f 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -5408,6 +5408,7 @@ examine_variable(PlannerInfo *root, Node *node, int varRelid,
 		 */
 		ListCell   *ilist;
 		ListCell   *slist;
+		ListCell   *vlist;
 		Oid			userid;
 
 		/*
@@ -5576,6 +5577,46 @@ examine_variable(PlannerInfo *root, Node *node, int varRelid,
 				break;
 		}
 
+		/*
+		 * Search virtual generated columns for one with a matching expression
+		 * and use the statistics collected for it if we have.
+		 */
+		foreach(vlist, onerel->virtual_gencols)
+		{
+			VirtualGeneratedColumnInfo *info = (VirtualGeneratedColumnInfo *) lfirst(vlist);
+			Node *expr = info->expr;
+
+			/*
+			 * Stop once we've found statistics for the expression (either
+			 * for a virtual generated columns or an index in the preceding
+			 * loop).
+			 */
+			if (vardata->statsTuple)
+				break;
+
+			/* strip RelabelType before comparing it */
+			if (expr && IsA(expr, RelabelType))
+				expr = (Node *) ((RelabelType *) expr)->arg;
+
+			if (equal(node, expr))
+			{
+				Var	*var = makeVar(onerel->relid,
+								   info->attno,
+								   vardata->atttype,
+								   vardata->atttypmod,
+								   exprCollation(node),
+								   0);
+				/*
+				 * There cannot be a unique constraint on a virtual generated column.
+				 * Other fields other than the stats tuple must be already set.
+				 */
+				vardata->isunique = false;
+
+				/* Try to locate some stats */
+				examine_simple_variable(root, var, vardata);
+			}
+		}
+
 		/*
 		 * Search extended statistics for one with a matching expression.
 		 * There might be multiple ones, so just grab the first one. In the
@@ -5591,7 +5632,8 @@ examine_variable(PlannerInfo *root, Node *node, int varRelid,
 
 			/*
 			 * Stop once we've found statistics for the expression (either
-			 * from extended stats, or for an index in the preceding loop).
+			 * from extended stats, or for an index or a virtual generated
+			 * column in the preceding loop).
 			 */
 			if (vardata->statsTuple)
 				break;
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index ad2726f026f..433b7cc6d50 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -970,6 +970,8 @@ typedef struct RelOptInfo
 	List	   *indexlist;
 	/* list of StatisticExtInfo */
 	List	   *statlist;
+	/* list of VirtualGeneratedColumnInfo */
+	List	   *virtual_gencols;
 	/* size estimates derived from pg_class */
 	BlockNumber pages;
 	Cardinality tuples;
@@ -1322,6 +1324,23 @@ typedef struct StatisticExtInfo
 	List	   *exprs;
 } StatisticExtInfo;
 
+/*
+ * VirtualGeneratedColumnInfo
+ *		Information about virtual generated columns for planning/optimization
+ */
+typedef struct VirtualGeneratedColumnInfo
+{
+	pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+	NodeTag		type;
+
+	/* attribute number of virtual generated column */
+	AttrNumber	attno;
+
+	/* generation expression */
+	Node	   *expr;
+} VirtualGeneratedColumnInfo;
+
 /*
  * JoinDomains
  *
diff --git a/src/include/statistics/extended_stats_internal.h b/src/include/statistics/extended_stats_internal.h
index efcb7dc3546..8eaea3b7566 100644
--- a/src/include/statistics/extended_stats_internal.h
+++ b/src/include/statistics/extended_stats_internal.h
@@ -68,6 +68,12 @@ typedef struct StatsBuildData
 	bool	  **nulls;
 } StatsBuildData;
 
+/* Information needed to analyze a single simple expression. */
+typedef struct AnlExprData
+{
+	Node	   *expr;			/* expression to analyze */
+	VacAttrStats *vacattrstat;	/* statistics attrs to analyze */
+} AnlExprData;
 
 extern MVNDistinct *statext_ndistinct_build(double totalrows, StatsBuildData *data);
 extern bytea *statext_ndistinct_serialize(MVNDistinct *ndistinct);
@@ -127,4 +133,7 @@ extern Selectivity mcv_clause_selectivity_or(PlannerInfo *root,
 											 Selectivity *overlap_basesel,
 											 Selectivity *totalsel);
 
+extern void
+compute_expr_stats(Relation onerel, AnlExprData *exprdata, int nexprs,
+				   HeapTuple *rows, int numrows);
 #endif							/* EXTENDED_STATS_INTERNAL_H */
diff --git a/src/test/regress/expected/generated_virtual.out b/src/test/regress/expected/generated_virtual.out
index aca6347babe..4469206dab0 100644
--- a/src/test/regress/expected/generated_virtual.out
+++ b/src/test/regress/expected/generated_virtual.out
@@ -1493,6 +1493,13 @@ create table gtest32 (
 );
 insert into gtest32 values (1), (2);
 analyze gtest32;
+-- Ensure that statistics on virtual generated column are available
+select count(*) from pg_stats where tablename = 'gtest32';
+ count 
+-------
+     5
+(1 row)
+
 -- Ensure that nullingrel bits are propagated into the generation expressions
 explain (costs off)
 select sum(t2.b) over (partition by t2.a),
diff --git a/src/test/regress/sql/generated_virtual.sql b/src/test/regress/sql/generated_virtual.sql
index ba19bc4c701..523ab188dac 100644
--- a/src/test/regress/sql/generated_virtual.sql
+++ b/src/test/regress/sql/generated_virtual.sql
@@ -817,6 +817,9 @@ create table gtest32 (
 insert into gtest32 values (1), (2);
 analyze gtest32;
 
+-- Ensure that statistics on virtual generated column are available
+select count(*) from pg_stats where tablename = 'gtest32';
+
 -- Ensure that nullingrel bits are propagated into the generation expressions
 explain (costs off)
 select sum(t2.b) over (partition by t2.a),
-- 
2.43.0


--Multipart=_Fri__8_Aug_2025_12_21_25_+0900_yNJX4hBoaUaCAKCp--





^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* [PATCH v3] Allow to collect statistics on virtual generated columns
@ 2025-07-18 00:58  Yugo Nagata <[email protected]>
  0 siblings, 0 replies; 21+ messages in thread

From: Yugo Nagata @ 2025-07-18 00:58 UTC (permalink / raw)

During ANALYZE, generation expressions are expanded, and statistics are
computed using compute_expr_stats(). To support this, both compute_expr_stats()
and AnlExprData are now exported from extended_stats.c.

To enable the optimizer to make use of these statistics, a new field named
virtual_gencols is added to RelOptInfo. This field holds the expressions of
virtual generated columns in the table. In examine_variable(), if an
expression in a WHERE clause matches a virtual generated column, the
corresponding statistics are used for that expression.
---
 doc/src/sgml/ref/alter_table.sgml             |  9 +--
 src/backend/commands/analyze.c                | 32 ++++++----
 src/backend/optimizer/util/plancat.c          | 60 +++++++++++++++++++
 src/backend/statistics/extended_stats.c       | 11 +---
 src/backend/utils/adt/selfuncs.c              | 44 +++++++++++++-
 src/include/nodes/pathnodes.h                 | 19 ++++++
 .../statistics/extended_stats_internal.h      |  9 +++
 .../regress/expected/generated_virtual.out    |  7 +++
 src/test/regress/sql/generated_virtual.sql    |  3 +
 9 files changed, 166 insertions(+), 28 deletions(-)

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 8867da6c693..bffb07775f7 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -210,8 +210,6 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
       When this form is used, the column's statistics are removed,
       so running <link linkend="sql-analyze"><command>ANALYZE</command></link>
       on the table afterwards is recommended.
-      For a virtual generated column, <command>ANALYZE</command>
-      is not necessary because such columns never have statistics.
      </para>
     </listitem>
    </varlistentry>
@@ -275,12 +273,9 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
      </para>
 
      <para>
-      When this form is used on a stored generated column, its statistics
-      are removed, so running
-      <link linkend="sql-analyze"><command>ANALYZE</command></link>
+      When this form is used, the column's statistics are removed,
+      so running <link linkend="sql-analyze"><command>ANALYZE</command></link>
       on the table afterwards is recommended.
-      For a virtual generated column, <command>ANALYZE</command>
-      is not necessary because such columns never have statistics.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 40d66537ad7..438f9595e8e 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -41,6 +41,7 @@
 #include "parser/parse_oper.h"
 #include "parser/parse_relation.h"
 #include "pgstat.h"
+#include "rewrite/rewriteHandler.h"
 #include "statistics/extended_stats_internal.h"
 #include "statistics/statistics.h"
 #include "storage/bufmgr.h"
@@ -559,13 +560,28 @@ do_analyze_rel(Relation onerel, const VacuumParams params,
 		{
 			VacAttrStats *stats = vacattrstats[i];
 			AttributeOpts *aopt;
+			Form_pg_attribute attr = TupleDescAttr(onerel->rd_att, stats->tupattnum - 1);
 
-			stats->rows = rows;
-			stats->tupDesc = onerel->rd_att;
-			stats->compute_stats(stats,
-								 std_fetch_func,
-								 numrows,
-								 totalrows);
+			/*
+			 * For a virtual generated column, compute statistics for the expression value.
+			 */
+			if (attr->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
+			{
+				AnlExprData *exprdata = (AnlExprData *) palloc0(sizeof(AnlExprData));
+
+				exprdata->expr = build_generation_expression(onerel, stats->tupattnum);
+				exprdata->vacattrstat = stats;
+				compute_expr_stats(onerel, exprdata, 1, rows, numrows);
+			}
+			else
+			{
+				stats->rows = rows;
+				stats->tupDesc = onerel->rd_att;
+				stats->compute_stats(stats,
+									 std_fetch_func,
+									 numrows,
+									 totalrows);
+			}
 
 			/*
 			 * If the appropriate flavor of the n_distinct option is
@@ -1049,10 +1065,6 @@ examine_attribute(Relation onerel, int attnum, Node *index_expr)
 	if (attr->attisdropped)
 		return NULL;
 
-	/* Don't analyze virtual generated columns */
-	if (attr->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
-		return NULL;
-
 	/*
 	 * Get attstattarget value.  Set to -1 if null.  (Analyze functions expect
 	 * -1 to mean use default_statistics_target; see for example
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 6ce4efea118..db78e1b639a 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -42,6 +42,7 @@
 #include "parser/parse_relation.h"
 #include "parser/parsetree.h"
 #include "partitioning/partdesc.h"
+#include "rewrite/rewriteHandler.h"
 #include "rewrite/rewriteManip.h"
 #include "statistics/statistics.h"
 #include "storage/bufmgr.h"
@@ -77,6 +78,7 @@ static List *get_relation_constraints(PlannerInfo *root,
 									  bool include_partition);
 static List *build_index_tlist(PlannerInfo *root, IndexOptInfo *index,
 							   Relation heapRelation);
+static List *get_relation_virtual_gencols(RelOptInfo *rel, Relation relation);
 static List *get_relation_statistics(RelOptInfo *rel, Relation relation);
 static void set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
 										Relation relation);
@@ -508,6 +510,9 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 
 	rel->indexlist = indexinfos;
 
+	/* Make list of virtual generated columns */
+	rel->virtual_gencols = get_relation_virtual_gencols(rel, relation);
+
 	rel->statlist = get_relation_statistics(rel, relation);
 
 	/* Grab foreign-table info using the relcache, while we have it */
@@ -1482,6 +1487,61 @@ get_relation_constraints(PlannerInfo *root,
 	return result;
 }
 
+/*
+ * get_relation_virtual_gencols
+ *		Retrieve virtual generated columns defined on the table.
+ *
+ * Returns a List (possibly empty) of VirtualGeneratedColumnInfoInfo objects
+ * containing the generation expressions. Each one has been processed by
+ * eval_const_expressions(), and its Vars are changed to have the varno
+ * indicated by rel->relid.  This allows the expressions to be easily
+ * compared to expressions taken from WHERE.
+ */
+static List *get_relation_virtual_gencols(RelOptInfo *rel, Relation relation)
+{
+	TupleDesc	tupdesc = RelationGetDescr(relation);
+	Index		varno = rel->relid;
+	List	   *virtual_gencols = NIL;
+
+	if (tupdesc->constr && tupdesc->constr->has_generated_virtual)
+	{
+		for (int i = 0; i < tupdesc->natts; i++)
+		{
+			Form_pg_attribute attr = TupleDescAttr(tupdesc, i);
+
+			if (attr->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
+			{
+				VirtualGeneratedColumnInfo *info;
+
+				info = makeNode(VirtualGeneratedColumnInfo);
+				info->attno = attr->attnum;
+				info->expr = build_generation_expression(relation, attr->attnum);
+
+				/*
+				 * Run the expressions through eval_const_expressions. This is
+				 * not just an optimization, but is necessary, because the
+				 * planner will be comparing them to similarly-processed qual
+				 * clauses, and may fail to detect valid matches without this.
+				 * We must not use canonicalize_qual, however, since these
+				 * aren't qual expressions.
+				 */
+				info->expr = eval_const_expressions(NULL, info->expr);
+
+				/* May as well fix opfuncids too */
+				fix_opfuncids(info->expr);
+
+				/* Fix Vars to have the desired varno */
+				if (varno != 1)
+					ChangeVarNodes((Node *) info->expr, 1, varno, 0);
+
+				virtual_gencols = lappend(virtual_gencols, info);
+			}
+		}
+	}
+
+	return virtual_gencols;
+}
+
 /*
  * Try loading data for the statistics object.
  *
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index af0b99243c6..21c86e8d21a 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -82,15 +82,6 @@ static void statext_store(Oid statOid, bool inh,
 static int	statext_compute_stattarget(int stattarget,
 									   int nattrs, VacAttrStats **stats);
 
-/* Information needed to analyze a single simple expression. */
-typedef struct AnlExprData
-{
-	Node	   *expr;			/* expression to analyze */
-	VacAttrStats *vacattrstat;	/* statistics attrs to analyze */
-} AnlExprData;
-
-static void compute_expr_stats(Relation onerel, AnlExprData *exprdata,
-							   int nexprs, HeapTuple *rows, int numrows);
 static Datum serialize_expr_stats(AnlExprData *exprdata, int nexprs);
 static Datum expr_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
 static AnlExprData *build_expr_data(List *exprs, int stattarget);
@@ -2083,7 +2074,7 @@ examine_opclause_args(List *args, Node **exprp, Const **cstp,
 /*
  * Compute statistics about expressions of a relation.
  */
-static void
+void
 compute_expr_stats(Relation onerel, AnlExprData *exprdata, int nexprs,
 				   HeapTuple *rows, int numrows)
 {
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 1c480cfaaf7..b651cb83416 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -5408,6 +5408,7 @@ examine_variable(PlannerInfo *root, Node *node, int varRelid,
 		 */
 		ListCell   *ilist;
 		ListCell   *slist;
+		ListCell   *vlist;
 
 		/*
 		 * The nullingrels bits within the expression could prevent us from
@@ -5527,6 +5528,46 @@ examine_variable(PlannerInfo *root, Node *node, int varRelid,
 				break;
 		}
 
+		/*
+		 * Search virtual generated columns for one with a matching expression
+		 * and use the statistics collected for it if we have.
+		 */
+		foreach(vlist, onerel->virtual_gencols)
+		{
+			VirtualGeneratedColumnInfo *info = (VirtualGeneratedColumnInfo *) lfirst(vlist);
+			Node *expr = info->expr;
+
+			/*
+			 * Stop once we've found statistics for the expression (either
+			 * for a virtual generated columns or an index in the preceding
+			 * loop).
+			 */
+			if (vardata->statsTuple)
+				break;
+
+			/* strip RelabelType before comparing it */
+			if (expr && IsA(expr, RelabelType))
+				expr = (Node *) ((RelabelType *) expr)->arg;
+
+			if (equal(node, expr))
+			{
+				Var	*var = makeVar(onerel->relid,
+								   info->attno,
+								   vardata->atttype,
+								   vardata->atttypmod,
+								   exprCollation(node),
+								   0);
+				/*
+				 * There cannot be a unique constraint on a virtual generated column.
+				 * Other fields other than the stats tuple must be already set.
+				 */
+				vardata->isunique = false;
+
+				/* Try to locate some stats */
+				examine_simple_variable(root, var, vardata);
+			}
+		}
+
 		/*
 		 * Search extended statistics for one with a matching expression.
 		 * There might be multiple ones, so just grab the first one. In the
@@ -5542,7 +5583,8 @@ examine_variable(PlannerInfo *root, Node *node, int varRelid,
 
 			/*
 			 * Stop once we've found statistics for the expression (either
-			 * from extended stats, or for an index in the preceding loop).
+			 * from extended stats, or for an index or a virtual generated
+			 * column in the preceding loop).
 			 */
 			if (vardata->statsTuple)
 				break;
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 4a903d1ec18..4f0e0bf22e0 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -982,6 +982,8 @@ typedef struct RelOptInfo
 	List	   *indexlist;
 	/* list of StatisticExtInfo */
 	List	   *statlist;
+	/* list of VirtualGeneratedColumnInfo */
+	List	   *virtual_gencols;
 	/* size estimates derived from pg_class */
 	BlockNumber pages;
 	Cardinality tuples;
@@ -1355,6 +1357,23 @@ typedef struct StatisticExtInfo
 	List	   *exprs;
 } StatisticExtInfo;
 
+/*
+ * VirtualGeneratedColumnInfo
+ *		Information about virtual generated columns for planning/optimization
+ */
+typedef struct VirtualGeneratedColumnInfo
+{
+	pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+	NodeTag		type;
+
+	/* attribute number of virtual generated column */
+	AttrNumber	attno;
+
+	/* generation expression */
+	Node	   *expr;
+} VirtualGeneratedColumnInfo;
+
 /*
  * JoinDomains
  *
diff --git a/src/include/statistics/extended_stats_internal.h b/src/include/statistics/extended_stats_internal.h
index efcb7dc3546..8eaea3b7566 100644
--- a/src/include/statistics/extended_stats_internal.h
+++ b/src/include/statistics/extended_stats_internal.h
@@ -68,6 +68,12 @@ typedef struct StatsBuildData
 	bool	  **nulls;
 } StatsBuildData;
 
+/* Information needed to analyze a single simple expression. */
+typedef struct AnlExprData
+{
+	Node	   *expr;			/* expression to analyze */
+	VacAttrStats *vacattrstat;	/* statistics attrs to analyze */
+} AnlExprData;
 
 extern MVNDistinct *statext_ndistinct_build(double totalrows, StatsBuildData *data);
 extern bytea *statext_ndistinct_serialize(MVNDistinct *ndistinct);
@@ -127,4 +133,7 @@ extern Selectivity mcv_clause_selectivity_or(PlannerInfo *root,
 											 Selectivity *overlap_basesel,
 											 Selectivity *totalsel);
 
+extern void
+compute_expr_stats(Relation onerel, AnlExprData *exprdata, int nexprs,
+				   HeapTuple *rows, int numrows);
 #endif							/* EXTENDED_STATS_INTERNAL_H */
diff --git a/src/test/regress/expected/generated_virtual.out b/src/test/regress/expected/generated_virtual.out
index aca6347babe..4469206dab0 100644
--- a/src/test/regress/expected/generated_virtual.out
+++ b/src/test/regress/expected/generated_virtual.out
@@ -1493,6 +1493,13 @@ create table gtest32 (
 );
 insert into gtest32 values (1), (2);
 analyze gtest32;
+-- Ensure that statistics on virtual generated column are available
+select count(*) from pg_stats where tablename = 'gtest32';
+ count 
+-------
+     5
+(1 row)
+
 -- Ensure that nullingrel bits are propagated into the generation expressions
 explain (costs off)
 select sum(t2.b) over (partition by t2.a),
diff --git a/src/test/regress/sql/generated_virtual.sql b/src/test/regress/sql/generated_virtual.sql
index ba19bc4c701..523ab188dac 100644
--- a/src/test/regress/sql/generated_virtual.sql
+++ b/src/test/regress/sql/generated_virtual.sql
@@ -817,6 +817,9 @@ create table gtest32 (
 insert into gtest32 values (1), (2);
 analyze gtest32;
 
+-- Ensure that statistics on virtual generated column are available
+select count(*) from pg_stats where tablename = 'gtest32';
+
 -- Ensure that nullingrel bits are propagated into the generation expressions
 explain (costs off)
 select sum(t2.b) over (partition by t2.a),
-- 
2.43.0


--Multipart=_Wed__20_Aug_2025_14_10_28_+0900_u9fdIbTgN.58HH3l--





^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* [PATCH] Allow to collect statistics on virtual generated columns
@ 2025-07-18 00:58  Yugo Nagata <[email protected]>
  0 siblings, 0 replies; 21+ messages in thread

From: Yugo Nagata @ 2025-07-18 00:58 UTC (permalink / raw)

During ANALYZE, generation expressions are expanded, and statistics are
computed using compute_expr_stats(). To support this, both compute_expr_stats()
and AnlExprData are now exported from extended_stats.c.

To enable the optimizer to make use of these statistics, a new field named
virtual_gencols is added to RelOptInfo. This field holds the expressions of
virtual generated columns in the table. In examine_variable(), if an
expression in a WHERE clause matches a virtual generated column, the
corresponding statistics are used for that expression.
---
 src/backend/commands/analyze.c                | 32 ++++++----
 src/backend/optimizer/util/plancat.c          | 60 +++++++++++++++++++
 src/backend/statistics/extended_stats.c       | 11 +---
 src/backend/utils/adt/selfuncs.c              | 44 +++++++++++++-
 src/include/nodes/pathnodes.h                 | 19 ++++++
 .../statistics/extended_stats_internal.h      |  9 +++
 6 files changed, 154 insertions(+), 21 deletions(-)

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 7111d5d5334..553f1bb52db 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -41,6 +41,7 @@
 #include "parser/parse_oper.h"
 #include "parser/parse_relation.h"
 #include "pgstat.h"
+#include "rewrite/rewriteHandler.h"
 #include "statistics/extended_stats_internal.h"
 #include "statistics/statistics.h"
 #include "storage/bufmgr.h"
@@ -559,13 +560,28 @@ do_analyze_rel(Relation onerel, const VacuumParams params,
 		{
 			VacAttrStats *stats = vacattrstats[i];
 			AttributeOpts *aopt;
+			Form_pg_attribute attr = TupleDescAttr(onerel->rd_att, stats->tupattnum - 1);
 
-			stats->rows = rows;
-			stats->tupDesc = onerel->rd_att;
-			stats->compute_stats(stats,
-								 std_fetch_func,
-								 numrows,
-								 totalrows);
+			/*
+			 * For a virtual generated column, compute statistics for the expression value.
+			 */
+			if (attr->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
+			{
+				AnlExprData *exprdata = (AnlExprData *) palloc0(sizeof(AnlExprData));
+
+				exprdata->expr = build_generation_expression(onerel, stats->tupattnum);
+				exprdata->vacattrstat = stats;
+				compute_expr_stats(onerel, exprdata, 1, rows, numrows);
+			}
+			else
+			{
+				stats->rows = rows;
+				stats->tupDesc = onerel->rd_att;
+				stats->compute_stats(stats,
+									 std_fetch_func,
+									 numrows,
+									 totalrows);
+			}
 
 			/*
 			 * If the appropriate flavor of the n_distinct option is
@@ -1049,10 +1065,6 @@ examine_attribute(Relation onerel, int attnum, Node *index_expr)
 	if (attr->attisdropped)
 		return NULL;
 
-	/* Don't analyze virtual generated columns */
-	if (attr->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
-		return NULL;
-
 	/*
 	 * Get attstattarget value.  Set to -1 if null.  (Analyze functions expect
 	 * -1 to mean use default_statistics_target; see for example
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index c6a58afc5e5..68f7349ffc8 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -42,6 +42,7 @@
 #include "parser/parse_relation.h"
 #include "parser/parsetree.h"
 #include "partitioning/partdesc.h"
+#include "rewrite/rewriteHandler.h"
 #include "rewrite/rewriteManip.h"
 #include "statistics/statistics.h"
 #include "storage/bufmgr.h"
@@ -77,6 +78,7 @@ static List *get_relation_constraints(PlannerInfo *root,
 									  bool include_partition);
 static List *build_index_tlist(PlannerInfo *root, IndexOptInfo *index,
 							   Relation heapRelation);
+static List *get_relation_virtual_gencols(RelOptInfo *rel, Relation relation);
 static List *get_relation_statistics(RelOptInfo *rel, Relation relation);
 static void set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
 										Relation relation);
@@ -508,6 +510,9 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 
 	rel->indexlist = indexinfos;
 
+	/* Make list of virtual generated columns */
+	rel->virtual_gencols = get_relation_virtual_gencols(rel, relation);
+
 	rel->statlist = get_relation_statistics(rel, relation);
 
 	/* Grab foreign-table info using the relcache, while we have it */
@@ -1482,6 +1487,61 @@ get_relation_constraints(PlannerInfo *root,
 	return result;
 }
 
+/*
+ * get_relation_virtual_gencols
+ *		Retrieve virtual generated columns defined on the table.
+ *
+ * Returns a List (possibly empty) of VirtualGeneratedColumnInfoInfo objects
+ * containing the generation expressions. Each one has been processed by
+ * eval_const_expressions(), and its Vars are changed to have the varno
+ * indicated by rel->relid.  This allows the expressions to be easily
+ * compared to expressions taken from WHERE.
+ */
+static List *get_relation_virtual_gencols(RelOptInfo *rel, Relation relation)
+{
+	TupleDesc	tupdesc = RelationGetDescr(relation);
+	Index		varno = rel->relid;
+	List	   *virtual_gencols = NIL;
+
+	if (tupdesc->constr && tupdesc->constr->has_generated_virtual)
+	{
+		for (int i = 0; i < tupdesc->natts; i++)
+		{
+			Form_pg_attribute attr = TupleDescAttr(tupdesc, i);
+
+			if (attr->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
+			{
+				VirtualGeneratedColumnInfo *info;
+
+				info = makeNode(VirtualGeneratedColumnInfo);
+				info->attno = attr->attnum;
+				info->expr = build_generation_expression(relation, attr->attnum);
+
+				/*
+				 * Run the expressions through eval_const_expressions. This is
+				 * not just an optimization, but is necessary, because the
+				 * planner will be comparing them to similarly-processed qual
+				 * clauses, and may fail to detect valid matches without this.
+				 * We must not use canonicalize_qual, however, since these
+				 * aren't qual expressions.
+				 */
+				info->expr = eval_const_expressions(NULL, info->expr);
+
+				/* May as well fix opfuncids too */
+				fix_opfuncids(info->expr);
+
+				/* Fix Vars to have the desired varno */
+				if (varno != 1)
+					ChangeVarNodes((Node *) info->expr, 1, varno, 0);
+
+				virtual_gencols = lappend(virtual_gencols, info);
+			}
+		}
+	}
+
+	return virtual_gencols;
+}
+
 /*
  * Try loading data for the statistics object.
  *
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index a8b63ec0884..e26352cacec 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -82,15 +82,6 @@ static void statext_store(Oid statOid, bool inh,
 static int	statext_compute_stattarget(int stattarget,
 									   int nattrs, VacAttrStats **stats);
 
-/* Information needed to analyze a single simple expression. */
-typedef struct AnlExprData
-{
-	Node	   *expr;			/* expression to analyze */
-	VacAttrStats *vacattrstat;	/* statistics attrs to analyze */
-} AnlExprData;
-
-static void compute_expr_stats(Relation onerel, AnlExprData *exprdata,
-							   int nexprs, HeapTuple *rows, int numrows);
 static Datum serialize_expr_stats(AnlExprData *exprdata, int nexprs);
 static Datum expr_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
 static AnlExprData *build_expr_data(List *exprs, int stattarget);
@@ -2103,7 +2094,7 @@ examine_opclause_args(List *args, Node **exprp, Const **cstp,
 /*
  * Compute statistics about expressions of a relation.
  */
-static void
+void
 compute_expr_stats(Relation onerel, AnlExprData *exprdata, int nexprs,
 				   HeapTuple *rows, int numrows)
 {
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 17fbfa9b410..00e3972ff4f 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -5408,6 +5408,7 @@ examine_variable(PlannerInfo *root, Node *node, int varRelid,
 		 */
 		ListCell   *ilist;
 		ListCell   *slist;
+		ListCell   *vlist;
 		Oid			userid;
 
 		/*
@@ -5576,6 +5577,46 @@ examine_variable(PlannerInfo *root, Node *node, int varRelid,
 				break;
 		}
 
+		/*
+		 * Search virtual generated columns for one with a matching expression
+		 * and use the statistics collected for it if we have.
+		 */
+		foreach(vlist, onerel->virtual_gencols)
+		{
+			VirtualGeneratedColumnInfo *info = (VirtualGeneratedColumnInfo *) lfirst(vlist);
+			Node *expr = info->expr;
+
+			/*
+			 * Stop once we've found statistics for the expression (either
+			 * for a virtual generated columns or an index in the preceding
+			 * loop).
+			 */
+			if (vardata->statsTuple)
+				break;
+
+			/* strip RelabelType before comparing it */
+			if (expr && IsA(expr, RelabelType))
+				expr = (Node *) ((RelabelType *) expr)->arg;
+
+			if (equal(node, expr))
+			{
+				Var	*var = makeVar(onerel->relid,
+								   info->attno,
+								   vardata->atttype,
+								   vardata->atttypmod,
+								   exprCollation(node),
+								   0);
+				/*
+				 * There cannot be a unique constraint on a virtual generated column.
+				 * Other fields other than the stats tuple must be already set.
+				 */
+				vardata->isunique = false;
+
+				/* Try to locate some stats */
+				examine_simple_variable(root, var, vardata);
+			}
+		}
+
 		/*
 		 * Search extended statistics for one with a matching expression.
 		 * There might be multiple ones, so just grab the first one. In the
@@ -5591,7 +5632,8 @@ examine_variable(PlannerInfo *root, Node *node, int varRelid,
 
 			/*
 			 * Stop once we've found statistics for the expression (either
-			 * from extended stats, or for an index in the preceding loop).
+			 * from extended stats, or for an index or a virtual generated
+			 * column in the preceding loop).
 			 */
 			if (vardata->statsTuple)
 				break;
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index e5dd15098f6..0c173f307b7 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -970,6 +970,8 @@ typedef struct RelOptInfo
 	List	   *indexlist;
 	/* list of StatisticExtInfo */
 	List	   *statlist;
+	/* list of VirtualGeneratedColumnInfo */
+	List	   *virtual_gencols;
 	/* size estimates derived from pg_class */
 	BlockNumber pages;
 	Cardinality tuples;
@@ -1322,6 +1324,23 @@ typedef struct StatisticExtInfo
 	List	   *exprs;
 } StatisticExtInfo;
 
+/*
+ * VirtualGeneratedColumnInfo
+ *		Information about virtual generated columns for planning/optimization
+ */
+typedef struct VirtualGeneratedColumnInfo
+{
+	pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+	NodeTag		type;
+
+	/* attribute number of virtual generated column */
+	AttrNumber	attno;
+
+	/* generation expression */
+	Node	   *expr;
+} VirtualGeneratedColumnInfo;
+
 /*
  * JoinDomains
  *
diff --git a/src/include/statistics/extended_stats_internal.h b/src/include/statistics/extended_stats_internal.h
index efcb7dc3546..8eaea3b7566 100644
--- a/src/include/statistics/extended_stats_internal.h
+++ b/src/include/statistics/extended_stats_internal.h
@@ -68,6 +68,12 @@ typedef struct StatsBuildData
 	bool	  **nulls;
 } StatsBuildData;
 
+/* Information needed to analyze a single simple expression. */
+typedef struct AnlExprData
+{
+	Node	   *expr;			/* expression to analyze */
+	VacAttrStats *vacattrstat;	/* statistics attrs to analyze */
+} AnlExprData;
 
 extern MVNDistinct *statext_ndistinct_build(double totalrows, StatsBuildData *data);
 extern bytea *statext_ndistinct_serialize(MVNDistinct *ndistinct);
@@ -127,4 +133,7 @@ extern Selectivity mcv_clause_selectivity_or(PlannerInfo *root,
 											 Selectivity *overlap_basesel,
 											 Selectivity *totalsel);
 
+extern void
+compute_expr_stats(Relation onerel, AnlExprData *exprdata, int nexprs,
+				   HeapTuple *rows, int numrows);
 #endif							/* EXTENDED_STATS_INTERNAL_H */
-- 
2.43.0


--Multipart=_Fri__1_Aug_2025_00_28_30_+0900_PkRV9PSGNhLRtsZF--





^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* [PATCH v4] Allow to collect statistics on virtual generated columns
@ 2025-07-18 00:58  Yugo Nagata <[email protected]>
  0 siblings, 0 replies; 21+ messages in thread

From: Yugo Nagata @ 2025-07-18 00:58 UTC (permalink / raw)

During ANALYZE, generation expressions are expanded, and statistics are
computed using compute_expr_stats(). To support this, both compute_expr_stats()
and AnlExprData are now exported from extended_stats.c.

To enable the optimizer to make use of these statistics, a new field named
virtual_gencols is added to RelOptInfo. This field holds the expressions of
virtual generated columns in the table. In examine_variable(), if an
expression in a WHERE clause matches a virtual generated column, the
corresponding statistics are used for that expression.
---
 doc/src/sgml/ref/alter_table.sgml             |  9 +--
 src/backend/commands/analyze.c                | 32 +++++++---
 src/backend/optimizer/util/plancat.c          | 62 +++++++++++++++++++
 src/backend/statistics/extended_stats.c       | 11 +---
 src/backend/utils/adt/selfuncs.c              | 44 ++++++++++++-
 src/include/nodes/pathnodes.h                 | 19 ++++++
 .../statistics/extended_stats_internal.h      |  9 +++
 .../regress/expected/generated_virtual.out    |  7 +++
 src/test/regress/sql/generated_virtual.sql    |  3 +
 9 files changed, 168 insertions(+), 28 deletions(-)

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 8867da6c693..bffb07775f7 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -210,8 +210,6 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
       When this form is used, the column's statistics are removed,
       so running <link linkend="sql-analyze"><command>ANALYZE</command></link>
       on the table afterwards is recommended.
-      For a virtual generated column, <command>ANALYZE</command>
-      is not necessary because such columns never have statistics.
      </para>
     </listitem>
    </varlistentry>
@@ -275,12 +273,9 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
      </para>
 
      <para>
-      When this form is used on a stored generated column, its statistics
-      are removed, so running
-      <link linkend="sql-analyze"><command>ANALYZE</command></link>
+      When this form is used, the column's statistics are removed,
+      so running <link linkend="sql-analyze"><command>ANALYZE</command></link>
       on the table afterwards is recommended.
-      For a virtual generated column, <command>ANALYZE</command>
-      is not necessary because such columns never have statistics.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 8ea2913d906..c53e0f0eacf 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -40,6 +40,7 @@
 #include "parser/parse_oper.h"
 #include "parser/parse_relation.h"
 #include "pgstat.h"
+#include "rewrite/rewriteHandler.h"
 #include "statistics/extended_stats_internal.h"
 #include "statistics/statistics.h"
 #include "storage/bufmgr.h"
@@ -558,13 +559,28 @@ do_analyze_rel(Relation onerel, const VacuumParams params,
 		{
 			VacAttrStats *stats = vacattrstats[i];
 			AttributeOpts *aopt;
+			Form_pg_attribute attr = TupleDescAttr(onerel->rd_att, stats->tupattnum - 1);
 
-			stats->rows = rows;
-			stats->tupDesc = onerel->rd_att;
-			stats->compute_stats(stats,
-								 std_fetch_func,
-								 numrows,
-								 totalrows);
+			/*
+			 * For a virtual generated column, compute statistics for the expression value.
+			 */
+			if (attr->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
+			{
+				AnlExprData *exprdata = (AnlExprData *) palloc0(sizeof(AnlExprData));
+
+				exprdata->expr = build_generation_expression(onerel, stats->tupattnum);
+				exprdata->vacattrstat = stats;
+				compute_expr_stats(onerel, exprdata, 1, rows, numrows);
+			}
+			else
+			{
+				stats->rows = rows;
+				stats->tupDesc = onerel->rd_att;
+				stats->compute_stats(stats,
+									 std_fetch_func,
+									 numrows,
+									 totalrows);
+			}
 
 			/*
 			 * If the appropriate flavor of the n_distinct option is
@@ -1048,10 +1064,6 @@ examine_attribute(Relation onerel, int attnum, Node *index_expr)
 	if (attr->attisdropped)
 		return NULL;
 
-	/* Don't analyze virtual generated columns */
-	if (attr->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
-		return NULL;
-
 	/*
 	 * Get attstattarget value.  Set to -1 if null.  (Analyze functions expect
 	 * -1 to mean use default_statistics_target; see for example
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 4536bdd6cb4..c2ee90c9d35 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -42,6 +42,7 @@
 #include "parser/parse_relation.h"
 #include "parser/parsetree.h"
 #include "partitioning/partdesc.h"
+#include "rewrite/rewriteHandler.h"
 #include "rewrite/rewriteManip.h"
 #include "statistics/statistics.h"
 #include "storage/bufmgr.h"
@@ -77,6 +78,8 @@ static List *get_relation_constraints(PlannerInfo *root,
 									  bool include_partition);
 static List *build_index_tlist(PlannerInfo *root, IndexOptInfo *index,
 							   Relation heapRelation);
+static List *get_relation_virtual_gencols(PlannerInfo *root, RelOptInfo *rel,
+										  Relation relation);
 static List *get_relation_statistics(PlannerInfo *root, RelOptInfo *rel,
 									 Relation relation);
 static void set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
@@ -509,6 +512,9 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 
 	rel->indexlist = indexinfos;
 
+	/* Make list of virtual generated columns */
+	rel->virtual_gencols = get_relation_virtual_gencols(root, rel, relation);
+
 	rel->statlist = get_relation_statistics(root, rel, relation);
 
 	/* Grab foreign-table info using the relcache, while we have it */
@@ -1487,6 +1493,62 @@ get_relation_constraints(PlannerInfo *root,
 	return result;
 }
 
+/*
+ * get_relation_virtual_gencols
+ *		Retrieve virtual generated columns defined on the table.
+ *
+ * Returns a List (possibly empty) of VirtualGeneratedColumnInfoInfo objects
+ * containing the generation expressions. Each one has been processed by
+ * eval_const_expressions(), and its Vars are changed to have the varno
+ * indicated by rel->relid.  This allows the expressions to be easily
+ * compared to expressions taken from WHERE.
+ */
+static List *get_relation_virtual_gencols(PlannerInfo *root, RelOptInfo *rel,
+						Relation relation)
+{
+	TupleDesc	tupdesc = RelationGetDescr(relation);
+	Index		varno = rel->relid;
+	List	   *virtual_gencols = NIL;
+
+	if (tupdesc->constr && tupdesc->constr->has_generated_virtual)
+	{
+		for (int i = 0; i < tupdesc->natts; i++)
+		{
+			Form_pg_attribute attr = TupleDescAttr(tupdesc, i);
+
+			if (attr->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
+			{
+				VirtualGeneratedColumnInfo *info;
+
+				info = makeNode(VirtualGeneratedColumnInfo);
+				info->attno = attr->attnum;
+				info->expr = build_generation_expression(relation, attr->attnum);
+
+				/*
+				 * Run the expressions through eval_const_expressions. This is
+				 * not just an optimization, but is necessary, because the
+				 * planner will be comparing them to similarly-processed qual
+				 * clauses, and may fail to detect valid matches without this.
+				 * We must not use canonicalize_qual, however, since these
+				 * aren't qual expressions.
+				 */
+				info->expr = eval_const_expressions(NULL, info->expr);
+
+				/* May as well fix opfuncids too */
+				fix_opfuncids(info->expr);
+
+				/* Fix Vars to have the desired varno */
+				if (varno != 1)
+					ChangeVarNodes((Node *) info->expr, 1, varno, 0);
+
+				virtual_gencols = lappend(virtual_gencols, info);
+			}
+		}
+	}
+
+	return virtual_gencols;
+}
+
 /*
  * Try loading data for the statistics object.
  *
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index af0b99243c6..21c86e8d21a 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -82,15 +82,6 @@ static void statext_store(Oid statOid, bool inh,
 static int	statext_compute_stattarget(int stattarget,
 									   int nattrs, VacAttrStats **stats);
 
-/* Information needed to analyze a single simple expression. */
-typedef struct AnlExprData
-{
-	Node	   *expr;			/* expression to analyze */
-	VacAttrStats *vacattrstat;	/* statistics attrs to analyze */
-} AnlExprData;
-
-static void compute_expr_stats(Relation onerel, AnlExprData *exprdata,
-							   int nexprs, HeapTuple *rows, int numrows);
 static Datum serialize_expr_stats(AnlExprData *exprdata, int nexprs);
 static Datum expr_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
 static AnlExprData *build_expr_data(List *exprs, int stattarget);
@@ -2083,7 +2074,7 @@ examine_opclause_args(List *args, Node **exprp, Const **cstp,
 /*
  * Compute statistics about expressions of a relation.
  */
-static void
+void
 compute_expr_stats(Relation onerel, AnlExprData *exprdata, int nexprs,
 				   HeapTuple *rows, int numrows)
 {
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 1c480cfaaf7..b651cb83416 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -5408,6 +5408,7 @@ examine_variable(PlannerInfo *root, Node *node, int varRelid,
 		 */
 		ListCell   *ilist;
 		ListCell   *slist;
+		ListCell   *vlist;
 
 		/*
 		 * The nullingrels bits within the expression could prevent us from
@@ -5527,6 +5528,46 @@ examine_variable(PlannerInfo *root, Node *node, int varRelid,
 				break;
 		}
 
+		/*
+		 * Search virtual generated columns for one with a matching expression
+		 * and use the statistics collected for it if we have.
+		 */
+		foreach(vlist, onerel->virtual_gencols)
+		{
+			VirtualGeneratedColumnInfo *info = (VirtualGeneratedColumnInfo *) lfirst(vlist);
+			Node *expr = info->expr;
+
+			/*
+			 * Stop once we've found statistics for the expression (either
+			 * for a virtual generated columns or an index in the preceding
+			 * loop).
+			 */
+			if (vardata->statsTuple)
+				break;
+
+			/* strip RelabelType before comparing it */
+			if (expr && IsA(expr, RelabelType))
+				expr = (Node *) ((RelabelType *) expr)->arg;
+
+			if (equal(node, expr))
+			{
+				Var	*var = makeVar(onerel->relid,
+								   info->attno,
+								   vardata->atttype,
+								   vardata->atttypmod,
+								   exprCollation(node),
+								   0);
+				/*
+				 * There cannot be a unique constraint on a virtual generated column.
+				 * Other fields other than the stats tuple must be already set.
+				 */
+				vardata->isunique = false;
+
+				/* Try to locate some stats */
+				examine_simple_variable(root, var, vardata);
+			}
+		}
+
 		/*
 		 * Search extended statistics for one with a matching expression.
 		 * There might be multiple ones, so just grab the first one. In the
@@ -5542,7 +5583,8 @@ examine_variable(PlannerInfo *root, Node *node, int varRelid,
 
 			/*
 			 * Stop once we've found statistics for the expression (either
-			 * from extended stats, or for an index in the preceding loop).
+			 * from extended stats, or for an index or a virtual generated
+			 * column in the preceding loop).
 			 */
 			if (vardata->statsTuple)
 				break;
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 4a903d1ec18..4f0e0bf22e0 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -982,6 +982,8 @@ typedef struct RelOptInfo
 	List	   *indexlist;
 	/* list of StatisticExtInfo */
 	List	   *statlist;
+	/* list of VirtualGeneratedColumnInfo */
+	List	   *virtual_gencols;
 	/* size estimates derived from pg_class */
 	BlockNumber pages;
 	Cardinality tuples;
@@ -1355,6 +1357,23 @@ typedef struct StatisticExtInfo
 	List	   *exprs;
 } StatisticExtInfo;
 
+/*
+ * VirtualGeneratedColumnInfo
+ *		Information about virtual generated columns for planning/optimization
+ */
+typedef struct VirtualGeneratedColumnInfo
+{
+	pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+	NodeTag		type;
+
+	/* attribute number of virtual generated column */
+	AttrNumber	attno;
+
+	/* generation expression */
+	Node	   *expr;
+} VirtualGeneratedColumnInfo;
+
 /*
  * JoinDomains
  *
diff --git a/src/include/statistics/extended_stats_internal.h b/src/include/statistics/extended_stats_internal.h
index efcb7dc3546..8eaea3b7566 100644
--- a/src/include/statistics/extended_stats_internal.h
+++ b/src/include/statistics/extended_stats_internal.h
@@ -68,6 +68,12 @@ typedef struct StatsBuildData
 	bool	  **nulls;
 } StatsBuildData;
 
+/* Information needed to analyze a single simple expression. */
+typedef struct AnlExprData
+{
+	Node	   *expr;			/* expression to analyze */
+	VacAttrStats *vacattrstat;	/* statistics attrs to analyze */
+} AnlExprData;
 
 extern MVNDistinct *statext_ndistinct_build(double totalrows, StatsBuildData *data);
 extern bytea *statext_ndistinct_serialize(MVNDistinct *ndistinct);
@@ -127,4 +133,7 @@ extern Selectivity mcv_clause_selectivity_or(PlannerInfo *root,
 											 Selectivity *overlap_basesel,
 											 Selectivity *totalsel);
 
+extern void
+compute_expr_stats(Relation onerel, AnlExprData *exprdata, int nexprs,
+				   HeapTuple *rows, int numrows);
 #endif							/* EXTENDED_STATS_INTERNAL_H */
diff --git a/src/test/regress/expected/generated_virtual.out b/src/test/regress/expected/generated_virtual.out
index aca6347babe..4469206dab0 100644
--- a/src/test/regress/expected/generated_virtual.out
+++ b/src/test/regress/expected/generated_virtual.out
@@ -1493,6 +1493,13 @@ create table gtest32 (
 );
 insert into gtest32 values (1), (2);
 analyze gtest32;
+-- Ensure that statistics on virtual generated column are available
+select count(*) from pg_stats where tablename = 'gtest32';
+ count 
+-------
+     5
+(1 row)
+
 -- Ensure that nullingrel bits are propagated into the generation expressions
 explain (costs off)
 select sum(t2.b) over (partition by t2.a),
diff --git a/src/test/regress/sql/generated_virtual.sql b/src/test/regress/sql/generated_virtual.sql
index ba19bc4c701..523ab188dac 100644
--- a/src/test/regress/sql/generated_virtual.sql
+++ b/src/test/regress/sql/generated_virtual.sql
@@ -817,6 +817,9 @@ create table gtest32 (
 insert into gtest32 values (1), (2);
 analyze gtest32;
 
+-- Ensure that statistics on virtual generated column are available
+select count(*) from pg_stats where tablename = 'gtest32';
+
 -- Ensure that nullingrel bits are propagated into the generation expressions
 explain (costs off)
 select sum(t2.b) over (partition by t2.a),
-- 
2.43.0


--Multipart=_Tue__2_Sep_2025_16_33_41_+0900_LULRV8Lnk+KDzwmW--





^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: Allow to collect statistics on virtual generated columns
@ 2025-07-31 15:28  Yugo Nagata <[email protected]>
  parent: Yugo Nagata <[email protected]>
  0 siblings, 1 reply; 21+ messages in thread

From: Yugo Nagata @ 2025-07-31 15:28 UTC (permalink / raw)
  To: Yugo Nagata <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers

Hi,

On Tue, 24 Jun 2025 17:05:33 +0900
Yugo Nagata <[email protected]> wrote:

> Instead, I'm thinking of an alternative approach: expanding the expression
> at the time statistics are collected.

I've attached a new patch in this approache.

This allows to collect statistics on virtual generated columns.

During ANALYZE, generation expressions are expanded, and statistics are computed
using compute_expr_stats(). To support this, both compute_expr_stats() and AnlExprData
are now exported from extended_stats.c. However, since they are no longer specific
to extended statistics, it might be better to move them to analyze.c and vacuum.h.

To enable the optimizer to make use of these statistics, a new field named
virtual_gencols is added to RelOptInfo. This field holds the expressions of
virtual generated columns in the table. In examine_variable(), if an expression
in a WHERE clause matches a virtual generated column, the corresponding statistics
are used for that expression.

Example:

- Before applying the patch, the cardinality estimate is erroneous.

test=# create table t (i int, j int generated always as (i*10) virtual);
CREATE TABLE
test=# insert into t select generate_series(1,1000);
INSERT 0 1000
test=# insert into t select 1 from generate_series(1,1000);
INSERT 0 1000
test=# analyze t;
ANALYZE
test=# explain analyze select * from t where j = 10;
                                           QUERY PLAN                                            
-------------------------------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..0.02 rows=1 width=8) (actual time=0.031..0.806 rows=1001.00 loops=1)
   Filter: ((i * 10) = 10)
   Rows Removed by Filter: 999
   Buffers: shared hit=9
 Planning:
   Buffers: shared hit=10
 Planning Time: 0.299 ms
 Execution Time: 0.948 ms
(8 rows)


- After applying the patch, the cardinality estimate is correct.

test=# analyze t;
ANALYZE
test=# explain analyze select * from t where j = 10;
                                             QUERY PLAN                                              
-----------------------------------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..41.50 rows=1001 width=8) (actual time=0.034..0.871 rows=1001.00 loops=1)
   Filter: ((i * 10) = 10)
   Rows Removed by Filter: 999
   Buffers: shared hit=9
 Planning:
   Buffers: shared hit=6
 Planning Time: 0.374 ms
 Execution Time: 1.028 ms
(8 rows)


Note that the patch is still a work in progress, so documentation and tests are not included.

Regards,
Yugo Nagata


-- 
Yugo Nagata <[email protected]>


^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: Allow to collect statistics on virtual generated columns
@ 2025-08-08 03:21  Yugo Nagata <[email protected]>
  parent: Yugo Nagata <[email protected]>
  0 siblings, 1 reply; 21+ messages in thread

From: Yugo Nagata @ 2025-08-08 03:21 UTC (permalink / raw)
  To: Yugo Nagata <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers

On Fri, 1 Aug 2025 00:28:30 +0900
Yugo Nagata <[email protected]> wrote:

> Hi,
> 
> On Tue, 24 Jun 2025 17:05:33 +0900
> Yugo Nagata <[email protected]> wrote:
> 
> > Instead, I'm thinking of an alternative approach: expanding the expression
> > at the time statistics are collected.
> 
> I've attached a new patch in this approache.
> 
> This allows to collect statistics on virtual generated columns.
> 
> During ANALYZE, generation expressions are expanded, and statistics are computed
> using compute_expr_stats(). To support this, both compute_expr_stats() and AnlExprData
> are now exported from extended_stats.c. However, since they are no longer specific
> to extended statistics, it might be better to move them to analyze.c and vacuum.h.
> 
> To enable the optimizer to make use of these statistics, a new field named
> virtual_gencols is added to RelOptInfo. This field holds the expressions of
> virtual generated columns in the table. In examine_variable(), if an expression
> in a WHERE clause matches a virtual generated column, the corresponding statistics
> are used for that expression.
> 
> Example:
> 
> - Before applying the patch, the cardinality estimate is erroneous.
> 
> test=# create table t (i int, j int generated always as (i*10) virtual);
> CREATE TABLE
> test=# insert into t select generate_series(1,1000);
> INSERT 0 1000
> test=# insert into t select 1 from generate_series(1,1000);
> INSERT 0 1000
> test=# analyze t;
> ANALYZE
> test=# explain analyze select * from t where j = 10;
>                                            QUERY PLAN                                            
> -------------------------------------------------------------------------------------------------
>  Seq Scan on t  (cost=0.00..0.02 rows=1 width=8) (actual time=0.031..0.806 rows=1001.00 loops=1)
>    Filter: ((i * 10) = 10)
>    Rows Removed by Filter: 999
>    Buffers: shared hit=9
>  Planning:
>    Buffers: shared hit=10
>  Planning Time: 0.299 ms
>  Execution Time: 0.948 ms
> (8 rows)
> 
> 
> - After applying the patch, the cardinality estimate is correct.
> 
> test=# analyze t;
> ANALYZE
> test=# explain analyze select * from t where j = 10;
>                                              QUERY PLAN                                              
> -----------------------------------------------------------------------------------------------------
>  Seq Scan on t  (cost=0.00..41.50 rows=1001 width=8) (actual time=0.034..0.871 rows=1001.00 loops=1)
>    Filter: ((i * 10) = 10)
>    Rows Removed by Filter: 999
>    Buffers: shared hit=9
>  Planning:
>    Buffers: shared hit=6
>  Planning Time: 0.374 ms
>  Execution Time: 1.028 ms
> (8 rows)
> 
> 
> Note that the patch is still a work in progress, so documentation and tests are not included.

I've attached an updated patch.

I modified the documentation to remove the statement that virtual generated columns
do not have statistics.

In addition, I added a test to ensure that statistics on virtual generated columns
are available.

Regards,
Yugo Nagata

-- 
Yugo Nagata <[email protected]>


^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: Allow to collect statistics on virtual generated columns
@ 2025-08-20 05:10  Yugo Nagata <[email protected]>
  parent: Yugo Nagata <[email protected]>
  0 siblings, 1 reply; 21+ messages in thread

From: Yugo Nagata @ 2025-08-20 05:10 UTC (permalink / raw)
  To: Yugo Nagata <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers

On Fri, 8 Aug 2025 12:21:25 +0900
Yugo Nagata <[email protected]> wrote:

> On Fri, 1 Aug 2025 00:28:30 +0900
> Yugo Nagata <[email protected]> wrote:
> 
> > Hi,
> > 
> > On Tue, 24 Jun 2025 17:05:33 +0900
> > Yugo Nagata <[email protected]> wrote:
> > 
> > > Instead, I'm thinking of an alternative approach: expanding the expression
> > > at the time statistics are collected.
> > 
> > I've attached a new patch in this approache.
> > 
> > This allows to collect statistics on virtual generated columns.
> > 
> > During ANALYZE, generation expressions are expanded, and statistics are computed
> > using compute_expr_stats(). To support this, both compute_expr_stats() and AnlExprData
> > are now exported from extended_stats.c. However, since they are no longer specific
> > to extended statistics, it might be better to move them to analyze.c and vacuum.h.
> > 
> > To enable the optimizer to make use of these statistics, a new field named
> > virtual_gencols is added to RelOptInfo. This field holds the expressions of
> > virtual generated columns in the table. In examine_variable(), if an expression
> > in a WHERE clause matches a virtual generated column, the corresponding statistics
> > are used for that expression.
> > 
> > Example:
> > 
> > - Before applying the patch, the cardinality estimate is erroneous.
> > 
> > test=# create table t (i int, j int generated always as (i*10) virtual);
> > CREATE TABLE
> > test=# insert into t select generate_series(1,1000);
> > INSERT 0 1000
> > test=# insert into t select 1 from generate_series(1,1000);
> > INSERT 0 1000
> > test=# analyze t;
> > ANALYZE
> > test=# explain analyze select * from t where j = 10;
> >                                            QUERY PLAN                                            
> > -------------------------------------------------------------------------------------------------
> >  Seq Scan on t  (cost=0.00..0.02 rows=1 width=8) (actual time=0.031..0.806 rows=1001.00 loops=1)
> >    Filter: ((i * 10) = 10)
> >    Rows Removed by Filter: 999
> >    Buffers: shared hit=9
> >  Planning:
> >    Buffers: shared hit=10
> >  Planning Time: 0.299 ms
> >  Execution Time: 0.948 ms
> > (8 rows)
> > 
> > 
> > - After applying the patch, the cardinality estimate is correct.
> > 
> > test=# analyze t;
> > ANALYZE
> > test=# explain analyze select * from t where j = 10;
> >                                              QUERY PLAN                                              
> > -----------------------------------------------------------------------------------------------------
> >  Seq Scan on t  (cost=0.00..41.50 rows=1001 width=8) (actual time=0.034..0.871 rows=1001.00 loops=1)
> >    Filter: ((i * 10) = 10)
> >    Rows Removed by Filter: 999
> >    Buffers: shared hit=9
> >  Planning:
> >    Buffers: shared hit=6
> >  Planning Time: 0.374 ms
> >  Execution Time: 1.028 ms
> > (8 rows)
> > 
> > 
> > Note that the patch is still a work in progress, so documentation and tests are not included.
> 
> I've attached an updated patch.
> 
> I modified the documentation to remove the statement that virtual generated columns
> do not have statistics.
> 
> In addition, I added a test to ensure that statistics on virtual generated columns
> are available.

I've attached a rebased patch.

Regards,
Yugo Nagata

-- 
Yugo Nagata <[email protected]>


^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: Allow to collect statistics on virtual generated columns
@ 2025-09-02 07:33  Yugo Nagata <[email protected]>
  parent: Yugo Nagata <[email protected]>
  0 siblings, 1 reply; 21+ messages in thread

From: Yugo Nagata @ 2025-09-02 07:33 UTC (permalink / raw)
  To: Yugo Nagata <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers

On Wed, 20 Aug 2025 14:10:28 +0900
Yugo Nagata <[email protected]> wrote:

> On Fri, 8 Aug 2025 12:21:25 +0900
> Yugo Nagata <[email protected]> wrote:
> 
> > On Fri, 1 Aug 2025 00:28:30 +0900
> > Yugo Nagata <[email protected]> wrote:
> > 
> > > Hi,
> > > 
> > > On Tue, 24 Jun 2025 17:05:33 +0900
> > > Yugo Nagata <[email protected]> wrote:
> > > 
> > > > Instead, I'm thinking of an alternative approach: expanding the expression
> > > > at the time statistics are collected.
> > > 
> > > I've attached a new patch in this approache.
> > > 
> > > This allows to collect statistics on virtual generated columns.
> > > 
> > > During ANALYZE, generation expressions are expanded, and statistics are computed
> > > using compute_expr_stats(). To support this, both compute_expr_stats() and AnlExprData
> > > are now exported from extended_stats.c. However, since they are no longer specific
> > > to extended statistics, it might be better to move them to analyze.c and vacuum.h.
> > > 
> > > To enable the optimizer to make use of these statistics, a new field named
> > > virtual_gencols is added to RelOptInfo. This field holds the expressions of
> > > virtual generated columns in the table. In examine_variable(), if an expression
> > > in a WHERE clause matches a virtual generated column, the corresponding statistics
> > > are used for that expression.
> > > 
> > > Example:
> > > 
> > > - Before applying the patch, the cardinality estimate is erroneous.
> > > 
> > > test=# create table t (i int, j int generated always as (i*10) virtual);
> > > CREATE TABLE
> > > test=# insert into t select generate_series(1,1000);
> > > INSERT 0 1000
> > > test=# insert into t select 1 from generate_series(1,1000);
> > > INSERT 0 1000
> > > test=# analyze t;
> > > ANALYZE
> > > test=# explain analyze select * from t where j = 10;
> > >                                            QUERY PLAN                                            
> > > -------------------------------------------------------------------------------------------------
> > >  Seq Scan on t  (cost=0.00..0.02 rows=1 width=8) (actual time=0.031..0.806 rows=1001.00 loops=1)
> > >    Filter: ((i * 10) = 10)
> > >    Rows Removed by Filter: 999
> > >    Buffers: shared hit=9
> > >  Planning:
> > >    Buffers: shared hit=10
> > >  Planning Time: 0.299 ms
> > >  Execution Time: 0.948 ms
> > > (8 rows)
> > > 
> > > 
> > > - After applying the patch, the cardinality estimate is correct.
> > > 
> > > test=# analyze t;
> > > ANALYZE
> > > test=# explain analyze select * from t where j = 10;
> > >                                              QUERY PLAN                                              
> > > -----------------------------------------------------------------------------------------------------
> > >  Seq Scan on t  (cost=0.00..41.50 rows=1001 width=8) (actual time=0.034..0.871 rows=1001.00 loops=1)
> > >    Filter: ((i * 10) = 10)
> > >    Rows Removed by Filter: 999
> > >    Buffers: shared hit=9
> > >  Planning:
> > >    Buffers: shared hit=6
> > >  Planning Time: 0.374 ms
> > >  Execution Time: 1.028 ms
> > > (8 rows)
> > > 
> > > 
> > > Note that the patch is still a work in progress, so documentation and tests are not included.
> > 
> > I've attached an updated patch.
> > 
> > I modified the documentation to remove the statement that virtual generated columns
> > do not have statistics.
> > 
> > In addition, I added a test to ensure that statistics on virtual generated columns
> > are available.

I've attached a rebased patch.

Regards,
Yugo Nagata

-- 
Yugo Nagata <[email protected]>


^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: Allow to collect statistics on virtual generated columns
@ 2025-12-31 10:44  Yugo Nagata <[email protected]>
  parent: Yugo Nagata <[email protected]>
  0 siblings, 1 reply; 21+ messages in thread

From: Yugo Nagata @ 2025-12-31 10:44 UTC (permalink / raw)
  To: Yugo Nagata <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers

On Tue, 2 Sep 2025 16:33:41 +0900
Yugo Nagata <[email protected]> wrote:

> > > > This allows to collect statistics on virtual generated columns.
> > > > 
> > > > During ANALYZE, generation expressions are expanded, and statistics are computed
> > > > using compute_expr_stats(). To support this, both compute_expr_stats() and AnlExprData
> > > > are now exported from extended_stats.c. However, since they are no longer specific
> > > > to extended statistics, it might be better to move them to analyze.c and vacuum.h.
> > > > 
> > > > To enable the optimizer to make use of these statistics, a new field named
> > > > virtual_gencols is added to RelOptInfo. This field holds the expressions of
> > > > virtual generated columns in the table. In examine_variable(), if an expression
> > > > in a WHERE clause matches a virtual generated column, the corresponding statistics
> > > > are used for that expression.
> > > > 
> > > > Example:
> > > > 
> > > > - Before applying the patch, the cardinality estimate is erroneous.
> > > > 
> > > > test=# create table t (i int, j int generated always as (i*10) virtual);
> > > > CREATE TABLE
> > > > test=# insert into t select generate_series(1,1000);
> > > > INSERT 0 1000
> > > > test=# insert into t select 1 from generate_series(1,1000);
> > > > INSERT 0 1000
> > > > test=# analyze t;
> > > > ANALYZE
> > > > test=# explain analyze select * from t where j = 10;
> > > >                                            QUERY PLAN                                            
> > > > -------------------------------------------------------------------------------------------------
> > > >  Seq Scan on t  (cost=0.00..0.02 rows=1 width=8) (actual time=0.031..0.806 rows=1001.00 loops=1)
> > > >    Filter: ((i * 10) = 10)
> > > >    Rows Removed by Filter: 999
> > > >    Buffers: shared hit=9
> > > >  Planning:
> > > >    Buffers: shared hit=10
> > > >  Planning Time: 0.299 ms
> > > >  Execution Time: 0.948 ms
> > > > (8 rows)
> > > > 
> > > > 
> > > > - After applying the patch, the cardinality estimate is correct.
> > > > 
> > > > test=# analyze t;
> > > > ANALYZE
> > > > test=# explain analyze select * from t where j = 10;
> > > >                                              QUERY PLAN                                              
> > > > -----------------------------------------------------------------------------------------------------
> > > >  Seq Scan on t  (cost=0.00..41.50 rows=1001 width=8) (actual time=0.034..0.871 rows=1001.00 loops=1)
> > > >    Filter: ((i * 10) = 10)
> > > >    Rows Removed by Filter: 999
> > > >    Buffers: shared hit=9
> > > >  Planning:
> > > >    Buffers: shared hit=6
> > > >  Planning Time: 0.374 ms
> > > >  Execution Time: 1.028 ms
> > > > (8 rows)
> > > > 
> > > > 
> > > > Note that the patch is still a work in progress, so documentation and tests are not included.
> > > 
> > > I've attached an updated patch.
> > > 
> > > I modified the documentation to remove the statement that virtual generated columns
> > > do not have statistics.
> > > 
> > > In addition, I added a test to ensure that statistics on virtual generated columns
> > > are available.
> 

I've attached an updated patch that fixes the broken test since 10c4fe074a.

Regards,
Yugo Nagata

-- 
Yugo Nagata <[email protected]>


^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: Allow to collect statistics on virtual generated columns
@ 2026-03-24 09:18  Dean Rasheed <[email protected]>
  parent: Yugo Nagata <[email protected]>
  0 siblings, 1 reply; 21+ messages in thread

From: Dean Rasheed @ 2026-03-24 09:18 UTC (permalink / raw)
  To: Yugo Nagata <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers

On Wed, 31 Dec 2025 at 10:45, Yugo Nagata <[email protected]> wrote:
>
> I've attached an updated patch that fixes the broken test since 10c4fe074a.
>

I took a look at this, and the patch appears to work as intended.
However, I have my doubts as to whether this is the best approach.

Building stats on a virtual generated column is potentially quite
expensive, and not something that everyone will want, so I think this
really should be an optional feature that people can selectively
enable, if they want.

Therefore, I think that the previous approach was probably better. If
I'm understanding it correctly, that allowed

  CREATE STATISTICS stat_name ON virt_col FROM tbl;

as well as allowing statistics to be built on expressions including
virtual generated columns, making it more flexible.

The problem with this previous approach was that it didn't work
correctly if a virtual generated column's expression was changed using
ALTER TABLE ... SET EXPRESSION. I think that could be solved by
expanding generated column expressions at ANALYZE time, rather than at
CREATE STATISTICS time. So then the expression stored in the catalogs
would be one referring to virtual generated columns, not their
expanded forms (compare a CHECK constraint referring to a virtual
generated column).

Regards,
Dean





^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: Allow to collect statistics on virtual generated columns
@ 2026-03-24 11:43  Dean Rasheed <[email protected]>
  parent: Dean Rasheed <[email protected]>
  0 siblings, 1 reply; 21+ messages in thread

From: Dean Rasheed @ 2026-03-24 11:43 UTC (permalink / raw)
  To: Yugo Nagata <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers

On Tue, 24 Mar 2026 at 10:57, Yugo Nagata <[email protected]> wrote:
>
> One concern is that CREATE STATISTICS does not allow creating extended
> statistics on a single colmun. If we try this, it raises the following error:
>
>  ERROR:  extended statistics require at least 2 columns
>
> Therefore, if we take this approach, we would need to add an exception
> to this rule for virtual columns.

I think that would be fine. It makes sense because a virtual column
expands to an expression, and we already allow CREATE STATISTICS on a
single expression.

Regards,
Dean





^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: Allow to collect statistics on virtual generated columns
@ 2026-03-26 16:00  Dean Rasheed <[email protected]>
  parent: Dean Rasheed <[email protected]>
  0 siblings, 2 replies; 21+ messages in thread

From: Dean Rasheed @ 2026-03-26 16:00 UTC (permalink / raw)
  To: Yugo Nagata <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers

On Thu, 26 Mar 2026 at 15:09, Yugo Nagata <[email protected]> wrote:
>
> I've attached an updated patch including the documentation and tests.

I think this is a much better approach.

One thing that stands out in CreateStatistics() is that the check for
a less-than operator can be skipped if there is just a single virtual
generated column, for the same reason as for statistics on a single
expression. I.e., it should be possible to build statistics on a
single virtual generated column of any data type.

Regards,
Dean





^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: Allow to collect statistics on virtual generated columns
@ 2026-03-26 17:25  Dean Rasheed <[email protected]>
  parent: Dean Rasheed <[email protected]>
  1 sibling, 0 replies; 21+ messages in thread

From: Dean Rasheed @ 2026-03-26 17:25 UTC (permalink / raw)
  To: Yugo Nagata <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers

On Thu, 26 Mar 2026 at 16:00, Dean Rasheed <[email protected]> wrote:
>
> On Thu, 26 Mar 2026 at 15:09, Yugo Nagata <[email protected]> wrote:
> >
> > I've attached an updated patch including the documentation and tests.

Looking at get_relation_statistics(), I think that you need to call
expand_generated_columns_in_expr() *before* ChangeVarNodes() so that
Vars in the expanded expression end up with the correct varno.

This obviously affects queries with more than one table in the FROM
clause, e.g.:

drop table if exists foo;
create table foo (a int, b int generated always as (a*2) virtual);
insert into foo select x from generate_series(1,10) x;
insert into foo select 100 from generate_series(1,500);
create statistics s on b from foo;
analyse foo;
explain select * from foo f1, foo f2 where f1.b = 200 and f2.b = 200;

                            QUERY PLAN
-------------------------------------------------------------------
 Nested Loop  (cost=0.00..47.56 rows=1500 width=16)
   ->  Seq Scan on foo f1  (cost=0.00..10.65 rows=500 width=4)
         Filter: ((a * 2) = 200)
   ->  Materialize  (cost=0.00..10.66 rows=3 width=4)
         ->  Seq Scan on foo f2  (cost=0.00..10.65 rows=3 width=4)
               Filter: ((a * 2) = 200)
(6 rows)

Regards,
Dean





^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: Allow to collect statistics on virtual generated columns
@ 2026-03-26 17:44  Dean Rasheed <[email protected]>
  parent: Dean Rasheed <[email protected]>
  1 sibling, 1 reply; 21+ messages in thread

From: Dean Rasheed @ 2026-03-26 17:44 UTC (permalink / raw)
  To: Yugo Nagata <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers

On Thu, 26 Mar 2026 at 17:18, Yugo Nagata <[email protected]> wrote:
>
> On Thu, 26 Mar 2026 16:00:38 +0000
> Dean Rasheed <[email protected]> wrote:
>
> > On Thu, 26 Mar 2026 at 15:09, Yugo Nagata <[email protected]> wrote:
> > >
> > > I've attached an updated patch including the documentation and tests.
> >
> > I think this is a much better approach.
> >
> > One thing that stands out in CreateStatistics() is that the check for
> > a less-than operator can be skipped if there is just a single virtual
> > generated column, for the same reason as for statistics on a single
> > expression. I.e., it should be possible to build statistics on a
> > single virtual generated column of any data type.
>
> I've attached a revised patch to skip the less-than operator check
> for a single virtual generated column.
>
> In fact, this change skips the check for any single column,
> not just virtual generated columns. However, using a non-virtual
> single column will result in an error elsewhere.

Makes sense. I think the comments could just be simplified to say
"Disallow data types without a less-than operator in multivariate
statistics", and add the word "multivariate" to the error message, as
in the expression case.

Regards,
Dean





^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: Allow to collect statistics on virtual generated columns
@ 2026-03-29 10:42  Dean Rasheed <[email protected]>
  parent: Dean Rasheed <[email protected]>
  0 siblings, 1 reply; 21+ messages in thread

From: Dean Rasheed @ 2026-03-29 10:42 UTC (permalink / raw)
  To: Yugo Nagata <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers

On Fri, 27 Mar 2026 at 07:44, Yugo Nagata <[email protected]> wrote:
>
> Sorry, I forgot to update the test along with the previous fix.
> I've attached a revised patch including the updated test.
>

I did some more testing of this (things like dependencies, permission
checks in the selectivity functions, and dump/restore), and everything
worked as expected.

The code changes look good. It's nice that it worked out to be quite a
small, neat patch, and I think it's more-or-less ready for commit,
though I did find a few minor issues:

I noted that ALTER TABLE ... SET STATISTICS doesn't work for a virtual
generated column. I think that's OK, because we don't have statistics
by default on such columns, and ALTER STATISTICS ... SET STATISTICS
can be used instead if extended statistics are added. We could perhaps
relax that restriction in a future patch, but I don't think it's
necessary right now. However, I do think it's worth documenting the
limitation, so I'm attaching an update doing that.

In create_statistics.sgml, there were a number of other things that
needed updating. Please check the attached updates.

I found another comment in CreateStatistics() that needed updating,
and some indentation needed fixing.

In the regression tests, I reduced the size of the new test table --
given the recent discussion around how long the tests take to run, we
don't want to make that worse.

I also added a couple more simple tests, just to confirm that
univariate statistics work correctly, and also fixed the test
statistics "virtual_gen_stats_single_without_less_than", which was
built on the wrong column.

If you're happy with those changes, I'll push this.

Regards,
Dean


Attachments:

  [text/x-patch] v12-0001-Add-support-for-extended-statistics-on-virtual-g.patch (26.1K, 2-v12-0001-Add-support-for-extended-statistics-on-virtual-g.patch)
  download | inline diff:
From 53f933e3ade0fd2a76fb399949c4ba39020113be Mon Sep 17 00:00:00 2001
From: Yugo Nagata <[email protected]>
Date: Fri, 18 Jul 2025 09:58:56 +0900
Subject: [PATCH v12] Add support for extended statistics on virtual generated
 columns.

This allows both univariate and multivariate statistics to be built on
virtual generated columns and expressions that refer to virtual
generated columns. The restriction disallowing extended statistics on
a single column is lifted in the case of a single virtual generated
column, since it is treated as a single expression.

In the catalogs, references to virtual generated columns are stored
as-is. They are expanded at ANALYZE time to build the statistics, and
at planning time to allow the optimizer to make use of the statistics.
This allows the statistics to be correctly rebuilt using ANALYZE, if a
column's generation expression is altered (which causes any existing
statistics data to be deleted).

Author: Yugo Nagata <[email protected]>
Reviewed-by: Dean Rasheed <[email protected]>
Discussion: https://postgr.es/m/[email protected]
---
 doc/src/sgml/ref/alter_table.sgml       |  12 ++-
 doc/src/sgml/ref/create_statistics.sgml |  30 ++++---
 src/backend/commands/statscmds.c        | 101 ++++++++++++++----------
 src/backend/optimizer/util/plancat.c    |   5 ++
 src/backend/statistics/extended_stats.c |  15 +++-
 src/test/regress/expected/stats_ext.out |  89 ++++++++++++++++-----
 src/test/regress/sql/stats_ext.sql      |  52 ++++++++----
 7 files changed, 212 insertions(+), 92 deletions(-)

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 8591a6b5014..453395c5c73 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -217,7 +217,8 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
       so running <link linkend="sql-analyze"><command>ANALYZE</command></link>
       on the table afterwards is recommended.
       For a virtual generated column, <command>ANALYZE</command>
-      is not necessary because such columns never have statistics.
+      is not necessary unless extended statistics are defined on it,
+      since such columns do not have statistics by default.
      </para>
     </listitem>
    </varlistentry>
@@ -289,7 +290,8 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
       <link linkend="sql-analyze"><command>ANALYZE</command></link>
       on the table afterwards is recommended.
       For a virtual generated column, <command>ANALYZE</command>
-      is not necessary because such columns never have statistics.
+      is not necessary unless extended statistics are defined on it,
+      since such columns do not have statistics by default.
      </para>
     </listitem>
    </varlistentry>
@@ -368,6 +370,12 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
       <productname>PostgreSQL</productname> query planner, refer to
       <xref linkend="planner-stats"/>.
      </para>
+     <para>
+      This form is not supported on virtual generated columns, since such
+      columns do not have statistics by default.  If extended statistics are
+      defined on such columns, the statistics-gathering target may be set on
+      the extended statistics object using <xref linkend="sql-alterstatistics"/>.
+     </para>
      <para>
       <literal>SET STATISTICS</literal> acquires a
       <literal>SHARE UPDATE EXCLUSIVE</literal> lock.
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
index d6b25ed2c9b..678efe1eb43 100644
--- a/doc/src/sgml/ref/create_statistics.sgml
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -22,7 +22,7 @@ PostgreSQL documentation
  <refsynopsisdiv>
 <synopsis>
 CREATE STATISTICS [ [ IF NOT EXISTS ] <replaceable class="parameter">statistics_name</replaceable> ]
-    ON ( <replaceable class="parameter">expression</replaceable> )
+    ON { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) }
     FROM <replaceable class="parameter">table_name</replaceable>
 
 CREATE STATISTICS [ [ IF NOT EXISTS ] <replaceable class="parameter">statistics_name</replaceable> ]
@@ -45,7 +45,8 @@ CREATE STATISTICS [ [ IF NOT EXISTS ] <replaceable class="parameter">statistics_
 
   <para>
    The <command>CREATE STATISTICS</command> command has two basic forms. The
-   first form allows univariate statistics for a single expression to be
+   first form allows univariate statistics for a single expression
+   or virtual generated column to be
    collected, providing benefits similar to an expression index without the
    overhead of index maintenance.  This form does not allow the statistics
    kind to be specified, since the various statistics kinds refer only to
@@ -53,7 +54,7 @@ CREATE STATISTICS [ [ IF NOT EXISTS ] <replaceable class="parameter">statistics_
    multivariate statistics on multiple columns and/or expressions to be
    collected, optionally specifying which statistics kinds to include.  This
    form will also automatically cause univariate statistics to be collected on
-   any expressions included in the list.
+   any expressions and virtual generated columns included in the list.
   </para>
 
   <para>
@@ -109,7 +110,8 @@ CREATE STATISTICS [ [ IF NOT EXISTS ] <replaceable class="parameter">statistics_
       If this clause is omitted, all supported statistics kinds are
       included in the statistics object. Univariate expression statistics are
       built automatically if the statistics definition includes any complex
-      expressions rather than just simple column references.
+      expressions or references to virtual generated columns
+      rather than just simple column references.
       For more information, see <xref linkend="planner-stats-extended"/>
       and <xref linkend="multivariate-statistics-examples"/>.
      </para>
@@ -121,9 +123,16 @@ CREATE STATISTICS [ [ IF NOT EXISTS ] <replaceable class="parameter">statistics_
     <listitem>
      <para>
       The name of a table column to be covered by the computed statistics.
-      This is only allowed when building multivariate statistics.  At least
-      two column names or expressions must be specified, and their order is
-      not significant.
+      This may be used to build univariate statistics on a single virtual
+      generated column, or as part of a list of multiple columns (stored or
+      virtual) and/or expressions to build multivariate statistics.  In the
+      latter case, separate univariate statistics are built automatically for
+      each expression and virtual generated column in the list.
+     </para>
+     <para>
+      Defining extended statistics on a single <emphasis>stored</emphasis>
+      column is not supported or necessary, because statistics are built
+      automatically on such columns.
      </para>
     </listitem>
    </varlistentry>
@@ -136,7 +145,8 @@ CREATE STATISTICS [ [ IF NOT EXISTS ] <replaceable class="parameter">statistics_
       used to build univariate statistics on a single expression, or as part
       of a list of multiple column names and/or expressions to build
       multivariate statistics.  In the latter case, separate univariate
-      statistics are built automatically for each expression in the list.
+      statistics are built automatically for each expression and virtual
+      generated column in the list.
      </para>
     </listitem>
    </varlistentry>
@@ -169,7 +179,9 @@ CREATE STATISTICS [ [ IF NOT EXISTS ] <replaceable class="parameter">statistics_
    Expression statistics are per-expression and are similar to creating an
    index on the expression, except that they avoid the overhead of index
    maintenance. Expression statistics are built automatically for each
-   expression in the statistics object definition.
+   expression in the statistics object definition.  Extended statistics on
+   a virtual generated column behave the same as expression statistics on the
+   column's generation expression.
   </para>
 
   <para>
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index c1da79f36ba..765395b4b6a 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -28,6 +28,7 @@
 #include "commands/comment.h"
 #include "commands/defrem.h"
 #include "miscadmin.h"
+#include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/optimizer.h"
 #include "statistics/statistics.h"
@@ -232,7 +233,8 @@ CreateStatistics(CreateStatsStmt *stmt, bool check_rights)
 	 * Convert the expression list to a simple array of attnums, but also keep
 	 * a list of more complex expressions.  While at it, enforce some
 	 * constraints - we don't allow extended statistics on system attributes,
-	 * and we require the data type to have a less-than operator.
+	 * and we require the data type to have a less-than operator, if we're
+	 * building multivariate statistics.
 	 *
 	 * There are many ways to "mask" a simple attribute reference as an
 	 * expression, for example "(a+0)" etc. We can't possibly detect all of
@@ -268,22 +270,38 @@ CreateStatistics(CreateStatsStmt *stmt, bool check_rights)
 						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 						 errmsg("statistics creation on system columns is not supported")));
 
-			/* Disallow use of virtual generated columns in extended stats */
-			if (attForm->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
-				ereport(ERROR,
-						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						 errmsg("statistics creation on virtual generated columns is not supported")));
-
-			/* Disallow data types without a less-than operator */
-			type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR);
-			if (type->lt_opr == InvalidOid)
-				ereport(ERROR,
-						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						 errmsg("column \"%s\" cannot be used in statistics because its type %s has no default btree operator class",
-								attname, format_type_be(attForm->atttypid))));
+			/*
+			 * Disallow data types without a less-than operator in
+			 * multivariate statistics.
+			 */
+			if (list_length(stmt->exprs) > 1)
+			{
+				type = lookup_type_cache(attForm->atttypid, TYPECACHE_LT_OPR);
+				if (type->lt_opr == InvalidOid)
+					ereport(ERROR,
+							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+							 errmsg("column \"%s\" cannot be used in multivariate statistics because its type %s has no default btree operator class",
+									attname, format_type_be(attForm->atttypid))));
+			}
 
-			attnums[nattnums] = attForm->attnum;
-			nattnums++;
+			/* Treat virtual generated columns as expressions */
+			if (attForm->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
+			{
+				Node	   *expr;
+
+				expr = (Node *) makeVar(1,
+										attForm->attnum,
+										attForm->atttypid,
+										attForm->atttypmod,
+										attForm->attcollation,
+										0);
+				stxexprs = lappend(stxexprs, expr);
+			}
+			else
+			{
+				attnums[nattnums] = attForm->attnum;
+				nattnums++;
+			}
 			ReleaseSysCache(atttuple);
 		}
 		else if (IsA(selem->expr, Var)) /* column reference in parens */
@@ -297,22 +315,30 @@ CreateStatistics(CreateStatsStmt *stmt, bool check_rights)
 						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 						 errmsg("statistics creation on system columns is not supported")));
 
-			/* Disallow use of virtual generated columns in extended stats */
-			if (get_attgenerated(relid, var->varattno) == ATTRIBUTE_GENERATED_VIRTUAL)
-				ereport(ERROR,
-						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						 errmsg("statistics creation on virtual generated columns is not supported")));
-
-			/* Disallow data types without a less-than operator */
-			type = lookup_type_cache(var->vartype, TYPECACHE_LT_OPR);
-			if (type->lt_opr == InvalidOid)
-				ereport(ERROR,
-						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						 errmsg("column \"%s\" cannot be used in statistics because its type %s has no default btree operator class",
-								get_attname(relid, var->varattno, false), format_type_be(var->vartype))));
+			/*
+			 * Disallow data types without a less-than operator in
+			 * multivariate statistics.
+			 */
+			if (list_length(stmt->exprs) > 1)
+			{
+				type = lookup_type_cache(var->vartype, TYPECACHE_LT_OPR);
+				if (type->lt_opr == InvalidOid)
+					ereport(ERROR,
+							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+							 errmsg("column \"%s\" cannot be used in multivariate statistics because its type %s has no default btree operator class",
+									get_attname(relid, var->varattno, false), format_type_be(var->vartype))));
+			}
 
-			attnums[nattnums] = var->varattno;
-			nattnums++;
+			/* Treat virtual generated columns as expressions */
+			if (get_attgenerated(relid, var->varattno) == ATTRIBUTE_GENERATED_VIRTUAL)
+			{
+				stxexprs = lappend(stxexprs, (Node *) var);
+			}
+			else
+			{
+				attnums[nattnums] = var->varattno;
+				nattnums++;
+			}
 		}
 		else					/* expression */
 		{
@@ -336,20 +362,11 @@ CreateStatistics(CreateStatsStmt *stmt, bool check_rights)
 					ereport(ERROR,
 							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 							 errmsg("statistics creation on system columns is not supported")));
-
-				/* Disallow use of virtual generated columns in extended stats */
-				if (get_attgenerated(relid, attnum) == ATTRIBUTE_GENERATED_VIRTUAL)
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("statistics creation on virtual generated columns is not supported")));
 			}
 
 			/*
-			 * Disallow data types without a less-than operator.
-			 *
-			 * We ignore this for statistics on a single expression, in which
-			 * case we'll build the regular statistics only (and that code can
-			 * deal with such data types).
+			 * Disallow data types without a less-than operator in
+			 * multivariate statistics.
 			 */
 			if (list_length(stmt->exprs) > 1)
 			{
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index b2fbd6a082b..8b7c9a9f2e4 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1792,6 +1792,11 @@ get_relation_statistics(PlannerInfo *root, RelOptInfo *rel,
 				exprs = (List *) stringToNode(exprsString);
 				pfree(exprsString);
 
+				/*
+				 * Expand virtual generated columns in the expressions.
+				 */
+				exprs = (List *) expand_generated_columns_in_expr((Node *) exprs, relation, 1);
+
 				/*
 				 * Modify the copies we obtain from the relcache to have the
 				 * correct varno for the parent relation, so that they match
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index 334c6498581..9225a168a8a 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -32,6 +32,7 @@
 #include "parser/parsetree.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
+#include "rewrite/rewriteHandler.h"
 #include "statistics/extended_stats_internal.h"
 #include "statistics/statistics.h"
 #include "utils/acl.h"
@@ -73,7 +74,7 @@ typedef struct StatExtEntry
 } StatExtEntry;
 
 
-static List *fetch_statentries_for_relation(Relation pg_statext, Oid relid);
+static List *fetch_statentries_for_relation(Relation pg_statext, Relation rel);
 static VacAttrStats **lookup_var_attr_stats(Bitmapset *attrs, List *exprs,
 											int nvacatts, VacAttrStats **vacatts);
 static void statext_store(Oid statOid, bool inh,
@@ -125,7 +126,7 @@ BuildRelationExtStatistics(Relation onerel, bool inh, double totalrows,
 
 	/* the list of stats has to be allocated outside the memory context */
 	pg_stext = table_open(StatisticExtRelationId, RowExclusiveLock);
-	statslist = fetch_statentries_for_relation(pg_stext, RelationGetRelid(onerel));
+	statslist = fetch_statentries_for_relation(pg_stext, onerel);
 
 	/* memory context for building each statistics object */
 	cxt = AllocSetContextCreate(CurrentMemoryContext,
@@ -279,7 +280,7 @@ ComputeExtStatisticsRows(Relation onerel,
 	oldcxt = MemoryContextSwitchTo(cxt);
 
 	pg_stext = table_open(StatisticExtRelationId, RowExclusiveLock);
-	lstats = fetch_statentries_for_relation(pg_stext, RelationGetRelid(onerel));
+	lstats = fetch_statentries_for_relation(pg_stext, onerel);
 
 	foreach(lc, lstats)
 	{
@@ -416,12 +417,13 @@ statext_is_kind_built(HeapTuple htup, char type)
  * Return a list (of StatExtEntry) of statistics objects for the given relation.
  */
 static List *
-fetch_statentries_for_relation(Relation pg_statext, Oid relid)
+fetch_statentries_for_relation(Relation pg_statext, Relation rel)
 {
 	SysScanDesc scan;
 	ScanKeyData skey;
 	HeapTuple	htup;
 	List	   *result = NIL;
+	Oid			relid = RelationGetRelid(rel);
 
 	/*
 	 * Prepare to scan pg_statistic_ext for entries having stxrelid = this
@@ -491,6 +493,11 @@ fetch_statentries_for_relation(Relation pg_statext, Oid relid)
 
 			pfree(exprsString);
 
+			/*
+			 * Expand virtual generated columns in the expressions.
+			 */
+			exprs = (List *) expand_generated_columns_in_expr((Node *) exprs, rel, 1);
+
 			/*
 			 * Run the expressions through eval_const_expressions. This is not
 			 * just an optimization, but is necessary, because the planner
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index b6431d1ee95..8b0f96a2d2e 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -25,7 +25,7 @@ begin
 end;
 $$;
 -- Verify failures
-CREATE TABLE ext_stats_test (x text, y int, z int);
+CREATE TABLE ext_stats_test (x text, y int, z int, w xid);
 CREATE STATISTICS tst;
 ERROR:  syntax error at or near ";"
 LINE 1: CREATE STATISTICS tst;
@@ -84,30 +84,19 @@ CREATE STATISTICS tst ON (x, y) FROM ext_stats_test; -- tuple expression
 ERROR:  syntax error at or near ","
 LINE 1: CREATE STATISTICS tst ON (x, y) FROM ext_stats_test;
                                    ^
-DROP TABLE ext_stats_test;
--- statistics on virtual generated column not allowed
-CREATE TABLE ext_stats_test1 (x int, y int, z int GENERATED ALWAYS AS (x+y) VIRTUAL, w xid);
-CREATE STATISTICS tst on z from ext_stats_test1;
-ERROR:  statistics creation on virtual generated columns is not supported
-CREATE STATISTICS tst on (z) from ext_stats_test1;
-ERROR:  statistics creation on virtual generated columns is not supported
-CREATE STATISTICS tst on (z+1) from ext_stats_test1;
-ERROR:  statistics creation on virtual generated columns is not supported
-CREATE STATISTICS tst (ndistinct) ON z from ext_stats_test1;
-ERROR:  statistics creation on virtual generated columns is not supported
 -- statistics on system column not allowed
-CREATE STATISTICS tst on tableoid from ext_stats_test1;
+CREATE STATISTICS tst on tableoid from ext_stats_test;
 ERROR:  statistics creation on system columns is not supported
-CREATE STATISTICS tst on (tableoid) from ext_stats_test1;
+CREATE STATISTICS tst on (tableoid) from ext_stats_test;
 ERROR:  statistics creation on system columns is not supported
-CREATE STATISTICS tst on (tableoid::int+1) from ext_stats_test1;
+CREATE STATISTICS tst on (tableoid::int+1) from ext_stats_test;
 ERROR:  statistics creation on system columns is not supported
-CREATE STATISTICS tst (ndistinct) ON xmin from ext_stats_test1;
+CREATE STATISTICS tst (ndistinct) ON xmin from ext_stats_test;
 ERROR:  statistics creation on system columns is not supported
--- statistics without a less-than operator not supported
-CREATE STATISTICS tst (ndistinct) ON w from ext_stats_test1;
-ERROR:  column "w" cannot be used in statistics because its type xid has no default btree operator class
-DROP TABLE ext_stats_test1;
+-- multivariate statistics without a less-than operator not supported
+CREATE STATISTICS tst (ndistinct) ON x, w from ext_stats_test;
+ERROR:  column "w" cannot be used in multivariate statistics because its type xid has no default btree operator class
+DROP TABLE ext_stats_test;
 -- Ensure stats are dropped sanely, and test IF NOT EXISTS while at it
 CREATE TABLE ab1 (a INTEGER, b INTEGER, c INTEGER);
 CREATE STATISTICS IF NOT EXISTS ab1_a_b_stats ON a, b FROM ab1;
@@ -3153,6 +3142,66 @@ SELECT c0 FROM ONLY expr_stats_incompatible_test WHERE
 (0 rows)
 
 DROP TABLE expr_stats_incompatible_test;
+-- multivariate statistics on virtual generated columns
+CREATE TABLE virtual_gen_stats (a int, b int, c int GENERATED ALWAYS AS (2*a), d int GENERATED ALWAYS AS (a+b), w xid GENERATED ALWAYS AS (a::text::xid));
+INSERT INTO virtual_gen_stats SELECT mod(i,10), mod(i,10) FROM generate_series(1,100) s(i);
+ANALYZE virtual_gen_stats;
+SELECT * FROM check_estimated_rows('SELECT * FROM virtual_gen_stats WHERE c = 0 AND (3*b) = 0');
+ estimated | actual 
+-----------+--------
+         1 |     10
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM virtual_gen_stats WHERE d = 0 AND (d-2*a) = 0');
+ estimated | actual 
+-----------+--------
+         1 |     10
+(1 row)
+
+CREATE STATISTICS virtual_gen_stats_1 (mcv) ON c, (3*b), d, (d-2*a) FROM virtual_gen_stats;
+ANALYZE virtual_gen_stats;
+SELECT * FROM check_estimated_rows('SELECT * FROM virtual_gen_stats WHERE c = 0 AND (3*b) = 0');
+ estimated | actual 
+-----------+--------
+        10 |     10
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM virtual_gen_stats WHERE d = 0 AND (d-2*a) = 0');
+ estimated | actual 
+-----------+--------
+        10 |     10
+(1 row)
+
+-- univariate statistics on individual virtual generated columns
+DROP STATISTICS virtual_gen_stats_1;
+SELECT * FROM check_estimated_rows('SELECT * FROM virtual_gen_stats WHERE c = 0');
+ estimated | actual 
+-----------+--------
+         1 |     10
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM virtual_gen_stats WHERE w = 0');
+ estimated | actual 
+-----------+--------
+         1 |     10
+(1 row)
+
+CREATE STATISTICS virtual_gen_stats_single ON c FROM virtual_gen_stats;
+CREATE STATISTICS virtual_gen_stats_single_without_less_than ON w FROM virtual_gen_stats;
+ANALYZE virtual_gen_stats;
+SELECT * FROM check_estimated_rows('SELECT * FROM virtual_gen_stats WHERE c = 0');
+ estimated | actual 
+-----------+--------
+        10 |     10
+(1 row)
+
+SELECT * FROM check_estimated_rows('SELECT * FROM virtual_gen_stats WHERE w = 0');
+ estimated | actual 
+-----------+--------
+        10 |     10
+(1 row)
+
+DROP TABLE virtual_gen_stats;
 -- Permission tests. Users should not be able to see specific data values in
 -- the extended statistics, if they lack permission to see those values in
 -- the underlying table.
diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql
index 9dcce3440c8..72602e1e995 100644
--- a/src/test/regress/sql/stats_ext.sql
+++ b/src/test/regress/sql/stats_ext.sql
@@ -28,7 +28,7 @@ end;
 $$;
 
 -- Verify failures
-CREATE TABLE ext_stats_test (x text, y int, z int);
+CREATE TABLE ext_stats_test (x text, y int, z int, w xid);
 CREATE STATISTICS tst;
 CREATE STATISTICS tst ON a, b;
 CREATE STATISTICS tst FROM sometab;
@@ -56,21 +56,14 @@ DROP FUNCTION tftest;
 CREATE STATISTICS tst ON (y) FROM ext_stats_test; -- single column reference
 CREATE STATISTICS tst ON y + z FROM ext_stats_test; -- missing parentheses
 CREATE STATISTICS tst ON (x, y) FROM ext_stats_test; -- tuple expression
-DROP TABLE ext_stats_test;
--- statistics on virtual generated column not allowed
-CREATE TABLE ext_stats_test1 (x int, y int, z int GENERATED ALWAYS AS (x+y) VIRTUAL, w xid);
-CREATE STATISTICS tst on z from ext_stats_test1;
-CREATE STATISTICS tst on (z) from ext_stats_test1;
-CREATE STATISTICS tst on (z+1) from ext_stats_test1;
-CREATE STATISTICS tst (ndistinct) ON z from ext_stats_test1;
 -- statistics on system column not allowed
-CREATE STATISTICS tst on tableoid from ext_stats_test1;
-CREATE STATISTICS tst on (tableoid) from ext_stats_test1;
-CREATE STATISTICS tst on (tableoid::int+1) from ext_stats_test1;
-CREATE STATISTICS tst (ndistinct) ON xmin from ext_stats_test1;
--- statistics without a less-than operator not supported
-CREATE STATISTICS tst (ndistinct) ON w from ext_stats_test1;
-DROP TABLE ext_stats_test1;
+CREATE STATISTICS tst on tableoid from ext_stats_test;
+CREATE STATISTICS tst on (tableoid) from ext_stats_test;
+CREATE STATISTICS tst on (tableoid::int+1) from ext_stats_test;
+CREATE STATISTICS tst (ndistinct) ON xmin from ext_stats_test;
+-- multivariate statistics without a less-than operator not supported
+CREATE STATISTICS tst (ndistinct) ON x, w from ext_stats_test;
+DROP TABLE ext_stats_test;
 
 -- Ensure stats are dropped sanely, and test IF NOT EXISTS while at it
 CREATE TABLE ab1 (a INTEGER, b INTEGER, c INTEGER);
@@ -1584,6 +1577,35 @@ SELECT c0 FROM ONLY expr_stats_incompatible_test WHERE
 
 DROP TABLE expr_stats_incompatible_test;
 
+-- multivariate statistics on virtual generated columns
+CREATE TABLE virtual_gen_stats (a int, b int, c int GENERATED ALWAYS AS (2*a), d int GENERATED ALWAYS AS (a+b), w xid GENERATED ALWAYS AS (a::text::xid));
+INSERT INTO virtual_gen_stats SELECT mod(i,10), mod(i,10) FROM generate_series(1,100) s(i);
+ANALYZE virtual_gen_stats;
+
+SELECT * FROM check_estimated_rows('SELECT * FROM virtual_gen_stats WHERE c = 0 AND (3*b) = 0');
+SELECT * FROM check_estimated_rows('SELECT * FROM virtual_gen_stats WHERE d = 0 AND (d-2*a) = 0');
+
+CREATE STATISTICS virtual_gen_stats_1 (mcv) ON c, (3*b), d, (d-2*a) FROM virtual_gen_stats;
+ANALYZE virtual_gen_stats;
+
+SELECT * FROM check_estimated_rows('SELECT * FROM virtual_gen_stats WHERE c = 0 AND (3*b) = 0');
+SELECT * FROM check_estimated_rows('SELECT * FROM virtual_gen_stats WHERE d = 0 AND (d-2*a) = 0');
+
+-- univariate statistics on individual virtual generated columns
+DROP STATISTICS virtual_gen_stats_1;
+
+SELECT * FROM check_estimated_rows('SELECT * FROM virtual_gen_stats WHERE c = 0');
+SELECT * FROM check_estimated_rows('SELECT * FROM virtual_gen_stats WHERE w = 0');
+
+CREATE STATISTICS virtual_gen_stats_single ON c FROM virtual_gen_stats;
+CREATE STATISTICS virtual_gen_stats_single_without_less_than ON w FROM virtual_gen_stats;
+ANALYZE virtual_gen_stats;
+
+SELECT * FROM check_estimated_rows('SELECT * FROM virtual_gen_stats WHERE c = 0');
+SELECT * FROM check_estimated_rows('SELECT * FROM virtual_gen_stats WHERE w = 0');
+
+DROP TABLE virtual_gen_stats;
+
 -- Permission tests. Users should not be able to see specific data values in
 -- the extended statistics, if they lack permission to see those values in
 -- the underlying table.
-- 
2.51.0



^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: Allow to collect statistics on virtual generated columns
@ 2026-04-01 16:12  Dean Rasheed <[email protected]>
  parent: Dean Rasheed <[email protected]>
  0 siblings, 1 reply; 21+ messages in thread

From: Dean Rasheed @ 2026-04-01 16:12 UTC (permalink / raw)
  To: Yugo Nagata <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers

On Tue, 31 Mar 2026 at 15:35, Yugo Nagata <[email protected]> wrote:
>
> Thank you for updating the patch. I am fine with that.
>
> One concern is that users might interpret "stored" as referring to
> "stored generated columns", rather than including regular columns,

Yeah, possibly. I changed "stored" to "non-virtual", which should
reduce the chances of that particular confusion.

> Also, the meaning of "automatically" might be a bit unclear, so we
> could clarify it by adding "without defining extended statistics."
>
>       Defining extended statistics on a single <emphasis>stored</emphasis>
>       column is not supported or necessary, because statistics are built
>       automatically on such columns without defining extended statistics.

OK, pushed that way.

I also noticed that a few places in CreateStatistics() could use the
variable "numcols" instead of "list_length(stmt->exprs)", so I changed
that.

I decided to include the change to the error message discussed in [1],
since there seemed to be a consensus there, except that I think it
also needs to make it clear that it refers only to non-virtual
columns.

In addition, there was another nearby error message which was no
longer quite right for statistics on virtual generated columns, and I
changed the order of checks, since that allowed the "if" statements to
be simplified a bit.

Regards,
Dean

[1] https://www.postgresql.org/message-id/20260324203430.411331c59ca462457ec0aa8b%40sraoss.co.jp





^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: Allow to collect statistics on virtual generated columns
@ 2026-04-03 01:10  Yugo Nagata <[email protected]>
  parent: Dean Rasheed <[email protected]>
  0 siblings, 0 replies; 21+ messages in thread

From: Yugo Nagata @ 2026-04-03 01:10 UTC (permalink / raw)
  To: Dean Rasheed <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers

On Wed, 1 Apr 2026 17:12:38 +0100
Dean Rasheed <[email protected]> wrote:

> On Tue, 31 Mar 2026 at 15:35, Yugo Nagata <[email protected]> wrote:
> >
> > Thank you for updating the patch. I am fine with that.
> >
> > One concern is that users might interpret "stored" as referring to
> > "stored generated columns", rather than including regular columns,
> 
> Yeah, possibly. I changed "stored" to "non-virtual", which should
> reduce the chances of that particular confusion.
> 
> > Also, the meaning of "automatically" might be a bit unclear, so we
> > could clarify it by adding "without defining extended statistics."
> >
> >       Defining extended statistics on a single <emphasis>stored</emphasis>
> >       column is not supported or necessary, because statistics are built
> >       automatically on such columns without defining extended statistics.
> 
> OK, pushed that way.

Thank you!

> 
> I also noticed that a few places in CreateStatistics() could use the
> variable "numcols" instead of "list_length(stmt->exprs)", so I changed
> that.
> 
> I decided to include the change to the error message discussed in [1],
> since there seemed to be a consensus there, except that I think it
> also needs to make it clear that it refers only to non-virtual
> columns.
> 
> In addition, there was another nearby error message which was no
> longer quite right for statistics on virtual generated columns, and I
> changed the order of checks, since that allowed the "if" statements to
> be simplified a bit.

I reviewed the commit, and it looks good to me.
Thanks again.

Regards,
Yugo Nagata


-- 
Yugo Nagata <[email protected]>





^ permalink  raw  reply  [nested|flat] 21+ messages in thread


end of thread, other threads:[~2026-04-03 01:10 UTC | newest]

Thread overview: 21+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-04-22 09:10 Allow to collect statistics on virtual generated columns Yugo Nagata <[email protected]>
2025-06-17 14:43 ` Andres Freund <[email protected]>
2025-06-24 08:05   ` Yugo Nagata <[email protected]>
2025-07-31 15:28     ` Yugo Nagata <[email protected]>
2025-08-08 03:21       ` Yugo Nagata <[email protected]>
2025-08-20 05:10         ` Yugo Nagata <[email protected]>
2025-09-02 07:33           ` Yugo Nagata <[email protected]>
2025-12-31 10:44             ` Yugo Nagata <[email protected]>
2026-03-24 09:18               ` Dean Rasheed <[email protected]>
2026-03-24 11:43                 ` Dean Rasheed <[email protected]>
2026-03-26 16:00                   ` Dean Rasheed <[email protected]>
2026-03-26 17:25                     ` Dean Rasheed <[email protected]>
2026-03-26 17:44                     ` Dean Rasheed <[email protected]>
2026-03-29 10:42                       ` Dean Rasheed <[email protected]>
2026-04-01 16:12                         ` Dean Rasheed <[email protected]>
2026-04-03 01:10                           ` Yugo Nagata <[email protected]>
2025-07-18 00:58 [PATCH v5] Allow to collect statistics on virtual generated columns Yugo Nagata <[email protected]>
2025-07-18 00:58 [PATCH v4] Allow to collect statistics on virtual generated columns Yugo Nagata <[email protected]>
2025-07-18 00:58 [PATCH v2] Allow to collect statistics on virtual generated columns Yugo Nagata <[email protected]>
2025-07-18 00:58 [PATCH v3] Allow to collect statistics on virtual generated columns Yugo Nagata <[email protected]>
2025-07-18 00:58 [PATCH] Allow to collect statistics on virtual generated columns Yugo Nagata <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox