public inbox for [email protected]
help / color / mirror / Atom feedFrom: Yugo Nagata <[email protected]>
To: Yugo Nagata <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: [email protected]
Subject: Re: Allow to collect statistics on virtual generated columns
Date: Tue, 2 Sep 2025 16:33:41 +0900
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
<kp43uuykq64fyqxnlihq2rzntarextvzibuzturajfucl6bayo@6fcnflboemtt>
<[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
On Wed, 20 Aug 2025 14:10:28 +0900
Yugo Nagata <[email protected]> wrote:
> On Fri, 8 Aug 2025 12:21:25 +0900
> Yugo Nagata <[email protected]> wrote:
>
> > On Fri, 1 Aug 2025 00:28:30 +0900
> > Yugo Nagata <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > On Tue, 24 Jun 2025 17:05:33 +0900
> > > Yugo Nagata <[email protected]> wrote:
> > >
> > > > Instead, I'm thinking of an alternative approach: expanding the expression
> > > > at the time statistics are collected.
> > >
> > > I've attached a new patch in this approache.
> > >
> > > This allows to collect statistics on virtual generated columns.
> > >
> > > During ANALYZE, generation expressions are expanded, and statistics are computed
> > > using compute_expr_stats(). To support this, both compute_expr_stats() and AnlExprData
> > > are now exported from extended_stats.c. However, since they are no longer specific
> > > to extended statistics, it might be better to move them to analyze.c and vacuum.h.
> > >
> > > To enable the optimizer to make use of these statistics, a new field named
> > > virtual_gencols is added to RelOptInfo. This field holds the expressions of
> > > virtual generated columns in the table. In examine_variable(), if an expression
> > > in a WHERE clause matches a virtual generated column, the corresponding statistics
> > > are used for that expression.
> > >
> > > Example:
> > >
> > > - Before applying the patch, the cardinality estimate is erroneous.
> > >
> > > test=# create table t (i int, j int generated always as (i*10) virtual);
> > > CREATE TABLE
> > > test=# insert into t select generate_series(1,1000);
> > > INSERT 0 1000
> > > test=# insert into t select 1 from generate_series(1,1000);
> > > INSERT 0 1000
> > > test=# analyze t;
> > > ANALYZE
> > > test=# explain analyze select * from t where j = 10;
> > > QUERY PLAN
> > > -------------------------------------------------------------------------------------------------
> > > Seq Scan on t (cost=0.00..0.02 rows=1 width=8) (actual time=0.031..0.806 rows=1001.00 loops=1)
> > > Filter: ((i * 10) = 10)
> > > Rows Removed by Filter: 999
> > > Buffers: shared hit=9
> > > Planning:
> > > Buffers: shared hit=10
> > > Planning Time: 0.299 ms
> > > Execution Time: 0.948 ms
> > > (8 rows)
> > >
> > >
> > > - After applying the patch, the cardinality estimate is correct.
> > >
> > > test=# analyze t;
> > > ANALYZE
> > > test=# explain analyze select * from t where j = 10;
> > > QUERY PLAN
> > > -----------------------------------------------------------------------------------------------------
> > > Seq Scan on t (cost=0.00..41.50 rows=1001 width=8) (actual time=0.034..0.871 rows=1001.00 loops=1)
> > > Filter: ((i * 10) = 10)
> > > Rows Removed by Filter: 999
> > > Buffers: shared hit=9
> > > Planning:
> > > Buffers: shared hit=6
> > > Planning Time: 0.374 ms
> > > Execution Time: 1.028 ms
> > > (8 rows)
> > >
> > >
> > > Note that the patch is still a work in progress, so documentation and tests are not included.
> >
> > I've attached an updated patch.
> >
> > I modified the documentation to remove the statement that virtual generated columns
> > do not have statistics.
> >
> > In addition, I added a test to ensure that statistics on virtual generated columns
> > are available.
I've attached a rebased patch.
Regards,
Yugo Nagata
--
Yugo Nagata <[email protected]>
view thread (21+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected]
Subject: Re: Allow to collect statistics on virtual generated columns
In-Reply-To: <[email protected]>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox