Re: Row pattern recognition

From: Tatsuo Ishii <[email protected]>
To: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Subject: Re: Row pattern recognition
Date: Fri, 27 Feb 2026 22:54:56 +0900 (JST)
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAAAe_zDaaOH4kaNfeN3Gk9u8VyTpUm-UgFebKc1KKB8o+PSjPg@mail.gmail.com>
References: <[email protected]>
	<[email protected]>
	<CAAAe_zDaaOH4kaNfeN3Gk9u8VyTpUm-UgFebKc1KKB8o+PSjPg@mail.gmail.com>

Hi Henson,

> Hi Tatsuo,
> 
>> Currently we do not account the cost of RPR while planning.  Attached
>> is the first attempt to try to estimate the RPR costs. The cost model
>> is very simple:
>>
>> expression cost per PATTERN variable * number of input tuples
>>
>> Any idea to make this estimation better?
>>
> 
>>   foreach(lc, windowFuncs)
>>   {
>>   ...
>> + /* also add DEFINE clause expressions' cost to per-input-row costs */
>> + if (winclause->rpPattern)
>> + {
>> + List   *pattern_vars; /* list of pattern variable names */
>> + ListCell   *lc2;
>> +
>> + pattern_vars = collectPatternVariables(winclause->rpPattern);
>> +
>> + /* iterate according to the pattern variable */
>> + foreach(lc2, pattern_vars)
>> + {
>> + char   *ptname = strVal((char *) lfirst(lc2));
> 
> `collectPatternVariables` returns a list of String nodes (via
> `makeString()`), so `strVal(lfirst(lc2))` is the idiomatic form.
> The `(char *)` cast is misleading.

Ok.

> There is also a correctness issue: DEFINE expressions belong to the
> window clause, not to individual window functions, so their cost
> should not be multiplied by the number of window functions sharing
> the clause.

You are right.

> The fix is to compute the DEFINE cost once outside the loop and add
> it to `startup_cost` and `total_cost` directly, after the
> `foreach(lc, windowFuncs)` block.

Looks good.
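
To make the double counting concrete, here is a minimal standalone
sketch. It is not the patch code: SketchCost and both function names
are made up for illustration, and the flat startup/per-tuple model is
only a simplification of the planner's window costing.

```c
#include <assert.h>

/* Simplified stand-in for a startup/per-tuple cost pair (hypothetical). */
typedef struct
{
    double startup;
    double per_tuple;
} SketchCost;

/*
 * Problematic shape: the DEFINE cost is added inside the per-function
 * loop, so it is charged once per window function sharing the clause.
 */
double
cost_define_inside_loop(const SketchCost *funcs, int nfuncs,
                        SketchCost define_cost, double input_tuples)
{
    double total = 0.0;

    assert(nfuncs >= 0);
    for (int i = 0; i < nfuncs; i++)
    {
        total += funcs[i].startup + funcs[i].per_tuple * input_tuples;
        /* charged nfuncs times in total */
        total += define_cost.startup + define_cost.per_tuple * input_tuples;
    }
    return total;
}

/*
 * Suggested shape: window function costs are summed in the loop, and
 * the DEFINE cost is added exactly once after it.
 */
double
cost_define_after_loop(const SketchCost *funcs, int nfuncs,
                       SketchCost define_cost, double input_tuples)
{
    double total = 0.0;

    assert(nfuncs >= 0);
    for (int i = 0; i < nfuncs; i++)
        total += funcs[i].startup + funcs[i].per_tuple * input_tuples;

    /* charged once per window clause */
    total += define_cost.startup + define_cost.per_tuple * input_tuples;
    return total;
}
```

With three window functions at 0.5 per tuple each, a DEFINE cost of
0.25 per tuple and 1000 input tuples, the first form gives 2250 and
the second gives 1750. The difference is the DEFINE cost being
charged two extra times.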

> Regarding the cost model: the NFA executor evaluates all DEFINE
> expressions once per row into a shared `nfaVarMatched[]` array that
> all active contexts read from, and contexts advance strictly forward
> so no prior row is ever re-evaluated.  The one-evaluation-per-row
> cost model is therefore accurate.  NFA-aware cost
> modeling could be built on top of this foundation in a separate patch
> down the road, once the NFA implementation has matured.
> 
> For now, the DEFINE expression costs themselves already serve as a
> natural penalty ― a window clause with RPR will consistently appear
> more expensive than a comparable plain window function.  This gives
> the surrounding plan a reasonable cost signal for decisions such as
> join ordering and materialization of RPR subqueries.  So the current
> approach is reasonable as a first step.
> 
> Other than that, the approach looks good to me.  Would it be okay if
> I revise the patch along those lines?

Yes, no problem. Thanks!
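
For reference, the one-evaluation-per-row model discussed above can be
written down as a small standalone sketch. The function name and the
plain array of per-variable costs are assumptions for illustration
only. In the real planner the per-expression costs would come from the
usual expression costing, not from a hand-filled array.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: run cost of evaluating DEFINE expressions when
 * every pattern variable's expression is evaluated once per input row.
 *
 * per_var_cost[i] is the assumed per-row cost of the i-th pattern
 * variable's DEFINE expression.
 */
double
sketch_define_run_cost(const double *per_var_cost, size_t nvars,
                       double input_tuples)
{
    double per_row = 0.0;

    assert(input_tuples >= 0.0);

    /* sum of "expression cost per PATTERN variable" */
    for (size_t i = 0; i < nvars; i++)
        per_row += per_var_cost[i];

    /* each row is evaluated once, so scale by the input row count */
    return per_row * input_tuples;
}
```

For example, three pattern variables costing 0.25, 0.5 and 0.25 per
row over 1000 input tuples give a run cost of 1000. The estimate does
not depend on how many NFA contexts are active, which matches the
shared per-row evaluation described above.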
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
