public inbox for [email protected]
help / color / mirror / Atom feedFrom: Tatsuo Ishii <[email protected]>
To: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Subject: Re: Row pattern recognition
Date: Fri, 10 Apr 2026 18:35:15 +0900 (JST)
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAAAe_zCCnCJYd2aovB1wf1BFXCSR1raFmVZS1dZjj76_Vtyt-Q@mail.gmail.com>
References: <CAAAe_zDEDtJEf6aO=x3qhPOs8kXmsha7Q8ROJXE8VZNp+Dxe2A@mail.gmail.com>
<[email protected]>
<CAAAe_zCCnCJYd2aovB1wf1BFXCSR1raFmVZS1dZjj76_Vtyt-Q@mail.gmail.com>
Hi Henson,
> Hi Tatsuo,
>
> Thank you for the careful review.
>
> Before this, there's a check in the function:
>>
>> if (tle->ressortgroupref || tle->resjunk)
>> continue;
>>
>> Below is the query in the regression test that is suppoed to trigger
>> the error. In this case column "i" in the target list should have
>> resjunk flag to be set to true. So I thought the if statenment above
>> becomes true and the column i is not removed from subquery's target
>> list. Am I missing something?
>
>
> That is a very good observation. For the test query you cited, the
> resjunk check does indeed protect column "i", as you noted.
>
> The allpaths.c guard is needed for a different scenario: when a Var
> already exists in the inner SELECT as a non-resjunk entry, but the
> outer query does not reference it. For example:
>
> SELECT count(*) FROM (
> SELECT i, count(*) OVER w FROM generate_series(1,10) i
> -- ^ in SELECT -> resjunk=false
> WINDOW w AS (
> ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> PATTERN (A+)
> DEFINE A AS i > PREV(i)
> )
> ) t;
> -- outer uses only count(*) -> i is not in attrs_used
>
> In this case, the Var addition logic in parse_rpr.c (pull_var_clause
> + dup check) finds that "i" already exists in the targetlist, so it
> does not add a new resjunk entry. Since "i" is resjunk=false and the
> outer query does not reference it, remove_unused_subquery_outputs()
> would replace it with NULL, breaking DEFINE evaluation.
>
> Changing the existing entry to resjunk=true is not an option either,
> since it would alter the subquery's output schema -- the outer query
> may reference that column, and at parse time we do not yet know
> whether it will. The existing protection mechanisms (resjunk,
> ressortgroupref, attrs_used, volatile check) cannot express
> "referenced by DEFINE", since DEFINE is a new column reference path
> that did not exist in PostgreSQL before. The allpaths.c guard
> follows the same pattern as the existing SRF and volatile guards in
> that function.
>
> Also I think removal of the WindowAgg node is fine here because it's
>> not necessary to calculate the result of the sub query at
>> all. Actually the query above runs fine on v46. I guess some of the
>> incremental patches caused this to stop working. Can you elaborate?
>>
>
> I agree -- for this specific query, removing the WindowAgg does not
> change the count(*) result, since window functions do not change the
> number of rows.
>
> Regarding v46: the two bugs were actually latent from the initial
> implementation, not caused by incremental patches. The old code
> added the entire DEFINE expression (e.g., "i > PREV(i)") to the
> query targetlist via findTargetlistEntrySQL99(). In most cases,
> DEFINE conditions involve comparison operators, so the resulting
> OpExpr did not match existing Var entries in the targetlist, and a
> new resjunk entry was created. This accidentally prevented removal.
> (In fact, a bare boolean DEFINE such as "DEFINE A AS flag" would
> have exhibited the same removal bug even in v46, since
> findTargetlistEntrySQL99 would reuse the existing entry instead
> of creating a new resjunk one -- confirming that the issue was
> latent in the original implementation, not introduced by recent
> patches.)
>
> However, that same approach caused the SIGSEGV (Bug 1): when RPR
> and non-RPR windows coexist, the non-RPR WindowAgg inherits
> targetlist entries containing RPRNavExpr nodes that it cannot
> evaluate.
>
> The two bugs are two sides of the same coin -- the old approach
> prevented removal but caused SIGSEGV. Switching to Var-only
> addition fixed the SIGSEGV but required the allpaths.c guard to
> prevent removal.
>
> As for more precise optimization (allowing removal when all
> WindowFuncs for an RPR clause are unused), it would require
> structural changes to the loop in remove_unused_subquery_outputs(),
> which currently evaluates entries individually. Since this function
> runs for all subqueries, not just RPR, such a change would need
> broader testing.
>
> The patch addresses two distinct issues: the parse_rpr.c change
> (Var-only addition) prevents RPRNavExpr from propagating to
> non-RPR WindowAgg nodes, while the allpaths.c guard prevents
> incorrect column removal. Since both are needed, the guard
> cannot simply be reverted.
Thank you for the detailed explanation. That makes sense.
> Would it make sense to keep it as-is
> for now, and explore the more precise optimization as a separate
> follow-up patch?
I agree keep it as is. I think priority of "more precise optimization"
is low for now. We could explore when we have time. What do you think?
Regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: Row pattern recognition
In-Reply-To: <[email protected]>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox