Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vvw3u-00B7Gr-1m for pgsql-hackers@arkaria.postgresql.org; Fri, 27 Feb 2026 11:31:14 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vvw3t-002kF5-0u for pgsql-hackers@arkaria.postgresql.org; Fri, 27 Feb 2026 11:31:13 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vvw3s-002kEs-2k for pgsql-hackers@lists.postgresql.org; Fri, 27 Feb 2026 11:31:12 +0000 Received: from mail-pj1-x102b.google.com ([2607:f8b0:4864:20::102b]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1vvw3p-00000001Tq0-3Lok for pgsql-hackers@postgresql.org; Fri, 27 Feb 2026 11:31:11 +0000 Received: by mail-pj1-x102b.google.com with SMTP id 98e67ed59e1d1-3590d548576so1084099a91.2 for ; Fri, 27 Feb 2026 03:31:10 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1772191870; cv=none; d=google.com; s=arc-20240605; b=BghPPiisM66HD+ajbAU/htyHGc52uf/4P58CxCo0v2d0D1sGRukf8g2xptQcxmq8EV bfdnUm6Vv02K8NJDD35pS0TF4ImSr5IgyTbdIMTvx0Cp8fWcaee2VBO3XZ6fQvzjmoKb 0Kvi5WE7HXYsij1G1dqgh2HoJoDxCpPtUQ9fXKBMeT2G0GjNDOjYqjv6kzYKJnaVCpP8 D9iP11UFKmsN2Xyj3TwgPlgSJMGS+S1USiaRg5VLGz/MN7E4dHfh4Rv2Atz/n2nOTjjg nFXKLTAlzScE+rygg0Qg8jGod0tF89tzvI5ojyf3tVj/G54gCHyyiZznYWF52oa3+Y0K QBTQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:reply-to:in-reply-to:references :mime-version:dkim-signature; bh=KGSkkN1oCC6Z5pgCRcWifKIszpilc49MF4Y5VXk14I4=; fh=8eshTHZg3bh0kg02WP0tGL0gz8eaP3tReBMZ2bkLl54=; b=XJPKtGZLZCpXpN5FNj/z4JtAtaodoF34HvVCmaImobT+hpzooBXHcJCaIXk0iWT25T cclM+zyft+GUkgA4GCLNxg8fX2GOqYGMD2vOErp/4nOjCbn77RNlGLNEfJgvPfSfOS6r G8zfsM+rHo2MusTCv0xFDT5VZ1hW4Oqvmv50Eso4ArNWN2CmIVqgc1LexbQfwHAu4LaP ujtD0zp64yaKACktCjCmTzVoFAssesq+rsXKBwWqvG3BXY3tAJzTYeV9JOeF85kUa/LP lJKyb324PkPYiCIjhRHBe/AupZhDAV0xlDJjAGfCkMXdi+bEj/WzGKKjGxY4Hz/XJEcI D2Qg==; darn=postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1772191870; x=1772796670; darn=postgresql.org; h=cc:to:subject:message-id:date:from:reply-to:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=KGSkkN1oCC6Z5pgCRcWifKIszpilc49MF4Y5VXk14I4=; b=C0R9KoOkidX8XfRreTWMSOLxkWpQZqgJ5WyZdohfVERpjvRMF15auIoJ+xyrijB9QN 2cBncM3b0ZUohEuG2V5uoFxEUBuhSYC62dVwIjqTMzL4IWHzPdblM9VjBVbx3usa7Meu b7GOD6YVsNyDswJnmlkBp0H5XJiDH3jv85YNloXVbdQ9e1sjetPm8+UWYPr4f7lvJ9Lx J0l2IyHA/TE28d3Ygq5ZoELRqe4NrGeryMEt41a/Ig1EZGPpNow7RRrRKZvRwOFIz8H0 OvvHZ9F2eMRMBRW1hG/XqEgzrjN/S46YXSmXxUtDDdyFjwtsFoSEbprOfW1EyeuaXo1b ZiCg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1772191870; x=1772796670; h=cc:to:subject:message-id:date:from:reply-to:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=KGSkkN1oCC6Z5pgCRcWifKIszpilc49MF4Y5VXk14I4=; b=gFX3kQikkxp1BvkjCWSCdouApaFRM6LkZgwoCMU8zG1r8LbQxQi3txA+yyn7r6f0yL bwN1IS/9WU8mvZkMZ/FuyftXGYwPdwsZUWkM+CoIZnZHUzaiMfgqn+hNgQ7dgGwXzLP+ gbP+DHuBLmHj2vdT7a6P6Y2J8bSJLAqt1kdxS6Fpv82StaXmc0Z2fMUTSXOKnMt25dNx W7ntFepSv5zzW5y60xUNSL+3EmIwtFchwE4cYzVkXjiGcGnYS/cqDMJhDYcnHeTCAA8K ij2f0x5eI+cYNDKcb7a977eMZ/Zt1Soe3oX0yt8S7LOewm+lrOgJDwD3BaOdVJKBKgXZ GPQw== X-Forwarded-Encrypted: i=1; AJvYcCUDSPDxrl0I7ZM88znNccqXhBsgFTilx0J4/0QYmSxBy+de3oQiCzZ4pYCucxswz9v/eit3bAGM0Dq90LXA@postgresql.org X-Gm-Message-State: AOJu0YwsBjtZn76yG4MVwI/HkEzcp+h4akAoMsj9mjdgKjwc/zMnnP8X 8GuLmZA2z/9Cgj6dxkg2VUkQQ21UUfI4YmO3+F0/cUMK9Fq184p3GyxQ3XSw1+5GyYgCjxh6+86 HU8rX1CU3gtCssGLYKLc0ISro21xDmEA= X-Gm-Gg: ATEYQzzCGViMdayJretDFGQfvF044I2BuCvkaKxFbWNuvesnF4vN/2y6WuITtDRZYm2 oXWdeE4vSUgFQ5AxgskgcxflmeRBv+GHw//Tra/Vaw0QpvVgMp8QNP9GbxVQM797rIaC3gSn5ZU eI1frGjlbMSBZgYZGWG7FA6JQIPDDd4lg0VMaMk7i5r2FZf1eRL4B46vMXDUxIGFEy4UAvVW3TY Vckan2Zz3MRXNwczhadVzI3dBXRkYddNtwx/p+/tGdbWtlMP8rICjyaJgzLI8e+rX58cUzT+2FG r2gXYjmslfM+yTlWFnAF/nU6B3ZTeM+aBlpAzcNmo7JI/Ee2OTA= X-Received: by 2002:a17:90b:524a:b0:355:35b0:8b78 with SMTP id 98e67ed59e1d1-35965cd1f74mr2270838a91.27.1772191869997; Fri, 27 Feb 2026 03:31:09 -0800 (PST) MIME-Version: 1.0 References: <20260224.115625.1966558814200895991.ishii@postgresql.org> <20260224.140927.1828965853586507533.ishii@postgresql.org> <20260224.204417.914253025244188300.ishii@postgresql.org> In-Reply-To: <20260224.204417.914253025244188300.ishii@postgresql.org> Reply-To: assam258@gmail.com From: Henson Choi Date: Fri, 27 Feb 2026 20:30:57 +0900 X-Gm-Features: AaiRm50HBG9JgW2OJXh2NqvNc8qxDRi6dR40HbPf2j9Cnzr9xzcQt3sORrDMCG4 Message-ID: Subject: Re: Row pattern recognition To: Tatsuo Ishii Cc: vik@postgresfriends.org, er@xs4all.nl, jacob.champion@enterprisedb.com, david.g.johnston@gmail.com, peter@eisentraut.org, pgsql-hackers@postgresql.org Content-Type: multipart/alternative; boundary="000000000000e297d8064bcc922a" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000e297d8064bcc922a Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi Tatsuo, Currently we do not account the cost of RPR while planning. Attached > is the first attempt to try to estimate the RPR costs. The cost model > is very simple: > > expression cost per PATTERN variable * number of input tuples > > Any idea to make this estimation better? > > foreach(lc, windowFuncs) > { > ... > + /* also add DEFINE clause expressions' cost to per-input-row costs */ > + if (winclause->rpPattern) > + { > + List *pattern_vars; /* list of pattern variable names */ > + ListCell *lc2; > + > + pattern_vars =3D collectPatternVariables(winclause->rpPattern); > + > + /* iterate according to the pattern variable */ > + foreach(lc2, pattern_vars) > + { > + char *ptname =3D strVal((char *) lfirst(lc2)); `collectPatternVariables` returns a list of String nodes (via `makeString()`), so `strVal(lfirst(lc2))` is the idiomatic form. The `(char *)` cast is misleading. There is also a correctness issue: DEFINE expressions belong to the window clause, not to individual window functions, so their cost should not be multiplied by the number of window functions sharing the clause. The fix is to compute the DEFINE cost once outside the loop and add it to `startup_cost` and `total_cost` directly, after the `foreach(lc, windowFuncs)` block. Regarding the cost model: the NFA executor evaluates all DEFINE expressions once per row into a shared `nfaVarMatched[]` array that all active contexts read from, and contexts advance strictly forward so no prior row is ever re-evaluated. The one-evaluation-per-row cost model is therefore accurate. NFA-aware cost modeling could be built on top of this foundation in a separate patch down the road, once the NFA implementation has matured. For now, the DEFINE expression costs themselves already serve as a natural penalty =E2=80=94 a window clause with RPR will consistently appear more expensive than a comparable plain window function. This gives the surrounding plan a reasonable cost signal for decisions such as join ordering and materialization of RPR subqueries. So the current approach is reasonable as a first step. Other than that, the approach looks good to me. Would it be okay if I revise the patch along those lines? Best regards, Henson --000000000000e297d8064bcc922a Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi Tatsuo,

=
Cur= rently we do not account the cost of RPR while planning.=C2=A0 Attached
is the first attempt to try to estimate the RPR costs. The cost model
is very simple:

expression cost per PATTERN variable * number of input tuples

Any idea to make this estimation better?

=C2=A0> =C2=A0 foreach(lc, windowFuncs)
> =C2=A0 {
>= =C2=A0 ...
> + /* also add DEFINE clause expressions' cost to = per-input-row costs */
> + if (winclause->rpPattern)
> + {=
> + List =C2=A0 *pattern_vars; /* list of pattern variable names = */
> + ListCell =C2=A0 *lc2;
> +
> + pattern_vars =3D= collectPatternVariables(winclause->rpPattern);
> +
> + /*= iterate according to the pattern variable */
> + foreach(lc2, patt= ern_vars)
> + {
> + char =C2=A0 *ptname =3D strVal((char = *) lfirst(lc2));

`collectPatternVariables` returns a list of String = nodes (via
`makeString()`), so `strVal(lfirst(lc2))` is the idiomatic fo= rm.
The `(char *)` cast is misleading.

There is also a correctnes= s issue: DEFINE expressions belong to the
window clause, not to individu= al window functions, so their cost
should not be multiplied by the numbe= r of window functions sharing
the clause.

The fix is to compute t= he DEFINE cost once outside the loop and add
it to `startup_cost` and `t= otal_cost` directly, after the
`foreach(lc, windowFuncs)` block.

= Regarding the cost model: the NFA executor evaluates all DEFINE
expressi= ons once per row into a shared `nfaVarMatched[]` array that
all active c= ontexts read from, and contexts advance strictly forward
so no prior row= is ever re-evaluated.=C2=A0 The one-evaluation-per-row
cost model is th= erefore accurate.=C2=A0 NFA-aware cost
modeling could be built on top of= this foundation in a separate patch
down the road, once the NFA impleme= ntation has matured.

For now, the DEFINE expression costs themselves= already serve as a
natural penalty =E2=80=94 a window clause with RPR w= ill consistently appear
more expensive than a comparable plain window fu= nction.=C2=A0 This gives
the surrounding plan a reasonable cost signal f= or decisions such as
join ordering and materialization of RPR subqueries= .=C2=A0 So the current
approach is reasonable as a first step.

Ot= her than that, the approach looks good to me.=C2=A0 Would it be okay if
= I revise the patch along those lines?

Best regards,
Henson
--000000000000e297d8064bcc922a--