Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w5zaI-003puy-1O for pgsql-hackers@arkaria.postgresql.org; Fri, 27 Mar 2026 05:18:14 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w5zaF-007eLs-1l for pgsql-hackers@arkaria.postgresql.org; Fri, 27 Mar 2026 05:18:11 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w5zaE-007eLj-29 for pgsql-hackers@lists.postgresql.org; Fri, 27 Mar 2026 05:18:11 +0000 Received: from fout-b7-smtp.messagingengine.com ([202.12.124.150]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.98.2) (envelope-from ) id 1w5zaB-00000001Q1A-09oa for pgsql-hackers@postgresql.org; Fri, 27 Mar 2026 05:18:10 +0000 Received: from phl-compute-02.internal (phl-compute-02.internal [10.202.2.42]) by mailfout.stl.internal (Postfix) with ESMTP id 8485F1D0024B; Fri, 27 Mar 2026 01:18:04 -0400 (EDT) Received: from phl-frontend-04 ([10.202.2.163]) by phl-compute-02.internal (MEProxy); Fri, 27 Mar 2026 01:18:04 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=paquier.xyz; h= cc:cc:content-type:content-type:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:subject :subject:to:to; s=fm1; t=1774588684; x=1774675084; bh=0VBER9mXSo 95jPIYUaTeu2x+jGrMHXL+nhC1at4dlfU=; b=gk2RJ0CDKjELuBCNjtp2P2AW/Q mbjMXvjntmu0FfQIp6tMZIZcLkpp03ldpMAzBhQlsie9HJECr7BJUae2RtcXDA5v Z06OqSnBBPuJDteNrXWrnM5S/6HXhmXTFOWYVgH+rO+0LtOggwnXrA57vV4koPhp sXzHhSfrq4toBd9J/XzffF7i/uIVPlfodswr4jy1aXakQzXj1eQNtsyU0fc/R1Bf w/61DqtuMoX94V+GkAL7Ae/T7uhLjc06UjpaWG+F2j6XKNUx9rmb49SS383DZKIQ aQ5w0MEmANJDtTpamptRrYa6SVqR93Ki8JZe/jbX3Soh9uFZKt40Znpn1fhg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:subject:subject:to :to:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm1; t= 1774588684; x=1774675084; bh=0VBER9mXSo95jPIYUaTeu2x+jGrMHXL+nhC 1at4dlfU=; b=OmnIrGKehLzX1BFS+eeAxJ7w2oIU/fYq0V6LIaIY17KC7Cd71Hk jIl3tHbJYdACCrWylteFpkhQGyxEteQwBlgzV+Gl1yKZKVJm9xfRM6gGKZqAU5qA bHzPkxTCiPQo0/0KwETA3lHzy44GpBJOfCnmCdw965NF/bLNvAPeyy0WAV/C1G6h /1vKFZlPMmTHCsn/5HDyFdB+NxS/6xwLz4ShtXYWK42AoRuqmNrtA11n667vrV++ KQwCuZxwIqbs3xs7nsxwLUwyVZYRLM51uYZLRy3bp9Ni7uhWF4/x4uQNY5DdbuxM Iaiv2uyAR6rLTSbTtQJlswZ67EaBccSSe9A== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefgedrtddtgdefvdelgedvucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceu rghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmnegfrh hlucfvnfffucdljedtmdenucfjughrpeffhffvvefukfhfgggtuggjsehgtderredttdej necuhfhrohhmpefoihgthhgrvghlucfrrghquhhivghruceomhhitghhrggvlhesphgrqh huihgvrhdrgiihiieqnecuggftrfgrthhtvghrnhepveetjefgjeevgedukeehieeuieeu ieevueeiudegheevuefggfduueelgeelieetnecuvehluhhsthgvrhfuihiivgeptdenuc frrghrrghmpehmrghilhhfrhhomhepmhhitghhrggvlhesphgrqhhuihgvrhdrgiihiidp nhgspghrtghpthhtohephedpmhhouggvpehsmhhtphhouhhtpdhrtghpthhtoheplhhukh grshesfhhithhtlhdrtghomhdprhgtphhtthhopehsrghmihhmshgvihhhsehgmhgrihhl rdgtohhmpdhrtghpthhtohepiigvnhhgmhgrnheshhgrlhhouggsthgvtghhrdgtohhmpd hrtghpthhtohepphhgshhqlhdqhhgrtghkvghrshesphhoshhtghhrvghsqhhlrdhorhhg pdhrtghpthhtoheprhhjuhhjuhduvdefsehgmhgrihhlrdgtohhm X-ME-Proxy: Feedback-ID: i0fe9450f:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Fri, 27 Mar 2026 01:18:01 -0400 (EDT) Date: Fri, 27 Mar 2026 14:17:57 +0900 From: Michael Paquier To: Lukas Fittl Cc: Sami Imseih , zengman , pgsql-hackers , Julien Rouhaud Subject: Re: Refactor query normalization into core query jumbling Message-ID: References: MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="Jpx1OuXfZqosqf3/" Content-Disposition: inline In-Reply-To: List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --Jpx1OuXfZqosqf3/ Content-Type: multipart/mixed; boundary="oXoZJ4iogav/1GXV" Content-Disposition: inline --oXoZJ4iogav/1GXV Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Mar 26, 2026 at 06:50:20PM -0700, Lukas Fittl wrote: > On Thu, Mar 26, 2026 at 10:19=E2=80=AFAM Sami Imseih wrote: > > Both of those changes belong in 0002. See the attached v7. >=20 > Looks good! >=20 > I've also done a quick follow-up test with pg_tracing and that > simplifies its logic as expected [0] to be able to extract inline > parameter values. I have been looking at what you have here. As mentioned upthread, I am on board with aiming for JumbleState to be a const so as we enforce as policy that no extension is allowed to touch what the core code has computed. However, I am not convinced by most of the patch. The logic to recompute the lengths of the constants is a concept proper to PGSS. Perhaps we could reconsider moving that into core at some point, but I am honestly not sure to see the range of benefits we'd gain from that. This line of arguments is stronger for the normalization of the query string. Why should the core code decide what a normalized string should look like when it comes to the detection of the constants, if any? Instead of a dollar-quoted number, we could enforce a bunch of things, like a '?' or a '$woozah$' at these locations. Saying that, the key point of the patch is to take a copy of the JumbleState locations, and recompute it for the needs of PGSS for the sake of the normalized query representation. Hence, why don't we just do that at the end? That should be enough to enforce the const marker for the JumbleState across all the loaded extensions that want to look at it. This leads me to the simpler patch attached. Comments or tomatoes? That's simpler, and I'd be OK with just that for v19. That would be much better than the current state of affairs, where PGSS decides to enforce its own ideas in the JumbleState, ideas=20 fed to anything looping into the post-parse-analyze hook after PGSS. -- Michael --oXoZJ4iogav/1GXV Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=v8-0001-Make-JumbleState-const-in-post_parse_analyze-hook.patch Content-Transfer-Encoding: quoted-printable =46rom 336e6aa38d6dc3901c3fa1f746796a0b8fc87cd2 Mon Sep 17 00:00:00 2001 =46rom: Michael Paquier Date: Fri, 27 Mar 2026 14:13:00 +0900 Subject: [PATCH v8] Make JumbleState const in post_parse_analyze hook Change the post_parse_analyze_hook_type signature to take a const JumbleState parameter, preventing hooks from modifying the jumble state during query analysis. This improves API safety by making it clear that hooks should only read from the jumble state, not modify it. Update pg_stat_statements and related functions to match the new const signature. Refactor fill_in_constant_lengths() to return a newly allocated LocationLen array instead of modifying the JumbleState, so as PGSS does not touch the now-const JumbleState. The routine is renamed to compute_constant_lengths(). Discussion: https://postgr.es/m/CAA5RZ0tZp5qU0ikZEEqJnxvdSNGh1DWv80sb-k4QAU= miMoOp_Q@mail.gmail.com --- src/include/parser/analyze.h | 2 +- .../pg_stat_statements/pg_stat_statements.c | 75 ++++++++++++------- 2 files changed, 50 insertions(+), 27 deletions(-) diff --git a/src/include/parser/analyze.h b/src/include/parser/analyze.h index e10270ff0ffd..b2db6fa4e8c4 100644 --- a/src/include/parser/analyze.h +++ b/src/include/parser/analyze.h @@ -21,7 +21,7 @@ /* Hook for plugins to get control at end of parse analysis */ typedef void (*post_parse_analyze_hook_type) (ParseState *pstate, Query *query, - JumbleState *jstate); + const JumbleState *jstate); extern PGDLLIMPORT post_parse_analyze_hook_type post_parse_analyze_hook; =20 =20 diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_s= tat_statements/pg_stat_statements.c index 6cb14824ec3b..0b6be5e255a4 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -335,7 +335,7 @@ static void pgss_shmem_request(void); static void pgss_shmem_startup(void); static void pgss_shmem_shutdown(int code, Datum arg); static void pgss_post_parse_analyze(ParseState *pstate, Query *query, - JumbleState *jstate); + const JumbleState *jstate); static PlannedStmt *pgss_planner(Query *parse, const char *query_string, int cursorOptions, @@ -359,7 +359,7 @@ static void pgss_store(const char *query, int64 queryId, const BufferUsage *bufusage, const WalUsage *walusage, const struct JitInstrumentation *jitusage, - JumbleState *jstate, + const JumbleState *jstate, int parallel_workers_to_launch, int parallel_workers_launched, PlannedStmtOrigin planOrigin); @@ -378,13 +378,14 @@ static char *qtext_fetch(Size query_offset, int query= _len, static bool need_gc_qtexts(void); static void gc_qtexts(void); static TimestampTz entry_reset(Oid userid, Oid dbid, int64 queryid, bool m= inmax_only); -static char *generate_normalized_query(JumbleState *jstate, const char *qu= ery, +static char *generate_normalized_query(const JumbleState *jstate, + const char *query, int query_loc, int *query_len_p); -static void fill_in_constant_lengths(JumbleState *jstate, const char *quer= y, - int query_loc); +static LocationLen *compute_constant_lengths(const JumbleState *jstate, + const char *query, + int query_loc); static int comp_location(const void *a, const void *b); =20 - /* * Module load callback */ @@ -840,7 +841,7 @@ error: * Post-parse-analysis hook: mark query with a queryId */ static void -pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jst= ate) +pgss_post_parse_analyze(ParseState *pstate, Query *query, const JumbleStat= e *jstate) { if (prev_post_parse_analyze_hook) prev_post_parse_analyze_hook(pstate, query, jstate); @@ -1297,7 +1298,7 @@ pgss_store(const char *query, int64 queryId, const BufferUsage *bufusage, const WalUsage *walusage, const struct JitInstrumentation *jitusage, - JumbleState *jstate, + const JumbleState *jstate, int parallel_workers_to_launch, int parallel_workers_launched, PlannedStmtOrigin planOrigin) @@ -2845,7 +2846,7 @@ release_lock: * Returns a palloc'd string. */ static char * -generate_normalized_query(JumbleState *jstate, const char *query, +generate_normalized_query(const JumbleState *jstate, const char *query, int query_loc, int *query_len_p) { char *norm_query; @@ -2857,12 +2858,14 @@ generate_normalized_query(JumbleState *jstate, cons= t char *query, last_off =3D 0, /* Offset from start for previous tok */ last_tok_len =3D 0; /* Length (in bytes) of that tok */ int num_constants_replaced =3D 0; + LocationLen *locs =3D NULL; =20 /* - * Get constants' lengths (core system only gives us locations). Note - * this also ensures the items are sorted by location. + * Determine constants' lengths (core system only gives us locations), + * and return a sorted copy of jstate's LocationLen data with lengths + * filled in. */ - fill_in_constant_lengths(jstate, query, query_loc); + locs =3D compute_constant_lengths(jstate, query, query_loc); =20 /* * Allow for $n symbols to be longer than the constants they replace. @@ -2888,15 +2891,15 @@ generate_normalized_query(JumbleState *jstate, cons= t char *query, * the parameter in the next iteration (or after the loop is done), * which is a bit odd but seems to work okay in most cases. */ - if (jstate->clocations[i].extern_param && !jstate->has_squashed_lists) + if (locs[i].extern_param && !jstate->has_squashed_lists) continue; =20 - off =3D jstate->clocations[i].location; + off =3D locs[i].location; =20 /* Adjust recorded location if we're dealing with partial string */ off -=3D query_loc; =20 - tok_len =3D jstate->clocations[i].length; + tok_len =3D locs[i].length; =20 if (tok_len < 0) continue; /* ignore any duplicates */ @@ -2915,7 +2918,7 @@ generate_normalized_query(JumbleState *jstate, const = char *query, */ n_quer_loc +=3D sprintf(norm_query + n_quer_loc, "$%d%s", num_constants_replaced + 1 + jstate->highest_extern_param_id, - jstate->clocations[i].squashed ? " /*, ... */" : ""); + locs[i].squashed ? " /*, ... */" : ""); num_constants_replaced++; =20 /* move forward */ @@ -2924,6 +2927,10 @@ generate_normalized_query(JumbleState *jstate, const= char *query, last_tok_len =3D tok_len; } =20 + /* Clean up, if needed */ + if (locs) + pfree(locs); + /* * We've copied up until the last ignorable constant. Copy over the * remaining bytes of the original query string. @@ -2942,8 +2949,9 @@ generate_normalized_query(JumbleState *jstate, const = char *query, } =20 /* - * Given a valid SQL string and an array of constant-location records, - * fill in the textual lengths of those constants. + * Given a valid SQL string and an array of constant-location records, ret= urn + * the textual lengths of those constants in a newly allocated LocationLen + * array, or NULL if there are no constants. * * The constants may use any allowed constant syntax, such as float litera= ls, * bit-strings, single-quoted strings and dollar-quoted strings. This is @@ -2959,16 +2967,19 @@ generate_normalized_query(JumbleState *jstate, cons= t char *query, * past the first to -1 so that they can later be ignored. * * If query_loc > 0, then "query" has been advanced by that much compared = to - * the original string start, so we need to translate the provided locatio= ns - * to compensate. (This lets us avoid re-scanning statements before the o= ne - * of interest, so it's worth doing.) + * the original string start, as is the case with multi-statement strings,= so + * we need to translate the provided locations to compensate. (This lets = us + * avoid re-scanning statements before the one of interest, so it's worth + * doing.) * * N.B. There is an assumption that a '-' character at a Const location be= gins * a negative numeric constant. This precludes there ever being another * reason for a constant to start with a '-'. + * + * It is the caller's responsibility to free the result, if necessary. */ -static void -fill_in_constant_lengths(JumbleState *jstate, const char *query, +static LocationLen * +compute_constant_lengths(const JumbleState *jstate, const char *query, int query_loc) { LocationLen *locs; @@ -2977,14 +2988,20 @@ fill_in_constant_lengths(JumbleState *jstate, const= char *query, core_YYSTYPE yylval; YYLTYPE yylloc; =20 + if (jstate->clocations_count =3D=3D 0) + return NULL; + + /* Copy constant locations to avoid modifying jstate */ + locs =3D palloc_array(LocationLen, jstate->clocations_count); + memcpy(locs, jstate->clocations, jstate->clocations_count * sizeof(Locati= onLen)); + /* * Sort the records by location so that we can process them in order while * scanning the query text. */ if (jstate->clocations_count > 1) - qsort(jstate->clocations, jstate->clocations_count, + qsort(locs, jstate->clocations_count, sizeof(LocationLen), comp_location); - locs =3D jstate->clocations; =20 /* initialize the flex scanner --- should match raw_parser() */ yyscanner =3D scanner_init(query, @@ -3008,7 +3025,11 @@ fill_in_constant_lengths(JumbleState *jstate, const = char *query, if (locs[i].squashed) continue; /* squashable list, ignore */ =20 - /* Adjust recorded location if we're dealing with partial string */ + /* + * Adjust the constant's location using the provided starting location + * of the current statement. This allows us to avoid scanning a + * multi-statement string from the beginning. + */ loc =3D locs[i].location - query_loc; Assert(loc >=3D 0); =20 @@ -3064,6 +3085,8 @@ fill_in_constant_lengths(JumbleState *jstate, const c= har *query, } =20 scanner_finish(yyscanner); + + return locs; } =20 /* --=20 2.53.0 --oXoZJ4iogav/1GXV-- --Jpx1OuXfZqosqf3/ Content-Type: application/pgp-signature; name=signature.asc -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEG72nH6vTowiyblFKnvQgOdbyQH0FAmnGEwUACgkQnvQgOdby QH0JZg/+Jrmcz2lFZ46Twx1pR5WBWWdXo9LcPCq6c7ro/9x258HLgu8+v8n0Eun0 rX1UT8JdeD2f3m8qbDvgGPXTKMJQCCje5gRmYUrPIHMh0nR6DbJ9mcTbfioI9nlE OwV4os7qIrJsKunGNZ9uDSt/rJseAkDD/4K+vbWR/8aHW35NvMpB3lK4C/vD0kvl wV8g8iav1eamZT/VWLX/j85vLBE3CvN1wGTPwhz0f8D7DK8DKkuDfBjrb4ZYfUAe AQcw94iBw/5g59/CUpKE21+T3N4GdzUe+vBS6TUJ6gPdxt9pRaEztxtYRUJzlHWg CrzOh7tas+iTgASDDPDQ/2yvqabUE/1T+Ku0ZD+Ry7YFqje361xeLoBKlhh2K0V3 /fZTN0uxxcKjEoetpK6g8Vmr/sKLuRdL83FPwj7ImyUsSYTrhH7JQVxLz/umJJgt z+YwIidvd+fyg8Gbp3+VUaH7H1fSF6rCAZcplu1B00qSDRGACcaP76aIKi137olL pXLzWTEYyKVa7PSzzZlzYb7fUJDQhkchTCUJdLfQktE3/1P8waZ06snmpN/bKdBa 0RzRNyevU40HAnvP6CjODcioExK+LK9tYsOxkN82hB3Y3ipbWwqbMX3LJTjvTMAl bh87Hb5+LfxDvcl5kp/Z5+5INPtvT08wM+asvN+2lkyU3ULvkNQ= =wME3 -----END PGP SIGNATURE----- --Jpx1OuXfZqosqf3/--