Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vYG9l-00H5wm-1p for pgsql-hackers@arkaria.postgresql.org; Wed, 24 Dec 2025 04:07:26 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vYG9i-003Lzl-2o for pgsql-hackers@arkaria.postgresql.org; Wed, 24 Dec 2025 04:07:23 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vYG9i-003Lza-0H for pgsql-hackers@lists.postgresql.org; Wed, 24 Dec 2025 04:07:23 +0000 Received: from fout-a6-smtp.messagingengine.com ([103.168.172.149]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vYG9h-002Kw3-0d for pgsql-hackers@postgresql.org; Wed, 24 Dec 2025 04:07:21 +0000 Received: from phl-compute-11.internal (phl-compute-11.internal [10.202.2.51]) by mailfout.phl.internal (Postfix) with ESMTP id 32F06EC0169; Tue, 23 Dec 2025 23:07:19 -0500 (EST) Received: from phl-frontend-04 ([10.202.2.163]) by phl-compute-11.internal (MEProxy); Tue, 23 Dec 2025 23:07:19 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=paquier.xyz; h= cc:cc:content-type:content-type:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:subject :subject:to:to; s=fm1; t=1766549239; x=1766635639; bh=C7d7ER3Yb1 BeeV7qU65qkOeq6ivc1HEZJw7T9UYJGy0=; b=PTWWrNHKYdC2Az+sgGAZUdX4PY HOfoMFHCgt9+ixsgAvhswWsV+16Q6/agzQmlTeETlZ+EXiXhBNAJXelx17TIAmmN hZx+LQx9xP8pcoBzlIy6D4QJoQpMnCh/FOiP+SSLp03w+rzY4FpzCDEyirs6HpqK YqD2dxJ8JUys98rSRHwXhCCUeRzTt1/FgXWBi7BrguhmdAksh2hIKx+CChE0BuCy DW7BTBWSPYGbEcG7j3H6QO7ZggzAW0lhky/u0jsOZH8KTrhrEI+pXL5QuAQkkF9u 7zBhaMjhIz7UgGeHiVcYAI2Ea5Lk3R/h/YbSPht0Nhy2iMATSq8FuapvhJog== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:subject:subject:to :to:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm1; t= 1766549239; x=1766635639; bh=C7d7ER3Yb1BeeV7qU65qkOeq6ivc1HEZJw7 T9UYJGy0=; b=VyYFRVdOjcV8aAJ44L1NVYtewgJHx62mgf8s16KUBioOOQCiakG ez/2dK1RXu9L/jKjQuYCFtd14fSXIyHKdoY0HeOCUCKVoKqDNxNK2hpTHg7eg+1S YiUgXvniYoSVTtLWw7L9RiLu6/WECgG9JYlLGV6gHCYTZHW2kks/ju/4bEZSAutH tP0rhxhWZVXlJh4a6OccBZ752taS7iTngjO1mc0YF4Zogoxj+EhUyO28xrDpOdi6 7Ll1MDbn8ZpK6OveONklTgnkzloZBTG7CmNz/a8yVKGFZW+dmaLwkBYtmGDDY071 f3f/Cz7Ar+9mWMM3QAWYuso/r6VSioiP9jw== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefgedrtddtgdeiudejtdcutefuodetggdotefrod ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpuffrtefokffrpgfnqfghnecuuegr ihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenfghrlh cuvffnffculdejtddmnecujfgurhepfffhvfevuffkfhggtggujgesghdtreertddtvden ucfhrhhomhepofhitghhrggvlhcurfgrqhhuihgvrhcuoehmihgthhgrvghlsehprghquh hivghrrdighiiiqeenucggtffrrghtthgvrhhnpeetleeifedufffhhfdtteelgeeggeff hfekueevteeigfduudevudetgfegiedvjeenucevlhhushhtvghrufhiiigvpedtnecurf grrhgrmhepmhgrihhlfhhrohhmpehmihgthhgrvghlsehprghquhhivghrrdighiiipdhn sggprhgtphhtthhopeefpdhmohguvgepshhmthhpohhuthdprhgtphhtthhopehsrghmih hmshgvihhhsehgmhgrihhlrdgtohhmpdhrtghpthhtohepiigvnhhgmhgrnheshhgrlhho uggsthgvtghhrdgtohhmpdhrtghpthhtohepphhgshhqlhdqhhgrtghkvghrshesphhosh htghhrvghsqhhlrdhorhhg X-ME-Proxy: Feedback-ID: i0fe9450f:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 23 Dec 2025 23:07:16 -0500 (EST) Date: Wed, 24 Dec 2025 13:06:59 +0900 From: Michael Paquier To: Sami Imseih Cc: zengman , pgsql-hackers Subject: Re: Refactor query normalization into core query jumbling Message-ID: References: MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="I7SKfRjmYIa+CS4q" Content-Disposition: inline In-Reply-To: List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --I7SKfRjmYIa+CS4q Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Dec 23, 2025 at 10:35:18AM -0600, Sami Imseih wrote: > > Though this may be tangential to the current topic, I have long been wa= nting to revise the two instances of `Assert(len_to_wrt >=3D 0);` > > in the code to the implementation below. Would you kindly advise if thi= s modification is worthwhile? > > > > ``` > > if (len_to_wrt > 0) > > { > > memcpy(norm_query + n_quer_loc, query + quer_loc, len_t= o_wrt); > > n_quer_loc +=3D len_to_wrt; > > } > > ``` >=20 > I am not sure I like this idea. The len_to_wrt being less than 0 > indicates a bug which will be hidden behind such a loop. I > think it's better to keep the Assert as-is. This set of asserts are critical to keep. An incorrect computation is a synonym of an out-of-bound write in this case, which would classify any problem as something that could be worth a CVE. > v3 implements this approach without a callback. This establishes a clear > boundary: core owns JumbleState modifications, extensions consume the > results through the API. >=20 > I kept the post_parse_analyze hook signature unchanged since > GenerateNormalizedQuery still needs to modify JumbleState > (to populate constant lengths). Making it const would be misleading. >=20 > For the second part of making more jumbling utilities global, this can > be taken up in another thread. I am now thinking we would be better > off actually taking all the generic jumbling routines and separating > them into their own C file. So we can have a new file like primjumble.c > which will have all the "primitive" jumbling utilities, and queryjumblefu= ncs.c > can worry about its core business of jumbling a Query tree. Anyhow, > these are just some high level thoughts on this part for now. Hmm. Moving from pgss to the core backup the code that updates the clocations of an existing JumbleState does not solve the issue that we should not modify the internals of the JumbleState computed. So this does not move the ball at all. The updates of clocations depend on GenerateNormalizedQuery(), which depends on the pre-work done in CleanQuerytext() to get a query string and its location for a multi-query string case. If we really want to make a JumbleState not touchable at all, we could either push more work of CleanQuerytext() and GenerateNormalizedQuery() into core. Or a second thing that we can do, which would be actually much simpler, is to rework entirely fill_in_constant_lengths() so as we generate a *copy* of the location data PGSS wants rather than force it into a JumbleState that would be pushed across more stacks of the post-parse-analyze hook. Doing that would allow us to pull the plug into making JumbleState and the original copies of clocations a set of consts when given to extensions. -- Michael --I7SKfRjmYIa+CS4q Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEG72nH6vTowiyblFKnvQgOdbyQH0FAmlLZuMACgkQnvQgOdby QH3wlQ/9HU1sDMED+ydsyKOWVgTF//ebMhN2maqrTZ6VQfCr/JavfhnDd8JrUq+O xcu4yFluuf1dyxhYvfJlhfVOHr0DgdrBAi9lKypAvfjQvchcAx4++p35XwNocszo wJ5+TbY3Z1gEEDJ9CQIUCqdOzCBIW61fNtDYQVF6GNYW2evsf3Gdp9zoXI9l9pDb dLQoJSaJHhnYK5STvEiuvZT0hLOUBjhWjTU/oJsKF6f2AM4o5c+wHJB2B1cIQFZL /YW+jNi70jD/XJxmiEZui2Gl6amT+pM4ikbiWhHbfQAX6UpbVFAkxFfFEcLcUVd6 T1D+ZOmrPzalsuYJb2WIJmrPOtofHoz9fb+UWJwr9W3K5iZ8nafTBoUyoFF+g6+/ 7gPVtu+E2RpQ9KgCD3XfNnt5QcDuj4Skg3ALSv6PReXdyTdcqtf9efQDFrW4uPtf ww1+3L5/9n5iT6OPGRl3YTTCgh9lCK9Op2qP7rOhIfCUKKRJ+1XLrru63DSO5y+2 6trRNNwGWaBhSMMo6bsLuBcn0Hwnc7XexcY3OSXZrdokEMZnMv0vvTmt4F5Gqe2X qhW9yLxKpJ0UWLYbjeIYV8I1Q+vMLoV5UdJZntbV2RCRu4lmF+GvjFM55+qmjZNj SJLTPL9MOVHWce4Dj17jajYWzF0i6sN6Z8EK7SMVsTrDxLEAKeY= =hGlH -----END PGP SIGNATURE----- --I7SKfRjmYIa+CS4q--