Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vYSjT-00442o-2i for pgsql-hackers@arkaria.postgresql.org; Wed, 24 Dec 2025 17:33:08 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vYSjS-005FeJ-07 for pgsql-hackers@arkaria.postgresql.org; Wed, 24 Dec 2025 17:33:06 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vYSjR-005Fe8-29 for pgsql-hackers@lists.postgresql.org; Wed, 24 Dec 2025 17:33:06 +0000 Received: from mail-ed1-x52c.google.com ([2a00:1450:4864:20::52c]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vYSjP-002YOk-0J for pgsql-hackers@postgresql.org; Wed, 24 Dec 2025 17:33:05 +0000 Received: by mail-ed1-x52c.google.com with SMTP id 4fb4d7f45d1cf-64c893f3a94so6182138a12.0 for ; Wed, 24 Dec 2025 09:33:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1766597577; x=1767202377; darn=postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=d2Dul0ic8CmZhUYWAMJRruxIZI0gh4wGSKsuLK5Ti0M=; b=TEVcg0G9l053KzD/mBFhJdd/AykDc+02I1uNZN82NlKh7ngoU6M07VSsS6qu/lKhBG RmPRBEpdqo4jsaRXnGzRYh1xFykt8kJDKjUHGr6eRAlNbpK4KupD/UBxXzqYtgRBtG9t x2JkKWNMjyFxZ4ACe4tMloBgwwh+Yz3WC/eCTDyGA3qA2juYo9ivuySI8QftOcI9R+Hi LUVJYgNiDPOUQP5hzBykmvdwmR/ro7sLIBiPBXgk/nT1w0Pisk3C2gh4K5gJRkVWlhC1 x4gU/i8RzPPho+X5TAzh0m/q4Z44H8DKbKXIeg+HWIOcbN4yfn3tS+no0F1VuaKbUrz/ YKeA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1766597577; x=1767202377; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=d2Dul0ic8CmZhUYWAMJRruxIZI0gh4wGSKsuLK5Ti0M=; b=cBhaspwkkEzKvxYH1kKQsxZkSd+hkc6sEMlTC2P2ckYMnh5cBw6KbOHSOobTR4x/PJ o7ZwHuw+q5iIOGgDeR5BXPkabuQ5tiUAyX45wzEiYybXM9A4c4AtkxTrNJojo/Gh7s90 MQzCRENrz9I4sLDDy/VIvLqkIsB7xJEGQK/2IbQXaY8XK2UVMLNsCnBqrScX1Zv9JEPD tzxcDhww6DA3LR6ogEGtndYwxiiB16p35cugqXy99WSHvBBtXWlHrhD2p2ObZ6is+NrR aVPluONZemlOmOSRfZbktf+NJN+7gUFJ1kELh86+Fit4aRQPK2gQqCdZQg/rQ6KLtrwG UArg== X-Forwarded-Encrypted: i=1; AJvYcCVwXBo5KTK0UfVttBhqp2vMfJVQLc81FfNG769vGaUnsOJL6L3xgMWJzoODuA/krFBiVmC7nxrVCy1jnNj/@postgresql.org X-Gm-Message-State: AOJu0YzbZwMXW7rtM6aVS8RCCibjkzxs3iqB87cuZGhbSOlfPk1kRSoe /enfvUmcnSs+SsSdMnFAxLGF/MDWOEXMtACIKNulDKdeTG4rvBtj6sofCZQC4+EK0elMu2c+AEz 2DG6GQHtv7XnWMeJUTBHRmsh10GirzJe+Fbti2epJ9g== X-Gm-Gg: AY/fxX4Gtrglpz+593a/VpyTOYQL2Q0PYtD3PtzM5U418Hup98XTmFMgthhtXVQEVLT byRUUr2w8ofI6UAvtCpt9P1ZZzDXrbTqtPMrsqjdI42OCXDiYQcSCyryG693QIX0wJ3WPSIDPsz ZQVGb5EfUS0JZwluxmYCcG0lLsh5G+XhkvIzbYRild2PIZxnDiIyDf2IyhOuUPlBhSzhXCP2P10 qKZhgTyx/4kwjCc6d3mCvKPRB+0gzGAUSW/3VV+G0jMbxrL0TDlhtIJblbVKMxz0mQ= X-Google-Smtp-Source: AGHT+IFK8YGOY4MxrV99ZdYiNb66Mi/Ym3VGLGeho9ws+a6+KlYtR6Ma6rz+ptPehMUlla25H7NcDShEi/rhM3FT/IE= X-Received: by 2002:a17:907:960d:b0:b73:9222:6a64 with SMTP id a640c23a62f3a-b80355b7995mr2166393166b.3.1766597576596; Wed, 24 Dec 2025 09:32:56 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Sami Imseih Date: Wed, 24 Dec 2025 11:32:45 -0600 X-Gm-Features: AQt7F2qgWWvaoVj8M4_qJNLjeHRnWwSAXfYZoAOgDR6Ips9zJf0nPgxubB8yn2A Message-ID: Subject: Re: Refactor query normalization into core query jumbling To: Michael Paquier Cc: zengman , pgsql-hackers Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk > > For the second part of making more jumbling utilities global, this can > > be taken up in another thread. I am now thinking we would be better > > off actually taking all the generic jumbling routines and separating > > them into their own C file. So we can have a new file like primjumble.c > > which will have all the "primitive" jumbling utilities, and queryjumble= funcs.c > > can worry about its core business of jumbling a Query tree. Anyhow, > > these are just some high level thoughts on this part for now. > > Hmm. Moving from pgss to the core backup the code that updates the > clocations of an existing JumbleState does not solve the issue that > we should not modify the internals of the JumbleState computed. So > this does not move the ball at all. This is for the point "I am wondering if we should not expose a bit more the jumble query APIs" [0], which as I mention is a separate topic and is not about extensions being able to modify JumbleState. > The updates of clocations depend on GenerateNormalizedQuery(), which > depends on the pre-work done in CleanQuerytext() to get a query string > and its location for a multi-query string case. If we really want to > make a JumbleState not touchable at all, we could either push more > work of CleanQuerytext() and GenerateNormalizedQuery() into core. When you say =E2=80=9Cpush more work into core,=E2=80=9D I understand that = to mean this work should occur during jumbling. If so, there are several complications. 1/ CleanQueryText() requires access to the query text, which we do not have during core jumbling as of 2ecbb0a4935. 2/ GenerateNormalizedQuery() should be invoked on demand by the extension that needs the normalized string, for example when pg_stat_statements needs to store a query string. Both operations are expensive, especially GenerateNormalizedQuery(). Doing this work on every JumbleQuery() would introduce unnecessary overhead. > Or a second thing that we can do, which would be actually much > simpler, is to rework entirely fill_in_constant_lengths() so as we > generate a *copy* of the location data PGSS wants rather than force it > into a JumbleState yes, that's an idea that did cross my mind. I have hesitation of copying clocations, but that may not be such a big deal, since it will only be occurring on demand when the extension requests a normalized string. > that would be pushed across more stacks of the > post-parse-analyze hook. Doing that would allow us to pull the plug > into making JumbleState and the original copies of clocations a set of > consts when given to extensions. Yes correct. The hook signature will turn into, so all extensions now just have a constant pointer to the jumblestate. They can of course work hard to cast away constants, but at that point, they are doing things the wrong way. ``` typedef void (*post_parse_analyze_hook_type) (ParseState *pstate, Query *query, const JumbleState *jstate); ``` This of course will be a big breaking change to all the extensions out there using this hook. This is not just about a signature change of the hook, but an author now has to figure out how to deal with a constant JumbleState in their code, which may not be trivial. My proposal in v3 was a bit more softer, and keeps the hook backwards compatible. Basically, we keep JumbleState a non-constant, but provide core APIs for any operation, such as generate_normalized_query= , that needs to modify the state. So, my approach was not about enforcing a read-only JumbleState, but about providing the API to dissuade an author from modifying a JumbleState. If we need a stronger API contract, then we can go with the hook receving JumbleState as a constant and implement the copy of clocations to return back to extensions that need to normalize a query. We also have to keep in mind that if there are future requirements where the state has to be modified by an extension, we will need to implement more copy functions for those field. [0] https://www.postgresql.org/message-id/aUXeTEpSymo6n7lF%40paquier.xyz -- Sami Imseih Amazon Web Services (AWS)