Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wSJcW-0036f9-0b for pgsql-hackers@arkaria.postgresql.org; Wed, 27 May 2026 19:08:48 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wSJcT-009Ju2-2k for pgsql-hackers@arkaria.postgresql.org; Wed, 27 May 2026 19:08:46 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wSJcT-009Jtt-1S for pgsql-hackers@lists.postgresql.org; Wed, 27 May 2026 19:08:46 +0000 Received: from relay7-d.mail.gandi.net ([2001:4b98:dc4:8::227]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.98.2) (envelope-from ) id 1wSJcS-00000001jp5-1DNd for pgsql-hackers@lists.postgresql.org; Wed, 27 May 2026 19:08:45 +0000 Received: by mail.gandi.net (Postfix) with ESMTPSA id 77CC83ED82; Wed, 27 May 2026 19:08:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=vondra.me; s=gm1; t=1779908917; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Tv3OWuT1k5h2ny1pVy5e6K1aJJ1y0X91a4NO6+bZ2zk=; b=OV7uQfBzk99z7BR3Vtd1orhihnRXgOjSVeAurssZrSeSxZItQT+T6NSC8rIg2jib1CNzSW 1xOA6TgZb4+N5roO0KGmR7uOSUMDFH43/snWzJ35nBiJcnasOEDcd8oufzzWUfxVeYkTSY 1tK51jLvFBAHplbuVYq4keNYJmitfamtsOgk4EvTx+vrwii832+eKWkKIb9AX4eSsp0kc2 72pJx3rPM9PgOAzix0/S4NzNVV5d5FN8pKUi8sADTI0hNX6JBHDBKEUCxcrakIh4xSK5v7 87a2zbggzK7R49QrF62NGcJkbgF0ZnjmSAIFB5JEpbRN9t2LnRvo76U+YaHMsw== Message-ID: <48aaff7c-5f37-44e1-9df0-859b168d8b1f@vondra.me> Date: Wed, 27 May 2026 21:08:34 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: Is there value in having optimizer stats for joins/foreignkeys? To: Alexandra Wang , Tom Lane Cc: jian he , pgsql-hackers@lists.postgresql.org, Andrei Lepikhov , Corey Huinker , hs@cybertec.at, Jeff Davis References: <3c477f2f-10e4-4705-bb21-90ccbe67e9d2@gmail.com> <24247.1779395113@sss.pgh.pa.us> <910a4628-720a-4912-af8f-8b5a96a0b336@vondra.me> <2468544.1779721437@sss.pgh.pa.us> Content-Language: en-US From: Tomas Vondra In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-GND-Sasl: tomas@vondra.me X-GND-Score: -100 X-GND-Cause: dmFkZTFbwJghjdiLiKuCflErY05xTD7yERBjqQUhmYSoCIPTvJcwj3FtvnJPYwWjcFgZWvhDLdERaZic6YPFMR0ZMIHnxFimHGsOjteiwMUcEdUe8Emm4gFQyTVtN432FGsitOwL83nU5r+rLCLZb2lK4urYoV97X3j6ywIQ8UfU40GAZRqsrOr/RJDzSuSaA/PDW6pdX+NLSHraNsNtPGULL0rvXtz+3/EsIt016ZuCEqxw1oJjR35m4VzlI3QHCTx7zl4bBRHbsm2FoiQ3F+SZgp1CNPriHuKBV0ZoFB0encKpjaKbFYIMkjdYWCskoXBjoOC+Vb1IuGi2SuoLMSFRHluBhP0I60GkvP/Ff4pN9OwxdzTB30Q5nKuRdlTa2dwxITA2nWt7AeifG8/gQ7CdS3Qcr3Jjsv7RfiMHTDmJHq+ya0gkNUcr6YRf2iqWHVQrOiz39rxuzZl8UmY4kCuArg/5+fOFwrX42I7mBkT4WSsZI4UrrNDQoCRQFuRoNHr0ZVeRuy2/Sw3esg444jZ1gNID+k7/0JxJtBu/x6SoXVY8UAEtxGJDQ4z8QRpCiI4xRbBgjvMH9cYARlCNEgX+3xUOGXiuybcGrLu/iI1yRfU5mtCcOrnvMlgWRVljhJvycYpGoT4BGo+jpVWbcGEFdphEYC3+P2Q8kDSc8VFaaCh9WQ X-GND-State: clean List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On 5/27/26 19:49, Alexandra Wang wrote: > Hi Tom and Tomas, > > Thank you so much for the feedback! > > On Mon, May 25, 2026 at 8:04 AM Tom Lane > wrote: >> Tomas Vondra > writes: >> > On 5/21/26 22:25, Tom Lane wrote: >> >> I don't love stxkeyrefs[].  I wonder if it's time to throw away >> >> stxkeys[], represent all the target columns as regular expression >> >> trees in stxexprs, and then special-case columns that are simple >> >> Vars where appropriate at execution. >> >> (In the same vein, I dislike the grammar's separation of plain >> >> columns from expressions; I'd like to replace stats_params >> >> with expr_list and sort it all out later.  But perhaps that's >> >> material for a separate patch.) >> >> > FWIW the extended stats copied this from pg_index, which also stores >> > keys and expressions separately. I suppose there was a reason for that, >> > most likely performance - is cheaper to compare attnums than >> > expressions, and plain keys are much more common. >> >> I think I might be to blame for the separate storage of indexprs. >> If so, the motivation was to avoid breakage of older code that only >> knew about indkey[].  (Of course, such code would necessarily fail >> on indexes with expressions, but we wanted to avoid breakage for the >> common case of no-expressions.)  I don't think that consideration is >> nearly as pressing for extended stats.  There's probably a lot less >> client-side code that knows about extended stats at all, and what >> there is seems more likely to rely on the server-side display >> functions than to dig into the catalog details for itself.  Also, >> if there is anything that's looking at pg_statistic_ext details, >> it will need work anyway after this patch; there's no way around that. > > I'm working on removing stxkeys[] as a prerequisite commit before the > main join > stats patch, representing all target columns as Var nodes in stxexprs, > as you > both suggested. > > One question about the pg_stats_ext view: currently it has two complementary > columns: > > - attnames (name[]) — Names of the columns included in the statistics object > - exprs (text[]) — Expressions included in the statistics object > > With stxkeys gone from the catalog, should the view: > > (a) Stay as-is: keep attnames and exprs as separate columns with the same > semantics. Implemented via a helper function that extracts plain column > names > from the unified stxexprs. > > or > > (b) Mirror the catalog: remove attnames, make exprs show all entries (both > column names and expressions together in one text[] array). > > Any preference? > My 2c: AFAIR there's no fundamental reason to keep those two lists separate, other than that expressions were "bolted on" later, after we already had stats on plain attributes. In hindsight, it might have been better to just unify the view back then, probably. I personally would be OK with just unifying adjusting the view, and showing a single list (with both attributes and expressions). IIUC the plan is to just store a list of expressions anyway, with attributes represented as Vars. So the view would have to do more work just to produce the "old" output, with little benefit. The one argument against this that I can think of is possibly breaking tools that use this view. IIRC pg_dump is reading the view when exporting/importing the statistics. That might need some adjustments (and there's also pg_stats_ext_exprs), but maybe it's easier to keep the view consistent. Also, maybe this is one more argument against the "optimizer view" idea Corey mentioned in Vancouver last week? Because surely we'll want to include the join statistics in export/import, and for the view approach we'd need to invent a fair amount of code to do that. Maybe? regards -- Tomas Vondra