Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vmaUK-00Dtpp-2t for pgsql-hackers@arkaria.postgresql.org; Sun, 01 Feb 2026 16:39:52 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vmaUI-00ADgt-2r for pgsql-hackers@arkaria.postgresql.org; Sun, 01 Feb 2026 16:39:51 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vmaUI-00ADgl-1p for pgsql-hackers@lists.postgresql.org; Sun, 01 Feb 2026 16:39:51 +0000 Received: from relay5-d.mail.gandi.net ([2001:4b98:dc4:8::225]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.98.2) (envelope-from ) id 1vmaUG-00000000XHX-1Kjs for pgsql-hackers@lists.postgresql.org; Sun, 01 Feb 2026 16:39:50 +0000 Received: by mail.gandi.net (Postfix) with ESMTPSA id A5E6743A32; Sun, 1 Feb 2026 16:39:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=vondra.me; s=gm1; t=1769963982; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=h6UQL0kQ1EQ6A2DQEOjnCM2F/6OmJwAj3W72D8qfU8s=; b=DfBzUqTOqDN7OnRqn1JRkdbV786Q9UbWMTQCJlszgXqWEeWiqHzNxrkDRMJbtTh7XWa3yG be09VOm5GqzthO9v6ATTi/oP8Q8uMHhl160Sy94e6YbeLz/f2GrYzoKrtpooC2epGTojks RIGNXY/dt7QLq69ZLDB4MYbKswySXoVHrgfvrpM7DYosa4Gu7Pxt1mdHq4jlpywQI3QxWR 8a9TpBREjtKR2PPDv+LaUwT6WHk18tLPWGGlNuJQPXphu8Kw4QOCu1w+ykjkASPdOnTzue iVpn6j6gGqFTOxTeEu41MDlMGbDOB8GtVv8P0jQfii7Yst+70Apu7xRJoGCd1Q== Message-ID: Date: Sun, 1 Feb 2026 17:39:38 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: Is there value in having optimizer stats for joins/foreignkeys? To: Andrei Lepikhov , Alexandra Wang , Corey Huinker Cc: Tom Lane , pgsql-hackers@lists.postgresql.org, hs@cybertec.at, Jeff Davis References: <246035.1764627115@sss.pgh.pa.us> <8df3d212-5d60-4e30-9606-d8849f7d37ae@gmail.com> Content-Language: en-US From: Tomas Vondra In-Reply-To: <8df3d212-5d60-4e30-9606-d8849f7d37ae@gmail.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-GND-Sasl: tomas@vondra.me X-GND-Score: -100 X-GND-Cause: gggruggvucftvghtrhhoucdtuddrgeefgedrtddtgddujeehvdelucetufdoteggodetrfdotffvucfrrhhofhhilhgvmecuifetpfffkfdpucggtfgfnhhsuhgsshgtrhhisggvnecuuegrihhlohhuthemuceftddunecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenucfjughrpefkffggfgfuvfevfhfhjggtgfesthejredttddvjeenucfhrhhomhepvfhomhgrshcugghonhgurhgruceothhomhgrshesvhhonhgurhgrrdhmvgeqnecuggftrfgrthhtvghrnhepledugeeikefglefhgfffuedvleetteevgefhvdeikeefudduuddvhfevudefhfevnecukfhppedvudefrddvgeeirddvfeekrddukedunecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehinhgvthepvddufedrvdegiedrvdefkedrudekuddphhgvlhhopegluddtrddufeejrddtrddukegnpdhmrghilhhfrhhomhepthhomhgrshesvhhonhgurhgrrdhmvgdpqhhiugepteehgfeijeegfeetfedvpdhmohguvgepshhmthhpohhuthdpnhgspghrtghpthhtohepjedprhgtphhtthhopehlvghpihhhohhvsehgmhgrihhlrdgtohhmpdhrtghpthhtoheprghlvgigrghnughrrgifrghnghhoshhssehgmhgrihhlrdgtohhmpdhrtghpthhtoheptghorhgvhihhuhhinhhkvghrsehgmhgrihhlrdgtohhmpdhrtghpthhtohepthhglhesshhsshdrphhghhdrphgrrdhushdprhgtphhtthhopehpghhsqhhlqdhhrggtkhgvrhhss ehlihhsthhsrdhpohhsthhgrhgvshhqlhdrohhrghdprhgtphhtthhopehhshestgihsggvrhhtvggtrdgrth X-GND-State: clean List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On 1/31/26 12:18, Andrei Lepikhov wrote: > On 29/1/26 06:04, Alexandra Wang wrote: >> Hi hackers, >> >> As promised in my previous email, I'm sharing a proof-of-concept patch >> exploring join statistics for correlated columns across relations. >> This is a POC at this point, but I hope the performance numbers below >> give a better idea of both the potential usefulness of join statistics >> and the complexity of implementing them. > I wonder why you chose the JOIN operator only? > > It seems to me that any relational operator produces relational output > that can be treated as a table. The extended statistics code may be > adopted to such relations. > I think it may be a VIEW that you can declare (manually or > automatically) and allow Postgres to build statistics on this 'virtual' > table. So, the main focus may shift to the question: how to provably > match a query subtree to a specific statistic. > Because for each "supported" operator we need to know two things: (1) how to sample it efficiently (2) how to apply it in selectivity estimation We can't add support for everything at once, and for some cases we may not even know answers to (1) and/or (2). We can't simply store an opaque VIEW, and build the stats by simply executing it (and sampling the results). The whole premise of extended stats is that people define them to fix incorrect estimates. And with incorrect estimates the plan may be terrible, and the VIEW may not even complete. regards -- Tomas Vondra