Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uC6dp-00AV6L-FZ for pgsql-hackers@arkaria.postgresql.org; Tue, 06 May 2025 00:58:37 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1uC6dn-006GRl-KY for pgsql-hackers@arkaria.postgresql.org; Tue, 06 May 2025 00:58:35 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uC6dn-006GRc-B8 for pgsql-hackers@lists.postgresql.org; Tue, 06 May 2025 00:58:35 +0000 Received: from sss.pgh.pa.us ([68.162.161.243]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1uC6dk-000MZ5-11 for pgsql-hackers@lists.postgresql.org; Tue, 06 May 2025 00:58:34 +0000 Received: from sss1.sss.pgh.pa.us (localhost [127.0.0.1]) by sss.pgh.pa.us (8.15.2/8.15.2) with ESMTP id 5460wVUj3073504; Mon, 5 May 2025 20:58:31 -0400 From: Tom Lane To: Tomas Vondra cc: pgsql-hackers@lists.postgresql.org Subject: Re: Improve hash join's handling of tuples with null join keys In-reply-to: References: <3061845.1746486714@sss.pgh.pa.us> Comments: In-reply-to Tomas Vondra message dated "Tue, 06 May 2025 02:12:09 +0200" MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <3073502.1746493111.1@sss.pgh.pa.us> Date: Mon, 05 May 2025 20:58:31 -0400 Message-ID: <3073503.1746493111@sss.pgh.pa.us> List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Tomas Vondra writes: > My personal experience is that the growEnabled heuristics is overly > sensitive, and probably does not trigger very often. Yeah, it would be good to make it not quite all-or-nothing. > But more importantly, wasn't the issue discussed in [1] about parallel > hash joins? I'm not clear on that either; it seemed that the OP was able to trigger it in some non-parallel cases too. But we don't have a reproducer so I can't say for sure. Building a reproducer would be a useful exercise for testing this. There might well be some parallel-specific misbehavior that would be worth ameliorating independently of this work, in case of a lot of non-null duplicate keys. >> This passes check-world, and I've extended a couple of existing test >> cases to ensure that the new code paths are exercised. I've not done >> any real performance testing, though. > Are you planning to? If not, I can try to collect some numbers, but I > can't promise that before pgconf.dev. If you have time after the conference, please feel free. > BTW do you consider this to be a bugfix for PG18? Or would it have to > wait for PG19 at this point? This has been like this forever I suspect --- certainly for as long as we've had PHJ, and probably longer. So I'm seeing it as new work for v19, not something we'd attempt to back-patch. regards, tom lane