Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uo7Xn-00GawZ-Sr for pgsql-hackers@arkaria.postgresql.org; Mon, 18 Aug 2025 21:37:33 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1uo7Xl-002W4d-Pk for pgsql-hackers@arkaria.postgresql.org; Mon, 18 Aug 2025 21:37:30 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uo7Xl-002W4U-GW for pgsql-hackers@lists.postgresql.org; Mon, 18 Aug 2025 21:37:30 +0000 Received: from sss.pgh.pa.us ([68.162.161.243]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1uo7Xj-000XIW-1j for pgsql-hackers@lists.postgresql.org; Mon, 18 Aug 2025 21:37:29 +0000 Received: from sss1.sss.pgh.pa.us (localhost [127.0.0.1]) by sss.pgh.pa.us (8.15.2/8.15.2) with ESMTP id 57ILbQle463297; Mon, 18 Aug 2025 17:37:26 -0400 From: Tom Lane To: Chao Li cc: pgsql-hackers@lists.postgresql.org Subject: Re: Improve hash join's handling of tuples with null join keys In-reply-to: <5C5EE031-F086-4353-A17A-DA563CD24DDD@gmail.com> References: <3061845.1746486714@sss.pgh.pa.us> <496221.1748882849@sss.pgh.pa.us> <175507656113.993.1381684440543440253.pgcf@coridan.postgresql.org> <544D7C83-CECE-44E7-B5D7-530E9318D231@gmail.com> <616751.1755276726@sss.pgh.pa.us> <5C5EE031-F086-4353-A17A-DA563CD24DDD@gmail.com> Comments: In-reply-to Chao Li message dated "Mon, 18 Aug 2025 10:48:08 +0800" MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <463295.1755553046.1@sss.pgh.pa.us> Content-Transfer-Encoding: quoted-printable Date: Mon, 18 Aug 2025 17:37:26 -0400 Message-ID: <463296.1755553046@sss.pgh.pa.us> List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Chao Li writes: > My comment was trying to say that if there are a lot of null join key tu= ples in outer table, then hj_NullOuterTupleStore might use a lot of memory= or swap data to disk, which might lead to performance burden. So, I was t= hinking we could keep the original logic for outer table, and return null = join key tuples immediately. I don't think that works for the parallel-hash-join case, at least not for the multi-batch code path. That path insists on putting every potentially-outputtable tuple into some batch's shared tuplestore, cf ExecParallelHashJoinPartitionOuter. We can make that function put the tuple into a different tuplestore instead, but I think it's quite unreasonable to think of returning the tuple immediately from there. It certainly wouldn't be "keeping the original logic". Yeah, we could make multi-batch PHJ do this differently from the other cases, but I don't want to go there: too much complication and risk of bugs for what is a purely hypothetical performance issue. Besides which, if the join is large enough to be worth worrying over, it's most likely taking that code path anyhow. > We can simply added a new flag to HashTable, say named skip_building_has= h. Upon right join (join to the hash side), and outer table is empty, set = the flag to true, then in the MultiExecPrivateHash(), if skip_building_has= h is true, directly put all tuples into node->null_tuple_store without bui= lding a hash table. > Then in ExecHashJoinImpl(), after "(void) MultiExecProcNode()" is called= , if hashtable->skip_building_hash is true, directly set node->hj_JoinStat= e =3D HJ_FILL_INNER_NULL_TUPLES. I'm not excited about this idea either. It's completely abusing the data structure, because the "null_tuple_store" is now being used for tuples that (probably) don't have null join keys. The fact that you could cram it in with not very many lines of code does not mean that the result will be understandable or maintainable --- and certainly, hash join is on the hairy edge of being too complicated already. regards, tom lane