public inbox for [email protected]  
help / color / mirror / Atom feed
From: Chao Li <[email protected]>
To: Tom Lane <[email protected]>
Cc: [email protected]
Subject: Re: Improve hash join's handling of tuples with null join keys
Date: Mon, 18 Aug 2025 10:48:08 +0800
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>



> On Aug 16, 2025, at 00:52, Tom Lane <[email protected]> wrote:
> 
> Chao Li <[email protected]> writes:
>> With this patch, “isnull” now becomes true because of the change of strict op. Then the outer null join key tuple must be stored in a tuplestore. When an outer table contains a lot of null join key tuples, then the tuplestore could bump to very large, in that case, it would be hard to say this patch really benefits.
> 
> What's your point?  If we don't divert those tuples into the
> tuplestore, then they will end up in the main hash table instead,
> and the consequences of bloat there are far worse.

I might not state clearly. For this comments, I meant the outer table. For example:

SELECT a.*, b.* from a RIGHT JOIN b on a.id = b.a_id;

Let’s say table a is used to build hash, table b is the outer table.

And say, table b has 1000 tuples whose a_id are NULL.

Before this patch, when fetching such a tuple (a_id is null) from table b, the tuple will be returned to parent node immediately. 

With this tuple, all of such tuples will be put into hj_NullOuterTupleStore, and only be returned after all non-null tuples are processed.

My comment was trying to say that if there are a lot of null join key tuples in outer table, then hj_NullOuterTupleStore might use a lot of memory or swap data to disk, which might lead to performance burden. So, I was thinking we could keep the original logic for outer table, and return null join key tuples immediately.


> 
>> Based on this patch, if we are doing a left join, and outer table is empty, then all tuples from the inner table should be returned. In that case, we can skip building a hash table, instead, we can put all inner table tuples into hashtable.innerNullTupleStore. Building a tuplestore should be cheaper than building a hash table, so this way makes a little bit more performance improvement.
> 
> I think that would make the logic completely unintelligible.  Also,
> a totally-empty input relation is not a common situation.  We try to
> optimize such cases when it's simple to do so, but we shouldn't let
> that drive the fundamental design.
> 

I absolutely agree we should not touch the fundamental design for the tiny optimization, that’s why I mentioned “based on this patch”.

With this patch, you have introduced a change in MultiExecPrivateHash():

		else if (node->keep_null_tuples)
		{
			/* null join key, but we must save tuple to be emitted later */
			if (node->null_tuple_store == NULL)
				node->null_tuple_store = ExecHashBuildNullTupleStore(hashtable);
			tuplestore_puttupleslot(node->null_tuple_store, slot);
		}

We can simply added a new flag to HashTable, say named skip_building_hash. Upon right join (join to the hash side), and outer table is empty, set the flag to true, then in the MultiExecPrivateHash(), if skip_building_hash is true, directly put all tuples into node->null_tuple_store without building a hash table.

Then in ExecHashJoinImpl(), after "(void) MultiExecProcNode()" is called, if hashtable->skip_building_hash is true, directly set node->hj_JoinState = HJ_FILL_INNER_NULL_TUPLES.

So, the tiny optimization is totally based on this patch, it depends on the HashTable.null_tuple_store (if you take this comment, then maybe rename this variable) and the new state HJ_FILL_INNER_NULL_TUPLES.

Best regards,
==
Chao Li (Evan)
--------------------
HighGo Software Co., Ltd.
https://www.highgo.com/





view thread (15+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: Improve hash join's handling of tuples with null join keys
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox