Re: Improve hash join's handling of tuples with null join keys

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Tom Lane <[email protected]>
To: Chao Li <[email protected]>
Cc: [email protected]
Subject: Re: Improve hash join's handling of tuples with null join keys
Date: Mon, 18 Aug 2025 17:37:26 -0400
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>

Chao Li <[email protected]> writes:
> My comment was trying to say that if there are a lot of null join key tuples in outer table, then hj_NullOuterTupleStore might use a lot of memory or swap data to disk, which might lead to performance burden. So, I was thinking we could keep the original logic for outer table, and return null join key tuples immediately.

I don't think that works for the parallel-hash-join case, at least not
for the multi-batch code path.  That path insists on putting every
potentially-outputtable tuple into some batch's shared tuplestore, cf
ExecParallelHashJoinPartitionOuter.  We can make that function put
the tuple into a different tuplestore instead, but I think it's quite
unreasonable to think of returning the tuple immediately from there.
It certainly wouldn't be "keeping the original logic".

Yeah, we could make multi-batch PHJ do this differently from the other
cases, but I don't want to go there: too much complication and risk of
bugs for what is a purely hypothetical performance issue.  Besides
which, if the join is large enough to be worth worrying over, it's
most likely taking that code path anyhow.

> We can simply added a new flag to HashTable, say named skip_building_hash. Upon right join (join to the hash side), and outer table is empty, set the flag to true, then in the MultiExecPrivateHash(), if skip_building_hash is true, directly put all tuples into node->null_tuple_store without building a hash table.
> Then in ExecHashJoinImpl(), after "(void) MultiExecProcNode()" is called, if hashtable->skip_building_hash is true, directly set node->hj_JoinState = HJ_FILL_INNER_NULL_TUPLES.

I'm not excited about this idea either.  It's completely abusing the
data structure, because the "null_tuple_store" is now being used for
tuples that (probably) don't have null join keys.  The fact that you
could cram it in with not very many lines of code does not mean that
the result will be understandable or maintainable --- and certainly,
hash join is on the hairy edge of being too complicated already.

			regards, tom lane

view thread (15+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: Improve hash join's handling of tuples with null join keys
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox