From: Tom Lane <[email protected]>
To: Dirschel, Steve <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: Hash join and picking which result set to build the hash table with.
Date: Wed, 22 May 2024 16:53:53 -0400
Message-ID: <[email protected]>
In-Reply-To: <BL0PR03MB4001869EF0A4859686FDCF16FAEB2@BL0PR03MB4001.namprd03.prod.outlook.com>
References: <[email protected]>
	<[email protected]>
	<BL0PR03MB4001869EF0A4859686FDCF16FAEB2@BL0PR03MB4001.namprd03.prod.outlook.com>

"Dirschel, Steve" <[email protected]> writes:
> The query and execution plan are shown below.  My question is
> related to the result set the optimizer is choosing to build the
> hash table from.  My understanding is for a hash join you want to
> build the hash table out of the smaller result set.

That's *a* consideration, but not the only one.  We also consider
whether the hash key has a flat distribution; if it is too skewed,
we might find specific hash chains getting too long.
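
To illustrate the skew concern, here is a toy sketch in Python. It is
not PostgreSQL's hash function or costing code; the bucket count, the
row count, and the helper name longest_chain are assumptions made up
for the illustration. It only shows how one very common key value
makes a single bucket chain long, while unique keys spread evenly.

```python
import random
from collections import Counter

def longest_chain(keys, nbuckets):
    """Length of the longest bucket chain when keys are hashed into nbuckets.

    Hypothetical helper: uses Python's built-in hash(), not PostgreSQL's.
    """
    chains = Counter(hash(k) % nbuckets for k in keys)
    return max(chains.values())

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 10_000               # assumed number of rows on the build side
nbuckets = 1024          # assumed number of hash buckets

# Flat distribution: every key is distinct, as with a primary key column.
flat_keys = list(range(n))

# Skewed distribution: roughly half of the rows share the key value 0.
skewed_keys = [0 if rng.random() < 0.5 else i for i in range(n)]

flat_longest = longest_chain(flat_keys, nbuckets)
skewed_longest = longest_chain(skewed_keys, nbuckets)

print("longest chain, flat keys:  ", flat_longest)
print("longest chain, skewed keys:", skewed_longest)
```

With the skewed keys, every probe for the common value has to walk one
very long chain, which is the cost being guarded against.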

> When running some tests I forgot to create the PK on table
> docloc_test.  When the PK was not on the table the optimizer decided
> to create the hash table off the 1000 rows from collection.  But as
> soon as I put the PK on that table it then decides to use
> docloc_test to build the hash table.

I think that the presence of a unique index overrides the statistics
(or the lack of any) so that the planner knows that the column is
unique and thus safe to use as a hash key.  Now, it should have
known that anyway, unless maybe this is a freshly-built table that
auto-analyze hasn't gotten to yet?

			regards, tom lane





