From: Tom Lane <tgl@sss.pgh.pa.us>
To: "Dirschel, Steve" <steve.dirschel@thomsonreuters.com>
cc: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Subject: Re: Hash join and picking which result set to build the hash table with.
In-reply-to: <BL0PR03MB4001869EF0A4859686FDCF16FAEB2@BL0PR03MB4001.namprd03.prod.outlook.com>
References: <202405221100.fy66dsew2f52@alvherre.pgsql> <6806.1716408332@sss.pgh.pa.us> <BL0PR03MB4001869EF0A4859686FDCF16FAEB2@BL0PR03MB4001.namprd03.prod.outlook.com>
Comments: In-reply-to "Dirschel, Steve" <steve.dirschel@thomsonreuters.com>
	message dated "Wed, 22 May 2024 20:42:39 -0000"
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <14386.1716411233.1@sss.pgh.pa.us>
Date: Wed, 22 May 2024 16:53:53 -0400
Message-ID: <14387.1716411233@sss.pgh.pa.us>
Archived-At: <https://www.postgresql.org/message-id/14387.1716411233%40sss.pgh.pa.us>
Precedence: bulk

"Dirschel, Steve" <steve.dirschel@thomsonreuters.com> writes:
> The query and execution plan are shown below.  My question is
> related to the result set the optimizer is choosing to build the
> hash table from.  My understanding is for a hash join you want to
> build the hash table out of the smaller result set.

That's *a* consideration, but not the only one.  We also consider
whether the hash key has a flat distribution; if it is too skewed,
we might find specific hash chains getting too long.
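
As a toy illustration of the skew problem (this is not the planner's
or executor's actual code), the sketch below hashes two 1000-row key
sets into buckets and reports the longest chain.  The function name
longest_chain, the bucket count of 1024, and the use of Python's
built-in hash() are all assumptions made for the example.

```python
from collections import Counter

def longest_chain(keys, n_buckets):
    """Length of the longest bucket chain when keys are hashed into n_buckets.

    Python's built-in hash() stands in for a real hash function here.
    """
    chains = Counter(hash(k) % n_buckets for k in keys)
    return max(chains.values())

# 1000 distinct keys, as a primary key would guarantee.
unique_keys = list(range(1000))

# 1000 rows where 900 of them share a single key value (heavy skew).
skewed_keys = [0] * 900 + list(range(1, 101))

print("unique:", longest_chain(unique_keys, 1024))
print("skewed:", longest_chain(skewed_keys, 1024))
```

With the unique keys no bucket holds more than one entry, while the
skewed set puts all 900 duplicates into a single chain that every
probe for that value has to walk.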

> When running some tests I forgot to create the PK on table
> docloc_test.  When the PK was not on the table the optimizer decided
> to create the hash table off the 1000 rows from collection.  But as
> soon as I put the PK on that table it then decides to use
> docloc_test to build the hash table.

I think that the presence of a unique index overrides the statistics
(or the lack of any) so that the planner knows that the column is
unique and thus safe to use as a hash key.  Now, it should have
known that anyway from the column's statistics, unless maybe this is
a freshly-built table that auto-analyze hasn't gotten to yet?

			regards, tom lane