Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1umNpO-00Ej3B-3L for pgsql-hackers@arkaria.postgresql.org; Thu, 14 Aug 2025 02:36:30 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1umNpL-002H9x-Uk for pgsql-hackers@arkaria.postgresql.org; Thu, 14 Aug 2025 02:36:28 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1umNpL-002H9p-Hz for pgsql-hackers@lists.postgresql.org; Thu, 14 Aug 2025 02:36:27 +0000 Received: from mail-pf1-x433.google.com ([2607:f8b0:4864:20::433]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1umNpJ-000eHr-0V for pgsql-hackers@lists.postgresql.org; Thu, 14 Aug 2025 02:36:27 +0000 Received: by mail-pf1-x433.google.com with SMTP id d2e1a72fcca58-76e2e8e54f8so441267b3a.1 for ; Wed, 13 Aug 2025 19:36:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1755138982; x=1755743782; darn=lists.postgresql.org; h=message-id:in-reply-to:to:references:date:subject:mime-version:from :from:to:cc:subject:date:message-id:reply-to; bh=O91xy1TqREmGtrpDyMHd3x9Fg6KJ7e57z8wA8+nSVi4=; b=mRC8ucfJ8M/BSXVU4H2Z6WVXbUmyo3cM/0+14vMBrnIxOSfDXPCK9P/gl+f9F1rWne 52moTEqTM/yQfg0UjBucKL4sUFaZoPQrSyu4iekbZsP4tZ7XzR9YjKZhc8DVvudcRduH KZIUHMXvEQv6Nch1rQR5SINcYeYQ8gu23TIaQ1d9MZdarQ5LAuPkXI9/vwsGF3CB5KTe c1OMMZX8f+9t8ifdcr7ZNo4VuL78yq1aGgBBjxgnIuX9XiBFR8GY8JBTV1lNSZ2MwWPU UAmBQ7kqOubLw7vHEaDTblsyeW6O4Hb4Xr+wbGNEvTYt+fqul2h5kRfzSWS1gRAD4Z9W xhsg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1755138982; x=1755743782; h=message-id:in-reply-to:to:references:date:subject:mime-version:from :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=O91xy1TqREmGtrpDyMHd3x9Fg6KJ7e57z8wA8+nSVi4=; b=wYWNyF/hLmhK3r5U3BH0u6phCJnnQiMt7Wd1NQCQAlWeODhfliAXj0QhMgen7wtlLK NVhu/5C7rGrDhayD3VPEWivi7jRZngILWUoEmDtOaoE6wBQBjcyW31AGGAD9PnNnBwAH xXcog6lvz6ucPLeOf6sSdodgMtihf9Ps62oWCz5VkcjK+NeJufw+BVndPt7LueoHji9r qrJNNDvP85jRQz+N7w47t24zx0HfYvW9d5d0CFJpQicLOCAfdo/tUSy4PrzxyGrKrDP5 O7gN8hExRMhYpg+Lh0YBzyCdLa9qCClrwmtxwPufYVd2Sm003tn6cxbA2Rm1nEAkDexf ulGQ== X-Gm-Message-State: AOJu0Yy8gYV/k1N+Ieiaqp9qh9nNj5R2L7LSqWtPxv8ok9A200WdhcG5 bHfLAtM3sN8SnNJ2rGmZNpk1GDkvIO/fqctk+ccbM52dqKC40Pnr+IB9aZoiiz9YoQo= X-Gm-Gg: ASbGncveKaQkw2WiyuLl+FSD2CZSVt2kqJO9d1/0S4HO9eGylEGxFRMjv2bVlK2s1lt 1veNSS2GNEk0qhC9EJpI5P1GDl1TKHHiV/4cfPg0UF/xOBTYG1fMrV9p6xC9Z43CidcDv89ozxX G3dZYxWRwqZd+Hh9i6ZEnHpkd+vTy2NN4DfKT0C2VdKM3OPeF6YwtMCCIq+ARpBYj/j1Ra6ezD0 BwWFG3huJ/cTchqzfaQ20m3GFVx8FCHFhfLVG1gQrjSc4C8ED8jMq2EBxo9tq05SKFWOCw3mUzY F6A3nOy9uK/eZEdIWmWdlBYpWP4X4PMQmDGeG08f77wtrvOQeUBOjqEuwI7jcm/rEcCOiMEhb+T Ay7QTEaQoPWg826t/n05LCr79wHRiQe1/ X-Google-Smtp-Source: AGHT+IE5MlJm2cY/PSUknHOyVKWPFrdmxkd3uI44Da4ODWxhHg4pw6E3sksvnfEbnoD76Db3FCr3fQ== X-Received: by 2002:aa7:8886:0:b0:76b:d7e7:f1de with SMTP id d2e1a72fcca58-76e2fb10af2mr1907493b3a.17.1755138982055; Wed, 13 Aug 2025 19:36:22 -0700 (PDT) Received: from smtpclient.apple ([142.171.105.12]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-76bfcb26905sm26290666b3a.123.2025.08.13.19.36.20 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Wed, 13 Aug 2025 19:36:21 -0700 (PDT) From: Chao Li Content-Type: multipart/alternative; boundary="Apple-Mail=_F9608BD3-8DF6-4BCC-B312-3AD2981559D5" Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3826.700.81\)) Subject: Re: Improve hash join's handling of tuples with null join keys Date: Thu, 14 Aug 2025 10:36:06 +0800 References: <3061845.1746486714@sss.pgh.pa.us> <496221.1748882849@sss.pgh.pa.us> <175507656113.993.1381684440543440253.pgcf@coridan.postgresql.org> To: pgsql-hackers@lists.postgresql.org, Tom Lane In-Reply-To: <175507656113.993.1381684440543440253.pgcf@coridan.postgresql.org> Message-Id: <544D7C83-CECE-44E7-B5D7-530E9318D231@gmail.com> X-Mailer: Apple Mail (2.3826.700.81) List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --Apple-Mail=_F9608BD3-8DF6-4BCC-B312-3AD2981559D5 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 > On Aug 13, 2025, at 17:16, Chao Li wrote: >=20 > I downloaded the patch and tested all join types: inner, left, right, = full, semi and anti. Basically my tests all passed. However, I didn't = test any case of parallel query. >=20 > I have two nit comments: >=20 > 1. In hashjoin.h, line 76-78, the added comment says "(In the unlikely = but supported case of a non-strict join operator, we treat null keys as = normal data.)". But I don't see where non-strict join is handled. So, = how this patch impact non-strict joins? >=20 I take back this comment, and I get a new comment related. @@ -1015,11 +1144,19 @@ ExecHashJoinOuterGetTuple(PlanState *outerNode, if (!isnull) { + /* normal case with a non-null join key = */ /* remember outer relation is not empty = for possible rescan */ hjstate->hj_OuterNotEmpty =3D true; return slot; } + else if (hjstate->hj_KeepNullTuples) + { + /* null join key, but we must save tuple = to be emitted later */ + if (hjstate->hj_NullOuterTupleStore =3D=3D= NULL) + hjstate->hj_NullOuterTupleStore = =3D ExecHashBuildNullTupleStore(hashtable); + = tuplestore_puttupleslot(hjstate->hj_NullOuterTupleStore, slot); + } When an outer tuple contains null join key, without this patch, = =E2=80=9Cisnull=E2=80=9D flag is false, so the tuple will still be = returned, and for an outer join, the tuple slot will be returned to = parent node immediately. With this patch, =E2=80=9Cisnull=E2=80=9D now becomes true because of = the change of strict op. Then the outer null join key tuple must be = stored in a tuplestore. When an outer table contains a lot of null join = key tuples, then the tuplestore could bump to very large, in that case, = it would be hard to say this patch really benefits. I am think that, can we only do the first half of this patch? Only = putting inner table=E2=80=99s null join key tuple into a tuplestore. So = that inner hash table=E2=80=99s performance gets improved, and outer = table=E2=80=99s logic keeps the same, then overall this patch makes a = pure improvement without the potential memory burden from = outerNullTupleStore. =E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2= =80=94 I also got an idea for improving the hash logic. /* * If the outer relation is completely = empty, and it's not * right/right-anti/full join, we can = quit without building * the hash table. However, for an = inner join it is only a * win to check this when the outer = relation's startup cost is * less than the projected cost of = building the hash table. * Otherwise it's best to build the hash = table first and see * if the inner relation is empty. = (When it's a left join, we * should always make this check, since = we aren't going to be * able to skip the join on the strength = of an empty inner * relation anyway.) */ if (HJ_FILL_INNER(node)) { /* no chance to not build the = hash table */ node->hj_FirstOuterTupleSlot =3D = NULL; } Based on this patch, if we are doing a left join, and outer table is = empty, then all tuples from the inner table should be returned. In that = case, we can skip building a hash table, instead, we can put all inner = table tuples into hashtable.innerNullTupleStore. Building a tuplestore = should be cheaper than building a hash table, so this way makes a little = bit more performance improvement. Regards, Chao Li (Evan) -------------------- HighGo Software Co., Ltd. https://www.highgo.com/ --Apple-Mail=_F9608BD3-8DF6-4BCC-B312-3AD2981559D5 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=utf-8

On Aug 13, 2025, at 17:16, Chao Li = <li.evan.chao@gmail.com> wrote:

I downloaded the patch and = tested all join types: inner, left, right, full, semi and anti. = Basically my tests all passed. However, I didn't test any case of = parallel query.

I have two nit comments:

1. In hashjoin.h, = line 76-78, the added comment says "(In the unlikely but supported case = of a non-strict join operator, we treat null keys as normal data.)". But = I don't see where non-strict join is handled. So, how this patch impact = non-strict joins?


I take = back this comment, and I get a new comment = related.

@@ -1015,11 +1144,19 @@ = ExecHashJoinOuterGetTuple(PlanState = *outerNode,

          =               if = (!isnull)
              =           {
+       =                     =     /* normal case with a non-null join key = */
                =                 /* remember = outer relation is not empty for possible rescan */
  =                     =           hjstate->hj_OuterNotEmpty =3D = true;

            =                     = return slot;
              =           }
+       =                 else if = (hjstate->hj_KeepNullTuples)
+         =               {
+   =                     =         /* null join key, but we must save tuple to = be emitted later */
+           =                     if = (hjstate->hj_NullOuterTupleStore =3D=3D NULL)
+   =                     =                 = hjstate->hj_NullOuterTupleStore =3D = ExecHashBuildNullTupleStore(hashtable);
+       =                     =     = tuplestore_puttupleslot(hjstate->hj_NullOuterTupleStore, = slot);
+               =         }

When an outer = tuple contains null join key, without this patch, =E2=80=9Cisnull=E2=80=9D= flag is false, so the tuple will still be returned, and for an outer = join, the tuple slot will be returned to parent node = immediately.

With this patch, =E2=80=9Cisnull=E2=80= =9D now becomes true because of the change of strict op. Then the outer = null join key tuple must be stored in a tuplestore. When an outer table = contains a lot of null join key tuples, then the tuplestore could bump = to very large, in that case, it would be hard to say this patch really = benefits.

I am think that, can we only do the = first half of this patch? Only putting inner table=E2=80=99s null join = key tuple into a tuplestore. So that inner hash table=E2=80=99s = performance gets improved, and outer table=E2=80=99s logic keeps the = same, then overall this patch makes a pure improvement without the = potential memory burden from = outerNullTupleStore.

=E2=80=94=E2=80=94=E2=80= =94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94

I also got an idea for improving the hash = logic.

= /*
* If the = outer relation is completely empty, and it's not
= * right/right-anti/full join, we can quit without = building
* the = hash table.  However, for an inner join it is only = a
= * win to check this when the outer relation's = startup cost is
* less = than the projected cost of building the hash table.
= * Otherwise it's best to build the hash table first and = see
= * if the inner relation is empty.  (When = it's a left join, we
* should = always make this check, since we aren't going to be
= * able to skip the join on the strength of an empty = inner
= * relation anyway.)
= */
if = (HJ_FILL_INNER(node))
= {
= /* no chance to not build the hash table */
= node->hj_FirstOuterTupleSlot =3D NULL;
= }

Based on this patch, if we are = doing a left join, and outer table is empty, then all tuples from the = inner table should be returned. In that case, we can skip building a = hash table, instead, we can put all inner table tuples into = hashtable.innerNullTupleStore. Building a tuplestore should be cheaper = than building a hash table, so this way makes a little bit more = performance = improvement.

Regards,

Chao Li (Evan)
--------------------
HighGo Software Co., = Ltd.
https://www.highgo.com/



= --Apple-Mail=_F9608BD3-8DF6-4BCC-B312-3AD2981559D5--