Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Chengpeng Yan <[email protected]>
To: David Rowley <[email protected]>
Cc: PostgreSQL-development <[email protected]>
Cc: cca5507 <[email protected]>
Cc: Tom Lane <[email protected]>
Subject: Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators
Date: Mon, 20 Apr 2026 06:17:47 +0000
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAApHDvr7K+Wu+ECgmZw057Mb1H6H87+ZWETHi28Ez_ALBKgmQA@mail.gmail.com>
References: <[email protected]>
	<CAApHDvoLFhtAFSwJdjsBcs9L4J4SzE09gkHLr9nJSGSj1CBaRw@mail.gmail.com>
	<CAApHDvr7K+Wu+ECgmZw057Mb1H6H87+ZWETHi28Ez_ALBKgmQA@mail.gmail.com>

> On Apr 20, 2026, at 11:46, David Rowley <[email protected]> wrote:
> 
> Of course, it is possible to make the strict function do that, and
> non-hashed IN / NOT IN handles it, so the hashed version shouldn't
> have an excuse to not do the right thing.
> 
> I've attached a version that "probes" the equality function for its
> NULL = NULL behaviour and its NULL = non-NULL behaviour and returns
> whatever the result of the probe was at the appropriate time.
> 
> What I came up with does feel quite elaborate, so I'd quite like a 2nd opinion.
> 
> The patch does assume that the non-strict function will return the
> same thing for NULL = non-NULL as it will for non-NULL = NULL.
> Technically, if you coded the function to do something different
> there, the hashed vs non-hashed could differ in their result. My
> thoughts there, if someone is expecting anything sane out of such an
> equality function, then they're probably going to be disappointed due
> to various other optimisations we have.

Hi,

Thanks for the discussion.

I agree with Tom's concern that it does not seem safe to generalize from
NULL = first-non-NULL to all non-NULL values. Unless I am missing one, I
do not know of a planner/executor-visible contract that would justify
that assumption.

For the original NULL-LHS bug, a linear fallback still seems like the
safest baseline fix to me. It is conservative, but it matches
ExecEvalScalarArrayOp() without adding extra assumptions. The obvious
downside is performance, although this path only triggers when the
runtime LHS is NULL and the comparator is non-strict. It may also be
possible to cache the NULL-LHS outcome once per expression, since the
RHS array is constant in the hashed SAOP case, which might help reduce
the cost of that fallback.

ChangAo's example also seems to expose a separate correctness issue. If
the comparator can return NULL even for non-NULL inputs, then a lookup
hit seems sufficient, but a miss is no longer enough to distinguish
FALSE for IN / TRUE for NOT IN from NULL.

A conservative fix there would again be a linear fallback after miss,
which should recover the right semantics, but that case does seem much
more performance-sensitive.

So I would be interested to hear what people think about both cases,
especially if there is a better way to preserve correctness without
paying the full linear-fallback cost.

--
Best regards,
Chengpeng Yan

view thread (11+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected]
  Subject: Re: [PATCH] Fix hashed ScalarArrayOp semantics for NULL LHS with non-strict comparators
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox