public inbox for [email protected]  
help / color / mirror / Atom feed
From: David Rowley <[email protected]>
To: Ilia Evdokimov <[email protected]>
Cc: Zsolt Parragi <[email protected]>
Cc: David Geier <[email protected]>
Cc: PostgreSQL Developers <[email protected]>
Subject: Re: Reduce planning time for large NOT IN lists containing NULL
Date: Tue, 3 Mar 2026 14:08:19 +1300
Message-ID: <CAApHDvphShGABn-3AoE36dTvGHW7gUpFSw0_ZZnH84wGCW3hHw@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<CAN4CZFNJHro_pEVTHggBX43f+AebwKLZdp39+V4pV0EB4BZuZw@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<CAN4CZFOTAyVk9srBp560049Xu655xUDZv5NBTvj7RYND-xKymA@mail.gmail.com>
	<[email protected]>

On Tue, 3 Mar 2026 at 04:04, Ilia Evdokimov
<[email protected]> wrote:
> I've fixed this in v5-patch.

I had a look at this and wondered if we guarantee that no rows will
match, then why can't we perform constant folding on the
ScalarArrayOpExpr when !useOr and the array contains a NULL element
and the operator is strict. Seemingly, one of the reasons for that is
down to the expression returning NULL vs false. Take the following two
tests from expressions.out:

select return_int_input(1) not in (10, 9, 2, 8, 3, 7, 4, 6, 5, 2, null);
 ?column?
----------

(1 row)

select return_int_input(1) not in (10, 9, 2, 8, 3, 7, 4, 6, 5, 1, null);
 ?column?
----------
 f
(1 row)

Here we see that we return false when we find the left operand in the
array, but NULL when we don't find it and the array contains NULL. So,
unless the left operand is a const, we wouldn't know how to simplify
the ScalarArrayOpExpr during planning as the false or NULL would only
be known during evaluation of the expression.

However, when the expression being simplified is an EXPRKIND_QUAL, it
shouldn't matter if the result is false or NULL as both mean the same
and there shouldn't be any code that cares about the difference.
Currently, we don't pass the "kind" down into
eval_const_expressions(), but I don't really see why we couldn't. It
would be a fair bit of work figuring out with confidence what the
extra arg should be passed as in all the existing call sites of that
function. We'd have to document in the header comment for
eval_const_expressions() that constant-folding on EXPRKIND_QUAL
expressions can enable additional optimisations which disregard the
difference between NULL and false.

For the patch, I imagine it's still a useful optimisation as the
ScalarArrayOpExpr might not be in an EXPRKIND_QUAL.

There are a couple of things I don't like:

1) The new test is in expressions.sql. The comment at the top of that
file reads: "expression evaluation tests that don't fit into a more
specific file". The new test isn't anything to do with expression
evaluation. It's about planner estimation. I see that
misc_function.sql has the explain_mask_costs() function. I'm not sure
that's the right place either, as the usages of that function are for
testing SupportRequestRows prosupport functions. I wonder if we need a
dedicated row_estimate.sql or selectivity_est.sql file. The
explain_mask_costs() wouldn't be out of place if they were moved into
a new test like that. It was me that started putting those in
misc_function.sql, and I don't object to them being moved to a new
test. I'd be as a separate commit, however.

2) The new test creates a new table and inserts 1000 rows. There does
not seem to be anything special about the new table. Why don't you use
one of the ones from test_setup.sql?

3) Looking at var_eq_const(), it seems like it's coded to assume the
operator is always strict, per "If the constant is NULL, assume
operator is strict and return zero". If that's good enough for
var_eq_const(), then it should be good enough for the new code. I
think it would be good if you wrote that or something similar in the
new code so that the reader knows taking the short-circuit with
non-strict functions is on purpose.

David





view thread (9+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Reduce planning time for large NOT IN lists containing NULL
  In-Reply-To: <CAApHDvphShGABn-3AoE36dTvGHW7gUpFSw0_ZZnH84wGCW3hHw@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox