public inbox for [email protected]  
help / color / mirror / Atom feed
From: Adrian Garcia Badaracco <[email protected]>
To: Tom Lane <[email protected]>
Cc: David G. Johnston <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: Wrapping a where clause to preserve rows with nulls
Date: Thu, 19 Dec 2024 00:47:50 -0600
Message-ID: <CAE8z92Fb1cUH2Dh_U7OSuwR3MAmPcsEvt2D7eLLmfS9DcJMazA@mail.gmail.com> (raw)
In-Reply-To: <CAE8z92FTVnCfbS54F01st0QxeLMsgt1mcafnQAW94h-=6-sZ4g@mail.gmail.com>
References: <CAE8z92GN0-5J=4q6RSsPGk_OP-Y_EHVNgQimTWwcq+BE+HGqRQ@mail.gmail.com>
	<CAKFQuwa2jcBnsZxHmNn7_1MjcmOZx_fU93epBx9+orw3yMNgKw@mail.gmail.com>
	<[email protected]>
	<CAE8z92FTVnCfbS54F01st0QxeLMsgt1mcafnQAW94h-=6-sZ4g@mail.gmail.com>

Well, there is a wrinkle: if the predicate returns `false` but one of the
columns is null then the whole thing ends up `true` when I'd want it to be
`false`. Say col_a = [1] and col_b = [null]:

WHERE (col_a < 1 AND col_b > 1) OR col_a IS NULL OR col_b IS NULL -> WHERE
(false AND null) OR false OR true -> WHERE false OR false OR true -> true.

That's still a pretty good solution for now.

On Wed, Dec 18, 2024 at 10:41 PM Adrian Garcia Badaracco <
[email protected]> wrote:

> Thank you for the great idea Tom. While yes I can't modify the original
> WHERE clause I do think I'll be able to introspect it or get the system
> generating it to tell me which columns it references and then add an OR x
> is NULL OR y is NULL ...
>
> For context, just in case it's interesting, I store Parquet statistics in
> a Postgres table and run the output of this thing on them:
> https://github.com/apache/datafusion/blob/f92442ea8e8944c78f8e40d6648d049ff8e335ec/datafusion/physic...
> Hence why I can't really control the WHERE clause (at least not without
> re-implementing a bunch of finicky error prone code).
>
> On Wed, Dec 18, 2024 at 10:38 PM Tom Lane <[email protected]> wrote:
>
>> "David G. Johnston" <[email protected]> writes:
>> > On Wednesday, December 18, 2024, Adrian Garcia Badaracco <
>> > [email protected]> wrote:
>> >> Is there any way to include the rows where the predicate evaluates to
>> null
>> >> while still using an index?
>>
>> > ... A btree index, which handles =, can’t be told to behave
>> > differently and so cannot fulfill your desire to produce rows where the
>> > stored value is null; it can only produce those equal to 5000.
>>
>> Not in a single scan, no.  But multiple scans are possible:
>>
>> regression=# create table t (id int unique);
>> CREATE TABLE
>> regression=# explain select * from t where id = 5000 or id is null;
>>                                   QUERY PLAN
>>
>>
>> ------------------------------------------------------------------------------
>>  Bitmap Heap Scan on t  (cost=8.42..18.98 rows=14 width=4)
>>    Recheck Cond: ((id IS NULL) OR (id = 5000))
>>    ->  BitmapOr  (cost=8.42..8.42 rows=14 width=0)
>>          ->  Bitmap Index Scan on t_id_key  (cost=0.00..4.25 rows=13
>> width=0)
>>                Index Cond: (id IS NULL)
>>          ->  Bitmap Index Scan on t_id_key  (cost=0.00..4.16 rows=1
>> width=0)
>>                Index Cond: (id = 5000)
>> (7 rows)
>>
>> The OP was quite unclear about what semantics he wants for
>> multiple-variable WHERE clauses, but maybe something like this
>> would work:
>>
>> WHERE (original-clause) OR x IS NULL OR y IS NULL OR ...
>>
>> where each variable mentioned in original-clause is allowed
>> to also be NULL.  Or perhaps what is wanted is
>>
>> WHERE (original-clause) OR (x IS NULL AND y IS NULL AND ...)
>>
>> ??
>>
>>                         regards, tom lane
>>
>


view thread (4+ messages)

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: Wrapping a where clause to preserve rows with nulls
  In-Reply-To: <CAE8z92Fb1cUH2Dh_U7OSuwR3MAmPcsEvt2D7eLLmfS9DcJMazA@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox