public inbox for [email protected]  
help / color / mirror / Atom feed
Re: Wrapping a where clause to preserve rows with nulls
4+ messages / 3 participants
[nested] [flat]

* Re: Wrapping a where clause to preserve rows with nulls
@ 2024-12-19 04:14  David G. Johnston <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: David G. Johnston @ 2024-12-19 04:14 UTC (permalink / raw)
  To: Adrian Garcia Badaracco <[email protected]>; +Cc: pgsql-general

On Wednesday, December 18, 2024, Adrian Garcia Badaracco <
[email protected]> wrote:

>
> Is there any way to include the rows where the predicate evaluates to null
> while still using an index?
>

That seems quite unlikely.  Your definition of equality is incompatible
with the system’s standard definition while requiring that the standard
equals operator be used in the query.  Null values and non-null values are
not considered equal, or, put another way, always considered distinct from
each other.  A btree index, which handles =, can’t be told to behave
differently and so cannot fulfill your desire to produce rows where the
stored value is null; it can only produce those equal to 5000.

David J.


^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: Wrapping a where clause to preserve rows with nulls
@ 2024-12-19 04:38  Tom Lane <[email protected]>
  parent: David G. Johnston <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Tom Lane @ 2024-12-19 04:38 UTC (permalink / raw)
  To: David G. Johnston <[email protected]>; +Cc: Adrian Garcia Badaracco <[email protected]>; pgsql-general

"David G. Johnston" <[email protected]> writes:
> On Wednesday, December 18, 2024, Adrian Garcia Badaracco <
> [email protected]> wrote:
>> Is there any way to include the rows where the predicate evaluates to null
>> while still using an index?

> ... A btree index, which handles =, can’t be told to behave
> differently and so cannot fulfill your desire to produce rows where the
> stored value is null; it can only produce those equal to 5000.

Not in a single scan, no.  But multiple scans are possible:

regression=# create table t (id int unique);
CREATE TABLE
regression=# explain select * from t where id = 5000 or id is null;
                                  QUERY PLAN                                  
------------------------------------------------------------------------------
 Bitmap Heap Scan on t  (cost=8.42..18.98 rows=14 width=4)
   Recheck Cond: ((id IS NULL) OR (id = 5000))
   ->  BitmapOr  (cost=8.42..8.42 rows=14 width=0)
         ->  Bitmap Index Scan on t_id_key  (cost=0.00..4.25 rows=13 width=0)
               Index Cond: (id IS NULL)
         ->  Bitmap Index Scan on t_id_key  (cost=0.00..4.16 rows=1 width=0)
               Index Cond: (id = 5000)
(7 rows)

The OP was quite unclear about what semantics he wants for
multiple-variable WHERE clauses, but maybe something like this
would work:

WHERE (original-clause) OR x IS NULL OR y IS NULL OR ...

where each variable mentioned in original-clause is allowed
to also be NULL.  Or perhaps what is wanted is

WHERE (original-clause) OR (x IS NULL AND y IS NULL AND ...)

??

			regards, tom lane






^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: Wrapping a where clause to preserve rows with nulls
@ 2024-12-19 04:41  Adrian Garcia Badaracco <[email protected]>
  parent: Tom Lane <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Adrian Garcia Badaracco @ 2024-12-19 04:41 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: David G. Johnston <[email protected]>; pgsql-general

Thank you for the great idea Tom. While yes I can't modify the original
WHERE clause I do think I'll be able to introspect it or get the system
generating it to tell me which columns it references and then add an OR x
is NULL OR y is NULL ...

For context, just in case it's interesting, I store Parquet statistics in a
Postgres table and run the output of this thing on them:
https://github.com/apache/datafusion/blob/f92442ea8e8944c78f8e40d6648d049ff8e335ec/datafusion/physic...
Hence why I can't really control the WHERE clause (at least not without
re-implementing a bunch of finicky error prone code).

On Wed, Dec 18, 2024 at 10:38 PM Tom Lane <[email protected]> wrote:

> "David G. Johnston" <[email protected]> writes:
> > On Wednesday, December 18, 2024, Adrian Garcia Badaracco <
> > [email protected]> wrote:
> >> Is there any way to include the rows where the predicate evaluates to
> null
> >> while still using an index?
>
> > ... A btree index, which handles =, can’t be told to behave
> > differently and so cannot fulfill your desire to produce rows where the
> > stored value is null; it can only produce those equal to 5000.
>
> Not in a single scan, no.  But multiple scans are possible:
>
> regression=# create table t (id int unique);
> CREATE TABLE
> regression=# explain select * from t where id = 5000 or id is null;
>                                   QUERY PLAN
>
>
> ------------------------------------------------------------------------------
>  Bitmap Heap Scan on t  (cost=8.42..18.98 rows=14 width=4)
>    Recheck Cond: ((id IS NULL) OR (id = 5000))
>    ->  BitmapOr  (cost=8.42..8.42 rows=14 width=0)
>          ->  Bitmap Index Scan on t_id_key  (cost=0.00..4.25 rows=13
> width=0)
>                Index Cond: (id IS NULL)
>          ->  Bitmap Index Scan on t_id_key  (cost=0.00..4.16 rows=1
> width=0)
>                Index Cond: (id = 5000)
> (7 rows)
>
> The OP was quite unclear about what semantics he wants for
> multiple-variable WHERE clauses, but maybe something like this
> would work:
>
> WHERE (original-clause) OR x IS NULL OR y IS NULL OR ...
>
> where each variable mentioned in original-clause is allowed
> to also be NULL.  Or perhaps what is wanted is
>
> WHERE (original-clause) OR (x IS NULL AND y IS NULL AND ...)
>
> ??
>
>                         regards, tom lane
>


^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: Wrapping a where clause to preserve rows with nulls
@ 2024-12-19 06:47  Adrian Garcia Badaracco <[email protected]>
  parent: Adrian Garcia Badaracco <[email protected]>
  0 siblings, 0 replies; 4+ messages in thread

From: Adrian Garcia Badaracco @ 2024-12-19 06:47 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: David G. Johnston <[email protected]>; pgsql-general

Well, there is a wrinkle: if the predicate returns `false` but one of the
columns is null then the whole thing ends up `true` when I'd want it to be
`false`. Say col_a = [1] and col_b = [null]:

WHERE (col_a < 1 AND col_b > 1) OR col_a IS NULL OR col_b IS NULL -> WHERE
(false AND null) OR false OR true -> WHERE false OR false OR true -> true.

That's still a pretty good solution for now.

On Wed, Dec 18, 2024 at 10:41 PM Adrian Garcia Badaracco <
[email protected]> wrote:

> Thank you for the great idea Tom. While yes I can't modify the original
> WHERE clause I do think I'll be able to introspect it or get the system
> generating it to tell me which columns it references and then add an OR x
> is NULL OR y is NULL ...
>
> For context, just in case it's interesting, I store Parquet statistics in
> a Postgres table and run the output of this thing on them:
> https://github.com/apache/datafusion/blob/f92442ea8e8944c78f8e40d6648d049ff8e335ec/datafusion/physic...
> Hence why I can't really control the WHERE clause (at least not without
> re-implementing a bunch of finicky error prone code).
>
> On Wed, Dec 18, 2024 at 10:38 PM Tom Lane <[email protected]> wrote:
>
>> "David G. Johnston" <[email protected]> writes:
>> > On Wednesday, December 18, 2024, Adrian Garcia Badaracco <
>> > [email protected]> wrote:
>> >> Is there any way to include the rows where the predicate evaluates to
>> null
>> >> while still using an index?
>>
>> > ... A btree index, which handles =, can’t be told to behave
>> > differently and so cannot fulfill your desire to produce rows where the
>> > stored value is null; it can only produce those equal to 5000.
>>
>> Not in a single scan, no.  But multiple scans are possible:
>>
>> regression=# create table t (id int unique);
>> CREATE TABLE
>> regression=# explain select * from t where id = 5000 or id is null;
>>                                   QUERY PLAN
>>
>>
>> ------------------------------------------------------------------------------
>>  Bitmap Heap Scan on t  (cost=8.42..18.98 rows=14 width=4)
>>    Recheck Cond: ((id IS NULL) OR (id = 5000))
>>    ->  BitmapOr  (cost=8.42..8.42 rows=14 width=0)
>>          ->  Bitmap Index Scan on t_id_key  (cost=0.00..4.25 rows=13
>> width=0)
>>                Index Cond: (id IS NULL)
>>          ->  Bitmap Index Scan on t_id_key  (cost=0.00..4.16 rows=1
>> width=0)
>>                Index Cond: (id = 5000)
>> (7 rows)
>>
>> The OP was quite unclear about what semantics he wants for
>> multiple-variable WHERE clauses, but maybe something like this
>> would work:
>>
>> WHERE (original-clause) OR x IS NULL OR y IS NULL OR ...
>>
>> where each variable mentioned in original-clause is allowed
>> to also be NULL.  Or perhaps what is wanted is
>>
>> WHERE (original-clause) OR (x IS NULL AND y IS NULL AND ...)
>>
>> ??
>>
>>                         regards, tom lane
>>
>


^ permalink  raw  reply  [nested|flat] 4+ messages in thread


end of thread, other threads:[~2024-12-19 06:47 UTC | newest]

Thread overview: 4+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-12-19 04:14 Re: Wrapping a where clause to preserve rows with nulls David G. Johnston <[email protected]>
2024-12-19 04:38 ` Tom Lane <[email protected]>
2024-12-19 04:41   ` Adrian Garcia Badaracco <[email protected]>
2024-12-19 06:47     ` Adrian Garcia Badaracco <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox