Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tO8Md-004L56-2a for pgsql-general@arkaria.postgresql.org; Thu, 19 Dec 2024 04:42:19 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tO8Mc-006yC7-9I for pgsql-general@arkaria.postgresql.org; Thu, 19 Dec 2024 04:42:18 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tO8Mb-006y8P-St for pgsql-general@lists.postgresql.org; Thu, 19 Dec 2024 04:42:17 +0000 Received: from mail-ed1-x536.google.com ([2a00:1450:4864:20::536]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1tO8MU-000Wdw-84 for pgsql-general@postgresql.org; Thu, 19 Dec 2024 04:42:16 +0000 Received: by mail-ed1-x536.google.com with SMTP id 4fb4d7f45d1cf-5d3ea065b79so434703a12.3 for ; Wed, 18 Dec 2024 20:42:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=adriangb-com.20230601.gappssmtp.com; s=20230601; t=1734583329; x=1735188129; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=d0+w7Si659F0ztgUsTIhnYzAgk1c+U+a1z1saxv697Y=; b=QrKh0J83/Fs6vITMODDAHYQcL+8iDu7J/cCdsrIcD4yYd0gx/oj8BPCvJc4EzB6+Fs jE4zoLcd079go7qb73uYMQYbutMd49+D9EQ9sI0U50zKf87+dlCHFgRwYjmFLQymwCsz EF8PwYePP1G8b1RMK2aVEWXUt8cDjeM1t58IZiyUQ+0eSztrnHMbv8CDw/Vif2656aWi EuS0LQIEQ41z3gk5i9YcZv6JAuw0rtoTAss741VvkRhdIoxyqUMAKA4OPS7PVLydpUPh QbhJg4hUcY2i7/I7UqP0B9Qvj0O7M1nLLQ4fHM3VVN2B63K6A9PUy+VfdSLRINCbrD4B GVfg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1734583329; x=1735188129; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=d0+w7Si659F0ztgUsTIhnYzAgk1c+U+a1z1saxv697Y=; b=hreaH2rqWZ6mQsteN9DYFdqP1zTFF91XKqypJ80s43wlzSmzzZ4PSelP/o5pg83X40 K8rO0NaBMUSrelMCXfcOUIJZZlwP/Yo9sM5BBx9hNMWqKfBsEx5nnH+KDBpIbgRGDskV 9BWb27OKjohPh6EUN5oGlXaFeMyjclQ5ZZxIUuSg0clqi/KCJ8gQcv9OwRoc5MEjDPts 8TaK7u8hoVQS14u4epsn8E3Ng1uF06aklK2uhT63Ewmy39PvlWEQCrbeHUH2spImSY84 82MT8ct0fmKcyd63OevgRNQcktkoKc3Pay9OFNSR1cHg6WD0b+KMJA5H59d3mTAWIGeF aTzA== X-Forwarded-Encrypted: i=1; AJvYcCUZaD0/L/MDFXl98OExQhNHyoEdim1NksZXjhEdroZGdP2kkAnNMEDmd4H/nsJWtN2SNsXvEzDZSe5T6+UR@postgresql.org X-Gm-Message-State: AOJu0YzGJ4oTnRZ1O7XQcHZoAjpc70eLdsycycgO1ki6phPyR3QZ07yR 1m7rKn/LWj9oFpzCd+l/8nsU+vfV9Ti6fCZTw7bWgjoFBsbuqwwgFnjJdsaMSf0/ErXQK1FePQk pRG8/BWFW71h33dih8NZ7PvWZQTFF2ak5ciyp9A== X-Gm-Gg: ASbGncvj3d8guktbpfy3gtmswtVLYLcS/zpnLlHzkdet7yplPbDhWwQwBUkykZy+YPR ubRoFrcBHgUoqvbveP/2tDgoE0LmfiHEFKRu9b2u5 X-Google-Smtp-Source: AGHT+IFnELc5dJQUJfJqBqV+mFvdeXZYK+llRtOwX7qRe04ymgldnltNNH4S1SYSho502mgugeo5TCuaId5LtTs4EpU= X-Received: by 2002:a05:6402:2712:b0:5d3:d7b0:b834 with SMTP id 4fb4d7f45d1cf-5d7ee3bac91mr4416662a12.1.1734583328876; Wed, 18 Dec 2024 20:42:08 -0800 (PST) MIME-Version: 1.0 References: <3062509.1734583088@sss.pgh.pa.us> In-Reply-To: <3062509.1734583088@sss.pgh.pa.us> From: Adrian Garcia Badaracco Date: Wed, 18 Dec 2024 22:41:58 -0600 Message-ID: Subject: Re: Wrapping a where clause to preserve rows with nulls To: Tom Lane Cc: "David G. Johnston" , "pgsql-general@postgresql.org" Content-Type: multipart/alternative; boundary="00000000000026ccf206299827b0" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000026ccf206299827b0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Thank you for the great idea Tom. While yes I can't modify the original WHERE clause I do think I'll be able to introspect it or get the system generating it to tell me which columns it references and then add an OR x is NULL OR y is NULL ... For context, just in case it's interesting, I store Parquet statistics in a Postgres table and run the output of this thing on them: https://github.com/apache/datafusion/blob/f92442ea8e8944c78f8e40d6648d049ff= 8e335ec/datafusion/physical-optimizer/src/pruning.rs#L146-L456 Hence why I can't really control the WHERE clause (at least not without re-implementing a bunch of finicky error prone code). On Wed, Dec 18, 2024 at 10:38=E2=80=AFPM Tom Lane wrote= : > "David G. Johnston" writes: > > On Wednesday, December 18, 2024, Adrian Garcia Badaracco < > > adrian@adriangb.com> wrote: > >> Is there any way to include the rows where the predicate evaluates to > null > >> while still using an index? > > > ... A btree index, which handles =3D, can=E2=80=99t be told to behave > > differently and so cannot fulfill your desire to produce rows where the > > stored value is null; it can only produce those equal to 5000. > > Not in a single scan, no. But multiple scans are possible: > > regression=3D# create table t (id int unique); > CREATE TABLE > regression=3D# explain select * from t where id =3D 5000 or id is null; > QUERY PLAN > > > -------------------------------------------------------------------------= ----- > Bitmap Heap Scan on t (cost=3D8.42..18.98 rows=3D14 width=3D4) > Recheck Cond: ((id IS NULL) OR (id =3D 5000)) > -> BitmapOr (cost=3D8.42..8.42 rows=3D14 width=3D0) > -> Bitmap Index Scan on t_id_key (cost=3D0.00..4.25 rows=3D13 > width=3D0) > Index Cond: (id IS NULL) > -> Bitmap Index Scan on t_id_key (cost=3D0.00..4.16 rows=3D1 > width=3D0) > Index Cond: (id =3D 5000) > (7 rows) > > The OP was quite unclear about what semantics he wants for > multiple-variable WHERE clauses, but maybe something like this > would work: > > WHERE (original-clause) OR x IS NULL OR y IS NULL OR ... > > where each variable mentioned in original-clause is allowed > to also be NULL. Or perhaps what is wanted is > > WHERE (original-clause) OR (x IS NULL AND y IS NULL AND ...) > > ?? > > regards, tom lane > --00000000000026ccf206299827b0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Thank you for the great idea Tom. While yes I can't mo= dify the original WHERE clause I do think I'll be able to introspect it= or get the system generating it to tell me which columns it references and= then add an OR x is NULL OR y is NULL ...

For context, = just in case it's interesting, I store Parquet statistics in a Postgres= table and run the output of this thing on them:=C2=A0https://github.com/ap= ache/datafusion/blob/f92442ea8e8944c78f8e40d6648d049ff8e335ec/datafusion/ph= ysical-optimizer/src/pruning.rs#L146-L456
Hence why I can'= ;t really control the WHERE clause (at least not without re-implementing a = bunch of finicky=C2=A0error prone code).

On Wed,= Dec 18, 2024 at 10:38=E2=80=AFPM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com= > writes:
> On Wednesday, December 18, 2024, Adrian Garcia Badaracco <
> adrian@adrian= gb.com> wrote:
>> Is there any way to include the rows where the predicate evaluates= to null
>> while still using an index?

> ... A btree index, which handles =3D, can=E2=80=99t be told to behave<= br> > differently and so cannot fulfill your desire to produce rows where th= e
> stored value is null; it can only produce those equal to 5000.

Not in a single scan, no.=C2=A0 But multiple scans are possible:

regression=3D# create table t (id int unique);
CREATE TABLE
regression=3D# explain select * from t where id =3D 5000 or id is null;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 QUERY PLAN=C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0
---------------------------------------------------------------------------= ---
=C2=A0Bitmap Heap Scan on t=C2=A0 (cost=3D8.42..18.98 rows=3D14 width=3D4)<= br> =C2=A0 =C2=A0Recheck Cond: ((id IS NULL) OR (id =3D 5000))
=C2=A0 =C2=A0->=C2=A0 BitmapOr=C2=A0 (cost=3D8.42..8.42 rows=3D14 width= =3D0)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0->=C2=A0 Bitmap Index Scan on t_id_key= =C2=A0 (cost=3D0.00..4.25 rows=3D13 width=3D0)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Index Cond: (id IS N= ULL)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0->=C2=A0 Bitmap Index Scan on t_id_key= =C2=A0 (cost=3D0.00..4.16 rows=3D1 width=3D0)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Index Cond: (id =3D = 5000)
(7 rows)

The OP was quite unclear about what semantics he wants for
multiple-variable WHERE clauses, but maybe something like this
would work:

WHERE (original-clause) OR x IS NULL OR y IS NULL OR ...

where each variable mentioned in original-clause is allowed
to also be NULL.=C2=A0 Or perhaps what is wanted is

WHERE (original-clause) OR (x IS NULL AND y IS NULL AND ...)

??

=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 regards, tom lane
--00000000000026ccf206299827b0--