public inbox for [email protected]  
help / color / mirror / Atom feed
From: David Rowley <[email protected]>
To: Shaozhong SHI <[email protected]>
Cc: pgsql-sql <[email protected]>
Subject: Re: Looking for an efficient way to replace efficient NOT IN when landling very large data
Date: Tue, 11 Apr 2023 21:33:42 +1200
Message-ID: <CAApHDvox+8heOrYyD5Y8gpz6+wcj0w4Th2gT9+msOaH__4eFGQ@mail.gmail.com> (raw)
In-Reply-To: <CA+i5JwYoxnYGHXn41AgnN9iJP8FCMDMozgzsn5hc0tvhSSMNVg@mail.gmail.com>
References: <CA+i5JwYoxnYGHXn41AgnN9iJP8FCMDMozgzsn5hc0tvhSSMNVg@mail.gmail.com>

On Tue, 11 Apr 2023 at 21:28, Shaozhong SHI <[email protected]> wrote:
>
> Select a.id, a.name, b.id, b.name from a_large_table a, definitive b where (a.id, b.name) not in
> (select b.id, b.name from definitive b)
>
> is very slow.
>
> Is there a faster way to do so?

It depends on what your exact requirements are for the NULL handling
that NOT IN provides.  Do you need the query to return 0 rows if b.id
and b.name are null?  This question is moot if none of the columns or
either table allow NULLs.

If you don't require that, then you'll give the planner more
flexibility to choose a more efficient plan if you use NOT EXISTS
instead.

David





view thread (3+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: Looking for an efficient way to replace efficient NOT IN when landling very large data
  In-Reply-To: <CAApHDvox+8heOrYyD5Y8gpz6+wcj0w4Th2gT9+msOaH__4eFGQ@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox