Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1pmAY4-0006CM-2B for pgsql-sql@arkaria.postgresql.org; Tue, 11 Apr 2023 09:44:24 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.92) (envelope-from ) id 1pmAY2-0001FH-Jl for pgsql-sql@arkaria.postgresql.org; Tue, 11 Apr 2023 09:44:22 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1pmAY2-0001F0-8a for pgsql-sql@lists.postgresql.org; Tue, 11 Apr 2023 09:44:22 +0000 Received: from mail-ej1-x630.google.com ([2a00:1450:4864:20::630]) by makus.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.92) (envelope-from ) id 1pmAXv-0001tp-UK for pgsql-sql@lists.postgresql.org; Tue, 11 Apr 2023 09:44:21 +0000 Received: by mail-ej1-x630.google.com with SMTP id a640c23a62f3a-946a769ae5cso324817866b.1 for ; Tue, 11 Apr 2023 02:44:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1681206254; x=1683798254; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=T24x/Hu4WXqdxRYiG/lftT95EuaHdZFLOYZhm/783GA=; b=FIHpo+mCF9JvwbSvnI0FzzoCPLRTS3gmTkjvR8Up3HWxctkbm5j3qcjv1wZQME723f t1e7INpZqt0+Xc1lDWCgFszN2c90b9Wa778w95Om2/uG2iXKxJvsMPgkZNwQ4KybKWCo aYyLqAiympQ9fltLLgvDbBvhbgqpm69wFPzlaBwuEeunfHyavZf0/4Cmy/D2IhV/DmVZ JuED3rDSZlU2lrjB7AHP02uoPLlknYtN8JCS0Z6Jd+AOi/JP9ZBev0i4Q4JgNI8b4WLQ 9cEx0JKkkiX2tRuERfHTxHixPXJUr4iVaUVcLzSQTFoBzy+hOjSLVk8h59OWS70YOqby 5UfA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1681206254; x=1683798254; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=T24x/Hu4WXqdxRYiG/lftT95EuaHdZFLOYZhm/783GA=; b=oavJAOIojhIXsHRSerLXhRCLqg4v7Mo/d+RUHK7SKlTB2G70p5js4v6JcdX8PgFHNA C7dPSffKcd5XViXLUM3zMPHazj0zRfQrCa6XUM4eEEQ1BWsLv/HzkzS3eb7JxW9uYtwo 9ddII7c8XZLuyCgGyilT7kGTMORLtV38BazVvXAD64mNSGcWjbevSraGl0biD8aeU41n 1n5LBW0kDKuwvb9X45bfsrt2OuDH6HcPFEVKLgtR1MB8tJkQkyHpagv++64rH2bjeuOE CQOlIsQunflGM7dv7O7IT7uZLugAu2WmfamtP4JZpHzQ7NXkhqCBRdj3YxSbZFEMTVDd t53A== X-Gm-Message-State: AAQBX9e9g0EtjorbF21KO4nDNw7lr9sl2qZaWardiNzosVcqlHrV7D5n JTgu89CzjMIweB8tqeYBBvG+gkCv7tAMRSd5FBI= X-Google-Smtp-Source: AKy350bnridZiUMQyEYPONqPsk1oCerYhKY09LP4cOjqddriy3nwXKCVFJ6pFik/Jl8rUSOKyUA0A9fzO2mNvgQZ2qQ= X-Received: by 2002:a50:c31e:0:b0:501:d2fb:44bb with SMTP id a30-20020a50c31e000000b00501d2fb44bbmr6125010edb.5.1681206254065; Tue, 11 Apr 2023 02:44:14 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Shaozhong SHI Date: Tue, 11 Apr 2023 10:44:02 +0100 Message-ID: Subject: Re: Looking for an efficient way to replace efficient NOT IN when landling very large data To: David Rowley Cc: pgsql-sql Content-Type: multipart/alternative; boundary="000000000000915b7405f90c5524" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000915b7405f90c5524 Content-Type: text/plain; charset="UTF-8" On Tue, 11 Apr 2023 at 10:33, David Rowley wrote: > On Tue, 11 Apr 2023 at 21:28, Shaozhong SHI > wrote: > > > > Select a.id, a.name, b.id, b.name from a_large_table a, definitive b > where (a.id, b.name) not in > > (select b.id, b.name from definitive b) > > > > is very slow. > > > > Is there a faster way to do so? > > It depends on what your exact requirements are for the NULL handling > that NOT IN provides. Do you need the query to return 0 rows if b.id > and b.name are null? This question is moot if none of the columns or > either table allow NULLs. > > If you don't require that, then you'll give the planner more > flexibility to choose a more efficient plan if you use NOT EXISTS > instead. > > David > I would like to try out an example of NOT EXISTS way and see how the replacement works. Regards, David --000000000000915b7405f90c5524 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


=
On Tue, 11 Apr 2023 at 10:33, David R= owley <dgrowleyml@gmail.com&= gt; wrote:
On Tu= e, 11 Apr 2023 at 21:28, Shaozhong SHI <shishaozhong@gmail.com> wrote:
>
> Select a.= id, a.na= me, b.id, b.name from a_large_table a, definitive b where (a.id, b.name) not in
> (select b= .id, b.n= ame from definitive b)
>
> is very slow.
>
> Is there a faster way to do so?

It depends on what your exact requirements are for the NULL handling
that NOT IN provides.=C2=A0 Do you need the query to return 0 rows if b.id
and b.name are null?=C2=A0 This question is moot if none of the columns or
either table allow NULLs.

If you don't require that, then you'll give the planner more
flexibility to choose a more efficient plan if you use NOT EXISTS
instead.

David


=
Regards,

David=C2=A0
--000000000000915b7405f90c5524--