From: David Rowley
Date: Wed, 6 Nov 2024 11:11:51 +1300
Subject: Re: Why not do distinct before SetOp
To: Tom Lane
Cc: ma lz, "pgsql-general@postgresql.org"
In-Reply-To: <2631313.1730733484@sss.pgh.pa.us>

On Tue, 5 Nov 2024 at 04:18, Tom Lane wrote:
> A different idea that occurred to me while looking at this is:
> why have we got all this machinery to add and check a flag
> column, rather than arranging things so that the two input
> relations are "outer" and "inner" children of the SetOp?

I've no idea why it's not like that. The current design is quite
strange and feels dated.
It might be worth making that change as even if we gave joins better
support for IS NOT DISTINCT FROM and made INTERSECT use INNER JOIN
instead and EXCEPT use anti join, we'd still need nodeSetOp.c for
INTERSECT ALL and EXCEPT ALL.

> It's possible some of the performance difference reported here
> is due to having to pass more tuples through the SubqueryScan
> node (with its projection to add the flag) and Append node,
> but we could remove those steps entirely.

Seems plausible.

> > If we did want to improve this area, I think the first thing we'd want
> > to do is use standard join types rather than HashSetOp Intersect to
> > implement INTERSECT (without ALL). To do that efficiently, we'd need
> > to do a bit more work on the standard join types to have them
> > efficiently support IS NOT DISTINCT FROM clauses as the join keys.
>
> Maybe. It'd be a big project, but we do get complaints every so
> often about IS NOT DISTINCT FROM predicates not being efficient,
> so the benefits would be wider than just INTERSECT.

Yeah, I agree. I think that's step 1 towards making INTERSECT (without
ALL) and EXCEPT (without ALL) better and it would probably make a few
other people happy who use IS NOT DISTINCT FROM in their join
conditions.

David
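
To illustrate the semantics being discussed (not how PostgreSQL's
executor does it), here is a small Python sketch. It assumes rows are
tuples and relies on Python treating None as equal to None, which
stands in for IS NOT DISTINCT FROM. All function names are
hypothetical. The non-ALL variants need only a membership test against
the other input (a semi-join or anti-join followed by duplicate
removal), while the ALL variants need per-group counts from both
inputs, which is roughly why a join alone would not cover them.

```python
from collections import Counter

# Rows are tuples. In Python, None == None is True, so tuple equality
# here behaves like SQL's IS NOT DISTINCT FROM (NULLs match each other).

def intersect_distinct(left, right):
    """INTERSECT (without ALL): semi-join on null-safe equality, then dedup."""
    right_keys = set(right)
    seen = set()
    out = []
    for row in left:
        if row in right_keys and row not in seen:
            seen.add(row)
            out.append(row)
    return out

def except_distinct(left, right):
    """EXCEPT (without ALL): anti-join on null-safe equality, then dedup."""
    right_keys = set(right)
    seen = set()
    out = []
    for row in left:
        if row not in right_keys and row not in seen:
            seen.add(row)
            out.append(row)
    return out

def intersect_all(left, right):
    """INTERSECT ALL: each row appears min(left count, right count) times."""
    lc, rc = Counter(left), Counter(right)
    out = []
    for row, n in lc.items():
        out.extend([row] * min(n, rc[row]))
    return out

def except_all(left, right):
    """EXCEPT ALL: each row appears max(left count - right count, 0) times."""
    lc, rc = Counter(left), Counter(right)
    out = []
    for row, n in lc.items():
        out.extend([row] * max(n - rc[row], 0))
    return out

if __name__ == "__main__":
    left = [(1, None), (1, None), (1, None), (2, "a"), (3, "b")]
    right = [(1, None), (1, None), (2, "a")]
    print(intersect_distinct(left, right))
    print(except_distinct(left, right))
    print(intersect_all(left, right))
    print(except_all(left, right))
```

The counting in the two ALL functions is the part that has no direct
equivalent in a plain semi-join or anti-join.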