Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tSB4n-00Edgp-E3 for pgsql-hackers@arkaria.postgresql.org; Mon, 30 Dec 2024 08:24:38 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tSB4l-00Ag42-Oe for pgsql-hackers@arkaria.postgresql.org; Mon, 30 Dec 2024 08:24:35 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tSB4l-00Ag3u-8D for pgsql-hackers@lists.postgresql.org; Mon, 30 Dec 2024 08:24:35 +0000 Received: from mail.postgrespro.ru ([93.174.131.139]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tSB4d-002JkJ-HD for pgsql-hackers@lists.postgresql.org; Mon, 30 Dec 2024 08:24:33 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=postgrespro.ru; s=mx2023; t=1735547066; bh=LhxuZk7z8WE0Hjd0PDvfNpq12HsBkURVh/Roo/fFuIE=; h=Message-ID:Date:User-Agent:Subject:To:Cc:References:From: In-Reply-To:From; b=vRm5oQL2o0C3R/IaObY+lUq0CJ+Mlj8nFHIzl1leV7XadMun/rLsfo50CcDx1lqM6 4jOSRDnss4P6FAt+clW2ETdxcvb6eltKh1w18wSlzkA7YLSqdRY6pzZYhVq0CNy75N mIqJRUOO1On8jaEGu8GX7/kfcad9Z8HdRClDo1SxzgqdElICEYV4u1uDa536ZjVpP1 u30Icqi7bj0mD/BCq9fQVbicoL6mw7gVJRFksZmC8clRZ3KNcui3IiOffIXhnfcXbo Yx4DjoCP3i2sDrxsySr14MmB95oTsIRz4TgQJ/9HjgFzcU074xEsrJd7k+CT+J6NKV veBpA9gUuC43w== Received: from [172.30.48.18] (unknown [172.30.48.18]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (Client did not present a certificate) (Authenticated sender: a.rybakina@postgrespro.ru) by mail.postgrespro.ru (Postfix/587) with ESMTPSA id 30C79607E0; Mon, 30 Dec 2024 11:24:26 +0300 (MSK) Content-Type: multipart/alternative; boundary="------------T0gxhFGoWS9yHp3gU7HjtBaT" Message-ID: <8d7cbbf6-ddc0-4696-8af2-a68d740e14f2@postgrespro.ru> Date: Mon, 30 Dec 2024 11:24:25 +0300 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: Exists pull-up application with JoinExpr To: Ilia Evdokimov Cc: PostgreSQL Hackers , Ranier Vilela References: <0b1f670d-b39d-4966-bf32-f0d502ebc564@postgrespro.ru> <46bc4eaf-58c5-42ab-8041-d3380a0768de@postgrespro.ru> <9b041978-06e3-4a50-8a5d-dacbb054f23e@tantorlabs.com> Content-Language: en-US From: Alena Rybakina In-Reply-To: <9b041978-06e3-4a50-8a5d-dacbb054f23e@tantorlabs.com> X-KSMG-AntiPhishing: NotDetected X-KSMG-AntiSpam-Interceptor-Info: not scanned X-KSMG-AntiSpam-Status: not scanned, disabled by settings X-KSMG-AntiVirus: Kaspersky Secure Mail Gateway, version 2.1.0.7854, bases: 2024/12/30 04:24:00 #26928642 X-KSMG-AntiVirus-Status: NotDetected, skipped X-KSMG-LinksScanning: not scanned, disabled by settings X-KSMG-Message-Action: skipped X-KSMG-Rule-ID: 1 List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk This is a multi-part message in MIME format. --------------T0gxhFGoWS9yHp3gU7HjtBaT Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Hi! Thank you for your interest to this subject! On 27.12.2024 15:53, Ilia Evdokimov wrote: > Hi Alena, > > Thank you for your work on subqueries with JOIN. > > Have you considered the scenario where in subquery includes a qual > like (tc.aid = 1)? When I tried executing those queries I receive > different results. In my opinion, to prevent this, we should add > filters for such quals within the loop 'foreach (lc, all_clauses)' > > EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) SELECT * FROM ta > WHERE EXISTS (SELECT * FROM tb, tc WHERE ta.id = tb.id AND tc.aid = 1); >                               QUERY PLAN > ---------------------------------------------------------------------- >  Hash Join (actual rows=1 loops=1) >    Hash Cond: (ta.id = tb.id) >    Buffers: local hit=3 >    ->  Seq Scan on ta (actual rows=3 loops=1) >          Buffers: local hit=1 >    ->  Hash (actual rows=3 loops=1) >          Buckets: 4096  Batches: 1  Memory Usage: 33kB >          Buffers: local hit=2 >          ->  HashAggregate (actual rows=3 loops=1) >                Group Key: tb.id >                Batches: 1  Memory Usage: 121kB >                Buffers: local hit=2 >                ->  Nested Loop (actual rows=3 loops=1) >                      Buffers: local hit=2 >                      ->  Seq Scan on tb (actual rows=3 loops=1) >                            Buffers: local hit=1 >                      ->  Materialize (actual rows=1 loops=3) >                            Storage: Memory  Maximum Storage: 17kB >                            Buffers: local hit=1 >                            ->  Seq Scan on tc (actual rows=1 loops=1) >                                  Filter: (aid = 1) >                                  Rows Removed by Filter: 1 >                                  Buffers: local hit=1 > (23 rows) > > ============================ > > EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) > SELECT * FROM ta WHERE EXISTS (SELECT * FROM tb JOIN tc ON ta.id = > tb.id WHERE tc.aid = 1); >                                 QUERY PLAN > --------------------------------------------------------------------------- > >  Seq Scan on ta (actual rows=1 loops=1) >    Filter: EXISTS(SubPlan 1) >    Rows Removed by Filter: 2 >    Buffers: local hit=6 >    SubPlan 1 >      ->  Nested Loop (actual rows=0 loops=3) >            Buffers: local hit=5 >            ->  Index Only Scan using tb_pkey on tb (actual rows=0 > loops=3) >                  Index Cond: (id = ta.id) >                  Heap Fetches: 1 >                  Buffers: local hit=4 >            ->  Seq Scan on tc (actual rows=1 loops=1) >                  Filter: (aid = 1) >                  Buffers: local hit=1 > (14 rows) > > You are right, at the moment the code is not processed if there is a constant qual in the subquery (like t1.x1=1 in the example below) and this problem is not only related to the current patch. For example you can get such a query plan if you complete this request to the master: create table t (xint); create table t1 (x1int); create table t2 (x2int); EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) SELECT 1 FROM t WHERE EXISTS (SELECT 1 FROM t1 where t1.x1 = 1); QUERY PLAN ------------------------------------------------ Result (actual rows=0 loops=1) One-Time Filter: (InitPlan 1).col1 InitPlan 1 -> Seq Scan on t1 (actual rows=0 loops=1) Filter: (x1 = 1) -> Seq Scan on t (never executed) (6 rows) It's all because of the check in this function - this qual has levelsoup = 0, not 1 (see (!contain_vars_of_level(whereClause, 1)), but I already found out that by changing this, the logic of correction there is required a little more complicated. At the moment, I'm working to add this processing to the patch. Thanks for the case! -- Regards, Alena Rybakina Postgres Professional --------------T0gxhFGoWS9yHp3gU7HjtBaT Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: 8bit

Hi! Thank you for your interest to this subject!

On 27.12.2024 15:53, Ilia Evdokimov wrote:
Hi Alena,

Thank you for your work on subqueries with JOIN.

Have you considered the scenario where in subquery includes a qual like (tc.aid = 1)? When I tried executing those queries I receive different results. In my opinion, to prevent this, we should add filters for such quals within the loop 'foreach (lc, all_clauses)'

EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) SELECT * FROM ta
WHERE EXISTS (SELECT * FROM tb, tc WHERE ta.id = tb.id AND tc.aid = 1);
                              QUERY PLAN
----------------------------------------------------------------------
 Hash Join (actual rows=1 loops=1)
   Hash Cond: (ta.id = tb.id)
   Buffers: local hit=3
   ->  Seq Scan on ta (actual rows=3 loops=1)
         Buffers: local hit=1
   ->  Hash (actual rows=3 loops=1)
         Buckets: 4096  Batches: 1  Memory Usage: 33kB
         Buffers: local hit=2
         ->  HashAggregate (actual rows=3 loops=1)
               Group Key: tb.id
               Batches: 1  Memory Usage: 121kB
               Buffers: local hit=2
               ->  Nested Loop (actual rows=3 loops=1)
                     Buffers: local hit=2
                     ->  Seq Scan on tb (actual rows=3 loops=1)
                           Buffers: local hit=1
                     ->  Materialize (actual rows=1 loops=3)
                           Storage: Memory  Maximum Storage: 17kB
                           Buffers: local hit=1
                           ->  Seq Scan on tc (actual rows=1 loops=1)
                                 Filter: (aid = 1)
                                 Rows Removed by Filter: 1
                                 Buffers: local hit=1
(23 rows)

============================

EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
SELECT * FROM ta WHERE EXISTS (SELECT * FROM tb JOIN tc ON ta.id =
tb.id WHERE tc.aid = 1);
                                QUERY PLAN
---------------------------------------------------------------------------
 Seq Scan on ta (actual rows=1 loops=1)
   Filter: EXISTS(SubPlan 1)
   Rows Removed by Filter: 2
   Buffers: local hit=6
   SubPlan 1
     ->  Nested Loop (actual rows=0 loops=3)
           Buffers: local hit=5
           ->  Index Only Scan using tb_pkey on tb (actual rows=0 loops=3)
                 Index Cond: (id = ta.id)
                 Heap Fetches: 1
                 Buffers: local hit=4
           ->  Seq Scan on tc (actual rows=1 loops=1)
                 Filter: (aid = 1)
                 Buffers: local hit=1
(14 rows)


You are right, at the moment the code is not processed if there is a constant qual in the subquery (like t1.x1=1 in the example below) and this problem is not only related to the current patch. 

For example you can get such a query plan if you complete this request to the master:

create table t (x int);
create table t1 (x1 int);
create table t2 (x2 int);
 EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
 SELECT 1
   FROM t
  WHERE EXISTS (SELECT 1
                  FROM t1
                  where t1.x1 = 1);
                   QUERY PLAN                   
------------------------------------------------
 Result (actual rows=0 loops=1)
   One-Time Filter: (InitPlan 1).col1
   InitPlan 1
     ->  Seq Scan on t1 (actual rows=0 loops=1)
           Filter: (x1 = 1)
   ->  Seq Scan on t (never executed)
(6 rows)

It's all because of the check in this function - this qual has levelsoup = 0, not 1 (see (!contain_vars_of_level(whereClause, 1)), but I already found out that by changing this, the logic of correction there is required a little more complicated. At the moment, I'm working to add this processing to the patch. 

Thanks for the case!

-- 
Regards,
Alena Rybakina
Postgres Professional
--------------T0gxhFGoWS9yHp3gU7HjtBaT--