Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tQPRf-002MkQ-EH for pgsql-hackers@arkaria.postgresql.org; Wed, 25 Dec 2024 11:20:56 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tQPRe-00D4Gd-0g for pgsql-hackers@arkaria.postgresql.org; Wed, 25 Dec 2024 11:20:53 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tQPRd-00D4GV-F0 for pgsql-hackers@lists.postgresql.org; Wed, 25 Dec 2024 11:20:53 +0000 Received: from mail.postgrespro.ru ([93.174.131.139]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tQPRV-001XZG-GV for pgsql-hackers@lists.postgresql.org; Wed, 25 Dec 2024 11:20:51 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=postgrespro.ru; s=mx2023; t=1735125644; bh=urcEZ7bdtsdxi4n5kOAdIUYO9dzefWvEyOXIoczjEYQ=; h=Message-ID:Date:User-Agent:Subject:To:Cc:References:From: In-Reply-To:From; b=Q/9kxNpoonngJR757JsZdLZSRBrjgVb7gPyTEEk9KikT5QAjcOXRnpFURLSkCtwrv JKI/NiazvR88/TPWEporJ+Jc9ifL3UFJUGlbsxDujy0v0BP+H1q6AvPC4A+GELTf72 a4SN7lYhqQPQAzDhV+dLbqNPI8qEAawUmjLi33KLW8Dwm1WNoSjFQM1YRMYd/f8r3B O+9ED2V+jhVFWkXa/UFQ/C5F/f0d95k/PvmX/rsE5sutN+ybpvGtYlJqQQYYmrF5Mr Z/irDVy6IOuzWIW8ag81Bf1si9Vfi8tkWtMnlaUK1q4BfJYB8JGnKxffxyafwmkBWS moL0Hs3m/R5lQ== Received: from [192.168.248.77] (unknown [212.233.86.201]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (Client did not present a certificate) (Authenticated sender: a.rybakina@postgrespro.ru) by mail.postgrespro.ru (Postfix/587) with ESMTPSA id 31FF76013E; Wed, 25 Dec 2024 14:20:44 +0300 (MSK) Content-Type: multipart/alternative; boundary="------------0ppkoRsQHgWeyCB2vV4dgNYK" Message-ID: <46bc4eaf-58c5-42ab-8041-d3380a0768de@postgrespro.ru> Date: Wed, 25 Dec 2024 14:20:42 +0300 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: Exists pull-up application with JoinExpr To: Ranier Vilela Cc: PostgreSQL Hackers References: <0b1f670d-b39d-4966-bf32-f0d502ebc564@postgrespro.ru> Content-Language: en-US From: Alena Rybakina In-Reply-To: X-KSMG-AntiPhishing: NotDetected, bases: 2024/12/25 10:45:00 X-KSMG-AntiSpam-Interceptor-Info: not scanned X-KSMG-AntiSpam-Status: not scanned, disabled by settings X-KSMG-AntiVirus: Kaspersky Secure Mail Gateway, version 2.1.0.7854, bases: 2024/12/25 07:56:00 #26914998 X-KSMG-AntiVirus-Status: NotDetected, skipped X-KSMG-LinksScanning: not scanned, disabled by settings X-KSMG-Message-Action: skipped X-KSMG-Rule-ID: 1 List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk This is a multi-part message in MIME format. --------------0ppkoRsQHgWeyCB2vV4dgNYK Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit On 24.12.2024 13:25, Ranier Vilela wrote: > Hi Alena. > > Em ter., 24 de dez. de 2024 às 01:44, Alena Rybakina > escreveu: > > Hi, hackers! > > I found one pull-up that works if the inner join condition is > written through the where condition, > > |create temp table ta (id int primary key, val int); insert into > ta values(1,1); insert into ta values(2,2); ||insert into ta values(3,3);| > > |create temp table tb (id int primary key, aval int); insert into > tb values(4,1); insert into tb values(5,1); insert into tb > values(1,2); create temp table tc (id int primary key, aid int); > insert into tc values(6,1); insert into tc values(7,2);| > > |EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) SELECT * > FROM ta WHERE EXISTS (SELECT * FROM tb, tc WHERE ta.id > = tb.id );| > QUERY PLAN > ------------------------------------------------------------------------- > Nested Loop Semi Join (actual rows=1 loops=1) > Buffers: local hit=6 > -> Seq Scan on ta (actual rows=3 loops=1) > Buffers: local hit=1 > -> Nested Loop (actual rows=0 loops=3) > Buffers: local hit=5 > -> Index Only Scan using tb_pkey on tb (actual rows=0 loops=3) > Index Cond: (id =ta.id ) > Heap Fetches: 1 > Buffers: local hit=4 > -> Seq Scan on tc (actual rows=1 loops=1) > Buffers: local hit=1 > Planning: > Buffers: shared hit=67 read=12 > (14 rows) > > but it doesn't work if it is written through the outside condition. > > |alena@postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING > OFF) SELECT * FROM ta WHERE EXISTS (SELECT * FROM tb JOIN tc ON > ta.id = tb.id ); QUERY PLAN > ------------------------------------------------------ Seq Scan on > ta (actual rows=1 loops=1) Filter: EXISTS(SubPlan 1) Rows Removed > by Filter: 2 Buffers: local hit=5 SubPlan 1 -> Nested Loop (actual > rows=0 loops=3) Buffers: local hit=4 -> Seq Scan on tb (actual > rows=0 loops=3) Filter: (ta.id = id) Rows Removed > by Filter: 3 Buffers: local hit=3 -> Seq Scan on tc (actual rows=1 > loops=1) Buffers: local hit=1 Planning: Buffers: shared hit=16 > read=9 (15 rows) | > > |I have written a patch to add this functionality and now it gives > an query plan: | > > |alena@postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING > OFF) >  SELECT * >    FROM ta >   WHERE EXISTS (SELECT * >                   FROM tb JOIN tc >                   ON ta.id = tb.id ); >                      QUERY PLAN > ------------------------------------------------------------------------- >  Nested Loop Semi Join (actual rows=1 loops=1) >    Buffers: local hit=6 >    ->  Seq Scan on ta (actual rows=3 loops=1) >          Buffers: local hit=1 >    ->  Nested Loop (actual rows=0 loops=3) >          Buffers: local hit=5 >          ->  Index Only Scan using tb_pkey on tb (actual rows=0 > loops=3) >                Index Cond: (id = ta.id ) >                Heap Fetches: 1 >                Buffers: local hit=4 >          ->  Seq Scan on tc (actual rows=1 loops=1) >                Buffers: local hit=1 > (12 rows)| > > tb and tc form a Cartesian product, but in the case of the > intersection condition with tuples from the table ta (ta.id > = tb.id ). So, according to the join > condition, tb intersects only with 1, and only it gets into the > result, but at the same time they appear twice - this is because > of the Cartesian product of tb with tc > > |*How it works:* > | > > I rewrote the code a bit so that it considers not only the quals > in jointree->quals, but also those in join expression > (subselect->jointree->fromlist). If they satisfy the conditions > for using pull up, I add them to the list of clauses and form a > "Bool" expression from them, joined by an "AND" operation. > > I took a look at this patch and I did a little polishing on it. > > And I believe that in testing, you need to set it to BUFFERS OFF, > because of the recent change made to ANALYZE. > > The tests are failing, like this: > QUERY PLAN >  ------------------------------------------------------------------------- >  Nested Loop Semi Join (actual rows=2 loops=1) > + Buffers: local hit=7 >  -> Seq Scan on ta (actual rows=2 loops=1) > + Buffers: local hit=1 >  -> Nested Loop (actual rows=1 loops=2) > + Buffers: local hit=6 >  -> Index Only Scan using tb_pkey on tb (actual rows=1 loops=2) >  Index Cond: (id = ta.id ) >  Heap Fetches: 2 > + Buffers: local hit=4 >  -> Seq Scan on tc (actual rows=1 loops=2) > -(7 rows) > + Buffers: local hit=2 > +(12 rows) > Yes, you are right) Thank you for your interest to this thread) -- Regards, Alena Rybakina Postgres Professional --------------0ppkoRsQHgWeyCB2vV4dgNYK Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: 8bit
On 24.12.2024 13:25, Ranier Vilela wrote:
Hi Alena.

Em ter., 24 de dez. de 2024 às 01:44, Alena Rybakina <a.rybakina@postgrespro.ru> escreveu:

Hi, hackers!

I found one pull-up that works if the inner join condition is written through the where condition,

create temp table ta (id int primary key, val int);
insert into ta values(1,1);
insert into ta values(2,2);
insert into ta values(3,3);

create temp table tb (id int primary key, aval int);
insert into tb values(4,1);
insert into tb values(5,1);
insert into tb values(1,2);

create temp table tc (id int primary key, aid int);
insert into tc values(6,1);
insert into tc values(7,2);
EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
 SELECT *
   FROM ta
  WHERE EXISTS (SELECT *
                  FROM tb, tc
                  WHERE ta.id = tb.id);
                               QUERY PLAN                                
-------------------------------------------------------------------------
 Nested Loop Semi Join (actual rows=1 loops=1)
   Buffers: local hit=6
   ->  Seq Scan on ta (actual rows=3 loops=1)
         Buffers: local hit=1
   ->  Nested Loop (actual rows=0 loops=3)
         Buffers: local hit=5
         ->  Index Only Scan using tb_pkey on tb (actual rows=0 loops=3)
               Index Cond: (id = ta.id)
               Heap Fetches: 1
               Buffers: local hit=4
         ->  Seq Scan on tc (actual rows=1 loops=1)
               Buffers: local hit=1
 Planning:
   Buffers: shared hit=67 read=12
(14 rows)

but it doesn't work if it is written through the outside condition.

alena@postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
 SELECT *
   FROM ta
  WHERE EXISTS (SELECT *
                  FROM tb JOIN tc
                  ON ta.id = tb.id);
                      QUERY PLAN                      
------------------------------------------------------
 Seq Scan on ta (actual rows=1 loops=1)
   Filter: EXISTS(SubPlan 1)
   Rows Removed by Filter: 2
   Buffers: local hit=5
   SubPlan 1
     ->  Nested Loop (actual rows=0 loops=3)
           Buffers: local hit=4
           ->  Seq Scan on tb (actual rows=0 loops=3)
                 Filter: (ta.id = id)
                 Rows Removed by Filter: 3
                 Buffers: local hit=3
           ->  Seq Scan on tc (actual rows=1 loops=1)
                 Buffers: local hit=1
 Planning:
   Buffers: shared hit=16 read=9
(15 rows)

I have written a patch to add this functionality and now it gives an query plan:

alena@postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
 SELECT *
   FROM ta
  WHERE EXISTS (SELECT *
                  FROM tb JOIN tc
                  ON ta.id = tb.id);
                     QUERY PLAN                                
-------------------------------------------------------------------------
 Nested Loop Semi Join (actual rows=1 loops=1)
   Buffers: local hit=6
   ->  Seq Scan on ta (actual rows=3 loops=1)
         Buffers: local hit=1
   ->  Nested Loop (actual rows=0 loops=3)
         Buffers: local hit=5
         ->  Index Only Scan using tb_pkey on tb (actual rows=0 loops=3)
               Index Cond: (id = ta.id)
               Heap Fetches: 1
               Buffers: local hit=4
         ->  Seq Scan on tc (actual rows=1 loops=1)
               Buffers: local hit=1
(12 rows)

tb and tc form a Cartesian product, but in the case of the intersection condition with tuples from the table ta (ta.id = tb.id). So, according to the join condition, tb intersects only with 1, and only it gets into the result, but at the same time they appear twice - this is because of the Cartesian product of tb with tc

How it works:

I rewrote the code a bit so that it considers not only the quals in jointree->quals, but also those in join expression (subselect->jointree->fromlist). If they satisfy the conditions for using pull up, I add them to the list of clauses and form a "Bool" expression from them, joined by an "AND" operation.

I took a look at this patch and I did a little polishing on it.

And I believe that in testing, you need to set it to BUFFERS OFF,
because of the recent change made to ANALYZE.

The tests are failing, like this: 
QUERY PLAN
 -------------------------------------------------------------------------
 Nested Loop Semi Join (actual rows=2 loops=1)
+ Buffers: local hit=7
 -> Seq Scan on ta (actual rows=2 loops=1)
+ Buffers: local hit=1
 -> Nested Loop (actual rows=1 loops=2)
+ Buffers: local hit=6
 -> Index Only Scan using tb_pkey on tb (actual rows=1 loops=2)
 Index Cond: (id = ta.id)
 Heap Fetches: 2
+ Buffers: local hit=4
 -> Seq Scan on tc (actual rows=1 loops=2)
-(7 rows)
+ Buffers: local hit=2
+(12 rows)

Yes, you are right) Thank you for your interest to this thread)
-- 
Regards,
Alena Rybakina
Postgres Professional
--------------0ppkoRsQHgWeyCB2vV4dgNYK--