Re: checking for a NULL date in a partitioned table kills performance

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Rui DeSousa <[email protected]>
To: Tom Lane <[email protected]>
Cc: Sbob <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: checking for a NULL date in a partitioned table kills performance
Date: Fri, 23 Aug 2024 05:54:51 -0400
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>



> On Aug 22, 2024, at 8:05 PM, Tom Lane <[email protected]> wrote:
> 
> Sbob <[email protected]> writes:
>> 29 million of the 32 million rows in the table have NULL for contract_date
> 
> [ blink... ]  So your query is selecting at least 29/32nds of the
> table, plus however much matches the contract_date > '2022-01-01'
> alternative.  I'm not sure how you expect that to be significantly
> cheaper than scanning the whole table.
> 
> 			regards, tom lane


^^^ This is best answer; however, what is the actual query and how is it used?  I assume it’s a analytical query and not actually extracting the rows.  Is for a dashboard or an ad-hoc query? 


Here’s simple example; from 4/1 is uses the index 3/1 it does a full table scan.  Depending on what you using the query for you could use a covered index or a materialized view.


prod=# create index emp_idx3 on emp (contract_date);
CREATE INDEX
Time: 6036.030 ms (00:06.036)
prod=# select sum(site_id) from emp where contract_date > '4/1/2024';
    sum    
-----------
 927473447
(1 row)

Time: 711.774 ms
prod=# select sum(site_id) from emp where contract_date > '3/1/2024';
    sum     
------------
 1128971203
(1 row)

Time: 1945.397 ms (00:01.945)
prod=# select sum(site_id) from emp where contract_date > '3/1/2024' or contract_date is null;
    sum     
------------
 3823075309
(1 row)

Time: 1821.284 ms (00:01.821)
prod=# explain select sum(site_id) from emp where contract_date > '4/1/2024';
                                            QUERY PLAN                                            
--------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=754070.31..754070.32 rows=1 width=8)
   ->  Gather  (cost=754069.59..754070.30 rows=7 width=8)
         Workers Planned: 7
         ->  Partial Aggregate  (cost=753069.59..753069.60 rows=1 width=8)
               ->  Parallel Bitmap Heap Scan on emp  (cost=5343.20..752867.80 rows=80715 width=4)
                     Recheck Cond: (contract_date > '2024-04-01'::date)
                     ->  Bitmap Index Scan on emp_idx3  (cost=0.00..5201.95 rows=565002 width=0)
                           Index Cond: (contract_date > '2024-04-01'::date)
 JIT:
   Functions: 7
   Options: Inlining true, Optimization true, Expressions true, Deforming true
(11 rows)

Time: 1.196 ms
prod=# explain select sum(site_id) from emp where contract_date > '3/1/2024';
                                      QUERY PLAN                                       
---------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=764566.90..764566.91 rows=1 width=8)
   ->  Gather  (cost=764566.18..764566.89 rows=7 width=8)
         Workers Planned: 7
         ->  Partial Aggregate  (cost=763566.18..763566.19 rows=1 width=8)
               ->  Parallel Seq Scan on emp  (cost=0.00..763320.15 rows=98411 width=4)
                     Filter: (contract_date > '2024-03-01'::date)
 JIT:
   Functions: 7
   Options: Inlining true, Optimization true, Expressions true, Deforming true
(9 rows)

Time: 1.172 ms
prod=# drop index emp_idx3;
DROP INDEX
Time: 8.663 ms
prod=# create index emp_idx3 on emp (contract_date) include (site_id);
CREATE INDEX
Time: 7002.860 ms (00:07.003)
prod=# select sum(site_id) from emp where contract_date > '4/1/2024';
    sum    
-----------
 927473447
(1 row)

Time: 56.450 ms
prod=#  select sum(site_id) from emp where contract_date > '3/1/2024';
    sum     
------------
 1128971203
(1 row)

Time: 49.115 ms
prod=# select sum(site_id) from emp where contract_date > '3/1/2024' or contract_date is null;
    sum     
------------
 3823075309
(1 row)

Time: 702.962 ms
prod=# explain select sum(site_id) from emp where contract_date > '3/1/2024' or contract_date is null;
                                                  QUERY PLAN                                                  
--------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=216035.47..216035.48 rows=1 width=8)
   ->  Gather  (cost=216034.74..216035.45 rows=7 width=8)
         Workers Planned: 7
         ->  Partial Aggregate  (cost=215034.74..215034.75 rows=1 width=8)
               ->  Parallel Index Only Scan using emp_idx3 on emp  (cost=0.44..213686.81 rows=539174 width=4)
                     Filter: ((contract_date > '2024-03-01'::date) OR (contract_date IS NULL))
 JIT:
   Functions: 5
   Options: Inlining false, Optimization false, Expressions true, Deforming true
(9 rows)

Time: 0.972 ms
prod=# select count(*) from emp;
  count   
----------
 16862243
(1 row)

Time: 629.995 ms

view thread (16+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected]
  Subject: Re: checking for a NULL date in a partitioned table kills performance
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox