public inbox for [email protected]  
help / color / mirror / Atom feed
From: Rui DeSousa <[email protected]>
To: Sbob <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: checking for a NULL date in a partitioned table kills performance
Date: Thu, 22 Aug 2024 20:07:55 -0400
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>



> On Aug 22, 2024, at 7:32 PM, Sbob <[email protected]> wrote:
> 
> 
> On 8/22/24 5:26 PM, Sbob wrote:
>> 
>> On 8/22/24 5:06 PM, Rui DeSousa wrote:
>>> 
>>>> On Aug 22, 2024, at 5:44 PM, Sbob <[email protected]> wrote:
>>>> 
>>>> All;
>>>> 
>>>> I am running a select from a partitioned table. The table (and all the partitions) have an index on contract_date like this:
>>>> CREATE INDEX on part_tab (contract_date) where contract_date > '2022-01-01'::date
>>>> 
>>>> The table (including all partitions) has 32million rows
>>>> The db server is an aurora postgresql instance with 128GB of ram and 16 vcpu's
>>>> 
>>>> The shared buffers is set to 90GB and effective_cache_size is also 90GB
>>>> I set default_statistics_target to 1000 and ram a vacuum analyze on the table
>>>> 
>>>> I am selecting a number of columns and specifying this where clause:
>>>> 
>>>> WHERE (
>>>>                      (contract_date IS NULL)
>>>>                      OR
>>>>                      (contract_date > '2022-01-01'::date)
>>>>               )
>>>> 
>>>> This takes 15 seconds to run and an explain says it's doing a table scan on all partitions (the query is not specifying the partition key)
>>>> If I change the where clause to look like this:
>>>> 
>>>> WHERE (
>>>>                    (contract_date > '2022-01-01'::date)
>>>>               )
>>>> 
>>>> Then it performs index scans on all the partitions and runs in about 600ms
>>>> 
>>>> If i leave the where clause off entirely it performs table scans of the partitions and takes approx 18 seconds to run
>>>> 
>>>> I am trying to get the performance to less than 2sec,
>>>> I have tried adding indexes on the table and all partitions like this:
>>>> CREATE INDEX ON table (contract_date NULLS FIRST) ;
>>>> but the performance with the full where clause is the same:
>>>> 
>>>> WHERE (
>>>>                      (contract_date IS NULL)
>>>>                      OR
>>>>                      (contract_date > '2022-01-01'::date)
>>>>               )
>>>> 
>>>> runs in 15 seconds and scans all partitions
>>>> 
>>>> I also tried indexes i=on the table and all partitions like this:
>>>> CREATE INDEX ON table (contract_date) WHERE contract_date IS NULL;
>>>> 
>>>> but I get the same result, table scans on all partitions and it runs in 15 seconds
>>>> 
>>>> Any help or advice ?
>>>> 
>>>> Thanks in advance
>>>> 
>>>> 
>>> What is contract_date and when will it be null?
>> 
>> 
>> it's a date data type and it allows NULL's not sure why, this is a client's system
>> 
>> 
> 29 million of the 32 million rows in the table have NULL for contract_date
> 

NULLs are not indexed thus the OR predicate invalidate the use of the index.  

Since you are already creating a partial index just include the NULLs.  It index will get used for both of your queries.

create index table_idx1 
  on table (contract_date) 
  where contract_date > ‘1/1/2022’
      or contract_date is null
;


The reason why I asked when is contract_date null is because attributes in a table should be non nullable.  If it’s nullable then that begs the question if it belong in that table in the first place; and sometimes the answer is yes.  I just see a lot of half baked schemas out there.  I refer to them as organically designed schemas. 

-Rui.





view thread (16+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: checking for a NULL date in a partitioned table kills performance
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox