Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1shSdN-001rJy-Bm for pgsql-admin@arkaria.postgresql.org; Fri, 23 Aug 2024 11:39:13 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1shSdK-00CFF7-Ns for pgsql-admin@arkaria.postgresql.org; Fri, 23 Aug 2024 11:39:11 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from <010001917f08731e-0052ded1-1c0a-4f39-a63f-3e99767c5f59-000000@amazonses.com>) id 1shSdK-00CFEz-9g for pgsql-admin@lists.postgresql.org; Fri, 23 Aug 2024 11:39:10 +0000 Received: from a8-43.smtp-out.amazonses.com ([54.240.8.43]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from <010001917f08731e-0052ded1-1c0a-4f39-a63f-3e99767c5f59-000000@amazonses.com>) id 1shSdH-0016cE-73 for pgsql-admin@lists.postgresql.org; Fri, 23 Aug 2024 11:39:10 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/simple; s=ug7nbtf4gccmlpwj322ax3p6ow6yfsug; d=amazonses.com; t=1724413146; h=Content-Type:Content-Transfer-Encoding:From:Mime-Version:Subject:Date:Message-Id:References:In-Reply-To:To:Feedback-ID; bh=qORHE5TSXy693mK5ZdiZ1bvLKlNwqzdaE4zngKhtoZM=; b=dDFUw1A2h3dYUhE7gZKUAy3JdVKfdUTOpERHB0erhIxLs8HwyD5Z8dxZSuGsaTWg 6RubkQ4onT3JFLv2xJ1th423DvTnDdngd3cCRRapkLjv/ccAD7SIKLh54gl7yQxTuU5 ZA20C4TLtrurnT30TzDbCzWCJfKsf7eFBs6acohE= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable From: Doug Reynolds Mime-Version: 1.0 (1.0) Subject: Re: checking for a NULL date in a partitioned table kills performance Date: Fri, 23 Aug 2024 11:39:05 +0000 Message-ID: <010001917f08731e-0052ded1-1c0a-4f39-a63f-3e99767c5f59-000000@email.amazonses.com> References: In-Reply-To: To: Pgsql-admin X-Virus-Scanned: clamav-milter 0.103.9 at postfix X-Virus-Status: Clean X-Mailer: iPhone Mail (21G93) Feedback-ID: ::1.us-east-1.L+CISTGIImABHDx8bBKKEgULZ36xzLwmtjped/xJNPU=:AmazonSES X-SES-Outgoing: 2024.08.23-54.240.8.43 List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk I don=E2=80=99t see how an index is going to help since virtually all of the= rows are null AND contract_date isn=E2=80=99t the partition key. =20 Perhaps, you could try a UNION ALL with one query selecting the date and the= other selecting where the date is null.=20 You could try something really ugly where you make a function index that COA= LESCEs the nulls to 1-1-1900 and use the COALESCE in the query. Sent from my iPhone > On Aug 22, 2024, at 7:43=E2=80=AFPM, Sbob wro= te: >=20 > =EF=BB=BF >> On 8/22/24 5:26 PM, Sbob wrote: >>=20 >>> On 8/22/24 5:06 PM, Rui DeSousa wrote: >>>=20 >>>> On Aug 22, 2024, at 5:44 PM, Sbob wrote: >>>>=20 >>>> All; >>>>=20 >>>> I am running a select from a partitioned table. The table (and all the p= artitions) have an index on contract_date like this: >>>> CREATE INDEX on part_tab (contract_date) where contract_date > '2022-01= -01'::date >>>>=20 >>>> The table (including all partitions) has 32million rows >>>> The db server is an aurora postgresql instance with 128GB of ram and 16= vcpu's >>>>=20 >>>> The shared buffers is set to 90GB and effective_cache_size is also 90GB= >>>> I set default_statistics_target to 1000 and ram a vacuum analyze on the= table >>>>=20 >>>> I am selecting a number of columns and specifying this where clause: >>>>=20 >>>> WHERE ( >>>> (contract_date IS NULL) >>>> OR >>>> (contract_date > '2022-01-01'::date) >>>> ) >>>>=20 >>>> This takes 15 seconds to run and an explain says it's doing a table sca= n on all partitions (the query is not specifying the partition key) >>>> If I change the where clause to look like this: >>>>=20 >>>> WHERE ( >>>> (contract_date > '2022-01-01'::date) >>>> ) >>>>=20 >>>> Then it performs index scans on all the partitions and runs in about 60= 0ms >>>>=20 >>>> If i leave the where clause off entirely it performs table scans of the= partitions and takes approx 18 seconds to run >>>>=20 >>>> I am trying to get the performance to less than 2sec, >>>> I have tried adding indexes on the table and all partitions like this: >>>> CREATE INDEX ON table (contract_date NULLS FIRST) ; >>>> but the performance with the full where clause is the same: >>>>=20 >>>> WHERE ( >>>> (contract_date IS NULL) >>>> OR >>>> (contract_date > '2022-01-01'::date) >>>> ) >>>>=20 >>>> runs in 15 seconds and scans all partitions >>>>=20 >>>> I also tried indexes i=3Don the table and all partitions like this: >>>> CREATE INDEX ON table (contract_date) WHERE contract_date IS NULL; >>>>=20 >>>> but I get the same result, table scans on all partitions and it runs in= 15 seconds >>>>=20 >>>> Any help or advice ? >>>>=20 >>>> Thanks in advance >>>>=20 >>>>=20 >>> What is contract_date and when will it be null? >>=20 >>=20 >> it's a date data type and it allows NULL's not sure why, this is a client= 's system >>=20 >>=20 > 29 million of the 32 million rows in the table have NULL for contract_date= >=20 >=20 >=20 >=20 >=20 >=20