How to create efficient index in this scenario?

public inbox for [email protected]  
help / color / mirror / Atom feed

From: veem v <[email protected]>
To: pgsql-general <[email protected]>
Subject: How to create efficient index in this scenario?
Date: Sat, 8 Jun 2024 19:03:07 +0530
Message-ID: <CAB+=1TX+Av1Fx+Q4YOmUGioUoa8TQ8kGa1h06zPSEona2az39A@mail.gmail.com> (raw)

Hi ,
It's postgres version 15.4. A table is daily range partitioned on a column
transaction_timestamp. It has a unique identifier which is the ideal for
primary key (say transaction_id) , however as there is a limitation in
which we have to include the partition key as part of the primary key, so
it has to be a composite index. Either it has to be
(transaction_id,transaction_timestamp) or ( transaction_timestamp,
transaction_id). But which one should we go for, if both of the columns get
used in all the queries?

We will always be using transaction_timestamp as mostly a range predicate
filter/join in the query and the transaction_id will be mostly used as a
join condition/direct filter in the queries. So we were wondering, which
column should we be using  as a leading column in this index?

There is a blog below (which is for oracle), showing how the index should
be chosen and it states ,  "*Stick the columns you do range scans on last
in the index, filters that get equality predicates should come first.* ",
and in that case we should have the PK created as in the order
(transaction_id,transaction_timestamp). It's because making the range
predicate as a leading column won't help use that as an access predicate
but as an filter predicate thus will read more blocks and thus more IO.
Does this hold true in postgres too?

https://ctandrewsayer.wordpress.com/2017/03/24/the-golden-rule-of-indexing/

Additionally there is another scenario in which we have the requirement to
have another timestamp column (say create_timestamp) to be added as part of
the primary key along with transaction_id and we are going to query this
table frequently by the column create_timestamp as a range predicate. And
ofcourse we will also have the range predicate filter on partition key
"transaction_timestamp". But we may or may not have join/filter on column
transaction_id, so in this scenario we should go for
 (create_timestamp,transaction_id,transaction_timestamp). because
"transaction_timestamp" is set as partition key , so putting it last
doesn't harm us. Will this be the correct order or any other index order is
appropriate?

Thanks and Regards
Veem

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected]
  Subject: Re: How to create efficient index in this scenario?
  In-Reply-To: <CAB+=1TX+Av1Fx+Q4YOmUGioUoa8TQ8kGa1h06zPSEona2az39A@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox