Re: How to create efficient index in this scenario?

public inbox for [email protected]  
help / color / mirror / Atom feed

Re: How to create efficient index in this scenario?
2+ messages / 2 participants
[nested] [flat]

* Re: How to create efficient index in this scenario?
@ 2024-06-09 04:15 Lok P <[email protected]>
  2024-06-09 04:51 ` Re: How to create efficient index in this scenario? veem v <[email protected]>
  0 siblings, 1 reply; 2+ messages in thread

From: Lok P @ 2024-06-09 04:15 UTC (permalink / raw)
  To: veem v <[email protected]>; +Cc: pgsql-general <[email protected]>

On Sat, Jun 8, 2024 at 7:03 PM veem v <[email protected]> wrote:

> Hi ,
> It's postgres version 15.4. A table is daily range partitioned on a column
> transaction_timestamp. It has a unique identifier which is the ideal for
> primary key (say transaction_id) , however as there is a limitation in
> which we have to include the partition key as part of the primary key, so
> it has to be a composite index. Either it has to be
> (transaction_id,transaction_timestamp) or ( transaction_timestamp,
> transaction_id). But which one should we go for, if both of the columns get
> used in all the queries?
>
> We will always be using transaction_timestamp as mostly a range predicate
> filter/join in the query and the transaction_id will be mostly used as a
> join condition/direct filter in the queries. So we were wondering, which
> column should we be using  as a leading column in this index?
>
> There is a blog below (which is for oracle), showing how the index should
> be chosen and it states ,  "*Stick the columns you do range scans on last
> in the index, filters that get equality predicates should come first.* ",
> and in that case we should have the PK created as in the order
> (transaction_id,transaction_timestamp). It's because making the range
> predicate as a leading column won't help use that as an access predicate
> but as an filter predicate thus will read more blocks and thus more IO.
> Does this hold true in postgres too?
>
> https://ctandrewsayer.wordpress.com/2017/03/24/the-golden-rule-of-indexing/
>

I believe the analogy holds true here in postgres too and the index in this
case should be on (transaction_id, transaction_timestamp).


>
>
> Additionally there is another scenario in which we have the requirement to
> have another timestamp column (say create_timestamp) to be added as part of
> the primary key along with transaction_id and we are going to query this
> table frequently by the column create_timestamp as a range predicate. And
> ofcourse we will also have the range predicate filter on partition key
> "transaction_timestamp". But we may or may not have join/filter on column
> transaction_id, so in this scenario we should go for
>  (create_timestamp,transaction_id,transaction_timestamp). because
> "transaction_timestamp" is set as partition key , so putting it last
> doesn't harm us. Will this be the correct order or any other index order is
> appropriate?
>
>
>
In this case , the index should be on (
create_timestamp,transaction_id,transaction_timestamp), considering the
fact that you will always have queries with "create_timestamp" as predicate
and may not have transaction_id in the query predicate.


^ permalink  raw  reply  [nested|flat] 2+ messages in thread

* Re: How to create efficient index in this scenario?
  2024-06-09 04:15 Re: How to create efficient index in this scenario? Lok P <[email protected]>
@ 2024-06-09 04:51 ` veem v <[email protected]>
  0 siblings, 0 replies; 2+ messages in thread

From: veem v @ 2024-06-09 04:51 UTC (permalink / raw)
  To: Lok P <[email protected]>; +Cc: pgsql-general <[email protected]>

On Sun, 9 Jun 2024 at 09:45, Lok P <[email protected]> wrote:

>
> On Sat, Jun 8, 2024 at 7:03 PM veem v <[email protected]> wrote:
>
>>
>> There is a blog below (which is for oracle), showing how the index should
>> be chosen and it states ,  "*Stick the columns you do range scans on
>> last in the index, filters that get equality predicates should come first.*
>> ", and in that case we should have the PK created as in the order
>> (transaction_id,transaction_timestamp). It's because making the range
>> predicate as a leading column won't help use that as an access predicate
>> but as an filter predicate thus will read more blocks and thus more IO.
>> Does this hold true in postgres too?
>>
>>
>> https://ctandrewsayer.wordpress.com/2017/03/24/the-golden-rule-of-indexing/
>>
>
> I believe the analogy holds true here in postgres too and the index in
> this case should be on (transaction_id, transaction_timestamp).
>
>
>>
>>
>> Additionally there is another scenario in which we have the requirement
>> to have another timestamp column (say create_timestamp) to be added as part
>> of the primary key along with transaction_id and we are going to query this
>> table frequently by the column create_timestamp as a range predicate. And
>> ofcourse we will also have the range predicate filter on partition key
>> "transaction_timestamp". But we may or may not have join/filter on column
>> transaction_id, so in this scenario we should go for
>>  (create_timestamp,transaction_id,transaction_timestamp). because
>> "transaction_timestamp" is set as partition key , so putting it last
>> doesn't harm us. Will this be the correct order or any other index order is
>> appropriate?
>>
>>
>>
> In this case , the index should be on (
> create_timestamp,transaction_id,transaction_timestamp), considering the
> fact that you will always have queries with "create_timestamp" as predicate
> and may not have transaction_id in the query predicate.
>

So in the second scenario, if we keep the create_timestamp as the leading
column ,is it not against the advice which the blog provides i.e. to not
have the range predicate as the leading column in the index?


^ permalink  raw  reply  [nested|flat] 2+ messages in thread

end of thread, other threads:[~2024-06-09 04:51 UTC | newest]

Thread overview: 2+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-06-09 04:15 Re: How to create efficient index in this scenario? Lok P <[email protected]>
2024-06-09 04:51 ` veem v <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox