Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sG9xn-009vUP-Nt for pgsql-general@arkaria.postgresql.org; Sun, 09 Jun 2024 04:15:28 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sG9xk-002W8e-KW for pgsql-general@arkaria.postgresql.org; Sun, 09 Jun 2024 04:15:25 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sG9xk-002W8V-5p for pgsql-general@lists.postgresql.org; Sun, 09 Jun 2024 04:15:25 +0000 Received: from mail-yw1-x1133.google.com ([2607:f8b0:4864:20::1133]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1sG9xd-000icv-TW for pgsql-general@lists.postgresql.org; Sun, 09 Jun 2024 04:15:24 +0000 Received: by mail-yw1-x1133.google.com with SMTP id 00721157ae682-62cdd144867so1178007b3.3 for ; Sat, 08 Jun 2024 21:15:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1717906515; x=1718511315; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=yx+Gr19WPThOkY5LjXKXkAiCh+Jvf2cJgM44xtXFyVw=; b=bANKrBizDzLw/sTnnaQ65tA8GEV2QNv7nhEHNY5JN2taTmiaFHK8+Vkeaksh+ogDLH kEmSCqcm0w0MBlHwCT23DZ5ziJlyScfMQbZJQqE9RJu6WPjPPCTz8lxK4xmTuBKEJqNa fWZgjo+w0itr3ibn5aqc1k1t/Xuh+Z1zKutZjGPtmy8ediVPMh5Vv/rKiy2F2pfyRQ6G S9Q9YNlHX0IUymgvw8p0y96GjlD8hIlEFSqnCH9r1Xt5gUBy4APHKbU2s9gjnKhwNLa3 /ddef2mjhqRilhzVv6EyH/yZ9Y+Amk9FgB9o9MTePM1BHklrufI/LvL53cUe4ECIjuR/ ZcBg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1717906515; x=1718511315; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=yx+Gr19WPThOkY5LjXKXkAiCh+Jvf2cJgM44xtXFyVw=; b=jj3ShqboO+RmBMmifxiPAiQGS+N2uu5x+nGlSPiBaymFC3mbDHLam5WtoiM/nI0OFa yHG/z/MvTScGxTujRS8VKXLOjG8IB2I9ODUixbKOAcnw8SS8lZPGVkB75oy5aLa093ow +ng/N/Rc/1rHyqZ1ww80ASGfsZX4SBvfkraLLbbbCPHRg+UCQUwnK8SNilMgAgZjLJql z7GNsfIYBb9Ob7JaqTBr9lGklI1ABEvHsYW7QjV3/8AdVFR9X7GiDpmXhCGGwBykCNwE NuMmVElARxSw7st2QlnY9DlYLdtuSUjqWAQHmY0Xg2Kt5rO1gh5o8w8rsOFB+UDtqKxD WBqQ== X-Gm-Message-State: AOJu0YwaYsxUgDCzhXD+rHYoVacXIqmXd8pqWatyg9UzzG4855yYDOe2 rEyLGyUJdcRt6mPvdvlN2d1SK5zbDDV8rhtGbHrMfwAv7qKgFDmpX6GAkS99iph7cLbPfGGVdOX 4M3pO9v/I6v4MRUqjoV6VZ69pJUg= X-Google-Smtp-Source: AGHT+IEFFrSCIDfTghxI9qmHsHjR7O3h3O08ir51gpGlaim0ZXPRiKdTcOmpU54x5KBzF9Mf6m/A4V0S7L49MQMy0EI= X-Received: by 2002:a0d:f186:0:b0:62c:c5ea:66e6 with SMTP id 00721157ae682-62cd546b04fmr49681187b3.0.1717906515257; Sat, 08 Jun 2024 21:15:15 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Lok P Date: Sun, 9 Jun 2024 09:45:02 +0530 Message-ID: Subject: Re: How to create efficient index in this scenario? To: veem v Cc: pgsql-general Content-Type: multipart/alternative; boundary="000000000000997095061a6d477e" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000997095061a6d477e Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Sat, Jun 8, 2024 at 7:03=E2=80=AFPM veem v wrote: > Hi , > It's postgres version 15.4. A table is daily range partitioned on a colum= n > transaction_timestamp. It has a unique identifier which is the ideal for > primary key (say transaction_id) , however as there is a limitation in > which we have to include the partition key as part of the primary key, so > it has to be a composite index. Either it has to be > (transaction_id,transaction_timestamp) or ( transaction_timestamp, > transaction_id). But which one should we go for, if both of the columns g= et > used in all the queries? > > We will always be using transaction_timestamp as mostly a range predicate > filter/join in the query and the transaction_id will be mostly used as a > join condition/direct filter in the queries. So we were wondering, which > column should we be using as a leading column in this index? > > There is a blog below (which is for oracle), showing how the index should > be chosen and it states , "*Stick the columns you do range scans on last > in the index, filters that get equality predicates should come first.* ", > and in that case we should have the PK created as in the order > (transaction_id,transaction_timestamp). It's because making the range > predicate as a leading column won't help use that as an access predicate > but as an filter predicate thus will read more blocks and thus more IO. > Does this hold true in postgres too? > > https://ctandrewsayer.wordpress.com/2017/03/24/the-golden-rule-of-indexin= g/ > I believe the analogy holds true here in postgres too and the index in this case should be on (transaction_id, transaction_timestamp). > > > Additionally there is another scenario in which we have the requirement t= o > have another timestamp column (say create_timestamp) to be added as part = of > the primary key along with transaction_id and we are going to query this > table frequently by the column create_timestamp as a range predicate. And > ofcourse we will also have the range predicate filter on partition key > "transaction_timestamp". But we may or may not have join/filter on column > transaction_id, so in this scenario we should go for > (create_timestamp,transaction_id,transaction_timestamp). because > "transaction_timestamp" is set as partition key , so putting it last > doesn't harm us. Will this be the correct order or any other index order = is > appropriate? > > > In this case , the index should be on ( create_timestamp,transaction_id,transaction_timestamp), considering the fact that you will always have queries with "create_timestamp" as predicate and may not have transaction_id in the query predicate. --000000000000997095061a6d477e Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable

On Sat, Jun 8, 2024 at 7:03=E2=80=AFPM ve= em v <veema0000@gmail.com>= wrote:
Hi ,=C2=A0
It's postgres version 15.4. A table is daily ra= nge partitioned on a column transaction_timestamp. It has a unique identifi= er which is the ideal for primary key (say transaction_id) , however as the= re is a limitation in which we have to include the partition key as part of= the primary key, so it has to be a composite index. Either it has to be (t= ransaction_id,transaction_timestamp) or ( transaction_timestamp, transactio= n_id). But which one should we go for, if both of the columns get used in a= ll the queries?

We will always be using transaction_timestamp as mos= tly a range predicate filter/join in the query and the transaction_id will = be mostly used as a join condition/direct filter in the queries. So we were= wondering, which column should we be using =C2=A0as a leading column in th= is index?

There is a blog below (which is for oracle), showing how t= he index should be chosen and it states , =C2=A0"Stick the columns = you do range scans on last in the index, filters that get equality predicat= es should come first. ", and in that case we should have the PK cr= eated as in the order (transaction_id,transaction_timestamp). It's beca= use making the range predicate as a leading column won't help use that = as an access predicate but as an filter predicate thus will read more block= s and thus more IO. Does this hold true in postgres too?

https://ctandrewsayer.wordpress.com/2017/03/24/the-gold= en-rule-of-indexing/

I beli= eve=C2=A0the analogy holds true here in postgres too and the index in this = case should be on (transaction_id, transaction_timestamp).
=C2=A0


Additionally there= is another scenario in which we have the requirement to have another times= tamp column (say create_timestamp) to be added as part of the primary key a= long with transaction_id and we are going to query this table frequently by= the column create_timestamp as a range predicate. And ofcourse we will als= o have the range predicate filter on partition key "transaction_timest= amp". But we may or may not have join/filter on column transaction_id,= so in this scenario we should go for =C2=A0(create_timestamp,transaction_i= d,transaction_timestamp). because "transaction_timestamp" is set = as partition key , so putting it last doesn't harm us. Will this be the= correct order or any other index order is appropriate?

<= /div>


In this case , t= he index should be on ( create_timestamp,transaction_id,transaction_timesta= mp), considering the fact that you will always=C2=A0have queries with "= ;create_timestamp" as predicate and may not have transaction_id in the= query predicate.
--000000000000997095061a6d477e--