Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sFwCF-007XxQ-Ek for pgsql-general@arkaria.postgresql.org; Sat, 08 Jun 2024 13:33:28 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sFwCC-00H6hZ-Us for pgsql-general@arkaria.postgresql.org; Sat, 08 Jun 2024 13:33:25 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sFwCC-00H6hQ-Jw for pgsql-general@lists.postgresql.org; Sat, 08 Jun 2024 13:33:25 +0000 Received: from mail-lf1-x135.google.com ([2a00:1450:4864:20::135]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1sFwC9-000cSo-K8 for pgsql-general@lists.postgresql.org; Sat, 08 Jun 2024 13:33:24 +0000 Received: by mail-lf1-x135.google.com with SMTP id 2adb3069b0e04-52c804e092dso467999e87.2 for ; Sat, 08 Jun 2024 06:33:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1717853599; x=1718458399; darn=lists.postgresql.org; h=to:subject:message-id:date:from:mime-version:from:to:cc:subject :date:message-id:reply-to; bh=6iGeFXEhAEYo+qdvXE2qXFiRbOhzDnzcWca5GKMbOlg=; b=KsAZSwBhv37TINIk6LXoC+M0zJp2ig/52bYATzra2iKLKlYgwfKzJu/vP31HdW0a6Z 6Ns0sC5/COZeLZrwQ2pCXMaDmerABkKFzdFWMb/ZV9YdQs+raBbr96Di3Wab67wI6dJ2 2XQQRueNNTZU8S3dLIXYfXABi0DO7pAMYTtJorY+fsLWeo1figpY/4qhICf042/pTafG t6ZutEYwkhO41MQg00V1TLZS+x9f+lynbgwrHrrZRnXQsgoipBdqzzKI3S1BDGsscjo3 Jkie1/dFYjE9h57ij+96lkKXSgVnrAcvb9Z/tWIwwOclJJxsDoRLVShNloes3Wb/qJvs WoyQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1717853599; x=1718458399; h=to:subject:message-id:date:from:mime-version:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=6iGeFXEhAEYo+qdvXE2qXFiRbOhzDnzcWca5GKMbOlg=; b=X/+bTG+plnALS8d4oMIulVvSYd1H2KgFyjR6XdZPAh9PapzsnBB6xeM7yF2MZKJt/4 /6dliK1rhioiP2bU2dUEfHY/MLhiFn0KA26OGhr8qMY0y8gsMvJoXGz7br+dI4pRyInL xWS52WXg1/8EXp/dWI50c5uZfsHzsf7VGuCp82b8ScpYSfG8Uy2Va3+PEM8OHKNifs7G ocZGXsRiemeqhjxe0yHV27gzQaQpMXHDjtqHDbWa19gQe5YHpVsxuoWpiqNJPtZgZcuN 9dO/2NA1z+dq+uD/o+0LjhrgObiBMYM1Ada7CBa0eBwlhs13n4JRlDoy/u5mi4w2ldv1 9woA== X-Gm-Message-State: AOJu0YzF9ZlDX7LilHk162iibNt9kG8xSUNZZx81Xb8Q2Us1IAeiL1LM TwnrKlObeLjKKfM/NeW9mkTvlEQWChVnXr9Rlpe8UZqLRt2zD6hDl6Kc7OeX/hAsFp98nBXAB1J XuPqIRmebqfC2AIM3TnhrxuPjan6ByNvO X-Google-Smtp-Source: AGHT+IEqDGZXJocigmReVAqBTwq+xad2B1/JSjV0ScfuxUIXKWI8bG2f6sInEqnOt6qlcnGyr+eW3KBuo03Y6if0poo= X-Received: by 2002:ac2:54a9:0:b0:52c:816c:28cd with SMTP id 2adb3069b0e04-52c816c2966mr786143e87.37.1717853599004; Sat, 08 Jun 2024 06:33:19 -0700 (PDT) MIME-Version: 1.0 From: veem v Date: Sat, 8 Jun 2024 19:03:07 +0530 Message-ID: Subject: How to create efficient index in this scenario? To: pgsql-general Content-Type: multipart/alternative; boundary="0000000000008b83be061a60f550" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000008b83be061a60f550 Content-Type: text/plain; charset="UTF-8" Hi , It's postgres version 15.4. A table is daily range partitioned on a column transaction_timestamp. It has a unique identifier which is the ideal for primary key (say transaction_id) , however as there is a limitation in which we have to include the partition key as part of the primary key, so it has to be a composite index. Either it has to be (transaction_id,transaction_timestamp) or ( transaction_timestamp, transaction_id). But which one should we go for, if both of the columns get used in all the queries? We will always be using transaction_timestamp as mostly a range predicate filter/join in the query and the transaction_id will be mostly used as a join condition/direct filter in the queries. So we were wondering, which column should we be using as a leading column in this index? There is a blog below (which is for oracle), showing how the index should be chosen and it states , "*Stick the columns you do range scans on last in the index, filters that get equality predicates should come first.* ", and in that case we should have the PK created as in the order (transaction_id,transaction_timestamp). It's because making the range predicate as a leading column won't help use that as an access predicate but as an filter predicate thus will read more blocks and thus more IO. Does this hold true in postgres too? https://ctandrewsayer.wordpress.com/2017/03/24/the-golden-rule-of-indexing/ Additionally there is another scenario in which we have the requirement to have another timestamp column (say create_timestamp) to be added as part of the primary key along with transaction_id and we are going to query this table frequently by the column create_timestamp as a range predicate. And ofcourse we will also have the range predicate filter on partition key "transaction_timestamp". But we may or may not have join/filter on column transaction_id, so in this scenario we should go for (create_timestamp,transaction_id,transaction_timestamp). because "transaction_timestamp" is set as partition key , so putting it last doesn't harm us. Will this be the correct order or any other index order is appropriate? Thanks and Regards Veem --0000000000008b83be061a60f550 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi ,=C2=A0
It's postgres version 15.4. A table is = daily range partitioned on a column transaction_timestamp. It has a unique = identifier which is the ideal for primary key (say transaction_id) , howeve= r as there is a limitation in which we have to include the partition key as= part of the primary key, so it has to be a composite index. Either it has = to be (transaction_id,transaction_timestamp) or ( transaction_timestamp, tr= ansaction_id). But which one should we go for, if both of the columns get u= sed in all the queries?

We will always be using transaction_timestam= p as mostly a range predicate filter/join in the query and the transaction_= id will be mostly used as a join condition/direct filter in the queries. So= we were wondering, which column should we be using =C2=A0as a leading colu= mn in this index?

There is a blog below (which is for oracle), showi= ng how the index should be chosen and it states , =C2=A0"Stick the = columns you do range scans on last in the index, filters that get equality = predicates should come first. ", and in that case we should have t= he PK created as in the order (transaction_id,transaction_timestamp). It= 9;s because making the range predicate as a leading column won't help u= se that as an access predicate but as an filter predicate thus will read mo= re blocks and thus more IO. Does this hold true in postgres too?

https://ctandrewsayer.wordpress.com/2017/03/24/the-golden-rule-of= -indexing/

Additionally there is another scenario in which we ha= ve the requirement to have another timestamp column (say create_timestamp) = to be added as part of the primary key along with transaction_id and we are= going to query this table frequently by the column create_timestamp as a r= ange predicate. And ofcourse we will also have the range predicate filter o= n partition key "transaction_timestamp". But we may or may not ha= ve join/filter on column transaction_id, so in this scenario we should go f= or =C2=A0(create_timestamp,transaction_id,transaction_timestamp). because &= quot;transaction_timestamp" is set as partition key , so putting it la= st doesn't harm us. Will this be the correct order or any other index o= rder is appropriate?

Thanks and Regards
<= div>Veem
--0000000000008b83be061a60f550--