Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1smrlT-008h8n-O5 for pgsql-general@arkaria.postgresql.org; Sat, 07 Sep 2024 09:29:56 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1smrlR-00DiIe-RR for pgsql-general@arkaria.postgresql.org; Sat, 07 Sep 2024 09:29:54 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1smrlR-00DiIV-CC for pgsql-general@lists.postgresql.org; Sat, 07 Sep 2024 09:29:53 +0000 Received: from mail-ua1-x929.google.com ([2607:f8b0:4864:20::929]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1smrlO-000Zds-9g for pgsql-general@lists.postgresql.org; Sat, 07 Sep 2024 09:29:52 +0000 Received: by mail-ua1-x929.google.com with SMTP id a1e0cc1a2514c-846d536254fso851893241.1 for ; Sat, 07 Sep 2024 02:29:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725701388; x=1726306188; darn=lists.postgresql.org; h=to:subject:message-id:date:from:mime-version:from:to:cc:subject :date:message-id:reply-to; bh=cqAOOttLLSr93QTKC+nqXb2T6pXy0zaB7VZ17fZkKRI=; b=i/hlQZZey3w88V2Of925GJzgV85vOGTuDiaN4YZPsDlXZ4oPgGJOcRrqYdSPPSqpN+ r6L5dNG7a3McZwMw8EM2ki+V54LO3WkeyfclO2qmrbD4VekeJpymlSs+gMiuS8d+rGVJ VAPNA1Gxr1FSmGDxudE76oUqT8/4UU197n3YBjIUNYoMHNRHOi8xG7nxbQFrbHKpYwIb ow0Ky1aVnQMskaVuyV0MA3pzFDXYoLzc+gx5vtDnZjtx7qA6KiiX5uWau261flaXavV8 LPINUd/VKaEctbO8kBqGbxI7pZFQ96JT27QiCqNh5zmt+9cHCvB+0hFSXPCZCWy/7cEY a8fA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725701388; x=1726306188; h=to:subject:message-id:date:from:mime-version:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=cqAOOttLLSr93QTKC+nqXb2T6pXy0zaB7VZ17fZkKRI=; b=e7mZNNLM2THVWsym7LszzydOd/XvY0jMnRvszQnQQdxCeNHwHXxXFtZu4JTXpapZCm RLpwqQh4P5oTqRr1+jPBnkVQyorXPO+wQO0BGyN8nVc8pBEvmrllicFvQ2HxB2BJ0cb9 mLQ0JTfxNkpQrlHvk4SJWOPgVfxabvse2pRhVtLYw/Bwg3A27Sk6zFX6AsA70A0zqVnl Ou5Yo0CPBOZLUVYWRVkK57+upMmLc9dQ+IIht40YUcNVnT3D2OBSUOhC5q0RPHMySP/e JUfTzwxqr/qdwDFB8CtVPp46Xlu5EO+/I5zp2EGkeFvu2w7MR046A67xscRhIf7pDkPS 9PQg== X-Gm-Message-State: AOJu0Yw9p9eLDue5urE94gLQD93scSibyyHh8Kv2vu3Z6qtkk1C4FBws vYst3m0V70BgYAtrYkiS3djcsh7KHGeYgmOp+M2dbN20g2HCl2sdXc0ireX+khC3mM7uJ4pz2Wl DRysFd869LfOuM68+h4BT+vK4MT5iwH1X X-Google-Smtp-Source: AGHT+IExty3Zsl9AE0Ym55lx8fYyGJrRqt53vZLeDququAGqq53cCRMB2LJRIXC4qJXuTOEElJIYpk09QkSWY59ERlw= X-Received: by 2002:a05:6102:32cd:b0:493:f097:e5b7 with SMTP id ada2fe7eead31-49bde130e3dmr6002086137.1.1725701388605; Sat, 07 Sep 2024 02:29:48 -0700 (PDT) MIME-Version: 1.0 From: sud Date: Sat, 7 Sep 2024 14:59:37 +0530 Message-ID: Subject: How effectively do the indexing in postgres in such cases To: pgsql-general Content-Type: multipart/alternative; boundary="00000000000041a2240621842a92" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000041a2240621842a92 Content-Type: text/plain; charset="UTF-8" Hello Experts, We have a requirement in which the query will be formed like below. We will have two partitioned tables joined and there may be filters used on both of the tables columns or it may be one of those. These types of queries are very frequently used queries and critical to customers as these are part of search screens , so we want to have the indexing happen effectively to satisfy these types of queries to return rows in not more than ~1 seconds. The both tables are daily range partitions on column "part_date" and the volume of data per day/partitions will be ~700mllion in both the tables. The customer can go searching for a duration starting from one days till max ~1 month of data i.e. part_date spanning for ~1 month duration. And the search should provide the latest transaction on the screen which is why "order by ..limit clause is used". "Offset" is used there because the customer can scroll through the next page on the UI where he has the capability to see the next 100 rows and so on. In the first screen it is also expected to see the count of the results , so that the customer can get an immediate idea about the total count of transactions he has matching his search criteria. So , 1)In the query below , if the optimizer chooses tab1 as the driving table, the index on just col1 should be enough or it should be (col1, tab1_id)? Similarly if it chooses the tab2 be the driving table then , index on (col2,tab2_id). Or just indexing the filtered column should be enough like individual indexes on COL1 and COL2 of table tab1 and tab2 respectively? 2)In scenarios where the customer has a lot of matching transactions (say in millions) post all the filters applied , and as the customer has to just see the latest 100 rows transaction data, the sorting will be a bottleneck. So what can be done to make such types of queries to return the latest transactions in quick time on the search screen? 3)As here also the count has to happen in the first step to make the customer know the total number of rows(which may be in millions), so what additional index will support this requirement? Or if any other optimization strategy we can follow for catering to such queries? select * from tab1, tab2 where tab1.part_date between '1-jan-2024' and '31-jan-2024' and tab1.part_date=tab2.part_date and tab1.tab1_id=tab2.tab2_id and tab1.col1=<:input_col1> and tab2.col2=<:input_col2> order by tab1.create_timestamp desc limit 100 offset 100; Regards Sud --00000000000041a2240621842a92 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hello Experts,
We have a requirement in which the quer= y will be formed like below. We will have two partitioned tables joined and= there may be filters used on both of the tables columns or it may be one o= f those.

These types of queries are very frequently used queries an= d critical to customers as these are part of search screens , so we want to= have the indexing happen effectively to satisfy these types of queries to = return rows in not more than ~1 seconds. The both tables are daily range pa= rtitions on column "part_date" and the volume of data per day/par= titions will be ~700mllion in both the tables.

The customer can go s= earching for a duration starting from one days till max ~1 month of data i.= e. part_date spanning for ~1 month duration. And the search should provide = the latest transaction on the screen which is why "order by ..limit cl= ause is used". "Offset" is used there because the customer c= an scroll through the next page on the UI where he has the capability to se= e the next 100 rows and so on. In the first screen it is also expected to s= ee the count of the results , so that the customer can get an immediate ide= a about the total count of transactions he has matching his search criteria= .

So ,
1)In the query below , if the optimizer chooses tab1 as th= e driving table, the index on just col1 should be enough or it should be (c= ol1, tab1_id)? Similarly if it chooses the tab2 be the driving table then ,= index on (col2,tab2_id). Or just indexing the filtered column should be en= ough like individual indexes on COL1 and COL2 of table tab1 and tab2 respec= tively?

2)In scenarios where the customer has a lot of matching tran= sactions (say in millions) post all the filters applied , and as the custom= er has to just see the latest 100 rows transaction data, the sorting will b= e a bottleneck. So what can be done to make such types of queries to return= the latest transactions in quick time on the search screen?

3)As he= re also the count has to happen in the first step to make the customer know= the total number of rows(which may be in millions), so what additional ind= ex will support this requirement?

Or if any other optimization strat= egy we can follow for catering to such queries?

select * from tab1, = tab2
where tab1.part_date between '1-jan-2024' and '31-jan-2= 024'
and tab1.part_date=3Dtab2.part_date
and tab1.tab1_id=3Dtab2= .tab2_id
and tab1.col1=3D<:input_col1>
and tab2.col2=3D<:inp= ut_col2>
order by tab1.create_timestamp desc
limit 100 offset 100= ;

Regards
Sud

--00000000000041a2240621842a92--