Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sngOy-005Lhr-Jf for pgsql-general@arkaria.postgresql.org; Mon, 09 Sep 2024 15:34:05 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sngOy-00E9Zz-8Z for pgsql-general@arkaria.postgresql.org; Mon, 09 Sep 2024 15:34:04 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sngOx-00E9X7-SE for pgsql-general@lists.postgresql.org; Mon, 09 Sep 2024 15:34:03 +0000 Received: from mail-lj1-x233.google.com ([2a00:1450:4864:20::233]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1sngOu-000K6V-8v for pgsql-general@lists.postgresql.org; Mon, 09 Sep 2024 15:34:03 +0000 Received: by mail-lj1-x233.google.com with SMTP id 38308e7fff4ca-2f759b87f83so24027261fa.2 for ; Mon, 09 Sep 2024 08:34:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725896041; x=1726500841; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=jhBBxq0LYxu2f2E88vDbG/0354TYLkV7ef3c4SMfIMs=; b=RcWdlWrtH6dkY7P2G8vfohliX1mWG25pfEIP629yC/6XmjNr+C22LtiVL+AB0Exoyj Fzoxp7LEOBEIsLjJoj8J7BeOPkr5lHRzFQnoG67aKUk+zvR+F+wvYkOiboosmbxOzy/d VYIY2K8RnDcLsm3lChzcUlB2csaR6qa12/Nl0UUoh8WRJQpNPxG0/bNYL8XS1ZxdZjDy WcVNJrdy9vP0fH6Zr6hgQfdo5roBsJnQiipWD8CX3PA1BxlIupYqEOzuX+XeKRqMkbcr iX12VA+YUiRDKCaWK0aP2KPLB6Xl1Ko7fFTZWxGbXIYQd92WKbW4b+mOMbnHP9pspt5k xoFg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725896041; x=1726500841; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=jhBBxq0LYxu2f2E88vDbG/0354TYLkV7ef3c4SMfIMs=; b=s4Uc5kEmQRFnaek6hsgkyGPfuXHm5dVDI/6H2e0EDeXpExE2VKbj/c+bq53FNGQhfP SrumQJMN1p2aaSPer6wdAVigi0XhFVglJnBCchdRVvyM1+hcX6g3J1kjIVw6OgYah3kO inkDR0n3/geO6/FLyG2lmo0S8gQ1f7gTZZdB91anVtSPj24Dm0wlZz/TvTqqSyw6w8M/ OISoYEsdWAxdzTWxCgmIchGewQ/qi9qGZFCLYF+CdjUwqxgWE0Tx+IKBzST7DjTcnHGq Tz/wzy/3tqKYfm6LdVtjaeg8LEdWm7ofYEUZmhKERwvo4kyGahzDypv2AFJAL4V7JTDM TxpA== X-Gm-Message-State: AOJu0Yx+eaeMLQXWZJpWLjui8aQBngZi6RFzYfyxqCBCviyaOyLsHzGO FKFMyUnWTunTTlbyxVmhSYgOoG0cq8zMkxeOFJtd228+93UQIHx1NXhmt4sJcYepB5DSDXCTZ1G Q/wRT83xprfDRC6FdH6kfdgOYrdfraQ== X-Google-Smtp-Source: AGHT+IHTOMWZ3loagaoPAh1Ho7RXfaZh15wbib2mdjZEYoee47KYEnO5LCtPoAzdZSxh3f/9QahP6+tFy4sBQGr0Pe8= X-Received: by 2002:a05:6512:4028:b0:52c:d905:9645 with SMTP id 2adb3069b0e04-536587aaf0cmr6458164e87.13.1725896039623; Mon, 09 Sep 2024 08:33:59 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Greg Sabino Mullane Date: Mon, 9 Sep 2024 11:33:21 -0400 Message-ID: Subject: Re: How effectively do the indexing in postgres in such cases To: sud Cc: pgsql-general Content-Type: multipart/alternative; boundary="0000000000005c72950621b17c75" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000005c72950621b17c75 Content-Type: text/plain; charset="UTF-8" Your questions are a little too vague to answer well, but let me try a bit. 1)In the query below , if the optimizer chooses tab1 as the driving table, > the index on just col1 should be enough or it should be (col1, tab1_id)? > No way to tell without trying it yourself. We need information on how the tables are joined, the cardinality, general distribution, etc. But as a rough general rule, yes, indexes on the column of interest should be able to handle the job well by themselves. 2)In scenarios where the customer has a lot of matching transactions (say > in millions) post all the filters applied , and as the customer has to just > see the latest 100 rows transaction data, the sorting will be a bottleneck. > So what can be done to make such types of queries to return the latest > transactions in quick time on the search screen? > Sorting can be quick, if you hit an index (b-trees are already sorted) Postgres can look at only the first X rows returned and does not need to read the whole thing. So a well-designed index is the key here. > 3)As here also the count has to happen in the first step to make the > customer know the total number of rows(which may be in millions), so what > additional index will support this requirement? > Again, a very vague question, but for things that are in millions, an estimate is usually sufficient, so you might be able to do something like SELECT count(*) FROM mytab WHERE mydate BETWEEN x AND y; and use that as your answer. Compare to the full query to see how close it is. You might even have cutoffs, where if the results of that first one is < 10,000, switch to a more accurate version which has more filtering (i.e. the joins and more where conditions). > Or if any other optimization strategy we can follow for catering to such > queries? > > select * from tab1, tab2 > where tab1.part_date between '1-jan-2024' and '31-jan-2024' > and tab1.part_date=tab2.part_date > and tab1.tab1_id=tab2.tab2_id > and tab1.col1=<:input_col1> > and tab2.col2=<:input_col2> > order by tab1.create_timestamp desc > limit 100 offset 100; > It probably would help to see exact tables and queries. Why are you joining on part_date? Is tab_id unique to either table? Running EXPLAIN on these while you try out indexes and change the joins, etc. is a great exercise to help you learn how Postgres works. As far as asking on lists for help with specific queries, there is a range between totally abstract and generic queries that nobody can help you with, and large, complex specific queries that nobody wants to unravel and help you with. You are definitely on the former side: try to create some actually runable sample queries that are small, self-contained, and generate the problem you are trying to solve. Cheers, Greg --0000000000005c72950621b17c75 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Your questions are a little too vague to = answer well, but let me try a bit.

1)In the query below , if the optimizer chooses tab1 as the driv= ing table, the index on just col1 should be enough or it should be (col1, t= ab1_id)?

No way to tell without tryin= g it yourself. We need information on how the tables are joined, the cardin= ality, general distribution, etc. But as a rough general rule, yes, indexes= on the column of interest should be able to handle the job well by themsel= ves.

=
2)In scenarios where the customer has a lot of matching tr= ansactions (say in millions) post all the filters applied , and as the cust= omer has to just see the latest 100 rows transaction data, the sorting will= be a bottleneck. So what can be done to make such types of queries to retu= rn the latest transactions in quick time on the search screen?

Sorting can be quick, if you hit an index (b-= trees are already sorted) Postgres can look at only the first X rows return= ed and does not need to read the whole thing. So a well-designed index is t= he key here.
=C2=A0
3)As here also the count has to happen in th= e first step to make the customer know the total number of rows(which may b= e in millions), so what additional index will support this requirement?
=

Again, a very vague question, but fo= r things that are in millions, an estimate is usually sufficient, so you mi= ght be able to do something like SELECT count(*) FROM mytab WHERE mydate BE= TWEEN x AND y; and use that as your answer. Compare to the full query to se= e how close it is. You might even have cutoffs, where if the results of tha= t first one is < 10,000, switch to a more accurate version which has mor= e filtering (i.e. the joins and more where conditions).
=C2=A0
O= r if any other optimization strategy we can follow for catering to such que= ries?

select * from tab1, tab2
where tab1.part_date between '= 1-jan-2024' and '31-jan-2024'
and tab1.part_date=3Dtab2.part= _date
and tab1.tab1_id=3Dtab2.tab2_id
and tab1.col1=3D<:input_col= 1>
and tab2.col2=3D<:input_col2>
order by tab1.create_timest= amp desc
limit 100 offset 100;

It= probably would help to see exact tables and queries. Why are you joining o= n part_date? Is tab_id unique to either table? Running EXPLAIN on these whi= le you try out indexes and change the joins, etc. is a great exercise to he= lp you learn how Postgres works. As far as asking on lists for help=C2=A0wi= th specific queries, there is a range between totally abstract and generic = queries that nobody can help you with, and large, complex specific queries = that nobody wants to unravel and help you with. You are definitely on the f= ormer side: try to create some actually runable=C2=A0sample queries that ar= e small, self-contained, and generate the problem you are trying to solve.<= /div>

Cheers,
Greg

= =C2=A0
--0000000000005c72950621b17c75--