public inbox for [email protected]  
help / color / mirror / Atom feed
From: Ron Johnson <[email protected]>
To: Pgsql-admin <[email protected]>
Subject: Re: Performance issue - Seq Scan
Date: Mon, 20 Jan 2025 09:39:28 -0500
Message-ID: <CANzqJaBwhk4G1yR5ab1d=hM+nZEcariu-UQz=DMkbHeJkqjFjw@mail.gmail.com> (raw)
In-Reply-To: <CAC5iy60+GWydu97iBvaXtOcF_TsmDyQSPJs+vVpWC8S2HwzxFQ@mail.gmail.com>
References: <CAC5iy63JA0nMnJeCUTDBGNZkZomQ7hTp1eHCsWd3bJY2CGpAkQ@mail.gmail.com>
	<CAECtzeWhskBgkEZgnhiEWwGhcDD0bV1y3TH8zv8gbCY57GsAZA@mail.gmail.com>
	<CAC5iy60+GWydu97iBvaXtOcF_TsmDyQSPJs+vVpWC8S2HwzxFQ@mail.gmail.com>

500M rows doesn't have to be a lot of records.  Are the tuples large?  If I
were to partition those tables, I would partition them on an existing PK
field.

Until then, I would:

   - disable AUTOVACUUM on those tables immediately before the ETL job
   starts
   - run the ETL job
   - "manually" run VACUUM ANALYZE on those tables.
   - enable  AUTOVACUUM on those tables


On Mon, Jan 20, 2025 at 6:07 AM Siraj G <[email protected]> wrote:

> Hello Guillaume!
>
> As I highlighted the records count for these tables which are quite high,
> would it be a best practice if we change the vacuum and analyze scale
> factor at the table level?
> Also, I am trying to understand if partitioning is required for these
> tables, or at least for the one which has over 500million records?
>
> Regards
> Siraj
>
> On Mon, Jan 20, 2025 at 3:04 PM Guillaume Lelarge <[email protected]>
> wrote:
>
>> Hi,
>>
>> Le lun. 20 janv. 2025 à 09:42, Siraj G <[email protected]> a écrit :
>>
>>> Hello Experts!
>>>
>>> We had a performance issue with a SQL that used to complete in a few
>>> milliseconds, was taking over 14seconds. We had to run *analyze *on 3
>>> tables to get the idle performance back.
>>>
>>> When the performance was not optimal, we noticed sequential scans even
>>> with indexes created.
>>>
>>> The tables and their count:
>>> coverage_details = 529628595
>>> customer_details = 81721669
>>> policy_details = 116909729
>>>
>>> PgSQL version is:
>>> PostgreSQL 15.7 on x86_64-pc-linux-gnu, compiled by Debian clang version
>>> 12.0.1, 64-bit
>>>
>>> One more information is that we noticed this started happening (in the
>>> destination) after an ETL job completed the load (regular load). *Just
>>> wanted to know if any follow up actions we should do after such data loads,
>>> eg., analyze or vacuum. *We do have autovacuum on, with default values.
>>>
>>>
>> Yes, you should run "VACUUM ANALYZE" after running a batch. autovacuum
>> could be not fast enough to do it itself before you start querying the new
>> data.
>>
>>
>> --
>> Guillaume.
>>
>

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected]
  Subject: Re: Performance issue - Seq Scan
  In-Reply-To: <CANzqJaBwhk4G1yR5ab1d=hM+nZEcariu-UQz=DMkbHeJkqjFjw@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox