MIME-Version: 1.0
References: 
 <CAC5iy63JA0nMnJeCUTDBGNZkZomQ7hTp1eHCsWd3bJY2CGpAkQ@mail.gmail.com>
 <CAECtzeWhskBgkEZgnhiEWwGhcDD0bV1y3TH8zv8gbCY57GsAZA@mail.gmail.com>
 <CAC5iy60+GWydu97iBvaXtOcF_TsmDyQSPJs+vVpWC8S2HwzxFQ@mail.gmail.com>
In-Reply-To: 
 <CAC5iy60+GWydu97iBvaXtOcF_TsmDyQSPJs+vVpWC8S2HwzxFQ@mail.gmail.com>
From: Ron Johnson <ronljohnsonjr@gmail.com>
Date: Mon, 20 Jan 2025 09:39:28 -0500
Message-ID: 
 <CANzqJaBwhk4G1yR5ab1d=hM+nZEcariu-UQz=DMkbHeJkqjFjw@mail.gmail.com>
Subject: Re: Performance issue - Seq Scan
To: Pgsql-admin <pgsql-admin@lists.postgresql.org>
Content-Type: multipart/alternative; boundary="000000000000ec9a7f062c243a21"
Archived-At: 
 <https://www.postgresql.org/message-id/CANzqJaBwhk4G1yR5ab1d%3DhM%2BnZEcariu-UQz%3DDMkbHeJkqjFjw%40mail.gmail.com>
Precedence: bulk

--000000000000ec9a7f062c243a21
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

500M rows isn't necessarily a lot of records. Are the tuples large? If I
were to partition those tables, I would partition them on an existing
primary-key column.

Until then, I would:

   - disable autovacuum on those tables immediately before the ETL job
   starts
   - run the ETL job
   - "manually" run VACUUM ANALYZE on those tables
   - re-enable autovacuum on those tables
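
A minimal sketch of that sequence, written in Python only to print the SQL in order. You would pipe the output to psql or send it through a driver in autocommit mode, because VACUUM cannot run inside a transaction block. The table names come from Siraj's message. `post_load_maintenance` is a hypothetical helper name, and the ETL step is only a placeholder comment. The per-table switch is the `autovacuum_enabled` storage parameter, set with `ALTER TABLE ... SET` and cleared with `ALTER TABLE ... RESET`.

```python
"""Emit the SQL for the disable / load / VACUUM ANALYZE / re-enable sequence.

Sketch only: the table names come from the thread, and the ETL step is a
hypothetical placeholder. Run the output through psql or a driver in
autocommit mode, because VACUUM cannot run inside a transaction block.
"""

TABLES = ["coverage_details", "customer_details", "policy_details"]


def post_load_maintenance(tables: list[str]) -> list[str]:
    statements: list[str] = []
    # 1. Keep routine autovacuum off these tables while the load runs
    #    (anti-wraparound vacuums can still run).
    for t in tables:
        statements.append(f"ALTER TABLE {t} SET (autovacuum_enabled = false);")
    # 2. Placeholder for the real ETL job (hypothetical, not SQL from the thread).
    statements.append("-- run the ETL job here")
    # 3. Remove dead tuples and refresh planner statistics in one pass per table.
    for t in tables:
        statements.append(f"VACUUM ANALYZE {t};")
    # 4. Drop the per-table override so the server-wide autovacuum settings apply again.
    for t in tables:
        statements.append(f"ALTER TABLE {t} RESET (autovacuum_enabled);")
    return statements


if __name__ == "__main__":
    print("\n".join(post_load_maintenance(TABLES)))
```

The `RESET` form returns each table to the server-wide autovacuum behaviour. It does not restore any value that was set on the table before the load.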


On Mon, Jan 20, 2025 at 6:07 AM Siraj G <tosiraj.g@gmail.com> wrote:

> Hello Guillaume!
>
> As I mentioned, the row counts for these tables are quite high. Would it
> be best practice to change the vacuum and analyze scale factors at the
> table level?
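
To put numbers on the scale-factor question, the PostgreSQL documentation gives the trigger points. Autovacuum analyzes a table once the rows changed since the last analyze exceed `autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * reltuples`, with defaults of 50 and 0.1. It vacuums the table once dead tuples exceed `autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples`, with defaults of 50 and 0.2. Here `reltuples` is the row estimate in `pg_class`. The sketch below plugs in the row counts quoted in this thread. The 0.01 per-table factor is purely an illustrative assumption, not a value anyone in the thread recommended.

```python
"""Rough autovacuum trigger points for the tables in this thread.

Formula from the PostgreSQL docs:
    trigger = base_threshold + scale_factor * reltuples
Defaults: analyze 50 + 0.1 * rows, vacuum 50 + 0.2 * rows.
The 0.01 override below is an illustrative assumption only.
"""

ROW_COUNTS = {
    "coverage_details": 529_628_595,
    "customer_details": 81_721_669,
    "policy_details": 116_909_729,
}


def trigger_point(reltuples: int, scale_factor: float, base_threshold: int = 50) -> float:
    # Autovacuum acts once the number of changed rows (for ANALYZE) or dead
    # rows (for VACUUM) exceeds this value.
    return base_threshold + scale_factor * reltuples


if __name__ == "__main__":
    for table, rows in ROW_COUNTS.items():
        default_analyze = trigger_point(rows, 0.1)
        default_vacuum = trigger_point(rows, 0.2)
        tuned_analyze = trigger_point(rows, 0.01)  # hypothetical per-table setting
        print(
            f"{table}: analyze after ~{default_analyze:,.0f} changed rows (default), "
            f"~{tuned_analyze:,.0f} with scale factor 0.01; "
            f"vacuum after ~{default_vacuum:,.0f} dead rows (default)"
        )
```

With the defaults, coverage_details needs roughly 53 million changed rows before autovacuum re-analyzes it. A per-table override would be set with `ALTER TABLE ... SET (autovacuum_analyze_scale_factor = ...)`.
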
> Also, I am trying to understand whether partitioning is required for
> these tables, or at least for the one with over 500 million records.
>
> Regards
> Siraj
>
> On Mon, Jan 20, 2025 at 3:04 PM Guillaume Lelarge <guillaume@lelarge.info>
> wrote:
>
>> Hi,
>>
>> On Mon, Jan 20, 2025 at 09:42 Siraj G <tosiraj.g@gmail.com> wrote:
>>
>>> Hello Experts!
>>>
>>> We had a performance issue with an SQL query that used to complete in a
>>> few milliseconds but was taking over 14 seconds. We had to run *ANALYZE*
>>> on 3 tables to get the expected performance back.
>>>
>>> While performance was degraded, we saw sequential scans even though the
>>> tables had indexes.
>>>
>>> The tables and their row counts:
>>> coverage_details = 529628595
>>> customer_details = 81721669
>>> policy_details = 116909729
>>>
>>> PostgreSQL version:
>>> PostgreSQL 15.7 on x86_64-pc-linux-gnu, compiled by Debian clang version
>>> 12.0.1, 64-bit
>>>
>>> One more piece of information: we noticed this started happening (in the
>>> destination) after an ETL job completed its regular load. *We wanted to
>>> know whether there are any follow-up actions we should take after such
>>> data loads, e.g., ANALYZE or VACUUM.* We do have autovacuum on, with
>>> default values.
>>>
>>>
>> Yes, you should run "VACUUM ANALYZE" after running a batch. Autovacuum
>> may not be fast enough to do it on its own before you start querying the
>> new data.
>>
>>
>> --
>> Guillaume.
>>
>

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


--000000000000ec9a7f062c243a21--