MIME-Version: 1.0
References: <87bjjv1v96.fsf@gmail.com>
In-Reply-To: <87bjjv1v96.fsf@gmail.com>
From: Ron Johnson <ronljohnsonjr@gmail.com>
Date: Thu, 18 Dec 2025 14:32:07 -0500
Message-ID: <CANzqJaAXhYwR6vHzm_DqnxmWk8QCYxEmhs_JeZ+w3F+rx09xSw@mail.gmail.com>
Subject: Re: Dealing with SeqScans when Time-based Partitions Cut Over
To: pgsql-general <pgsql-general@postgresql.org>
Content-Type: multipart/alternative; boundary="000000000000dacbc606463f0438"
Archived-At: <https://www.postgresql.org/message-id/CANzqJaAXhYwR6vHzm_DqnxmWk8QCYxEmhs_JeZ%2Bw3F%2Brx09xSw%40mail.gmail.com>
Precedence: bulk

--000000000000dacbc606463f0438
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Thu, Dec 18, 2025 at 1:48 PM Matthew Planchard <msplanchard@gmail.com>
wrote:

>
> In a table with high insert frequency (~1.5k rows/s) and high query
> frequency (~1k queries/s), partitioned by record creation time, we have
> observed the following behavior:
>
> * When the current time crosses a partition boundary, all new records
>   are written to the new partition, which was previously empty, as
>   expected
>
> * Because the planner's latest knowledge of the partition was based on
>   its state prior to the cutover, it assumes the partition is empty and
>   creates plans that use sequential scans
>
> * The table accumulates tens to hundreds of thousands of rows, and the
>   sequential scans start to use nearly 100% of available database CPU
>
> * Eventually the planner updates the stats and all is well, but the
>   cycle repeats the next time the partitions cut over.
>
> We have tried setting up a cron job that runs ANALYZE on the most recent
> partition of the table every 15 seconds at the start of the hour, and
> while this does help in reducing the magnitude and duration of the
> problem, it is insufficient to fully resolve it (our engineers are still
> getting daily pages for high DB CPU utilization).
>

What's autovacuum_analyze_scale_factor set to?  The default of 10% (0.1) is
pretty high.
autovacuum_naptime might need to be dropped, too.
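
To put rough numbers on that: per the documentation, autovacuum queues an
ANALYZE once the rows changed since the last ANALYZE exceed
autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * reltuples.
Below is a minimal Python sketch of that rule, assuming the stock defaults
(threshold 50, scale factor 0.1). The function name is mine, not anything in
PostgreSQL.

```python
def autoanalyze_trigger_rows(reltuples: float,
                             analyze_threshold: int = 50,
                             analyze_scale_factor: float = 0.1) -> float:
    """Rows that must change since the last ANALYZE before autovacuum queues one.

    Defaults mirror the stock autovacuum_analyze_threshold and
    autovacuum_analyze_scale_factor; per-table storage parameters override them.
    """
    # pg_class.reltuples is -1 for a table that has never been analyzed;
    # this sketch treats that as an estimate of zero rows.
    estimated_rows = max(reltuples, 0.0)
    return analyze_threshold + analyze_scale_factor * estimated_rows


if __name__ == "__main__":
    # Compare a brand-new partition with partitions the planner believes are large.
    for rows in (0, 100_000, 5_000_000):
        print(rows, autoanalyze_trigger_rows(rows))
```

If this reading is right, a brand-new partition only needs a few dozen changed
rows to qualify, so how often the autovacuum launcher wakes up
(autovacuum_naptime) probably matters more right after the cutover, while the
scale factor decides how stale the stats get as the partition fills.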

And maybe have the shell script that the cron job runs sleep only 5 seconds
in the ANALYZE loop.
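
Something along these lines, written in Python rather than shell purely for
illustration. It assumes psql is on the PATH and that connection details come
from the usual PG* environment variables. The function name and its arguments
are hypothetical, and the partition name has to come from a trusted source
because it is interpolated straight into the SQL.

```python
import subprocess
import time


def analyze_loop(partition, iterations, interval_s=5.0,
                 run=subprocess.run, sleep=time.sleep):
    """Run ANALYZE on `partition` `iterations` times, pausing between runs.

    `run` and `sleep` are injectable so the loop can be exercised without a
    database; by default they shell out to psql and really sleep.
    """
    # --no-psqlrc keeps a personal ~/.psqlrc from altering the behavior.
    command = ["psql", "--no-psqlrc", "--command", f"ANALYZE {partition};"]
    for i in range(iterations):
        # check=True raises CalledProcessError if psql exits non-zero.
        run(command, check=True)
        if i < iterations - 1:
            sleep(interval_s)
    return command
```

Calling analyze_loop("<table_partition>", iterations=60) from the cron job at
the top of the hour would cover roughly the first five minutes after the
cutover.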


> We have considered maintaining a separate connection pool with
> connections that have `enable_seqscan` set to `off`, and updating the
> application to use that pool for these queries, but I was hoping the
> community might have some better suggestions.
>

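How about just forcing seqscan off for those queries?  enable_seqscan is a
planner setting rather than a table storage parameter, so it can't be set on
the partition with ALTER TABLE ... SET, but it can be set per transaction:
SET LOCAL enable_seqscan = off;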

--=20
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


--000000000000dacbc606463f0438--