public inbox for [email protected]  
help / color / mirror / Atom feed
Re: Dealing with SeqScans when Time-based Partitions Cut Over
3+ messages / 3 participants
[nested] [flat]

* Re: Dealing with SeqScans when Time-based Partitions Cut Over
@ 2025-12-18 19:32 Ron Johnson <[email protected]>
  2025-12-18 19:55 ` Re: Dealing with SeqScans when Time-based Partitions Cut Over Matthew Planchard <[email protected]>
  0 siblings, 1 reply; 3+ messages in thread

From: Ron Johnson @ 2025-12-18 19:32 UTC (permalink / raw)
  To: pgsql-general

On Thu, Dec 18, 2025 at 1:48 PM Matthew Planchard <[email protected]>
wrote:

>
> In a table with high insert frequency (~1.5k rows/s) and high query
> frequency (~1k queries/s), partitioned by record creation time, we have
> observed the following behavior:
>
> * When the current time crosses a partition boundary, all new records
>   are written to the new partition, which was previously empty, as
>   expected
>
> * Because the planner's latest knowledge of the partition was based on
>   its state prior to the cutover, it assumes the partition is empty and
>   creates plans that use sequential scans
>
> * The table accumulates tens to hundreds of thousands of rows, and the
>   sequentail scans start to use nearly 100% of available database CPU
>
> * Eventually the planner updates thee stats and all is well, but the
>   cycle repeats the next time the partitions cut over.
>
> We have tried setting up a cron job that runs ANALYZE on the most recent
> partition of the table every 15 seconds at the start of the hour, and
> while this does help in reducing the magnitude and duration of the
> problem, it is insufficient to fully resolve it (our engineers are still
> getting daily pages for high DB CPU utilization).
>

What's autovacuum_analyze_scale_factor set to?   The default 20% is pretty
high.
autovacuum_naptime might need to be dropped, too.

And maybe have the shell script that the cron job runs sleep only 5 seconds
in the ANALY loop.


> We have considered maintaining a separate connection pool with
> connections that have `enable_seqscan` set to `off`, and updating the
> application to use that pool for these queries, but I was hoping the
> community might have some better suggestions.
>

How about just force seqscan off when the table is created?
ALTER TABLE <table_partition> SET (enable_seqscan  = off);

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


^ permalink  raw  reply  [nested|flat] 3+ messages in thread

* Re: Dealing with SeqScans when Time-based Partitions Cut Over
  2025-12-18 19:32 Re: Dealing with SeqScans when Time-based Partitions Cut Over Ron Johnson <[email protected]>
@ 2025-12-18 19:55 ` Matthew Planchard <[email protected]>
  2025-12-18 20:21   ` Re: Dealing with SeqScans when Time-based Partitions Cut Over David Rowley <[email protected]>
  0 siblings, 1 reply; 3+ messages in thread

From: Matthew Planchard @ 2025-12-18 19:55 UTC (permalink / raw)
  To: Ron Johnson <[email protected]>; +Cc: pgsql-general

Thanks very much for the reply and the suggestions, a lot of good
stuff to look at.

> What's autovacuum_analyze_scale_factor set to?   The default 20% is pretty high.
> autovacuum_naptime might need to be dropped, too.

This is just whatever the default is, so I will look into updating
these settings.

> And maybe have the shell script that the cron job runs sleep only 5 seconds in the ANALY loop.

Good suggestion. Our job runner will not run duplicate jobs if one is
still running, so I may even just have this run every second.

> How about just force seqscan off when the table is created?
> ALTER TABLE <table_partition> SET (enable_seqscan  = off);

I didn't know this could be set on the table level! Our partitions are
created by partman, but we have a nightly job that does maintenance
stuff that could go through and update this setting on future partitions
before they come into use. I'm always hesitant to mess
too much with the planner, but this might be the ticket.

On Thu, Dec 18, 2025 at 2:32 PM Ron Johnson <[email protected]> wrote:
>
> On Thu, Dec 18, 2025 at 1:48 PM Matthew Planchard <[email protected]> wrote:
>>
>>
>> In a table with high insert frequency (~1.5k rows/s) and high query
>> frequency (~1k queries/s), partitioned by record creation time, we have
>> observed the following behavior:
>>
>> * When the current time crosses a partition boundary, all new records
>>   are written to the new partition, which was previously empty, as
>>   expected
>>
>> * Because the planner's latest knowledge of the partition was based on
>>   its state prior to the cutover, it assumes the partition is empty and
>>   creates plans that use sequential scans
>>
>> * The table accumulates tens to hundreds of thousands of rows, and the
>>   sequentail scans start to use nearly 100% of available database CPU
>>
>> * Eventually the planner updates thee stats and all is well, but the
>>   cycle repeats the next time the partitions cut over.
>>
>> We have tried setting up a cron job that runs ANALYZE on the most recent
>> partition of the table every 15 seconds at the start of the hour, and
>> while this does help in reducing the magnitude and duration of the
>> problem, it is insufficient to fully resolve it (our engineers are still
>> getting daily pages for high DB CPU utilization).
>
>
> What's autovacuum_analyze_scale_factor set to?   The default 20% is pretty high.
> autovacuum_naptime might need to be dropped, too.
>
> And maybe have the shell script that the cron job runs sleep only 5 seconds in the ANALY loop.
>
>>
>> We have considered maintaining a separate connection pool with
>> connections that have `enable_seqscan` set to `off`, and updating the
>> application to use that pool for these queries, but I was hoping the
>> community might have some better suggestions.
>
>
> How about just force seqscan off when the table is created?
> ALTER TABLE <table_partition> SET (enable_seqscan  = off);
>
> --
> Death to <Redacted>, and butter sauce.
> Don't boil me, I'm still alive.
> <Redacted> lobster!






^ permalink  raw  reply  [nested|flat] 3+ messages in thread

* Re: Dealing with SeqScans when Time-based Partitions Cut Over
  2025-12-18 19:32 Re: Dealing with SeqScans when Time-based Partitions Cut Over Ron Johnson <[email protected]>
  2025-12-18 19:55 ` Re: Dealing with SeqScans when Time-based Partitions Cut Over Matthew Planchard <[email protected]>
@ 2025-12-18 20:21   ` David Rowley <[email protected]>
  0 siblings, 0 replies; 3+ messages in thread

From: David Rowley @ 2025-12-18 20:21 UTC (permalink / raw)
  To: Matthew Planchard <[email protected]>; +Cc: Ron Johnson <[email protected]>; pgsql-general

On Fri, 19 Dec 2025 at 08:55, Matthew Planchard <[email protected]> wrote:
> > How about just force seqscan off when the table is created?
> > ALTER TABLE <table_partition> SET (enable_seqscan  = off);
>
> I didn't know this could be set on the table level! Our partitions are
> created by partman, but we have a nightly job that does maintenance
> stuff that could go through and update this setting on future partitions
> before they come into use. I'm always hesitant to mess
> too much with the planner, but this might be the ticket.

You thought correctly. There's no such reloption.

David






^ permalink  raw  reply  [nested|flat] 3+ messages in thread


end of thread, other threads:[~2025-12-18 20:21 UTC | newest]

Thread overview: 3+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-12-18 19:32 Re: Dealing with SeqScans when Time-based Partitions Cut Over Ron Johnson <[email protected]>
2025-12-18 19:55 ` Matthew Planchard <[email protected]>
2025-12-18 20:21   ` David Rowley <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox