public inbox for [email protected]  
help / color / mirror / Atom feed
Dealing with SeqScans when Time-based Partitions Cut Over
4+ messages / 4 participants
[nested] [flat]

* Dealing with SeqScans when Time-based Partitions Cut Over
@ 2025-12-18 18:48  Matthew Planchard <[email protected]>
  0 siblings, 3 replies; 4+ messages in thread

From: Matthew Planchard @ 2025-12-18 18:48 UTC (permalink / raw)
  To: pgsql-general


In a table with high insert frequency (~1.5k rows/s) and high query
frequency (~1k queries/s), partitioned by record creation time, we have
observed the following behavior:

* When the current time crosses a partition boundary, all new records
  are written to the new partition, which was previously empty, as
  expected

* Because the planner's latest knowledge of the partition was based on
  its state prior to the cutover, it assumes the partition is empty and
  creates plans that use sequential scans

* The table accumulates tens to hundreds of thousands of rows, and the
  sequentail scans start to use nearly 100% of available database CPU

* Eventually the planner updates thee stats and all is well, but the
  cycle repeats the next time the partitions cut over.

We have tried setting up a cron job that runs ANALYZE on the most recent
partition of the table every 15 seconds at the start of the hour, and
while this does help in reducing the magnitude and duration of the
problem, it is insufficient to fully resolve it (our engineers are still
getting daily pages for high DB CPU utilization).

We have considered maintaining a separate connection pool with
connections that have `enable_seqscan` set to `off`, and updating the
application to use that pool for these queries, but I was hoping the
community might have some better suggestions.

- Matthew Planchard






^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: Dealing with SeqScans when Time-based Partitions Cut Over
@ 2025-12-18 20:20  David Rowley <[email protected]>
  parent: Matthew Planchard <[email protected]>
  2 siblings, 0 replies; 4+ messages in thread

From: David Rowley @ 2025-12-18 20:20 UTC (permalink / raw)
  To: Matthew Planchard <[email protected]>; +Cc: pgsql-general

On Fri, 19 Dec 2025 at 07:48, Matthew Planchard <[email protected]> wrote:
> * Because the planner's latest knowledge of the partition was based on
>   its state prior to the cutover, it assumes the partition is empty and
>   creates plans that use sequential scans

If the table/partition was just created and not analyzed, the planner
shouldn't consider it empty. By default, an empty table is assumed to
have 10 pages. (Check what pg_class.relpages is set to. -1 means it's
not been analyzed, 0 would mean analyzed/vacuumed when empty)

create table a (a int primary key, b int not null);
create index on a(b);
explain select * from a where a = 1;

 Index Scan using a_pkey on a  (cost=0.15..8.17 rows=1 width=8)
   Index Cond: (a = 1)

explain select * from a where b = 1;

 Bitmap Heap Scan on a  (cost=4.24..14.91 rows=11 width=8)
   Recheck Cond: (b = 1)
   ->  Bitmap Index Scan on a_b_idx  (cost=0.00..4.24 rows=11 width=0)
         Index Cond: (b = 1)

Perhaps pg_partman thinks running an ANALYZE or VACUUM on the new
partitions is a good idea? Otherwise, if you're not doing this
yourself, I don't know why you're getting a Seq Scan. It might help if
you showed the problem query and schema.

David






^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: Dealing with SeqScans when Time-based Partitions Cut Over
@ 2025-12-19 07:47  Laurenz Albe <[email protected]>
  parent: Matthew Planchard <[email protected]>
  2 siblings, 0 replies; 4+ messages in thread

From: Laurenz Albe @ 2025-12-19 07:47 UTC (permalink / raw)
  To: Matthew Planchard <[email protected]>; pgsql-general

On Thu, 2025-12-18 at 13:48 -0500, Matthew Planchard wrote:
> In a table with high insert frequency (~1.5k rows/s) and high query
> frequency (~1k queries/s), partitioned by record creation time, we have
> observed the following behavior:
> 
> * When the current time crosses a partition boundary, all new records
>   are written to the new partition, which was previously empty, as
>   expected
> 
> * Because the planner's latest knowledge of the partition was based on
>   its state prior to the cutover, it assumes the partition is empty and
>   creates plans that use sequential scans
> 
> * The table accumulates tens to hundreds of thousands of rows, and the
>   sequentail scans start to use nearly 100% of available database CPU
> 
> * Eventually the planner updates thee stats and all is well, but the
>   cycle repeats the next time the partitions cut over.
> 
> We have tried setting up a cron job that runs ANALYZE on the most recent
> partition of the table every 15 seconds at the start of the hour, and
> while this does help in reducing the magnitude and duration of the
> problem, it is insufficient to fully resolve it (our engineers are still
> getting daily pages for high DB CPU utilization).
> 
> We have considered maintaining a separate connection pool with
> connections that have `enable_seqscan` set to `off`, and updating the
> application to use that pool for these queries, but I was hoping the
> community might have some better suggestions.

I would try to tune autovacuum to check more often:

  autovacuum_naptime = 5s  # perhaps even less

Then hopefully the new partitions get analyzed early enough.

Yours,
Laurenz Albe






^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: Dealing with SeqScans when Time-based Partitions Cut Over
@ 2025-12-19 14:49  Greg Sabino Mullane <[email protected]>
  parent: Matthew Planchard <[email protected]>
  2 siblings, 0 replies; 4+ messages in thread

From: Greg Sabino Mullane @ 2025-12-19 14:49 UTC (permalink / raw)
  To: Matthew Planchard <[email protected]>; +Cc: pgsql-general

If this is still an issue after you check David's theory about premature
analyzing, another approach is to pre-populate and pre-analyze future
tables. Something like this:

* disable autovac on the future table
* detach the table from the main partition
* insert a few hundred thousand rows into it, then run analyze on it
* can pull rows from a current table, or just use random data on a key
column- whatever is enough to generate "good" stats
* delete the rows - the stats will remain
* reattach the table
* enable autovac if you like; I would not


Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support


^ permalink  raw  reply  [nested|flat] 4+ messages in thread


end of thread, other threads:[~2025-12-19 14:49 UTC | newest]

Thread overview: 4+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-12-18 18:48 Dealing with SeqScans when Time-based Partitions Cut Over Matthew Planchard <[email protected]>
2025-12-18 20:20 ` David Rowley <[email protected]>
2025-12-19 07:47 ` Laurenz Albe <[email protected]>
2025-12-19 14:49 ` Greg Sabino Mullane <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox