From: Andres Freund <[email protected]>
To: [email protected]
To: [email protected]
Subject: Re: BUG #19400: Memory leak in checkpointer and startup processes on PostgreSQL 18
Date: Tue, 10 Feb 2026 14:58:33 -0500
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
Hi,
On 2026-02-10 15:28:38 +0000, PG Bug reporting form wrote:
> I recently migrated my cluster with 3 dedicated servers to a new cluster. I
> was running on PG12 and I am now on PG18.1.
> I noticed an increasing memory usage on all of my 3 node, until at some
> point there is no memory left and patroni crashes on the leader, leaving the
> cluster with no available primary.
> The cluster is a Data Warehouse type using TimescaleDB, ingesting approx. 1M
> of time-serie a day.
> It appears that the memory leak is affecting both the checkpointer and
> startup (WAL replay) processes in PostgreSQL 18.0 and 18.1.
> I never had such issue on the old cluster with PG12 and the server's
> configuration and cluster usage are the same (except the upgrade of PG)
>
> SYMPTOMS:
> - Checkpointer process grows to 5.6GB RSS after 24 hours
> - Startup process on replicas grows to 3.9GB RSS
> - Memory growth rate: approximately 160-200MB per hour
> - Eventually causes out-of-memory conditions
>
> CONFIGURATION:
> - PostgreSQL version: Initially 18.0, upgraded to 18.1 - same issue persists
> - Platform: Debian 13
> - TimescaleDB: 2.23.0
> - Deployment: 3-node Patroni cluster with streaming replication
> - WAL level: logical
> - Hot standby enabled
>
> SYSTEM RESOURCES:
> RAM: 32GB
> Proc: 12 core of Intel(R) Xeon(R) E-2386G 3.50GHz
>
> KEY SETTINGS:
> - wal_level: logical
> - hot_standby: on
> - max_wal_senders: 20
> - max_replication_slots: 20
> - wal_keep_size: 1GB
> - shared_buffer: 8GB
>
> WAL STATISTICS (over 7 days):
> - Total WAL generated: 2.3TB (approximately 31GB/day)
> - Replication lag: 0 bytes (replicas are caught up)
> - No long-running transactions
>
> MEMORY STATE AFTER 24 HOURS:
> On primary:
> postgres checkpointer: 3.9GB RSS
>
> On replicas:
> postgres checkpointer: 5.6GB RSS
> postgres startup recovering: 3.9GB RSS <-- This is abnormal
The RSS slowly increasing towards shared_buffers is normal if you're not using
huge_pages. The OS only counts pages in shared memory as part of RSS once a
page has been used in the process. Over time the checkpointer process touches
more and more of shared_buffers, thus increasing the RSS.
You can use "pmap -d -p $pid_of_process" to see how much of the RSS is
actually shared memory.
To show this, here's the ps output for a new backend:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
andres 2544694 0.0 0.0 8719956 25744 ? Ss 14:55 0:00 postgres: dev assert: andres postgres [local] idle
and then after reading in a 1.3GB relation:
andres 2544694 1.7 2.2 8720972 1403576 ? Ss 14:55 0:00 postgres: dev assert: andres postgres [local] idle
So you can see that RSS increased proportionally with the amount of touched
data.
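As a rough check of the two ps lines above: ps reports RSS in KiB, so the difference can be compared directly with the size of the relation that was read. The Python below only illustrates that arithmetic; the variable names are made up for this sketch.

```python
# RSS values (KiB) copied from the two ps lines above.
rss_before_kib = 25744
rss_after_kib = 1403576

# Growth in resident memory after reading the relation.
delta_kib = rss_after_kib - rss_before_kib
delta_gib = delta_kib / (1024 * 1024)

print(f"RSS grew by {delta_kib} KiB (~{delta_gib:.2f} GiB)")
```

The growth comes out at roughly 1.3 GiB, which lines up with the size of the relation.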
Whereas with pmap:
pmap -d -p 2544694|tail -n 1
mapped: 8721924K writeable/private: 5196K shared: 8646284K
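If you want to track this over time, the pmap summary line can be parsed mechanically. A minimal sketch in Python, assuming the summary keeps the "mapped / writeable/private / shared" layout shown above; the function name and dictionary keys are hypothetical:

```python
import re


def parse_pmap_summary(line: str) -> dict[str, int]:
    """Return the KiB totals from the last line of `pmap -d` output."""
    match = re.search(
        r"mapped:\s*(\d+)K\s+writeable/private:\s*(\d+)K\s+shared:\s*(\d+)K",
        line,
    )
    if match is None:
        raise ValueError(f"not a pmap -d summary line: {line!r}")
    mapped, private, shared = (int(g) for g in match.groups())
    return {"mapped_kib": mapped, "private_kib": private, "shared_kib": shared}


# The summary line shown above for pid 2544694.
summary = parse_pmap_summary(
    "mapped: 8721924K writeable/private: 5196K shared: 8646284K"
)
print(summary)
```

The point of the comparison: the writeable/private total is only about 5MB, while ps showed about 1.4GB of RSS for the same process, so nearly all of that RSS has to be touched shared memory.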
I think you would need to monitor the real memory usage of various processes
to know why you're OOMing.
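One way to do that on Linux, sketched in Python: /proc/<pid>/status splits RSS into RssAnon, RssFile and RssShmem on reasonably recent kernels (an assumption about your kernel; I have not checked what Debian 13 ships). Without huge pages, touched shared_buffers pages should show up mostly under RssShmem, while a real leak would show up as growing RssAnon. The function name and the sample text are hypothetical, not taken from the report:

```python
def rss_breakdown(status_text: str) -> dict[str, int]:
    """Extract VmRSS/RssAnon/RssFile/RssShmem (in kB) from /proc/<pid>/status text."""
    wanted = {"VmRSS", "RssAnon", "RssFile", "RssShmem"}
    result = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            # Lines look like "RssAnon:\t    5196 kB".
            result[key] = int(rest.split()[0])
    return result


# Made-up sample for illustration only.
sample = """Name:\tpostgres
VmRSS:\t 1403576 kB
RssAnon:\t    5196 kB
RssFile:\t   12000 kB
RssShmem:\t 1386380 kB
"""
breakdown = rss_breakdown(sample)
print(breakdown)

# On a live system, with pid set to the process of interest:
# with open(f"/proc/{pid}/status") as f:
#     print(rss_breakdown(f.read()))
```

Sampling RssAnon for the checkpointer and startup processes every few minutes would show whether private memory is actually growing.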
You can use pg_log_backend_memory_contexts() to get the memory usage
information of backend processes.
Greetings,
Andres Freund