Re: BUG #19400: Memory leak in checkpointer and startup processes on PostgreSQL 18

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Raphaël Perissat <[email protected]>
To: Andres Freund <[email protected]>
Cc: [email protected]
Subject: Re: BUG #19400: Memory leak in checkpointer and startup processes on PostgreSQL 18
Date: Wed, 11 Feb 2026 10:56:53 +0100
Message-ID: <CAOLPA2ds60bcR7ZnL1uwdottW+2DS+Hx=3Gy28ex2E9-+UXQAg@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>

Hi and thanks for the explanation.

Indeed the pmap seems to confirm that the private memory usage is correct
# pmap -d -p $(pgrep -f "postgres.*checkpointer") | tail -n 1
mapped: 8647256K    writeable/private: 2632K    shared: 8574584K

I used pg_log_backend_memory_contexts() with the pid of the checkpointer
process and I can't see much on this side as well :
2026-02-11 08:58:13.794 UTC [1988622] LOG:  logging memory contexts of PID
1988622
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 1; TopMemoryContext:
61568 total in 3 blocks; 2768 free (0 chunks); 58800 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 2; smgr relation table:
32768 total in 3 blocks; 16904 free (9 chunks); 15864 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 2; Checkpointer: 24576
total in 2 blocks; 24296 free (13 chunks); 280 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 2; LOCALLOCK hash: 8192
total in 1 blocks; 616 free (0 chunks); 7576 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 2; WAL record
construction: 50200 total in 2 blocks; 6400 free (0 chunks); 43800 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 2; PrivateRefCount: 8192
total in 1 blocks; 2672 free (0 chunks); 5520 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 2; MdSmgr: 8192 total in
1 blocks; 7952 free (62 chunks); 240 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 2; Pending ops context:
8192 total in 1 blocks; 7952 free (5 chunks); 240 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 3; Pending Ops Table:
16384 total in 2 blocks; 6712 free (3 chunks); 9672 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 2; Rendezvous variable
hash: 8192 total in 1 blocks; 616 free (0 chunks); 7576 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 2; GUCMemoryContext:
32768 total in 3 blocks; 3264 free (19 chunks); 29504 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 3; GUC hash table: 32768
total in 3 blocks; 10664 free (6 chunks); 22104 used
2026-02-11 08:58:13.795 UTC [1988622] LOG:  level: 2; Timezones: 104112
total in 2 blocks; 2672 free (0 chunks); 101440 used
2026-02-11 08:58:13.795 UTC [1988622] LOG:  level: 2; ErrorContext: 8192
total in 1 blocks; 7952 free (5 chunks); 240 used
2026-02-11 08:58:13.795 UTC [1988622] LOG:  Grand total: 404296 bytes in 26
blocks; 101440 free (122 chunks); 302856 used

I monitor my servers metric on ELK (grafana-like) and I can clearly see the
memory usage growing with approx 100MB/hour until it reaches 8GB for the
postgres process, causing patroni to crashout on the primary.
The fact that this memory "leak" is appearing on both the primary and the 2
replicas make me think that this is not caused by some ingest delay / index
creation but maybe I'm wrong.

I restarted the 3 nodes yesterday and here is the output of the watch
command on the primary :
total        used        free      shared  buff/cache   available
Mem:           31988        5861         670        4571       30530
26126
Swap:           4095          55        4040

free is only showing 670 already, and I can see it going down in real-time
like a countdown.

Based on this output on the replicas :
:~# ps aux --sort=-%mem | head -20
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
postgres   62338  0.1 10.6 8652016 3492940 ?     Ss   Feb10   1:04
postgres: atmo_data: checkpointer
postgres   62340  0.2  9.8 8655736 3227356 ?     Ss   Feb10   2:36
postgres: atmo_data: startup recovering 0000001E00000246000000C9

Couldn't the startup recovery process be causing this ? I do this command
on the same replica time to time and I can clearly see that the %MEM used
by those 2 processes are growing over time.

Thanks for your help.


Le mar. 10 févr. 2026 à 20:58, Andres Freund <[email protected]> a écrit :

> Hi,
>
> On 2026-02-10 15:28:38 +0000, PG Bug reporting form wrote:
> > I recently migrated my cluster with 3 dedicated servers to a new
> cluster. I
> > was running on PG12 and I am now on PG18.1.
> > I noticed an increasing memory usage on all of my 3 node, until at some
> > point there is no memory left and patroni crashes on the leader, leaving
> the
> > cluster with no available primary.
> > The cluster is a Data Warehouse type using TimescaleDB, ingesting
> approx. 1M
> > of time-serie a day.
> > It appears that the memory leak is affecting both the checkpointer and
> > startup (WAL replay) processes in PostgreSQL 18.0 and 18.1.
> > I never had such issue on the old cluster with PG12 and the server's
> > configuration and cluster usage are the same (except the upgrade of PG)
> >
> > SYMPTOMS:
> > - Checkpointer process grows to 5.6GB RSS after 24 hours
> > - Startup process on replicas grows to 3.9GB RSS
> > - Memory growth rate: approximately 160-200MB per hour
> > - Eventually causes out-of-memory conditions
> >
> > CONFIGURATION:
> > - PostgreSQL version: Initially 18.0, upgraded to 18.1 - same issue
> persists
> > - Platform: Debian 13
> > - TimescaleDB: 2.23.0
> > - Deployment: 3-node Patroni cluster with streaming replication
> > - WAL level: logical
> > - Hot standby enabled
> >
> > SYSTEM RESOURCES:
> > RAM: 32GB
> > Proc: 12 core of Intel(R) Xeon(R) E-2386G 3.50GHz
> >
> > KEY SETTINGS:
> > - wal_level: logical
> > - hot_standby: on
> > - max_wal_senders: 20
> > - max_replication_slots: 20
> > - wal_keep_size: 1GB
> > - shared_buffer: 8GB
> >
> > WAL STATISTICS (over 7 days):
> > - Total WAL generated: 2.3TB (approximately 31GB/day)
> > - Replication lag: 0 bytes (replicas are caught up)
> > - No long-running transactions
> >
> > MEMORY STATE AFTER 24 HOURS:
> > On primary:
> >   postgres checkpointer: 3.9GB RSS
> >
> > On replicas:
> >   postgres checkpointer: 5.6GB RSS
> >   postgres startup recovering: 3.9GB RSS  <-- This is abnormal
>
> The RSS slowly increasing towards shared_buffers is normal if you're not
> using
> huge_pages. The OS only counts pages in shared memory as part of RSS once a
> page has been used in the process. Over time the checkpointer process
> touches
> more and more of shared_buffers, thus increasing the RSS.
>
> You can use "pmap -d -p $pid_of_process" to see how much of the RSS is
> actually shared memory.
>
> To show this, here's a PS for a new backend:
>
> ps:
> USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> andres   2544694  0.0  0.0 8719956 25744 ?       Ss   14:55   0:00
> postgres: dev assert: andres postgres [local] idle
>
> and then after reading in a relation 1.3GB relation:
>
> andres   2544694  1.7  2.2 8720972 1403576 ?     Ss   14:55   0:00
> postgres: dev assert: andres postgres [local] idle
>
> So you can see that RSS increased proportionally with the amount of touched
> data.
>
> Whereas with pmap:
>
> pmap -d -p 2544694|tail -n 1
> mapped: 8721924K    writeable/private: 5196K    shared: 8646284K
>
>
> I think you would need to monitor the real memory usage of various
> processes
> to know why you're OOMing.
>
> You can use pg_log_backend_memory_contexts() to get the memory usage
> information of backend processes.
>
> Greetings,
>
> Andres Freund
>

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: BUG #19400: Memory leak in checkpointer and startup processes on PostgreSQL 18
  In-Reply-To: <CAOLPA2ds60bcR7ZnL1uwdottW+2DS+Hx=3Gy28ex2E9-+UXQAg@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox