MIME-Version: 1.0
References: <19400-c889fc7fffc7c658@postgresql.org> <aYuLlzKTVLY9k1zB@alap3.anarazel.de>
In-Reply-To: <aYuLlzKTVLY9k1zB@alap3.anarazel.de>
From: =?UTF-8?Q?Rapha=C3=ABl_Perissat?= <raphael@atmotrack.fr>
Date: Wed, 11 Feb 2026 10:56:53 +0100
Message-ID: <CAOLPA2ds60bcR7ZnL1uwdottW+2DS+Hx=3Gy28ex2E9-+UXQAg@mail.gmail.com>
Subject: Re: BUG #19400: Memory leak in checkpointer and startup processes on
 PostgreSQL 18
To: Andres Freund <andres@anarazel.de>
Cc: pgsql-bugs@lists.postgresql.org
Content-Type: multipart/alternative; boundary="000000000000682abf064a896634"
Archived-At: <https://www.postgresql.org/message-id/CAOLPA2ds60bcR7ZnL1uwdottW%2B2DS%2BHx%3D3Gy28ex2E9-%2BUXQAg%40mail.gmail.com>
Precedence: bulk

--000000000000682abf064a896634
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hi, and thanks for the explanation.

Indeed, the pmap output seems to confirm that the private memory usage of
the checkpointer is small:
# pmap -d -p $(pgrep -f "postgres.*checkpointer") | tail -n 1
mapped: 8647256K    writeable/private: 2632K    shared: 8574584K
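
To track that summary line over time instead of reading it by eye, it could
be parsed with something like the sketch below. This is only a sketch: the
function name parse_pmap_summary is made up for illustration, and it assumes
the summary line keeps the "name: <number>K" layout shown above.

```python
import re


def parse_pmap_summary(line: str) -> dict[str, int]:
    """Parse the last line of `pmap -d` output into a {field: KiB} mapping."""
    # Each field looks like "writeable/private: 2632K".
    fields = re.findall(r"([\w/]+):\s+(\d+)K", line)
    return {name: int(kib) for name, kib in fields}


# The summary line reported for the checkpointer above.
summary = parse_pmap_summary(
    "mapped: 8647256K    writeable/private: 2632K    shared: 8574584K"
)
private_mib = summary["writeable/private"] / 1024
shared_gib = summary["shared"] / 1024 / 1024
print(f"private: {private_mib:.1f} MiB, shared: {shared_gib:.2f} GiB")
```

The private figure is the one that would have to grow for this to be a leak
in the process itself; the shared figure is bounded by the size of the shared
memory segment.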

I also ran pg_log_backend_memory_contexts() with the PID of the checkpointer
process, and I can't see much on that side either:
2026-02-11 08:58:13.794 UTC [1988622] LOG:  logging memory contexts of PID 1988622
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 1; TopMemoryContext: 61568 total in 3 blocks; 2768 free (0 chunks); 58800 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 2; smgr relation table: 32768 total in 3 blocks; 16904 free (9 chunks); 15864 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 2; Checkpointer: 24576 total in 2 blocks; 24296 free (13 chunks); 280 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 2; LOCALLOCK hash: 8192 total in 1 blocks; 616 free (0 chunks); 7576 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 2; WAL record construction: 50200 total in 2 blocks; 6400 free (0 chunks); 43800 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 2; PrivateRefCount: 8192 total in 1 blocks; 2672 free (0 chunks); 5520 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 2; MdSmgr: 8192 total in 1 blocks; 7952 free (62 chunks); 240 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 2; Pending ops context: 8192 total in 1 blocks; 7952 free (5 chunks); 240 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 3; Pending Ops Table: 16384 total in 2 blocks; 6712 free (3 chunks); 9672 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 2; Rendezvous variable hash: 8192 total in 1 blocks; 616 free (0 chunks); 7576 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 2; GUCMemoryContext: 32768 total in 3 blocks; 3264 free (19 chunks); 29504 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 3; GUC hash table: 32768 total in 3 blocks; 10664 free (6 chunks); 22104 used
2026-02-11 08:58:13.795 UTC [1988622] LOG:  level: 2; Timezones: 104112 total in 2 blocks; 2672 free (0 chunks); 101440 used
2026-02-11 08:58:13.795 UTC [1988622] LOG:  level: 2; ErrorContext: 8192 total in 1 blocks; 7952 free (5 chunks); 240 used
2026-02-11 08:58:13.795 UTC [1988622] LOG:  Grand total: 404296 bytes in 26 blocks; 101440 free (122 chunks); 302856 used
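
For comparing such dumps between runs, the per-context lines can be totalled
with a short script. The following is a rough sketch, not a general parser:
the regular expression only assumes the "level: N; name: X total in Y blocks;
Z free (C chunks); U used" layout seen in the log above, and the names
CONTEXT_RE and parse_contexts are made up for illustration. The sample input
is three entries copied from the log, left wrapped as the mail client wrapped
them.

```python
import re

# Matches one memory-context entry in the layout shown in the log above.
CONTEXT_RE = re.compile(
    r"level: (\d+); (.+?): (\d+) total in (\d+) blocks; "
    r"(\d+) free \((\d+) chunks\); (\d+) used"
)


def parse_contexts(log_text: str) -> list[dict]:
    """Return one dict per memory-context entry found in the log text."""
    # Collapse whitespace first so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    contexts = []
    for m in CONTEXT_RE.finditer(flat):
        level, name, total, blocks, free, chunks, used = m.groups()
        contexts.append({
            "level": int(level),
            "name": name,
            "total": int(total),
            "free": int(free),
            "used": int(used),
        })
    return contexts


SAMPLE_LOG = """\
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 1; TopMemoryContext:
61568 total in 3 blocks; 2768 free (0 chunks); 58800 used
2026-02-11 08:58:13.794 UTC [1988622] LOG:  level: 2; Checkpointer: 24576
total in 2 blocks; 24296 free (13 chunks); 280 used
2026-02-11 08:58:13.795 UTC [1988622] LOG:  level: 2; Timezones: 104112
total in 2 blocks; 2672 free (0 chunks); 101440 used
"""

contexts = parse_contexts(SAMPLE_LOG)
largest = max(contexts, key=lambda c: c["used"])
total_used = sum(c["used"] for c in contexts)
print(f"{len(contexts)} contexts, {total_used} bytes used, largest: {largest['name']}")
```

Run over the full dump, the per-context totals add up to the 404296 bytes
reported in the "Grand total" line, so nothing large is hidden in the
checkpointer's memory contexts.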

I monitor my servers' metrics with ELK (Grafana-like), and I can clearly see
memory usage growing by approximately 100MB/hour until it reaches 8GB for
the postgres process, causing Patroni to crash on the primary.
The fact that this memory "leak" appears on both the primary and the 2
replicas makes me think that it is not caused by some ingest delay or index
creation, but maybe I'm wrong.

I restarted the 3 nodes yesterday, and here is the output of the watch
command on the primary:
               total        used        free      shared  buff/cache   available
Mem:           31988        5861         670        4571       30530       26126
Swap:           4095          55        4040

The "free" column already shows only 670, and I can see it going down in
real time, like a countdown.

Based on this output on the replicas:
:~# ps aux --sort=-%mem | head -20
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
postgres   62338  0.1 10.6 8652016 3492940 ?     Ss   Feb10   1:04 postgres: atmo_data: checkpointer
postgres   62340  0.2  9.8 8655736 3227356 ?     Ss   Feb10   2:36 postgres: atmo_data: startup recovering 0000001E00000246000000C9

Couldn't the startup (recovery) process be causing this? I run this command
on the same replica from time to time, and I can clearly see that the %MEM
used by those 2 processes is growing over time.
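
Since %MEM in ps is derived from RSS, it includes any shared memory the
process has touched. One way to separate the two over time would be to read
the RssAnon, RssFile and RssShmem fields from /proc/<pid>/status, which
reasonably recent Linux kernels expose. A minimal sketch, with made-up helper
names and made-up sample numbers (they are not measurements from this
cluster); it assumes PostgreSQL's shared memory is counted under RssShmem
when huge pages are not in use:

```python
from pathlib import Path

RSS_FIELDS = ("VmRSS", "RssAnon", "RssFile", "RssShmem")


def parse_rss_breakdown(status_text: str) -> dict[str, int]:
    """Extract the RSS fields (in kB) from the text of /proc/<pid>/status."""
    breakdown = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in RSS_FIELDS:
            breakdown[key] = int(rest.split()[0])  # first token is the kB value
    return breakdown


def read_rss_breakdown(pid: int) -> dict[str, int]:
    """Read the breakdown for a live process (Linux only)."""
    return parse_rss_breakdown(Path(f"/proc/{pid}/status").read_text())


# Made-up numbers for illustration only, not measured on the cluster.
EXAMPLE_STATUS = """\
Name:\tpostgres
VmRSS:\t 1000000 kB
RssAnon:\t    4000 kB
RssFile:\t   16000 kB
RssShmem:\t  980000 kB
"""

example = parse_rss_breakdown(EXAMPLE_STATUS)
non_shared_kb = example["RssAnon"] + example["RssFile"]
print(f"shared: {example['RssShmem']} kB, non-shared: {non_shared_kb} kB")
```

Calling read_rss_breakdown() with the PID of the checkpointer or startup
process at regular intervals would show whether the growth is in RssShmem or
in RssAnon.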

Thanks for your help.


On Tue, 10 Feb 2026 at 20:58, Andres Freund <andres@anarazel.de> wrote:

> Hi,
>
> On 2026-02-10 15:28:38 +0000, PG Bug reporting form wrote:
> > I recently migrated my cluster with 3 dedicated servers to a new cluster. I
> > was running on PG12 and I am now on PG18.1.
> > I noticed an increasing memory usage on all of my 3 node, until at some
> > point there is no memory left and patroni crashes on the leader, leaving the
> > cluster with no available primary.
> > The cluster is a Data Warehouse type using TimescaleDB, ingesting approx. 1M
> > of time-serie a day.
> > It appears that the memory leak is affecting both the checkpointer and
> > startup (WAL replay) processes in PostgreSQL 18.0 and 18.1.
> > I never had such issue on the old cluster with PG12 and the server's
> > configuration and cluster usage are the same (except the upgrade of PG)
> >
> > SYMPTOMS:
> > - Checkpointer process grows to 5.6GB RSS after 24 hours
> > - Startup process on replicas grows to 3.9GB RSS
> > - Memory growth rate: approximately 160-200MB per hour
> > - Eventually causes out-of-memory conditions
> >
> > CONFIGURATION:
> > - PostgreSQL version: Initially 18.0, upgraded to 18.1 - same issue persists
> > - Platform: Debian 13
> > - TimescaleDB: 2.23.0
> > - Deployment: 3-node Patroni cluster with streaming replication
> > - WAL level: logical
> > - Hot standby enabled
> >
> > SYSTEM RESOURCES:
> > RAM: 32GB
> > Proc: 12 core of Intel(R) Xeon(R) E-2386G 3.50GHz
> >
> > KEY SETTINGS:
> > - wal_level: logical
> > - hot_standby: on
> > - max_wal_senders: 20
> > - max_replication_slots: 20
> > - wal_keep_size: 1GB
> > - shared_buffer: 8GB
> >
> > WAL STATISTICS (over 7 days):
> > - Total WAL generated: 2.3TB (approximately 31GB/day)
> > - Replication lag: 0 bytes (replicas are caught up)
> > - No long-running transactions
> >
> > MEMORY STATE AFTER 24 HOURS:
> > On primary:
> >   postgres checkpointer: 3.9GB RSS
> >
> > On replicas:
> >   postgres checkpointer: 5.6GB RSS
> >   postgres startup recovering: 3.9GB RSS  <-- This is abnormal
>
> The RSS slowly increasing towards shared_buffers is normal if you're not using
> huge_pages. The OS only counts pages in shared memory as part of RSS once a
> page has been used in the process. Over time the checkpointer process touches
> more and more of shared_buffers, thus increasing the RSS.
>
> You can use "pmap -d -p $pid_of_process" to see how much of the RSS is
> actually shared memory.
>
> To show this, here's a PS for a new backend:
>
> ps:
> USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> andres   2544694  0.0  0.0 8719956 25744 ?       Ss   14:55   0:00 postgres: dev assert: andres postgres [local] idle
>
> and then after reading in a 1.3GB relation:
>
> andres   2544694  1.7  2.2 8720972 1403576 ?     Ss   14:55   0:00 postgres: dev assert: andres postgres [local] idle
>
> So you can see that RSS increased proportionally with the amount of touched
> data.
>
> Whereas with pmap:
>
> pmap -d -p 2544694|tail -n 1
> mapped: 8721924K    writeable/private: 5196K    shared: 8646284K
>
>
> I think you would need to monitor the real memory usage of various processes
> to know why you're OOMing.
>
> You can use pg_log_backend_memory_contexts() to get the memory usage
> information of backend processes.
>
> Greetings,
>
> Andres Freund
>

--000000000000682abf064a896634--