Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vptIA-000kLM-2g for pgsql-bugs@arkaria.postgresql.org; Tue, 10 Feb 2026 19:21:00 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vptIA-000krx-0T for pgsql-bugs@arkaria.postgresql.org; Tue, 10 Feb 2026 19:20:58 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vppfp-00FvPj-1q for pgsql-bugs@lists.postgresql.org; Tue, 10 Feb 2026 15:29:09 +0000 Received: from mahout.postgresql.org ([2001:4800:3e1:1::227]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.98.2) (envelope-from ) id 1vppfm-000000021e5-13PK for pgsql-bugs@lists.postgresql.org; Tue, 10 Feb 2026 15:29:08 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=postgresql.org; s=20171124; h=Message-ID:Date:Reply-To:Cc:From:To:Subject: Content-Transfer-Encoding:MIME-Version:Content-Type:Sender:Content-ID: Content-Description:In-Reply-To:References; bh=QebztBY4h7o/phOJoScTTj+9ncG24zUnu4LLmmpD1uY=; b=vk/PX7/n6cSFa2l1FRuzmAKtw7 EPB25vIF75UjtP5emmiL8lOS+1w43W5FztMRZ8wT+5zmFqr1JdZiSwb/0QjGoaMp0VEmkzLTQODcN 0sTT/sjeF9HC9CaihPgZ6VHSs63Hbv1ggU92ZVK31xnVxWkAT7dx0CVSceGsmiEpDKhy50CB0Jj4N ch33/DiWkd+C5JlDL/dSWNj5dOzTyaQhhfgN4ai8d2kqiOcBEyHmL5JAqnh+JlnmBcUXVZl/eYxOh phL/7ZmvaK3LhsPg53tMm+HDNSqpGw/at/+cMu2zH3rt3RiTEJS3LpnzoVH+u7+93an/Li7mkdJ+r OTOufh/A==; Received: from wrigleys.postgresql.org ([2a02:16a8:dc51::60]) by mahout.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vppfk-005vMN-0X for pgsql-bugs@lists.postgresql.org; Tue, 10 Feb 2026 15:29:05 +0000 Received: from localhost ([127.0.0.1] helo=wrigleys.postgresql.org) by wrigleys.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vppfi-007czk-2m for pgsql-bugs@lists.postgresql.org; Tue, 10 Feb 2026 15:29:03 +0000 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Subject: BUG #19400: Memory leak in checkpointer and startup processes on PostgreSQL 18 To: pgsql-bugs@lists.postgresql.org From: PG Bug reporting form Cc: raphael@atmotrack.fr Reply-To: raphael@atmotrack.fr, pgsql-bugs@lists.postgresql.org Date: Tue, 10 Feb 2026 15:28:38 +0000 Message-ID: <19400-c889fc7fffc7c658@postgresql.org> X-Auto-Response-Suppress: All Auto-Submitted: auto-generated List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk The following bug has been logged on the website: Bug reference: 19400 Logged by: Raphael Email address: raphael@atmotrack.fr PostgreSQL version: 18.1 Operating system: Debian 13 Description: =20 Hello, I recently migrated my cluster with 3 dedicated servers to a new cluster. I was running on PG12 and I am now on PG18.1. I noticed an increasing memory usage on all of my 3 node, until at some point there is no memory left and patroni crashes on the leader, leaving the cluster with no available primary. The cluster is a Data Warehouse type using TimescaleDB, ingesting approx. 1M of time-serie a day. It appears that the memory leak is affecting both the checkpointer and startup (WAL replay) processes in PostgreSQL 18.0 and 18.1. I never had such issue on the old cluster with PG12 and the server's configuration and cluster usage are the same (except the upgrade of PG) SYMPTOMS: - Checkpointer process grows to 5.6GB RSS after 24 hours - Startup process on replicas grows to 3.9GB RSS - Memory growth rate: approximately 160-200MB per hour - Eventually causes out-of-memory conditions CONFIGURATION: - PostgreSQL version: Initially 18.0, upgraded to 18.1 - same issue persists - Platform: Debian 13 - TimescaleDB: 2.23.0 - Deployment: 3-node Patroni cluster with streaming replication - WAL level: logical - Hot standby enabled SYSTEM RESOURCES: RAM: 32GB Proc: 12 core of Intel(R) Xeon(R) E-2386G 3.50GHz KEY SETTINGS: - wal_level: logical - hot_standby: on - max_wal_senders: 20 - max_replication_slots: 20 - wal_keep_size: 1GB - shared_buffer: 8GB WAL STATISTICS (over 7 days): - Total WAL generated: 2.3TB (approximately 31GB/day) - Replication lag: 0 bytes (replicas are caught up) - No long-running transactions MEMORY STATE AFTER 24 HOURS: On primary: postgres checkpointer: 3.9GB RSS On replicas: postgres checkpointer: 5.6GB RSS postgres startup recovering: 3.9GB RSS <-- This is abnormal TESTING PERFORMED: - Disabled wal_log_hints: No effect - Reduced timescaledb.max_background_workers from 16 to 4: No effect - Increased checkpoint_timeout to 30min: No effect - Increased max_wal_size to 4GB: No effect - Checkpoint frequency is normal (5.99% requested vs timed) - Upgraded from PostgreSQL 18.0 to 18.1: Same issue persists The startup process consuming 3.9GB on replicas is particularly concerning, as this process should not accumulate memory during normal WAL replay. Here some output 4 days after starting Postgres, on the primary : # ps aux --sort=3D-%mem | head -20 USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND postgres 38762 0.0 24.9 8648868 8182260 ? Ss Feb06 2:42 postgres: atmo_data: checkpointer postgres 38763 0.0 8.3 8646992 2746188 ? Ss Feb06 1:08 postgres: atmo_data: background writer postgres 38759 0.0 4.5 8646728 1501344 ? Ss Feb06 0:24 postgres: atmo_data: io worker 0 postgres 38760 0.0 2.9 8646728 976132 ? Ss Feb06 0:13 postgres: atmo_data: io worker 1 postgres 38761 0.0 2.3 8646728 755948 ? Ss Feb06 0:07 postgres: atmo_data: io worker 2 The same output on a replica : # ps aux --sort=3D-%mem | head -20 USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND postgres 2706473 0.0 25.2 8653612 8291200 ? Ss Feb06 5:09 postgres: atmo_data: checkpointer postgres 2706475 0.2 23.8 8656016 7802324 ? Ss Feb06 14:41 postgres: atmo_data: startup recovering 0000001E0000024500000075 Replica : =3D# SELECT name, pg_size_pretty(total_bytes) as total, pg_size_pretty(used_bytes) as used, pg_size_pretty(free_bytes) as free FROM pg_backend_memory_contexts ORDER BY total_bytes DESC LIMIT 30; name | total | used | free ----------------------------------+------------+------------+------------ CacheMemoryContext | 512 kB | 483 kB | 29 kB TopMemoryContext | 214 kB | 192 kB | 23 kB MessageContext | 128 kB | 39 kB | 89 kB Timezones | 102 kB | 99 kB | 2672 bytes GUCMemoryContext | 64 kB | 55 kB | 9248 bytes WAL record construction | 49 kB | 43 kB | 6400 bytes ExecutorState | 48 kB | 39 kB | 9104 bytes TupleSort main | 32 kB | 25 kB | 6848 bytes ExprContext | 32 kB | 26 kB | 5832 bytes TransactionAbortContext | 32 kB | 240 bytes | 32 kB tuplestore tuples | 32 kB | 19 kB | 13 kB GUC hash table | 32 kB | 24 kB | 8600 bytes smgr relation table | 32 kB | 15 kB | 17 kB Type information cache | 24 kB | 21 kB | 2672 bytes Custom Scan Methods | 16 kB | 9664 bytes | 6720 bytes PgStat Pending | 16 kB | 9112 bytes | 7272 bytes pg_get_backend_memory_contexts | 16 kB | 10 kB | 5664 bytes Relcache by OID | 16 kB | 12 kB | 3608 bytes PgStat Shared Ref Hash | 9264 bytes | 8552 bytes | 712 bytes Record information cache | 8192 bytes | 6552 bytes | 1640 bytes RowDescriptionContext | 8192 bytes | 1272 bytes | 6920 bytes Hypertable cache | 8192 bytes | 504 bytes | 7688 bytes ErrorContext | 8192 bytes | 240 bytes | 7952 bytes Rendezvous variable hash | 8192 bytes | 7576 bytes | 616 bytes search_path processing cache | 8192 bytes | 2552 bytes | 5640 bytes Cache pins | 8192 bytes | 240 bytes | 7952 bytes fn telemetry local function hash | 8192 bytes | 6168 bytes | 2024 bytes TopTransactionContext | 8192 bytes | 416 bytes | 7776 bytes MdSmgr | 8192 bytes | 352 bytes | 7840 bytes PrivateRefCount | 8192 bytes | 5520 bytes | 2672 bytes My not-so-good workaround is to perform a daily restart of the cluster members to avoid breaking patroni on the primary. Please let me know if you need any additional information to investigate this issue