From: Gaurav Singh <[email protected]>
To: [email protected]
Subject: Memory leak in pg_stat_statements when query text file contains invalid encoding
Date: Fri, 27 Mar 2026 13:33:41 +0530
Message-ID: <CAEcQ1bYfDQNeQgk8hL_rnyc7iMXOtErLoqmVZ_fa_y5MX-mKfQ@mail.gmail.com>

I have found a memory leak in *contrib/pg_stat_statements* that occurs
when the query text file (*pgss_query_texts.stat*) contains an invalid
byte sequence. Each call to *pg_stat_statements* leaks the entire malloc'd
file buffer and fails to release the held LWLock.

*PostgreSQL version:*
Discovered against PostgreSQL 15.12, and verified to be present in
PG18 as well (installed via Homebrew). The affected code path in
*pg_stat_statements_internal()* is unchanged between these versions.

*Platform:* macOS 15.7.3 (aarch64).

*Steps to reproduce:*

   1. Enable pg_stat_statements and populate it with a large number of
   structurally unique queries (I used 2000 unique CTE-based queries with
   identifiers padded to 63 characters each). This creates a query text
   file of approximately 600 KB.
   2. Corrupt the query text file by injecting a null byte at an arbitrary
   offset (I used byte offset 500). This can be done with:
       printf '\x00' | dd of=<data_directory>/pg_stat_tmp/pgss_query_texts.stat bs=1 seek=500 count=1 conv=notrunc
   3. Verify that querying pg_stat_statements now returns:
       ERROR: invalid byte sequence for encoding "UTF8": 0x00
   4. In a single psql session, repeatedly query pg_stat_statements (I ran
   SELECT count(*) FROM pg_stat_statements 2000 times) while monitoring the
   backend process RSS using *ps -o rss= -p <backend_pid>*.

*Output I got:*
The backend's RSS grows linearly with each failing query. With a 600 KB
query text file and 2000 iterations, the backend's RSS grew by
approximately 1.2 GB. The per-error leak is approximately equal to the
query text file size (600 KB), confirming the file buffer is leaked on
every call. Sample RSS measurements over time:

   - 0 seconds: 67 MB
   - 8 seconds: 153 MB
   - 20 seconds: 370 MB
   - 38 seconds: 739 MB
   - 50 seconds: 1028 MB
   - 54 seconds: 1251 MB
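
As a rough consistency check on these figures, the arithmetic can be written out. It assumes the 54-second sample was taken at about the end of the 2000 iterations, which the measurements themselves do not state, and it treats KB, MB and GB as decimal units.

```latex
\begin{align*}
\text{expected growth} &\approx 2000 \times 600\ \text{KB} = 1{,}200{,}000\ \text{KB} \approx 1.2\ \text{GB} \\
\text{observed growth} &= 1251\ \text{MB} - 67\ \text{MB} = 1184\ \text{MB} \\
\text{observed leak per call} &\approx \frac{1184\ \text{MB}}{2000} \approx 0.59\ \text{MB}
\end{align*}
```

The observed per-call growth of roughly 0.59 MB is close to the 600 KB file size.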

*Output I expected:*
RSS should remain approximately constant across the failing queries. Each
call should either succeed or fail cleanly without leaking memory. The
LWLock should always be released regardless of whether the function
succeeds or errors out.

*Root cause:*
In *pg_stat_statements_internal()* in pg_stat_statements.c, the function
acquires *pgss->lock* via *LWLockAcquire()* and may allocate a file buffer
via *qtext_load_file()* (which uses malloc). Inside the hash table
iteration loop, *pg_any_to_server()* is called to convert each stored query
text to the server encoding. If the query text file contains an invalid
byte sequence (such as a null byte), *pg_any_to_server()* calls
ereport(ERROR), which performs a longjmp out of the function. The cleanup
code at the bottom of the function that calls *LWLockRelease()* and
free(qbuffer) is never reached. On every subsequent call, the entire file
buffer is leaked again, and the LWLock release is skipped.

*Proposed fix:*
Wrap the hash table iteration loop in *PG_TRY/PG_FINALLY* so that
*LWLockRelease(pgss->lock)* and *free(qbuffer)* execute even when
*pg_any_to_server()* throws an encoding error. This is a minimal change: no
new allocations, no behavioral change on the success path. It only adds
cleanup protection on the error path.

Gaurav Singh

