pgsql: Fix buffer overflows in pg_trgm due to lower-casing

public inbox for [email protected]  


* pgsql: Fix buffer overflows in pg_trgm due to lower-casing
@ 2026-02-09 00:06  Thomas Munro <[email protected]>

From: Thomas Munro @ 2026-02-09 00:06 UTC
  To: [email protected]

Fix buffer overflows in pg_trgm due to lower-casing

The code made a subtle assumption that the lower-cased version of a
string never has more characters than the original. That is not always
true. For example, in a database with the latin9 encoding:

    latin9db=# select lower(U&'\00CC' COLLATE "lt-x-icu");
       lower
    -----------
     i\x1A\x1A
    (1 row)

In this example, lower-casing expands the single input character into
three characters.

The generate_trgm_only() function relied on that assumption in two
ways:

- It used "slen * pg_database_encoding_max_length() + 4" to allocate
  the buffer to hold the lower-cased and blank-padded string. That
  formula accounts for expansion if the lower-case characters are
  longer (in bytes) than the originals, but it's still not enough if
  the lower-cased string contains more *characters* than the original.

- Its callers sized the output array to hold the trigrams extracted
  from the input string with the formula "(slen / 2 + 1) * 3", where
  'slen' is the input string length in bytes. (The formula was
  generous to account for the possibility that RPADDING was set to 2.)
  That's also not enough if one input byte can turn into multiple
  characters.

To fix, introduce a growable trigram array and give up on trying to
choose the correct max buffer sizes ahead of time.

Backpatch to v18, but no further. In previous versions lower-casing was
done character by character, and thus the assumption that lower-casing
doesn't change the character length was valid. That was changed in v18,
commit fb1a18810f.

Security: CVE-2026-2007
Reviewed-by: Noah Misch <[email protected]>
Reviewed-by: Jeff Davis <[email protected]>

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/00896ddaf41fa7b725991120678d544c18c6af70
Author: Heikki Linnakangas <[email protected]>

Modified Files
--------------
contrib/pg_trgm/trgm_op.c        | 275 ++++++++++++++++++++++++++-------------
src/tools/pgindent/typedefs.list |   1 +
2 files changed, 185 insertions(+), 91 deletions(-)




* pgsql: Fix buffer overflows in pg_trgm due to lower-casing
@ 2026-02-09 00:07  Thomas Munro <[email protected]>

From: Thomas Munro @ 2026-02-09 00:07 UTC
  To: [email protected]

Fix buffer overflows in pg_trgm due to lower-casing

The code made a subtle assumption that the lower-cased version of a
string never has more characters than the original. That is not always
true. For example, in a database with the latin9 encoding:

    latin9db=# select lower(U&'\00CC' COLLATE "lt-x-icu");
       lower
    -----------
     i\x1A\x1A
    (1 row)

In this example, lower-casing expands the single input character into
three characters.

The generate_trgm_only() function relied on that assumption in two
ways:

- It used "slen * pg_database_encoding_max_length() + 4" to allocate
  the buffer to hold the lower-cased and blank-padded string. That
  formula accounts for expansion if the lower-case characters are
  longer (in bytes) than the originals, but it's still not enough if
  the lower-cased string contains more *characters* than the original.

- Its callers sized the output array to hold the trigrams extracted
  from the input string with the formula "(slen / 2 + 1) * 3", where
  'slen' is the input string length in bytes. (The formula was
  generous to account for the possibility that RPADDING was set to 2.)
  That's also not enough if one input byte can turn into multiple
  characters.

To fix, introduce a growable trigram array and give up on trying to
choose the correct max buffer sizes ahead of time.

Backpatch to v18, but no further. In previous versions lower-casing was
done character by character, and thus the assumption that lower-casing
doesn't change the character length was valid. That was changed in v18,
commit fb1a18810f.

Security: CVE-2026-2007
Reviewed-by: Noah Misch <[email protected]>
Reviewed-by: Jeff Davis <[email protected]>

Branch
------
REL_18_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/e0965fb1a8550716db08e2183560be3546851647
Author: Heikki Linnakangas <[email protected]>

Modified Files
--------------
contrib/pg_trgm/trgm_op.c        | 275 ++++++++++++++++++++++++++-------------
src/tools/pgindent/typedefs.list |   1 +
2 files changed, 185 insertions(+), 91 deletions(-)


