public inbox for [email protected]  
From: Andreas Karlsson <[email protected]>
To: Alexander Lakhin <[email protected]>
To: Jeff Davis <[email protected]>
To: zengman <[email protected]>
To: pgsql-hackers <[email protected]>
Subject: Re: Speed up ICU case conversion by using ucasemap_utf8To*()
Date: Wed, 1 Apr 2026 02:46:23 +0200
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>

On 3/12/26 5:00 AM, Alexander Lakhin wrote:
> I've discovered that starting from c4ff35f10, the following query:
> CREATE COLLATION c (provider = icu, locale = 'icu_something');
> 
> makes asan detect (maybe dubious, but still..) stack-buffer-overflow:
> ==21963==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffd386d4e63 at pc 0x650cd7972a76 bp 0x7ffd386d4e00 sp 0x7ffd386d45a8
> ...
> Address 0x7ffd386d4e63 is located in stack of thread T0 at offset 67 in frame
>      #0 0x650cd86962ef in foldcase_options (.../usr/local/pgsql/bin/postgres+0x12322ef) (BuildId: e441a9634858193e7358e5901e7948606ff5b1b1)
> 
>    This frame has 2 object(s):
>      [48, 52) 'status' (line 993)
>      [64, 67) 'lang' (line 992) <== Memory access at offset 67 overflows this variable
> 
> I use a build made with:
> CC=gcc-13 CPPFLAGS="-fsanitize=address" LDFLAGS="-fsanitize=address -static-libasan" ./configure --with-icu ...
> 
> Could you please have a look?

Thanks for finding this!

Interestingly, this bug seems to have been there even before my patch,
but maybe something I did when moving code around made it possible or
easier to trigger. As far as I can tell the issue is that

     uloc_getLanguage(locale, lang, 3, &status);

will populate lang with a string which is not zero-terminated if the
language is 3 or more characters, e.g. "und". And for some reason which
I am not entirely sure of, strcmp("tr", {'u','n','d'}) can cause an
overflow. Maybe due to some optimization?
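
The termination behavior described above can be sketched without ICU.
The helper below is a hypothetical stand-in, not the ICU implementation:
copy_lang() and is_terminated() are names invented for this
illustration. It only adds the terminating NUL when there is room left
in the caller's buffer, which is the situation a 3-byte buffer ends up
in for a 3-character language such as "und".

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for an API that fills a caller-supplied buffer.
 * Copies at most "capacity" bytes of src into dest and appends a
 * terminating NUL only if it fits. Returns the full length of src so the
 * caller can tell whether the result was terminated.
 */
static size_t
copy_lang(const char *src, char *dest, size_t capacity)
{
	size_t		len = strlen(src);
	size_t		ncopy = len < capacity ? len : capacity;

	memcpy(dest, src, ncopy);
	if (len < capacity)
		dest[len] = '\0';		/* the terminator fits */
	return len;
}

/* Returns 1 only if dest is guaranteed to hold a NUL-terminated string. */
static int
is_terminated(size_t len, size_t capacity)
{
	return len < capacity;
}
```

With a capacity of 3, "tr" comes back terminated and is safe to pass to
strcmp(), while "und" fills all three bytes and leaves no terminator, so
any function that scans for the NUL may read past the end of the buffer.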

My proposed fix is that we allocate a ULOC_LANG_CAPACITY buffer for the 
language like we do in fix_icu_locale_str() instead of trying to be 
clever. An alternative would be to use strncmp("tr", lang, 3) but that 
seems too clever for my taste in something which is not performance 
critical. A third option would be to check for 
U_STRING_NOT_TERMINATED_WARNING but I think that would just be 
unnecessarily convoluted code.
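
For comparison, the strncmp() alternative mentioned above could look
roughly like the sketch below. lang_equals_bounded() is a hypothetical
helper made up for this illustration and is not part of the attached
patch; it assumes the caller knows the buffer capacity and that the
expected string is shorter than that capacity.

```c
#include <string.h>

/*
 * Hypothetical helper: compare a possibly unterminated buffer of
 * "capacity" bytes against a NUL-terminated expected string. strncmp()
 * never looks at more than "capacity" bytes, so it cannot read past the
 * end of lang even when the terminator is missing.
 */
static int
lang_equals_bounded(const char *lang, size_t capacity, const char *expected)
{
	/* An exact match needs room for expected's terminator in the buffer. */
	if (strlen(expected) >= capacity)
		return 0;
	return strncmp(expected, lang, capacity) == 0;
}
```

This stays within the 3-byte buffer, but every call site has to carry
the capacity around, which is the kind of cleverness the larger buffer
avoids.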

I have attached my proposed fix.

Andreas


Attachments:

  [text/x-patch] v1-0001-Fix-overrun-when-comparing-with-unterminated-ICU-.patch (1.3K, 2-v1-0001-Fix-overrun-when-comparing-with-unterminated-ICU-.patch)
From 9d9a13917f53de690d70dcfb62adb1f0c5acad2a Mon Sep 17 00:00:00 2001
From: Andreas Karlsson <[email protected]>
Date: Wed, 1 Apr 2026 02:39:09 +0200
Subject: [PATCH v1] Fix overrun when comparing with unterminated ICU language
 string

uloc_getLanguage() returns an unterminated string when the language is
too long to fit in our buffer, in this case 3 bytes. This could cause a
later strcmp() to read outside the buffer.

Since this is not a performance-critical code path, just increase the
buffer size to ULOC_LANG_CAPACITY to match the code in
fix_icu_locale_str() instead of trying to do something clever.
---
 src/backend/utils/adt/pg_locale_icu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 5ad05fcd016..96d66dd4f8a 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -989,10 +989,10 @@ static int32_t
 foldcase_options(const char *locale)
 {
 	uint32		options = U_FOLD_CASE_DEFAULT;
-	char		lang[3];
+	char		lang[ULOC_LANG_CAPACITY];
 	UErrorCode	status = U_ZERO_ERROR;
 
-	uloc_getLanguage(locale, lang, 3, &status);
+	uloc_getLanguage(locale, lang, ULOC_LANG_CAPACITY, &status);
 	if (U_SUCCESS(status))
 	{
 		/*
-- 
2.47.3
