Content-Type: multipart/mixed; boundary="------------dhVhRLrC0vkQbBbFdd0CvDEV"
Date: Wed, 1 Apr 2026 02:46:23 +0200
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: Speed up ICU case conversion by using ucasemap_utf8To*()
To: Alexander Lakhin, Jeff Davis, zengman, pgsql-hackers
References: <167986ff-afcf-4542-94c6-61ee8474e138@proxel.se> <72c7c2b5848da44caddfe0f20f6c7ebc7c0c6e60.camel@j-davis.com> <4cfde442-25dd-495f-8d76-a23502ce17b8@proxel.se> <744b9998-4463-4be5-b60e-a960eeb43202@proxel.se> <5a010b27-8ed9-4739-86fe-1562b07ba564@proxel.se> <96d80a47-f17f-42fa-82b1-2908efbd6541@gmail.com>
From: Andreas Karlsson
Content-Language: en-US
In-Reply-To: <96d80a47-f17f-42fa-82b1-2908efbd6541@gmail.com>
Precedence: bulk

This is a multi-part message in MIME format.
--------------dhVhRLrC0vkQbBbFdd0CvDEV
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

On 3/12/26 5:00 AM, Alexander Lakhin wrote:
> I've discovered that starting from c4ff35f10, the following query:
> CREATE COLLATION c (provider = icu, locale = 'icu_something');
>
> makes asan detect (maybe dubious, but still..) stack-buffer-overflow:
> ==21963==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffd386d4e63 at pc 0x650cd7972a76 bp 0x7ffd386d4e00 sp 0x7ffd386d45a8
> ...
> Address 0x7ffd386d4e63 is located in stack of thread T0 at offset 67 in frame
>     #0 0x650cd86962ef in foldcase_options (.../usr/local/pgsql/bin/postgres+0x12322ef) (BuildId: e441a9634858193e7358e5901e7948606ff5b1b1)
>
>   This frame has 2 object(s):
>     [48, 52) 'status' (line 993)
>     [64, 67) 'lang' (line 992) <== Memory access at offset 67 overflows this variable
>
> I use a build made with:
> CC=gcc-13 CPPFLAGS="-fsanitize=address" LDFLAGS="-fsanitize=address -static-libasan" ./configure --with-icu ...
>
> Could you please have a look?

Thanks for finding this!

Interestingly, this bug seems to have been there even before my patch, but maybe something I did when moving code around made it possible or easier to trigger.

As far as I can tell the issue is that

uloc_getLanguage(locale, lang, 3, &status);

will populate lang with a string which is not zero-terminated if the language is 3 or more characters, e.g. "und". And for some reason which I am not entirely sure of, strcmp("tr", {'u','n','d'}) can cause an overflow. Maybe due to some optimization?

My proposed fix is that we allocate a ULOC_LANG_CAPACITY buffer for the language, like we do in fix_icu_locale_str(), instead of trying to be clever. An alternative would be to use strncmp("tr", lang, 3), but that seems too clever for my taste in something which is not performance critical.
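To make the failure mode and the proposed fix concrete, here is a minimal standalone sketch. It does not call ICU: get_language_sketch() is a hypothetical stand-in which assumes the behaviour described above (the terminating NUL is written only when there is room for it), and LANG_CAPACITY_SKETCH is a placeholder for ULOC_LANG_CAPACITY. None of these names exist in PostgreSQL or ICU.

```c
#include <assert.h>
#include <string.h>

/* Placeholder for ULOC_LANG_CAPACITY; the value is an assumption. */
#define LANG_CAPACITY_SKETCH 12

/*
 * Hypothetical stand-in for uloc_getLanguage(), NOT the ICU function.
 * Assumed behaviour: copy as many bytes as fit, write the terminating
 * NUL only if there is room for it, and return the full length.
 */
static int
get_language_sketch(const char *language, char *buf, int capacity)
{
	int			len = (int) strlen(language);
	int			ncopy = len < capacity ? len : capacity;

	assert(capacity >= 0);
	memcpy(buf, language, (size_t) ncopy);
	if (len < capacity)
		buf[len] = '\0';
	return len;
}

/* Returns 1 if buf[0..capacity) contains a NUL byte, else 0. */
static int
is_terminated(const char *buf, int capacity)
{
	return memchr(buf, '\0', (size_t) capacity) != NULL;
}

/*
 * The shape of the proposed fix: a buffer large enough that the result
 * is always terminated for real language codes, so strcmp() is safe.
 */
static int
language_is_tr_sketch(const char *language)
{
	char		lang[LANG_CAPACITY_SKETCH] = {0};

	get_language_sketch(language, lang, LANG_CAPACITY_SKETCH);
	return strcmp("tr", lang) == 0;
}
```

With a 3-byte buffer, "und" fills the buffer completely and leaves no terminator, so only a length-bounded comparison such as strncmp("tr", lang, 3) stays inside the buffer. With the larger buffer the result is terminated and a plain strcmp() is fine.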
A third option would be to check for U_STRING_NOT_TERMINATED_WARNING but I think that would just be unnecessarily convoluted code. I have attached my proposed fix. Andreas --------------dhVhRLrC0vkQbBbFdd0CvDEV Content-Type: text/x-patch; charset=UTF-8; name="v1-0001-Fix-overrun-when-comparing-with-unterminated-ICU-.patch" Content-Disposition: attachment; filename*0="v1-0001-Fix-overrun-when-comparing-with-unterminated-ICU-.pa"; filename*1="tch" Content-Transfer-Encoding: base64 RnJvbSA5ZDlhMTM5MTdmNTNkZTY5MGQ3MGRjZmI2MmFkYjFmMGM1YWNhZDJhIE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBBbmRyZWFzIEthcmxzc29uIDxhbmRyZWFzQHByb3hl bC5zZT4KRGF0ZTogV2VkLCAxIEFwciAyMDI2IDAyOjM5OjA5ICswMjAwClN1YmplY3Q6IFtQ QVRDSCB2MV0gRml4IG92ZXJydW4gd2hlbiBjb21wYXJpbmcgd2l0aCB1bnRlcm1pbmF0ZWQg SUNVIGxhbmd1YWdlCiBzdHJpbmcKCldoZW4gdWxvY19nZXRMYW5ndWFnZSgpIHJldHVybnMg YW4gdW50ZXJtaW5hdGVkIHN0cmluZyB3aGVuIHRoZSBsYW5ndWFnZQppcyB0b28gbG9uZyB0 byBmaXQgaW4gb3VyIGJ1ZmZlciwgaW4gdGhpcyBjYXNlIDMgYnl0ZXMuIFRoaXMgY291bGQg Y2F1c2UKYSBsYXRlciBzdHJjbXAoKSB0byByZWFkIG91dHNpZGUgdGhlIGJ1ZmZlci4KClNp bmNlIHRoaXMgaXMgbm90IGEgcGVyZm9ybWFuY2UgY2lydGljYWwgY29kZSBwYXRoIGp1c3Qg aW5jcmVhc2UgdGhlIGJ1ZmZlcgpzaXplIHRvIFVMT0NfTEFOR19DQVBBQ0lUWSB0byBtYXRj aCB0aGUgY29kZSBvbiBmaXhfaWN1X2xvY2FsZV9zdHIoKQppbnN0ZWFkIG9mIHRyeWluZyB0 byBkbyBzb21ldGhpbmcgY2xldmVyLgotLS0KIHNyYy9iYWNrZW5kL3V0aWxzL2FkdC9wZ19s b2NhbGVfaWN1LmMgfCA0ICsrLS0KIDEgZmlsZSBjaGFuZ2VkLCAyIGluc2VydGlvbnMoKyks IDIgZGVsZXRpb25zKC0pCgpkaWZmIC0tZ2l0IGEvc3JjL2JhY2tlbmQvdXRpbHMvYWR0L3Bn X2xvY2FsZV9pY3UuYyBiL3NyYy9iYWNrZW5kL3V0aWxzL2FkdC9wZ19sb2NhbGVfaWN1LmMK aW5kZXggNWFkMDVmY2QwMTYuLjk2ZDY2ZGQ0ZjhhIDEwMDY0NAotLS0gYS9zcmMvYmFja2Vu ZC91dGlscy9hZHQvcGdfbG9jYWxlX2ljdS5jCisrKyBiL3NyYy9iYWNrZW5kL3V0aWxzL2Fk dC9wZ19sb2NhbGVfaWN1LmMKQEAgLTk4OSwxMCArOTg5LDEwIEBAIHN0YXRpYyBpbnQzMl90 CiBmb2xkY2FzZV9vcHRpb25zKGNvbnN0IGNoYXIgKmxvY2FsZSkKIHsKIAl1aW50MzIJCW9w dGlvbnMgPSBVX0ZPTERfQ0FTRV9ERUZBVUxUOwotCWNoYXIJCWxhbmdbM107CisJY2hhcgkJ 
bGFuZ1tVTE9DX0xBTkdfQ0FQQUNJVFldOwogCVVFcnJvckNvZGUJc3RhdHVzID0gVV9aRVJP X0VSUk9SOwogCi0JdWxvY19nZXRMYW5ndWFnZShsb2NhbGUsIGxhbmcsIDMsICZzdGF0dXMp OworCXVsb2NfZ2V0TGFuZ3VhZ2UobG9jYWxlLCBsYW5nLCBVTE9DX0xBTkdfQ0FQQUNJVFks ICZzdGF0dXMpOwogCWlmIChVX1NVQ0NFU1Moc3RhdHVzKSkKIAl7CiAJCS8qCi0tIAoyLjQ3 LjMKCg== --------------dhVhRLrC0vkQbBbFdd0CvDEV--