Message-ID:
From: "jarvis24young (@jarvis24young)"
To: "postgresql-interfaces/psqlodbc"
Date: Tue, 12 May 2026 03:42:49 +0000
Subject: [postgresql-interfaces/psqlodbc] PR #187: Validate UTF-16 surrogate pairs before combining
List-Id:
X-GitHub-Author-Id: 48787405
X-GitHub-Author-Login: jarvis24young
X-GitHub-Issue: 187
X-GitHub-Repo: postgresql-interfaces/psqlodbc
X-GitHub-State: merged
X-GitHub-Type: pull_request
X-GitHub-Url: https://github.com/postgresql-interfaces/psqlodbc/pull/187
Content-Type: text/plain; charset=utf-8

SQLWCHAR-to-UTF-8 conversion currently treats any UTF-16 high surrogate as the start of a surrogate pair. It then advances to the next code unit and reads it unconditionally.

That can read past the caller-supplied length when a wide-character ODBC API receives a dangling high surrogate at the end of its input. The new regression test exercises this through the public `SQLPrepareW()` path with a guarded one-code-unit SQLWCHAR buffer, so the old implementation faults deterministically if it reads `wstr[1]`.

Fix this by only taking the surrogate-pair path when:

- the current code unit is a high surrogate,
- there is another code unit within `ilen`, and
- the next code unit is a low surrogate.

Otherwise the existing non-pair path is used, avoiding the out-of-bounds read.

Reproduction on the old implementation, using the same black-box test with ASan and a guarded buffer:

```text
ERROR: AddressSanitizer: SEGV on unknown address
The signal is caused by a READ memory access.
    #0 ucs2_to_utf8 win_unicode.c:191
    #1 SQLPrepareW odbcapiw.c:439
    #2 SQLPrepareW libodbc.so.2
    #3 main test/src/surrogate-pair-test.c:109
```

Tested after the fix:

```text
cd ~/psqlodbc-surrogate-oob-build/test
ODBCSYSINI=. ODBCINSTINI=./odbcinst.ini ODBCINI=./odbc.ini ./runsuite surrogate-pair --inputdir=.
TAP version 13
1..1
ok 1 - surrogate-pair
```

Also tested the target binary directly under ASan/UBSan with `detect_leaks=0`; it returns normally.
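
For reviewers who want the three conditions in one place, here is a minimal standalone sketch of the guarded check. It is not the code in `win_unicode.c`: the function name `utf16_to_utf8_sketch`, the two helper predicates, and the `3 * ilen` output sizing are all hypothetical, and encoding a lone surrogate with the generic three-byte form is an assumption about what a "non-pair path" might do, not a statement about the driver's behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for this sketch; not the driver's own macros. */
static int is_high_surrogate(uint16_t u) { return (u & 0xFC00u) == 0xD800u; }
static int is_low_surrogate(uint16_t u)  { return (u & 0xFC00u) == 0xDC00u; }

/*
 * Convert ilen UTF-16 code units to UTF-8 and return the byte count.
 * Assumption: out has room for at least 3 * ilen bytes (each code unit
 * yields at most 3 bytes; a valid pair yields 4 bytes for 2 units).
 */
static size_t utf16_to_utf8_sketch(const uint16_t *in, size_t ilen,
                                   unsigned char *out)
{
    size_t i = 0, o = 0;

    assert(out != NULL && (in != NULL || ilen == 0));

    while (i < ilen) {
        uint32_t cp = in[i];

        /* Pair path only if: high surrogate, another unit within ilen,
         * and that unit is a low surrogate. The bounds check comes
         * before the read of in[i + 1], so a dangling high surrogate
         * at the end never triggers an out-of-bounds access. */
        if (is_high_surrogate(in[i]) && i + 1 < ilen &&
            is_low_surrogate(in[i + 1])) {
            cp = 0x10000u + ((uint32_t)(in[i] & 0x3FFu) << 10)
                          + (uint32_t)(in[i + 1] & 0x3FFu);
            i += 2;
        } else {
            /* Non-pair path: consume exactly one code unit. */
            i += 1;
        }

        if (cp < 0x80u) {
            out[o++] = (unsigned char)cp;
        } else if (cp < 0x800u) {
            out[o++] = (unsigned char)(0xC0u | (cp >> 6));
            out[o++] = (unsigned char)(0x80u | (cp & 0x3Fu));
        } else if (cp < 0x10000u) {
            /* Also reached for a lone surrogate (assumption, see above). */
            out[o++] = (unsigned char)(0xE0u | (cp >> 12));
            out[o++] = (unsigned char)(0x80u | ((cp >> 6) & 0x3Fu));
            out[o++] = (unsigned char)(0x80u | (cp & 0x3Fu));
        } else {
            out[o++] = (unsigned char)(0xF0u | (cp >> 18));
            out[o++] = (unsigned char)(0x80u | ((cp >> 12) & 0x3Fu));
            out[o++] = (unsigned char)(0x80u | ((cp >> 6) & 0x3Fu));
            out[o++] = (unsigned char)(0x80u | (cp & 0x3Fu));
        }
    }
    return o;
}
```

The ordering inside the `if` matters: because `&&` short-circuits, `in[i + 1]` is only evaluated once `i + 1 < ilen` has been established. Calling the sketch with a one-element buffer holding `0xD83D` and `ilen == 1` consumes a single code unit and never touches the second slot, which is the case the regression test is described as guarding.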