Message-ID: <gh-postgresql-interfaces-psqlodbc-187@github.com>
From: "jarvis24young (@jarvis24young)" <noreply+jarvis24young@github.com>
To: "postgresql-interfaces/psqlodbc" <noreply+postgresql-interfaces-psqlodbc@github.com>
Date: Tue, 12 May 2026 03:42:49 +0000
Subject: [postgresql-interfaces/psqlodbc] PR #187: Validate UTF-16 surrogate pairs before combining
List-Id: <gh-postgresql-interfaces-psqlodbc.github.com>
X-GitHub-Author-Id: 48787405
X-GitHub-Author-Login: jarvis24young
X-GitHub-Issue: 187
X-GitHub-Repo: postgresql-interfaces/psqlodbc
X-GitHub-State: merged
X-GitHub-Type: pull_request
X-GitHub-Url: https://github.com/postgresql-interfaces/psqlodbc/pull/187
Content-Type: text/plain; charset=utf-8

The SQLWCHAR-to-UTF-8 conversion currently treats any UTF-16 high surrogate as the start of a surrogate pair: it advances to the next code unit and reads it unconditionally.

That can read past the caller-supplied length when a wide-character ODBC API receives a dangling high surrogate at the end of its input. The new regression test exercises this through the public `SQLPrepareW()` path with a guarded one-code-unit SQLWCHAR buffer, so the old implementation faults deterministically if it reads `wstr[1]`.
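
The test source is not reproduced in this description. As a rough sketch of what a "guarded" buffer can mean on POSIX systems, the snippet below places the end of the buffer directly before a `PROT_NONE` page, so reading one code unit past the end faults immediately. The `guarded_buf` type and its helpers are hypothetical names for illustration, not code from this PR.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper type: a buffer of 16-bit code units that ends
 * exactly where an inaccessible guard page begins. */
typedef struct
{
    void     *base;     /* start of the two-page mapping */
    size_t    map_len;  /* total mapping length in bytes */
    uint16_t *units;    /* usable buffer, ends at the guard page */
} guarded_buf;

/* Allocate `count` code units followed immediately by a PROT_NONE page.
 * Returns 0 on success, -1 on failure. */
static int guarded_buf_alloc(guarded_buf *g, size_t count)
{
    long   page = sysconf(_SC_PAGESIZE);
    size_t bytes = count * sizeof(uint16_t);
    size_t ps;
    void  *base;

    if (page <= 0 || bytes == 0 || bytes > (size_t) page)
        return -1;
    ps = (size_t) page;

    /* One readable/writable page followed by one page we will protect. */
    base = mmap(NULL, 2 * ps, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;

    /* Any access to the second page now raises SIGSEGV. */
    if (mprotect((char *) base + ps, ps, PROT_NONE) != 0)
    {
        munmap(base, 2 * ps);
        return -1;
    }

    g->base = base;
    g->map_len = 2 * ps;
    g->units = (uint16_t *) ((char *) base + ps - bytes);
    return 0;
}

static void guarded_buf_free(guarded_buf *g)
{
    munmap(g->base, g->map_len);
}
```

With `count` set to 1, `units[0]` is readable while `units[1]` lies in the guard page, which is the property that makes an unconditional read of `wstr[1]` fault instead of silently succeeding.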

Fix this by only taking the surrogate-pair path when:

- the current code unit is a high surrogate,
- there is another code unit within `ilen`, and
- the next code unit is a low surrogate.

Otherwise the existing non-pair path is used, avoiding the out-of-bounds read.
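
The three conditions above can be sketched as a single guard in front of the pair path. This is a minimal standalone illustration, not the code in `win_unicode.c`: the function and helper names are hypothetical, and it assumes the non-pair path encodes an unpaired surrogate as an ordinary three-byte sequence.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers; names are not taken from the driver source. */
static int is_high_surrogate(uint16_t u) { return (u & 0xFC00u) == 0xD800u; }
static int is_low_surrogate(uint16_t u)  { return (u & 0xFC00u) == 0xDC00u; }

/* Convert `ilen` UTF-16 code units to UTF-8 and return the number of bytes
 * written. `out` must have room for at least 3 * ilen bytes. */
static size_t utf16_to_utf8_sketch(const uint16_t *wstr, size_t ilen,
                                   unsigned char *out)
{
    size_t i = 0, o = 0;

    while (i < ilen)
    {
        uint16_t u = wstr[i];

        /* Pair path: all three conditions must hold, and the bounds
         * check comes before wstr[i + 1] is ever read. */
        if (is_high_surrogate(u) && i + 1 < ilen &&
            is_low_surrogate(wstr[i + 1]))
        {
            uint32_t cp = 0x10000u
                + ((uint32_t) (u & 0x3FFu) << 10)
                + (uint32_t) (wstr[i + 1] & 0x3FFu);

            out[o++] = (unsigned char) (0xF0 | (cp >> 18));
            out[o++] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char) (0x80 | (cp & 0x3F));
            i += 2;
        }
        else if (u < 0x80)
        {
            out[o++] = (unsigned char) u;
            i++;
        }
        else if (u < 0x800)
        {
            out[o++] = (unsigned char) (0xC0 | (u >> 6));
            out[o++] = (unsigned char) (0x80 | (u & 0x3F));
            i++;
        }
        else
        {
            /* Non-pair path (assumed behaviour): an unpaired surrogate is
             * encoded like any other three-byte unit. */
            out[o++] = (unsigned char) (0xE0 | (u >> 12));
            out[o++] = (unsigned char) (0x80 | ((u >> 6) & 0x3F));
            out[o++] = (unsigned char) (0x80 | (u & 0x3F));
            i++;
        }
    }
    return o;
}
```

Because `&&` short-circuits, `wstr[i + 1]` is only evaluated once `i + 1 < ilen` has been established, so a dangling high surrogate at the end of the input never triggers a read beyond `ilen`.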

Reproduction on the old implementation, using the same black-box test with ASan and a guarded buffer:

```text
ERROR: AddressSanitizer: SEGV on unknown address
The signal is caused by a READ memory access.
#0 ucs2_to_utf8 win_unicode.c:191
#1 SQLPrepareW odbcapiw.c:439
#2 SQLPrepareW libodbc.so.2
#3 main test/src/surrogate-pair-test.c:109
```

Tested after the fix:

```text
cd ~/psqlodbc-surrogate-oob-build/test
ODBCSYSINI=. ODBCINSTINI=./odbcinst.ini ODBCINI=./odbc.ini ./runsuite surrogate-pair --inputdir=.
TAP version 13
1..1
ok 1 - surrogate-pair
```

Also tested the target binary directly under ASan/UBSan with `detect_leaks=0`; it returns normally.