postgresql-interfaces/psqlodbc GitHub issues and pull requests (mirror)
help / color / mirror / Atom feedFrom: davecramer (@davecramer) <[email protected]>
To: postgresql-interfaces/psqlodbc <[email protected]>
Subject: Re: [postgresql-interfaces/psqlodbc] PR #187: Validate UTF-16 surrogate pairs before combining
Date: Thu, 14 May 2026 11:48:08 +0000
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
In win_unicode.c, the ucs2_to_utf8() function converts SQLWCHAR (UTF-16) to UTF-8. When it encounters a high
surrogate (0xD800–0xDBFF), it unconditionally advances to wstr[1] and reads the next code unit — without checking:
1. Whether there is a next code unit within the caller-specified length (ilen)
2. Whether that next code unit is actually a low surrogate (0xDC00–0xDFFF)
This is an out-of-bounds read. If the input ends with a dangling high surrogate (which is invalid UTF-16 but must be
handled gracefully), the driver reads one SQLWCHAR past the buffer.
The Fix
// Before:
else if (surrog1_bits == (*wstr & surrog_check))
// After:
else if (surrog1_bits == (*wstr & surrog_check) &&
i + 1 < ilen &&
surrog2_bits == (wstr[1] & surrog_check))
Three conditions must all be true before taking the surrogate-pair path:
1. Current code unit is a high surrogate
2. There's at least one more code unit within bounds (i + 1 < ilen)
3. The next code unit is a low surrogate
If any condition fails, the code falls through to the non-pair path (which handles the code unit as a standalone
value — producing replacement or garbage UTF-8, but not crashing).
Assessment
Correctness: ✅ The fix is minimal and correct. The bounds check (i + 1 < ilen) prevents the OOB read. The
low-surrogate validation (surrog2_bits == (wstr[1] & surrog_check)) ensures we only combine when we actually have a
valid pair — a high surrogate followed by something that isn't a low surrogate is malformed input and shouldn't be
combined.
Test: ✅ Excellent test design. It uses mmap/VirtualAlloc to place the single SQLWCHAR at the end of a page with the
next page marked PROT_NONE/PAGE_NOACCESS. If the driver reads past the supplied length, it gets a SIGSEGV/access
violation — deterministic crash rather than silent corruption. Cross-platform (Unix + Win32).
Security relevance: This is a memory safety bug reachable from any wide-character ODBC API (SQLPrepareW,
SQLExecDirectW, etc.) when the application passes a length that doesn't include a trailing low surrogate. In
practice this is unlikely to be exploitable (it's a 2-byte over-read, not a write), but it's still a crash bug that
ASan catches.
One observation: The fallthrough path for a dangling high surrogate will encode it as a 3-byte UTF-8 sequence (the
raw surrogate code point). This is technically "WTF-8" rather than valid UTF-8. PostgreSQL will reject it anyway, so
it's fine — the driver doesn't crash, and the server returns an error. A future improvement could substitute U+FFFD,
but that's out of scope for this fix.
Verdict
Approve. Clean, minimal fix for a real OOB read with an excellent guarded-page regression test. Ship it.
view thread (4+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: github://postgresql-interfaces/psqlodbc
Cc: [email protected], [email protected]
Subject: Re: [postgresql-interfaces/psqlodbc] PR #187: Validate UTF-16 surrogate pairs before combining
In-Reply-To: <<[email protected]>>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox