Message-ID: <gh-pgjdbc-pgjdbc-4016@github.com>
From: "vlsi (@vlsi)" <noreply+vlsi@github.com>
To: "pgjdbc/pgjdbc" <noreply+pgjdbc-pgjdbc@github.com>
Date: Fri, 10 Apr 2026 13:28:33 +0000
Subject: [pgjdbc/pgjdbc] PR #4016: feat: harden protocol reader against desynced streams (#4015)
List-Id: <gh-pgjdbc-pgjdbc.github.com>
X-GitHub-Author-Id: 213894
X-GitHub-Author-Login: vlsi
X-GitHub-Issue: 4016
X-GitHub-Repo: pgjdbc/pgjdbc
X-GitHub-State: open
X-GitHub-Type: pull_request
X-GitHub-Url: https://github.com/pgjdbc/pgjdbc/pull/4016
Content-Type: text/plain; charset=utf-8

### Why

If the response stream desyncs (corrupted bytes, MITM, a buggy server, a wire-incompatible fork), the pre-#4015 v3 reader in `PGStream.receiveTupleV3()` trusts the 4-byte field length and immediately does `new byte[size]`. On a corrupted length this becomes a roughly 1.7 GB allocation followed by an indefinite block in `SocketInputStream.read0` waiting for bytes that never arrive. The connection is never returned to the pool.

The blast radius is worse than the per-connection cost. A broken reader stays in the pool's "open and not yet known to be broken" set, gets handed to the next caller, and corrupts that caller's query as well. Every length-prefixed v3 message had the same shape before #4015: trust the wire-provided length to drive an allocation, a skip, or a buffer scan.

### What is checked unconditionally

The following checks fire on every connection in every mode. The `pgjdbc.protocolHardeningMode` system property has no effect on them:

* **`MAX_MESSAGE_SIZE = 0x3fffffff` absolute cap.** No backend message may exceed this length. The constant is the backend's `MaxAllocSize` and is shared by every PostgreSQL-compatible fork (CockroachDB, YugabyteDB, Redshift, Greenplum, Timescale).
* **User-set ceiling.** If `maxResultBuffer` is configured, no single backend message may exceed it. The user set the ceiling on purpose; the driver enforces it before allocating, not after.
* **Bounded C-string scans.** Every NUL-terminated string read is bounded by either the remaining envelope of the current message or by `MAX_MESSAGE_SIZE`. A C-string read that begins with zero remaining envelope is rejected.
* **DataRow per-field prefixes do not fit the envelope.** A row whose `nf * 4` per-field length prefixes alone exceed the message size is rejected.
* **DataRow field overruns the row envelope.** A field whose declared length exceeds the bytes still available in the row is rejected. This is the exact issue #4015 scenario.
* **DataRow or FunctionCallResponse per-field length below -1.** The wire protocol assigns meaning only to -1 (NULL) and to non-negative values; any other negative leaves the driver no way to decode the field.
* **FunctionCallResponse value overruns the message envelope.** A `valueLen` exceeding `msgLen - 8` is rejected (single-value variant of the issue #4015 row overrun).
* **RowDescription cannot fit the 19-byte protocol minimum per field.** A declared field count whose 19-byte-per-field minimum exceeds the message envelope is rejected.
* **NegotiateProtocolVersion option count overruns the envelope.** Each option is at minimum a NUL byte, so an option count above the remaining envelope is rejected.
* **GSS encrypted packet length outside `[1, 16380]`.** Mirrors the backend's `PQ_GSS_RECV_BUFFER_SIZE - sizeof(uint32)` hard limit. Enforced in `GSSInputStream` (for post-handshake traffic) and in `pgStream.readMessageLength("GSSEncryptionHandshakeToken", 0, 16380)` (for the handshake token).
* **Protocol-derived hard caps on message length.** Each call to `pgStream.readMessageLength(name, min, max)` enforces both bounds unconditionally. `max` is reserved for the value the protocol itself fixes, not for a pgjdbc-applied soft cap: `BackendKeyData ≤ 8 + 256` (the v3.2 cancel-key maximum), `CopyResponse ≤ 7 + 2 * Short.MAX_VALUE` (envelope is fully determined by the signed-int16 field count), and `ParameterDescription ≤ 6 + 4 * Short.MAX_VALUE` (same shape). Each cap is shadowed downstream by an exact-equality or negative-count check that would also reject the same bad message; the readMessageLength check is the fast-path rejection before any payload is read.
* **Fixed-length acknowledgements with the wrong size.** `ParseComplete`, `BindComplete`, `CloseComplete`, `NoData`, `PortalSuspended`, `EmptyQueryResponse`, and `ReadyForQuery` are verified against their protocol-defined exact length. Messages whose total length is fully determined by an inner int16 count (`ParameterDescription`, `CopyResponse`) are likewise verified for exact match. A length mismatch means the next reader is misaligned with the wire; continuing would parse the next message's header from inside this message's body.
* **Envelope mismatches.** Each `readMessageLength` call captures the declared envelope; subsequent reads bound themselves against the remaining envelope, and a paired `endMessage()` asserts the declared body was consumed exactly. A mismatch has the same desync consequence as a wrong fixed-length above.
* **Negative signed-int16 counts.** `DataRow.nf`, `RowDescription.size`, `CopyResponse.numFields`, `ParameterDescription.numParams`, `NegotiateProtocolVersion.numOptions`. A negative count here is not a useful compatibility escape hatch: even if a fork treated the slot as unsigned int16 and sent a value above 32767, pgjdbc currently reads it as signed and then uses it as an array size, a loop bound, or an envelope-arithmetic input. Continuing after `WARN` would not make that fork work; it would either fail later, allocate incorrectly, skip checks, or desync. A real "unsigned int16 fork" would need an intentional compatibility change in pgjdbc to read those slots differently, not a `WARN`-and-continue.
* **Authentication round-trip cap (64).** A hostile server must not be able to loop the client indefinitely on SCRAM, GSS, or SSPI continuations. The cap is a hard ceiling on pre-auth dialogue, not a tightening of the protocol that a fork could legitimately need to relax; `WARN`-and-continue would leave the loop unbounded, defeating the cap.

These checks share two properties: each detects a value that is mathematically impossible on a well-formed stream or would leave the reader misaligned with the wire, and relaxing any of them would either re-introduce the original #4015 bug or swap a clean `IOException` for a more confusing crash (`NegativeArraySizeException`, desync on the next message, indefinite socket-read hang) one or two lines later. There is no setting that disables them.

On a violation, `PGStream` is marked broken. An internal flag is set; `isClosed()` returns `true` thereafter, so a pool that tests on borrow discards the connection. The socket is closed with `SO_LINGER 0` so the TCP `RST` is immediate. The caller sees an `IOException`, or a `PSQLException` with `PSQLState.PROTOCOL_VIOLATION` for the pre-auth path.

### What is checked by default, relaxable by the flag

Four pgjdbc-applied soft caps remain under the flag. Each is a length ceiling pgjdbc applies on top of the documented protocol, where the protocol itself does not fix a maximum and a wire-compatible fork could legitimately advertise more. These run by default at `WARNING` level; the `pgjdbc.protocolHardeningMode` system property lets the user switch them to fail-fast (`=fail`) or silence them entirely (`=disable`).

* **`NotificationResponse ≤ 1 MiB`** — PostgreSQL's `NOTIFY` payload is `NOTIFY_PAYLOAD_MAX_LENGTH` = 8000 bytes and channel names are `NAMEDATALEN`-bounded (64 bytes), so a well-formed message tops out under 8 KiB. The 1 MiB cap leaves >100× headroom.
* **`ParameterStatus ≤ 1 MiB`** — names are `NAMEDATALEN`-bounded; values are short GUC strings in practice.
* **`AuthenticationRequest ≤ 8 + 2 MiB`** — payload is the SCRAM/MD5/GSS continuation. Kerberos tokens reach a few hundred KB in pathological nested-group cases; the 2 MiB cap leaves >30× headroom.
* **`AuthenticationGSSContinue ≤ 8 + 2 MiB`** — payload is the GSS continuation token. Same shape as the SCRAM/GSS branch above.

Each soft cap is implemented as an inline check after `readMessageLength`, routed through `pgStream.failOnDesync(...)`:

```java
int len = pgStream.readMessageLength("NotificationResponse", 10);  // unconditional [10, MAX_MESSAGE_SIZE]
if (len > 1024 * 1024) {
  pgStream.failOnDesync(IOException::new, GT.tr(
      "Protocol error. NotificationResponse length {0} exceeds the 1 MiB pgjdbc cap.", len));
}
```

A fork that advertises a larger value for any of those caps would trip the inline check on otherwise well-formed traffic. The flag lets the user accept the larger value for that specific deployment without rebuilding the driver.

### How to configure

The JVM system property `pgjdbc.protocolHardeningMode` selects what happens on a relaxable-check failure. It is read once at driver class-load time and applies to every connection the JVM opens. Default: `warn`.

| Value | Behaviour |
|---|---|
| `fail` | Mark the `PGStream` broken and throw a connection-level error. The thrown message includes a hint pointing at this property and at the issue tracker, so that an SRE or AI assistant triaging the exception has actionable next steps without reading the source. Opt in to fail fast on a desynced or hostile backend. |
| `warn` (default) | Log the violation at `WARNING` with the exception attached (the stack trace points at the reader site that detected it). The caller continues with the suspect value, restoring the pre-#4015 read path. The driver may still crash later (`NegativeArraySizeException`, hang on socket read) if the data is genuinely corrupted, but the log breadcrumb lets the user confirm whether the check is firing on real or pathological-but-benign traffic. |
| `disable` | Silently skip the check. Discouraged; prefer `warn` so that the symptom remains visible in logs. |

Configure on the JVM command line:

```
-Dpgjdbc.protocolHardeningMode=fail
```

Example error message in `fail` mode:

```
Protocol error. DataRow has negative field count -1 (message size 5).
If you are sure this is a false positive, you can silence the error by
setting -Dpgjdbc.protocolHardeningMode=warn (log only) or
-Dpgjdbc.protocolHardeningMode=disable (no logging). Consider
filing a bug report at https://github.com/pgjdbc/pgjdbc/issues.
```

### Why `warn` is the default

The four soft caps are conservative pgjdbc-side ceilings; a real desync is very unlikely to slip through one of them. The risk that an upgrade trips one of these on a previously working workload (e.g. an exotic wire-compatible fork that advertises a larger `NotificationResponse` payload than upstream PostgreSQL) is real but small. Defaulting to `warn` means an upgrade cannot silently turn such a workload into hard connection failures; the log breadcrumb makes a false positive easy to identify, report, and patch. Operators who want fail-fast safety against a hostile or buggy backend opt into `fail` explicitly.

### When to switch to `fail` or `disable`

* `fail`: explicit opt-in for fail-fast behaviour on the four soft caps. A genuinely-too-large message results in a connection-level error and the marked-broken socket is evicted from the pool, preventing the next caller from inheriting a corrupted stream.
* `disable`: for users who hit a false positive on a soft cap, cannot wait for an upstream fix, and want the symptom suppressed even in logs. Discouraged. The unconditional checks listed in [What is checked unconditionally](#what-is-checked-unconditionally) remain in effect; `disable` only relaxes the four soft caps.

Sites that were already throwing before #4015 (`Session setup failed`, SSL/GSS handshake errors) are not affected by the flag either. They are real errors, not false positives the user can opt out of.

### Backward compatibility

The default `warn` mode preserves the pre-#4015 read path for the four soft caps: well-formed traffic works exactly as before, and a backend message that exceeds a soft cap but stays under `MAX_MESSAGE_SIZE` and `maxResultBuffer` is processed with a `WARNING` log line that names the violated cap. An upgrade cannot turn a previously working workload into hard connection failures over this set.

The unconditional set is new behaviour: a workload that previously hit one of these scenarios would have crashed with `NegativeArraySizeException`, desynced on the next message, or hung indefinitely on a socket read. The new code surfaces a clean `IOException` instead and evicts the connection from the pool. No production workload tripped these scenarios on well-formed traffic, so the change is invisible to well-behaved deployments.

The system property is opt-in. Users who want fail-fast safety against a hostile or buggy backend set `-Dpgjdbc.protocolHardeningMode=fail`.

### Tests

* `VisibleBufferedInputStreamTest` covers the bounded NUL-scan, the partial-read resume path, the position tracker across compaction, and the budget-exceeded throw.
* `ProtocolHardeningModeTest` covers all three modes, the silence-hint suffix, both exception factories (`IOException` and `PSQLException`), the case-insensitive parser, the unknown-value fallback to `warn` (the default), four unconditional-check fixtures verified to fire in every mode (DataRow field overrun, DataRow negative field length, `maxResultBuffer` overrun, and `endMessage` envelope mismatch), and two C-string-scan failures verified to set the broken flag at the throw site (positive-budget overrun, EOF mid-scan).