[pgjdbc/pgjdbc] PR #4016: feat: harden protocol reader against desynced streams (#4015)

pgjdbc/pgjdbc GitHub issues and pull requests (mirror)  
help / color / mirror / Atom feed

[pgjdbc/pgjdbc] PR #4016: feat: harden protocol reader against desynced streams (#4015)
21+ messages / 3 participants
[nested] [flat]

* [pgjdbc/pgjdbc] PR #4016: feat: harden protocol reader against desynced streams (#4015)
@ 2026-04-10 13:28  "vlsi (@vlsi)" <[email protected]>
  0 siblings, 0 replies; 21+ messages in thread

From: vlsi (@vlsi) @ 2026-04-10 13:28 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

### Why

If the response stream desyncs (corrupted bytes, MITM, a buggy server, a wire-incompatible fork), the pre-#4015 v3 reader in `PGStream.receiveTupleV3()` trusts the 4-byte field length and immediately does `new byte[size]`. On a corrupted length this becomes a roughly 1.7 GB allocation followed by an indefinite block in `SocketInputStream.read0` waiting for bytes that never arrive. The connection is never returned to the pool.

The blast radius is worse than the per-connection cost. A broken reader stays in the pool's "open and not yet known to be broken" set, gets handed to the next caller, and corrupts that caller's query as well. Every length-prefixed v3 message had the same shape before #4015: trust the wire-provided length to drive an allocation, a skip, or a buffer scan.

### What is checked unconditionally

The following checks fire on every connection in every mode. The `pgjdbc.protocolHardeningMode` system property has no effect on them:

* **`MAX_MESSAGE_SIZE = 0x3fffffff` absolute cap.** No backend message may exceed this length. The constant is the backend's `MaxAllocSize` and is shared by every PostgreSQL-compatible fork (CockroachDB, YugabyteDB, Redshift, Greenplum, Timescale).
* **User-set ceiling.** If `maxResultBuffer` is configured, no single backend message may exceed it. The user set the ceiling on purpose; the driver enforces it before allocating, not after.
* **Bounded C-string scans.** Every NUL-terminated string read is bounded by either the remaining envelope of the current message or by `MAX_MESSAGE_SIZE`. A C-string read that begins with zero remaining envelope is rejected.
* **DataRow per-field prefixes do not fit the envelope.** A row whose `nf * 4` per-field length prefixes alone exceed the message size is rejected.
* **DataRow field overruns the row envelope.** A field whose declared length exceeds the bytes still available in the row is rejected. This is the exact issue #4015 scenario.
* **DataRow or FunctionCallResponse per-field length below -1.** The wire protocol assigns meaning only to -1 (NULL) and to non-negative values; any other negative leaves the driver no way to decode the field.
* **FunctionCallResponse value overruns the message envelope.** A `valueLen` exceeding `msgLen - 8` is rejected (single-value variant of the issue #4015 row overrun).
* **RowDescription cannot fit the 19-byte protocol minimum per field.** A declared field count whose 19-byte-per-field minimum exceeds the message envelope is rejected.
* **NegotiateProtocolVersion option count overruns the envelope.** Each option is at minimum a NUL byte, so an option count above the remaining envelope is rejected.
* **GSS encrypted packet length outside `[1, 16380]`.** Mirrors the backend's `PQ_GSS_RECV_BUFFER_SIZE - sizeof(uint32)` hard limit. Enforced in `GSSInputStream` (for post-handshake traffic) and in `pgStream.readMessageLength("GSSEncryptionHandshakeToken", 0, 16380)` (for the handshake token).
* **Protocol-derived hard caps on message length.** Each call to `pgStream.readMessageLength(name, min, max)` enforces both bounds unconditionally. `max` is reserved for the value the protocol itself fixes, not for a pgjdbc-applied soft cap: `BackendKeyData ≤ 8 + 256` (the v3.2 cancel-key maximum), `CopyResponse ≤ 7 + 2 * Short.MAX_VALUE` (envelope is fully determined by the signed-int16 field count), and `ParameterDescription ≤ 6 + 4 * Short.MAX_VALUE` (same shape). Each cap is shadowed downstream by an exact-equality or negative-count check that would also reject the same bad message; the readMessageLength check is the fast-path rejection before any payload is read.
* **Fixed-length acknowledgements with the wrong size.** `ParseComplete`, `BindComplete`, `CloseComplete`, `NoData`, `PortalSuspended`, `EmptyQueryResponse`, and `ReadyForQuery` are verified against their protocol-defined exact length. Messages whose total length is fully determined by an inner int16 count (`ParameterDescription`, `CopyResponse`) are likewise verified for exact match. A length mismatch means the next reader is misaligned with the wire; continuing would parse the next message's header from inside this message's body.
* **Envelope mismatches.** Each `readMessageLength` call captures the declared envelope; subsequent reads bound themselves against the remaining envelope, and a paired `endMessage()` asserts the declared body was consumed exactly. A mismatch has the same desync consequence as a wrong fixed-length above.
* **Negative signed-int16 counts.** `DataRow.nf`, `RowDescription.size`, `CopyResponse.numFields`, `ParameterDescription.numParams`, `NegotiateProtocolVersion.numOptions`. A negative count here is not a useful compatibility escape hatch: even if a fork treated the slot as unsigned int16 and sent a value above 32767, pgjdbc currently reads it as signed and then uses it as an array size, a loop bound, or an envelope-arithmetic input. Continuing after `WARN` would not make that fork work; it would either fail later, allocate incorrectly, skip checks, or desync. A real "unsigned int16 fork" would need an intentional compatibility change in pgjdbc to read those slots differently, not a `WARN`-and-continue.
* **Authentication round-trip cap (64).** A hostile server must not be able to loop the client indefinitely on SCRAM, GSS, or SSPI continuations. The cap is a hard ceiling on pre-auth dialogue, not a tightening of the protocol that a fork could legitimately need to relax; `WARN`-and-continue would leave the loop unbounded, defeating the cap.

These checks share two properties: each detects a value that is mathematically impossible on a well-formed stream or would leave the reader misaligned with the wire, and relaxing any of them would either re-introduce the original #4015 bug or swap a clean `IOException` for a more confusing crash (`NegativeArraySizeException`, desync on the next message, indefinite socket-read hang) one or two lines later. There is no setting that disables them.

On a violation, `PGStream` is marked broken. An internal flag is set; `isClosed()` returns `true` thereafter, so a pool that tests on borrow discards the connection. The socket is closed with `SO_LINGER 0` so the TCP `RST` is immediate. The caller sees an `IOException`, or a `PSQLException` with `PSQLState.PROTOCOL_VIOLATION` for the pre-auth path.

### What is checked by default, relaxable by the flag

Four pgjdbc-applied soft caps remain under the flag. Each is a length ceiling pgjdbc applies on top of the documented protocol, where the protocol itself does not fix a maximum and a wire-compatible fork could legitimately advertise more. These run by default at `WARNING` level; the `pgjdbc.protocolHardeningMode` system property lets the user switch them to fail-fast (`=fail`) or silence them entirely (`=disable`).

* **`NotificationResponse ≤ 1 MiB`** — PostgreSQL's `NOTIFY` payload is `NOTIFY_PAYLOAD_MAX_LENGTH` = 8000 bytes and channel names are `NAMEDATALEN`-bounded (64 bytes), so a well-formed message tops out under 8 KiB. The 1 MiB cap leaves >100× headroom.
* **`ParameterStatus ≤ 1 MiB`** — names are `NAMEDATALEN`-bounded; values are short GUC strings in practice.
* **`AuthenticationRequest ≤ 8 + 2 MiB`** — payload is the SCRAM/MD5/GSS continuation. Kerberos tokens reach a few hundred KB in pathological nested-group cases; the 2 MiB cap leaves >30× headroom.
* **`AuthenticationGSSContinue ≤ 8 + 2 MiB`** — payload is the GSS continuation token. Same shape as the SCRAM/GSS branch above.

Each soft cap is implemented as an inline check after `readMessageLength`, routed through `pgStream.failOnDesync(...)`:

```java
int len = pgStream.readMessageLength("NotificationResponse", 10);  // unconditional [10, MAX_MESSAGE_SIZE]
if (len > 1024 * 1024) {
  pgStream.failOnDesync(IOException::new, GT.tr(
      "Protocol error. NotificationResponse length {0} exceeds the 1 MiB pgjdbc cap.", len));
}
```

A fork that advertises a larger value for any of those caps would trip the inline check on otherwise well-formed traffic. The flag lets the user accept the larger value for that specific deployment without rebuilding the driver.

### How to configure

The JVM system property `pgjdbc.protocolHardeningMode` selects what happens on a relaxable-check failure. It is read once at driver class-load time and applies to every connection the JVM opens. Default: `warn`.

| Value | Behaviour |
|---|---|
| `fail` | Mark the `PGStream` broken and throw a connection-level error. The thrown message includes a hint pointing at this property and at the issue tracker, so that an SRE or AI assistant triaging the exception has actionable next steps without reading the source. Opt in to fail fast on a desynced or hostile backend. |
| `warn` (default) | Log the violation at `WARNING` with the exception attached (the stack trace points at the reader site that detected it). The caller continues with the suspect value, restoring the pre-#4015 read path. The driver may still crash later (`NegativeArraySizeException`, hang on socket read) if the data is genuinely corrupted, but the log breadcrumb lets the user confirm whether the check is firing on real or pathological-but-benign traffic. |
| `disable` | Silently skip the check. Discouraged; prefer `warn` so that the symptom remains visible in logs. |

Configure on the JVM command line:

```
-Dpgjdbc.protocolHardeningMode=fail
```

Example error message in `fail` mode:

```
Protocol error. DataRow has negative field count -1 (message size 5).
If you are sure this is a false positive, you can silence the error by
setting -Dpgjdbc.protocolHardeningMode=warn (log only) or
-Dpgjdbc.protocolHardeningMode=disable (no logging). Consider
filing a bug report at https://github.com/pgjdbc/pgjdbc/issues.
```

### Why `warn` is the default

The four soft caps are conservative pgjdbc-side ceilings; a real desync is very unlikely to slip through one of them. The risk that an upgrade trips one of these on a previously working workload (e.g. an exotic wire-compatible fork that advertises a larger `NotificationResponse` payload than upstream PostgreSQL) is real but small. Defaulting to `warn` means an upgrade cannot silently turn such a workload into hard connection failures; the log breadcrumb makes a false positive easy to identify, report, and patch. Operators who want fail-fast safety against a hostile or buggy backend opt into `fail` explicitly.

### When to switch to `fail` or `disable`

* `fail`: explicit opt-in for fail-fast behaviour on the four soft caps. A genuinely-too-large message results in a connection-level error and the marked-broken socket is evicted from the pool, preventing the next caller from inheriting a corrupted stream.
* `disable`: for users who hit a false positive on a soft cap, cannot wait for an upstream fix, and want the symptom suppressed even in logs. Discouraged. The unconditional checks listed in [What is checked unconditionally](#what-is-checked-unconditionally) remain in effect; `disable` only relaxes the four soft caps.

Sites that were already throwing before #4015 (`Session setup failed`, SSL/GSS handshake errors) are not affected by the flag either. They are real errors, not false positives the user can opt out of.

### Backward compatibility

The default `warn` mode preserves the pre-#4015 read path for the four soft caps: well-formed traffic works exactly as before, and a backend message that exceeds a soft cap but stays under `MAX_MESSAGE_SIZE` and `maxResultBuffer` is processed with a `WARNING` log line that names the violated cap. An upgrade cannot turn a previously working workload into hard connection failures over this set.

The unconditional set is new behaviour: a workload that previously hit one of these scenarios would have crashed with `NegativeArraySizeException`, desynced on the next message, or hung indefinitely on a socket read. The new code surfaces a clean `IOException` instead and evicts the connection from the pool. No production workload tripped these scenarios on well-formed traffic, so the change is invisible to well-behaved deployments.

The system property is opt-in. Users who want fail-fast safety against a hostile or buggy backend set `-Dpgjdbc.protocolHardeningMode=fail`.

### Tests

* `VisibleBufferedInputStreamTest` covers the bounded NUL-scan, the partial-read resume path, the position tracker across compaction, and the budget-exceeded throw.
* `ProtocolHardeningModeTest` covers all three modes, the silence-hint suffix, both exception factories (`IOException` and `PSQLException`), the case-insensitive parser, the unknown-value fallback to `warn` (the default), four unconditional-check fixtures verified to fire in every mode (DataRow field overrun, DataRow negative field length, `maxResultBuffer` overrun, and `endMessage` envelope mismatch), and two C-string-scan failures verified to set the broken flag at the throw site (positive-budget overrun, EOF mid-scan).

^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #4016: feat: harden protocol reader against desynced streams (#4015)
@ 2026-04-11 09:48  "vlsi (@vlsi)" <[email protected]>
  19 siblings, 0 replies; 21+ messages in thread

From: vlsi (@vlsi) @ 2026-04-11 09:48 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

This is still WIP, however, I would like to know if we want adding more debug information to "desync detected" exception messages.

For instance, here's one of the failures:

```
        Caused by: java.io.IOException: Protocol error. C-string exceeds remaining message budget of 30 bytes.
            at org.postgresql.core.VisibleBufferedInputStream.scanCStringLength(VisibleBufferedInputStream.java:398)
            at org.postgresql.core.PGStream.receiveBoundedCanonicalStringIfPresent(PGStream.java:711)
            at org.postgresql.core.v3.QueryExecutorImpl.receiveParameterStatus(QueryExecutorImpl.java:3103)
            at org.postgresql.core.v3.QueryExecutorImpl.readStartupMessages(QueryExecutorImpl.java:3080)
            at org.postgresql.core.v3.QueryExecutorImpl.<init>(QueryExecutorImpl.java:237)
```

Should we add something like "the latest bytes are ..."?
I have mixed feelings about it:
1) It might be a security issue (e.g. the bytes might easily contain sensitive info)
2) It might be useless for the analysis
3) It might be helpful for the analysis (e.g. like in https://github.com/pgjdbc/pgjdbc/issues/4015)

^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #4016: feat: harden protocol reader against desynced streams (#4015)
@ 2026-04-11 10:15  "davecramer (@davecramer)" <[email protected]>
  19 siblings, 0 replies; 21+ messages in thread

From: davecramer (@davecramer) @ 2026-04-11 10:15 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

> This is still WIP, however, I would like to know if we want adding more debug information to "desync detected" exception messages.
> 
> For instance, here's one of the failures:
> 
> ```
>         Caused by: java.io.IOException: Protocol error. C-string exceeds remaining message budget of 30 bytes.
>             at org.postgresql.core.VisibleBufferedInputStream.scanCStringLength(VisibleBufferedInputStream.java:398)
>             at org.postgresql.core.PGStream.receiveBoundedCanonicalStringIfPresent(PGStream.java:711)
>             at org.postgresql.core.v3.QueryExecutorImpl.receiveParameterStatus(QueryExecutorImpl.java:3103)
>             at org.postgresql.core.v3.QueryExecutorImpl.readStartupMessages(QueryExecutorImpl.java:3080)
>             at org.postgresql.core.v3.QueryExecutorImpl.<init>(QueryExecutorImpl.java:237)
> ```
> 
> Should we add something like "the latest bytes are ..."? I have mixed feelings about it:
> 
> 1. It might be a security issue (e.g. the bytes might easily contain sensitive info)
> 2. It might be useless for the analysis
> 3. It might be helpful for the analysis (e.g. like in [receiveTupleV3 allocates huge byte[] due to misread 4-byte field length under high concurrency #4015](https://github.com/pgjdbc/pgjdbc/issues/4015))

Not sure how it would be helpful. I would imagine that replicating such a failure would be difficult. These failures should be quite random so I doubt that subsequent failures would be the same. Plus I agree with the security issue.

^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #4016: feat: harden protocol reader against desynced streams (#4015)
@ 2026-05-04 15:32  "vlsi (@vlsi)" <[email protected]>
  19 siblings, 0 replies; 21+ messages in thread

From: vlsi (@vlsi) @ 2026-05-04 15:32 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

~I think this requires a minor version bump.~
UPD: I think the change is safe to merge in patch version as it should not harm benign workflows.

The tests pass, and I am happy with the added hardenings.

Comments are welcome.

@duwenice, it would be great if you could run the hardened build in your environment to check it it detects violations or introduces false positives.

^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #4016: feat: harden protocol reader against desynced streams (#4015)
@ 2026-05-23 08:34  "vlsi (@vlsi)" <[email protected]>
  19 siblings, 0 replies; 21+ messages in thread

From: vlsi (@vlsi) @ 2026-05-23 08:34 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

I have added system property to configure the failure behariour: fail, warn, disable. That enables us shipping the feature in a patch release if we go with "no-harden by default" mode

^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #4016: feat: harden protocol reader against desynced streams (#4015)
@ 2026-05-23 12:37  "vlsi (@vlsi)" <[email protected]>
  19 siblings, 0 replies; 21+ messages in thread

From: vlsi (@vlsi) @ 2026-05-23 12:37 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

This PR has been open since April 10 and I think it's ready to merge.

The most material recent change: warn is now the default for pgjdbc.protocolViolationBehaviour, so an upgrade cannot turn a previously-working workload into hard connection failures. Issue [#4015](https://github.com/pgjdbc/pgjdbc/issues/4015) itself and a handful of impossible-value checks always fail fast in every mode (they would otherwise reach a NegativeArraySizeException or an indefinite socket read a few lines later). Operators who want broader fail-fast safety opt into fail explicitly. See the updated PR description for the full mode table and rationale.

Planning to merge on Tuesday 26 May unless there's further feedback.

^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #4016: feat: harden protocol reader against desynced streams (#4015)
@ 2026-05-23 13:49  "sehrope (@sehrope)" <[email protected]>
  19 siblings, 0 replies; 21+ messages in thread

From: sehrope (@sehrope) @ 2026-05-23 13:49 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

I'm planning on reviewing this in the next few days.

My initial take is that protocol validation and hardening is a great idea. It's arguably less of an issue for a language like Java vs something memory unsafe (e.g., reading past buffer length in C), but still a great addition and would catch all kinds of odd server bugs too.

However, I think if such a violation is discovered it should just abort the connection. No options to disable or turn it into a warning. If the protocol is broken we should not try to continue as we've likely violated our own assumptions and invariants.

I get the desire to have incremental additions and not risk breaking people, but this should not break anyone if things work like they're supposed to. And anything it catches would be undefined behavior.

Anyway, I'll have more thoughts when I actually step through the details.

^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #4016: feat: harden protocol reader against desynced streams (#4015)
@ 2026-05-23 18:54  "davecramer (@davecramer)" <[email protected]>
  19 siblings, 0 replies; 21+ messages in thread

From: davecramer (@davecramer) @ 2026-05-23 18:54 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

(on pgjdbc/src/main/java/org/postgresql/core/PGStream.java:113)

"Set to {@code true} the first time" — consider "Set to {@code true} on the first time" or, more naturally, "Becomes {@code true} when a protocol-level hardening check first rejects a backend message."

^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #4016: feat: harden protocol reader against desynced streams (#4015)
@ 2026-05-23 18:54  "davecramer (@davecramer)" <[email protected]>
  19 siblings, 0 replies; 21+ messages in thread

From: davecramer (@davecramer) @ 2026-05-23 18:54 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

(on pgjdbc/src/main/java/org/postgresql/core/v3/ConnectionFactoryImpl.java:828)

The double-space in "Protocol error.  Session setup failed." is preserved from the original code, but since this is a user-facing error message, it might be worth fixing to a single space while you're touching these lines. (Appears in 3 places.)

^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #4016: feat: harden protocol reader against desynced streams (#4015)
@ 2026-05-23 18:54  "davecramer (@davecramer)" <[email protected]>
  19 siblings, 0 replies; 21+ messages in thread

From: davecramer (@davecramer) @ 2026-05-23 18:54 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

(on pgjdbc/src/main/java/org/postgresql/core/PGStream.java:171)

Nit: "Throws {@link IOException} when the caller read fewer or more body bytes" — "read" should be "has read" or restructure to "when fewer or more body bytes were read than the message envelope declared".

^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #4016: feat: harden protocol reader against desynced streams (#4015)
@ 2026-05-23 18:54  "davecramer (@davecramer)" <[email protected]>
  19 siblings, 0 replies; 21+ messages in thread

From: davecramer (@davecramer) @ 2026-05-23 18:54 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

(on pgjdbc/src/main/java/org/postgresql/core/ProtocolViolationBehaviour.java)

The class Javadoc says "Default: {@link #WARN}" but the PR description says "Default: `fail`". The code confirms WARN is the default (`fromSystemProperty()` returns WARN on null/empty). The PR description's table has a discrepancy:

> | `fail` (default) | Poison `PGStream` and throw...

One of these is wrong. Looking at the code, WARN is clearly the default. The PR description's table should mark `warn` as the default, not `fail`.

^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #4016: feat: harden protocol reader against desynced streams (#4015)
@ 2026-05-23 19:01  "vlsi (@vlsi)" <[email protected]>
  19 siblings, 0 replies; 21+ messages in thread

From: vlsi (@vlsi) @ 2026-05-23 19:01 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

(on pgjdbc/src/main/java/org/postgresql/core/ProtocolViolationBehaviour.java)

@davecramer , are you sure you reviewed the latest version of the PR?
I have updated the PR description 6 hours ago, and PR description reads "warn (default)"

^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #4016: feat: harden protocol reader against desynced streams (#4015)
@ 2026-05-23 20:13  "davecramer (@davecramer)" <[email protected]>
  19 siblings, 0 replies; 21+ messages in thread

From: davecramer (@davecramer) @ 2026-05-23 20:13 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

(on pgjdbc/src/main/java/org/postgresql/core/PGStream.java)

Set to makes more sense 

^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #4016: feat: harden protocol reader against desynced streams (#4015)
@ 2026-05-24 08:16  "vlsi (@vlsi)" <[email protected]>
  19 siblings, 0 replies; 21+ messages in thread

From: vlsi (@vlsi) @ 2026-05-24 08:16 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

(on pgjdbc/src/main/java/org/postgresql/core/v3/ConnectionFactoryImpl.java:828)

`"Protocol error. Session setup failed."` is mentioned in `.po` translation files as well. Do you suggest I should include that modification into the current PR?

^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #4016: feat: harden protocol reader against desynced streams (#4015)
@ 2026-05-26 01:51  "sehrope (@sehrope)" <[email protected]>
  19 siblings, 0 replies; 21+ messages in thread

From: sehrope (@sehrope) @ 2026-05-26 01:51 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

This is a pretty invasive PR and I wouldn't want this merged without extensive review, understanding, and testing by someone else. For an idea of the size, the diffs for `PGStream` and `QueryExecutorImpl` are so large that GitHub does not even shown them by default.

On the naming, I don't like "poison". It's not appropriate as that's not what that word means. I don't even think that aspect of the PR needs to exist. If there's a protocol violation for reading past the end of a message or a non-terminated string, just close the connection. We don't need to invent some new state for it. The connection would be garbage at that point anyway if it's going to be out of sync with the FEBE message flow. Just treat the same as we'd treat a socket error and close the connection.

Overall I like the idea and I think that once we're comfortable with something like this, it should just be on. There's no point in having enablement modes or a "debug" mode. If it's actually catching a broken wire protocol it does not make sense to keep processing anything. We're either comfortable with the hardening changes or we're not. It's not a half measure situation.

This goal may be better reached by splitting up the message tracking and more extensive checks. As a first step adding some plumbing for tracking the current message type + size. And propagating that through the code base. That's kind of done with the `pgStream.readMessageLength(...)`. As that's a known set of messages, it should take an enum, not an arbitrary string.

Once that's in place, expanding things to include intra-message length checks when reading fields from the message (e.g, reading C-strings). That would make it a lot easier for someone to understand all the places that are being changed.

Some of the other changes could go in on their own too. Like the buffer size changes to pgjdbc/src/main/java/org/postgresql/gss/GSSInputStream.java.

^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #4016: feat: harden protocol reader against desynced streams (#4015)
@ 2026-05-26 05:54  "vlsi (@vlsi)" <[email protected]>
  19 siblings, 0 replies; 21+ messages in thread

From: vlsi (@vlsi) @ 2026-05-26 05:54 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

(on pgjdbc/src/main/java/org/postgresql/core/v3/ConnectionFactoryImpl.java:828)

There are many double space after colon patterns in the **source** messages, and they should be addressed separately. It would be a mechanical change with a regexp.

^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #4016: feat: harden protocol reader against desynced streams (#4015)
@ 2026-05-26 06:28  "vlsi (@vlsi)" <[email protected]>
  19 siblings, 0 replies; 21+ messages in thread

From: vlsi (@vlsi) @ 2026-05-26 06:28 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

> There's no point in having enablement modes

The `pgjdbc.protocolViolationBehaviour` property does not relax protocol invariants. Checks that prove the stream is desynced remain fatal in every mode. The property only affects defensive hardening bounds where the PostgreSQL protocol does not define a strict maximum, such as the `1 MiB` caps for `NotificationResponse` and `ParameterStatus`. Those caps are intended to catch desyncs early, but because they are pgjdbc-chosen compatibility limits rather than protocol limits, the default is `warn`.

^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #4016: feat: harden protocol reader against desynced streams (#4015)
@ 2026-05-26 06:41  "vlsi (@vlsi)" <[email protected]>
  19 siblings, 0 replies; 21+ messages in thread

From: vlsi (@vlsi) @ 2026-05-26 06:41 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

I can split the self-contained `GSSInputStream` packet-size cap if that helps, but the other GSS changes use the same `readMessageLength(...)` plumbing as the rest of the PR, so splitting those would create dependent PRs rather than independent review units.

I agree a structured descriptor may be better eventually. I intentionally kept `readMessageLength(String, ...)` narrow in this PR because the name is only diagnostic text, not protocol dispatch. Adding an enum raises separate design questions around frontend/backend direction, startup/auth state, replication, and context-specific messages. I am trying to keep this PR focused on enforcing bounds, not redesigning message metadata.

^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #4016: feat: harden protocol reader against desynced streams (#4015)
@ 2026-05-26 08:50  "vlsi (@vlsi)" <[email protected]>
  19 siblings, 0 replies; 21+ messages in thread

From: vlsi (@vlsi) @ 2026-05-26 08:50 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

I pushed an update that narrows the configurable part further.

The current split is:

* protocol/envelope invariants always fail in every mode and mark the stream broken
* `pgjdbc.protocolHardeningMode` only applies to four pgjdbc-chosen soft caps:
  * `NotificationResponse > 1 MiB`
  * `ParameterStatus > 1 MiB`
  * `AuthenticationRequest > 8 + 2 MiB`
  * `AuthenticationGSSContinue > 8 + 2 MiB`

`readMessageLength(name, min, max)` now treats `max` as an unconditional protocol-derived hard cap. Soft caps are implemented as explicit inline checks through `failOnDesync(...)`, so the code mirrors the PR description.

On the review/testing concern: I agree this needs careful review. Could you clarify what concrete testing or review evidence would make this mergeable? For example, would green CI plus the targeted malformed-stream tests in this PR be sufficient, or is there a specific external workload/environment you want exercised?

^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #4016: feat: harden protocol reader against desynced streams (#4015)
@ 2026-05-26 11:16  "davecramer (@davecramer)" <[email protected]>
  19 siblings, 0 replies; 21+ messages in thread

From: davecramer (@davecramer) @ 2026-05-26 11:16 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

> I think this requires a minor version bump.
> 
why ?


^ permalink  raw  reply  [nested|flat] 21+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #4016: feat: harden protocol reader against desynced streams (#4015)
@ 2026-05-26 11:54  "vlsi (@vlsi)" <[email protected]>
  19 siblings, 0 replies; 21+ messages in thread

From: vlsi (@vlsi) @ 2026-05-26 11:54 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

>  a minor version bump

That was a stale comment. Initially I thought the change could break benign workflows, however, now I think it is fine for patch change.
Every mandatory check implemented here comes from the protocol specification. The extra checks like `NotificationResponse ≤ 1 MiB` are "warning by default" as there might be PostgreSQL forks using 1MiB notifications for something useful.

^ permalink  raw  reply  [nested|flat] 21+ messages in thread

end of thread, other threads:[~2026-05-26 11:54 UTC | newest]

Thread overview: 21+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-04-10 13:28 [pgjdbc/pgjdbc] PR #4016: feat: harden protocol reader against desynced streams (#4015) "vlsi (@vlsi)" <[email protected]>
2026-04-11 09:48 ` "vlsi (@vlsi)" <[email protected]>
2026-04-11 10:15 ` "davecramer (@davecramer)" <[email protected]>
2026-05-04 15:32 ` "vlsi (@vlsi)" <[email protected]>
2026-05-23 08:34 ` "vlsi (@vlsi)" <[email protected]>
2026-05-23 12:37 ` "vlsi (@vlsi)" <[email protected]>
2026-05-23 13:49 ` "sehrope (@sehrope)" <[email protected]>
2026-05-23 18:54 ` "davecramer (@davecramer)" <[email protected]>
2026-05-23 18:54 ` "davecramer (@davecramer)" <[email protected]>
2026-05-23 18:54 ` "davecramer (@davecramer)" <[email protected]>
2026-05-23 18:54 ` "davecramer (@davecramer)" <[email protected]>
2026-05-23 19:01 ` "vlsi (@vlsi)" <[email protected]>
2026-05-23 20:13 ` "davecramer (@davecramer)" <[email protected]>
2026-05-24 08:16 ` "vlsi (@vlsi)" <[email protected]>
2026-05-26 01:51 ` "sehrope (@sehrope)" <[email protected]>
2026-05-26 05:54 ` "vlsi (@vlsi)" <[email protected]>
2026-05-26 06:28 ` "vlsi (@vlsi)" <[email protected]>
2026-05-26 06:41 ` "vlsi (@vlsi)" <[email protected]>
2026-05-26 08:50 ` "vlsi (@vlsi)" <[email protected]>
2026-05-26 11:16 ` "davecramer (@davecramer)" <[email protected]>
2026-05-26 11:54 ` "vlsi (@vlsi)" <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox