pgjdbc/pgjdbc GitHub issues and pull requests (mirror)  
help / color / mirror / Atom feed
[pgjdbc/pgjdbc] issue #3757: PreparedStatement.toString() fails for bytea parameters with at least 42.7.7
17+ messages / 3 participants
[nested] [flat]

* [pgjdbc/pgjdbc] issue #3757: PreparedStatement.toString() fails for bytea parameters with at least 42.7.7
@ 2025-08-10 23:11  "rlbdv (@rlbdv)" <[email protected]>
  0 siblings, 0 replies; 17+ messages in thread

From: rlbdv (@rlbdv) @ 2025-08-10 23:11 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

With 42.7.7 this code fails:
```
// assuming: create table test (data bytea)
s = connection.prepareStatement("insert into test (data) values (?)");
obj = PGobject();
obj.setType("bytea");
obj.setValue("\\x01020304");
s.setObject(1, obj);
s.toString()
```
with:
```
org.postgresql.util.PSQLException: Unable to convert bytea parameter at position 0 to literal
 at org.postgresql.core.v3.SimpleParameterList.toString (SimpleParameterList.java:267)
    org.postgresql.core.NativeQuery.toString (NativeQuery.java:73)
    org.postgresql.core.v3.SimpleQuery.toString (SimpleQuery.java:63)
    org.postgresql.core.v3.SimpleQuery.toString (SimpleQuery.java:58)
    org.postgresql.jdbc.PgPreparedStatement.toString (PgPreparedStatement.java:1107)
...
```

Possibly related: #3365 

This problem was discovered after one of PuppetDB's concurrency tests failed. The test was deliberately provoking an expected database transaction serialization exception, and when that exception occurred, it provoked a crash in `PGbytea.toPGLiteral` because it was passed the `PGObject`'s "\x..." `getValue` string and `toPGLiteral` has no handler for `String` (only `byte[]`). As a result, it throws the final "Can't convert String to bytea" `IllegalArgumentException`.

The test ends up in `toPGLiteral` via `BatchResultHandler.handleError()`'s attempt to produce a string via `queries[resultIndex].toString(...)`.

I'm not certain, but I'm moderately sure that the `SimpleParameterList.toString` failure is caused by the same final throw in `toPGLiteral`. 

## Environment:

Debian/trixie
PostgreSQL Debian 17.5-1

openjdk version "21.0.8" 2025-07-15
OpenJDK Runtime Environment (build 21.0.8+9-Debian-1)
OpenJDK 64-Bit Server VM (build 21.0.8+9-Debian-1, mixed mode, sharing)


^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: [pgjdbc/pgjdbc] issue #3757: PreparedStatement.toString() fails for bytea parameters with at least 42.7.7
@ 2025-08-10 23:47  "rlbdv (@rlbdv)" <[email protected]>
  15 siblings, 0 replies; 17+ messages in thread

From: rlbdv (@rlbdv) @ 2025-08-10 23:47 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

I don't know pgjdbc well enough to have a good sense whether this is a plausible fix, but it does appear to resolve the immediate concerns.

In previous pgjdbc releases, a PGobject of type "bytea" and with a "\xHEX" value worked fine. It looks like what's happening now is that the string value is being passed directly to `toPGLiteral`, so this patch just adds a clause to handle it.

```
--- a/pgjdbc/src/main/java/org/postgresql/util/PGbytea.java
+++ b/pgjdbc/src/main/java/org/postgresql/util/PGbytea.java
@@ -182,6 +182,16 @@ public class PGbytea {
    * @throws IOException in case there's underflow in the input value
    */
   public static String toPGLiteral(Object value, SqlSerializationContext context) throws IOException {
+
+    if (value instanceof String) {
+      String str = (String) value;
+      StringBuilder sb = new StringBuilder(str.length() + 9);
+      sb.append("'");
+      sb.append(str);
+      sb.append("'::bytea");
+      return sb.toString();
+    }
+
     if (value instanceof byte[]) {
       byte[] bytes = (byte[]) value;
       StringBuilder sb = new StringBuilder(bytes.length * 2 + 11);
```


^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: [pgjdbc/pgjdbc] issue #3757: PreparedStatement.toString() fails for bytea parameters with at least 42.7.7
@ 2025-08-11 07:22  "vlsi (@vlsi)" <[email protected]>
  15 siblings, 0 replies; 17+ messages in thread

From: vlsi (@vlsi) @ 2025-08-11 07:22 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

@rlbdv , the analysis sounds quite right, however, there's an issue with the current way of handling literals.

The important point is that `org.postgresql.core.v3.SimpleParameterList#toString(int, org.postgresql.core.v3.SqlSerializationContext)` should not fall into SQL-injection issues.

Currently, every unknown literal goes through `org.postgresql.core.v3.SimpleParameterList#quoteAndCast` which ensures the literal is quoted.

However, it is not clear how to make it work with user-provided literals which might already be quoted.
For instance, in your example you provide `pgObject.setValue("\\x01020304")` which is a [hex format](https://www.postgresql.org/docs/current/datatype-binary.html#DATATYPE-BINARY-BYTEA-HEX-FORMAT) encoding for `bytea`. Apparently, the driver should not quote `\`, so wrapping the value with `quoteAndCast` would yield an invalid literal of `\\x01020304`. On the other hand, if we blindly trust user-provided literal, it might introduce SQL-injection.

It looks like `bytea` is an exception thanks to its two special encodings.

I would suggest the following:
a) If the literal starts with `\x`, then assume it is `hex format`. Then verify the rest of the string includes only hex digits or whitespace (see PG documentation). If the verification fails, throw an error.
b) If the literal does not start with `\x`, then assume it is `bytea escape format`. In this case, apply the regular `SimpleParameterList#quoteAndCast` logic.

In other words, there should be an extra `if (value instanceof String && !value.startsWith("\\x"))`-like check before `return PGbytea.toPGLiteral` in `SimpleParameterList`.

---

Of course, it would be a non-issue if one uses `org.postgresql.jdbc.PgPreparedStatement#setBytes`. At the same time, it might be a nice idea to have something like `PGobject` which would allow users to pass (in and out) binary-encoded values (I guess it was requested already some time ago).

^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: [pgjdbc/pgjdbc] issue #3757: PreparedStatement.toString() fails for bytea parameters with at least 42.7.7
@ 2025-08-11 16:27  "rlbdv (@rlbdv)" <[email protected]>
  15 siblings, 0 replies; 17+ messages in thread

From: rlbdv (@rlbdv) @ 2025-08-11 16:27 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

Oh, *of course*.

And just to be sure it's clear, previously, PGobjects with type "bytea" and values of "\xHEX" did work, (PuppetDB has relied on that in various places). Though did pgjdbc actually intend to support that?

In any case, if support is desirable now, then I think I understand what you're suggesting, and I'll likely pursue it. Although you may have to bear with me -- my Java is somewhat rusty.


^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: [pgjdbc/pgjdbc] issue #3757: PreparedStatement.toString() fails for bytea parameters with at least 42.7.7
@ 2025-08-11 16:59  "rlbdv (@rlbdv)" <[email protected]>
  15 siblings, 0 replies; 17+ messages in thread

From: rlbdv (@rlbdv) @ 2025-08-11 16:59 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

Would it be plausible to start with just support for the simpler hex format?

If so, here's a very pedestrian attempt. Happy to make adjustments, and/or to add some tests, if this seems reasonable and I have time to figure that out.

```java
    if (value instanceof String) {
      String str = (String) value;
      StringBuilder sb = new StringBuilder(str.length() + 9);

      if (!str.startsWith("\\x"))
        throw new IllegalArgumentException(GT.tr("bytea string parameters must be hex format"));

      sb.append("'\\x");

      int i = 2;
      while (i < str.length()) {
        while (Character.isWhitespace(str.charAt(i)))
          i++;
        if (i == str.length())
          break;
        if (i + 2 > str.length())
          throw new IllegalArgumentException(GT.tr("Truncated bytea hex format"));
        char c1 = str.charAt(i);
        char c2 = str.charAt(i + 1);
        if ("0123456789abcdefABCDEF".indexOf(c1) == -1)
            throw new IllegalArgumentException(GT.tr("Invalid bytea hex format character {0}", c1));
        if ("0123456789abcdefABCDEF".indexOf(c2) == -1)
            throw new IllegalArgumentException(GT.tr("Invalid bytea hex format character {0}", c2));
        sb.append(c1);
        sb.append(c2);
        i += 2;
      }

      sb.append("'::bytea");
      return sb.toString();
    }
```


^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: [pgjdbc/pgjdbc] issue #3757: PreparedStatement.toString() fails for bytea parameters with at least 42.7.7
@ 2025-08-11 18:28  "vlsi (@vlsi)" <[email protected]>
  15 siblings, 0 replies; 17+ messages in thread

From: vlsi (@vlsi) @ 2025-08-11 18:28 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

Here's my attempt to optimize it via ChatGPT:

https://chatgpt.com/share/689a3548-60f4-800f-bab8-6550415687c4

That was "5 thinking" model, and I still had to ask it multiple times to remove allocations.

At the same time, it was able to apply a hack from Netty when I mentioned the relevant netty PR. I wish it was able to know that trick on its own though.

Claude Opus 4.1 generates pretty much the same thing (it does care to suggest an implementation to validate and an implementation to parse the hex): https://claude.ai/share/92589422-3a8f-4516-beee-c830591e02cc

I haven't benchmarked the implementations though.

^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: [pgjdbc/pgjdbc] issue #3757: PreparedStatement.toString() fails for bytea parameters with at least 42.7.7
@ 2025-08-11 18:51  "rlbdv (@rlbdv)" <[email protected]>
  15 siblings, 0 replies; 17+ messages in thread

From: rlbdv (@rlbdv) @ 2025-08-11 18:51 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

Are "bytea" PGobjects with "\xHEX" syntax something pgjdbc intended to support, or was that support just incidental before? If the latter, and it's not documented / commonly used (perhaps it's just PuppetDB?), then I assume another option might be to avoid the additional complexity by just rejecting them in a way that doesn't crash toString.

^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: [pgjdbc/pgjdbc] issue #3757: PreparedStatement.toString() fails for bytea parameters with at least 42.7.7
@ 2025-08-11 19:14  "davecramer (@davecramer)" <[email protected]>
  15 siblings, 0 replies; 17+ messages in thread

From: davecramer (@davecramer) @ 2025-08-11 19:14 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

> Are "bytea" PGobjects with "\xHEX" syntax something pgjdbc intended to support, or was that support just incidental before? If the latter, and it's not documented / commonly used (perhaps it's just PuppetDB?), then I assume another option might be to avoid the additional complexity by just rejecting them in a way that doesn't crash toString.

No, \xHEX are not special see https://www.postgresql.org/docs/current/datatype-binary.html

^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: [pgjdbc/pgjdbc] issue #3757: PreparedStatement.toString() fails for bytea parameters with at least 42.7.7
@ 2025-08-11 20:22  "rlbdv (@rlbdv)" <[email protected]>
  15 siblings, 0 replies; 17+ messages in thread

From: rlbdv (@rlbdv) @ 2025-08-11 20:22 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

To clarify, what I meant was, was this intended to be a supported way to store a bytea value:
```
obj = PGObject();
obj.setType("bytea");
obj.setValue("\\xabcdef");
prepared_statement.setObject(i, obj);
```
(Previously it would "work", but now it (eventually) provokes the exception.)

^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: [pgjdbc/pgjdbc] issue #3757: PreparedStatement.toString() fails for bytea parameters with at least 42.7.7
@ 2025-08-11 20:36  "davecramer (@davecramer)" <[email protected]>
  15 siblings, 0 replies; 17+ messages in thread

From: davecramer (@davecramer) @ 2025-08-11 20:36 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

It should work. 

^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: [pgjdbc/pgjdbc] issue #3757: PreparedStatement.toString() fails for bytea parameters with at least 42.7.7
@ 2025-08-11 21:53  "davecramer (@davecramer)" <[email protected]>
  15 siblings, 0 replies; 17+ messages in thread

From: davecramer (@davecramer) @ 2025-08-11 21:53 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

I just tried 
```
public static void main(String[] args) throws Exception {
        try (Connection connection = DriverManager.getConnection(DB_URL, DB_USERNAME, DB_PASSWORD)) {
            try (Statement statement = connection.createStatement()) {
                statement.execute("create table if not exists btable(b bytea)");
            }
            try (PreparedStatement pstmt = connection.prepareStatement("insert into btable(b) values(?)")) {
                PGobject pg = new PGobject();
                pg.setType("bytea");
                pg.setValue("\\xabcdef");
                pstmt.setObject(1, pg);
                pstmt.executeUpdate();

            }
        }
    }
```

with 42.7.7 and it works fine.

^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: [pgjdbc/pgjdbc] issue #3757: PreparedStatement.toString() fails for bytea parameters with at least 42.7.7
@ 2025-08-11 22:50  "rlbdv (@rlbdv)" <[email protected]>
  15 siblings, 0 replies; 17+ messages in thread

From: rlbdv (@rlbdv) @ 2025-08-11 22:50 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

Oh, try `System.err.println(pstmt)` (see other discussion above).

^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: [pgjdbc/pgjdbc] issue #3757: PreparedStatement.toString() fails for bytea parameters with at least 42.7.7
@ 2025-08-11 23:03  "rlbdv (@rlbdv)" <[email protected]>
  15 siblings, 0 replies; 17+ messages in thread

From: rlbdv (@rlbdv) @ 2025-08-11 23:03 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

I was wondering if we could work around the regression without too much difficulty by just switching to `setBytes()`, but didn't see how we might handle a prepared statement parameter that's an array of bytea values without "bytea" PGobjects, offhand.

^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: [pgjdbc/pgjdbc] issue #3757: PreparedStatement.toString() fails for bytea parameters with at least 42.7.7
@ 2025-08-12 09:09  "davecramer (@davecramer)" <[email protected]>
  15 siblings, 0 replies; 17+ messages in thread

From: davecramer (@davecramer) @ 2025-08-12 09:09 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

Amazon Q came up with the same code 
After analyzing the Netty PR #9896, we can apply similar optimizations using lookup tables and bitwise operations. Here's the reworked version:

```java
if (value instanceof String) {
    final String str = (String) value;
    final int length = str.length();
    
    // Fast fail if string is too short
    if (length < 2 || str.charAt(0) != '\\' || str.charAt(1) != 'x') {
        throw new IllegalArgumentException(GT.tr("bytea string parameters must be hex format"));
    }

    // Pre-calculate capacity: prefix('\x') + actual hex digits + suffix('::bytea')
    final StringBuilder sb = new StringBuilder(length + 7);  // Conservative estimate
    sb.append("'\\x");
    
    // Lookup table for hex validation - similar to Netty's approach
    // Using byte[] instead of int[] as we only need 1 bit per entry
    static final byte[] HEX_LOOKUP = new byte[256];
    static {
        // Initialize lookup table for hex characters
        for (char c = '0'; c <= '9'; c++) HEX_LOOKUP[c] = 1;
        for (char c = 'a'; c <= 'f'; c++) HEX_LOOKUP[c] = 1;
        for (char c = 'A'; c <= 'F'; c++) HEX_LOOKUP[c] = 1;
    }

    int i = 2;
    while (i < length) {
        char c = str.charAt(i);
        
        // Skip whitespace using bitwise operation
        // Based on Netty's implementation
        if ((c <= ' ' && ((1L << c) & ((1L << ' ') | (1L << '\t') | (1L << '\r') | (1L << '\n'))) != 0)) {
            i++;
            continue;
        }

        // Check if we have enough characters left
        if (i + 2 > length) {
            throw new IllegalArgumentException(GT.tr("Truncated bytea hex format"));
        }

        // Get hex digits
        final char c1 = c;
        final char c2 = str.charAt(i + 1);

        // Validate hex digits using lookup table
        if (c1 >= HEX_LOOKUP.length || HEX_LOOKUP[c1] == 0) {
            throw new IllegalArgumentException(GT.tr("Invalid bytea hex format character {0}", c1));
        }
        if (c2 >= HEX_LOOKUP.length || HEX_LOOKUP[c2] == 0) {
            throw new IllegalArgumentException(GT.tr("Invalid bytea hex format character {0}", c2));
        }

        sb.append(c1);
        sb.append(c2);
        i += 2;
    }

    sb.append("'::bytea");
    return sb.toString();
}
```

Key improvements based on Netty's PR:

1. More efficient prefix check
   - Direct char comparison instead of `startsWith()`
   - Removes string allocation from `startsWith()`

2. Optimized whitespace checking
   - Uses bitwise operations similar to Netty's implementation
   - Single operation to check for all whitespace characters
   - Avoids method call to `Character.isWhitespace()`

3. Better hex validation
   - Static lookup table using byte[] for minimal memory footprint
   - Single array bounds check and lookup instead of string search
   - Byte array uses 1 bit effectively vs boolean array's 8 bits
   - Table is shared across all instances

4. Removed separate counting pass
   - StringBuilder will grow if needed
   - Initial capacity is conservative estimate
   - Trade-off between extra counting pass vs potential resize

5. Simplified loop structure
   - Single while loop with continue for whitespace
   - Reduced number of method calls
   - More straightforward flow control

6. Memory efficiency
   - Static lookup table shared across all instances
   - No temporary object creation
   - Minimal memory footprint for validation table

This implementation combines the best practices from both the original optimization and Netty's approaches:
- Uses Netty's efficient whitespace checking
- Employs lookup table for hex validation
- Maintains good readability while being highly optimized
- Reduces method calls and object creation
- Uses minimal memory for validation structures

The code should perform better than the original version while maintaining clear intent and good maintainability. As always, actual performance improvements should be verified through benchmarking in your specific use case.



^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: [pgjdbc/pgjdbc] issue #3757: PreparedStatement.toString() fails for bytea parameters with at least 42.7.7
@ 2025-08-12 13:05  "vlsi (@vlsi)" <[email protected]>
  15 siblings, 0 replies; 17+ messages in thread

From: vlsi (@vlsi) @ 2025-08-12 13:05 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

>but didn't see how we might handle a prepared statement parameter that's an array of bytea values without "bytea" PGobjects, offhand.

@rlbdv , have you tried `conn.createArrayOf("bytea", byte[][])`?
See
* https://github.com/pgjdbc/pgjdbc/blob/3f1e7c51ac1bb43c206afaf82a4a1df52268037a/pgjdbc/src/test/java/...
* https://github.com/pgjdbc/pgjdbc/blob/3f1e7c51ac1bb43c206afaf82a4a1df52268037a/pgjdbc/src/test/java/...


^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: [pgjdbc/pgjdbc] issue #3757: PreparedStatement.toString() fails for bytea parameters with at least 42.7.7
@ 2025-08-12 19:17  "rlbdv (@rlbdv)" <[email protected]>
  15 siblings, 0 replies; 17+ messages in thread

From: rlbdv (@rlbdv) @ 2025-08-12 19:17 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

Yep, that appears to work just fine, and grepping around I found the related server-prepare.md docs that I suspect none of us had noticed before. It also looks like with a bit of hacking it's possible to extend the relevant libraries (clojure.java.jdbc and next-jdbc) to handle the values in question that way.

Not sure whether we'll do that for now, but glad to know about the option.  Much appreciated.

^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: [pgjdbc/pgjdbc] issue #3757: PreparedStatement.toString() fails for bytea parameters with at least 42.7.7
@ 2025-08-13 03:00  "rlbdv (@rlbdv)" <[email protected]>
  15 siblings, 0 replies; 17+ messages in thread

From: rlbdv (@rlbdv) @ 2025-08-13 03:00 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

I also wondered whether it might make sense to link from https://jdbc.postgresql.org/documentation/binary-data/ to https://jdbc.postgresql.org/documentation/server-prepare/#arrays -- not sure.

^ permalink  raw  reply  [nested|flat] 17+ messages in thread


end of thread, other threads:[~2025-08-13 03:00 UTC | newest]

Thread overview: 17+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-08-10 23:11 [pgjdbc/pgjdbc] issue #3757: PreparedStatement.toString() fails for bytea parameters with at least 42.7.7 "rlbdv (@rlbdv)" <[email protected]>
2025-08-10 23:47 ` "rlbdv (@rlbdv)" <[email protected]>
2025-08-11 07:22 ` "vlsi (@vlsi)" <[email protected]>
2025-08-11 16:27 ` "rlbdv (@rlbdv)" <[email protected]>
2025-08-11 16:59 ` "rlbdv (@rlbdv)" <[email protected]>
2025-08-11 18:28 ` "vlsi (@vlsi)" <[email protected]>
2025-08-11 18:51 ` "rlbdv (@rlbdv)" <[email protected]>
2025-08-11 19:14 ` "davecramer (@davecramer)" <[email protected]>
2025-08-11 20:22 ` "rlbdv (@rlbdv)" <[email protected]>
2025-08-11 20:36 ` "davecramer (@davecramer)" <[email protected]>
2025-08-11 21:53 ` "davecramer (@davecramer)" <[email protected]>
2025-08-11 22:50 ` "rlbdv (@rlbdv)" <[email protected]>
2025-08-11 23:03 ` "rlbdv (@rlbdv)" <[email protected]>
2025-08-12 09:09 ` "davecramer (@davecramer)" <[email protected]>
2025-08-12 13:05 ` "vlsi (@vlsi)" <[email protected]>
2025-08-12 19:17 ` "rlbdv (@rlbdv)" <[email protected]>
2025-08-13 03:00 ` "rlbdv (@rlbdv)" <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox