pgjdbc/pgjdbc GitHub issues and pull requests (mirror)  
help / color / mirror / Atom feed
[pgjdbc/pgjdbc] PR #3657: perf: cache composite and simple subqueries
16+ messages / 3 participants
[nested] [flat]

* [pgjdbc/pgjdbc] PR #3657: perf: cache composite and simple subqueries
@ 2025-06-10 22:21 "ilovethebeatles (@ilovethebeatles)" <[email protected]>
  0 siblings, 0 replies; 16+ messages in thread

From: ilovethebeatles (@ilovethebeatles) @ 2025-06-10 22:21 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

The driver now supports composite queries, which theoretically should execute far more efficiently. However, the current implementation treats every simple statement within a composite query as distinct, even when they’re identical. For example, in SQL like:  "select * from table where col = ?;  select * from table where col = ?; " the driver considers these two subqueries completely separate and rebuilds an execution plan for each.  

Queries are cached in a map called statementCache, where the key is the full SQL string. When a composite query runs, only the entire statement is cached—not its individual components. As a result, if a later composite query contains any of the same simple subqueries, their plans cannot be reused because they aren’t stored in statementCache; instead, they are regenerated from scratch. Since plan generation can be very time-consuming, this behavior significantly degrades performance in workloads with frequently repeated subqueries.  

In my optimization, I cache both composite queries and their constituent simple queries. Identical subqueries are now recognized by their SQL string, and their existing plans are reused, improving execution speed.  When caching, driver checks executeCount on full queries but does not update сount for subqueries. Therefore, if you use threshold=1 to cache subquery plans on first execution and enable reuse in subsequent composite queries. If your project needs to execute large composite parameterized queries with identical subqueries, enable threshold=1 to cache each subquery plan on first execution and maximize reuse.

Future work could include recalculating executeCount for subqueries and incorporating parameter data types into cache keys to further improve caching accuracy.

### All Submissions:

* [x] Have you followed the guidelines in our [Contributing](https://github.com/pgjdbc/pgjdbc/blob/master/CONTRIBUTING.md) document?
* [x] Have you checked to ensure there aren't other open [Pull Requests](../../pulls) for the same update/change?

<!-- You can erase any parts of this template not applicable to your Pull Request. -->

### New Feature Submissions:

1. [ ] Does your submission pass tests?
2. [ ] Does `./gradlew styleCheck` pass ?
3. [ ] Have you added your new test classes to an existing test suite in alphabetical order?

### Changes to Existing Features:

* [ ] Does this break existing behaviour? If so please explain.
* [x] Have you added an explanation of what your changes do and why you'd like us to include them?
* [x] Have you written new tests for your core changes, as applicable?
* [x] Have you successfully run tests with your changes locally?


^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3657: perf: cache composite and simple subqueries
@ 2025-06-11 18:45 ` "davecramer (@davecramer)" <[email protected]>
  14 siblings, 0 replies; 16+ messages in thread

From: davecramer (@davecramer) @ 2025-06-11 18:45 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

cool, LGTM

^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3657: perf: cache composite and simple subqueries
@ 2025-06-11 19:29 ` "vlsi (@vlsi)" <[email protected]>
  14 siblings, 0 replies; 16+ messages in thread

From: vlsi (@vlsi) @ 2025-06-11 19:29 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

Could you please measure the performance before and after the change?

^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3657: perf: cache composite and simple subqueries
@ 2025-06-12 09:53 ` "vlsi (@vlsi)" <[email protected]>
  14 siblings, 0 replies; 16+ messages in thread

From: vlsi (@vlsi) @ 2025-06-12 09:53 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

Please add a test that verifies a case when client uses different bind types for "the same" statement.

For example:

```java
String sql = "select * from inttable where a = ?;select * from inttable where a = ?;select * from inttable where a = ?";
...
ps.setInt(1, 42); ps.setString(2, "43"); ps.setDouble(3, 44.0);
```

The execution plan might depend on the result shape, so we can't easily reuse server-side handles by looking at the SQL alone.

We might be looking at https://github.com/pgjdbc/pgjdbc/issues/345

^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3657: perf: cache composite and simple subqueries
@ 2025-06-16 19:31 ` "vlsi (@vlsi)" <[email protected]>
  14 siblings, 0 replies; 16+ messages in thread

From: vlsi (@vlsi) @ 2025-06-16 19:31 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

(on benchmarks/src/jmh/java/org/postgresql/benchmark/statement/SelectBatch.java)

Have you considered using different benchmark methods instead?

^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3657: perf: cache composite and simple subqueries
@ 2025-06-21 18:18 ` "ilovethebeatles (@ilovethebeatles)" <[email protected]>
  14 siblings, 0 replies; 16+ messages in thread

From: ilovethebeatles (@ilovethebeatles) @ 2025-06-21 18:18 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

> Please add a test that verifies a case when client uses different bind types for "the same" statement.
> 
> For example:
> 
> ```java
> String sql = "select * from inttable where a = ?;select * from inttable where a = ?;select * from inttable where a = ?";
> ...
> ps.setInt(1, 42); ps.setString(2, "43"); ps.setDouble(3, 44.0);
> ```
> 
> The execution plan might depend on the result shape, so we can't easily reuse server-side handles by looking at the SQL alone.
> 
> We might be looking at #345

Added test for this case

^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3657: perf: cache composite and simple subqueries
@ 2025-06-22 19:59 ` "ilovethebeatles (@ilovethebeatles)" <[email protected]>
  14 siblings, 0 replies; 16+ messages in thread

From: ilovethebeatles (@ilovethebeatles) @ 2025-06-22 19:59 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

> Could you please measure the performance before and after the change?

Implemented benchmarks to measure throughput in the following cases:
1. Composite queries of different size, consisting of repeated subqueries, `prepareThreshold=1`  
   For k = 1…1000 (step 1), we build and execute a single SQL statement that repeats the subquery  
  "select val from bench_select where id = ?;"
k times (so first a 1-subquery statement, then 2-subquery, … up to 1000-subquery), cycling through these composite queries on one connection. With `threshold=1`, each subquery plan is cached on its first execution and then reused in all later composite queries.
2. Composite queries of different size, consisting of repeated subqueries, `prepareThreshold=5` (default)
Same loop, but with `threshold=5` (the default value for threshold). Subquery plans are only cached once the full composite statement has reached five executions, but because the 256-entry `statementCache` evicts those composite queries before they hit five runs, the subqueries never actually get cached. As a result, throughput stays the same as before.
3. Single parametrized `SELECT`
Just repeat the same simple `SELECT` under both thresholds to make sure the common case isn’t slower after the change.
Here are the results for the throughput in simple selects per second before and after my optimization:

| Scenario                          | Threshold      | Before, selects/s          | After , selects/s       |
|-----------------------------------|:--------------:|:----------------:|:----------------:|
| Composite batch-select loop       | 5 (default)    | 33 374 ± 383     | 33 622 ± 495     |
| Composite batch-select loop       | 1              | 29 355 ± 744     | 118 576 ± 450    |
| Single parametrized `SELECT`      | 5  (default)             | 13 643 ± 574     | 14 101 ± 68      |
| Single parametrized `SELECT`      | 1              | 14 143 ± 151     | 13 629 ± 496     |

- **Threshold=5**: no real change because subqueries never reach the threshold, so the behaviour doesn't change
- **Threshold=1**: 
    - ~3.5× speedup in comparison to `threshold=5`, showing the benefit of immediate subquery caching in large composite queries with frequently repeated subqueries
     - ~4× speedup over the stock driver, highlighting the impact of subquery deduplication.
- **Single-query**: performance within measurement noise, confirming no regression for the common case.

Further development might include tracking `executeCount` per subquery and adding parameter types to cache keys for better perfomance and accuracy.  





^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3657: perf: cache composite and simple subqueries
@ 2025-07-02 15:27 ` "vlsi (@vlsi)" <[email protected]>
  14 siblings, 0 replies; 16+ messages in thread

From: vlsi (@vlsi) @ 2025-07-02 15:27 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

(on benchmarks/src/jmh/java/org/postgresql/benchmark/statement/SelectBatch.java:76)

Frankly speaking, I am afraid it is a suboptimal way of executing the queries.

The suggested SQL would have:
* 1-query statement
* 2-query statement
* 3-query statement
* ...
* 1000-query statement

That would consume n^2 memory.

The typical approach for these type of workloads is to use 2^N splits:

* 1-query statement
* 2-query statement
* 4-query statement
* 8-query statement
* 16-query statement
* 32-query statement
* 64-query statement
* 128-query statement
* 256-query statement
* 512-query statement
* 1024-query statement

Then you would have 11 queries that would be enough to cover any possible number of queries in a range of 1..2047.

For instance executing 356 would require 256+64+32+4=356.

WDYT?

^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3657: perf: cache composite and simple subqueries
@ 2025-07-02 15:56 ` "vlsi (@vlsi)" <[email protected]>
  14 siblings, 0 replies; 16+ messages in thread

From: vlsi (@vlsi) @ 2025-07-02 15:56 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

(on benchmarks/src/jmh/java/org/postgresql/benchmark/statement/SelectBatch.java:55)

```suggestion
    Properties props = new Properties();
    PGProperty.PREPARE_THRESHOLD.set(props, prepareThreshold);
    connection = TestUtil.openDB(props);
```

^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3657: perf: cache composite and simple subqueries
@ 2025-07-02 15:57 ` "vlsi (@vlsi)" <[email protected]>
  14 siblings, 0 replies; 16+ messages in thread

From: vlsi (@vlsi) @ 2025-07-02 15:57 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

(on benchmarks/src/jmh/java/org/postgresql/benchmark/statement/SelectBatch.java:115)

try-with-resources?

^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3657: perf: cache composite and simple subqueries
@ 2025-07-02 15:58 ` "vlsi (@vlsi)" <[email protected]>
  14 siblings, 0 replies; 16+ messages in thread

From: vlsi (@vlsi) @ 2025-07-02 15:58 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

(on pgjdbc/src/test/java/org/postgresql/test/jdbc2/PreparedStatementTest.java:1806)

Is there a reason to catch the exception? I guess test method could throw it.

^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3657: perf: cache composite and simple subqueries
@ 2025-07-02 15:58 ` "vlsi (@vlsi)" <[email protected]>
  14 siblings, 0 replies; 16+ messages in thread

From: vlsi (@vlsi) @ 2025-07-02 15:58 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

(on pgjdbc/src/test/java/org/postgresql/test/jdbc2/PreparedStatementTest.java:1810)

I see the rest might be using `closeQuietly`, however, it would be better to go for try-with-resources in the new code

^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3657: perf: cache composite and simple subqueries
@ 2025-07-02 16:00 ` "vlsi (@vlsi)" <[email protected]>
  14 siblings, 0 replies; 16+ messages in thread

From: vlsi (@vlsi) @ 2025-07-02 16:00 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

(on pgjdbc/src/test/java/org/postgresql/test/jdbc2/PreparedStatementTest.java:1784)

Just wondering: do your production usages include executing statements of various shapes? For instance something like insert-select-insert, "select from a; select from b; select from c"?

^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3657: perf: cache composite and simple subqueries
@ 2025-07-02 16:02 ` "vlsi (@vlsi)" <[email protected]>
  14 siblings, 0 replies; 16+ messages in thread

From: vlsi (@vlsi) @ 2025-07-02 16:02 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

(on pgjdbc/src/test/java/org/postgresql/test/jdbc2/PreparedStatementTest.java:1815)

Just wondering: is it covered with existing `BinaryMode=ON` test parameter?
If not, we should probably consider adding `prepareThreshold` variation on a global level rather than adding individual tests.

^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3657: perf: cache composite and simple subqueries
@ 2025-07-02 16:04 ` "vlsi (@vlsi)" <[email protected]>
  14 siblings, 0 replies; 16+ messages in thread

From: vlsi (@vlsi) @ 2025-07-02 16:04 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

(on pgjdbc/src/test/java/org/postgresql/test/jdbc3/CompositeQueryParseTest.java:234)

This looks like a leftover

```suggestion
```

^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3657: perf: cache composite and simple subqueries
@ 2025-07-02 16:05 ` "vlsi (@vlsi)" <[email protected]>
  14 siblings, 0 replies; 16+ messages in thread

From: vlsi (@vlsi) @ 2025-07-02 16:05 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

(on pgjdbc/src/test/java/org/postgresql/test/jdbc3/CompositeQueryParseTest.java:249)

`+e.getMessage()` rarely helps. Typically you want adding the relevant method arguments and/or local variables to make it easier for the end-user to analyze the failure

^ permalink  raw  reply  [nested|flat] 16+ messages in thread


end of thread, other threads:[~2025-07-02 16:05 UTC | newest]

Thread overview: 16+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-06-10 22:21 [pgjdbc/pgjdbc] PR #3657: perf: cache composite and simple subqueries "ilovethebeatles (@ilovethebeatles)" <[email protected]>
2025-06-11 18:45 ` "davecramer (@davecramer)" <[email protected]>
2025-06-11 19:29 ` "vlsi (@vlsi)" <[email protected]>
2025-06-12 09:53 ` "vlsi (@vlsi)" <[email protected]>
2025-06-16 19:31 ` "vlsi (@vlsi)" <[email protected]>
2025-06-21 18:18 ` "ilovethebeatles (@ilovethebeatles)" <[email protected]>
2025-06-22 19:59 ` "ilovethebeatles (@ilovethebeatles)" <[email protected]>
2025-07-02 15:27 ` "vlsi (@vlsi)" <[email protected]>
2025-07-02 15:56 ` "vlsi (@vlsi)" <[email protected]>
2025-07-02 15:57 ` "vlsi (@vlsi)" <[email protected]>
2025-07-02 15:58 ` "vlsi (@vlsi)" <[email protected]>
2025-07-02 15:58 ` "vlsi (@vlsi)" <[email protected]>
2025-07-02 16:00 ` "vlsi (@vlsi)" <[email protected]>
2025-07-02 16:02 ` "vlsi (@vlsi)" <[email protected]>
2025-07-02 16:04 ` "vlsi (@vlsi)" <[email protected]>
2025-07-02 16:05 ` "vlsi (@vlsi)" <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox