Message-ID: <gh-pgjdbc-pgjdbc-3657@github.com>
From: "ilovethebeatles (@ilovethebeatles)" <noreply+ilovethebeatles@github.com>
To: "pgjdbc/pgjdbc" <noreply+pgjdbc-pgjdbc@github.com>
Date: Tue, 10 Jun 2025 22:21:46 +0000
Subject: [pgjdbc/pgjdbc] PR #3657: perf: cache composite and simple subqueries
List-Id: <gh-pgjdbc-pgjdbc.github.com>
X-GitHub-Author-Id: 106533857
X-GitHub-Author-Login: ilovethebeatles
X-GitHub-Issue: 3657
X-GitHub-Labels: performance
X-GitHub-Repo: pgjdbc/pgjdbc
X-GitHub-State: open
X-GitHub-Type: pull_request
X-GitHub-Url: https://github.com/pgjdbc/pgjdbc/pull/3657
Content-Type: text/plain; charset=utf-8

The driver now supports composite queries, which theoretically should execute far more efficiently. However, the current implementation treats every simple statement within a composite query as distinct, even when they’re identical. For example, in SQL like:  "select * from table where col = ?;  select * from table where col = ?; " the driver considers these two subqueries completely separate and rebuilds an execution plan for each.  

Queries are cached in a map called statementCache, where the key is the full SQL string. When a composite query runs, only the entire statement is cached—not its individual components. As a result, if a later composite query contains any of the same simple subqueries, their plans cannot be reused because they aren’t stored in statementCache; instead, they are regenerated from scratch. Since plan generation can be very time-consuming, this behavior significantly degrades performance in workloads with frequently repeated subqueries.  

In my optimization, I cache both composite queries and their constituent simple queries. Identical subqueries are now recognized by their SQL string, and their existing plans are reused, improving execution speed.  When caching, driver checks executeCount on full queries but does not update сount for subqueries. Therefore, if you use threshold=1 to cache subquery plans on first execution and enable reuse in subsequent composite queries. If your project needs to execute large composite parameterized queries with identical subqueries, enable threshold=1 to cache each subquery plan on first execution and maximize reuse.

Future work could include recalculating executeCount for subqueries and incorporating parameter data types into cache keys to further improve caching accuracy.

### All Submissions:

* [x] Have you followed the guidelines in our [Contributing](https://github.com/pgjdbc/pgjdbc/blob/master/CONTRIBUTING.md) document?
* [x] Have you checked to ensure there aren't other open [Pull Requests](../../pulls) for the same update/change?

<!-- You can erase any parts of this template not applicable to your Pull Request. -->

### New Feature Submissions:

1. [ ] Does your submission pass tests?
2. [ ] Does `./gradlew styleCheck` pass ?
3. [ ] Have you added your new test classes to an existing test suite in alphabetical order?

### Changes to Existing Features:

* [ ] Does this break existing behaviour? If so please explain.
* [x] Have you added an explanation of what your changes do and why you'd like us to include them?
* [x] Have you written new tests for your core changes, as applicable?
* [x] Have you successfully run tests with your changes locally?