[pgjdbc/pgjdbc] issue #3693: Overhead of uncontended use of ResourceLock

pgjdbc/pgjdbc GitHub issues and pull requests (mirror)  
help / color / mirror / Atom feed

[pgjdbc/pgjdbc] issue #3693: Overhead of uncontended use of ResourceLock
14+ messages / 3 participants
[nested] [flat]

* [pgjdbc/pgjdbc] issue #3693: Overhead of uncontended use of ResourceLock
@ 2025-06-27 14:36 "Sanne (@Sanne)" <[email protected]>
  0 siblings, 0 replies; 14+ messages in thread

From: Sanne (@Sanne) @ 2025-06-27 14:36 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

**Describe the issue**
I'm running some benchmarks on a Quarkus application using Hibernate ORM and pgjdbc. While I admit there are also other bottlenecks which worry me more than this one, it's quite noticeable on flamegraphs that we spend significant CPU cycles on `ResourceLock.obtain` and `ResourceLock.close`.

While the application is heavily parallel and using many threads, the general model is that a single HTTP request is being handled within the confines of a single backend thread: there is zero contention on the pgjdbc core resources such as `PgConnection` and `QueryExecutorBase`.

I'm sure that - in our design - we actually don't need these locks at all. But I appreciate the JDBC spec requirement and the need to accommodate for other architectures.

I see multiple possibilities, not sure what you'd find acceptable:

1. Introduce a global flag (perhaps even a JVM property?) which fully disables locking: make it responsibility of the system architects to play with such a setting at their own risk.
2. Have a detailed look at some of them and try to improve the locking patterns as I have the impression that some of them are unneccessary.

For example, I *think* that in the case of  `PgConnection`.`clearWarnings` one could simply remove the lock: it's guarding a `checkClosed()` invocation (which several other methods perform w/o the lock so seems acceptable) and also `queryExecutor.getWarnings();` : but the query executor has its own lock so it's definitely redundant. The method is also setting `firstWarning` to `null` but we'd be better off in replacing this with a volatile.
This is just an example, I see plenty of methods which seem to lock excessively and could be simplified.

I'd be happy to provide patches - just checking if these approaches would be acceptable?

Attaching a flamegraph as well, in case someone is interested to see them:
[flamegraph_quarkus_db_256.zip](https://github.com/user-attachments/files/20950646/flamegraph_quarkus_db_256.zip)

^ permalink  raw  reply  [nested|flat] 14+ messages in thread

* Re: [pgjdbc/pgjdbc] issue #3693: Overhead of uncontended use of ResourceLock
@ 2025-06-28 22:21 ` "davecramer (@davecramer)" <[email protected]>
  12 siblings, 0 replies; 14+ messages in thread

From: davecramer (@davecramer) @ 2025-06-28 22:21 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

So I think both approaches have merit.

For the experienced they could run lock free and for others we really should do a better job.

PR's are welcome

^ permalink  raw  reply  [nested|flat] 14+ messages in thread

* Re: [pgjdbc/pgjdbc] issue #3693: Overhead of uncontended use of ResourceLock
@ 2025-06-30 07:02 ` "vlsi (@vlsi)" <[email protected]>
  12 siblings, 0 replies; 14+ messages in thread

From: vlsi (@vlsi) @ 2025-06-30 07:02 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

@Sanne , can you please double-check you shared the right profiling results?
I am afraid the file does not contain the traces of async-profiler data.

---

> For example, I think that in the case of PgConnection.clearWarnings one could simply remove the lock: it's guarding a checkClosed() ... The method is also setting firstWarning to null but we'd be better off in replacing this with a volatile.

Imagine the following:
1) A connection executes `copy` command
2) A different thread calls `.getWarnings()`

If we remove the locking there, the noticeable behaviour will change, and we want keeping the backward compatibility:
a) With locking, the second thread would have to wait for `copy` completion and then it will obtain the warnings
b) Without locking, it would return no warnings

Of course, the reentrant locking could probably be eliminated (e.g. we should probably aquire a lock on the top-level method, and we should refrain from reentrant acquire of the same lock). We could indeed replace certain lock instances with `synchronized` (e.g. `LruCache`), however [previously](https://github.com/pgjdbc/pgjdbc/issues/1951#issuecomment-1254442789), the understanding was that `ReentrantLock` was cheap enough.

---

Benchmarks are always welcome. If you could contribute a benchmark (Quarkus + Hibernate is fine) or a link to the benchmark, that would be great.

^ permalink  raw  reply  [nested|flat] 14+ messages in thread

* Re: [pgjdbc/pgjdbc] issue #3693: Overhead of uncontended use of ResourceLock
@ 2025-06-30 09:13 ` "Sanne (@Sanne)" <[email protected]>
  12 siblings, 0 replies; 14+ messages in thread

From: Sanne (@Sanne) @ 2025-06-30 09:13 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

Hi @davecramer ! I'll explore something then, thanks!

Hi @vlsi , thank you very much for the comments! I need to look at the code with some care to understand your scenario better. At first impression I'm not really sure I sympatise with a user expecting a very specific behaviour from a concurrent, uncoordinated call to `getWarnings` ? Surely they can't be relying of going to get the set of warnings as defined before or after another operation when there is no coordination on their side, such as to ensure that the `getWarnings` invocation itself is happening before rather than after the other operation? But I'm not sure of such use cases, I'll think about it - thank you so much for the point. I am totally aware that simplifying the locking schemes will be tricky and perhaps I should stay away from that. 

> however https://github.com/pgjdbc/pgjdbc/issues/1951#issuecomment-1254442789, the understanding was that ReentrantLock was cheap enough.

Right I've seen the previous discussion. I actually remember it .. we had to patch Hibernate for loom compatibility as a follow up :)  But in this particular benchmark I'm running these days it's noticeable. Again, I admit it's not the main concern we have - there's other things in the picture that I'd also want to improve on, but this one seems like a low hanging fruit as we're saturating CPU and this is consuming about 2% CPU for no useful reason (in our case). If I had the flag to fully disable it, we'd also save on some allocations which is also always welcome.

> We could indeed replace certain lock instances with synchronized (e.g. LruCache)

Slightly off-topic, but I'd also love it if the cache implementation could be pluggable: I'm sure we'd not want pgjdbc to bring in additional dependencies, yet it's common for various runtime platforms (including Quarkus) to have Caffeine or other state of the art cache implementations around. It would be nice if I could have pgjdbc make use of it, similarly to what I did in https://hibernate.atlassian.net/browse/HHH-19544 .
I might open an issue about that later, as I don't have evidence of that being useful yet at this point.

> Benchmarks are always welcome. If you could contribute a benchmark (Quarkus + Hibernate is fine) or a link to the benchmark, that would be great.

We have such benchmarks in Techempower, publicly available:
 - https://github.com/Sanne/FrameworkBenchmarks/tree/master/frameworks/Java/quarkus

It's been there for some years - the reason I'm bringing this particular issue up only now is that previously this particular cost was just a "drop in the bucket", but as we evolved Quarkus, Hibernate, Vert.x, Netty and the JDK and other bottlenecks are being removed, this one which was a negligible detail in the past is starting to become more obvious.
I'm also working on an update PR for Techempower which is pushing figures much higher still, hopefully this fix could be a part of that batch of improvements: essentially I want the Java leading platforms to be able to deliver great results when compared to other programmig languages and platforms, any help is welcome :)

Attaching a new flamegraph zip file, hopefully this works better.

[flamegraph_quarkus_db_256.zip](https://github.com/user-attachments/files/20976538/flamegraph_quarkus_db_256.zip)

^ permalink  raw  reply  [nested|flat] 14+ messages in thread

* Re: [pgjdbc/pgjdbc] issue #3693: Overhead of uncontended use of ResourceLock
@ 2025-06-30 10:02 ` "vlsi (@vlsi)" <[email protected]>
  12 siblings, 0 replies; 14+ messages in thread

From: vlsi (@vlsi) @ 2025-06-30 10:02 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

> At first impression I'm not really sure I sympatise with a user expecting a very specific behaviour from a concurrent, uncoordinated call to getWarnings ?

The thing is pgjdbc is wildly used in many applications, so we want to avoid breakages even for the clients who are not doing the best possible thing.

> Attaching a new flamegraph zip file, hopefully this works better.

This works much better, thank you. Do you think you could share the source data for it? (I mean stacks.txt file or jfr or whatever was the raw data for the flamegraph)

> Slightly off-topic, but I'd also love it if the cache implementation could be pluggable

Sure Caffeine is an option, and we "discuss" it here: https://github.com/pgjdbc/pgjdbc/issues/345#issuecomment-320901092
The default artifact will likely shadow Caffeine to avoid clashes with pgjdbc users' Caffeine though.

---

An interesting experiment could be to replace `ResourceLock` with a dummy class that does nothing, and then compare the thoughput, cpu utilization and the latency numbers. Sure we can't shave all the synchronization, however, it could probably give rough estimates on the improvements possible.

^ permalink  raw  reply  [nested|flat] 14+ messages in thread

* Re: [pgjdbc/pgjdbc] issue #3693: Overhead of uncontended use of ResourceLock
@ 2025-06-30 10:46 ` "Sanne (@Sanne)" <[email protected]>
  12 siblings, 0 replies; 14+ messages in thread

From: Sanne (@Sanne) @ 2025-06-30 10:46 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

> The thing is pgjdbc is wildly used in many applications, so we want to avoid breakages even for the clients who are not doing the best possible thing.

Right I appreciate that. I'll think about it - just not sure yet how anyone could possibly have relied on this, without additional thread coordination from their part as well: it seems to me that the lock within the client wouldn't have the right granularity to allow such usage: if one _needed_ to run getWarnings _after_ another operation, they'd need to guarantee ordering of those two operations, and this would belong in their code.

>  Do you think you could share the source data for it? (I mean stacks.txt file or jfr or whatever was the raw data for the flamegraph)

This one was produced directly by asynch-profiler and I don't have the source data but I can easily produce jfrs instead if you prefer - would you want to suggest exact settings you'd want to see?

> An interesting experiment could be to replace ResourceLock with a dummy class that does nothing

I was planning to do that, yes. 

> Sure Caffeine is an option, and we "discuss" it here: https://github.com/pgjdbc/pgjdbc/issues/345#issuecomment-320901092
The default artifact will likely shadow Caffeine to avoid clashes with pgjdbc users' Caffeine though.

That's nice but as a technical detail I'd very much prefer to inject my own instance rather than have it shaded within the driver for various reasons, such as minimize the number of classes being loaded (we'd be using Caffeine for other reasons too) and our ability to optimise for GraalVM as well: Caffeine is tricky to handle efficiently in native images and if you were to shade it I'd have to maintain ad-hoc rules for that. But thanks to the pointer I will follow the linked discussion.

^ permalink  raw  reply  [nested|flat] 14+ messages in thread

* Re: [pgjdbc/pgjdbc] issue #3693: Overhead of uncontended use of ResourceLock
@ 2025-06-30 11:22 ` "vlsi (@vlsi)" <[email protected]>
  12 siblings, 0 replies; 14+ messages in thread

From: vlsi (@vlsi) @ 2025-06-30 11:22 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

> if you prefer - would you want to suggest exact settings you'd want to see?

Frankly, I would love to see "all the backtraces for `ResourceLock`" to analyze what are the most common ones (something like https://bell-sw.com/announcements/2020/07/22/Hunting-down-code-hotspots-with-JDK-Flight-Recorder/#me... ). I'm not sure how it is possible with flamegraph.

I would prefer `collapsed` or `jfr` formats: https://github.com/async-profiler/async-profiler/blob/master/docs/OutputFormats.md#output-formats

> That's nice but as a technical detail I'd very much prefer to inject my own instance rather than have it shaded within the driver for various reasons, such as minimize the number of classes being loaded (we'd be using Caffeine for other reasons too) 

"provide your own cache" could probably be a second artifact, however:
* I'm not sure how it could be a drop-in replacement as the app should configure the cache class or for it
* We could probably release both shaded and non-shaded variants. It would probably open a way for "use the same Caffeine as the project already using" and "skip use a user-provided cache"

> our ability to optimise for GraalVM as well: Caffeine is tricky to handle efficiently in native images and if you were to shade it 

Sure we could bundle the rules as well.

^ permalink  raw  reply  [nested|flat] 14+ messages in thread

* Re: [pgjdbc/pgjdbc] issue #3693: Overhead of uncontended use of ResourceLock
@ 2025-07-01 13:24 ` "Sanne (@Sanne)" <[email protected]>
  12 siblings, 0 replies; 14+ messages in thread

From: Sanne (@Sanne) @ 2025-07-01 13:24 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

Attaching the same data but in JFR format, although that seems harder to me IMO:
[flamegraph_quarkus_db_256_jfr.zip](https://github.com/user-attachments/files/20999270/flamegraph_quarkus_db_256_jfr.zip)

To see the overhead on the flamegraphs you can open it in a browser, hit "CTRL+F" and type a relevant string, for example  "ResourceLock" . This will highlight the areas to zoom in in purple, and the bottom right corner on the screen will give you the aggregate stats among all callers.

^ permalink  raw  reply  [nested|flat] 14+ messages in thread

* Re: [pgjdbc/pgjdbc] issue #3693: Overhead of uncontended use of ResourceLock
@ 2025-07-01 15:28 ` "Sanne (@Sanne)" <[email protected]>
  12 siblings, 0 replies; 14+ messages in thread

From: Sanne (@Sanne) @ 2025-07-01 15:28 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

I have now results from a benchmark run in which the locks are being bypassed, and it looks very promising:

![Image](https://github.com/user-attachments/assets/2c4b72bd-259d-45b9-812d-cf13e658f350)

The experimental patch I've been running is this one:
 - https://github.com/pgjdbc/pgjdbc/commit/da092f524abd228b4d2f77e45fab6cbe40f0fbb7

In the graph, the same code is being run on both the blue and green bars. The difference is the JVM property being activated on the left one.
The two bars on the right represent the "512" scenario, in which we have highest CPU demand; obviously that's were this is most effective.

And let me add: the patch draft is not comprehensive. It's bypassing some of the locks as a POC, but I still see some of them, so I could improve further.

^ permalink  raw  reply  [nested|flat] 14+ messages in thread

* Re: [pgjdbc/pgjdbc] issue #3693: Overhead of uncontended use of ResourceLock
@ 2025-07-01 15:50 ` "vlsi (@vlsi)" <[email protected]>
  12 siblings, 0 replies; 14+ messages in thread

From: vlsi (@vlsi) @ 2025-07-01 15:50 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

Apparently, the major part of `ResourceLock` overhead comes from 
* `org.postgresql.core.QueryExecutorBase#setTransactionState` (we could make a private variation that does not grab the lock)
* `LruCache.borrow/put` <-- we could replace this with `synchronized` (the cache does not execute IO, so it is loom-safe with all JDKs)
* `org.postgresql.jdbc.PgStatement#executeInternal` <-- this is probably executed under a lock, thus synchronization there might be removed

I wonder if it can make noticeable difference with benchmark like https://github.com/pgjdbc/pgjdbc/blob/5be3c937cc6bdc1c753634881195cf0a0f7bb8c1/benchmarks/src/jmh/ja...


<img width="882" alt="Image" src="https://github.com/user-attachments/assets/33be4915-bf6c-45c2-bb6b-8c956cc46384"; />

<img width="882" alt="Image" src="https://github.com/user-attachments/assets/166bac79-301f-41c0-9533-a96fed1966bf"; />

^ permalink  raw  reply  [nested|flat] 14+ messages in thread

* Re: [pgjdbc/pgjdbc] issue #3693: Overhead of uncontended use of ResourceLock
@ 2025-07-02 11:42 ` "vlsi (@vlsi)" <[email protected]>
  12 siblings, 0 replies; 14+ messages in thread

From: vlsi (@vlsi) @ 2025-07-02 11:42 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

By the way,

>https://github.com/TechEmpower/FrameworkBenchmarks/wiki/Project-Information-Framework-Tests-Overview...
> All test implementations **should be production-grade**. The particulars of this will vary by framework and platform, but the general sentiment is that the code and configuration should be suitable for a production deployment. The word should is used here because production-grade is our goal, but we don't want this to be a roadblock. If you're submitting a new test and uncertain whether your code is production-grade, submit it anyway and then solicit input from other subject-matter experts.

I'm not sure running with "deactivated locking" qualifies for "production-grade". It is like `-Xnoverify` to me. Sure it might speedup the startup in development, however, in production you'd better verify the bytecode :) 

^ permalink  raw  reply  [nested|flat] 14+ messages in thread

* Re: [pgjdbc/pgjdbc] issue #3693: Overhead of uncontended use of ResourceLock
@ 2025-07-02 13:34 ` "Sanne (@Sanne)" <[email protected]>
  12 siblings, 0 replies; 14+ messages in thread

From: Sanne (@Sanne) @ 2025-07-02 13:34 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

> I'm not sure running with "deactivated locking" qualifies for "production-grade"

It's a good point. But as I mentioned in our architecture, such locks are totally redundant? Every connection instance is strictly scoped to a single thread. So I agree with you that one shouldn't be needing to set a JVM property, ideally our connection pool should be able to configure this semantics by default for all our users, and that would be our "production grade". 

> Apparently, the major part of ResourceLock overhead comes from [...]

These are excellent suggestions, thank you so much for looking!

I would totally agree that if we could avoid flags/configuration properties and simply reduce the overhead to a negligible amount that would be preferrable: the less tuning knobs the better.

But also bear in mind that we're competing with many other frameworks and languages, even other JVM based platforms which use different postgresql driver implementations, so I wouldn't want to see our user's CPUs (including in "production mode") spend time on unnecessary work.

^ permalink  raw  reply  [nested|flat] 14+ messages in thread

* Re: [pgjdbc/pgjdbc] issue #3693: Overhead of uncontended use of ResourceLock
@ 2025-07-02 15:34 ` "davecramer (@davecramer)" <[email protected]>
  12 siblings, 0 replies; 14+ messages in thread

From: davecramer (@davecramer) @ 2025-07-02 15:34 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

> > I'm not sure running with "deactivated locking" qualifies for "production-grade"
> 
> It's a good point. But as I mentioned in our architecture, such locks are totally redundant? Every connection instance is strictly scoped to a single thread. So I agree with you that one shouldn't be needing to set a JVM property, ideally our connection pool should be able to configure this semantics by default for all our users, and that would be our "production grade".
> 
> > Apparently, the major part of ResourceLock overhead comes from [...]
> 
> These are excellent suggestions, thank you so much for looking!
> 
> I would totally agree that if we could avoid flags/configuration properties and simply reduce the overhead to a negligible amount that would be preferrable: the less tuning knobs the better.
> 
> But also bear in mind that we're competing with many other frameworks and languages, even other JVM based platforms which use different postgresql driver implementations, so I wouldn't want to see our user's CPUs (including in "production mode") spend time on unnecessary work.

I'm curious what other driver implementations they use ? Do you know ?



^ permalink  raw  reply  [nested|flat] 14+ messages in thread

* Re: [pgjdbc/pgjdbc] issue #3693: Overhead of uncontended use of ResourceLock
@ 2025-07-02 15:50 ` "Sanne (@Sanne)" <[email protected]>
  12 siblings, 0 replies; 14+ messages in thread

From: Sanne (@Sanne) @ 2025-07-02 15:50 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

hi @davecramer , sure among other JVM based platforms there's Vert.x (fully reactive, so can't use JDBC) and the Quarkus + Hibernate Reactive stack (also by my team, based on the same Vert.x driver).
 -  https://vertx.io/docs/vertx-pg-client/java/

It's performing extremely well. Our figures with Hibernate Reactive aren't great at the moment because of a design issue in the connection pool: there are novel challenges to solve over a "traditional" pool, but we'll fix them.

Personally I'm involved in both the "traditional" stack (blocking) based on JDBC and the alternative models, exploring things. I primarily want the various comparisons to be fair and want both to improve. 

There are also some extreme contenders on Techempower who have re-implemented everything from scratch. I wouldn't call those "production ready" personally, but by doing so they are setting a very interesting high bar in terms of performance results. I'd love us to be able to match such figures while still offering a reasonable programming model, maintainable and robust.

^ permalink  raw  reply  [nested|flat] 14+ messages in thread

end of thread, other threads:[~2025-07-02 15:50 UTC | newest]

Thread overview: 14+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-06-27 14:36 [pgjdbc/pgjdbc] issue #3693: Overhead of uncontended use of ResourceLock "Sanne (@Sanne)" <[email protected]>
2025-06-28 22:21 ` "davecramer (@davecramer)" <[email protected]>
2025-06-30 07:02 ` "vlsi (@vlsi)" <[email protected]>
2025-06-30 09:13 ` "Sanne (@Sanne)" <[email protected]>
2025-06-30 10:02 ` "vlsi (@vlsi)" <[email protected]>
2025-06-30 10:46 ` "Sanne (@Sanne)" <[email protected]>
2025-06-30 11:22 ` "vlsi (@vlsi)" <[email protected]>
2025-07-01 13:24 ` "Sanne (@Sanne)" <[email protected]>
2025-07-01 15:28 ` "Sanne (@Sanne)" <[email protected]>
2025-07-01 15:50 ` "vlsi (@vlsi)" <[email protected]>
2025-07-02 11:42 ` "vlsi (@vlsi)" <[email protected]>
2025-07-02 13:34 ` "Sanne (@Sanne)" <[email protected]>
2025-07-02 15:34 ` "davecramer (@davecramer)" <[email protected]>
2025-07-02 15:50 ` "Sanne (@Sanne)" <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox