Message-ID:
From: "wgfm (@wgfm)"
To: "pgjdbc/pgjdbc"
Date: Fri, 17 Oct 2025 11:39:44 +0000
Subject: [pgjdbc/pgjdbc] issue #3840: Periodic latency spikes to Postgres in large scale deployment
List-Id:
X-GitHub-Author-Id: 10496005
X-GitHub-Author-Login: wgfm
X-GitHub-Issue: 3840
X-GitHub-Repo: pgjdbc/pgjdbc
X-GitHub-State: closed
X-GitHub-Type: issue
X-GitHub-Url: https://github.com/pgjdbc/pgjdbc/issues/3840
Content-Type: text/plain; charset=utf-8

**Describe the issue**
We observe periodic latency spikes during regular operations, and they significantly impact the tail latencies of our database queries.

We're running a monolithic application with a sharded database setup. We're on AWS and use RDS and RDS Proxies extensively. Our application nodes are connected to 120+ database instances at a time.

We see a latency spike every minute, lasting ~3 seconds. While these latency spikes are happening, we see that pgjdbc's SharedTimer thread is `Object.wait()`ing on a random thread serving traffic:

[Image]

In the above image, threads calling the database show long SocketRead events while the SharedTimer thread shows long Java Monitor Read events.

We observe long SocketRead events in all threads querying the database, and the vast majority of these SocketRead events are queries to Postgres. Other network calls, such as queries to Redis or to microservices, don't show the same periodic latency.

We can't say for sure this is a driver issue yet, and we could use your help in either confirming that it could be, or ruling it out completely. The driver is one of our primary suspects because a single SharedTimer thread is shared by the 120+ connection pools to our database instances. This could make it a resource contention issue.

One thing of note is that no matter which server node we profile, the latency spikes always happen at 1-2 seconds past the minute mark.

We have ruled out the following suspects:
- Nagle's algorithm. This is disabled by default in the driver version we use, and we haven't enabled it explicitly.
- Garbage collection events. These do not exhibit the same periodicity as our latency spikes.

We realise this is not a lot to go on. Any pointers at all would be greatly appreciated.

**Driver Version?** 42.6.2

**Java Version?** Corretto 17.0.16

**OS Version?** Ubuntu 22.04.5

**PostgreSQL Version?** 15.12

**To Reproduce**
We have not been able to reproduce this locally.

**Expected behaviour**
We don't expect periodic latency spikes.

**Logs**
We don't have any relevant logs.
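
**Possible diagnostic**
Since there are no relevant logs, one way to check the "1-2 seconds past the minute mark" pattern outside a profiler is to bucket observed call latency by second-of-minute and compare Postgres calls against other network calls. The following is only a minimal sketch under stated assumptions: `SecondOfMinuteLatency` and its methods are hypothetical names (not part of pgjdbc), it uses only the JDK, and it assumes call sites can be wrapped by hand.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicLongArray;

/**
 * Hypothetical helper, not part of pgjdbc: keeps the worst observed latency
 * for each second of the minute (0-59), so periodic spikes stand out.
 */
final class SecondOfMinuteLatency {
    // One slot per second of the minute; each holds the max latency in nanoseconds.
    private final AtomicLongArray maxNanos = new AtomicLongArray(60);

    /** Records one observation: the wall-clock start time and the elapsed time. */
    void record(Instant startedAt, long elapsedNanos) {
        int second = startedAt.atZone(ZoneOffset.UTC).getSecond();
        maxNanos.accumulateAndGet(second, elapsedNanos, Math::max);
    }

    /** Runs the given call (for example a JDBC query) and records its latency. */
    <T> T timed(Callable<T> call) throws Exception {
        Instant startedAt = Instant.now();
        long t0 = System.nanoTime();
        try {
            return call.call();
        } finally {
            record(startedAt, System.nanoTime() - t0);
        }
    }

    /** Worst latency seen for calls that started in the given second (0-59). */
    long maxNanosAt(int second) {
        return maxNanos.get(second);
    }

    /** Second of the minute with the worst latency, or -1 if nothing was recorded. */
    int worstSecond() {
        int worst = -1;
        long worstNanos = 0;
        for (int s = 0; s < 60; s++) {
            long v = maxNanos.get(s);
            if (v > worstNanos) {
                worstNanos = v;
                worst = s;
            }
        }
        return worst;
    }
}
```

Using one instance for Postgres calls and another for Redis or microservice calls would show whether only the Postgres instance peaks around second 1 or 2. This does not identify a cause; it only confirms or refutes the periodicity per call type.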