Message-ID: <gh-pgjdbc-pgjdbc-3840-c3418299382@github.com>
From: "vlsi (@vlsi)" <noreply+vlsi@github.com>
To: "pgjdbc/pgjdbc" <noreply+pgjdbc-pgjdbc@github.com>
Date: Sat, 18 Oct 2025 11:35:13 +0000
Subject: Re: [pgjdbc/pgjdbc] issue #3840: Periodic latency spikes to Postgres in large scale deployment
In-Reply-To: <gh-pgjdbc-pgjdbc-3840@github.com>
References: <gh-pgjdbc-pgjdbc-3840@github.com>
List-Id: <gh-pgjdbc-pgjdbc.github.com>
X-GitHub-Author-Login: vlsi
X-GitHub-Comment-Id: 3418299382
X-GitHub-Comment-Type: issue_comment
X-GitHub-Issue: 3840
X-GitHub-Repo: pgjdbc/pgjdbc
X-GitHub-Type: comment
X-GitHub-Url: https://github.com/pgjdbc/pgjdbc/issues/3840#issuecomment-3418299382
Content-Type: text/plain; charset=utf-8

The driver uses a shared `SharedTimer` to implement statement cancellation via query timeout. In OpenJDK, `java.util.TimerThread` waits for its next task in `Object.wait()`.

In other words, `SharedTimer` is expected to sit in `Object.wait` whenever the application uses something like `java.sql.Statement#setQueryTimeout` and executes a query.
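
To illustrate, here is a minimal sketch that uses a plain `java.util.Timer` as a stand-in for the driver's `SharedTimer` (the class name `TimerWaitDemo` and the thread name `demo-timer` are made up for the example). It schedules a task far in the future, much like a query timeout that has not fired yet, and reports the state of the timer thread:

```java
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical demo class; a plain java.util.Timer stands in for the driver's SharedTimer.
class TimerWaitDemo {
    /** Returns the state of the timer thread while a far-future task is pending. */
    static Thread.State timerThreadState() throws InterruptedException {
        Timer timer = new Timer("demo-timer", true); // daemon timer thread
        try {
            // Pending task, comparable to a query timeout that has not expired yet.
            timer.schedule(new TimerTask() {
                @Override
                public void run() {
                    // no-op
                }
            }, 60_000);
            Thread.sleep(200); // give the timer thread time to reach its wait
            for (Thread t : Thread.getAllStackTraces().keySet()) {
                if ("demo-timer".equals(t.getName())) {
                    return t.getState();
                }
            }
            return null; // timer thread not found
        } finally {
            timer.cancel();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(timerThreadState());
    }
}
```

On OpenJDK the timer thread should show up as `TIMED_WAITING` inside `Object.wait`, so seeing that frame in a thread dump is normal and not a sign of a problem by itself.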

I do not see how the `TimerThread` code could result in spikes "at 1-2 seconds past the minute mark".

Frankly, so far it looks like a scheduled activity in the application code that executes every minute.

The current `SharedTimer` does bother me a bit. For instance, it is suboptimal when executing a lot of queries with long timeouts, and we currently use `purgeTimerTasks()` to remove expired tasks. That said, I haven't observed workloads that run into `purgeTimerTasks()` issues.

It would be interesting to see stack traces, thread dumps, JFR recordings, or async-profiler results for the issue.

---

Technically speaking, `SharedTimer` indeed uses a single thread to fire its tasks. However, an unexpected slowness of a single task should not impact the latency for the rest:
* The tasks execute without holding the timer lock
* Individual app threads do not wait for each task's execution

---

I would suggest capturing stack traces for the periods "at 1-2 seconds past the minute mark".
For instance, async-profiler's heatmaps might help you: https://github.com/async-profiler/async-profiler/blob/master/docs/Heatmap.md
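
If attaching a profiler is not an option, an in-process sketch along these lines could capture thread dumps shortly after each minute mark. Everything here is illustrative: `ThreadDumper`, `millisUntilNextOffset`, and the one-second offset are assumptions for the example, not something the driver provides.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of pgjdbc.
class ThreadDumper {
    /** Formats the stack traces of all live threads as text. */
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    /** Milliseconds from nowMillis until the clock is next offsetSeconds past a minute mark. */
    static long millisUntilNextOffset(long nowMillis, int offsetSeconds) {
        long minute = 60_000L;
        long target = (nowMillis / minute) * minute + offsetSeconds * 1_000L;
        if (target <= nowMillis) {
            target += minute;
        }
        return target - nowMillis;
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        long initialDelay = millisUntilNextOffset(System.currentTimeMillis(), 1);
        // Dump all threads once a minute, one second past the minute mark, until the JVM is stopped.
        scheduler.scheduleAtFixedRate(
                () -> System.out.println(dumpAllThreads()),
                initialDelay, 60_000, TimeUnit.MILLISECONDS);
    }
}
```

External tools such as `jstack` or async-profiler are likely to give richer data. The sketch only shows how to align the capture with the reported spike window.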