Message-ID:
From: "lfgcampos (@lfgcampos)"
To: "pgjdbc/pgjdbc"
Date: Fri, 22 Aug 2025 17:03:33 +0000
Subject: Re: [pgjdbc/pgjdbc] PR #3778: fix: avoid IllegalStateException: Timer already cancelled when StatementCancelTimerTask.run throws a runtime error
In-Reply-To:
References:
List-Id:
X-GitHub-Author-Login: lfgcampos
X-GitHub-Comment-Id: 3215029957
X-GitHub-Comment-Type: issue_comment
X-GitHub-Issue: 3778
X-GitHub-Repo: pgjdbc/pgjdbc
X-GitHub-Type: comment
X-GitHub-Url: https://github.com/pgjdbc/pgjdbc/pull/3778#issuecomment-3215029957
Content-Type: text/plain; charset=utf-8

> I'm leaning towards adding a test with a mock that triggers an exception from `.cancel()`

Will try adding it shortly.

> > Is there any explanation, though, why it works on v42.7.4 but not on others?
>
> [bac3d0a](https://github.com/pgjdbc/pgjdbc/commit/bac3d0add5626006ce41db53618da2190bf00910) landed in 42.7.6, and it changed `int` to `byte[]`. Previously, an unset `cancelKey` was just `0`, so it did no harm. After PR #3592, the `null` `byte[]` could cause an NPE. Interestingly, it was `castNonNull` that threw the NPE, so writing the code in a way that does not require `castNonNull` seems the right way to go.
>
> At the same time, the backend should always supply the `cancel key` in the startup messages, so I don't understand how it is possible that you have a `null` `cancelKey`. I wonder if you could capture logging for `org.postgresql.core.QueryExecutorBase` in order to check whether it does indeed end up with null cancel keys. Do you use a load balancer?
>
> It does not explain why 42.7.5 fails, though. Just to double-check: does 42.7.5 fail for you as well?

We bumped from `~.4` to `~.7`, so I can't say whether `~.5` works or not.
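To make the `int` vs `byte[]` point from the quoted explanation concrete: a Java field that is never assigned defaults to `0` when it is an `int` and to `null` when it is a `byte[]`. The sketch below is a hypothetical illustration, not pgjdbc code. `CancelKeyDemo`, `keyLengthViaCast` and `hasCancelKey` are made-up names, and `castNonNull` here is only a stand-in that mimics the reported NPE, not pgjdbc's actual helper.

```java
import java.util.Objects;

// Hypothetical illustration (not pgjdbc code): an unset cancel key was harmless
// as an int, but can blow up once it is stored as a byte[].
class CancelKeyDemo {

  // Before: a field that was never assigned defaults to 0, still a usable int.
  static int unsetIntKey;

  // After: a field that was never assigned defaults to null.
  static byte[] unsetBytesKey;

  // Hypothetical stand-in for a castNonNull-style helper: it silences a nullness
  // checker at compile time, but a null value still fails at runtime with an NPE.
  static <T> T castNonNull(T value) {
    return Objects.requireNonNull(value, "castNonNull called with null");
  }

  // Pattern that relies on the cast: throws NullPointerException if the key was never set.
  static int keyLengthViaCast(byte[] key) {
    return castNonNull(key).length;
  }

  // Null-tolerant pattern: an explicit check, so no cast is needed.
  static boolean hasCancelKey(byte[] key) {
    return key != null && key.length > 0;
  }

  public static void main(String[] args) {
    System.out.println("unset int key: " + unsetIntKey);
    System.out.println("has byte[] key: " + hasCancelKey(unsetBytesKey));
    try {
      keyLengthViaCast(unsetBytesKey);
    } catch (NullPointerException e) {
      System.out.println("cast path threw: " + e.getMessage());
    }
  }
}
```

The `hasCancelKey` style is one way "code that does not require `castNonNull`" could look: a missing key becomes an ordinary `false` branch (for example, skip sending the cancel request) instead of an exception raised inside a timer task.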
Some info that may or may not help:

- yes, we have an LB and RDS Proxy
- we are using a mix of Quarkus and pure Java, meaning our connections are also a mix of Hikari and Agroal

Regarding capturing the log for `org.postgresql.core.QueryExecutorBase`, that is a tricky one, since the bump caused an outage of a couple of hours in our prod.

Also, quite interestingly, it does not happen on dev. It is probably related to the timeout after all; that would at least explain it.
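The PR title names the symptom, `IllegalStateException: Timer already cancelled`. The sketch below uses only `java.util.Timer` and is not pgjdbc's code; it shows the general JDK behavior behind that message: a `RuntimeException` escaping `TimerTask.run()` ends the timer's single background thread, after which every `schedule()` on that `Timer` throws. Assuming the statement cancel task only runs when a query timeout actually elapses, this would also fit the failure showing up in prod but not on dev. `TimerPoisonDemo`, `secondScheduleError` and `guardedTask` are hypothetical names, and `guardedTask` is one possible defensive pattern, not necessarily what PR #3778 implements.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo (not pgjdbc code): a RuntimeException escaping TimerTask.run()
// terminates the Timer's background thread, and later schedule() calls then fail.
class TimerPoisonDemo {

  // Wraps a Runnable as a TimerTask with no error handling.
  static TimerTask asTask(Runnable body) {
    return new TimerTask() {
      @Override
      public void run() {
        body.run();
      }
    };
  }

  // Defensive variant: catch RuntimeException inside run() so it never reaches the timer thread.
  static TimerTask guardedTask(Runnable body) {
    return new TimerTask() {
      @Override
      public void run() {
        try {
          body.run();
        } catch (RuntimeException e) {
          System.err.println("cancel task failed, timer kept alive: " + e);
        }
      }
    };
  }

  // Runs a failing task on a fresh Timer, then tries to schedule one more task.
  // Returns the IllegalStateException message from the second schedule(), or null if it succeeded.
  static String secondScheduleError(boolean guarded) {
    Timer timer = new Timer("demo-timer", true); // daemon thread so the JVM can exit
    CountDownLatch started = new CountDownLatch(1);
    AtomicReference<Thread> timerThread = new AtomicReference<>();
    // Stand-in for a cancel task that hits an NPE (e.g. from a null cancel key).
    Runnable failing = () -> {
      timerThread.set(Thread.currentThread());
      started.countDown();
      throw new NullPointerException("simulated failure inside the cancel task");
    };
    try {
      timer.schedule(guarded ? guardedTask(failing) : asTask(failing), 0);
      if (!started.await(5, TimeUnit.SECONDS)) {
        throw new AssertionError("first task never ran");
      }
      // An unguarded failure ends the timer thread; join() waits for that (bounded wait).
      timerThread.get().join(guarded ? 100 : 5_000);
      timer.schedule(asTask(() -> { }), 0);
      return null;
    } catch (IllegalStateException e) {
      return e.getMessage();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new RuntimeException(e);
    } finally {
      timer.cancel();
    }
  }

  public static void main(String[] args) {
    System.out.println("unguarded: " + secondScheduleError(false));
    System.out.println("guarded:   " + secondScheduleError(true));
  }
}
```

The guard only keeps the timer usable; it does not explain where the `null` cancel key comes from, which is why the `QueryExecutorBase` logging asked about above would still be useful.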