pgjdbc/pgjdbc GitHub issues and pull requests (mirror)
From: vlsi (@vlsi) <[email protected]>
To: pgjdbc/pgjdbc <[email protected]>
Subject: [pgjdbc/pgjdbc] issue #4084: Multi-Host Retry and Host-Status Model: Revisit Operator Semantics
Date: Thu, 21 May 2026 07:59:01 +0000
Message-ID: <[email protected]>
Review multi-host connection behavior from an operator/SRE perspective, focusing on:
- shared `connectTimeout` budget across hosts
- cached host status in `GlobalHostStatusTracker`
- `hostRecheckSeconds` semantics
- how easy it is to reason about retry behavior during outages and failover
## Current behavior
Current implementation is centered in:
- `org.postgresql.core.v3.ConnectionFactoryImpl`
- `org.postgresql.hostchooser.MultiHostChooser`
- `org.postgresql.hostchooser.GlobalHostStatusTracker`
Observed behavior:
- `connectTimeout` is shared across hosts within one `getConnection()` call
- host statuses are cached JVM-wide
- hosts cached as `ConnectFail` are skipped until `hostRecheckSeconds` expires
- after TTL expiry, the host becomes eligible again on the next connection attempt
- `loadBalanceHosts`, target server type, and cached host state all participate in host ordering
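For orientation, the settings discussed above could be supplied as follows. This is a minimal sketch: the host names, database name, and chosen values are placeholders, and the class name `MultiHostConfigSketch` is hypothetical. Only the property names (`connectTimeout`, `hostRecheckSeconds`, `targetServerType`, `loadBalanceHosts`) are pgjdbc connection properties.

```java
import java.util.Properties;

// Illustrative only: builds the URL and properties, but does not open a connection.
final class MultiHostConfigSketch {

    /** Multi-host JDBC URL; hosts are tried according to the driver's host ordering. */
    static String url() {
        // Placeholder hosts and database name.
        return "jdbc:postgresql://db1.example.com:5432,db2.example.com:5432/appdb";
    }

    /** Connection properties that take part in multi-host behavior. */
    static Properties props() {
        Properties p = new Properties();
        p.setProperty("connectTimeout", "10");        // seconds; shared across hosts per the issue text
        p.setProperty("hostRecheckSeconds", "10");    // seconds a cached host status stays valid
        p.setProperty("targetServerType", "primary"); // only accept a primary
        p.setProperty("loadBalanceHosts", "false");   // keep the listed host order
        return p;
    }
}
```

An application would pass both to `java.sql.DriverManager.getConnection(MultiHostConfigSketch.url(), MultiHostConfigSketch.props())`.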
## Why this is worth revisiting
The current model is a clear improvement over "fresh full timeout per host forever", but its behavior is still hard to predict from an operator's point of view.
Questions an operator may ask:
- How quickly will the driver stop trying a dead host?
- How long will it avoid that host?
- When does it start probing again?
- How do shared timeout budget and host-state TTL interact?
- What is the recommended shape for failover-oriented settings?
All of these can be answered today, but the answers are not obvious.
## Candidate directions
### 1. Clarify the mental model
Possible framing:
- `connectTimeout` = time budget shared by all hosts within one `getConnection()` call
- `hostRecheckSeconds` = cache TTL for failed host status across attempts
That distinction should be explicit in behavior and docs.
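The two knobs can be illustrated with a toy model. This is a sketch of the framing as stated in this issue, not the driver's code: the class `RetryModelSketch`, its methods, and the exact boundary handling at TTL expiry are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: a per-call time budget plus a cross-call cache of failed hosts.
final class RetryModelSketch {
    private final long recheckMillis;
    // host -> time (ms) at which the failure was recorded
    private final Map<String, Long> failedAt = new HashMap<>();

    RetryModelSketch(long hostRecheckSeconds) {
        this.recheckMillis = hostRecheckSeconds * 1000L;
    }

    /** Records a connect failure for the host at the given time. */
    void markFailed(String host, long nowMillis) {
        failedAt.put(host, nowMillis);
    }

    /** True while the cached failure is younger than the TTL (assumed: expiry is inclusive). */
    boolean isSkipped(String host, long nowMillis) {
        Long t = failedAt.get(host);
        return t != null && nowMillis - t < recheckMillis;
    }

    /** Budget left for the remaining hosts within one connection attempt, never negative. */
    static long remainingBudgetMillis(long connectTimeoutSeconds, long startMillis, long nowMillis) {
        return Math.max(0L, connectTimeoutSeconds * 1000L - (nowMillis - startMillis));
    }
}
```

In this model, a host that consumed 4 s of a 10 s `connectTimeout` leaves 6 s for the next host in the same call, while a 10 s `hostRecheckSeconds` keeps the failed host out of later calls until 10 s have passed.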
### 2. Explore more operator-intent semantics
Potential future concepts:
- explicit failed-host backoff
- separate retry/backoff tuning for dead hosts vs role-mismatch hosts
- adaptive or exponential re-probe policy instead of fixed TTL only
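As one possible shape for an exponential re-probe policy, the sketch below doubles the avoidance window after each consecutive failure, up to a cap. Nothing like this exists in the driver today as far as this issue describes; the class `ReprobeBackoffSketch`, the method name, and the parameters are hypothetical.

```java
// Hypothetical policy: capped exponential growth of the re-probe delay.
final class ReprobeBackoffSketch {

    /**
     * Delay before a failed host is probed again.
     * The first failure uses baseSeconds; each further consecutive failure doubles it,
     * and the result never exceeds maxSeconds.
     */
    static long delaySeconds(long baseSeconds, int consecutiveFailures, long maxSeconds) {
        long delay = Math.min(baseSeconds, maxSeconds);
        for (int i = 1; i < consecutiveFailures && delay < maxSeconds; i++) {
            // Guard against overflow before doubling.
            delay = delay > maxSeconds / 2 ? maxSeconds : delay * 2;
        }
        return delay;
    }
}
```

With a base of 10 s and a cap of 300 s, consecutive failures would yield 10, 20, 40, 80, 160, and then 300 s. A fixed `hostRecheckSeconds` corresponds to the special case where the cap equals the base.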
### 3. Review whether cached host-state policy should be more transparent
Possible improvements:
- better logging / tracing when a host is skipped due to cached status
- better diagnostics around why a host was or was not retried
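A skip diagnostic might look roughly like the following. This is a sketch only: the class `HostSkipDiagnosticsSketch`, the message wording, and the choice of `FINE` level are assumptions, not existing driver output. It uses `java.util.logging` from the JDK.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical diagnostic helper: explains why a host was skipped and when it becomes eligible.
final class HostSkipDiagnosticsSketch {
    private static final Logger LOGGER =
        Logger.getLogger(HostSkipDiagnosticsSketch.class.getName());

    /** Formats the reason for skipping a host based on its cached status and age. */
    static String describeSkip(String host, String cachedStatus, long ageMillis, long recheckMillis) {
        long remaining = Math.max(0L, recheckMillis - ageMillis);
        return "Skipping host " + host + ": cached status " + cachedStatus
            + " is " + ageMillis + " ms old; eligible for re-probe in " + remaining + " ms";
    }

    /** Logs the skip reason only when FINE logging is enabled, to keep the hot path cheap. */
    static void logSkip(String host, String cachedStatus, long ageMillis, long recheckMillis) {
        if (LOGGER.isLoggable(Level.FINE)) {
            LOGGER.fine(describeSkip(host, cachedStatus, ageMillis, recheckMillis));
        }
    }
}
```

A message of this kind would let an operator see both the cached state and the time until the next probe without reading the tracker's internals.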
### 4. Revisit interaction with pools and request-retry loops
In practice, operators often reason about retries at a higher layer. The driver behavior should be predictable enough that pool/application retry policy can be layered on top without guesswork.
## Questions to resolve
- Is fixed `hostRecheckSeconds` sufficient, or should future backoff options be considered?
- Are `ConnectFail`, `Primary`, and `Secondary` TTL semantics equally appropriate?
- Should the driver expose better diagnostics for host skipping and re-probing?
- Should there be a more explicit "production failover profile" recommendation or opt-in behavior?
## Acceptance criteria
- multi-host behavior is explainable in one short operator-oriented model
- tests clearly cover:
  - shared `connectTimeout` across hosts
  - cached `ConnectFail` skipping until TTL expiry
  - re-probe after expiry
  - interaction with target server type and load balancing
- future improvements can be evaluated independently from other timeout work