Hello all,
My name is Nadav Shatz, I’m the CTO at Tailor Brands and have been working with PostgreSQL in high-traffic, distributed environments for many years. Most of my focus has been on backend architecture, scaling, and performance optimization, and I’m a long-time user and admirer of the Postgres ecosystem.
I’d like to propose adding a feature to pgpool-II for recent access based routing in primary-replica setups. The idea is similar to what we’ve described in this article, and is also reflected in this pgcat PR. The core concept is to route read queries to the primary if they occur shortly after a write, reducing replica lag inconsistencies while still benefiting from read scaling.
How it would work (high-level):
External “effective lag” via config (hot-reloaded):
Instead of relying on pgpool-II’s replication delay checks (which don’t map well to Aurora semantics), we’ll expose a config value representing the effective replica lag (or directly the TTL to use for “recency”). This value is pushed by an external controller and hot-reloaded (no restarts). The relevant knobs might look like:
enable_recent_access_routing (boolean, default off)
recent_access_ttl_ms (integer, default 0, can be hot-reloaded)
enable_query_parser (boolean, required for this feature, default off)
In-memory recent-access map:
Each worker maintains a lightweight per-DB in-memory map of recently written relations. On any write (INSERT/UPDATE/DELETE/UPSERT/TRUNCATE), we record the touched relations with a TTL derived from recent_access_ttl_ms. Entries expire automatically; writes refresh them.
Routing + query parsing:
For incoming statements we parse enough to answer two questions: (1) is it a read or a write? and (2) which relations are referenced? If a read touches any “recently written” relation, we force route to primary; otherwise we allow normal read load-balancing to replicas.
Notes on behavior & ops:
Config & hot reload: Operators (or an external controller) can update recent_access_ttl_ms dynamically and trigger hot reload to adapt to changing conditions—no reliance on Aurora internals.
Safety levers: a global max TTL, optional allow/deny lists, and metrics (e.g., “reads forced to primary due to recency”) for visibility.
Defaults & compatibility: all defaults are safe/off; enabling requires explicit opt-in.
I’ll prepare the code changes and send a patch/PR, but before diving in I wanted to check if anyone has objections, concerns, or preferred alternatives—particularly around parser hooks, shared memory use, or hot-reload mechanics in pgpool-II.
Thanks for considering,