Hi Tatsuo,
Your really_writing_transaction approach is the right fix -- it
addresses the root cause across all disable_load_balance_on_write
(DLBOW) modes, not just ours.
Thanks for digging into it.
I applied your v1 patch and rebased our feature on top of it. I'm
attaching the two patches separately so they can land independently,
in whichever order you prefer:
v5-0001-Fix-disable_load_balance_on_write-and-query-cache.patch
-- your patch unchanged (just rebased to apply cleanly on
current master without our feature underneath).
v5-0002-Feature-load-balancing-control-by-table-tracking.patch
-- our feature, on top of your fix.
Changes in v5-0002 vs v4:
- Dropped pool_has_dml_adaptive_write_in_transaction() helper and
the matching pool_has_dml_table_oids() exposure. The cache
fetch guards in pool_proto_modules.c now correctly use
pool_is_really_writing_transaction() from your patch, so the
helper became redundant.
- Kept the MAIN_REPLICA gate in CommandComplete.c for the
  autocommit mark-stale branch. dml_adaptive_global is only
  meaningful in streaming replication mode (this matches the
  routing logic in where_to_send_main_replica), and the gate
  prevents the hang we saw in native_replication, where the
  autocommit branch could run while an explicit transaction was
  actually still in flight on the backend.
I tried to run 006.memqcache with the mutation against the
combined branch, but local master is currently broken (commit
2ae004a48, as you noted), so the standby setup fails before
reaching the jdbctest part. Both patches build cleanly, and our
043.track_table_mutation test passes on an earlier base. I will
retest once master is fixed.
Thanks!