Hello, hackers!
I would like to thank the community and all participants of this
thread for their interest in this problem.
In our production system, with tens of thousands of PostgreSQL
clusters, we encounter exactly the same issue and are forced to
synchronize upstreams and downstreams via external means, which
is quite suboptimal.
I've done some work on top of the proposed v4 patch and would
like to present v5 for discussion.
There are a number of changes, such as sending just the TLI and
segno instead of the full WAL filename, shifting some work into
the archiver, and adding shared memory for walreceiver/archiver
synchronization.
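To illustrate why the TLI and segno are sufficient, here is a
minimal sketch of how the receiving side could rebuild the WAL
filename from them. It follows the usual WAL segment naming
scheme (three 8-digit hex fields). The ArchiveReport struct and
archive_report_to_fname() are hypothetical names made up for this
example and are not taken from the patch; the segment size is
passed in explicitly as an assumption.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical report payload; NOT the struct used by the patch. */
typedef struct ArchiveReport
{
    uint32_t    tli;        /* timeline ID */
    uint64_t    segno;      /* WAL segment number */
} ArchiveReport;

/* A WAL segment filename is 24 hex characters long. */
#define WAL_FNAME_LEN 24

/*
 * Rebuild the WAL segment filename from (tli, segno).
 * wal_segsz_bytes is the WAL segment size, e.g. 16 MB by default.
 * Returns 0 on success, -1 if the arguments are unusable.
 */
static int
archive_report_to_fname(char *buf, size_t buflen,
                        const ArchiveReport *rep,
                        uint32_t wal_segsz_bytes)
{
    uint64_t    segs_per_xlogid;

    if (wal_segsz_bytes == 0 || buflen < WAL_FNAME_LEN + 1)
        return -1;

    /* Number of segments that fit into one 4 GB "xlogid". */
    segs_per_xlogid = UINT64_C(0x100000000) / wal_segsz_bytes;

    snprintf(buf, buflen, "%08X%08X%08X",
             (unsigned int) rep->tli,
             (unsigned int) (rep->segno / segs_per_xlogid),
             (unsigned int) (rep->segno % segs_per_xlogid));
    return 0;
}
```

With 16 MB segments, tli = 1 and segno = 0x105 map to
000000010000000100000005, so the 12-byte pair carries the same
information as the 24-character filename.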
A number of issues are currently unresolved and are worth
discussing.
1. Should we update pg_stat_archiver on the standby to support
cascading replication, or should we just resend the report
received from upstream? Personally, I'm more inclined towards the
pg_stat_archiver path, because this way there will be less
`if-else` programming and archive_mode=shared behaviour will be
more monitoring-friendly.