Hello, hackers!
I would like to thank the community and all participants of this thread for their interest in this problem.
In our production system, with tens of thousands of PostgreSQL clusters, we encounter exactly the same issue and are forced to synchronize upstreams and downstreams via external means, which is quite suboptimal.
I've done some work on top of the proposed v4 patch and would like to present v5 for discussion.
The changes include sending just the TLI and segno instead of the full WAL file name, shifting some work into the archiver, and adding shared memory for walreceiver/archiver synchronization.
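To illustrate why the TLI and segno are enough: the receiving side can rebuild the WAL file name from them, given the WAL segment size. Below is a minimal standalone sketch of that mapping. It follows the naming scheme used by the XLogFileName() macro, but wal_file_name() and the local typedefs are hypothetical names for illustration only, not code from the patch.

```c
#include <stdint.h>
#include <stdio.h>

/* Local stand-ins for the PostgreSQL types, so the sketch compiles alone. */
typedef uint32_t TimeLineID;
typedef uint64_t XLogSegNo;

/* 24 hex digits plus the terminating NUL. */
#define WAL_FNAME_BUFLEN 25

/*
 * Hypothetical helper: build the 24-character WAL file name from a
 * timeline ID and a segment number. wal_segsz_bytes is the WAL segment
 * size (16MB by default) and must be non-zero.
 */
static void
wal_file_name(char *buf, size_t buflen, TimeLineID tli, XLogSegNo segno,
              uint64_t wal_segsz_bytes)
{
    /* Number of segments covered by one "xlogid" (4GB of WAL). */
    uint64_t segs_per_xlogid = UINT64_C(0x100000000) / wal_segsz_bytes;

    snprintf(buf, buflen, "%08X%08X%08X",
             (unsigned int) tli,
             (unsigned int) (segno / segs_per_xlogid),
             (unsigned int) (segno % segs_per_xlogid));
}
```

For example, with 16MB segments, TLI 1 and segno 256 map to 000000010000000100000000. The sender and receiver have to agree on the segment size for this to work, which is already the case for physical replication.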
A number of issues remain unresolved and are worth discussing:

1. Should we update pg_stat_archiver on the standby to support cascading replication, or should we just resend the report received from upstream? Personally, I'm more inclined towards the pg_stat_archiver path, because this way there will be less `if-else` programming and archive_mode=shared behaviour will be more monitoring-friendly.

2. What should we do with *.backup.ready and *.partial.ready on standby? Can we just XLogArchiveForceDone() them?

3. Should we keep the awkward part with the switchpoint calculation in the timeline switch case? I think all segments that are not in our server's history should just be stamped with XLogArchiveForceDone().

4. Currently XLogArchiveForceDone() is called either by walreceiver (on receiving a report from upstream) or by the archiver. Should we move this into the archiver entirely?

Any feedback will be much appreciated.