MIME-Version: 1.0
Date: Wed, 19 Feb 2025 13:49:12 +0100
From: richard@kojedz.in
To: pgsql-admin@lists.postgresql.org
Subject: In-place upgrade with streaming replicas
In-Reply-To: <12ebc478cd4dac5357c61b93d53ef8a4@kojedz.in>
References: <12ebc478cd4dac5357c61b93d53ef8a4@kojedz.in>
Message-ID: <ec4424cb330887e89b74727093e250e2@kojedz.in>
Content-Type: text/plain; charset=US-ASCII;
 format=flowed
Content-Transfer-Encoding: 7bit
Archived-At: <https://www.postgresql.org/message-id/ec4424cb330887e89b74727093e250e2%40kojedz.in>
Precedence: bulk

Dear All,

I am trying to follow the instructions for an in-place upgrade with 
streaming replica servers. The documentation here: 
https://www.postgresql.org/docs/13/pgupgrade.html#:~:text=Prepare%20for%20standby%20server%20upgrades 
says that I should check 'Latest checkpoint location' on the primary and 
replica servers. I want to automate this process, so I would like to 
know a reliable way to guarantee that the checkpoint locations match. 
During the automated upgrade procedure, I restart all servers on 
different TCP ports, so no legitimate clients connect to the primary and 
nothing changes the data. Then I issue CHECKPOINT on the primary, 
retrieve pg_current_wal_lsn() on the primary, and wait until all 
replicas report the same value in pg_last_wal_replay_lsn(); then I issue 
CHECKPOINT on the replicas. According to the documentation, this creates 
a restartpoint on the replicas. I repeat these steps until 
pg_current_wal_lsn() no longer changes on the primary. If I then shut 
down the cluster so that the primary is shut down first and the replicas 
just after it, the checkpoint locations will match. However, if I 
accidentally shut down a replica before the primary is shut down, the 
checkpoint locations won't match.
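
For illustration, the "wait until all replicas report the same value" 
step could be sketched in Python roughly as below. This is only a 
sketch under assumptions: the helper names lsn_to_int and 
replicas_caught_up are hypothetical, and the LSN strings are assumed to 
have already been fetched from pg_current_wal_lsn() and 
pg_last_wal_replay_lsn() by whatever client the automation uses.

```python
from typing import Iterable


def lsn_to_int(lsn: str) -> int:
    """Convert a textual pg_lsn such as '0/3000060' into an integer.

    The text form is two hexadecimal numbers separated by a slash:
    the high 32 bits and the low 32 bits of the WAL position.
    """
    high, low = lsn.strip().split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replicas_caught_up(primary_lsn: str, replica_lsns: Iterable[str]) -> bool:
    """Return True if every replica has replayed at least up to primary_lsn.

    primary_lsn is assumed to come from pg_current_wal_lsn() on the
    primary, replica_lsns from pg_last_wal_replay_lsn() on each replica.
    """
    target = lsn_to_int(primary_lsn)
    return all(lsn_to_int(lsn) >= target for lsn in replica_lsns)
```

Comparing the LSNs as integers rather than as strings avoids false 
mismatches when the hexadecimal parts differ in length or letter case.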

This leads to my questions: after the primary has been shut down, what 
guarantees that the replicas have the same checkpoint location? Why 
does the order in which the servers are shut down matter? What would be 
an exact and reliable way to ensure that the replicas end up with the 
same checkpoint location as the primary?
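
The final check itself, comparing 'Latest checkpoint location' after all 
servers are stopped, might be automated along these lines. This is a 
minimal sketch, assuming the text output of pg_controldata has already 
been captured for each data directory and that it is in English 
(untranslated messages); the function names are hypothetical.

```python
import re
from typing import Iterable

# Matches the "Latest checkpoint location:" line only, not the
# separate "Latest checkpoint's REDO location:" line.
_CHECKPOINT_RE = re.compile(
    r"^Latest checkpoint location:\s*([0-9A-Fa-f]+/[0-9A-Fa-f]+)\s*$",
    re.MULTILINE,
)


def latest_checkpoint_location(controldata_output: str) -> str:
    """Extract the 'Latest checkpoint location' value from captured
    pg_controldata output (assumed to be in English)."""
    match = _CHECKPOINT_RE.search(controldata_output)
    if match is None:
        raise ValueError("no 'Latest checkpoint location' line found")
    return match.group(1)


def checkpoint_locations_match(
    primary_output: str, replica_outputs: Iterable[str]
) -> bool:
    """Return True if every replica reports the same latest checkpoint
    location as the primary."""
    expected = latest_checkpoint_location(primary_output).upper()
    return all(
        latest_checkpoint_location(out).upper() == expected
        for out in replica_outputs
    )
```

An automated upgrade could refuse to continue whenever 
checkpoint_locations_match() returns False, instead of relying on the 
shutdown order alone.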

Thanks in advance,
Richard