MIME-Version: 1.0
Date: Thu, 20 Feb 2025 08:55:31 +0100
From: richard@kojedz.in
To: Jerry Sievers <gsievers19@comcast.net>
Cc: =?UTF-8?Q?=C3=81lvaro_Herrera?= <alvherre@alvh.no-ip.org>,
 pgsql-admin@lists.postgresql.org
Subject: Re: In-place upgrade with streaming replicas
In-Reply-To: <m2y0y1kqds.fsf@comcast.net>
References: <202502191554.6asefyczl7jn@alvherre.pgsql>
 <d438397ad2c67e0ce683bc3158746691@kojedz.in> <m2y0y1kqds.fsf@comcast.net>
Message-ID: <71ed34f8c1ed858262a7cf50b633682f@kojedz.in>
Content-Type: text/plain; charset=UTF-8;
 format=flowed
Content-Transfer-Encoding: 8bit
Archived-At: 
 <https://www.postgresql.org/message-id/71ed34f8c1ed858262a7cf50b633682f%40kojedz.in>
Precedence: bulk

Dear Jerry,

So, yes it turns out that some kind of loop must be involved here, as 
you described:

1. ensure cluster is running
2. stop primary
3. wait some time
4. stop replicas
5. check if checkpoint locations match. repeat from step 1 if 
out-of-sync.

My question here is, the unreliable step here is 3rd one. Can we query 
the replica runtime if he did catch up? I mean, that after stopping the 
primary, we can obtain the checkpoint location from pg_controldata, 
then, can we somehow query the running replica about that?

Thanks in advance,
Richard

2025-02-20 08:49 időpontban Jerry Sievers ezt írta:
> richard@kojedz.in writes:
> 
>> Dear Alvaro,
>> 
>> Thanks for your answers. Unfortunately, I was unaware of a shutdown
>> record, that makes a difference then. So, I definitely must stop the
>> primary first, then use pg_controldata to obtain checkpoint
>> info. Then, can I query the replicas while they are up and running if
>> they've received the shutdown record or not? So, after shutting down
>> the primary, how will I know if a replica has received the mentioned
>> record, and is safe to shutdown?
> 
> 
> 
> Hmmm, not sure about that but what we do, is stop primary, wait a
> $short time, then stop replicas...
> 
> Then run pg_controldata on all nodes | filter out only the line
> indicating latest checkpoint and sort -u the output.  Expect only a
> single line if all are matched.
> 
> You may also wish to first insure that you got the same number of
> lines as total node count before doing the sorting and uniqueing.
> 
> Very rarely on our huge systems, we'd have a mismatch after the
> verification in in those cases, our automated upgrade procedure
> restarts all nodes and then does the shutdown and verify check again.
> 
> HTH
> 
> 
>> 
>> Thanks for the clarifications.
>> 
>> Best regards,
>> Richard
>> 
>> 2025-02-19 16:54 időpontban Álvaro Herrera ezt írta:
>>> On 2025-Feb-19, richard@kojedz.in wrote:
>>> 
>>>> With this, I have the question, that after the shutdown of primary,
>>>> what is
>>>> the guarantee for replicas having the same checkpoint location? Why
>>>> does the
>>>> order of shutting down the servers matter? What would be the really
>>>> exact
>>>> and reliable way to ensure that replicas will have the same 
>>>> checkpoint
>>>> location as the primary?
>>> The replicas can't write WAL by themselves, but they will replay
>>> whatever the primary has sent; by shutting down the primary first and
>>> letting the replicas catch up, you ensure that the replicas will
>>> actually receive the shutdown record and replay it.  If you shut down
>>> the replicas first, they can obviously never catch up with the 
>>> shutdown
>>> checkpoint of the primary.
>>> As I recall, if you do shut down the primary first, one potential
>>> danger
>>> is that the primary fails to send the checkpoint record before 
>>> shutting
>>> down, so the replicas won't receive it and obviously will not replay
>>> it;
>>> or simply that they are behind enough that they receive it but don't
>>> replay it.
>>> You could use pg_controldata to read the last checkpoint info from
>>> all
>>> nodes.  You can run it on the primary after shutting it down, and 
>>> then
>>> on each replica while it's still running to ensure that the correct
>>> restartpoint has been created.