Re: In-place upgrade with streaming replicas

public inbox for [email protected]  
help / color / mirror / Atom feed

Re: In-place upgrade with streaming replicas
5+ messages / 2 participants
[nested] [flat]

* Re: In-place upgrade with streaming replicas
@ 2025-02-19 18:55 [email protected]
  2025-02-20 07:49 ` Re: In-place upgrade with streaming replicas Jerry Sievers <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: [email protected] @ 2025-02-19 18:55 UTC (permalink / raw)
  To: Álvaro Herrera <[email protected]>; +Cc: [email protected]

Dear Alvaro,

Thanks for your answers. Unfortunately, I was unaware of a shutdown 
record, that makes a difference then. So, I definitely must stop the 
primary first, then use pg_controldata to obtain checkpoint info. Then, 
can I query the replicas while they are up and running if they've 
received the shutdown record or not? So, after shutting down the 
primary, how will I know if a replica has received the mentioned record, 
and is safe to shutdown?

Thanks for the clarifications.

Best regards,
Richard

2025-02-19 16:54 időpontban Álvaro Herrera ezt írta:
> On 2025-Feb-19, [email protected] wrote:
> 
>> With this, I have the question, that after the shutdown of primary, 
>> what is
>> the guarantee for replicas having the same checkpoint location? Why 
>> does the
>> order of shutting down the servers matter? What would be the really 
>> exact
>> and reliable way to ensure that replicas will have the same checkpoint
>> location as the primary?
> 
> The replicas can't write WAL by themselves, but they will replay
> whatever the primary has sent; by shutting down the primary first and
> letting the replicas catch up, you ensure that the replicas will
> actually receive the shutdown record and replay it.  If you shut down
> the replicas first, they can obviously never catch up with the shutdown
> checkpoint of the primary.
> 
> As I recall, if you do shut down the primary first, one potential 
> danger
> is that the primary fails to send the checkpoint record before shutting
> down, so the replicas won't receive it and obviously will not replay 
> it;
> or simply that they are behind enough that they receive it but don't
> replay it.
> 
> You could use pg_controldata to read the last checkpoint info from all
> nodes.  You can run it on the primary after shutting it down, and then
> on each replica while it's still running to ensure that the correct
> restartpoint has been created.






^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: In-place upgrade with streaming replicas
  2025-02-19 18:55 Re: In-place upgrade with streaming replicas [email protected]
@ 2025-02-20 07:49 ` Jerry Sievers <[email protected]>
  2025-02-20 07:55   ` Re: In-place upgrade with streaming replicas [email protected]
  0 siblings, 1 reply; 5+ messages in thread

From: Jerry Sievers @ 2025-02-20 07:49 UTC (permalink / raw)
  To: [email protected]; +Cc: Álvaro Herrera <[email protected]>; [email protected]

[email protected] writes:

> Dear Alvaro,
>
> Thanks for your answers. Unfortunately, I was unaware of a shutdown
> record, that makes a difference then. So, I definitely must stop the
> primary first, then use pg_controldata to obtain checkpoint
> info. Then, can I query the replicas while they are up and running if
> they've received the shutdown record or not? So, after shutting down
> the primary, how will I know if a replica has received the mentioned
> record, and is safe to shutdown?



Hmmm, not sure about that but what we do, is stop primary, wait a
$short time, then stop replicas...

Then run pg_controldata on all nodes | filter out only the line
indicating latest checkpoint and sort -u the output.  Expect only a
single line if all are matched.

You may also wish to first insure that you got the same number of
lines as total node count before doing the sorting and uniqueing.

Very rarely on our huge systems, we'd have a mismatch after the
verification in in those cases, our automated upgrade procedure
restarts all nodes and then does the shutdown and verify check again.

HTH


>
> Thanks for the clarifications.
>
> Best regards,
> Richard
>
> 2025-02-19 16:54 időpontban Álvaro Herrera ezt írta:
>> On 2025-Feb-19, [email protected] wrote:
>> 
>>> With this, I have the question, that after the shutdown of primary,
>>> what is
>>> the guarantee for replicas having the same checkpoint location? Why
>>> does the
>>> order of shutting down the servers matter? What would be the really
>>> exact
>>> and reliable way to ensure that replicas will have the same checkpoint
>>> location as the primary?
>> The replicas can't write WAL by themselves, but they will replay
>> whatever the primary has sent; by shutting down the primary first and
>> letting the replicas catch up, you ensure that the replicas will
>> actually receive the shutdown record and replay it.  If you shut down
>> the replicas first, they can obviously never catch up with the shutdown
>> checkpoint of the primary.
>> As I recall, if you do shut down the primary first, one potential
>> danger
>> is that the primary fails to send the checkpoint record before shutting
>> down, so the replicas won't receive it and obviously will not replay
>> it;
>> or simply that they are behind enough that they receive it but don't
>> replay it.
>> You could use pg_controldata to read the last checkpoint info from
>> all
>> nodes.  You can run it on the primary after shutting it down, and then
>> on each replica while it's still running to ensure that the correct
>> restartpoint has been created.






^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: In-place upgrade with streaming replicas
  2025-02-19 18:55 Re: In-place upgrade with streaming replicas [email protected]
  2025-02-20 07:49 ` Re: In-place upgrade with streaming replicas Jerry Sievers <[email protected]>
@ 2025-02-20 07:55   ` [email protected]
  2025-02-21 03:57     ` Re: In-place upgrade with streaming replicas Jerry Sievers <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: [email protected] @ 2025-02-20 07:55 UTC (permalink / raw)
  To: Jerry Sievers <[email protected]>; +Cc: Álvaro Herrera <[email protected]>; [email protected]

Dear Jerry,

So, yes it turns out that some kind of loop must be involved here, as 
you described:

1. ensure cluster is running
2. stop primary
3. wait some time
4. stop replicas
5. check if checkpoint locations match. repeat from step 1 if 
out-of-sync.

My question here is, the unreliable step here is 3rd one. Can we query 
the replica runtime if he did catch up? I mean, that after stopping the 
primary, we can obtain the checkpoint location from pg_controldata, 
then, can we somehow query the running replica about that?

Thanks in advance,
Richard

2025-02-20 08:49 időpontban Jerry Sievers ezt írta:
> [email protected] writes:
> 
>> Dear Alvaro,
>> 
>> Thanks for your answers. Unfortunately, I was unaware of a shutdown
>> record, that makes a difference then. So, I definitely must stop the
>> primary first, then use pg_controldata to obtain checkpoint
>> info. Then, can I query the replicas while they are up and running if
>> they've received the shutdown record or not? So, after shutting down
>> the primary, how will I know if a replica has received the mentioned
>> record, and is safe to shutdown?
> 
> 
> 
> Hmmm, not sure about that but what we do, is stop primary, wait a
> $short time, then stop replicas...
> 
> Then run pg_controldata on all nodes | filter out only the line
> indicating latest checkpoint and sort -u the output.  Expect only a
> single line if all are matched.
> 
> You may also wish to first insure that you got the same number of
> lines as total node count before doing the sorting and uniqueing.
> 
> Very rarely on our huge systems, we'd have a mismatch after the
> verification in in those cases, our automated upgrade procedure
> restarts all nodes and then does the shutdown and verify check again.
> 
> HTH
> 
> 
>> 
>> Thanks for the clarifications.
>> 
>> Best regards,
>> Richard
>> 
>> 2025-02-19 16:54 időpontban Álvaro Herrera ezt írta:
>>> On 2025-Feb-19, [email protected] wrote:
>>> 
>>>> With this, I have the question, that after the shutdown of primary,
>>>> what is
>>>> the guarantee for replicas having the same checkpoint location? Why
>>>> does the
>>>> order of shutting down the servers matter? What would be the really
>>>> exact
>>>> and reliable way to ensure that replicas will have the same 
>>>> checkpoint
>>>> location as the primary?
>>> The replicas can't write WAL by themselves, but they will replay
>>> whatever the primary has sent; by shutting down the primary first and
>>> letting the replicas catch up, you ensure that the replicas will
>>> actually receive the shutdown record and replay it.  If you shut down
>>> the replicas first, they can obviously never catch up with the 
>>> shutdown
>>> checkpoint of the primary.
>>> As I recall, if you do shut down the primary first, one potential
>>> danger
>>> is that the primary fails to send the checkpoint record before 
>>> shutting
>>> down, so the replicas won't receive it and obviously will not replay
>>> it;
>>> or simply that they are behind enough that they receive it but don't
>>> replay it.
>>> You could use pg_controldata to read the last checkpoint info from
>>> all
>>> nodes.  You can run it on the primary after shutting it down, and 
>>> then
>>> on each replica while it's still running to ensure that the correct
>>> restartpoint has been created.






^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: In-place upgrade with streaming replicas
  2025-02-19 18:55 Re: In-place upgrade with streaming replicas [email protected]
  2025-02-20 07:49 ` Re: In-place upgrade with streaming replicas Jerry Sievers <[email protected]>
  2025-02-20 07:55   ` Re: In-place upgrade with streaming replicas [email protected]
@ 2025-02-21 03:57     ` Jerry Sievers <[email protected]>
  2025-02-24 09:16       ` Re: In-place upgrade with streaming replicas [email protected]
  0 siblings, 1 reply; 5+ messages in thread

From: Jerry Sievers @ 2025-02-21 03:57 UTC (permalink / raw)
  To: [email protected]; +Cc: [email protected]

[email protected] writes:

> Dear Jerry,
>
> So, yes it turns out that some kind of loop must be involved here, as
> you described:
>
> 1. ensure cluster is running
> 2. stop primary
> 3. wait some time
> 4. stop replicas
> 5. check if checkpoint locations match. repeat from step 1 if
> out-of-sync.
>
> My question here is, the unreliable step here is 3rd one. Can we query
> the replica runtime if he did catch up? I mean, that after stopping
> the primary, we can obtain the checkpoint location from
> pg_controldata, then, can we somehow query the running replica about
> that?
Assuming your client traffic has been stopped ahead of time and perhaps
you did a lockout via HBA or other means, including forcible termination
of persistent clients (we usually do a restart of the primary to insure
this)...

We don't wait more than a few seconds before also stopping the replicas
and the vast majority of times all nodes are at the same checkpoint.

Cheers!

>
> Thanks in advance,
> Richard
>
> 2025-02-20 08:49 időpontban Jerry Sievers ezt írta:
>> [email protected] writes:
>> 
>>> Dear Alvaro,
>>> Thanks for your answers. Unfortunately, I was unaware of a shutdown
>>> record, that makes a difference then. So, I definitely must stop the
>>> primary first, then use pg_controldata to obtain checkpoint
>>> info. Then, can I query the replicas while they are up and running if
>>> they've received the shutdown record or not? So, after shutting down
>>> the primary, how will I know if a replica has received the mentioned
>>> record, and is safe to shutdown?
>> Hmmm, not sure about that but what we do, is stop primary, wait a
>> $short time, then stop replicas...
>> Then run pg_controldata on all nodes | filter out only the line
>> indicating latest checkpoint and sort -u the output.  Expect only a
>> single line if all are matched.
>> You may also wish to first insure that you got the same number of
>> lines as total node count before doing the sorting and uniqueing.
>> Very rarely on our huge systems, we'd have a mismatch after the
>> verification in in those cases, our automated upgrade procedure
>> restarts all nodes and then does the shutdown and verify check again.
>> HTH
>> 
>>> Thanks for the clarifications.
>>> Best regards,
>>> Richard
>>> 2025-02-19 16:54 időpontban Álvaro Herrera ezt írta:
>>>> On 2025-Feb-19, [email protected] wrote:
>>>> 
>>>>> With this, I have the question, that after the shutdown of primary,
>>>>> what is
>>>>> the guarantee for replicas having the same checkpoint location? Why
>>>>> does the
>>>>> order of shutting down the servers matter? What would be the really
>>>>> exact
>>>>> and reliable way to ensure that replicas will have the same
>>>>> checkpoint
>>>>> location as the primary?
>>>> The replicas can't write WAL by themselves, but they will replay
>>>> whatever the primary has sent; by shutting down the primary first and
>>>> letting the replicas catch up, you ensure that the replicas will
>>>> actually receive the shutdown record and replay it.  If you shut down
>>>> the replicas first, they can obviously never catch up with the
>>>> shutdown
>>>> checkpoint of the primary.
>>>> As I recall, if you do shut down the primary first, one potential
>>>> danger
>>>> is that the primary fails to send the checkpoint record before
>>>> shutting
>>>> down, so the replicas won't receive it and obviously will not replay
>>>> it;
>>>> or simply that they are behind enough that they receive it but don't
>>>> replay it.
>>>> You could use pg_controldata to read the last checkpoint info from
>>>> all
>>>> nodes.  You can run it on the primary after shutting it down, and
>>>> then
>>>> on each replica while it's still running to ensure that the correct
>>>> restartpoint has been created.






^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: In-place upgrade with streaming replicas
  2025-02-19 18:55 Re: In-place upgrade with streaming replicas [email protected]
  2025-02-20 07:49 ` Re: In-place upgrade with streaming replicas Jerry Sievers <[email protected]>
  2025-02-20 07:55   ` Re: In-place upgrade with streaming replicas [email protected]
  2025-02-21 03:57     ` Re: In-place upgrade with streaming replicas Jerry Sievers <[email protected]>
@ 2025-02-24 09:16       ` [email protected]
  0 siblings, 0 replies; 5+ messages in thread

From: [email protected] @ 2025-02-24 09:16 UTC (permalink / raw)
  To: Jerry Sievers <[email protected]>; +Cc: [email protected]

Dear Jerry,

Thanks for sharing your experiments, I will implement our upgrades in a 
similar way. Terminate/restart on different port, wait for catchup, stop 
primary, check replicas somehow (using pg_wal_lsn_diff()), then stop 
replicas too, check for pg_controldata match, and repeat if not.

Regards,
Richard

2025-02-21 04:57 időpontban Jerry Sievers ezt írta:
> [email protected] writes:
> 
>> Dear Jerry,
>> 
>> So, yes it turns out that some kind of loop must be involved here, as
>> you described:
>> 
>> 1. ensure cluster is running
>> 2. stop primary
>> 3. wait some time
>> 4. stop replicas
>> 5. check if checkpoint locations match. repeat from step 1 if
>> out-of-sync.
>> 
>> My question here is, the unreliable step here is 3rd one. Can we query
>> the replica runtime if he did catch up? I mean, that after stopping
>> the primary, we can obtain the checkpoint location from
>> pg_controldata, then, can we somehow query the running replica about
>> that?
> Assuming your client traffic has been stopped ahead of time and perhaps
> you did a lockout via HBA or other means, including forcible 
> termination
> of persistent clients (we usually do a restart of the primary to insure
> this)...
> 
> We don't wait more than a few seconds before also stopping the replicas
> and the vast majority of times all nodes are at the same checkpoint.
> 
> Cheers!
> 
>> 
>> Thanks in advance,
>> Richard
>> 
>> 2025-02-20 08:49 időpontban Jerry Sievers ezt írta:
>>> [email protected] writes:
>>> 
>>>> Dear Alvaro,
>>>> Thanks for your answers. Unfortunately, I was unaware of a shutdown
>>>> record, that makes a difference then. So, I definitely must stop the
>>>> primary first, then use pg_controldata to obtain checkpoint
>>>> info. Then, can I query the replicas while they are up and running 
>>>> if
>>>> they've received the shutdown record or not? So, after shutting down
>>>> the primary, how will I know if a replica has received the mentioned
>>>> record, and is safe to shutdown?
>>> Hmmm, not sure about that but what we do, is stop primary, wait a
>>> $short time, then stop replicas...
>>> Then run pg_controldata on all nodes | filter out only the line
>>> indicating latest checkpoint and sort -u the output.  Expect only a
>>> single line if all are matched.
>>> You may also wish to first insure that you got the same number of
>>> lines as total node count before doing the sorting and uniqueing.
>>> Very rarely on our huge systems, we'd have a mismatch after the
>>> verification in in those cases, our automated upgrade procedure
>>> restarts all nodes and then does the shutdown and verify check again.
>>> HTH
>>> 
>>>> Thanks for the clarifications.
>>>> Best regards,
>>>> Richard
>>>> 2025-02-19 16:54 időpontban Álvaro Herrera ezt írta:
>>>>> On 2025-Feb-19, [email protected] wrote:
>>>>> 
>>>>>> With this, I have the question, that after the shutdown of 
>>>>>> primary,
>>>>>> what is
>>>>>> the guarantee for replicas having the same checkpoint location? 
>>>>>> Why
>>>>>> does the
>>>>>> order of shutting down the servers matter? What would be the 
>>>>>> really
>>>>>> exact
>>>>>> and reliable way to ensure that replicas will have the same
>>>>>> checkpoint
>>>>>> location as the primary?
>>>>> The replicas can't write WAL by themselves, but they will replay
>>>>> whatever the primary has sent; by shutting down the primary first 
>>>>> and
>>>>> letting the replicas catch up, you ensure that the replicas will
>>>>> actually receive the shutdown record and replay it.  If you shut 
>>>>> down
>>>>> the replicas first, they can obviously never catch up with the
>>>>> shutdown
>>>>> checkpoint of the primary.
>>>>> As I recall, if you do shut down the primary first, one potential
>>>>> danger
>>>>> is that the primary fails to send the checkpoint record before
>>>>> shutting
>>>>> down, so the replicas won't receive it and obviously will not 
>>>>> replay
>>>>> it;
>>>>> or simply that they are behind enough that they receive it but 
>>>>> don't
>>>>> replay it.
>>>>> You could use pg_controldata to read the last checkpoint info from
>>>>> all
>>>>> nodes.  You can run it on the primary after shutting it down, and
>>>>> then
>>>>> on each replica while it's still running to ensure that the correct
>>>>> restartpoint has been created.






^ permalink  raw  reply  [nested|flat] 5+ messages in thread

end of thread, other threads:[~2025-02-24 09:16 UTC | newest]

Thread overview: 5+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-02-19 18:55 Re: In-place upgrade with streaming replicas [email protected]
2025-02-20 07:49 ` Jerry Sievers <[email protected]>
2025-02-20 07:55   ` [email protected]
2025-02-21 03:57     ` Jerry Sievers <[email protected]>
2025-02-24 09:16       ` [email protected]

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox