Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tmUZh-0075QB-22 for pgsql-admin@arkaria.postgresql.org; Mon, 24 Feb 2025 09:16:31 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tmUZg-00HMaW-08 for pgsql-admin@arkaria.postgresql.org; Mon, 24 Feb 2025 09:16:28 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tmUZf-00HMaN-EY for pgsql-admin@lists.postgresql.org; Mon, 24 Feb 2025 09:16:27 +0000 Received: from fw.sz-a.kwebs.cloud ([109.61.102.40]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1tmUZc-000VKM-0U for pgsql-admin@lists.postgresql.org; Mon, 24 Feb 2025 09:16:26 +0000 Received: from webmail.srv.kwebs.cloud (172-16-36-106.prometheus-node-exporter.monitoring.svc.cluster.local [172.16.36.106]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) (Authenticated sender: richard@kojedz.in) by mx.kwebs.cloud (Postfix) with ESMTPSA id 2CEC014BD4; Mon, 24 Feb 2025 09:16:21 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kojedz.in; s=s02; t=1740388581; bh=sUHAA/GZsJbHURO/is38JSCfvwDfxUHDxR2EO0DmtWQ=; h=Date:From:To:Cc:Subject:In-Reply-To:References; b=KcLWXn4Q16mco3kIA2TPY4gRiapXsBi/PTAncElfPdMVXuQv35hinmhMKtUVRu6T1 5ozuf4vG4x4Z1EwyavP0FLE3XtnpUZCk6H1rcKM813oWSLQURSVVckXhN+yYkbtvW3 lhd3mtt90NHP7MQEMows5BsB0Z/7p43UzRfgJhyBFlEkbr91NTKE6n/bSC/3vxp1gO 34iy6y+vri5+lqTt2o2Mv8JgY9J3feCVebI5RCgAElItBFVQ8Wll1z29vG6v7Zh6xw f256mRZhjfI2dTt245N0cr/A6ABS6AberHkOersaOGmnOBkzslgwwwh0LQH+szc2Iy VoC8DHBEEWsgg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kojedz.in; s=s01; t=1740388581; bh=sUHAA/GZsJbHURO/is38JSCfvwDfxUHDxR2EO0DmtWQ=; h=Date:From:To:Cc:Subject:In-Reply-To:References; b=O2Ma4bfU2OyVfZDcB6npSlbpXpx9Lt/8lZEiDqvqJb7JXXhhJ+u9D9J4jSe8aK4mC AzZju2hVdFKyGyWKD5srAs3galIrbOrwouixOvO80iDoNcNH1/qe0aojofhl2ZpFI/ Bnwm15UhHRLC6F6WubJanWeXlCWehSC7V88I71X6ocbchQpMlHUacHbKkzWLOorF8F 8FJwxX3GguPXYbtXlXxXiEzUutAdHeaRis4SN+fh7lAGM5aZAfXRkVxTYEk+Yoj+N/ Xxx5dRRsEN13dQ3kiJ9enx26oau4+PEaR9pYRMjfIihTFFVhPzhr+vmwCP85iEnGOq gx6VK47p+81pNnLSOdJrRGsImxxV/kdl7y8DI22BPcecwNhBXqawAzAl4Y19Mqw+Ag /U2gK8cOyytvwMVLze63DHfdTyT0/TaXWLvTphxuwIWBsPhqhpYwpkjzvGgVF2J4gB r4eq4fXsyS7tCjJ5+ddTV6LpYOWvPO5REQLRuC6WDQmxqZrPNwv MIME-Version: 1.0 Date: Mon, 24 Feb 2025 10:16:20 +0100 From: richard@kojedz.in To: Jerry Sievers Cc: pgsql-admin@lists.postgresql.org Subject: Re: In-place upgrade with streaming replicas In-Reply-To: References: <202502191554.6asefyczl7jn@alvherre.pgsql> <71ed34f8c1ed858262a7cf50b633682f@kojedz.in> Message-ID: X-Sender: richard@kojedz.in Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Dear Jerry, Thanks for sharing your experiments, I will implement our upgrades in a similar way. Terminate/restart on different port, wait for catchup, stop primary, check replicas somehow (using pg_wal_lsn_diff()), then stop replicas too, check for pg_controldata match, and repeat if not. Regards, Richard 2025-02-21 04:57 időpontban Jerry Sievers ezt írta: > richard@kojedz.in writes: > >> Dear Jerry, >> >> So, yes it turns out that some kind of loop must be involved here, as >> you described: >> >> 1. ensure cluster is running >> 2. stop primary >> 3. wait some time >> 4. stop replicas >> 5. check if checkpoint locations match. repeat from step 1 if >> out-of-sync. >> >> My question here is, the unreliable step here is 3rd one. Can we query >> the replica runtime if he did catch up? I mean, that after stopping >> the primary, we can obtain the checkpoint location from >> pg_controldata, then, can we somehow query the running replica about >> that? > Assuming your client traffic has been stopped ahead of time and perhaps > you did a lockout via HBA or other means, including forcible > termination > of persistent clients (we usually do a restart of the primary to insure > this)... > > We don't wait more than a few seconds before also stopping the replicas > and the vast majority of times all nodes are at the same checkpoint. > > Cheers! > >> >> Thanks in advance, >> Richard >> >> 2025-02-20 08:49 időpontban Jerry Sievers ezt írta: >>> richard@kojedz.in writes: >>> >>>> Dear Alvaro, >>>> Thanks for your answers. Unfortunately, I was unaware of a shutdown >>>> record, that makes a difference then. So, I definitely must stop the >>>> primary first, then use pg_controldata to obtain checkpoint >>>> info. Then, can I query the replicas while they are up and running >>>> if >>>> they've received the shutdown record or not? So, after shutting down >>>> the primary, how will I know if a replica has received the mentioned >>>> record, and is safe to shutdown? >>> Hmmm, not sure about that but what we do, is stop primary, wait a >>> $short time, then stop replicas... >>> Then run pg_controldata on all nodes | filter out only the line >>> indicating latest checkpoint and sort -u the output. Expect only a >>> single line if all are matched. >>> You may also wish to first insure that you got the same number of >>> lines as total node count before doing the sorting and uniqueing. >>> Very rarely on our huge systems, we'd have a mismatch after the >>> verification in in those cases, our automated upgrade procedure >>> restarts all nodes and then does the shutdown and verify check again. >>> HTH >>> >>>> Thanks for the clarifications. >>>> Best regards, >>>> Richard >>>> 2025-02-19 16:54 időpontban Álvaro Herrera ezt írta: >>>>> On 2025-Feb-19, richard@kojedz.in wrote: >>>>> >>>>>> With this, I have the question, that after the shutdown of >>>>>> primary, >>>>>> what is >>>>>> the guarantee for replicas having the same checkpoint location? >>>>>> Why >>>>>> does the >>>>>> order of shutting down the servers matter? What would be the >>>>>> really >>>>>> exact >>>>>> and reliable way to ensure that replicas will have the same >>>>>> checkpoint >>>>>> location as the primary? >>>>> The replicas can't write WAL by themselves, but they will replay >>>>> whatever the primary has sent; by shutting down the primary first >>>>> and >>>>> letting the replicas catch up, you ensure that the replicas will >>>>> actually receive the shutdown record and replay it. If you shut >>>>> down >>>>> the replicas first, they can obviously never catch up with the >>>>> shutdown >>>>> checkpoint of the primary. >>>>> As I recall, if you do shut down the primary first, one potential >>>>> danger >>>>> is that the primary fails to send the checkpoint record before >>>>> shutting >>>>> down, so the replicas won't receive it and obviously will not >>>>> replay >>>>> it; >>>>> or simply that they are behind enough that they receive it but >>>>> don't >>>>> replay it. >>>>> You could use pg_controldata to read the last checkpoint info from >>>>> all >>>>> nodes. You can run it on the primary after shutting it down, and >>>>> then >>>>> on each replica while it's still running to ensure that the correct >>>>> restartpoint has been created.