Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tl1PH-008g3c-AU for pgsql-admin@arkaria.postgresql.org; Thu, 20 Feb 2025 07:55:39 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tl1PF-002jiV-Ph for pgsql-admin@arkaria.postgresql.org; Thu, 20 Feb 2025 07:55:37 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tl1PF-002jhY-9b for pgsql-admin@lists.postgresql.org; Thu, 20 Feb 2025 07:55:37 +0000 Received: from fw.sz-a.kwebs.cloud ([109.61.102.40]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1tl1PD-001rUn-0p for pgsql-admin@lists.postgresql.org; Thu, 20 Feb 2025 07:55:36 +0000 Received: from webmail.srv.kwebs.cloud (172-16-36-101.prometheus-node-exporter.monitoring.svc.cluster.local [172.16.36.101]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) (Authenticated sender: richard@kojedz.in) by mx.kwebs.cloud (Postfix) with ESMTPSA id CDA53C0C7; Thu, 20 Feb 2025 07:55:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kojedz.in; s=s02; t=1740038132; bh=9EHNqMRtjMflbAy22uAD63oorYhqO/WGHlD2ZNJzFCA=; h=Date:From:To:Cc:Subject:In-Reply-To:References; b=YG7yz8/LGiUxdSNGhDjpQqdKF7FMjjF+CnezR9g1S86U/2hdnhj8XeiooOfVrni4U B2wG++dSB5DVMcbc8Kg+YxB3JOxJDfKUc8JirrWJUYZHTTYiu8+h54q9CZzfUG5E7e LxD+9b4mlt/neWg6fdR3Z2SWaVXzXP4L2wxUjQs6cN79bd/OAYaTf05HQeFeScx2jL 08RVDt8QmjUmrxhaLemFsYd5zcFtUh6X+C4DZJt3DaRSDxsFb5a8CwVvRR79cRRzd3 xATlx60mjG6NVpmNFu+1pBrNnWF2/VPH9KqmLQiwYlpIKIB9enyS6umdQ/uOaRSg5v Vkn9828Dh9PNg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kojedz.in; s=s01; t=1740038132; bh=9EHNqMRtjMflbAy22uAD63oorYhqO/WGHlD2ZNJzFCA=; h=Date:From:To:Cc:Subject:In-Reply-To:References; b=Yh7uFPFcakJ12hbcYv0r2sChA7xTWl7JMYSU+yBPHIus3LBvybXGhygsY4gKnxBEM Tr9hf6c23x94BHsNyAwF8YXDV9mYP7tHChfO4Jya0kR/O6NZ2+DeyZMx6HZMP6d7ql IbJz6yiRBQH/dI8q+2lU1Np9GaPBMrDumyT7MIqdiBcnxqZCoj8GvzP2qkpnOEPtwk 5nY9inHNizaKhRbkv31B9umIVHQLCELY6RMe0XPBA74spjwHDKXgGPkz3Ydzo4R/j2 Cb7QHokg5uVr4dNRWM3CBBTG4G+m1/wV3LrmEGpfnyTYO5A9wOK9tNSpdk0rjbhe6z LvrbY3Y9ZK5Z4QQBRfX+IUACDlmTv96QQ5epG1jfgHuwFB1l/63v2N5qckeNsJKh+s aiqNIOz9hGEUyIZEc6Qc883HwLYPl9f7MW/vQUIVXpijzJy18KuysLVtgQD8ybyMFl RSW2lwElZBoVpNBexn8QW9ulzNPO9+DtzrILr7qp8LSiyZsCGzP MIME-Version: 1.0 Date: Thu, 20 Feb 2025 08:55:31 +0100 From: richard@kojedz.in To: Jerry Sievers Cc: =?UTF-8?Q?=C3=81lvaro_Herrera?= , pgsql-admin@lists.postgresql.org Subject: Re: In-place upgrade with streaming replicas In-Reply-To: References: <202502191554.6asefyczl7jn@alvherre.pgsql> Message-ID: <71ed34f8c1ed858262a7cf50b633682f@kojedz.in> X-Sender: richard@kojedz.in Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Dear Jerry, So, yes it turns out that some kind of loop must be involved here, as you described: 1. ensure cluster is running 2. stop primary 3. wait some time 4. stop replicas 5. check if checkpoint locations match. repeat from step 1 if out-of-sync. My question here is, the unreliable step here is 3rd one. Can we query the replica runtime if he did catch up? I mean, that after stopping the primary, we can obtain the checkpoint location from pg_controldata, then, can we somehow query the running replica about that? Thanks in advance, Richard 2025-02-20 08:49 időpontban Jerry Sievers ezt írta: > richard@kojedz.in writes: > >> Dear Alvaro, >> >> Thanks for your answers. Unfortunately, I was unaware of a shutdown >> record, that makes a difference then. So, I definitely must stop the >> primary first, then use pg_controldata to obtain checkpoint >> info. Then, can I query the replicas while they are up and running if >> they've received the shutdown record or not? So, after shutting down >> the primary, how will I know if a replica has received the mentioned >> record, and is safe to shutdown? > > > > Hmmm, not sure about that but what we do, is stop primary, wait a > $short time, then stop replicas... > > Then run pg_controldata on all nodes | filter out only the line > indicating latest checkpoint and sort -u the output. Expect only a > single line if all are matched. > > You may also wish to first insure that you got the same number of > lines as total node count before doing the sorting and uniqueing. > > Very rarely on our huge systems, we'd have a mismatch after the > verification in in those cases, our automated upgrade procedure > restarts all nodes and then does the shutdown and verify check again. > > HTH > > >> >> Thanks for the clarifications. >> >> Best regards, >> Richard >> >> 2025-02-19 16:54 időpontban Álvaro Herrera ezt írta: >>> On 2025-Feb-19, richard@kojedz.in wrote: >>> >>>> With this, I have the question, that after the shutdown of primary, >>>> what is >>>> the guarantee for replicas having the same checkpoint location? Why >>>> does the >>>> order of shutting down the servers matter? What would be the really >>>> exact >>>> and reliable way to ensure that replicas will have the same >>>> checkpoint >>>> location as the primary? >>> The replicas can't write WAL by themselves, but they will replay >>> whatever the primary has sent; by shutting down the primary first and >>> letting the replicas catch up, you ensure that the replicas will >>> actually receive the shutdown record and replay it. If you shut down >>> the replicas first, they can obviously never catch up with the >>> shutdown >>> checkpoint of the primary. >>> As I recall, if you do shut down the primary first, one potential >>> danger >>> is that the primary fails to send the checkpoint record before >>> shutting >>> down, so the replicas won't receive it and obviously will not replay >>> it; >>> or simply that they are behind enough that they receive it but don't >>> replay it. >>> You could use pg_controldata to read the last checkpoint info from >>> all >>> nodes. You can run it on the primary after shutting it down, and >>> then >>> on each replica while it's still running to ensure that the correct >>> restartpoint has been created.