Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tlKAt-00B1wh-Vj for pgsql-admin@arkaria.postgresql.org; Fri, 21 Feb 2025 03:58:04 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tlKAs-0012Bt-13 for pgsql-admin@arkaria.postgresql.org; Fri, 21 Feb 2025 03:58:02 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tlKAr-00127k-0u for pgsql-admin@lists.postgresql.org; Fri, 21 Feb 2025 03:58:01 +0000 Received: from resdmta-h2p-565048.sys.comcast.net ([2001:558:fd02:2446::c]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1tlKAi-0021DF-18 for pgsql-admin@lists.postgresql.org; Fri, 21 Feb 2025 03:57:53 +0000 Received: from resomta-h2p-555030.sys.comcast.net ([96.102.179.204]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 256/256 bits) (Client did not present a certificate) by resdmta-h2p-565048.sys.comcast.net with ESMTPS id lJyYt12LZiKwNlKAdtQBWS; Fri, 21 Feb 2025 03:57:47 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=20190202a; t=1740110267; bh=sRh5mZkj1uIlbGzQK+Cp6Mw9wtH6XTFIlYaAyS+dtno=; h=Received:Received:Received:From:To:Subject:Date:Message-ID: MIME-Version:Content-Type:Xfinity-Spam-Result; b=5M2iVb+EZC3xU4FgHEG68Z6BKamoWIJ75d8EyVdfIQg17D8xoIFo0vcF+G0qNNvA0 xj/j8UguNwn6c0Pf3pwytj2g+R1dkaAopyMbPYoxvOk1GxA7DSV5ApqUvCPs1a08S4 dHYnTZFvCuHgfEBXmCVZMqP5Lr4vU/RnRNwZ/dzVPoI6kbHu04awTyuKef9GzFfgpd Hb+iD2/RszFv+gQgtUNc8gxYfAOH1L/6QLOql+TS8Bo0a+HI5qDr7UUrBMAyMieCK3 k0/fwEZbxsNyDLaxEZNuQh+UZCpP4TRd++OmGdFsWuwkFiz6P7c86Wkiqfcswtmwho 0+v1XMpsvPjug== Received: from Geralds-Laptop.local ([98.220.100.35]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 256/256 bits) (Client did not present a certificate) by resomta-h2p-555030.sys.comcast.net with ESMTPSA id lKAZt1wutR3cBlKAatUnoQ; Fri, 21 Feb 2025 03:57:46 +0000 Received: by Geralds-Laptop.local (Postfix, from userid 501) id 5C7E27A14C4; Thu, 20 Feb 2025 21:57:43 -0600 (CST) From: Jerry Sievers To: richard@kojedz.in Cc: pgsql-admin@lists.postgresql.org Subject: Re: In-place upgrade with streaming replicas In-Reply-To: <71ed34f8c1ed858262a7cf50b633682f@kojedz.in> (richard@kojedz.in's message of "Thu, 20 Feb 2025 08:55:31 +0100") References: <202502191554.6asefyczl7jn@alvherre.pgsql> <71ed34f8c1ed858262a7cf50b633682f@kojedz.in> Date: Thu, 20 Feb 2025 21:57:43 -0600 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-CMAE-Envelope: MS4xfBex6A2IRmHohDsFRIRLlsDjkpuxsn38B7LUBK8KI3xqrnAZXNtFiPqOAv96viaKreuSjiN4uFNOPvrAC3GVRmRQcXdlYcOjn6TXyDpSW2aXuM0BUOMz 1qkk6S8jTRv4cNWBlSjYoWDMaRNYBp2brnaQ/PEfiMBA40qll2D+s1pHkUf/icBZswi1O77eLdHME19sjKCaZh9Ep9/K7CS86bn08ohhpuPPsQhjCvI0P8Dd +uH1MoIBk53IqtY1XNxEKg== List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk richard@kojedz.in writes: > Dear Jerry, > > So, yes it turns out that some kind of loop must be involved here, as > you described: > > 1. ensure cluster is running > 2. stop primary > 3. wait some time > 4. stop replicas > 5. check if checkpoint locations match. repeat from step 1 if > out-of-sync. > > My question here is, the unreliable step here is 3rd one. Can we query > the replica runtime if he did catch up? I mean, that after stopping > the primary, we can obtain the checkpoint location from > pg_controldata, then, can we somehow query the running replica about > that? Assuming your client traffic has been stopped ahead of time and perhaps you did a lockout via HBA or other means, including forcible termination of persistent clients (we usually do a restart of the primary to insure this)... We don't wait more than a few seconds before also stopping the replicas and the vast majority of times all nodes are at the same checkpoint. Cheers! > > Thanks in advance, > Richard > > 2025-02-20 08:49 id=C5=91pontban Jerry Sievers ezt =C3=ADrta: >> richard@kojedz.in writes: >>=20 >>> Dear Alvaro, >>> Thanks for your answers. Unfortunately, I was unaware of a shutdown >>> record, that makes a difference then. So, I definitely must stop the >>> primary first, then use pg_controldata to obtain checkpoint >>> info. Then, can I query the replicas while they are up and running if >>> they've received the shutdown record or not? So, after shutting down >>> the primary, how will I know if a replica has received the mentioned >>> record, and is safe to shutdown? >> Hmmm, not sure about that but what we do, is stop primary, wait a >> $short time, then stop replicas... >> Then run pg_controldata on all nodes | filter out only the line >> indicating latest checkpoint and sort -u the output. Expect only a >> single line if all are matched. >> You may also wish to first insure that you got the same number of >> lines as total node count before doing the sorting and uniqueing. >> Very rarely on our huge systems, we'd have a mismatch after the >> verification in in those cases, our automated upgrade procedure >> restarts all nodes and then does the shutdown and verify check again. >> HTH >>=20 >>> Thanks for the clarifications. >>> Best regards, >>> Richard >>> 2025-02-19 16:54 id=C5=91pontban =C3=81lvaro Herrera ezt =C3=ADrta: >>>> On 2025-Feb-19, richard@kojedz.in wrote: >>>>=20 >>>>> With this, I have the question, that after the shutdown of primary, >>>>> what is >>>>> the guarantee for replicas having the same checkpoint location? Why >>>>> does the >>>>> order of shutting down the servers matter? What would be the really >>>>> exact >>>>> and reliable way to ensure that replicas will have the same >>>>> checkpoint >>>>> location as the primary? >>>> The replicas can't write WAL by themselves, but they will replay >>>> whatever the primary has sent; by shutting down the primary first and >>>> letting the replicas catch up, you ensure that the replicas will >>>> actually receive the shutdown record and replay it. If you shut down >>>> the replicas first, they can obviously never catch up with the >>>> shutdown >>>> checkpoint of the primary. >>>> As I recall, if you do shut down the primary first, one potential >>>> danger >>>> is that the primary fails to send the checkpoint record before >>>> shutting >>>> down, so the replicas won't receive it and obviously will not replay >>>> it; >>>> or simply that they are behind enough that they receive it but don't >>>> replay it. >>>> You could use pg_controldata to read the last checkpoint info from >>>> all >>>> nodes. You can run it on the primary after shutting it down, and >>>> then >>>> on each replica while it's still running to ensure that the correct >>>> restartpoint has been created.