Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tKcAO-009k07-Po for pgsql-general@arkaria.postgresql.org; Mon, 09 Dec 2024 11:43:09 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tKcAL-007bW8-QS for pgsql-general@arkaria.postgresql.org; Mon, 09 Dec 2024 11:43:07 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tKcAK-007bU3-Dv for pgsql-general@lists.postgresql.org; Mon, 09 Dec 2024 11:43:06 +0000 Received: from fhigh-b5-smtp.messagingengine.com ([202.12.124.156]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tKcAH-001tGv-Mu for pgsql-general@lists.postgresql.org; Mon, 09 Dec 2024 11:43:03 +0000 Received: from phl-compute-02.internal (phl-compute-02.phl.internal [10.202.2.42]) by mailfhigh.stl.internal (Postfix) with ESMTP id 5C24F2540075 for ; Mon, 9 Dec 2024 06:43:00 -0500 (EST) Received: from phl-imap-02 ([10.202.2.81]) by phl-compute-02.internal (MEProxy); Mon, 09 Dec 2024 06:43:00 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=lateraljoin.com; h=cc:content-transfer-encoding:content-type:content-type:date :date:from:from:in-reply-to:message-id:mime-version:reply-to :subject:subject:to:to; s=fm3; t=1733744580; x=1733830980; bh=Ze DiiouGkHst+XDylG/NRcdSClZ8nvU5HLqBDhPMgxc=; b=S7LVG/Ox4RYIxgxd7z OBrxVty0crXUy3OS2azsVm9rgTfppDUUQiw7Bp2SWILKPm4HbAYVJVxg0DJI7mnl TO9reOzvy/MhAX1Sc5z5DaoOD8FOyWyt98QA3W05pXBV6MBw0J6O/K/r0Siz/aja kmJyr2Jk6wH5FrU5TevkkTtZfiv4QC5GTTid558JWHZIj0M959wmL7VMBV6dmPCF 9Z7B/uil+FtlJbVCTj9OKVAtpHfTJElroNGPbSkS43j1A6cIx6HeWnlTvkQtZQZB fmcNbQsx3ph+2FiReciE6gMmt5APJPrvBY95J79epM76OUdpsjycKQUk+Fv4l6Rp s3gw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:message-id:mime-version:reply-to:subject:subject:to :to:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm1; t= 1733744580; x=1733830980; bh=ZeDiiouGkHst+XDylG/NRcdSClZ8nvU5HLq BDhPMgxc=; b=R/dpL+TferX5gT39pb+/p5YRITvR3tcFFymxfhAzLvoAj6y05MM +gfc41BdeqMx0apUXgXkDMJdUy+o+VZE4WSMGimrSuewDpQsz3fLlU6jOp949LoX 3rRGMsk9my1hetqsKkr2FbuhWVW0LKbcGVk9ZLBGVBRn8hecV9H5AUjV/C+TUGG6 FH/6pTk6hh+kLQTMvlmTmqRnpNg2YqPZhmUJ7uuvEPTcIuSY/O92NKWPvPjJ6o0Q hbXz117IIjrswNbP0iARBlqbVMFoFJ1fDmOO9K32jC66zLjW1sp3CJjxemU/AVlt adx9Z56GuwNZ2tcj+Ln0nQkjk8Wd2r9d/9A== X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefuddrjeeigddtiecutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpggftfghnshhusghstghrihgsvgdpuffr tefokffrpgfnqfghnecuuegrihhlohhuthemuceftddtnecunecujfgurhepofggfffhvf fkufgtgfesthejredtredttdenucfhrhhomhepfdflohgvucghihhlughishhhfdcuoehj ohgvsehlrghtvghrrghljhhoihhnrdgtohhmqeenucggtffrrghtthgvrhhnpeejuddtge evheffgeetteduvdduhfeigeeugfefgfekveehvdfghfefhfeifeehgfenucffohhmrghi nhepphhoshhtghhrvghsqhhlrdhorhhgnecuvehluhhsthgvrhfuihiivgeptdenucfrrg hrrghmpehmrghilhhfrhhomhepjhhovgeslhgrthgvrhgrlhhjohhinhdrtghomhdpnhgs pghrtghpthhtohepuddpmhhouggvpehsmhhtphhouhhtpdhrtghpthhtohepphhgshhqlh dqghgvnhgvrhgrlheslhhishhtshdrphhoshhtghhrvghsqhhlrdhorhhg X-ME-Proxy: Feedback-ID: i7cb144c0:Fastmail Received: by mailuser.phl.internal (Postfix, from userid 501) id ED535B00069; Mon, 9 Dec 2024 06:42:59 -0500 (EST) X-Mailer: MessagingEngine.com Webmail Interface MIME-Version: 1.0 Date: Mon, 09 Dec 2024 11:42:39 +0000 From: "Joe Wildish" To: pgsql-general@lists.postgresql.org Message-Id: <7af5c98d-e547-4caa-b275-441c80dbe6c9@app.fastmail.com> Subject: pg_upgrade vs. logical replication Content-Type: text/plain Content-Transfer-Encoding: 7bit List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk We maintain c.50 logical replicas. Typically the producer version is 12 or 13, and the subscriber version is 14. We intend to upgrade the subscribers to 15 using pg_upgrade. However, we ran into an unexpected problem with that approach. I couldn't find much being mentioned about it on the web, so I'm sending this message for a sanity check that what we are seeing is expected, and, that the proposed solution make sense. The problem we see is that after running pg_upgrade on the subscriber, details about the subscription are lost; specifically, pg_subscription_rel is empty, and the associated replication origin has no LSN information in pg_replication_origin_status. I found a thread on the hackers list that seems to call out this problem: https://www.postgresql.org/message-id/flat/20230217075433.u5mjly4d5cr4hcfe%40jrouhaud It is a very long thread. TLDR; Julien describes a similar problem to what we have (although in our case, it is a "pure" logical replica with no consideration about switching between physical replication, or needing to cut over, etc), and a fix has gone in to PG17 that sorts this problem. However, we can't go to 17 yet, so need a solution for 15 and 16. We are doing the following sequence of events: (1) Shutdown subscriber; (2) Startup subscriber with max_logical_replication_workers set to 0; (3) Copy the data in pg_replication_origin_status to a table; (4) Shutdown subscriber; (5) Run pg_upgrade; (6) Startup subscriber with max_logical_replication_workers set to 0; (7) Apply the LSNs recorded in (3) to the appropriate origin using pg_replication_origin_advance; (8) ENABLE, REFRESH PUBLICATION WITH (COPY_DATA=FALSE), & DISABLE the subscription; (9) Shutdown subscriber; (10) Startup subscriber normally; (11) ENABLE subscription. Step 7 requires a mapping from old origin name to new origin name as the names are derived from subscription OIDs. Steps 2 and 6 are required to ensure no activity occurs that could advance LSNs. Unlike in the mentioned thread, we know we won't have to deal with tables that are partially replicated i.e we know all tables will be 'r'. We have done some testing with this and it seems to work. We run a repeated INSERT on the producer several times a second, and notice that after step 11, we have all expected rows arrive on the subscriber without gaps. However, I thought I'd ping this list for a sanity check that what we are doing makes sense. And, if so, perhaps it'll be useful to other people in a similar situation. Comments / suggestions welcome. -Joe