From: Kirill Reshke
Date: Mon, 16 Feb 2026 11:45:29 +0500
Subject: Re: 17.8 standby crashes during WAL replay from 17.5 primary: "could not access status of transaction"
To: Heikki Linnakangas
Cc: Sebastian Webber, pgsql-bugs@lists.postgresql.org, Andrey Borodin, Álvaro Herrera, Dmitry Yurichev, Chao Li, Ivan Bykov
References: <349f9c82-3a8b-48ad-8cc4-fe81553793dd@iki.fi>
In-Reply-To: <349f9c82-3a8b-48ad-8cc4-fe81553793dd@iki.fi>

On Sat, 14 Feb 2026 at 16:42, Heikki Linnakangas wrote:
>
> On 13/02/2026 22:31, Sebastian Webber wrote:
> > PostgreSQL version: 17.8 (standby), 17.5 (primary)
> >
> > Primary: PostgreSQL 17.5 (Debian 17.5-1.pgdg130+1) on aarch64-unknown-linux-gnu
> > Standby: PostgreSQL 17.8 (Debian 17.8-1.pgdg13+1) on aarch64-unknown-linux-gnu
> >
> > Platform: Docker containers on macOS (Apple Silicon / aarch64), Docker
> > Desktop
> >
> >
> > Description
> > -----------
> >
> > A PostgreSQL 17.8 standby crashes during WAL replay when streaming
> > from a 17.5 primary. The crash occurs after replaying a
> > MultiXact/TRUNCATE_ID record followed by a MultiXact/CREATE_ID
> > record.
>
> Thanks for the report, I can repro it with your script. It is indeed a
> regression introduced in the latest minor release, in the logic to
> replay multixact WAL generated on older minor versions. (Commit
> 8ba61bc063). Adding the folks from the thread that led to that commit.
>
> The commit added this in RecordNewMultiXact():
>
> >     /*
> >      * Older minor versions didn't set the next multixid's offset in this
> >      * function, and therefore didn't initialize the next page until the next
> >      * multixid was assigned. If we're replaying WAL that was generated by
> >      * such a version, the next page might not be initialized yet. Initialize
> >      * it now.
> >      */
> >     if (InRecovery &&
> >         next_pageno != pageno &&
> >         pg_atomic_read_u64(&MultiXactOffsetCtl->shared->latest_page_number) == pageno)
> >     {
> >         elog(DEBUG1, "next offsets page is not initialized, initializing it now");
>
> The idea is that if the next offset falls on a different page
> (next_pageno != pageno), and we have not yet initialized the next page
> (pg_atomic_read_u64(&MultiXactOffsetCtl->shared->latest_page_number) ==
> pageno), we initialize it now. However, that last check goes wrong after
> a truncation record is replayed. Replaying a truncation record does this:
>
> >     /*
> >      * During XLOG replay, latest_page_number isn't necessarily set up
> >      * yet; insert a suitable value to bypass the sanity test in
> >      * SimpleLruTruncate.
> >      */
> >     pageno = MultiXactIdToOffsetPage(xlrec.endTruncOff);
> >     pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
> >                         pageno);
>
> Thanks to that, latest_page_number moves backwards to a much older page
> number. That breaks the "was the next offset page already initialized?"
> test in RecordNewMultiXact().
>
> I don't understand why that "bypass the sanity check" is needed. As far
> as I can see, latest_page_number is tracked accurately during WAL
> replay, and should already be set up.
> It's initialized in StartupMultiXact(), and updated whenever the next
> page is initialized.
>
> That was introduced a long time ago, in commit 4f627f8973, which in turn
> was backpatched and had to deal with WAL that was generated before that
> commit. I suspect it was necessary back then, for backwards
> compatibility, but isn't necessary any more. Hence, I propose to remove
> that "bypass the sanity check" code (attached). Does anyone see a
> scenario where latest_page_number might not be set correctly?
>
> If we want to play it even more safe -- and I guess that's the right
> thing to do for backpatching -- we could set latest_page_number
> *temporarily* while we do the truncation, and restore the old value
> afterwards.
>
> This fixes the bug. With this fix, you can replay WAL that's already
> been generated.
>
> - Heikki

Hi!

Patch LGTM. Let's wrap new minors with it?

-- 
Best regards,
Kirill Reshke
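
For anyone following the thread, the failure mode Heikki describes can be modeled in a few lines of standalone C. This is only a toy sketch, not PostgreSQL code: ToySlru, next_page_needs_init(), replay_truncate_overwriting() and replay_truncate_restoring() are hypothetical names made up for illustration, and the page numbers are arbitrary. It shows how leaving latest_page_number at the truncation page defeats the "next page not initialized yet" test, and how saving and restoring the value (the more conservative option mentioned above) keeps the test working.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for MultiXactOffsetCtl->shared->latest_page_number. */
typedef struct
{
    uint64_t latest_page_number;
} ToySlru;

/*
 * Mirrors the condition quoted from RecordNewMultiXact(): the next offsets
 * page is assumed to be uninitialized when the next offset falls on a
 * different page and the latest initialized page is still the current one.
 */
bool
next_page_needs_init(const ToySlru *slru, uint64_t pageno, uint64_t next_pageno)
{
    return next_pageno != pageno && slru->latest_page_number == pageno;
}

/*
 * Truncation replay as quoted in the thread: latest_page_number is
 * overwritten with the truncation page and never put back.
 */
void
replay_truncate_overwriting(ToySlru *slru, uint64_t trunc_pageno)
{
    slru->latest_page_number = trunc_pageno;
    /* ... the actual truncation would happen here ... */
}

/*
 * Conservative variant: set the value only for the duration of the
 * truncation, then restore what was there before.
 */
void
replay_truncate_restoring(ToySlru *slru, uint64_t trunc_pageno)
{
    uint64_t saved = slru->latest_page_number;

    slru->latest_page_number = trunc_pageno;
    /* ... the actual truncation would happen here ... */
    slru->latest_page_number = saved;

    assert(slru->latest_page_number == saved);
}
```

With latest_page_number at page 10 and a truncation to page 3, the overwriting variant makes next_page_needs_init(&slru, 10, 11) return false, so page 11 would never be initialized; the restoring variant leaves it returning true. Removing the overwrite altogether, as the attached patch proposes, has the same effect in this toy model.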