Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uZZaK-005Lt6-Eu for pgsql-general@arkaria.postgresql.org; Wed, 09 Jul 2025 18:32:00 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1uZZaH-002kLd-Lh for pgsql-general@arkaria.postgresql.org; Wed, 09 Jul 2025 18:31:58 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uZZaH-002kLV-5n for pgsql-general@lists.postgresql.org; Wed, 09 Jul 2025 18:31:57 +0000 Received: from mail-wr1-x434.google.com ([2a00:1450:4864:20::434]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1uZZaF-006RdJ-0m for pgsql-general@lists.postgresql.org; Wed, 09 Jul 2025 18:31:56 +0000 Received: by mail-wr1-x434.google.com with SMTP id ffacd0b85a97d-3a6cd1a6fecso237536f8f.3 for ; Wed, 09 Jul 2025 11:31:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bowt-ie.20230601.gappssmtp.com; s=20230601; t=1752085912; x=1752690712; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=kPvUMLLJzt0qS5ip5iVL0Q94P7+qVeMjp5xixnTVrFA=; b=rMShM8BdHqtK7SbHw2vpydPZld4W2Km5PHjvPPIJCYprfJSn88frrEIcQrNy/5soKl qnbHcuZSy8KB/Mb8U2JOc8CPwQ3LmACuqAjAz6GIR9CaeVmO5UoiHMPTpR11eSTKLvxF jKU8IqdjbwZEk0A1VLcp85IzNka90V0jGrAWQ6FZKOjq7elpf0m84l82GNNVFCfmKMVq UiZqH2rXEUeBE3cHgr4eoq3yVAI/nRsr0DmpV+7ifoIAGLmmZDe2aV/MFGEZot+/tUVh iB5sMqrvR7fW7N9QBjekTrnyRy4LOkM7yftP/CnLj2TyGU65n6eUuqElLF4log6mrCif 1nnQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1752085912; x=1752690712; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=kPvUMLLJzt0qS5ip5iVL0Q94P7+qVeMjp5xixnTVrFA=; b=NyvWGKiffrtEEtaStjHLI32D8V/Btz5m1G3xNmxXBNS3G9Oltdw2s2MvYIhFhNhq23 Fwcz3woEtt9HiYWIeLgFlBP04bvSNpWMZ1wvHPpYMLp/OceZib9ElDQTff2sup3Yq2MF KnMihC079vRoRADILhBDxpoC+VxoLcRXfdshpPaofOBBLxMJBfG0zFrqVdzgecquFCs3 xCOvHovUnAcO0AB+3mfgxqOQRWHV2NcY+hOp6kIqSKx67YMfPwlU53z/0sawXu5h6Frc h4e43HQyUap7mT4UBG53+G8YfiTGzCITGK7caqaino4zPAwlxPI5W4U1dcQUr45rJ67U uuZg== X-Gm-Message-State: AOJu0YxFzhUXT/V3H2IINsY8mJ1zZIPvMLAceJW2QzyRC19QdEx9MdHd wb9BwgiPXWR+nKBEExgZbIxNaKGl7BeHwbcf2mVVdYeL/+25wnSTA5u13XTU02ldBtatlD7Rofh KJDysIH22O2Dn8xrVkEUFDllfiJMd4s0Iyynfnzlyp8efCS5AklJb X-Gm-Gg: ASbGncsz3BpT7lGFRuIK40hvVfmx1ZjKCz5a+nIfU1p44MNu29Cc8Z7fdG+PglDp7eg L/AkhCFm9REWcZstMOCXKWebcMdaZgI/YDiHlKBbowfuPa6iFLdA+IZsTwMHCq2+LlCcbB+2M5M iq8laXQ4G5p6UwA5n+nW+aou7Gh2VjQp3phjaRNNJGIIA= X-Google-Smtp-Source: AGHT+IE8NQIcLuxRUu/QOaJm3/UYnxSKtL+nG5v8hIYhps/rXp56YFXTs7EK0AllOo6TPAOd7Srfv+izdVZ/QclXN4Y= X-Received: by 2002:a05:6000:4615:b0:3a5:2a24:fbf5 with SMTP id ffacd0b85a97d-3b5e44e5fc2mr2546683f8f.18.1752085912182; Wed, 09 Jul 2025 11:31:52 -0700 (PDT) MIME-Version: 1.0 References: <01df0ca4-cf32-448f-a2cb-e44a8e9f7fd6@element.io> In-Reply-To: From: Peter Geoghegan Date: Wed, 9 Jul 2025 14:31:25 -0400 X-Gm-Features: Ac12FXytKaKrFDbfbqvtKcqrUduUzzG5pNoR_aYjUy23h4JteNGJkYWQ2Gl8GxU Message-ID: Subject: Re: Corrupt btree index includes rows that don't match To: Erik Johnston Cc: pgsql-general@lists.postgresql.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Wed, Jul 9, 2025 at 1:02=E2=80=AFPM Erik Johnston wro= te: > To recap: the situation is that, looking at our backup from 2025-06-26 vi= a pageinspect, we have btree index rows which point to either non-existent = heap TIDs, or to heap TIDs with data which does not correspond to the index= row. In fact it looks like we have entire index pages which point only to = non-existent heap TIDs. This is a generic symptom of corruption. You can see this sort of thing whenever (say) the storage lies about fsync having flushed everything to disk. The index might still contain TIDs that point to heap pages that existed before the crash, that didn't survive crash recovery. It's quite likely that those same TIDs will be used for wholly unrelated logical rows when the application inserts a little more data. > Empirically, and surprisingly to us, when one does a SELECT from an index= entry that points to a non-existent TID, the index entry is quietly ignore= d. > > We therefore suspect that this index corruption has been present for some= time (possibly years); more recently those non-existent heap TIDs have bee= n recycled, and that is when we have noticed the effects of the problem. That sounds plausible. --=20 Peter Geoghegan