Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uxm8p-000PEx-EK for pgsql-general@arkaria.postgresql.org; Sun, 14 Sep 2025 12:47:39 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1uxm8l-007Phw-Gp for pgsql-general@arkaria.postgresql.org; Sun, 14 Sep 2025 12:47:36 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uxm8l-007Pho-3v for pgsql-general@lists.postgresql.org; Sun, 14 Sep 2025 12:47:35 +0000 Received: from mail-ed1-x530.google.com ([2a00:1450:4864:20::530]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1uxm8h-000l74-2m for pgsql-general@lists.postgresql.org; Sun, 14 Sep 2025 12:47:35 +0000 Received: by mail-ed1-x530.google.com with SMTP id 4fb4d7f45d1cf-62f1987d44fso1166210a12.0 for ; Sun, 14 Sep 2025 05:47:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1757854050; x=1758458850; darn=lists.postgresql.org; h=content-transfer-encoding:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=WlcBI7Sb5eAmtidoXR4vKM5AuiR58JDcvaQMh+jTews=; b=Ta3ZfQFdcQ178cVmymgi9BDimDZx5GNXR8lIchd7OoEecyioXmWdB5ie1Zr5JaYLjU R8ewhI2C+TwfKqQZwszkgElURphcDnj+7Bxqc95SYBgadRiS5Cw1zbKJNbfXPl0B7tM2 TTMjuIqaqjenIgF0osgSMERkCZ67AZkYAD6VvEV6pilal7GoFGAjPzdP4pJQJ2huOJHY NhCn/rsDIpw4pTaZiFD+QsfIvmJfJXUxzX9D9ylXvLFUZzJb15OShiHJJHsVD8Y385m6 RRQZSIfCJDxBVPZVA/mBuwPjv10EFJrCmpDYIOf9SzKGHN0q3DksdMHl8U449YjZcoWQ CzmQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1757854050; x=1758458850; h=content-transfer-encoding:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=WlcBI7Sb5eAmtidoXR4vKM5AuiR58JDcvaQMh+jTews=; b=wDdD0QKX3uptE07dAszheyk7q4LTYCUyUK6YARBNORjYLbP6AyOVnWeSVHvJgYU5X8 nqamPF7M5huUCK8dd/hcjRl8U/R7Pt31vD3ae0u1ZXVzL9wLekXzGQlQsWdnVNtrcQ2Z L2o/tfp7iExsNmsEOYGSX+Z7DaeDSiPfrxbu54pPBaYYbPQ7btDtbu0hEIKCHata6yQI WYw2FrEgiR1yHatxeupUADN7rNQEJi6A3OUMQUloScKXUeFEhAkEZOPboswhleHE7qbN uCzsIdEigZiE+1V5HIqJHHBk8tHgZJg1eSvLmc69mTMRo6+SSY6AxpoP1Q2/QM5aEcI+ ARMQ== X-Gm-Message-State: AOJu0Yzb6yd5PSvnjzsU5/0zLoNUVZwqkkelPvmvAna8chEW5q6HolsR EkPE1oum1M7vv6qBvX7bGb3WG1s7lPYDV5uNqKh01Ct7Ri4l7FNOlVb77OuOrO0ns5VwMmJKv8o WlsEzNri9TFr4cP1jB9qcTZBioPhG34Dvkw== X-Gm-Gg: ASbGncviOQEiL1hXQfmD6BpNbUQS9xDrW0Yv9fQRWHKT+ZQa1NIgUsiUkSPscy4K3w4 rx4w+OW0Wc9VsHBaOxWBR0Bw7Tis5VJAwoZlIhQ9/hEr5rA5Qe0UMK+HoWoVkfn9HwHVuX4VKwE PSboyjw62IrBnClEFsikxh6MBNlfHiOKEe0BfBQOH2AaXSymqeyWFWsfumMIER+e4WlDW7NKJ/Y M+m49aCMVK2uiQWVTEZt7cXlhfa4pixRWCoQvYLo1XHj21RgShdJCws98/wuHFZqM/yNjqt5aft CogS/YmVioWxr0vcsuM7V57hEy5Y/MLedaD2ZVWQpCFcCZIf+i4h6wxBKQ== X-Google-Smtp-Source: AGHT+IG3SOc4+5RSSsyi5yIBMwbH1ekQjnMZHH4mQAevkGhvk84C5TTN8kwvDhDhplTcmRKgYXJp0XR2LD9Tl+R3ol0= X-Received: by 2002:a05:6402:42cc:b0:61e:9c96:9326 with SMTP id 4fb4d7f45d1cf-62ed823e0cemr9240227a12.6.1757854050396; Sun, 14 Sep 2025 05:47:30 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Pawel Kudzia Date: Sun, 14 Sep 2025 14:47:19 +0200 X-Gm-Features: Ac12FXzKPsG1eiSWcIWk7k_fWvjGptNMdHBWtq9m6jI44J562gSsdTvHHCCts6k Message-ID: Subject: Re: Silent data corruption in PostgreSQL 17 - how to detect it proactively? To: pgsql-general@lists.postgresql.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Sun, Sep 14, 2025 at 12:35=E2=80=AFPM Laurenz Albe wrote: > > On Sun, 2025-09-14 at 10:30 +0200, Pawel Kudzia wrote: > > I've hit a silent data corruption for TOAST data - leading to some > > infinite loop when accessing bytea column for very particular row. I > > did not suffer data loss - data from streaming replica was fine, I've > > used it to rebuild the main server. > > > > I'm wondering if there's any proactive way of detecting that type of > > an issue rather than discovering pile-up of SELECT queries leading to > > CPU starvation or finding hanged backup jobs. > > > > Thanks in advance for your suggestions! > > > > I was originally running PostgreSQL 17.2 installed from project's deb > > packages, under Debian 12 on amd64. Environment is - without any > > recent crashes, with ECC memory and server-grade hardware. I run > > pg_dumpall every 24h, on a certain day it hanged. On re-run it hanged > > again, each time leaving single PosgreSQL process using 100% of single > > CPU core. > > > > [...] > > > > PostgreSQL's log does not have any error messages indicating data corru= ption. > > > > This server does not have checksums enabled, but - as I understand - > > such checksums cannot be checked online anyway. So - how can I detect > > similar corruption? > > A proactive way of detecting TOAST corruption... how about > > pg_dump -f /dev/null yourdatabase > > If there is TOAST corruption, that should give you an error. > > If your standby does not have the same problem, that increases the > likelihood that the cause is a hardware problem. Of course, it could > still be a software bug. > > Yours, > Laurenz Albe Thanks for your answer! pg_dump mydatabase ends with ~the same as the SELECT statement. It hangs leaving 'postgres: 17/main: postgres application_cache [local] COPY' using 100% of single CPU core and no messages in PG's logs. gdb stack trace for that process: #0 0x000055cb571ef444 in hash_search_with_hash_value () #1 0x000055cb5706217a in BufTableLookup () #2 0x000055cb5706715f in StartReadBuffer () #3 0x000055cb57068671 in ReadBufferExtended () #4 0x000055cb56d91573 in _bt_relandgetbuf () #5 0x000055cb56d96b9b in _bt_moveright () #6 0x000055cb56d96d69 in _bt_search () #7 0x000055cb56d97b83 in _bt_first () #8 0x000055cb56d93bbf in btgettuple () #9 0x000055cb56d8b289 in index_getnext_tid () #10 0x000055cb56d8b3ed in index_getnext_slot () #11 0x000055cb56d8a957 in systable_getnext_ordered () #12 0x000055cb56d812bd in heap_fetch_toast_slice () #13 0x000055cb56d3abae in ?? () #14 0x000055cb56d3afc5 in detoast_attr () #15 0x000055cb571b7bd6 in byteaout () #16 0x000055cb571eb9a1 in OutputFunctionCall () #17 0x000055cb56e711c7 in ?? () #18 0x000055cb56e71b2f in DoCopyTo () #19 0x000055cb56e6c06f in DoCopy () #20 0x000055cb570a2726 in standard_ProcessUtility () #21 0x00007fd02f890a81 in ?? () from /usr/lib/postgresql/17/lib/pg_stat_statements.so I was curious if there's any other tool that can spot that type of corruption and state it explicitly rather than guess it by monitoring execution time. #22 0x000055cb570a0c01 in ?? () #23 0x000055cb570a0d18 in ?? () #24 0x000055cb570a1296 in PortalRun () #25 0x000055cb5709d4a5 in ?? () #26 0x000055cb5709e82d in PostgresMain () #27 0x000055cb57099a3f in BackendMain () #28 0x000055cb570072aa in postmaster_child_launch () #29 0x000055cb5700adb9 in ?? () #30 0x000055cb5700ccc8 in PostmasterMain () #31 0x000055cb56d2ea11 in main () -- regards, Pawel Kudzia