Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uxk5Q-00039o-J3 for pgsql-general@arkaria.postgresql.org; Sun, 14 Sep 2025 10:36:00 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1uxk5O-007JIr-GP for pgsql-general@arkaria.postgresql.org; Sun, 14 Sep 2025 10:35:59 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uxk5O-007JIj-4p for pgsql-general@lists.postgresql.org; Sun, 14 Sep 2025 10:35:58 +0000 Received: from mail-wm1-x336.google.com ([2a00:1450:4864:20::336]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1uxk5M-000LO4-29 for pgsql-general@lists.postgresql.org; Sun, 14 Sep 2025 10:35:58 +0000 Received: by mail-wm1-x336.google.com with SMTP id 5b1f17b1804b1-45decc9e83eso33293645e9.3 for ; Sun, 14 Sep 2025 03:35:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cybertec.at; s=google; t=1757846154; x=1758450954; darn=lists.postgresql.org; h=mime-version:user-agent:content-transfer-encoding:references :in-reply-to:date:to:from:subject:message-id:from:to:cc:subject:date :message-id:reply-to; bh=UxIfnvDDBTxCRtLEjKGqkik1sACVsoagGPZ5r6MgjUI=; b=LAl34nySsjR7oz7lFIEpC4FHMERfgunmM3/MXS994wXpipeiEAcRzi7DSlWsrY0ifd ZdvT/pdAeEXg4CGpnit4YMMkxf7IO23G2j/jGv6iiWq/U8ge7x3/ikxzssm3X6A0QeDf khqKYUrOiy82Hi/N2Vc0eFtlAjkcsklb//XyK3S7cgsSaVnYWIYMZ8XbRupBztD+a6Db S441IUWTof5rwAQGe1Loo4wVYblvXNpG2Zy+t3n/yHG1sTqjC5IW7UsczrLaCLgB6kVS aEvsrX1njJfHJFK6wBOLn/5tN+veSPo9Zc/EsC82IU5s1VAm/d0ZsbWAeCVy0CAR3Tsm lVPQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1757846154; x=1758450954; h=mime-version:user-agent:content-transfer-encoding:references :in-reply-to:date:to:from:subject:message-id:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=UxIfnvDDBTxCRtLEjKGqkik1sACVsoagGPZ5r6MgjUI=; b=SMPNPr6Udk1ZE6hfVUovfLbqlGh92fLywl/E8F4tHGU4XbVaRQsaf6mCoZMAcNs2jv X2yviweHpVEuXk8EY4ExrkK4unkv/VgHjHEL6aIU/iE2L20JktwSNHjcxeVlqgZlcfB2 OSMr5mqk9LMadtbzV8+CMgV/tZBzYToIAVgbnYYJWBjLQjVzJJqJ+8B3SP9/XaSwT6yB /Jx+8gCKbOhED0MyNqYt3KHhJbnafcVnNILg/FtQ+Tdu7eya06HJTsbRrCCg9i9QjVxL 4FRP7X9r+gBum1Ve9203LbGLewnk7FdVfBFKlSJuCkBgCnDO65ffWCeog4sV2bjqLMw0 lI3A== X-Forwarded-Encrypted: i=1; AJvYcCVNSW2fS4zkqagDcss/9YQBjY1LafjaDeztpa8eDoacKSzXKuZ3WaUkKxozaQsXNtsplrmHTTx8yHTL5xAT@lists.postgresql.org X-Gm-Message-State: AOJu0Yx4u01mrw5CxhxWCyPWCes8pMhSCYYKlbU4seK53C07WpD8Kf/0 54erO9bVj9B6F2hiejr3zLqGFZ5ME3wPG2RDsiCDaRsG2aynXLNdDOXWCqxnOQ/w2mE= X-Gm-Gg: ASbGncuCNghKXjfkTzCP0tZlXNMilebNArNo+y6d59PpNmVi4hy5ijuye5jGnjhXcWY Dr6upBaWNVukeTtDWqCjKjoeS8gr8xEw/vrlX/toQAGz7RD73qRv67nCnCpDV7Be6fh+9IcTKkJ ftKzv78+E6EqbiSr4I14DA7WTfGA8785JaExfIoSJH9KgXes4qc3idA5C/W0SCTYFHJZxslHN8b 9gmmRLpMMfVCUfKYxRjgDS+ijAcC7Nba9ViTNJQr4QYu8ayZ4nd4YWgbGjWdezbohXFCyNYtlt9 iyOSbPg+npXGSTfw0T07CMl/49q9N46YKM2krsUU9tXx6bEvXO7OpRRf+hrd3R+DVRgVqdlNL4S GtjQInqFQTi/9aM2j3ZGPSRAG2GfjXWw4ezwUJwlBAWFzQoMD7vM/cQGnnleXIt8C5Q13j4KNpQ == X-Google-Smtp-Source: AGHT+IElKNT5R4benaImUIz+KHgT1+qsQJnnDiA1yqpduUeutCEEgeZ+xsp8+ZF7weEspfdmUCH1iw== X-Received: by 2002:a05:600c:2211:b0:45d:d50e:20be with SMTP id 5b1f17b1804b1-45f21201ad8mr58010235e9.26.1757846153099; Sun, 14 Sep 2025 03:35:53 -0700 (PDT) Received: from laurenz.albe-K4N0CV00F97414D ([2001:871:255:30a:2403:9037:2d08:c2c]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-45e0152ffc1sm70486355e9.3.2025.09.14.03.35.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 14 Sep 2025 03:35:52 -0700 (PDT) Message-ID: Subject: Re: Silent data corruption in PostgreSQL 17 - how to detect it proactively? From: Laurenz Albe To: Pawel Kudzia , pgsql-general@lists.postgresql.org Date: Sun, 14 Sep 2025 12:35:52 +0200 In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable User-Agent: Evolution 3.56.2 (3.56.2-2.fc42) MIME-Version: 1.0 List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Sun, 2025-09-14 at 10:30 +0200, Pawel Kudzia wrote: > I've hit a silent data corruption for TOAST data - leading to some > infinite loop when accessing bytea column for very particular row. I > did not suffer data loss - data from streaming replica was fine, I've > used it to rebuild the main server. >=20 > I'm wondering if there's any proactive way of detecting that type of > an issue rather than discovering pile-up of SELECT queries leading to > CPU starvation or finding hanged backup jobs. >=20 > Thanks in advance for your suggestions! >=20 > I was originally running PostgreSQL 17.2 installed from project's deb > packages, under Debian 12 on amd64. Environment is - without any > recent crashes, with ECC memory and server-grade hardware. I run > pg_dumpall every 24h, on a certain day it hanged. On re-run it hanged > again, each time leaving single PosgreSQL process using 100% of single > CPU core. >=20 > [...] >=20 > PostgreSQL's log does not have any error messages indicating data corrupt= ion. >=20 > This server does not have checksums enabled, but - as I understand - > such checksums cannot be checked online anyway. So - how can I detect > similar corruption? A proactive way of detecting TOAST corruption... how about pg_dump -f /dev/null yourdatabase If there is TOAST corruption, that should give you an error. If your standby does not have the same problem, that increases the likelihood that the cause is a hardware problem. Of course, it could still be a software bug. Yours, Laurenz Albe