public inbox for [email protected]help / color / mirror / Atom feed
Intermittent missing chunk ERRORS 12+ messages / 4 participants [nested] [flat]
* Intermittent missing chunk ERRORS @ 2025-04-26 17:09 Sbob <[email protected]> 0 siblings, 2 replies; 12+ messages in thread From: Sbob @ 2025-04-26 17:09 UTC (permalink / raw) To: [email protected] All; I am working with a client, they have an app that is running into error's like this: _SQL Error [XX001]: ERROR: missing chunk number 0 for toast value 44915477 in pg_toast_5321896 _ I thought it was a corruption / REINDEX issue, however I have run REINDEXon the index, REINDEX on the table and even a reindexdb on the database. Today they passed me another error (the one above) and once I was able to login I ran the same query the app team referenced and it ran without any errors. I asked them to re-run the query and they informed me that it now works for them as well. I assume that if this were a corrupt index issue then the query would continue to fail every time. Has anyone seen this type of intermittent index error before? and thoughts on where to look /how to debug this? Thanks in advance ^ permalink raw reply [nested|flat] 12+ messages in thread
* Re: Intermittent missing chunk ERRORS @ 2025-04-26 20:04 Vijaykumar Jain <[email protected]> parent: Sbob <[email protected]> 1 sibling, 1 reply; 12+ messages in thread From: Vijaykumar Jain @ 2025-04-26 20:04 UTC (permalink / raw) To: Sbob <[email protected]>; +Cc: Pgsql-admin <[email protected]> a lot in this gist is something you can debug to start with. https://gist.github.com/supix/80f9a6111dc954cf38ee99b9dedf187a if not the index than the toast table might be corrupt. the bigger problem might be to figure out how it got corrupt. a one off power incident, or bug or storage is having issues etc. once that is clear, a simple start would be to run a pg_dump of the database to /dev/null and check if the dump is clean without errors. Thanks, Vijay Open to work Resume - Vijaykumar Jain <https://github.com/cabecada; On Sat, Apr 26, 2025, 10:39 PM Sbob <[email protected]> wrote: > All; > > > I am working with a client, they have an app that is running into error's > like this: > > > *SQL Error [XX001]: ERROR: missing chunk number 0 for toast value 44915477 > in pg_toast_5321896 * > > > I thought it was a corruption / REINDEX issue, however I have run > REINDEXon the index, REINDEX on the table and even a reindexdb on the > database. Today they passed me another error (the one above) and once I was > able to login I ran the same query the app team referenced and it ran > without any errors. I asked them to re-run the query and they informed me > that it now works for them as well. > > > I assume that if this were a corrupt index issue then the query would > continue to fail every time. > > Has anyone seen this type of intermittent index error before? and thoughts > on where to look /how to debug this? > > > Thanks in advance > > > ^ permalink raw reply [nested|flat] 12+ messages in thread
* Re: Intermittent missing chunk ERRORS @ 2025-04-26 20:12 Vijaykumar Jain <[email protected]> parent: Vijaykumar Jain <[email protected]> 0 siblings, 0 replies; 12+ messages in thread From: Vijaykumar Jain @ 2025-04-26 20:12 UTC (permalink / raw) To: Sbob <[email protected]>; +Cc: Pgsql-admin <[email protected]> forgot about the core tool to check for the problem. https://www.postgresql.org/docs/current/amcheck.html this can help find corruption issues much faster. Thanks, Vijay Open to work Resume - Vijaykumar Jain <https://github.com/cabecada; On Sun, Apr 27, 2025, 1:34 AM Vijaykumar Jain < [email protected]> wrote: > a lot in this gist is something you can debug to start with. > > https://gist.github.com/supix/80f9a6111dc954cf38ee99b9dedf187a > > if not the index than the toast table might be corrupt. > the bigger problem might be to figure out how it got corrupt. a one off > power incident, or bug or storage is having issues etc. > once that is clear, > a simple start would be to run a pg_dump of the database to /dev/null and > check if the dump is clean without errors. > > Thanks, > Vijay > > Open to work > Resume - Vijaykumar Jain <https://github.com/cabecada; > > On Sat, Apr 26, 2025, 10:39 PM Sbob <[email protected]> wrote: > >> All; >> >> >> I am working with a client, they have an app that is running into error's >> like this: >> >> >> *SQL Error [XX001]: ERROR: missing chunk number 0 for toast value >> 44915477 in pg_toast_5321896 * >> >> >> I thought it was a corruption / REINDEX issue, however I have run >> REINDEXon the index, REINDEX on the table and even a reindexdb on the >> database. Today they passed me another error (the one above) and once I was >> able to login I ran the same query the app team referenced and it ran >> without any errors. I asked them to re-run the query and they informed me >> that it now works for them as well. >> >> >> I assume that if this were a corrupt index issue then the query would >> continue to fail every time. >> >> Has anyone seen this type of intermittent index error before? and >> thoughts on where to look /how to debug this? >> >> >> Thanks in advance >> >> >> ^ permalink raw reply [nested|flat] 12+ messages in thread
* Re: Intermittent missing chunk ERRORS @ 2025-04-26 20:24 Laurenz Albe <[email protected]> parent: Sbob <[email protected]> 1 sibling, 2 replies; 12+ messages in thread From: Laurenz Albe @ 2025-04-26 20:24 UTC (permalink / raw) To: Sbob <[email protected]>; [email protected] On Sat, 2025-04-26 at 11:09 -0600, Sbob wrote: > I am working with a client, they have an app that is running into error's like this: > > SQL Error [XX001]: ERROR: missing chunk number 0 for toast value 44915477 in pg_toast_5321896 > > I thought it was a corruption / REINDEX issue, however I have run REINDEXon the index, > REINDEX on the table and even a reindexdb on the database. Today they passed me another > error (the one above) and once I was able to login I ran the same query the app team > referenced and it ran without any errors. I asked them to re-run the query and they > informed me that it now works for them as well. > > I assume that if this were a corrupt index issue then the query would continue to fail > every time. > > Has anyone seen this type of intermittent index error before? and thoughts on where to > look /how to debug this? The error is data corruption. If it is transient, that could mean a couple of things: - a TOAST index was corrupted, and REINDEX fixed it - faulty RAM corrupted data, but as soon as the corrupted buffer was evicted, the problem was gone - somebody updated or deleted the row, and the corruption is hidden Yours, Laurenz Albe ^ permalink raw reply [nested|flat] 12+ messages in thread
* Re: Intermittent missing chunk ERRORS @ 2025-04-26 20:28 Tom Lane <[email protected]> parent: Laurenz Albe <[email protected]> 1 sibling, 1 reply; 12+ messages in thread From: Tom Lane @ 2025-04-26 20:28 UTC (permalink / raw) To: Laurenz Albe <[email protected]>; +Cc: Sbob <[email protected]>; [email protected] Laurenz Albe <[email protected]> writes: > On Sat, 2025-04-26 at 11:09 -0600, Sbob wrote: >> SQL Error [XX001]: ERROR: missing chunk number 0 for toast value 44915477 in pg_toast_5321896 >> Has anyone seen this type of intermittent index error before? and thoughts on where to >> look /how to debug this? > The error is data corruption. We've seen transient errors of this sort arise from what are basically timing problems, and fixed at least a few cases. I wonder exactly which PG version the problematic installation is running. regards, tom lane ^ permalink raw reply [nested|flat] 12+ messages in thread
* Re: Intermittent missing chunk ERRORS @ 2025-04-26 22:49 Sbob <[email protected]> parent: Tom Lane <[email protected]> 0 siblings, 1 reply; 12+ messages in thread From: Sbob @ 2025-04-26 22:49 UTC (permalink / raw) To: Tom Lane <[email protected]>; Laurenz Albe <[email protected]>; +Cc: [email protected] On 4/26/25 2:28 PM, Tom Lane wrote: > Laurenz Albe <[email protected]> writes: >> On Sat, 2025-04-26 at 11:09 -0600, Sbob wrote: >>> SQL Error [XX001]: ERROR: missing chunk number 0 for toast value 44915477 in pg_toast_5321896 >>> Has anyone seen this type of intermittent index error before? and thoughts on where to >>> look /how to debug this? >> The error is data corruption. > We've seen transient errors of this sort arise from what are basically > timing problems, and fixed at least a few cases. I wonder exactly > which PG version the problematic installation is running. > > regards, tom lane We are on V14 ^ permalink raw reply [nested|flat] 12+ messages in thread
* Re: Intermittent missing chunk ERRORS @ 2025-04-26 22:54 Sbob <[email protected]> parent: Laurenz Albe <[email protected]> 1 sibling, 1 reply; 12+ messages in thread From: Sbob @ 2025-04-26 22:54 UTC (permalink / raw) To: Laurenz Albe <[email protected]>; [email protected] On 4/26/25 2:24 PM, Laurenz Albe wrote: > On Sat, 2025-04-26 at 11:09 -0600, Sbob wrote: >> I am working with a client, they have an app that is running into error's like this: >> >> SQL Error [XX001]: ERROR: missing chunk number 0 for toast value 44915477 in pg_toast_5321896 >> >> I thought it was a corruption / REINDEX issue, however I have run REINDEXon the index, >> REINDEX on the table and even a reindexdb on the database. Today they passed me another >> error (the one above) and once I was able to login I ran the same query the app team >> referenced and it ran without any errors. I asked them to re-run the query and they >> informed me that it now works for them as well. >> >> I assume that if this were a corrupt index issue then the query would continue to fail >> every time. >> >> Has anyone seen this type of intermittent index error before? and thoughts on where to >> look /how to debug this? > The error is data corruption. > > If it is transient, that could mean a couple of things: > > - a TOAST index was corrupted, and REINDEX fixed it > > - faulty RAM corrupted data, but as soon as the corrupted buffer was evicted, > the problem was gone > > - somebody updated or deleted the row, and the corruption is hidden > > Yours, > Laurenz Albe I did REINDEX and the error came back, however this last time when they informed me about the error they sent the select statement throwing the error and once I logged in I ran their select statement and it worked, no REINDEX had been run, and then I asked them to run it again and it worked for them as well So, the first option above is not the answer and I dont think anyone updated the row, their process is mostly reads but I will verify, so maybe the RAM issue? the servers are cloud VM's in a private cloud ^ permalink raw reply [nested|flat] 12+ messages in thread
* Re: Intermittent missing chunk ERRORS @ 2025-04-27 00:03 Tom Lane <[email protected]> parent: Sbob <[email protected]> 0 siblings, 1 reply; 12+ messages in thread From: Tom Lane @ 2025-04-27 00:03 UTC (permalink / raw) To: Sbob <[email protected]>; +Cc: Laurenz Albe <[email protected]>; [email protected] Sbob <[email protected]> writes: > On 4/26/25 2:28 PM, Tom Lane wrote: >> We've seen transient errors of this sort arise from what are basically >> timing problems, and fixed at least a few cases. I wonder exactly >> which PG version the problematic installation is running. > We are on V14 14-what? We are talking about bug fixes, so it matters. regards, tom lane ^ permalink raw reply [nested|flat] 12+ messages in thread
* Re: Intermittent missing chunk ERRORS @ 2025-04-27 16:45 Sbob <[email protected]> parent: Tom Lane <[email protected]> 0 siblings, 1 reply; 12+ messages in thread From: Sbob @ 2025-04-27 16:45 UTC (permalink / raw) To: Tom Lane <[email protected]>; +Cc: Laurenz Albe <[email protected]>; [email protected] On 4/26/25 6:03 PM, Tom Lane wrote: > Sbob <[email protected]> writes: >> On 4/26/25 2:28 PM, Tom Lane wrote: >>> We've seen transient errors of this sort arise from what are basically >>> timing problems, and fixed at least a few cases. I wonder exactly >>> which PG version the problematic installation is running. >> We are on V14 > 14-what? We are talking about bug fixes, so it matters. > > regards, tom lane postgres=# select version(); version ---------------------------------------------------------------------------------------------------------- PostgreSQL 14.17 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-23), 64-bit (1 row) ^ permalink raw reply [nested|flat] 12+ messages in thread
* Re: Intermittent missing chunk ERRORS @ 2025-04-27 18:25 Tom Lane <[email protected]> parent: Sbob <[email protected]> 0 siblings, 1 reply; 12+ messages in thread From: Tom Lane @ 2025-04-27 18:25 UTC (permalink / raw) To: Sbob <[email protected]>; +Cc: Laurenz Albe <[email protected]>; [email protected] Sbob <[email protected]> writes: > On 4/26/25 6:03 PM, Tom Lane wrote: >> 14-what? We are talking about bug fixes, so it matters. > PostgreSQL 14.17 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 > 20210514 (Red Hat 8.5.0-23), 64-bit OK, well that's current at least... I don't suppose you can build a test case that reproduces this failure? If it's timing-dependent as I suspect, it might only fail once in awhile, but that would be good enough for investigation. regards, tom lane ^ permalink raw reply [nested|flat] 12+ messages in thread
* Re: Intermittent missing chunk ERRORS @ 2025-04-27 19:52 Laurenz Albe <[email protected]> parent: Sbob <[email protected]> 0 siblings, 0 replies; 12+ messages in thread From: Laurenz Albe @ 2025-04-27 19:52 UTC (permalink / raw) To: Sbob <[email protected]>; [email protected] On Sat, 2025-04-26 at 16:54 -0600, Sbob wrote: > > If it is transient, that could mean a couple of things: > > > > - a TOAST index was corrupted, and REINDEX fixed it > > > > - faulty RAM corrupted data, but as soon as the corrupted buffer was evicted, > > the problem was gone > > > > - somebody updated or deleted the row, and the corruption is hidden > > I did REINDEX and the error came back, however this last time when they > informed me about the error they sent the select statement throwing the > error and once I logged in I ran their select statement and it worked, > no REINDEX had been run, and then I asked them to run it again and it > worked for them as well > > So, the first option above is not the answer and I dont think anyone > updated the row, their process is mostly reads but I will verify, so > maybe the RAM issue? the servers are cloud VM's in a private cloud Maybe. Check the hardware. Or maybe it is as Tom says. Yours, Laurenz Albe ^ permalink raw reply [nested|flat] 12+ messages in thread
* Re: Intermittent missing chunk ERRORS @ 2025-04-28 14:10 Sbob <[email protected]> parent: Tom Lane <[email protected]> 0 siblings, 0 replies; 12+ messages in thread From: Sbob @ 2025-04-28 14:10 UTC (permalink / raw) To: Tom Lane <[email protected]>; +Cc: Laurenz Albe <[email protected]>; [email protected] On 4/27/25 12:25 PM, Tom Lane wrote: > Sbob <[email protected]> writes: >> On 4/26/25 6:03 PM, Tom Lane wrote: >>> 14-what? We are talking about bug fixes, so it matters. >> PostgreSQL 14.17 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 >> 20210514 (Red Hat 8.5.0-23), 64-bit > OK, well that's current at least... > > I don't suppose you can build a test case that reproduces this > failure? If it's timing-dependent as I suspect, it might only fail > once in awhile, but that would be good enough for investigation. > > regards, tom lane Yeah, it only fails once in awhile, it seems like it fairly often but building a case to recreate it will likely be difficult. I will continue to work with the app team and see if I can find a pattern / process that recreates it. ^ permalink raw reply [nested|flat] 12+ messages in thread
end of thread, other threads:[~2025-04-28 14:10 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2025-04-26 17:09 Intermittent missing chunk ERRORS Sbob <[email protected]> 2025-04-26 20:04 ` Vijaykumar Jain <[email protected]> 2025-04-26 20:12 ` Vijaykumar Jain <[email protected]> 2025-04-26 20:24 ` Laurenz Albe <[email protected]> 2025-04-26 20:28 ` Tom Lane <[email protected]> 2025-04-26 22:49 ` Sbob <[email protected]> 2025-04-27 00:03 ` Tom Lane <[email protected]> 2025-04-27 16:45 ` Sbob <[email protected]> 2025-04-27 18:25 ` Tom Lane <[email protected]> 2025-04-28 14:10 ` Sbob <[email protected]> 2025-04-26 22:54 ` Sbob <[email protected]> 2025-04-27 19:52 ` Laurenz Albe <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox