From: A G
Date: Sat, 4 Jun 2022 11:19:21 +0200
Subject: Re: Losing records in PostgreSQL 9.6
To: Robert Treat
Cc: pgsql-admin@lists.postgresql.org

Thank you for your helpful input, and sorry for my late answer. I wanted to check the database statistics tables you suggested first.
Unfortunately, we don't have access to them.
But that was great input. We are planning to improve the backup system to include more information that will help us analyse what happened on a customer database in the future.

Best regards,
Andreas
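
A minimal sketch of what the extra backup information mentioned above could look like: a small manifest stored next to each dump that records, per table, the row count and the lowest and highest primary key. Everything here is an assumption for illustration (the function names, the manifest layout, and integer primary keys); it is not part of the existing backup tooling.

```python
from typing import Dict, Iterable, List, Optional


def build_manifest(tables: Dict[str, Iterable[int]]) -> Dict[str, dict]:
    """Summarise each table by row count and lowest/highest primary key.

    `tables` maps a (hypothetical) table name to the primary keys found
    in that table at backup time.
    """
    manifest: Dict[str, dict] = {}
    for name, ids in tables.items():
        ordered = sorted(ids)
        manifest[name] = {
            "row_count": len(ordered),
            "min_id": ordered[0] if ordered else None,
            "max_id": ordered[-1] if ordered else None,
        }
    return manifest


def compare_manifests(old: Dict[str, dict], new: Dict[str, dict]) -> List[str]:
    """Return human-readable warnings for tables that appear to have lost rows."""
    warnings: List[str] = []
    for name, before in old.items():
        after: Optional[dict] = new.get(name)
        if after is None:
            warnings.append(f"{name}: table missing from newer backup")
            continue
        if after["row_count"] < before["row_count"]:
            warnings.append(
                f"{name}: row count dropped from "
                f"{before['row_count']} to {after['row_count']}"
            )
        # A rising minimum id means the oldest rows are gone, even if the
        # table has grown overall in the meantime.
        if before["min_id"] is not None and (
            after["min_id"] is None or after["min_id"] > before["min_id"]
        ):
            warnings.append(
                f"{name}: lowest id moved from "
                f"{before['min_id']} to {after['min_id']}"
            )
    return warnings
```

Comparing the manifest of each new backup against the previous one would narrow down when rows disappeared, instead of leaving a 10-month window between two dumps.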

On Sat, May 14, 2022 at 6:06 AM Robert Treat <rob@xzilla.net> wrote:
If you are asking if it is within the realm of possibility that an old
version of Postgres with known bugs running on a presumably old
version of Linux which probably also has known bugs that happens to
also have some form of hardware failure that could include corrupted
memory and/or corrupted storage (did I mention filesystem bugs?) could
possibly lose data... well I reckon that probably is possible.

However, because the pg_dumps produce what sound like working / proper
output, I don't think it'd be any of those things. You didn't mention
if you have any ability to look at any of the database statistics
tables, which would be a first place to look to see if there are any
DML statements or DML activity tracked. Beyond that, if I had access
to the server in question (or better an exact physical copy, which
might be difficult in your case), I'd want to see if I could find the
old rows which might still be on-disk even if not visible due to
deletion or transaction semantics, and would also want to rule out
things like index corruption that might cause some weird side effects.


Robert Treat
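
For readers who do have access to the running server, the statistics check suggested above could be sketched roughly as follows. The query reads the cumulative per-table counters in `pg_stat_user_tables`; the helper function and the sample table names are hypothetical, and the sketch only interprets rows rather than connecting to a database. The counters live only on the running server and can be reset, so a zero does not prove that nothing was ever deleted.

```python
from typing import Iterable, List, Tuple

# Cumulative DML counters per user table since the statistics were last reset.
STATS_QUERY = """
SELECT relname, n_tup_ins, n_tup_upd, n_tup_del
FROM pg_stat_user_tables
ORDER BY n_tup_del DESC;
"""

StatsRow = Tuple[str, int, int, int]  # (relname, n_tup_ins, n_tup_upd, n_tup_del)


def tables_with_deletes(rows: Iterable[StatsRow]) -> List[Tuple[str, int]]:
    """Return (table, deleted_row_count) for every table with recorded deletes."""
    return [
        (relname, n_del)
        for relname, _n_ins, _n_upd, n_del in rows
        if n_del > 0
    ]
```

Feeding it the rows returned by `STATS_QUERY` (through whatever database client is at hand) lists the tables where DELETE activity was tracked at all.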

On Fri, May 13, 2022 at 1:53 PM A G <andreas.grill@gmail.com> wrote:
>
> Thanks for your input!
>
> We checked the application that has access to the database, but it would never delete rows from that table. The missing rows in the database were stored at some point through committed transactions and had a lower sequential primary key. We don't think the transactions were rolled back since they were part of an older backup.
>
> We believe that there was probably manual access by the customer or a service partner, but wanted to make sure that there is no other way that Postgres would lose rows during a pg_dump because of something like a hardware failure, for instance.
>
> Best regards,
> Andreas
>
> On Sat, May 7, 2022 at 4:03 PM Ron <ronljohnsonjr@gmail.com> wrote:
>>
>> On 5/4/22 09:55, A G wrote:
>>
>> Hi,
>> thanks for your help.
>>
>> My team is using Postgres 9.6.10 for an on-premise application (we are planning on upgrading to a newer Postgres version). Our application comes with Postgres running in a Docker container with its data stored in a Docker volume. Our software uses pg_dump / pg_restore to back up and restore the database.
>>
>> Now we got a ticket from a customer where their database is missing rows from a table. There are 971 consecutive rows missing from the beginning of the table. The missing rows were inserted first. We also find it strange that all the other tables don’t seem to be affected at all. It appears that there is only data loss in this single table.
>> Unfortunately, we don’t have access to the original database anymore and need to find out what happened through the backups the customer provides. We have one backup right after they installed and initially configured the application, which seems complete. Then there is another backup 10 months later where the first 971 rows are already missing in this one table.
>>
>> If we exclude a manual deletion, which the customer denies,
>>
>>
>> There's more to PEBKAC than manual deletion.
>>
>> we are wondering if it’s possible that Postgres 9.6 could lose some of its data through a storage or memory error and would create a “successful” pg_dump with only partial data? Is such a behaviour even thinkable with Postgres?
>>
>> Do you have an idea what else could cause this issue?
>>
>>
>> * Uncommitted transactions?
>> * Purge job with a bug in it?
>> * Two different date columns (for example "transaction_date" and "posted_date") which are expected to be the same but apparently are not always. Since the errors apparently happen at the beginning of the month, the purge job might have seen them as the previous month's records.
>>
>> These are our dump and restore commands:
>> pg_dump -Fc --no-acl --no-owner -U acme -h 127.0.0.1 acme > acme.dump
>> pg_restore -d acme -n public -U acme -h 127.0.0.1 --jobs=4 acme.dump
>>
>> We use just a single db user to access the database and we don’t use RLS.
>>
>> Thank you.
>>
>> Best regards,
>> Andreas
>>
>>
>> --
>> Angular momentum makes the world go 'round.
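
The pattern described in the original report (a run of consecutive rows missing from the start of one table) can be checked mechanically once the primary keys from both backups have been extracted, for example after restoring each dump into a scratch database. The sketch below assumes integer primary keys and uses hypothetical helper names; it only compares two lists of ids and says nothing about why the rows are gone.

```python
from typing import Iterable, List


def missing_ids(old_ids: Iterable[int], new_ids: Iterable[int]) -> List[int]:
    """Primary keys present in the older backup but absent from the newer one."""
    return sorted(set(old_ids) - set(new_ids))


def is_leading_contiguous_block(missing: List[int], old_ids: Iterable[int]) -> bool:
    """True if the missing ids are exactly the first len(missing) ids,
    counted consecutively from the lowest id in the older backup."""
    old_list = list(old_ids)
    if not missing or not old_list:
        return False
    first = min(old_list)
    return missing == list(range(first, first + len(missing)))
```

A single contiguous block starting at the lowest id would be consistent with a targeted range deletion or purge; scattered missing ids would point in a different direction.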