Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1npZTU-00010P-14 for pgsql-admin@arkaria.postgresql.org; Fri, 13 May 2022 17:53:12 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.92) (envelope-from ) id 1npZTR-0000Hk-M9 for pgsql-admin@arkaria.postgresql.org; Fri, 13 May 2022 17:53:09 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1npZTR-0000Ha-7l for pgsql-admin@lists.postgresql.org; Fri, 13 May 2022 17:53:09 +0000 Received: from mail-ej1-x632.google.com ([2a00:1450:4864:20::632]) by magus.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.92) (envelope-from ) id 1npZTK-00024c-Ps for pgsql-admin@lists.postgresql.org; Fri, 13 May 2022 17:53:08 +0000 Received: by mail-ej1-x632.google.com with SMTP id i27so17631661ejd.9 for ; Fri, 13 May 2022 10:53:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:cc; bh=FqePnaxNfLkuYhXHFEulV+SNx1zevKwt4Au+PdDEvtk=; b=bqoUfwgyeMWj766LL44CUpDYKb3nVc0KzzMuqGKtvCKr4aFFAjMGszKS7tf2lJW7Kp 9V7rqVdQsqvLSSIb2RTRhELy0Kb/P6SCYEJ/ekkht1QsVRABrSyj1sp55s6F/scqg1F8 Pr/7XU71KXpVp9v0oG/BAWelaHHB6ickuhi9Dia/KVDlwJf0IEGR1RBWXeKaqLI5FyAW LrqG8aeruhcdqGWR1D+pVUH+zJiM1DiMNRoUBzAkMNdRDwvzmWVNXqM6VDE32Ze7lbNF aF/mlk3sDpzJ9EM5AM6WxfkmOD9RXtwHKH6aEcOtcEtzIoP9x+EYGM/2bHoOyJHPLAOL Ei8A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:cc; bh=FqePnaxNfLkuYhXHFEulV+SNx1zevKwt4Au+PdDEvtk=; b=BLFhOZWstGnoKhAsjRX63BnU11O55MH77CIEBLs4sSE4R/w5dEVagQUfRcQ2njLhgE ptfjX/ygF7RLO4FfsF1M0HSHa9DIXopZ3k2gF2DVh2A6R67gbNv3tXpj0buZT3lyqpG0 joCvQirfyte8FVsLdbTcUTkwtYO6062iLFsZY6ubccSD82+4COMr+uYJMaaXChrgfqgw dHw2e9aR99H74oaWVt84/4G6fHN24epc91KisFRK9ercZGto1KQSCEUM8DkfxMP6SxoT 78k3cQytYo2GkGJXaZpiD2f41GTI5/YayX7YvbxBBbZqgwFAQ7EBXFQmHYOzzVvMemeD sR2Q== X-Gm-Message-State: AOAM53211EUzImCk3fo8uyA5X3+I1KH0SJh4dML5gueNM5NyZEwqSi5H qhtzLI/CDGbRuApLCIlg9CUqv2YpQgRClollynGUj4MThJqAmg== X-Google-Smtp-Source: ABdhPJxjK1Ryn+PX08siHFNaKhQMqEH3Mof9oJbqOmSGbDP6pinCZmSbGRdZ1HxoOCZ/KzC3NhMCvILvN9v69GrkZuw= X-Received: by 2002:a17:906:d54e:b0:6f6:3ea:8e7b with SMTP id cr14-20020a170906d54e00b006f603ea8e7bmr5483315ejc.292.1652464381284; Fri, 13 May 2022 10:53:01 -0700 (PDT) MIME-Version: 1.0 References: <397f5b28-0f5c-de1f-c825-06e2ea7b6a1f@gmail.com> In-Reply-To: <397f5b28-0f5c-de1f-c825-06e2ea7b6a1f@gmail.com> From: A G Date: Fri, 13 May 2022 19:52:50 +0200 Message-ID: Subject: Re: Losing records in PostgreSQL 9.6 Cc: pgsql-admin@lists.postgresql.org Content-Type: multipart/alternative; boundary="00000000000073553605dee8580a" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000073553605dee8580a Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Thanks for your input! We checked the application that has access to the database, but it would never delete rows from that table. The missing rows in the database were stored at some point through committed transactions and had a lower sequential primary key. We don't think the transactions were rolled back since they were part of an older backup. We believe that there was probably a manual access through the customer or a service partner, but wanted to make sure that there is no other way that Postgres would lose rows during a pg_dump because of something like a hardware failure, for instance. Best regards, Andreas On Sat, May 7, 2022 at 4:03 PM Ron wrote: > On 5/4/22 09:55, A G wrote: > > Hi, > thanks for your help. > > My team is using Postgres 9.6.10 for an on-premise application (we are > planing on upgrading to a newer Postgres version). Our application comes > with Postgres running in a docker container with its data stored in a > docker volume. Our software uses pg_dump / pg_restore to backup and resto= re > the database. > > Now we got a ticket from a customer where their database is missing rows > from a table. There are 971 consecutive rows missing from the beginning o= f > the table. The missing rows were inserted first. We find it also strange, > that all the other tables don=E2=80=99t seem to be affected at all. It ap= pears that > there is only data loss in this single table. > Unfortunately, we don=E2=80=99t have access to the original database anym= ore and > need to find out what happened through the backups the customer provides. > We have one backup right after they installed and initially configured th= e > application, which seems complete. Then there is another backup 10 months > later where the first 971 rows are already missing in this one table. > > If we exclude a manual deletion, which the customer denies, > > > There's more to PEBKAC than manual deletion. > > we are wondering if it=E2=80=99s possible that Postgres 9.6 could lose so= me of its > data through a storage or memory error and would create a =E2=80=9Csucces= sful=E2=80=9D > pg_dump with only partial data? Is such a behaviour even thinkable with > Postgres? > > Do you have an idea what else could cause this issue? > > > Uncommitted transactions? > * Purge job with a bug in it? > * Two different date columns (for example "transaction_date" and > "posted_date") which are *expected to be* the same apparently not > always. Since the errors apparently happen at the beginning of the month= , > the purge job might have seen them as the previous month's records. > > These are our dump and restore commands: > pg_dump -Fc --no-acl --no-owner -U acme -h 127.0.0.1 acme > acme.dump > pg_restore -d acme -n public -U acme -h 127.0.0.1 --jobs=3D4 acme.dump > > We use just a single db user to access the database and we don=E2=80=99t = use RLS. > > Thank you. > > Best regards, > Andreas > > > -- > Angular momentum makes the world go 'round. > --00000000000073553605dee8580a Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Thanks for your input!

We checked = the application that has access to the database, but it would never delete = rows from that table. The missing rows in the database were stored at some = point through committed transactions and had a lower sequential primary key= . We don't think the transactions were rolled back since they were part= of an older backup.

We believe that there was probably a manual acc= ess through the customer or a service partner, but wanted to make sure that= there is no other way that Postgres would lose rows during a pg_dump becau= se of something like a hardware failure, for instance.

Best regards,=
Andreas

On Sat, May 7, 2022 at 4:03 PM Ron <ronljohnsonjr@gmail.com> wr= ote:
=20 =20 =20
On 5/4/22 09:55, A G wrote:
=20
Hi,=C2=A0
thanks for your help.

My team is using Postgres 9.6.10 for an on-premise application (we are planing on upgrading to a newer Postgres version). Our application comes with Postgres running in a docker container with its data stored in a docker volume. Our software uses pg_dump / pg_restore to backup and restore the database.
Now we got a ticket from a customer where their database is missing rows from a table. There are 971 consecutive rows missing from the beginning of the table. The missing rows were inserted first. We find it also strange, that all the other tables don=E2=80=99t seem to be affected at all. It appears that = there is only data loss in this single table.
Unfortunately, we don=E2=80=99t have access to the original database anymore and= need to find out what happened through the backups the customer provides. We have one backup right after they installed and initially configured the application, which seems complete. Then there is another backup 10 months later where the first 971 rows are already missing in this one table.

If we exclude a manual deletion, which the customer denies,

There's more to PEBKAC than manual deletion.

we are wondering if it=E2=80=99s possible that Postgres 9.6 could= lose some of its data through a storage or memory error and would create a =E2=80=9Csuccessful=E2=80=9D pg_dump with only partial d= ata? Is such a behaviour even thinkable with Postgres?

Do you have an idea what else could cause this issue?

Uncommitted transactions?=C2=A0
* Purge job with a bug in it?
* Two different date columns (for example "transaction_date" = and "posted_date") which are expected to be the same appar= ently not always.=C2=A0 Since the errors apparently happen at the beginning o= f the month, the purge job might have seen them as the previous month's records.

These are our dump and restore commands:
pg_dump -Fc --no-acl --no-owner -U acme -h 127.0.0.1 acme > acme.dump
pg_restore -d acme -n public -U acme -h 127.0.0.1 --jobs=3D4 acme.dump

We use just a single db user to access the database and we don=E2=80= =99t use RLS.

Thank you.

Best regards,
Andreas

--
Angular momentum makes the world go 'round.
--00000000000073553605dee8580a--