public inbox for [email protected]  
help / color / mirror / Atom feed
Losing records in PostgreSQL 9.6
5+ messages / 3 participants
[nested] [flat]

* Losing records in PostgreSQL 9.6
@ 2022-05-04 14:55 A G <[email protected]>
  2022-05-04 20:18 ` Re: Losing records in PostgreSQL 9.6 Ron <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: A G @ 2022-05-04 14:55 UTC (permalink / raw)
  To: [email protected]

Hi,
thanks for your help.

My team is using Postgres 9.6.10 for an on-premise application (we are
planing on upgrading to a newer Postgres version). Our application comes
with Postgres running in a docker container with its data stored in a
docker volume. Our software uses pg_dump / pg_restore to backup and restore
the database.

Now we got a ticket from a customer where their database is missing rows
from a table. There are 971 consecutive rows missing from the beginning of
the table. The missing rows were inserted first. We find it also strange,
that all the other tables don’t seem to be affected at all. It appears that
there is only data loss in this single table.
Unfortunately, we don’t have access to the original database anymore and
need to find out what happened through the backups the customer provides.
We have one backup right after they installed and initially configured the
application, which seems complete. Then there is another backup 10 months
later where the first 971 rows are already missing in this one table.

If we exclude a manual deletion, which the customer denies, we are
wondering if it’s possible that Postgres 9.6 could lose some of its data
through a storage or memory error and would create a “successful” pg_dump
with only partial data? Is such a behaviour even thinkable with Postgres?

Do you have an idea what else could cause this issue?


These are our dump and restore commands:
pg_dump -Fc --no-acl --no-owner -U acme -h 127.0.0.1 acme > acme.dump
pg_restore -d acme -n public -U acme -h 127.0.0.1 --jobs=4 acme.dump

We use just a single db user to access the database and we don’t use RLS.

Thank you.

Best regards,
Andreas


^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Losing records in PostgreSQL 9.6
  2022-05-04 14:55 Losing records in PostgreSQL 9.6 A G <[email protected]>
@ 2022-05-04 20:18 ` Ron <[email protected]>
  2022-05-13 17:52   ` Re: Losing records in PostgreSQL 9.6 A G <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: Ron @ 2022-05-04 20:18 UTC (permalink / raw)
  To: [email protected]

On 5/4/22 09:55, A G wrote:
> Hi,
> thanks for your help.
>
> My team is using Postgres 9.6.10 for an on-premise application (we are 
> planing on upgrading to a newer Postgres version). Our application comes 
> with Postgres running in a docker container with its data stored in a 
> docker volume. Our software uses pg_dump / pg_restore to backup and 
> restore the database.
>
> Now we got a ticket from a customer where their database is missing rows 
> from a table. There are 971 consecutive rows missing from the beginning of 
> the table. The missing rows were inserted first. We find it also strange, 
> that all the other tables don’t seem to be affected at all. It appears 
> that there is only data loss in this single table.
> Unfortunately, we don’t have access to the original database anymore and 
> need to find out what happened through the backups the customer provides. 
> We have one backup right after they installed and initially configured the 
> application, which seems complete. Then there is another backup 10 months 
> later where the first 971 rows are already missing in this one table.
>
> If we exclude a manual deletion, which the customer denies,

There's more to PEBKAC than manual deletion.

> we are wondering if it’s possible that Postgres 9.6 could lose some of its 
> data through a storage or memory error and would create a “successful” 
> pg_dump with only partial data? Is such a behaviour even thinkable with 
> Postgres?
>
> Do you have an idea what else could cause this issue?

Uncommitted transactions?
* Purge job with a bug in it?
* Two different date columns (for example "transaction_date" and 
"posted_date") which are /expected to be/ the same apparently not always.  
Since the errors apparently happen at the beginning of the month, the purge 
job might have seen them as the previous month's records.

> These are our dump and restore commands:
> pg_dump -Fc --no-acl --no-owner -U acme -h 127.0.0.1 acme > acme.dump
> pg_restore -d acme -n public -U acme -h 127.0.0.1 --jobs=4 acme.dump
>
> We use just a single db user to access the database and we don’t use RLS.
>
> Thank you.
>
> Best regards,
> Andreas

-- 
Angular momentum makes the world go 'round.

^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Losing records in PostgreSQL 9.6
  2022-05-04 14:55 Losing records in PostgreSQL 9.6 A G <[email protected]>
  2022-05-04 20:18 ` Re: Losing records in PostgreSQL 9.6 Ron <[email protected]>
@ 2022-05-13 17:52   ` A G <[email protected]>
  2022-05-14 04:06     ` Re: Losing records in PostgreSQL 9.6 Robert Treat <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: A G @ 2022-05-13 17:52 UTC (permalink / raw)
  To: ; +Cc: [email protected]

Thanks for your input!

We checked the application that has access to the database, but it would
never delete rows from that table. The missing rows in the database were
stored at some point through committed transactions and had a lower
sequential primary key. We don't think the transactions were rolled back
since they were part of an older backup.

We believe that there was probably a manual access through the customer or
a service partner, but wanted to make sure that there is no other way that
Postgres would lose rows during a pg_dump because of something like a
hardware failure, for instance.

Best regards,
Andreas

On Sat, May 7, 2022 at 4:03 PM Ron <[email protected]> wrote:

> On 5/4/22 09:55, A G wrote:
>
> Hi,
> thanks for your help.
>
> My team is using Postgres 9.6.10 for an on-premise application (we are
> planing on upgrading to a newer Postgres version). Our application comes
> with Postgres running in a docker container with its data stored in a
> docker volume. Our software uses pg_dump / pg_restore to backup and restore
> the database.
>
> Now we got a ticket from a customer where their database is missing rows
> from a table. There are 971 consecutive rows missing from the beginning of
> the table. The missing rows were inserted first. We find it also strange,
> that all the other tables don’t seem to be affected at all. It appears that
> there is only data loss in this single table.
> Unfortunately, we don’t have access to the original database anymore and
> need to find out what happened through the backups the customer provides.
> We have one backup right after they installed and initially configured the
> application, which seems complete. Then there is another backup 10 months
> later where the first 971 rows are already missing in this one table.
>
> If we exclude a manual deletion, which the customer denies,
>
>
> There's more to PEBKAC than manual deletion.
>
> we are wondering if it’s possible that Postgres 9.6 could lose some of its
> data through a storage or memory error and would create a “successful”
> pg_dump with only partial data? Is such a behaviour even thinkable with
> Postgres?
>
> Do you have an idea what else could cause this issue?
>
>
> Uncommitted transactions?
> * Purge job with a bug in it?
> * Two different date columns (for example "transaction_date" and
> "posted_date") which are *expected to be* the same apparently not
> always.  Since the errors apparently happen at the beginning of the month,
> the purge job might have seen them as the previous month's records.
>
> These are our dump and restore commands:
> pg_dump -Fc --no-acl --no-owner -U acme -h 127.0.0.1 acme > acme.dump
> pg_restore -d acme -n public -U acme -h 127.0.0.1 --jobs=4 acme.dump
>
> We use just a single db user to access the database and we don’t use RLS.
>
> Thank you.
>
> Best regards,
> Andreas
>
>
> --
> Angular momentum makes the world go 'round.
>


^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Losing records in PostgreSQL 9.6
  2022-05-04 14:55 Losing records in PostgreSQL 9.6 A G <[email protected]>
  2022-05-04 20:18 ` Re: Losing records in PostgreSQL 9.6 Ron <[email protected]>
  2022-05-13 17:52   ` Re: Losing records in PostgreSQL 9.6 A G <[email protected]>
@ 2022-05-14 04:06     ` Robert Treat <[email protected]>
  2022-06-04 09:19       ` Re: Losing records in PostgreSQL 9.6 A G <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: Robert Treat @ 2022-05-14 04:06 UTC (permalink / raw)
  To: A G <[email protected]>; +Cc: [email protected]

If you are asking if it is within the realm of possibility that an old
version of Postgres with known bugs running on a presumably old
version of Linux which probably also has known bugs that happens to
also have some form of hardware failure that could include corrupted
memory and/or corrupted storage (did I mention filesystem bugs?) could
possibly lose data... well I reckon that probably is possible.

However, because the pg_dumps produce what sound like working / proper
output, I don't think it'd be any of those things. You didn't mention
if you have any ability to look at any of the database statistics
tables, which would be a first place to look to see if there are any
DML statements or DML activity tracked. Beyond that, if I had access
to the server in question (or better an exact physical copy, which
might be difficult in your case), I'd want to see if I could find the
old rows which might still be on-disk even if not visible due to
deletion or transaction semantics, and would also want to rule out
things like index corruption that might cause some weird side effects.


Robert Treat

On Fri, May 13, 2022 at 1:53 PM A G <[email protected]> wrote:
>
> Thanks for your input!
>
> We checked the application that has access to the database, but it would never delete rows from that table. The missing rows in the database were stored at some point through committed transactions and had a lower sequential primary key. We don't think the transactions were rolled back since they were part of an older backup.
>
> We believe that there was probably a manual access through the customer or a service partner, but wanted to make sure that there is no other way that Postgres would lose rows during a pg_dump because of something like a hardware failure, for instance.
>
> Best regards,
> Andreas
>
> On Sat, May 7, 2022 at 4:03 PM Ron <[email protected]> wrote:
>>
>> On 5/4/22 09:55, A G wrote:
>>
>> Hi,
>> thanks for your help.
>>
>> My team is using Postgres 9.6.10 for an on-premise application (we are planing on upgrading to a newer Postgres version). Our application comes with Postgres running in a docker container with its data stored in a docker volume. Our software uses pg_dump / pg_restore to backup and restore the database.
>>
>> Now we got a ticket from a customer where their database is missing rows from a table. There are 971 consecutive rows missing from the beginning of the table. The missing rows were inserted first. We find it also strange, that all the other tables don’t seem to be affected at all. It appears that there is only data loss in this single table.
>> Unfortunately, we don’t have access to the original database anymore and need to find out what happened through the backups the customer provides. We have one backup right after they installed and initially configured the application, which seems complete. Then there is another backup 10 months later where the first 971 rows are already missing in this one table.
>>
>> If we exclude a manual deletion, which the customer denies,
>>
>>
>> There's more to PEBKAC than manual deletion.
>>
>> we are wondering if it’s possible that Postgres 9.6 could lose some of its data through a storage or memory error and would create a “successful” pg_dump with only partial data? Is such a behaviour even thinkable with Postgres?
>>
>> Do you have an idea what else could cause this issue?
>>
>>
>> Uncommitted transactions?
>> * Purge job with a bug in it?
>> * Two different date columns (for example "transaction_date" and "posted_date") which are expected to be the same apparently not always.  Since the errors apparently happen at the beginning of the month, the purge job might have seen them as the previous month's records.
>>
>> These are our dump and restore commands:
>> pg_dump -Fc --no-acl --no-owner -U acme -h 127.0.0.1 acme > acme.dump
>> pg_restore -d acme -n public -U acme -h 127.0.0.1 --jobs=4 acme.dump
>>
>> We use just a single db user to access the database and we don’t use RLS.
>>
>> Thank you.
>>
>> Best regards,
>> Andreas
>>
>>
>> --
>> Angular momentum makes the world go 'round.





^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Losing records in PostgreSQL 9.6
  2022-05-04 14:55 Losing records in PostgreSQL 9.6 A G <[email protected]>
  2022-05-04 20:18 ` Re: Losing records in PostgreSQL 9.6 Ron <[email protected]>
  2022-05-13 17:52   ` Re: Losing records in PostgreSQL 9.6 A G <[email protected]>
  2022-05-14 04:06     ` Re: Losing records in PostgreSQL 9.6 Robert Treat <[email protected]>
@ 2022-06-04 09:19       ` A G <[email protected]>
  0 siblings, 0 replies; 5+ messages in thread

From: A G @ 2022-06-04 09:19 UTC (permalink / raw)
  To: Robert Treat <[email protected]>; +Cc: [email protected]

Thank you for your helpful input and sorry my late answer, I wanted to
check first the database statistics tables you suggested.
Unfortunately, we don't have access to them.
But that was great input. We are planning to improve the backup system to
include more information that will help us in the future to analyse what
happened on a customer database.

Best regards,
Andreas

On Sat, May 14, 2022 at 6:06 AM Robert Treat <[email protected]> wrote:

> If you are asking if it is within the realm of possibility that an old
> version of Postgres with known bugs running on a presumably old
> version of Linux which probably also has known bugs that happens to
> also have some form of hardware failure that could include corrupted
> memory and/or corrupted storage (did I mention filesystem bugs?) could
> possibly lose data... well I reckon that probably is possible.
>
> However, because the pg_dumps produce what sound like working / proper
> output, I don't think it'd be any of those things. You didn't mention
> if you have any ability to look at any of the database statistics
> tables, which would be a first place to look to see if there are any
> DML statements or DML activity tracked. Beyond that, if I had access
> to the server in question (or better an exact physical copy, which
> might be difficult in your case), I'd want to see if I could find the
> old rows which might still be on-disk even if not visible due to
> deletion or transaction semantics, and would also want to rule out
> things like index corruption that might cause some weird side effects.
>
>
> Robert Treat
>
> On Fri, May 13, 2022 at 1:53 PM A G <[email protected]> wrote:
> >
> > Thanks for your input!
> >
> > We checked the application that has access to the database, but it would
> never delete rows from that table. The missing rows in the database were
> stored at some point through committed transactions and had a lower
> sequential primary key. We don't think the transactions were rolled back
> since they were part of an older backup.
> >
> > We believe that there was probably a manual access through the customer
> or a service partner, but wanted to make sure that there is no other way
> that Postgres would lose rows during a pg_dump because of something like a
> hardware failure, for instance.
> >
> > Best regards,
> > Andreas
> >
> > On Sat, May 7, 2022 at 4:03 PM Ron <[email protected]> wrote:
> >>
> >> On 5/4/22 09:55, A G wrote:
> >>
> >> Hi,
> >> thanks for your help.
> >>
> >> My team is using Postgres 9.6.10 for an on-premise application (we are
> planing on upgrading to a newer Postgres version). Our application comes
> with Postgres running in a docker container with its data stored in a
> docker volume. Our software uses pg_dump / pg_restore to backup and restore
> the database.
> >>
> >> Now we got a ticket from a customer where their database is missing
> rows from a table. There are 971 consecutive rows missing from the
> beginning of the table. The missing rows were inserted first. We find it
> also strange, that all the other tables don’t seem to be affected at all.
> It appears that there is only data loss in this single table.
> >> Unfortunately, we don’t have access to the original database anymore
> and need to find out what happened through the backups the customer
> provides. We have one backup right after they installed and initially
> configured the application, which seems complete. Then there is another
> backup 10 months later where the first 971 rows are already missing in this
> one table.
> >>
> >> If we exclude a manual deletion, which the customer denies,
> >>
> >>
> >> There's more to PEBKAC than manual deletion.
> >>
> >> we are wondering if it’s possible that Postgres 9.6 could lose some of
> its data through a storage or memory error and would create a “successful”
> pg_dump with only partial data? Is such a behaviour even thinkable with
> Postgres?
> >>
> >> Do you have an idea what else could cause this issue?
> >>
> >>
> >> Uncommitted transactions?
> >> * Purge job with a bug in it?
> >> * Two different date columns (for example "transaction_date" and
> "posted_date") which are expected to be the same apparently not always.
> Since the errors apparently happen at the beginning of the month, the purge
> job might have seen them as the previous month's records.
> >>
> >> These are our dump and restore commands:
> >> pg_dump -Fc --no-acl --no-owner -U acme -h 127.0.0.1 acme > acme.dump
> >> pg_restore -d acme -n public -U acme -h 127.0.0.1 --jobs=4 acme.dump
> >>
> >> We use just a single db user to access the database and we don’t use
> RLS.
> >>
> >> Thank you.
> >>
> >> Best regards,
> >> Andreas
> >>
> >>
> >> --
> >> Angular momentum makes the world go 'round.
>


^ permalink  raw  reply  [nested|flat] 5+ messages in thread


end of thread, other threads:[~2022-06-04 09:19 UTC | newest]

Thread overview: 5+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2022-05-04 14:55 Losing records in PostgreSQL 9.6 A G <[email protected]>
2022-05-04 20:18 ` Ron <[email protected]>
2022-05-13 17:52   ` A G <[email protected]>
2022-05-14 04:06     ` Robert Treat <[email protected]>
2022-06-04 09:19       ` A G <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox