public inbox for [email protected]  
help / color / mirror / Atom feed
From: Greg Sabino Mullane <[email protected]>
To: mark bradley <[email protected]>
Cc: Adrian Klaver <[email protected]>
Cc: pgsql-general <[email protected]>
Subject: Re: Duplicate Key Values
Date: Fri, 7 Mar 2025 10:52:40 -0500
Message-ID: <CAKAnmm+BBBaXGN2xPHhXywkwb72UWzinWu2wQ5WadcMw3_57rQ@mail.gmail.com> (raw)
In-Reply-To: <SJ2PR22MB43286857EAC74D1EE332B08CBAD52@SJ2PR22MB4328.namprd22.prod.outlook.com>
References: <CH3PR22MB4312C23476C0E67F9B4C2E10BACB2@CH3PR22MB4312.namprd22.prod.outlook.com>
	<CANzqJaDWzZ4O10QZi6w__N5q7gzwcZANfu0DxA7UOpFwqAqTyQ@mail.gmail.com>
	<SJ2PR22MB43284AE4E32208EB035F68F5BACB2@SJ2PR22MB4328.namprd22.prod.outlook.com>
	<CANzqJaCZefaoZqJzPSFqEaRA1W-Ps5E6yNuL_PT-WxUw9x38Nw@mail.gmail.com>
	<[email protected]>
	<SJ2PR22MB4328C53DB4397EB9759419DDBACA2@SJ2PR22MB4328.namprd22.prod.outlook.com>
	<[email protected]>
	<SJ2PR22MB4328C88FCBA2B447CC72DC90BACA2@SJ2PR22MB4328.namprd22.prod.outlook.com>
	<[email protected]>
	<SJ2PR22MB4328931B7CB951A48F1434ABBACA2@SJ2PR22MB4328.namprd22.prod.outlook.com>
	<[email protected]>
	<SJ2PR22MB43286857EAC74D1EE332B08CBAD52@SJ2PR22MB4328.namprd22.prod.outlook.com>

On Fri, Mar 7, 2025 at 9:35 AM mark bradley <[email protected]> wrote:

> This is what MS Copilot has to say about this apparent bug where Postgres
> inserts extra rows violating a primary keys uniqueness constraint:
>
> Yes, this issue has been encountered by others. There are a few potential
> reasons why this might happen:
>
>    1. *Sequence Out of Sync*: Sometimes, the sequence that generates
>    unique values for the primary key can become out of sync, especially after
>    a bulk import or a database restore. You can check if the sequence is out
>    of sync and reset it if necessary.
>    2. *Index Corruption*: Index corruption can occur due to various
>    reasons, such as hardware failures or bugs in earlier versions of
>    PostgreSQL. This can lead to duplicate primary keys being inserted.
>    3. *Table Inheritance*: If you are using table inheritance, primary
>    keys are not enforced among inherited tables. This can lead to duplicates
>    if not handled correctly.
>    4. *Application Logic*: Sometimes, the application logic might
>    inadvertently insert duplicate records. Reviewing the application code and
>    insert statements can help identify and resolve such issues.
>
> This is AI gobbledygook, and can be ignored. The only real option
is number 2 (index corruption).


>    1. There is no index on the primary key *node_id*, and I understand
>    there should be one.
>
>
There is an index, as your table definition showed.

What to do?  I hesitate to just delete my tables and start over because
> this error will reoccur.
>

The error should not reoccur. At least, a normal Postgres system will
prevent this from happening in the first place. To clean it up, carefully
run the below. If an error appears, or something does not look right,
rollback and stop.

-- Encourage not using indexes:
set enable_indexscan = 0;
set enable_bitmapscan = 0;
set enable_indexonlyscan = 0;

-- Sanity check. This should return a number greater than 1. If not, stop.
set search_path = public;
select count(*) from dataset where node_id = 26;

-- Make a backup:
create table dataset_backup as select * from dataset;

-- Test out the process on a subset of the data:
create table test_dataset as select * from dataset where node_id < 30;
create table test_dataset_duperows_20250307 (like dataset);
begin;
set local session_replication_role = 'replica';
with goodctids as (select min(ctid) from TEST_dataset group by node_id)
 , mydelete as (delete from TEST_dataset where not exists (select 1 from
goodctids where min=ctid)
   returning *)
insert into test_dataset_duperows_20250307 select * from mydelete;
reset session_replication_role;
commit;

-- STOP HERE and examine the test_dataset and
test_dataset_duperows_20250307 tables
--
-- If ZERO rows were deleted, then you should no go further,
-- as some of the underlying assumptions must be wrong.

-- Do the real table:
create table dataset_duperows_20250307 (like dataset);
begin;
set local session_replication_role = 'replica';
with goodctids as (select min(ctid) from dataset group by node_id)
  , mydelete as (delete from dataset where not exists (select 1 from
goodctids where min=ctid)
    returning *)
insert into dataset_duperows_20250307 select * from mydelete;
reset session_replication_role;
commit;

-- Rebuild the index
reindex index concurrently dataset_pkey;

-- Put things back from good measure:
reset enable_indexscan;
reset enable_bitmapscan;
reset enable_indexonlyscan;

drop table test_dataset;
drop table test_dataset_duperows;

Given the issues, I would keep the dataset_backup table around for a while.
And make sure your backups are running and up to date.

You might also want to reindex all the tables in your database, to see if
any other issues are lurking.

Check your Postgres logs closely to see if anything else unusual has
appeared.

Look over your OS logs to see if there are clues as to how the corruption
happened. Maybe you recently upgraded your OS?

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support


view thread (9+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: Duplicate Key Values
  In-Reply-To: <CAKAnmm+BBBaXGN2xPHhXywkwb72UWzinWu2wQ5WadcMw3_57rQ@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox