Removing duplicate rows in table

public inbox for [email protected]  
help / color / mirror / Atom feed

Removing duplicate rows in table
16+ messages / 8 participants
[nested] [flat]

* Removing duplicate rows in table
@ 2024-09-10 15:07  Rich Shepard <[email protected]>
  0 siblings, 7 replies; 16+ messages in thread

From: Rich Shepard @ 2024-09-10 15:07 UTC (permalink / raw)
  To: pgsql-general

I've no idea how I entered multiple, identical rows in a table but I want to
delete all but one of these rows.

Here's an example:

bustrac=# select * from projects where proj_nbr = '4242.01';
  proj_nbr |   proj_name    | start_date |  end_date  |  description  | notes 
----------+----------------+------------+------------+---------------+-------
  4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
  4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
  4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
  4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy | 
(4 rows)

How do I clean this up so there's only a single row for this project number?

TIA,

Rich






^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: Removing duplicate rows in table
@ 2024-09-10 15:10  Christophe Pettus <[email protected]>
  parent: Rich Shepard <[email protected]>
  6 siblings, 1 reply; 16+ messages in thread

From: Christophe Pettus @ 2024-09-10 15:10 UTC (permalink / raw)
  To: Rich Shepard <[email protected]>; +Cc: pgsql-general

If you don't mind taking the time to swap tables, you can always do an INSERT ... SELECT DISTINCT <fields> into a new table, and then swap it with the existing table.


> On Sep 10, 2024, at 08:07, Rich Shepard <[email protected]> wrote:
> 
> I've no idea how I entered multiple, identical rows in a table but I want to
> delete all but one of these rows.
> 
> Here's an example:
> 
> bustrac=# select * from projects where proj_nbr = '4242.01';
> proj_nbr |   proj_name    | start_date |  end_date  |  description  | notes ----------+----------------+------------+------------+---------------+-------
> 4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
> 4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
> 4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
> 4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy | (4 rows)
> 
> How do I clean this up so there's only a single row for this project number?
> 
> TIA,
> 
> Rich
> 
> 







^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: Removing duplicate rows in table
@ 2024-09-10 15:16  Ron Johnson <[email protected]>
  parent: Rich Shepard <[email protected]>
  6 siblings, 0 replies; 16+ messages in thread

From: Ron Johnson @ 2024-09-10 15:16 UTC (permalink / raw)
  To: pgsql-general

On Tue, Sep 10, 2024 at 11:07 AM Rich Shepard <[email protected]>
wrote:

> I've no idea how I entered multiple, identical rows in a table but I want
> to
> delete all but one of these rows.
>
> Here's an example:
>
> bustrac=# select * from projects where proj_nbr = '4242.01';
>   proj_nbr |   proj_name    | start_date |  end_date  |  description  |
> notes
>
> ----------+----------------+------------+------------+---------------+-------
>   4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
>   4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
>   4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
>   4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
> (4 rows)
>
> How do I clean this up so there's only a single row for this project
> number?
>

https://www.postgresqltutorial.com/postgresql-window-function/postgresql-row_number/

I'd use row_number to delete records where row_number > 1.

-- 
Death to America, and butter sauce.
Iraq lobster!


^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: Removing duplicate rows in table
@ 2024-09-10 15:22  Adrian Klaver <[email protected]>
  parent: Rich Shepard <[email protected]>
  6 siblings, 1 reply; 16+ messages in thread

From: Adrian Klaver @ 2024-09-10 15:22 UTC (permalink / raw)
  To: Rich Shepard <[email protected]>; pgsql-general

On 9/10/24 08:07, Rich Shepard wrote:
> I've no idea how I entered multiple, identical rows in a table but I 
> want to
> delete all but one of these rows.

Is there a Primary Key or Unique index on this table?

> 
> Here's an example:
> 
> bustrac=# select * from projects where proj_nbr = '4242.01';
>   proj_nbr |   proj_name    | start_date |  end_date  |  description  | 
> notes 
> ----------+----------------+------------+------------+---------------+-------
>   4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
>   4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
>   4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
>   4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy | 
> (4 rows)
> 
> How do I clean this up so there's only a single row for this project 
> number?
> 
> TIA,
> 
> Rich
> 
> 

-- 
Adrian Klaver
[email protected]







^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: Removing duplicate rows in table
@ 2024-09-10 15:28  Andreas Kretschmer <[email protected]>
  parent: Rich Shepard <[email protected]>
  6 siblings, 0 replies; 16+ messages in thread

From: Andreas Kretschmer @ 2024-09-10 15:28 UTC (permalink / raw)
  To: [email protected]



Am 10.09.24 um 17:07 schrieb Rich Shepard:
> I've no idea how I entered multiple, identical rows in a table but I 
> want to
> delete all but one of these rows.
>
> Here's an example:
>
> bustrac=# select * from projects where proj_nbr = '4242.01';
>  proj_nbr |   proj_name    | start_date |  end_date  | description  | 
> notes 
> ----------+----------------+------------+------------+---------------+-------
>  4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
>  4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
>  4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
>  4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy | 
> (4 rows)
>
> How do I clean this up so there's only a single row for this project 
> number?
>
> TIA,
>
> Rich
>
>

you can use the hidden ctid-column:

postgres=# create table demo (id int, val text);
CREATE TABLE
postgres=# insert into demo values (1, 'test1');
INSERT 0 1
postgres=# insert into demo values (1, 'test1');
INSERT 0 1
postgres=# insert into demo values (1, 'test1');
INSERT 0 1
postgres=# insert into demo values (1, 'test1');
INSERT 0 1
postgres=# select ctid, * from demo;
  ctid  | id |  val
-------+----+-------
  (0,1) |  1 | test1
  (0,2) |  1 | test1
  (0,3) |  1 | test1
  (0,4) |  1 | test1
(4 rows)

postgres=# with my_ctid as (select min(ctid) from demo where id = 1 and 
val = 'test1') delete from demo using my_ctid where id=1 and val='test1' 
and ctid != my_ctid.min;
DELETE 3
postgres=# select ctid, * from demo;
  ctid  | id |  val
-------+----+-------
  (0,1) |  1 | test1
(1 row)

postgres=#


-- 
Andreas Kretschmer
CYBERTEC PostgreSQL Services and Support







^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: Removing duplicate rows in table
@ 2024-09-10 15:29  Rich Shepard <[email protected]>
  parent: Adrian Klaver <[email protected]>
  0 siblings, 1 reply; 16+ messages in thread

From: Rich Shepard @ 2024-09-10 15:29 UTC (permalink / raw)
  To: pgsql-general

On Tue, 10 Sep 2024, Adrian Klaver wrote:

> Is there a Primary Key or Unique index on this table?

Adrian,

No. It didn't occur to me to make the project number a PK as this table is
not related to others in the database.

But, yesterday it occurred to me to make the proj_nbr a PK to eliminate
future issues.

Thanks,

Rich

^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: Removing duplicate rows in table
@ 2024-09-10 15:32  Rich Shepard <[email protected]>
  parent: Rich Shepard <[email protected]>
  6 siblings, 1 reply; 16+ messages in thread

From: Rich Shepard @ 2024-09-10 15:32 UTC (permalink / raw)
  To: pgsql-general

On Tue, 10 Sep 2024, Francisco Olarte wrote:

> Do you have any kid of corruption (i.e, unique index violation) or is
> it just a duplicate problem?

Francisco,

Only a duplicate problem because when I created this table I didn't make the
proj_nbr column a PK.

> Also, if you do not have any uniqueness criteria consider adding an
> "id identity" column, it is useful when shit hits the fan.

Yep. that's what I will do.

Thanks,

Rich






^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: Removing duplicate rows in table
@ 2024-09-10 15:33  Rich Shepard <[email protected]>
  parent: Christophe Pettus <[email protected]>
  0 siblings, 0 replies; 16+ messages in thread

From: Rich Shepard @ 2024-09-10 15:33 UTC (permalink / raw)
  To: pgsql-general

On Tue, 10 Sep 2024, Christophe Pettus wrote:

> If you don't mind taking the time to swap tables, you can always do an
> INSERT ... SELECT DISTINCT <fields> into a new table, and then swap it
> with the existing table.

Christophe,

I'll make the proj_nbr table the PK then do as you recommend.

Thank you,

Rich






^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: Removing duplicate rows in table
@ 2024-09-10 15:35  Rich Shepard <[email protected]>
  parent: Rich Shepard <[email protected]>
  6 siblings, 0 replies; 16+ messages in thread

From: Rich Shepard @ 2024-09-10 15:35 UTC (permalink / raw)
  To: pgsql-general

On Tue, 10 Sep 2024, Andreas Kretschmer wrote:

> you can use the hidden ctid-column:
>
> postgres=# create table demo (id int, val text);
> CREATE TABLE
> postgres=# insert into demo values (1, 'test1');
> INSERT 0 1
> postgres=# insert into demo values (1, 'test1');
> INSERT 0 1
> postgres=# insert into demo values (1, 'test1');
> INSERT 0 1
> postgres=# insert into demo values (1, 'test1');
> INSERT 0 1
> postgres=# select ctid, * from demo;
>  ctid  | id |  val
> -------+----+-------
>  (0,1) |  1 | test1
>  (0,2) |  1 | test1
>  (0,3) |  1 | test1
>  (0,4) |  1 | test1
> (4 rows)
>
> postgres=# with my_ctid as (select min(ctid) from demo where id = 1 and val = 
> 'test1') delete from demo using my_ctid where id=1 and val='test1' and ctid 
> != my_ctid.min;
> DELETE 3
> postgres=# select ctid, * from demo;
>  ctid  | id |  val
> -------+----+-------
>  (0,1) |  1 | test1
> (1 row)

Thanks, Andreas.

Rich






^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: Removing duplicate rows in table
@ 2024-09-10 15:45  Adrian Klaver <[email protected]>
  parent: Rich Shepard <[email protected]>
  0 siblings, 1 reply; 16+ messages in thread

From: Adrian Klaver @ 2024-09-10 15:45 UTC (permalink / raw)
  To: Rich Shepard <[email protected]>; pgsql-general

On 9/10/24 08:29, Rich Shepard wrote:
> On Tue, 10 Sep 2024, Adrian Klaver wrote:
> 
>> Is there a Primary Key or Unique index on this table?
> 
> Adrian,
> 
> No. It didn't occur to me to make the project number a PK as this table is
> not related to others in the database.
> 
> But, yesterday it occurred to me to make the proj_nbr a PK to eliminate
> future issues.
> 

You might want to do something like:

select proj_nbr, count(*) as ct from projects group by proj_nbr;

to see how big a problem it is. If it is only a few projects it could 
just a matter of manually deleting the extras.

Whatever you do:

1) Make sure you have a backup of at least that table.

2) Do the data changes as BEGIN; <changes> COMMIT; or ROLLBACK;

> Thanks,
> 
> Rich
> 
> 

-- 
Adrian Klaver
[email protected]







^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: Removing duplicate rows in table
@ 2024-09-10 16:03  Francisco Olarte <[email protected]>
  parent: Rich Shepard <[email protected]>
  0 siblings, 0 replies; 16+ messages in thread

From: Francisco Olarte @ 2024-09-10 16:03 UTC (permalink / raw)
  To: Rich Shepard <[email protected]>; +Cc: pgsql-general

Rich:

On Tue, 10 Sept 2024 at 17:32, Rich Shepard <[email protected]> wrote:
> Only a duplicate problem because when I created this table I didn't make the
> proj_nbr column a PK.

Always report this is if you have future problems, so people know if
it is a case of pilot error or corruption, solutions differ and the
good for ones may harm others.

> > Also, if you do not have any uniqueness criteria consider adding an
> > "id identity" column, it is useful when shit hits the fan.
> Yep. that's what I will do.

Bear in mind the relational model on which relational databases are
modeled needs unique rows ( i.e., no two full rows should be equal on
all fields ). It is not enforced in SQL, but now having it normally
leads to problems. When it cannot be done in any other way, normally
adding an identity column is a cheap way to make them unique. I had
that problem with CDR tables ( it means call detail record, and until
I managed to add circuit identification I had the problem that you can
have two calls from and two the same two numbers with equal start and
end times ) and solved it using an identity column ( added just for
this purpose, after doing it a couple times using ctid in testing, it
is slightly more expensive, but a lot more civilized ).

Francisco Olarte.

^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: Removing duplicate rows in table
@ 2024-09-10 16:38  Rich Shepard <[email protected]>
  parent: Adrian Klaver <[email protected]>
  0 siblings, 2 replies; 16+ messages in thread

From: Rich Shepard @ 2024-09-10 16:38 UTC (permalink / raw)
  To: pgsql-general

On Tue, 10 Sep 2024, Adrian Klaver wrote:

> You might want to do something like:
>
> select proj_nbr, count(*) as ct from projects group by proj_nbr;
>
> to see how big a problem it is. If it is only a few projects it could just a 
> matter of manually deleting the extras.

Adrian,

It's a small table, not updated in a while. Looking at the example I sent
how do I delete the extras while keeping one when each row has the same
content? Not knowing how to do that is why I wrote.

> Whatever you do:
> 1) Make sure you have a backup of at least that table.
> 2) Do the data changes as BEGIN; <changes> COMMIT; or ROLLBACK;

Yep. Learned that lesson.

Thanks,

Rich

P.S. Please reply to the mail list so I receive only one copy of your
message, not two.

^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: Removing duplicate rows in table
@ 2024-09-10 16:46  Rich Shepard <[email protected]>
  parent: Rich Shepard <[email protected]>
  1 sibling, 0 replies; 16+ messages in thread

From: Rich Shepard @ 2024-09-10 16:46 UTC (permalink / raw)
  To: pgsql-general

On Tue, 10 Sep 2024, Rich Shepard wrote:

>> to see how big a problem it is. If it is only a few projects it could just 
>> a matter of manually deleting the extras.

> Not knowing how to do that is why I wrote.

A web search (which I should have done before posting this thread) shows me
how to do this:
<https://www.postgresqltutorial.com/postgresql-tutorial/how-to-delete-duplicate-rows-in-postgresql/;

Rich






^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: Removing duplicate rows in table
@ 2024-09-10 23:34  Erik Wienhold <[email protected]>
  parent: Rich Shepard <[email protected]>
  1 sibling, 0 replies; 16+ messages in thread

From: Erik Wienhold @ 2024-09-10 23:34 UTC (permalink / raw)
  To: Rich Shepard <[email protected]>; +Cc: pgsql-general

On 2024-09-10 18:38 +0200, Rich Shepard wrote:
> P.S. Please reply to the mail list so I receive only one copy of your
> message, not two.

You can configure your list subscription to not receive an extra copy.
That setting is under "Global configuration" on
https://lists.postgresql.org/manage/.

-- 
Erik






^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: Removing duplicate rows in table
@ 2024-09-12 04:06  Muhammad Usman Khan <[email protected]>
  parent: Rich Shepard <[email protected]>
  6 siblings, 1 reply; 16+ messages in thread

From: Muhammad Usman Khan @ 2024-09-12 04:06 UTC (permalink / raw)
  To: Rich Shepard <[email protected]>; +Cc: pgsql-general

Hi,
You can try the following CTE which removes all the identical rows and only
leave single row

WITH CTE AS (
  SELECT ctid, ROW_NUMBER() OVER (PARTITION BY proj_nbr, proj_name,
start_date, end_date, description, notes ORDER BY proj_nbr) AS rn
  FROM projects
  WHERE proj_nbr = '4242.02'
)
DELETE FROM projects
WHERE ctid IN (
  SELECT ctid FROM CTE WHERE rn > 1
);


On Tue, 10 Sept 2024 at 20:07, Rich Shepard <[email protected]>
wrote:

> I've no idea how I entered multiple, identical rows in a table but I want
> to
> delete all but one of these rows.
>
> Here's an example:
>
> bustrac=# select * from projects where proj_nbr = '4242.01';
>   proj_nbr |   proj_name    | start_date |  end_date  |  description  |
> notes
>
> ----------+----------------+------------+------------+---------------+-------
>   4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
>   4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
>   4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
>   4242.01  | Expert witness | 2008-10-15 | 2008-10-28 | Consol Energy |
> (4 rows)
>
> How do I clean this up so there's only a single row for this project
> number?
>
> TIA,
>
> Rich
>
>
>


^ permalink  raw  reply  [nested|flat] 16+ messages in thread

* Re: Removing duplicate rows in table
@ 2024-09-12 12:41  Rich Shepard <[email protected]>
  parent: Muhammad Usman Khan <[email protected]>
  0 siblings, 0 replies; 16+ messages in thread

From: Rich Shepard @ 2024-09-12 12:41 UTC (permalink / raw)
  To: pgsql-general

On Thu, 12 Sep 2024, Muhammad Usman Khan wrote:

> You can try the following CTE which removes all the identical rows and only
> leave single row

Thank you, Muhammed.

Rich






^ permalink  raw  reply  [nested|flat] 16+ messages in thread

end of thread, other threads:[~2024-09-12 12:41 UTC | newest]

Thread overview: 16+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-09-10 15:07 Removing duplicate rows in table Rich Shepard <[email protected]>
2024-09-10 15:10 ` Christophe Pettus <[email protected]>
2024-09-10 15:33   ` Rich Shepard <[email protected]>
2024-09-10 15:16 ` Ron Johnson <[email protected]>
2024-09-10 15:22 ` Adrian Klaver <[email protected]>
2024-09-10 15:29   ` Rich Shepard <[email protected]>
2024-09-10 15:45     ` Adrian Klaver <[email protected]>
2024-09-10 16:38       ` Rich Shepard <[email protected]>
2024-09-10 16:46         ` Rich Shepard <[email protected]>
2024-09-10 23:34         ` Erik Wienhold <[email protected]>
2024-09-10 15:28 ` Andreas Kretschmer <[email protected]>
2024-09-10 15:32 ` Rich Shepard <[email protected]>
2024-09-10 16:03   ` Francisco Olarte <[email protected]>
2024-09-10 15:35 ` Rich Shepard <[email protected]>
2024-09-12 04:06 ` Muhammad Usman Khan <[email protected]>
2024-09-12 12:41   ` Rich Shepard <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox