public inbox for [email protected]  
help / color / mirror / Atom feed
Need an idea to operate massive delete operation on big size table.
11+ messages / 7 participants
[nested] [flat]

* Need an idea to operate massive delete operation on big size table.
@ 2025-01-15 14:53 Gambhir Singh <[email protected]>
  2025-01-15 15:08 ` Re: Need an idea to operate massive delete operation on big size table. Ron Johnson <[email protected]>
  2025-01-15 21:02 ` Re: Need an idea to operate massive delete operation on big size table. Laurenz Albe <[email protected]>
  0 siblings, 2 replies; 11+ messages in thread

From: Gambhir Singh @ 2025-01-15 14:53 UTC (permalink / raw)
  To: [email protected]

Hi,

I received a request from a client to delete duplicate records from a table
which is very large in size.

Delete queries (~2 Billion) are provided via file, and we have to execute
that file in DB. Last time it lasted for two days. I feel there must be
another way to delete records in an efficient manner

This kind of activity they do every month.

-- 
Thanks & Regards
Gambhir Singh


^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Need an idea to operate massive delete operation on big size table.
  2025-01-15 14:53 Need an idea to operate massive delete operation on big size table. Gambhir Singh <[email protected]>
@ 2025-01-15 15:08 ` Ron Johnson <[email protected]>
  2025-01-15 15:12   ` Re: Need an idea to operate massive delete operation on big size table. youness bellasri <[email protected]>
  1 sibling, 1 reply; 11+ messages in thread

From: Ron Johnson @ 2025-01-15 15:08 UTC (permalink / raw)
  To: Pgsql-admin <[email protected]>

On Wed, Jan 15, 2025 at 9:54 AM Gambhir Singh <[email protected]>
wrote:

> Hi,
>
> I received a request from a client to delete duplicate records from a
> table which is very large in size.
>
> Delete queries (~2 Billion) are provided via file, and we have to execute
> that file in DB. Last time it lasted for two days. I feel there must be
> another way to delete records in an efficient manner
>

Maybe the delete "queries" are poorly written.  Maybe there's no supporting
index.

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Need an idea to operate massive delete operation on big size table.
  2025-01-15 14:53 Need an idea to operate massive delete operation on big size table. Gambhir Singh <[email protected]>
  2025-01-15 15:08 ` Re: Need an idea to operate massive delete operation on big size table. Ron Johnson <[email protected]>
@ 2025-01-15 15:12   ` youness bellasri <[email protected]>
  2025-01-15 15:22     ` Re: Need an idea to operate massive delete operation on big size table. Ron Johnson <[email protected]>
  0 siblings, 1 reply; 11+ messages in thread

From: youness bellasri @ 2025-01-15 15:12 UTC (permalink / raw)
  To: Ron Johnson <[email protected]>; +Cc: Pgsql-admin <[email protected]>

1. *Batch Deletion*

Instead of deleting all records at once, break the operation into smaller
batches. This reduces locking, transaction log growth, and the risk of
timeouts.
2. *Use Indexes*

Ensure that the columns used in the WHERE clause of the delete queries are
indexed. This speeds up the identification of rows to delete.
3. *Disable Indexes and Constraints Temporarily*

If the table has many indexes or constraints, disabling them during the
delete operation can speed up the process. Re-enable them afterward.

Le mer. 15 janv. 2025 à 16:08, Ron Johnson <[email protected]> a
écrit :

> On Wed, Jan 15, 2025 at 9:54 AM Gambhir Singh <[email protected]>
> wrote:
>
>> Hi,
>>
>> I received a request from a client to delete duplicate records from a
>> table which is very large in size.
>>
>> Delete queries (~2 Billion) are provided via file, and we have to execute
>> that file in DB. Last time it lasted for two days. I feel there must be
>> another way to delete records in an efficient manner
>>
>
> Maybe the delete "queries" are poorly written.  Maybe there's no
> supporting index.
>
> --
> Death to <Redacted>, and butter sauce.
> Don't boil me, I'm still alive.
> <Redacted> lobster!
>


^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Need an idea to operate massive delete operation on big size table.
  2025-01-15 14:53 Need an idea to operate massive delete operation on big size table. Gambhir Singh <[email protected]>
  2025-01-15 15:08 ` Re: Need an idea to operate massive delete operation on big size table. Ron Johnson <[email protected]>
  2025-01-15 15:12   ` Re: Need an idea to operate massive delete operation on big size table. youness bellasri <[email protected]>
@ 2025-01-15 15:22     ` Ron Johnson <[email protected]>
  2025-01-15 15:27       ` Re: Need an idea to operate massive delete operation on big size table. youness bellasri <[email protected]>
  2025-01-15 15:47       ` Re: Need an idea to operate massive delete operation on big size table. Day, Joseph <[email protected]>
  0 siblings, 2 replies; 11+ messages in thread

From: Ron Johnson @ 2025-01-15 15:22 UTC (permalink / raw)
  To: Pgsql-admin <[email protected]>

Sadly, Postgresql does not have (super-handy) "DISABLE" clauses.

On Wed, Jan 15, 2025 at 10:12 AM youness bellasri <[email protected]>
wrote:

> 1. *Batch Deletion*
>
> Instead of deleting all records at once, break the operation into smaller
> batches. This reduces locking, transaction log growth, and the risk of
> timeouts.
> 2. *Use Indexes*
>
> Ensure that the columns used in the WHERE clause of the delete queries
> are indexed. This speeds up the identification of rows to delete.
> 3. *Disable Indexes and Constraints Temporarily*
>
> If the table has many indexes or constraints, disabling them during the
> delete operation can speed up the process. Re-enable them afterward.
>
> Le mer. 15 janv. 2025 à 16:08, Ron Johnson <[email protected]> a
> écrit :
>
>> On Wed, Jan 15, 2025 at 9:54 AM Gambhir Singh <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> I received a request from a client to delete duplicate records from a
>>> table which is very large in size.
>>>
>>> Delete queries (~2 Billion) are provided via file, and we have to
>>> execute that file in DB. Last time it lasted for two days. I feel there
>>> must be another way to delete records in an efficient manner
>>>
>>
>> Maybe the delete "queries" are poorly written.  Maybe there's no
>> supporting index.
>>
>>
-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Need an idea to operate massive delete operation on big size table.
  2025-01-15 14:53 Need an idea to operate massive delete operation on big size table. Gambhir Singh <[email protected]>
  2025-01-15 15:08 ` Re: Need an idea to operate massive delete operation on big size table. Ron Johnson <[email protected]>
  2025-01-15 15:12   ` Re: Need an idea to operate massive delete operation on big size table. youness bellasri <[email protected]>
  2025-01-15 15:22     ` Re: Need an idea to operate massive delete operation on big size table. Ron Johnson <[email protected]>
@ 2025-01-15 15:27       ` youness bellasri <[email protected]>
  2025-01-15 15:29         ` Re: Need an idea to operate massive delete operation on big size table. Alex Balashov <[email protected]>
  1 sibling, 1 reply; 11+ messages in thread

From: youness bellasri @ 2025-01-15 15:27 UTC (permalink / raw)
  To: Ron Johnson <[email protected]>; +Cc: Pgsql-admin <[email protected]>

You're absolutely right—PostgreSQL doesn't have a DISABLE clause for
indexes or constraints like some other databases (e.g., SQL Server).
However, there are still ways to work around this limitation and optimize
large delete operations in PostgreSQL. Here are some PostgreSQL-specific
strategies:

1. **Batch Deletion**: Delete in small chunks using `LIMIT` or `CTE`.
   ```sql
   DELETE FROM your_table WHERE <condition> LIMIT 10000;
   ```

2. **Drop and Recreate Indexes**: Drop indexes before deletion, then
recreate them.
   ```sql
   DROP INDEX idx_name;
   DELETE FROM your_table WHERE <condition>;
   CREATE INDEX idx_name ON your_table(column_name);
   ```

3. **Partitioning**: Use table partitioning to drop entire partitions
instead of deleting rows.
   ```sql
   DROP TABLE your_table_partition_name;
   ```

4. **Archive and Truncate**: Copy rows to keep into a temp table, truncate
the original, then reinsert.
   ```sql
   CREATE TABLE temp_table AS SELECT * FROM your_table WHERE
<condition_to_keep>;
   TRUNCATE TABLE your_table;
   INSERT INTO your_table SELECT * FROM temp_table;
   DROP TABLE temp_table;
   ```

5. **Disable Triggers**: Temporarily disable triggers during deletion.
   ```sql
   ALTER TABLE your_table DISABLE TRIGGER ALL;
   DELETE FROM your_table WHERE <condition>;
   ALTER TABLE your_table ENABLE TRIGGER ALL;
   ```

6. **Optimize Settings**: Increase `work_mem` and `maintenance_work_mem`
for the session.
   ```sql
   SET work_mem = '1GB';
   SET maintenance_work_mem = '2GB';
   ```

7. **Vacuum and Analyze**: Clean up after deletion.
   ```sql
   VACUUM ANALYZE your_table;
   ```

8. **Soft Deletes**: Use an `is_deleted` flag to mark rows for later
cleanup.

By batching, partitioning, and optimizing settings, you can make large
deletes faster and less disruptive.

Le mer. 15 janv. 2025 à 16:22, Ron Johnson <[email protected]> a
écrit :

> Sadly, Postgresql does not have (super-handy) "DISABLE" clauses.
>
> On Wed, Jan 15, 2025 at 10:12 AM youness bellasri <
> [email protected]> wrote:
>
>> 1. *Batch Deletion*
>>
>> Instead of deleting all records at once, break the operation into smaller
>> batches. This reduces locking, transaction log growth, and the risk of
>> timeouts.
>> 2. *Use Indexes*
>>
>> Ensure that the columns used in the WHERE clause of the delete queries
>> are indexed. This speeds up the identification of rows to delete.
>> 3. *Disable Indexes and Constraints Temporarily*
>>
>> If the table has many indexes or constraints, disabling them during the
>> delete operation can speed up the process. Re-enable them afterward.
>>
>> Le mer. 15 janv. 2025 à 16:08, Ron Johnson <[email protected]> a
>> écrit :
>>
>>> On Wed, Jan 15, 2025 at 9:54 AM Gambhir Singh <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I received a request from a client to delete duplicate records from a
>>>> table which is very large in size.
>>>>
>>>> Delete queries (~2 Billion) are provided via file, and we have to
>>>> execute that file in DB. Last time it lasted for two days. I feel there
>>>> must be another way to delete records in an efficient manner
>>>>
>>>
>>> Maybe the delete "queries" are poorly written.  Maybe there's no
>>> supporting index.
>>>
>>>
> --
> Death to <Redacted>, and butter sauce.
> Don't boil me, I'm still alive.
> <Redacted> lobster!
>


^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Need an idea to operate massive delete operation on big size table.
  2025-01-15 14:53 Need an idea to operate massive delete operation on big size table. Gambhir Singh <[email protected]>
  2025-01-15 15:08 ` Re: Need an idea to operate massive delete operation on big size table. Ron Johnson <[email protected]>
  2025-01-15 15:12   ` Re: Need an idea to operate massive delete operation on big size table. youness bellasri <[email protected]>
  2025-01-15 15:22     ` Re: Need an idea to operate massive delete operation on big size table. Ron Johnson <[email protected]>
  2025-01-15 15:27       ` Re: Need an idea to operate massive delete operation on big size table. youness bellasri <[email protected]>
@ 2025-01-15 15:29         ` Alex Balashov <[email protected]>
  0 siblings, 0 replies; 11+ messages in thread

From: Alex Balashov @ 2025-01-15 15:29 UTC (permalink / raw)
  To: youness bellasri <[email protected]>; +Cc: Ron Johnson <[email protected]>; Pgsql-admin <[email protected]>

Youness,

Your cut-and-paste ChatGPT contributions to this list reveal a degree of cretinism.

-- Alex

> On Jan 15, 2025, at 10:27 am, youness bellasri <[email protected]> wrote:
> 
> You're absolutely right—PostgreSQL doesn't have a DISABLE clause for indexes or constraints like some other databases (e.g., SQL Server). However, there are still ways to work around this limitation and optimize large delete operations in PostgreSQL. Here are some PostgreSQL-specific strategies:
> 
> 1. **Batch Deletion**: Delete in small chunks using `LIMIT` or `CTE`.
>    ```sql
>    DELETE FROM your_table WHERE <condition> LIMIT 10000;
>    ```
> 
> 2. **Drop and Recreate Indexes**: Drop indexes before deletion, then recreate them.
>    ```sql
>    DROP INDEX idx_name;
>    DELETE FROM your_table WHERE <condition>;
>    CREATE INDEX idx_name ON your_table(column_name);
>    ```
> 
> 3. **Partitioning**: Use table partitioning to drop entire partitions instead of deleting rows.
>    ```sql
>    DROP TABLE your_table_partition_name;
>    ```
> 
> 4. **Archive and Truncate**: Copy rows to keep into a temp table, truncate the original, then reinsert.
>    ```sql
>    CREATE TABLE temp_table AS SELECT * FROM your_table WHERE <condition_to_keep>;
>    TRUNCATE TABLE your_table;
>    INSERT INTO your_table SELECT * FROM temp_table;
>    DROP TABLE temp_table;
>    ```
> 
> 5. **Disable Triggers**: Temporarily disable triggers during deletion.
>    ```sql
>    ALTER TABLE your_table DISABLE TRIGGER ALL;
>    DELETE FROM your_table WHERE <condition>;
>    ALTER TABLE your_table ENABLE TRIGGER ALL;
>    ```
> 
> 6. **Optimize Settings**: Increase `work_mem` and `maintenance_work_mem` for the session.
>    ```sql
>    SET work_mem = '1GB';
>    SET maintenance_work_mem = '2GB';
>    ```
> 
> 7. **Vacuum and Analyze**: Clean up after deletion.
>    ```sql
>    VACUUM ANALYZE your_table;
>    ```
> 
> 8. **Soft Deletes**: Use an `is_deleted` flag to mark rows for later cleanup.
> 
> By batching, partitioning, and optimizing settings, you can make large deletes faster and less disruptive.
> 
> Le mer. 15 janv. 2025 à 16:22, Ron Johnson <[email protected]> a écrit :
> Sadly, Postgresql does not have (super-handy) "DISABLE" clauses.
> 
> On Wed, Jan 15, 2025 at 10:12 AM youness bellasri <[email protected]> wrote:
> 1. Batch Deletion
> Instead of deleting all records at once, break the operation into smaller batches. This reduces locking, transaction log growth, and the risk of timeouts.
> 2. Use Indexes
> Ensure that the columns used in the WHERE clause of the delete queries are indexed. This speeds up the identification of rows to delete.
> 3. Disable Indexes and Constraints Temporarily
> If the table has many indexes or constraints, disabling them during the delete operation can speed up the process. Re-enable them afterward.
> 
> Le mer. 15 janv. 2025 à 16:08, Ron Johnson <[email protected]> a écrit :
> On Wed, Jan 15, 2025 at 9:54 AM Gambhir Singh <[email protected]> wrote:
> Hi,
> 
> I received a request from a client to delete duplicate records from a table which is very large in size. 
> 
> Delete queries (~2 Billion) are provided via file, and we have to execute that file in DB. Last time it lasted for two days. I feel there must be another way to delete records in an efficient manner
> 
> Maybe the delete "queries" are poorly written.  Maybe there's no supporting index.
> 
> 
> -- 
> Death to <Redacted>, and butter sauce.
> Don't boil me, I'm still alive.
> <Redacted> lobster!

-- 
Alex Balashov
Principal Consultant
Evariste Systems LLC
Web: https://evaristesys.com
Tel: +1-706-510-6800






^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Need an idea to operate massive delete operation on big size table.
  2025-01-15 14:53 Need an idea to operate massive delete operation on big size table. Gambhir Singh <[email protected]>
  2025-01-15 15:08 ` Re: Need an idea to operate massive delete operation on big size table. Ron Johnson <[email protected]>
  2025-01-15 15:12   ` Re: Need an idea to operate massive delete operation on big size table. youness bellasri <[email protected]>
  2025-01-15 15:22     ` Re: Need an idea to operate massive delete operation on big size table. Ron Johnson <[email protected]>
@ 2025-01-15 15:47       ` Day, Joseph <[email protected]>
  1 sibling, 0 replies; 11+ messages in thread

From: Day, Joseph @ 2025-01-15 15:47 UTC (permalink / raw)
  To: Ron Johnson <[email protected]>; +Cc: Pgsql-admin <[email protected]>

Unless this is not an option, it would be very quick to do the following:


   - Create a new table without indexes, insert deduplicated rows into the
   new table.
   - Index the new table
   - Rename the old table
   - Rename the new table to the old table name

The works really fast but not an option if you have foreign keys on
the table.



On Wed, Jan 15, 2025 at 9:22 AM Ron Johnson <[email protected]> wrote:

> Sadly, Postgresql does not have (super-handy) "DISABLE" clauses.
>
> On Wed, Jan 15, 2025 at 10:12 AM youness bellasri <
> [email protected]> wrote:
>
>> 1. *Batch Deletion*
>>
>> Instead of deleting all records at once, break the operation into smaller
>> batches. This reduces locking, transaction log growth, and the risk of
>> timeouts.
>> 2. *Use Indexes*
>>
>> Ensure that the columns used in the WHERE clause of the delete queries
>> are indexed. This speeds up the identification of rows to delete.
>> 3. *Disable Indexes and Constraints Temporarily*
>>
>> If the table has many indexes or constraints, disabling them during the
>> delete operation can speed up the process. Re-enable them afterward.
>>
>> Le mer. 15 janv. 2025 à 16:08, Ron Johnson <[email protected]> a
>> écrit :
>>
>>> On Wed, Jan 15, 2025 at 9:54 AM Gambhir Singh <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I received a request from a client to delete duplicate records from a
>>>> table which is very large in size.
>>>>
>>>> Delete queries (~2 Billion) are provided via file, and we have to
>>>> execute that file in DB. Last time it lasted for two days. I feel there
>>>> must be another way to delete records in an efficient manner
>>>>
>>>
>>> Maybe the delete "queries" are poorly written.  Maybe there's no
>>> supporting index.
>>>
>>>
> --
> Death to <Redacted>, and butter sauce.
> Don't boil me, I'm still alive.
> <Redacted> lobster!
>


-- 





*Joseph M. Day*(866) 404-6119 :P
(866) 397-3931 :F
(312) 371-3054 :C


^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Need an idea to operate massive delete operation on big size table.
  2025-01-15 14:53 Need an idea to operate massive delete operation on big size table. Gambhir Singh <[email protected]>
@ 2025-01-15 21:02 ` Laurenz Albe <[email protected]>
  2025-01-15 21:24   ` Re: Need an idea to operate massive delete operation on big size table. Alex Balashov <[email protected]>
  1 sibling, 1 reply; 11+ messages in thread

From: Laurenz Albe @ 2025-01-15 21:02 UTC (permalink / raw)
  To: Gambhir Singh <[email protected]>; [email protected]

On Wed, 2025-01-15 at 20:23 +0530, Gambhir Singh wrote:
> I received a request from a client to delete duplicate records from a table which is very large in size. 
> 
> Delete queries (~2 Billion) are provided via file, and we have to execute that file in DB.
> Last time it lasted for two days. I feel there must be another way to delete records in an efficient manner
> 
> This kind of activity they do every month.

I don't think there is a better way - except perhaps to create a new copy of
the table and copy the surviving rows to the new table.  Than may win if you
delete a majority of the rows.

For the future, you could consider not adding the duplicate rows rather than
deleting them.  Perhaps a constraint that prevents the duplicates can help.

Yours,
Laurenz Albe





^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Need an idea to operate massive delete operation on big size table.
  2025-01-15 14:53 Need an idea to operate massive delete operation on big size table. Gambhir Singh <[email protected]>
  2025-01-15 21:02 ` Re: Need an idea to operate massive delete operation on big size table. Laurenz Albe <[email protected]>
@ 2025-01-15 21:24   ` Alex Balashov <[email protected]>
  2025-01-16 00:06     ` Re: Need an idea to operate massive delete operation on big size table. Rui DeSousa <[email protected]>
  0 siblings, 1 reply; 11+ messages in thread

From: Alex Balashov @ 2025-01-15 21:24 UTC (permalink / raw)
  To: [email protected]

In my experience, mass deletions are tough. There may be a supporting index to assist the broadest criteria, but the filtered rows that result must still be sequentially scanned for non-indexed sub-criteria[1]. That can still be an awful lot of rows and a huge, time-consuming workload. 

While it won't help with deduplication, partitioning is a very good, if somewhat labour-intensive solution to the problem of aging old data off the back of a rolling archive. Once upon a time, I had an installation with a periodic hygienic `DELETE` once or twice a year, which took many hours to plan and execute, and placed considerable demand on the system. We switched to monthly partitioning and the result was, to some, indistinguishable from magic. 

-- Alex

[1] Which normally doesn't make sense to index, in the overall tradeoff of index size and maintenance overhead vs. performance payoff.

-- 
Alex Balashov
Principal Consultant
Evariste Systems LLC
Web: https://evaristesys.com
Tel: +1-706-510-6800






^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Need an idea to operate massive delete operation on big size table.
  2025-01-15 14:53 Need an idea to operate massive delete operation on big size table. Gambhir Singh <[email protected]>
  2025-01-15 21:02 ` Re: Need an idea to operate massive delete operation on big size table. Laurenz Albe <[email protected]>
  2025-01-15 21:24   ` Re: Need an idea to operate massive delete operation on big size table. Alex Balashov <[email protected]>
@ 2025-01-16 00:06     ` Rui DeSousa <[email protected]>
  2025-01-16 18:09       ` Re: Need an idea to operate massive delete operation on big size table. Gambhir Singh <[email protected]>
  0 siblings, 1 reply; 11+ messages in thread

From: Rui DeSousa @ 2025-01-16 00:06 UTC (permalink / raw)
  To: Alex Balashov <[email protected]>; +Cc: [email protected]



> On Jan 15, 2025, at 4:24 PM, Alex Balashov <[email protected]> wrote:
> 
> n my experience, mass deletions are tough. There may be a supporting index to assist the broadest criteria, but the filtered rows that result must still be sequentially scanned for non-indexed sub-criteria[1]. That can still be an awful lot of rows and a huge, time-consuming workload. 



This is how easliy do it. I’ve used this method many times over.  Create a purge queue table and a function to purge the records in batches.  

1. I normally try to keep the purge transaction to 500ms or less.  i.e. purge 1000 or 5000 records, etc.  if depends on the schema and cascading deletes, etc.
2. The function is setup to run in parallel; so kick off as many purge sessions as your system can handle.

Write a perl or shell script to just keep executing “select purge_date();” repeatedly until zero is returned.  Make sure to only call the function once in a given transaction.

You shouldn’t have to create any additional indexes or disable them while purging.  The application can continue to operate without issues and when the process is complete; vacuum and rebuild indexes on the effected tables.



create table purge_data_queue (
  data_id bigint not null primary key
);

/* Identify all records to be purged */
insert into purge_data_queue (data_id)
select data_id 
from data
where <records to delete>
;

create or replace function purge_data (_limit int default 1000)
  returns int 
as 
$$
  declare 
    _rowcnt int;
  begin
    create temp table if not exists purge_set (
      data_id bigint not null
      , primary key (data_id)
    ) on commit drop
    ;

    /* Identify batch to be purged */
    insert into purge_set (
      data_id
    )
    select data_id
    from purge_data_queue
    limit _limit 
    for update skip locked
    ;

    /* Delete batch from base table */
    delete from data using purge_set
    where data.data_id = purge_set.data_id
    ; 

    get diagnostics _rowcnt = ROW_COUNT;

    /* Delete batch from queue table */
    delete from purge_data_queue using purge_set
    where purge_data_queue.data_id = purge_set.data_id
    ;
    
    return _rowcnt;
  end;
$$ language plpgsql
  set search_path = schema_name
;

^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Need an idea to operate massive delete operation on big size table.
  2025-01-15 14:53 Need an idea to operate massive delete operation on big size table. Gambhir Singh <[email protected]>
  2025-01-15 21:02 ` Re: Need an idea to operate massive delete operation on big size table. Laurenz Albe <[email protected]>
  2025-01-15 21:24   ` Re: Need an idea to operate massive delete operation on big size table. Alex Balashov <[email protected]>
  2025-01-16 00:06     ` Re: Need an idea to operate massive delete operation on big size table. Rui DeSousa <[email protected]>
@ 2025-01-16 18:09       ` Gambhir Singh <[email protected]>
  0 siblings, 0 replies; 11+ messages in thread

From: Gambhir Singh @ 2025-01-16 18:09 UTC (permalink / raw)
  To: Rui DeSousa <[email protected]>; +Cc: Alex Balashov <[email protected]>; [email protected]

Thanks everyone for providing the ideas and special thanks to Rui DeSousa.

On Thu, 16 Jan 2025 at 05:39, Rui DeSousa <[email protected]> wrote:

>
>
> On Jan 15, 2025, at 4:24 PM, Alex Balashov <[email protected]>
> wrote:
>
> n my experience, mass deletions are tough. There may be a supporting index
> to assist the broadest criteria, but the filtered rows that result must
> still be sequentially scanned for non-indexed sub-criteria[1]. That can
> still be an awful lot of rows and a huge, time-consuming workload.
>
>
>
>
> This is how easliy do it. I’ve used this method many times over.  Create a
> purge queue table and a function to purge the records in batches.
>
> 1. I normally try to keep the purge transaction to 500ms or less.  i.e.
> purge 1000 or 5000 records, etc.  if depends on the schema and cascading
> deletes, etc.
> 2. The function is setup to run in parallel; so kick off as many purge
> sessions as your system can handle.
>
> Write a perl or shell script to just keep executing “select purge_date();”
> repeatedly until zero is returned.  Make sure to only call the function
> once in a given transaction.
>
> You shouldn’t have to create any additional indexes or disable them while
> purging.  The application can continue to operate without issues and when
> the process is complete; vacuum and rebuild indexes on the effected tables.
>
>
>
> create table purge_data_queue (
> data_id bigint not null primary key
> );
>
> /* Identify all records to be purged */
> insert into purge_data_queue (data_id)
> select data_id
> from data
> where <records to delete>
> ;
>
> create or replace function purge_data (_limit int default 1000)
> returns int
> as
> $$
> declare
> _rowcnt int;
> begin
> create temp table if not exists purge_set (
> data_id bigint not null
> , primary key (data_id)
> ) on commit drop
> ;
>
> /* Identify batch to be purged */
> insert into purge_set (
> data_id
> )
> select data_id
> from purge_data_queue
> limit _limit
> for update skip locked
> ;
>
> /* Delete batch from base table */
> delete from data using purge_set
> where data.data_id = purge_set.data_id
> ;
>
> get diagnostics _rowcnt = ROW_COUNT;
>
> /* Delete batch from queue table */
> delete from purge_data_queue using purge_set
> where purge_data_queue.data_id = purge_set.data_id
> ;
> return _rowcnt;
> end;
> $$ language plpgsql
> set search_path = schema_name
> ;
>


-- 
Thanks & Regards
Gambhir Singh


^ permalink  raw  reply  [nested|flat] 11+ messages in thread


end of thread, other threads:[~2025-01-16 18:09 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-01-15 14:53 Need an idea to operate massive delete operation on big size table. Gambhir Singh <[email protected]>
2025-01-15 15:08 ` Ron Johnson <[email protected]>
2025-01-15 15:12   ` youness bellasri <[email protected]>
2025-01-15 15:22     ` Ron Johnson <[email protected]>
2025-01-15 15:27       ` youness bellasri <[email protected]>
2025-01-15 15:29         ` Alex Balashov <[email protected]>
2025-01-15 15:47       ` Day, Joseph <[email protected]>
2025-01-15 21:02 ` Laurenz Albe <[email protected]>
2025-01-15 21:24   ` Alex Balashov <[email protected]>
2025-01-16 00:06     ` Rui DeSousa <[email protected]>
2025-01-16 18:09       ` Gambhir Singh <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox