public inbox for [email protected]  
help / color / mirror / Atom feed
Re: Partitioning and unique key
2+ messages / 2 participants
[nested] [flat]

* Re: Partitioning and unique key
@ 2024-08-31 22:28 Adrian Klaver <[email protected]>
  2024-09-01 03:06 ` Re: Partitioning and unique key veem v <[email protected]>
  0 siblings, 1 reply; 2+ messages in thread

From: Adrian Klaver @ 2024-08-31 22:28 UTC (permalink / raw)
  To: veem v <[email protected]>; pgsql-general <[email protected]>

On 8/31/24 13:02, veem v wrote:
> Hello,
> 
> We have our transaction tables daily range partitioned based on 
> transaction_timestamp column which is timestamptz data type and these 
> are having composite primary key on (transaction_id, 
> transaction_timestamp). And we were using an "insert on conflict" for 
> loading data to our system , which means if another record comes to the 
> system with the same transaction_id and transaction_timestamp, it will 
> get updated. This way we already have 60 days worth of data stored in 
> our system with approx. 70 million transactions per day.
> 
> But we just got to know from business thatthe data should be unique by 
> only transaction_id but not transaction_timestamp. Any incoming data 
> with the same transaction_id(even different transaction_timestamp) 
> should get updated but not inserted.
> 
> Also these daily partitions are going to hold 400million rows in future 
> and will be queried on the transaction_timestamp filter so we can't 
> really avoid the partitioning option here considering future growth.
> 
> But due to postgres limitations we are unable to have this unique 
> constraint or primary key only on the transaction_id column, we have to 
> include transaction_timestamp with it as a composite key. So I want to 
> understand from experts if there is any possible way to satisfy both 
> partitioning on transaction_timestamp column and unique key or pk just 
> on _trans_action_id only?

The model is at odds with itself and untenable. If the tables hold 
multiple rows for a given transaction_id  then you cannot have a 
PK/Unique constraint on that column. Seems there is a decided lack of 
any planning. The only way I can see this happening is consolidating all 
the duplicate transaction_id rows into a single row for each 
transaction_id. That then leads to the question of how to do that and 
retain the 'correct' information from the selection of rows for each 
transaction_id.

> 
> Note-its 15.4 postgres database.
> 
> 
> Regards
> 
> Veem
> 

-- 
Adrian Klaver
[email protected]







^ permalink  raw  reply  [nested|flat] 2+ messages in thread

* Re: Partitioning and unique key
  2024-08-31 22:28 Re: Partitioning and unique key Adrian Klaver <[email protected]>
@ 2024-09-01 03:06 ` veem v <[email protected]>
  0 siblings, 0 replies; 2+ messages in thread

From: veem v @ 2024-09-01 03:06 UTC (permalink / raw)
  To: Adrian Klaver <[email protected]>; +Cc: pgsql-general <[email protected]>

On Sun, 1 Sept 2024 at 03:58, Adrian Klaver <[email protected]>
wrote:

>
> The model is at odds with itself and untenable. If the tables hold
> multiple rows for a given transaction_id  then you cannot have a
> PK/Unique constraint on that column. Seems there is a decided lack of
> any planning. The only way I can see this happening is consolidating all
> the duplicate transaction_id rows into a single row for each
> transaction_id. That then leads to the question of how to do that and
> retain the 'correct' information from the selection of rows for each
> transaction_id.
>
>
Yes we had messed up the data for now and have multiple records for each
transaction_id persisted and thus we need to fix the data. But more than
that , as I stated , I wanted to understand first
1) if it's technically possible to have a unique key on only the
transaction_id column having the partition key on the
transaction_timestamp, because the table is going to be queried/purged
based on the transaction_timestamp?

2) Additionally we were thinking if above is technically not possible, then
the maximum granularity which we can have for each transaction_id will be a
day, so the partition key transaction_timestmp can be truncated to have
only date component but no time component. So the primary key will
be (transaction_id, transaction_date). But we also don't want to lose the
time component and persist the existing data of transaction_timestmp (which
will have a time component in it, in a separate column).

And in above case , for fixing the existing data in least disruptive way,
as we have currently duplicate transaction_id inserted into the table
already because of the composite primary key(transaction_id,
transaction_timestmp).Can we simply
 i)rename the existing column transaction_timestmp to transaction_date and
then add new column transaction_timestmp using the values of existing
column partition by partition.
ii)And then delete the duplicate data using query something as below , each
partition by partition.
iii)And then alter the datatype of the partition key transaction_date to
DATE in one shot at the table level(which should be fast as its having more
granularity as compare to existing timestamptype, so should be catalog or
dictionary change only), and that will remain the part of composite PK
(transaction_id,transaction_date).
iv) Repeat this step for all child partition tables and then for the parent
partition tables.

Will this technique be the most efficient way of fixing this mess?

WITH ranked_records AS (
            SELECT column1_id, column2_timestamptz,
                   ROW_NUMBER() OVER (PARTITION BY column1_id,
date_trunc('day', column2_timestamptz)
                                      ORDER BY column2_timestamptz DESC) AS
rn
            FROM partition_name
        )
        DELETE FROM partition_name T1
        WHERE EXISTS (
            SELECT 1
            FROM ranked_records T2
            WHERE T1.column1_id = T2.column1_id
              AND T1.column2_timestamptz = T2.column2_timestamptz
              AND T2.rn > 1
        )


^ permalink  raw  reply  [nested|flat] 2+ messages in thread


end of thread, other threads:[~2024-09-01 03:06 UTC | newest]

Thread overview: 2+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-08-31 22:28 Re: Partitioning and unique key Adrian Klaver <[email protected]>
2024-09-01 03:06 ` veem v <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox