Re: Moving delta data faster

public inbox for [email protected]  
help / color / mirror / Atom feed

Re: Moving delta data faster
5+ messages / 3 participants
[nested] [flat]

* Re: Moving delta data faster
@ 2024-04-04 04:46 Adrian Klaver <[email protected]>
  2024-04-04 05:24 ` Re: Moving delta data faster yudhi s <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: Adrian Klaver @ 2024-04-04 04:46 UTC (permalink / raw)
  To: yudhi s <[email protected]>; pgsql-general <[email protected]>

On 4/3/24 20:54, yudhi s wrote:
> On Thu, Apr 4, 2024 at 2:41 AM Adrian Klaver <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>     On 4/3/24 13:38, yudhi s wrote:
>      >   Hi All,
>      >   It's postgresql database version 15.4. We have a requirement in
>     which
>      > we will be initially moving full table data for 3-4 tables, from
>     source
>      > database to target(i.e. postgres) . Maximum number of rows will be
>      > ~10million rows in those tables. Then subsequently these rows
>     will be
>      > inserted/updated based on the delta number of rows that got
>      > inserted/updated in the source database. In some cases these changed
>      > data can flow multiple times per day to the downstream i.e. postgres
>      > database and in other cases once daily.
> 
>     What is the source database?
> 
>     Can it be reached with a FDW?:
> 
>     https://wiki.postgresql.org/wiki/Foreign_data_wrappers
>     <https://wiki.postgresql.org/wiki/Foreign_data_wrappers;
> 
>     Can the delta on the source be output as CSV?
> 
> 
> 
> Thank you Adrian.
> 
> And one thing i forgot to mention this target postgresql database would 
> be on AWS RDS whereas the source Oracle databases is on premise. I think 
> we don't have the FDW extension currently in place but we can get that. 
> I am just not able to understand clearly  though, but do you mean export 
> the data from source using CSV and do truncate and import on target. And 
> as these data will be moved through the network won't that cause slowness?
> 
> The source database here is Oracle database. Correct me if wrong, it 
> looks like foreign data wrapper is like a DB link. Or do you mean 
> writing a query on the target database (which can be UPSERT or MERGE) 
> but will be joining the table from the source database through the 
> DBlink/DDW? But my question was whether we should use UPSERT or MERGE 
> for comparing and loading the delta records to the target postgresql 
> database. Want to understand which is more performant , as I see in the 
> past Merge having performance issues in the past, but not very sure 
> about that.

My motivation was to get some basic information about your setup and 
what you are trying to achieve.

If I understand correctly you have:

1) An Oracle database with tables that you want to copy the complete 
data from to a Postgres database. For this sort of thing 
COPY(https://www.postgresql.org/docs/current/sql-copy.html) on the 
Postgres end using CSV data generated from the source is probably the 
quickest bulk load method.

2) After the initial load you want to do follow up INSERT/UPDATEs based 
on a delta of the source tables relative to the initial load. This is 
still a bit of mystery to me. How are determining the delta: a) On the 
source end entirely or b) Target relative to source? Also what is the 
anticipated size of the delta per transfer?

Additional information needed:

1) Network distance between source and target?

2) Network capacity?

3) Expected load on both source and target servers from other operations?

-- 
Adrian Klaver
[email protected]







^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Moving delta data faster
  2024-04-04 04:46 Re: Moving delta data faster Adrian Klaver <[email protected]>
@ 2024-04-04 05:24 ` yudhi s <[email protected]>
  2024-04-04 15:34   ` Re: Moving delta data faster Adrian Klaver <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: yudhi s @ 2024-04-04 05:24 UTC (permalink / raw)
  To: Adrian Klaver <[email protected]>; +Cc: pgsql-general <[email protected]>

On Thu, Apr 4, 2024 at 10:16 AM Adrian Klaver <[email protected]>
wrote:

> On 4/3/24 20:54, yudhi s wrote:
> > On Thu, Apr 4, 2024 at 2:41 AM Adrian Klaver <[email protected]
> > <mailto:[email protected]>> wrote:
> >
> >    > Thank you Adrian.
> >
> > And one thing i forgot to mention this target postgresql database would
> > be on AWS RDS whereas the source Oracle databases is on premise. I think
> > we don't have the FDW extension currently in place but we can get that.
> > I am just not able to understand clearly  though, but do you mean export
> > the data from source using CSV and do truncate and import on target. And
> > as these data will be moved through the network won't that cause
> slowness?
> >
> > The source database here is Oracle database. Correct me if wrong, it
> > looks like foreign data wrapper is like a DB link. Or do you mean
> > writing a query on the target database (which can be UPSERT or MERGE)
> > but will be joining the table from the source database through the
> > DBlink/DDW? But my question was whether we should use UPSERT or MERGE
> > for comparing and loading the delta records to the target postgresql
> > database. Want to understand which is more performant , as I see in the
> > past Merge having performance issues in the past, but not very sure
> > about that.
>
> My motivation was to get some basic information about your setup and
> what you are trying to achieve.
>
> If I understand correctly you have:
>
> 1) An Oracle database with tables that you want to copy the complete
> data from to a Postgres database. For this sort of thing
> COPY(https://www.postgresql.org/docs/current/sql-copy.html) on the
> Postgres end using CSV data generated from the source is probably the
> quickest bulk load method.
>
> 2) After the initial load you want to do follow up INSERT/UPDATEs based
> on a delta of the source tables relative to the initial load. This is
> still a bit of mystery to me. How are determining the delta: a) On the
> source end entirely or b) Target relative to source? Also what is the
> anticipated size of the delta per transfer?
>
> Additional information needed:
>
> 1) Network distance between source and target?
>
> 2) Network capacity?
>
> 3) Expected load on both source and target servers from other operations?
>
>
Thank you. Actually I was trying to understand how to cater the delta load
after the one time load is done . The delta change in records is planned to
be found based on the primary keys on the tables. If it found the key it
will update the records if it does not find the keys it will insert the
rows.

Basically the select query from the source database will fetch the data
with a certain time interval(based on the latest update timestamp or create
timestamp if they are available or else full dump) and put it on S3 and
then from the S3 it will be picked and gets merged to the target postgres
database. As upsert and merge both were looking similar , so was wondering
what we should use here for loading the delta records?


^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Moving delta data faster
  2024-04-04 04:46 Re: Moving delta data faster Adrian Klaver <[email protected]>
  2024-04-04 05:24 ` Re: Moving delta data faster yudhi s <[email protected]>
@ 2024-04-04 15:34   ` Adrian Klaver <[email protected]>
  2024-04-04 20:42     ` Re: Moving delta data faster yudhi s <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: Adrian Klaver @ 2024-04-04 15:34 UTC (permalink / raw)
  To: yudhi s <[email protected]>; +Cc: pgsql-general <[email protected]>

On 4/3/24 22:24, yudhi s wrote:
> 
> On Thu, Apr 4, 2024 at 10:16 AM Adrian Klaver <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>     On 4/3/24 20:54, yudhi s wrote:
>      > On Thu, Apr 4, 2024 at 2:41 AM Adrian Klaver
>     <[email protected] <mailto:[email protected]>
>      > <mailto:[email protected]
>     <mailto:[email protected]>>> wrote:
>      >
>      >    > Thank you Adrian.
>      >
>      > And one thing i forgot to mention this target postgresql database
>     would
>      > be on AWS RDS whereas the source Oracle databases is on premise.
>     I think
>      > we don't have the FDW extension currently in place but we can get
>     that.
>      > I am just not able to understand clearly  though, but do you mean
>     export
>      > the data from source using CSV and do truncate and import on
>     target. And
>      > as these data will be moved through the network won't that cause
>     slowness?
>      >
>      > The source database here is Oracle database. Correct me if wrong, it
>      > looks like foreign data wrapper is like a DB link. Or do you mean
>      > writing a query on the target database (which can be UPSERT or
>     MERGE)
>      > but will be joining the table from the source database through the
>      > DBlink/DDW? But my question was whether we should use UPSERT or
>     MERGE
>      > for comparing and loading the delta records to the target postgresql
>      > database. Want to understand which is more performant , as I see
>     in the
>      > past Merge having performance issues in the past, but not very sure
>      > about that.
> 
>     My motivation was to get some basic information about your setup and
>     what you are trying to achieve.
> 
>     If I understand correctly you have:
> 
>     1) An Oracle database with tables that you want to copy the complete
>     data from to a Postgres database. For this sort of thing
>     COPY(https://www.postgresql.org/docs/current/sql-copy.html
>     <https://www.postgresql.org/docs/current/sql-copy.html;) on the
>     Postgres end using CSV data generated from the source is probably the
>     quickest bulk load method.
> 
>     2) After the initial load you want to do follow up INSERT/UPDATEs based
>     on a delta of the source tables relative to the initial load. This is
>     still a bit of mystery to me. How are determining the delta: a) On the
>     source end entirely or b) Target relative to source? Also what is the
>     anticipated size of the delta per transfer?
> 
>     Additional information needed:
> 
>     1) Network distance between source and target?
> 
>     2) Network capacity?
> 
>     3) Expected load on both source and target servers from other
>     operations?
> 
> 
> Thank you. Actually I was trying to understand how to cater the delta 
> load after the one time load is done . The delta change in records is 
> planned to be found based on the primary keys on the tables. If it found 
> the key it will update the records if it does not find the keys it will 
> insert the rows.
> 
> Basically the select query from the source database will fetch the data 
> with a certain time interval(based on the latest update timestamp or 
> create timestamp if they are available or else full dump) and put it on 
> S3 and then from the S3 it will be picked and gets merged to the target 
> postgres database. As upsert and merge both were looking similar , so 
> was wondering what we should use here for loading the delta records?

S3 is not a database. You will need to be more specific about '... then 
from the S3 it will be picked and gets merged to the target postgres 
database.'

-- 
Adrian Klaver
[email protected]







^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Moving delta data faster
  2024-04-04 04:46 Re: Moving delta data faster Adrian Klaver <[email protected]>
  2024-04-04 05:24 ` Re: Moving delta data faster yudhi s <[email protected]>
  2024-04-04 15:34   ` Re: Moving delta data faster Adrian Klaver <[email protected]>
@ 2024-04-04 20:42     ` yudhi s <[email protected]>
  2024-04-04 21:08       ` Re: Moving delta data faster Greg Sabino Mullane <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: yudhi s @ 2024-04-04 20:42 UTC (permalink / raw)
  To: Adrian Klaver <[email protected]>; +Cc: pgsql-general <[email protected]>

On Thu, Apr 4, 2024 at 9:04 PM Adrian Klaver <[email protected]>
wrote:

> On 4/3/24 22:24, yudhi s wrote:
> >
> > On Thu, Apr 4, 2024 at 10:16 AM Adrian Klaver <[email protected]
> > <mailto:[email protected]>> wrote:
>
> S3 is not a database. You will need to be more specific about '... then
> from the S3 it will be picked and gets merged to the target postgres
> database.'
>
>
The data from S3 will be dumped into the stage table and then the
upsert/merge from that table to the actual table.


^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Moving delta data faster
  2024-04-04 04:46 Re: Moving delta data faster Adrian Klaver <[email protected]>
  2024-04-04 05:24 ` Re: Moving delta data faster yudhi s <[email protected]>
  2024-04-04 15:34   ` Re: Moving delta data faster Adrian Klaver <[email protected]>
  2024-04-04 20:42     ` Re: Moving delta data faster yudhi s <[email protected]>
@ 2024-04-04 21:08       ` Greg Sabino Mullane <[email protected]>
  0 siblings, 0 replies; 5+ messages in thread

From: Greg Sabino Mullane @ 2024-04-04 21:08 UTC (permalink / raw)
  To: yudhi s <[email protected]>; +Cc: Adrian Klaver <[email protected]>; pgsql-general <[email protected]>

So you want advice from this open source community about moving things from
a proprietary database to another proprietary database, going through a
proprietary storage system along the way? :)

To answer the original question, use MERGE. That's it's job, and your table
size is very tiny, so I doubt the performance impact will be worth worrying
about.

Cheers,
Greg

^ permalink  raw  reply  [nested|flat] 5+ messages in thread

end of thread, other threads:[~2024-04-04 21:08 UTC | newest]

Thread overview: 5+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-04-04 04:46 Re: Moving delta data faster Adrian Klaver <[email protected]>
2024-04-04 05:24 ` yudhi s <[email protected]>
2024-04-04 15:34   ` Adrian Klaver <[email protected]>
2024-04-04 20:42     ` yudhi s <[email protected]>
2024-04-04 21:08       ` Greg Sabino Mullane <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox