MIME-Version: 1.0
References: <CAEzWdqcYGi0U5_cK1FVykx2-OZHmEUD8EZ_VE=kpoVaZKYWJeg@mail.gmail.com>
 <a9dd7961-dd3e-406e-a1d8-0feb233fa5e9@aklaver.com> <CAEzWdqfGN5cHN4cwSJm-rruab4E0y_9tqzihR2jQGpMXHR7cqw@mail.gmail.com>
 <b43139d3-8f8e-4635-a7cb-cab90e5205eb@aklaver.com> <CAEzWdqfts=HUoh7at7yD0M7DW97BqSe1o+4xqtXOyM_+ZX_XMA@mail.gmail.com>
 <a1abe41c-94d8-4292-899f-ea2f256d76ae@aklaver.com> <CAEzWdqdXvqYJJ0Pbb+uLKHMAEbvLb86kKW_GJ9DDrh=5MU+_GA@mail.gmail.com>
 <f4853f55-5b16-4e9c-bb59-682f2b0bdefc@aklaver.com> <CAB+=1TW0weW5XPkSdSjeY3nvmta-fxVEdwcMD1ySEhYz_fKs9Q@mail.gmail.com>
 <CAEzWdqfww7aUkE+xpXXBM9eTkif1NxE_nGxeHsYPv+8-FY4pmQ@mail.gmail.com> <c92d3f9b-0468-4a87-b974-24cc232036c2@aklaver.com>
In-Reply-To: <c92d3f9b-0468-4a87-b974-24cc232036c2@aklaver.com>
From: yudhi s <learnerdatabase99@gmail.com>
Date: Sun, 7 Apr 2024 01:34:49 +0530
Message-ID: <CAEzWdqd2uX5qoQ7V589uipWYcJhXS2MhvycJE4w4tGV3vvZ=rQ@mail.gmail.com>
Subject: Re: Moving delta data faster
To: Adrian Klaver <adrian.klaver@aklaver.com>
Cc: veem v <veema0000@gmail.com>, Greg Sabino Mullane <htamfids@gmail.com>, 
	pgsql-general <pgsql-general@lists.postgresql.org>
Content-Type: multipart/alternative; boundary="0000000000005fc43f06157316f1"
Archived-At: <https://www.postgresql.org/message-id/CAEzWdqd2uX5qoQ7V589uipWYcJhXS2MhvycJE4w4tGV3vvZ%3DrQ%40mail.gmail.com>
Precedence: bulk

--0000000000005fc43f06157316f1
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

On Sat, Apr 6, 2024 at 10:25 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

>
> Your original problem description was:
>
> "Then subsequently these rows will be inserted/updated based on the
> delta number of rows that got inserted/updated in the source database.
> In some cases these changed data can flow multiple times per day to the
> downstream i.e. postgres database and in other cases once daily."
>
> If the above is not a hard rule, then yes up to some point just
> replacing the data in mass would be the simplest/fastest method. You
> could cut a step out by doing something like TRUNCATE target_tab and
> then COPY target_tab FROM 'source.csv' bypassing the INSERT INTO
> source_tab.
>

Yes, I actually didn't realize that TRUNCATE TABLE is transactional/online
here in Postgres. In other databases like Oracle, it means downtime for read
queries on the target table. The data vanishes from the target table after
the truncate (until the data load happens), and TRUNCATE auto-commits there.
Thanks Veem for sharing that option.
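To make the transactional point concrete, here is a minimal sketch of the TRUNCATE-then-COPY reload as a psql script. The two-column target_tab(id, val) and the inline CSV rows standing in for the exported source.csv are hypothetical. Both statements run in one transaction, so a failed load rolls back and the old rows reappear. One caveat: TRUNCATE takes an ACCESS EXCLUSIVE lock. Concurrent SELECTs on target_tab therefore wait until COMMIT instead of continuing to read the old rows. The reload is transactional, but it still blocks readers while it runs.

```sql
-- Hypothetical table for illustration; the real column list will differ.
CREATE TABLE IF NOT EXISTS target_tab (
    id  integer PRIMARY KEY,
    val text
);

BEGIN;
-- ACCESS EXCLUSIVE lock: other sessions' SELECTs on target_tab wait until COMMIT.
TRUNCATE target_tab;
-- Full reload. Inline rows stand in for source.csv (psql script, COPY ... FROM STDIN).
COPY target_tab (id, val) FROM STDIN WITH (FORMAT csv);
1,alpha
2,beta
3,gamma
\.
-- Any error before this point aborts the transaction, and the old rows come back.
COMMIT;
```

If the file lives on the client rather than on the database server, psql's \copy is the client-side counterpart of COPY.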

I also think that TRUNCATE will be faster if the change set (delta) is
large. If it's only a handful of rows, say <5% of the rows in the table,
then upsert/MERGE will perform better. The TRUNCATE option also has a
downside: it requires exporting all the data from the source to the S3
file, which may take longer than bringing over just the delta records.
Correct me if I'm wrong.
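For the small-delta case, here is a minimal sketch of the upsert path: stage only the changed rows, then apply them with INSERT ... ON CONFLICT. The staging table delta_stage, the columns, and the sample rows are all hypothetical. ON CONFLICT (id) assumes a unique index or primary key on id.

```sql
-- Demo setup (hypothetical schema): make sure some existing rows are present.
CREATE TABLE IF NOT EXISTS target_tab (
    id  integer PRIMARY KEY,
    val text
);
INSERT INTO target_tab (id, val)
VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma')
ON CONFLICT (id) DO NOTHING;

-- Hypothetical staging table for the delta; LIKE copies the column definitions.
CREATE TEMP TABLE delta_stage (LIKE target_tab);
-- In practice this would be a COPY of the delta file; inline rows here.
INSERT INTO delta_stage (id, val)
VALUES (2, 'beta-v2'),  -- changed row
       (4, 'delta');    -- new row

-- Upsert: insert new keys, update existing ones.
INSERT INTO target_tab AS t (id, val)
SELECT id, val FROM delta_stage
ON CONFLICT (id) DO UPDATE
    SET val = EXCLUDED.val
    WHERE t.val IS DISTINCT FROM EXCLUDED.val;  -- skip rows that did not change
```

Only the staged rows are read and written here, whereas the TRUNCATE route rewrites the whole table. That is the intuition behind preferring this path for a small delta.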

However, I still don't understand why upsert is less performant than MERGE.
Could you shed some light on this, please?
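To frame the upsert-vs-MERGE question, here is a sketch of applying a staged delta with MERGE, which is available from PostgreSQL 15. It uses a hypothetical target_tab(id, val), a hypothetical staging table delta_stage, and made-up sample rows. It does not explain any performance difference. One documented behavioral difference is worth knowing: if a concurrent session inserts the same key, MERGE raises a unique-violation error, whereas INSERT ... ON CONFLICT resolves that case.

```sql
-- Requires PostgreSQL 15 or later.
-- Demo setup (hypothetical schema): make sure some existing rows are present.
CREATE TABLE IF NOT EXISTS target_tab (
    id  integer PRIMARY KEY,
    val text
);
INSERT INTO target_tab (id, val)
VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma')
ON CONFLICT (id) DO NOTHING;

-- Hypothetical staging table holding only the delta rows.
CREATE TEMP TABLE delta_stage (LIKE target_tab);
INSERT INTO delta_stage (id, val)
VALUES (2, 'beta-v2'), (4, 'delta');

-- MERGE: match staged rows to target rows on the key, then branch.
MERGE INTO target_tab AS t
USING delta_stage AS s
    ON t.id = s.id
WHEN MATCHED AND t.val IS DISTINCT FROM s.val THEN
    UPDATE SET val = s.val
WHEN NOT MATCHED THEN
    INSERT (id, val) VALUES (s.id, s.val);
```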

--0000000000005fc43f06157316f1--