Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rtCHn-00D4ZU-W2 for pgsql-general@arkaria.postgresql.org; Sat, 06 Apr 2024 20:05:12 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1rtCHm-00ForA-8V for pgsql-general@arkaria.postgresql.org; Sat, 06 Apr 2024 20:05:10 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rtCHl-00For1-Tr for pgsql-general@lists.postgresql.org; Sat, 06 Apr 2024 20:05:09 +0000 Received: from mail-qt1-x829.google.com ([2607:f8b0:4864:20::829]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1rtCHe-0017yk-TL for pgsql-general@lists.postgresql.org; Sat, 06 Apr 2024 20:05:09 +0000 Received: by mail-qt1-x829.google.com with SMTP id d75a77b69052e-432c947e92eso39614081cf.0 for ; Sat, 06 Apr 2024 13:05:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1712433901; x=1713038701; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=PQIbB/RWURgZBty8bAgjXp6aTMFU/Tm2XpLMBibqaFk=; b=H33hu9uxoumX3i9sLhKADhWk7mlSTrHW9lkvL10KOux1aikImqvSLEuOlsFF8ojPUG 20QQ4PAU74ONN0npVPbU4Ai5E+TrnHtSmSyUftT1whf/6raC3oWdpD3IfXuA0nuGcJ5v PSlq0TFy+A3hZ92n4wFSrsZc3m0sMLtBohuhKH5bdOdXTiITDBOzLr0fqM4aTzGjWEg4 PxPSFfjUpHsCENRNoNLDD+72btmkpcydOFpq0qseOhp4mKgouq+JJkTnx0MpVC3qDCIu QrL0KTqh+N/WMwZiFcyVyonuimmMgaq0TfmCECnn3sut17RK53fdvh09lMKCo3tatrsW UogA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1712433901; x=1713038701; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=PQIbB/RWURgZBty8bAgjXp6aTMFU/Tm2XpLMBibqaFk=; b=AuYWhpGjTqi0eg2I6cLlP0fyIhBaE1Y+4ep+Q2vdXjcYYIB8nJkX2XrQuLg+5LxsNP Qil4ce5bC6DkpNh/RIeNFefuGfhZGYndTaZPhZO5fqGcdhEAtXLTkbplT9S7Rk6FgKOo 0MMThvnqFX+OCurK46X/bJ9GGl306ylBQI7tgyZEgSPpnMHGC+coCLLWcOHFhjIbD11E Ln9qgEv5i3BodTZmO5Xu//5t5GMrcZexW1VeG0qX41g+8vxVhyu8+zDZsYzKpsk2LoUA T2iyRTz893l21CxNaJeNWpOanshUAQ43z5tDKTEHQPJiWn0fwr9gW7rfTqemk2gSURMX mDDQ== X-Forwarded-Encrypted: i=1; AJvYcCWwipY1+fJY7ecf5EsQD+qAw5HFNjWoVumpBKjXJ4DOJ/Z1JnmOEiPVa9N8BMSfA6dYjK6ecb3o2kp31WUeVr3khWUJEZRjVe+cjw3gmw3Q8OY3 X-Gm-Message-State: AOJu0Yy8hBz1sc5fbAatZi6EQxMXvixi3a1mSq8TRCqP4YTVPEEI+1QC zwXu+h+RGsDo1A2BOeW8vd45gESYC8i0jksvUuLXJAs1q9/TdjroLWLoA68x9xVyLCG2Bo98udj Xowk0wBpeCtUBmkWFptin7/bmaxg= X-Google-Smtp-Source: AGHT+IEFHvAfTWMjCE+jurZhIuP0trHzBNileO25A1NScZAthzLGXqEOONJASIaACsz7SGMjpN1zDxp/f2g0b+hW/TU= X-Received: by 2002:a05:6214:212b:b0:699:d07:cf22 with SMTP id r11-20020a056214212b00b006990d07cf22mr8897040qvc.31.1712433901058; Sat, 06 Apr 2024 13:05:01 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: yudhi s Date: Sun, 7 Apr 2024 01:34:49 +0530 Message-ID: Subject: Re: Moving delta data faster To: Adrian Klaver Cc: veem v , Greg Sabino Mullane , pgsql-general Content-Type: multipart/alternative; boundary="0000000000005fc43f06157316f1" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000005fc43f06157316f1 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Sat, Apr 6, 2024 at 10:25=E2=80=AFPM Adrian Klaver wrote: > > Your original problem description was: > > "Then subsequently these rows will be inserted/updated based on the > delta number of rows that got inserted/updated in the source database. > In some cases these changed data can flow multiple times per day to the > downstream i.e. postgres database and in other cases once daily." > > If the above is not a hard rule, then yes up to some point just > replacing the data in mass would be the simplest/fastest method. You > could cut a step out by doing something like TRUNCATE target_tab and > then COPY target_tab FROM 'source.csv' bypassing the INSERT INTO > source_tab. > > Yes, actually i didn't realize that truncate table transactional/online here in postgres. In other databases like Oracle its downtime for the read queries on the target table, as data will be vanished from the target table post truncate(until the data load happens) and those are auto commit. Thanks Veem for sharing that option. I also think that truncate will be faster if the changes/delta is large , but if its handful of rows like <5%of the rows in the table then Upsert/Merge will be better performant. And also the down side of the truncate option is, it does ask to bring/export all the data from source to the S3 file which may take longer as compared to bringing just the delta records. Correct me if I'm wrong. However I am still not able to understand why the upsert is less performant than merge, could you throw some light on this please? --0000000000005fc43f06157316f1 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable

On Sat, Apr 6, 2024 at 10:25=E2=80=AFPM A= drian Klaver <adrian.klaver= @aklaver.com> wrote:

Your original problem description was:

"Then subsequently these rows will be inserted/updated based on the delta number of rows that got inserted/updated in the source database.
In some cases these changed data can flow multiple times per day to the downstream i.e. postgres database and in other cases once daily."

If the above is not a hard rule, then yes up to some point just
replacing the data in mass would be the simplest/fastest method. You
could cut a step out by doing something like TRUNCATE target_tab and
then COPY target_tab FROM 'source.csv' bypassing the INSERT INTO so= urce_tab.

Yes, actually i didn't realize that = truncate table transactional/online here in postgres. In other databases li= ke Oracle its downtime for the read queries on the target table, as data wi= ll be=C2=A0vanished from the target table post truncate(until the data load= happens) and those are auto commit. Thanks Veem for sharing that=C2=A0 opt= ion.=C2=A0

=C2=A0I also think that truncate will b= e faster if the changes/delta is large , but if its handful of rows like &l= t;5%of the rows in the table then Upsert/Merge will be better performant. A= nd also the down side of the truncate=C2=A0option is,=C2=A0 it does ask to = bring/export all the data from source to the S3 file which may take longer = as compared to bringing just the delta records. Correct me if I'm wrong= .

However I am still not able to understand why th= e upsert is less performant than merge, could you throw some light on this = please?

--0000000000005fc43f06157316f1--