Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1s0Edj-002EBP-Pz for pgsql-general@arkaria.postgresql.org; Fri, 26 Apr 2024 06:00:55 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1s0Edg-006xad-4A for pgsql-general@arkaria.postgresql.org; Fri, 26 Apr 2024 06:00:53 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1s0Edf-006xaN-P3 for pgsql-general@lists.postgresql.org; Fri, 26 Apr 2024 06:00:52 +0000 Received: from mail-qk1-x730.google.com ([2607:f8b0:4864:20::730]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1s0Edd-0007uZ-V2 for pgsql-general@lists.postgresql.org; Fri, 26 Apr 2024 06:00:52 +0000 Received: by mail-qk1-x730.google.com with SMTP id af79cd13be357-78f05341128so109109985a.0 for ; Thu, 25 Apr 2024 23:00:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1714111248; x=1714716048; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=umtD+GuHyhwgAqfHNU8wSzV/pdEIN+lLPRvoU6+ztFg=; b=C5Ggx+n9QuNNo3oniV4vB63Li3do/9K6X6vrfoMdspBKbZCwiHOWX+bWGgiZpLn9aL XW3xtN/nkiZSnu/d8XddL4Wu9qLOAc/Me36jRdG8dP6X/XS9Ejtog+mHnI0Wal3Zpfy2 NeQVHBqUg3UxUzUGiUT5ixHF8mU3Y+zOmBYmfBaLS7+PYcOZvM7GIXxczo+B5Vrf2avK gLmKrXsduFOt/NtoKmvS/7h/37dn8suleY8CekwED5lDctfVTjyl0bpowFMI55jRc/C2 zcPGBjXncTnh7uCgKeuaH2qPF/r7RwjiiFSeDjC8v/gnGpbKLccPX0f2BOuyxFqtc4GG Jofw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714111248; x=1714716048; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=umtD+GuHyhwgAqfHNU8wSzV/pdEIN+lLPRvoU6+ztFg=; b=VxPDkxMGjXlHZuDWL8jHsk7cYbd+bzrWLff6JTFHfHEIvx/3fJ/j83HuwUAMixDuDz wTD7ggK9fzqp+9QFT/9CUiP48VZiFlYWpK2bkYYr3L9sR6jadnkJz5JGQJmGEEUXWYvT OlHFO3xi1i+XV3ouySWPsLkD1zDFI25fxDt9bxHy4pVjJMz3c0GPWFWqAz2kMQoZy2KT Rw6k709t+XZiO37RmvrB0RFJXKzah6RDeVyvVwtwd2PqT/NrqlNz3v5JXphnt71/GTk/ yqXrila9XsvP3KHEujLQTcCTJctnTj4gZpcCx5k0cJE0yuXXaNTVMqraMiQw/gz+euqC vG5w== X-Gm-Message-State: AOJu0YwJJwPltPsgzSVaDzlBcMDt2TWx5W6/oP0Q1mYA4hdNsuZ1RFsG y07oz5d/UqxZB+wvtCHiEaQMBC4DT4s6v8nnq5y6Mev1I100goJ0/T5SghasUeI5meCgDr9uT0A 6aSQKwrCGry1qyxDrErMqswCIWlE= X-Google-Smtp-Source: AGHT+IGMFs+su4PoSxQM7EGjQEf/Uab9ROnauLXfydDNjwyDvcTnjLfMfhrFHR10Pvcnb/Xn5mByEDAc0ATsLgiUfJg= X-Received: by 2002:a05:6214:c28:b0:6a0:49d5:c4fb with SMTP id a8-20020a0562140c2800b006a049d5c4fbmr2449717qvd.31.1714111248100; Thu, 25 Apr 2024 23:00:48 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: yudhi s Date: Fri, 26 Apr 2024 11:30:34 +0530 Message-ID: Subject: Re: How you make efficient design for CDC and book marking To: Lok P Cc: pgsql-general Content-Type: multipart/alternative; boundary="0000000000000c7f5c0616f9a0fa" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000000c7f5c0616f9a0fa Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable If you are worried about the audit trigger and also an additional audit table to hold all the rows , then you may get an additional flag added to your base table deleted_flag and when delete happens, you just need to update that flag through your code manually, but not physically delete the record from the base table (if you want to avoid the performance impact of the row trigger on the base tables). So it will make the main table bulky but will have lesser impact to your incoming data load to base or main tables. On Fri, Apr 26, 2024 at 1:56=E2=80=AFAM Lok P wrote: > Hello, > My understanding is that the replication tools normally rely on the > database transaction logs to find the CDC/delta > changes(Insert/Update/Delete) for tables and then move those delta change= s > to the target system/databases. Whatever may be the source database (It > might be open source postgres or aws RDS). And never done though, but i a= m > assuming , manually scanning the DB logs must not be easy ones and also > might not be given access to DB logs because of security reasons too. Hop= e > my understanding is correct here. > > Thus, in absence of such replication tool(may be because of the additiona= l > cost associated etc) if someone wants to find the delta changes > (insert/update/delete) in a database as efficiently as possible and move > those to the target database in a continuous data streaming setup, I can > only think of below option.... > > i.e maintaining audit columns like create_timestamp/update_timestamp > columns in every source table so that they can be utilized to get the CDC > for Insert and Update statements and also for bookmarking. But to find th= e > delta for the deletes , there is not much option but to have row level > triggers created on the base table which will populate another audit tabl= e > with the deleted rows, and this is going to crawl if we get a lot of > deletes(in millions) on the source tables. > > Want to know from experts, if there exists any other way to have these > manual CDC and book marking more efficient for such continuous delta data > movement scenarios? > > Regards > Lok > --0000000000000c7f5c0616f9a0fa Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
If you are worried about the audit trigger and also an add= itional audit table to hold all the rows , then you may get an additional f= lag added to your base table deleted_flag and when delete happens, you just= need to update that flag through your=C2=A0code manually, but not physical= ly delete the record from the base table (if you want to avoid the performa= nce impact of the row trigger on the base tables). So it will make the main= table bulky but will have lesser impact to your incoming=C2=A0data load to= base or main tables.

On Fri, Apr 26, 2024 at 1:56=E2=80=AFAM Lok P <loknath.73@gmail.com> wrote:
=
He= llo,
My understanding is that the replication tools normally rely on the= database transaction logs to find the CDC/delta changes(Insert/Update/Dele= te) for tables and then move those delta changes to the target system/datab= ases. Whatever may be the source database (It might be open source postgres= or aws RDS). And never done though, but i am assuming , manually scanning = the DB logs must not be easy ones and also might not be given access to DB = logs because of security reasons too. Hope my understanding is correct here= .

Thus, in absence of such replication tool(may be because of the ad= ditional cost associated etc) if someone wants to find the delta changes (i= nsert/update/delete) in a database as efficiently as possible and move thos= e to the target database in a continuous data streaming setup, I can only t= hink of below option....

i.e maintaining audit columns like create_t= imestamp/update_timestamp columns in every source table so that they can be= utilized to get the CDC for Insert and Update statements and also for book= marking. But to find the delta for the deletes , there is not much option b= ut to have row level triggers created on the base table which will populate= another audit table with the deleted rows, and this is going to crawl if w= e get a lot of deletes(in millions) on the source tables.

Want to kn= ow from experts, if there exists any other way to have these manual CDC and= book marking more efficient for such continuous delta data movement scenar= ios?

Regards
Lok
--0000000000000c7f5c0616f9a0fa--