Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1s05gR-000m7L-5l for pgsql-general@arkaria.postgresql.org; Thu, 25 Apr 2024 20:27:07 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1s05fQ-002ULu-N5 for pgsql-general@arkaria.postgresql.org; Thu, 25 Apr 2024 20:26:05 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1s05fQ-002ULm-AX for pgsql-general@lists.postgresql.org; Thu, 25 Apr 2024 20:26:05 +0000 Received: from mail-yb1-xb34.google.com ([2607:f8b0:4864:20::b34]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1s05fO-0003vg-Tq for pgsql-general@lists.postgresql.org; Thu, 25 Apr 2024 20:26:04 +0000 Received: by mail-yb1-xb34.google.com with SMTP id 3f1490d57ef6-de4654c3423so286012276.0 for ; Thu, 25 Apr 2024 13:26:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1714076761; x=1714681561; darn=lists.postgresql.org; h=to:subject:message-id:date:from:mime-version:from:to:cc:subject :date:message-id:reply-to; bh=PgXa7iUH7/0wGnDCVQZzKlibXqqzobhXSSGjG6GLxWI=; b=fCJvnP4Ta8EFz4BT89+J+fJ4b+yu0t3HXiUBC/59GnvRSkVYhnadQtcGYvAK5guhBS A3HxUEIO5bMsTHaY4JB0WVgwiFk/MvUv39IZQDnAUqPWjCEssf9ho05gymUS/T3VFgpJ iuP8Fljr+oHkpIqQCLtjpR9kdHxw4PXL2D9nrE0u0zXv88qxkVGo/5ttMgA00ReM5sqF u9D65T1fvoakxB/ZQnCOq3M7+V6d0SwvZpNu6OgE6VJeD06iG2PCw0wq2H8FzjW2NFpP FOMQA3baOYbHSonX4gmWWvsuJgJ4TNL7BP7BY0nU+57EJpdO1q7tlNAc2ZmGgjnYLMtD R5aQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714076761; x=1714681561; h=to:subject:message-id:date:from:mime-version:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=PgXa7iUH7/0wGnDCVQZzKlibXqqzobhXSSGjG6GLxWI=; b=D72LaXqr14lk8gaGWtaONK3oFgeoHZGV8TgHZ1kBzugYpyW8ujSOGmEuvnzSXZWHVs aAinQF7j7sdZUcqcpi5I9grDi4iBY4rDhUtBW5YhhNjY9XFcJTcLiwjaPbJamtTMrPJt uCWObN3Q6dEhbcRE87Rg4VWQ6qYXLRWZp2ALN0hyyv8GG1V/PW7+N2tyc6vU9AOkIyjA BHlmbXyvKkaaQWLvBgic2YLZnqcaPH4yXzBlIMVYA4Y2eYacUKwEXDU5Bi2U7E0f6PHd BcvFubRBRCITvFHRDciGubNdGiK8zaw4fnoGTeg6Vpv98quzMfQh4gaHvO2TEstJbB2o 7r2g== X-Gm-Message-State: AOJu0YzJAH50mQ//FxJrzJkvNgQgYbWnzsa7Ly9uZQDmtpO41SzxJLzn qC6ulQvMHGgNFBWH2dQcCF+xKfOfAPHz2M62dvQzTqPiRM1omM1LXLN/0D+LhsrdAHoUj9EQySQ yY9tnyBITSMdUiKZKha//xIJ23TjZLqqD X-Google-Smtp-Source: AGHT+IElBRCpAZNmwlAbpUiwf28trLpPuhX0uCXkJv1er9jcFZjE9z0QFnCiBuUdD6EQ6sJsfJ+4RsRbXcSCQCpvRUY= X-Received: by 2002:a0d:e202:0:b0:61a:fac1:b045 with SMTP id l2-20020a0de202000000b0061afac1b045mr556491ywe.3.1714076760745; Thu, 25 Apr 2024 13:26:00 -0700 (PDT) MIME-Version: 1.0 From: Lok P Date: Fri, 26 Apr 2024 01:55:49 +0530 Message-ID: Subject: How you make efficient design for CDC and book marking To: pgsql-general Content-Type: multipart/alternative; boundary="0000000000007131980616f19839" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000007131980616f19839 Content-Type: text/plain; charset="UTF-8" Hello, My understanding is that the replication tools normally rely on the database transaction logs to find the CDC/delta changes(Insert/Update/Delete) for tables and then move those delta changes to the target system/databases. Whatever may be the source database (It might be open source postgres or aws RDS). And never done though, but i am assuming , manually scanning the DB logs must not be easy ones and also might not be given access to DB logs because of security reasons too. Hope my understanding is correct here. Thus, in absence of such replication tool(may be because of the additional cost associated etc) if someone wants to find the delta changes (insert/update/delete) in a database as efficiently as possible and move those to the target database in a continuous data streaming setup, I can only think of below option.... i.e maintaining audit columns like create_timestamp/update_timestamp columns in every source table so that they can be utilized to get the CDC for Insert and Update statements and also for bookmarking. But to find the delta for the deletes , there is not much option but to have row level triggers created on the base table which will populate another audit table with the deleted rows, and this is going to crawl if we get a lot of deletes(in millions) on the source tables. Want to know from experts, if there exists any other way to have these manual CDC and book marking more efficient for such continuous delta data movement scenarios? Regards Lok --0000000000007131980616f19839 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hello,
My understanding is that the replication tools n= ormally rely on the database transaction logs to find the CDC/delta changes= (Insert/Update/Delete) for tables and then move those delta changes to the = target system/databases. Whatever may be the source database (It might be o= pen source postgres or aws RDS). And never done though, but i am assuming ,= manually scanning the DB logs must not be easy ones and also might not be = given access to DB logs because of security reasons too. Hope my understand= ing is correct here.

Thus, in absence of such replication tool(may b= e because of the additional cost associated etc) if someone wants to find t= he delta changes (insert/update/delete) in a database as efficiently as pos= sible and move those to the target database in a continuous data streaming = setup, I can only think of below option....

i.e maintaining audit co= lumns like create_timestamp/update_timestamp columns in every source table = so that they can be utilized to get the CDC for Insert and Update statement= s and also for bookmarking. But to find the delta for the deletes , there i= s not much option but to have row level triggers created on the base table = which will populate another audit table with the deleted rows, and this is = going to crawl if we get a lot of deletes(in millions) on the source tables= .

Want to know from experts, if there exists any other way to have t= hese manual CDC and book marking more efficient for such continuous delta d= ata movement scenarios?

Regards
Lok
<= /div> --0000000000007131980616f19839--