Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sHmW9-00AlRY-Ff for pgsql-general@arkaria.postgresql.org; Thu, 13 Jun 2024 15:37:37 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sHmW7-0037TS-9J for pgsql-general@arkaria.postgresql.org; Thu, 13 Jun 2024 15:37:36 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sHmW6-0037TI-QT for pgsql-general@lists.postgresql.org; Thu, 13 Jun 2024 15:37:35 +0000 Received: from mail-io1-xd33.google.com ([2607:f8b0:4864:20::d33]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1sHmW4-0018hW-EH for pgsql-general@lists.postgresql.org; Thu, 13 Jun 2024 15:37:34 +0000 Received: by mail-io1-xd33.google.com with SMTP id ca18e2360f4ac-7eb835e844cso45652039f.2 for ; Thu, 13 Jun 2024 08:37:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1718293052; x=1718897852; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=N3CKtml/iRWEFnf0ETXkkkQ1wfuQRoYkMa3TOBrLmBI=; b=QED8hpA6k2rBAlhck5REqhhjSLRvUSbpPY6/5KEZm70TWM+Y46ZgH1/vlJahkrcfgG 5LMH227hMJUgsEtl/OlVvb1CQB8l1KOf1I7tWVU+66jqUNcffEKGaZ6K32HJp//ESVir 53+NakkFzyCaFZeyvg5++R5MD8emEvoIBA9zyR7yueTXN4p0d0yPhD5qRksfv/6CljBD +wJl0Lw3MqPdgvYBaSyvsfjyXHJGPOkzKE9p70SSJ+aCnaNFPu71XfpSyaEjZtbZgprR itkHLYg5V9tknCs66ryTOd5U/jge23tfqIfwaEZnJHLO25buZCkdBDYHxTtOpu7rDVWH wjIw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1718293052; x=1718897852; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=N3CKtml/iRWEFnf0ETXkkkQ1wfuQRoYkMa3TOBrLmBI=; b=F/LqInK5HSo58RLeM5sfKhRG+GFJC60k0wllIDc7xx38C+dl/rjV9WqCOzrEFU14Ah sQoWco1pqzVstJO44SPPg4kXwXouS6RMJHJlnMXCz+8zDNfqhQsz9KSX2uX9cPs/9/wE wHD3KkmLzqb4kRste3eKNz58zG5/AgVm+AwL3mRpyXppQRY9klcVM7Kf8cV7DE+n5voC DPGo6b1VS8geXepd8DsXO/kY/c/jrSHg2mLimuLBFtNldStDnjg0hFS1jIxNN3ePSuO2 5D71gHgBU/MP68FGXG7aZM8ewF85DeDnGGCVfcMLo8lPm92E+OaVYVLUDrU1c5z12bcE 3yxg== X-Gm-Message-State: AOJu0YxxJ7Yv1xSZJdL0DH4yCucw05gfVvu0cCP64rUFm4ytfa5sJ4Kl NclPiTt7DdDe87+3Zziqcp6+XcgWgrU83NXcIvE4FpA4W33hd8y0Q7PDpydTK/aqzWfsialc5OM ygsaDueaUhFAKT+CxRgfmbdRLZUU= X-Google-Smtp-Source: AGHT+IE96pfC1ZMxMOiGKCEzdGfees99+L6a3wZy25uGySV3GBb6fgaMwIkVMUiu5WASYLmwwnMR3+loRwLDYBbu5Sg= X-Received: by 2002:a05:6602:1595:b0:7e2:1180:5f4a with SMTP id ca18e2360f4ac-7ebcd18d09amr603140839f.18.1718293051346; Thu, 13 Jun 2024 08:37:31 -0700 (PDT) MIME-Version: 1.0 References: <1628637f-419f-4f6a-9cb6-07af90cd0bc4@aklaver.com> <1c0273f5-a90a-48f6-b51f-fe15c16fa1c6@aklaver.com> <490e8a5c-f1f5-40fe-8243-f1c8c18d03f2@aklaver.com> In-Reply-To: From: Justin Date: Thu, 13 Jun 2024 11:37:20 -0400 Message-ID: Subject: Re: Questions on logical replication To: Koen De Groote Cc: "pgsql-generallists.postgresql.org" , Adrian Klaver Content-Type: multipart/alternative; boundary="000000000000f2178f061ac74602" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000f2178f061ac74602 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Thu, Jun 13, 2024 at 6:01=E2=80=AFAM Koen De Groote = wrote: > > Why? what benefit does this provide you?? Add all the tables when > creating the publication and be done with it... I get this when trying t= o > understand how this all works on test boxes, but for production NO idea > what you're trying to accomplish > > Adding all tables at once means adding the gigantic tables as well. Disk > IO and Network traffic are a serious concern, increased CPU usage affecti= ng > queries of the live system, as well as transaction wraparound. > > Initial sync can be a serious concern, depending on the size of the table= . > The number of initial sync workers can be controlled via max_sync_workers_per_subscription see https://www.postgresql.org/docs/current/logical-replication-config.html if you want to do one table at a time just set sync workers to 1. If bandwidth is a problem either from the disk or network, direct the network traffic from the subscriber through a proxy or firewall to throttle the network speed. Slowing the copy will cause the WAL to build up on the publisher CPU load on the publisher is very low its actually hard to see it doing anything as its just reading the disk, streaming it to the subscriber.. For large tables with lots of indexes for the copy to complete as fast as possible to prevent WAL build up, drop indexes. For me the WAL build up has only been an issue when dealing with multi-TB sized tables when it takes several days to copy the data for one table. One trick is to remove all the indexes during the initial sync except for the primary key so the subscriber has less work to do. > Here's a nice guide where people did a logical replication upgrade, > explaining why they did it this way: > https://knock.app/blog/zero-downtime-postgres-upgrades > The blog suggests overly complicated things. only doing 100GB chunks of data at one time. Maybe if the publisher was scarce on resources or the table is multi-TB in size it requires days to weeks to copy... If the publisher is so low on resources that Logical Replication is problematic one can create a binary replica, promote it and convert it to logical replication skipping the initial sync. Then upgrade that server. There is a minor outage required to convert a binary replica to a logical replica. I've done it in under 30 seconds. > > On Wed, Jun 12, 2024 at 7:01=E2=80=AFPM Justin wro= te: > >> >> >> On Tue, Jun 11, 2024 at 5:43=E2=80=AFPM Koen De Groote wrote: >> >>> > If there are any errors during the replay of WAL such as missing >>> indexes for Replica Identities during an Update or Delete this will ca= use >>> the main subscriber worker slot on the publisher to start backing up WA= L >>> files >>> >>> And also if the connection breaks, from what I understand, is that >>> correct? Anything that stops the subscription, including disabling the >>> subscription, is that right? >>> >> >> Yes to all.... >> >> >>> > I suggest confirming all tables have replica identities or primary >>> keys before going any further. >>> >>> Yes, I am aware of this. I made me a small script that prints which >>> tables I have added to the publication and are done syncing, and which = are >>> currently not being replicated. >>> >> >> >>> >>> > With PG 11 avoid REPLICA IDENTITY FULL as this causes full table scan >>> on the subscriber for PG 15 and earlier. >>> >>> I'm also aware of this. My plan is to create a publication with no >>> tables, and add them 1 by 1, refreshing the subscriber each time. >>> >> >> Why? what benefit does this provide you?? Add all the tables when >> creating the publication and be done with it... I get this when trying = to >> understand how this all works on test boxes, but for production NO idea >> what you're trying to accomplish >> >> >>> I'm not planning on using "REPLICA IDENTITY FULL" anywhere. >>> >> Good >> > --000000000000f2178f061ac74602 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


=
On Thu, Jun 13, 2024 at 6:01=E2=80=AF= AM Koen De Groote <kdg.dev@gmail.co= m> wrote:
> Why?=C2=A0 what benefit=C2=A0does this provide you??= =C2=A0 =C2=A0Add all the tables=20 when creating the publication and be done with it...=C2=A0 I get this when= =20 trying to understand how this all works on test boxes, but for=20 production NO idea what you're trying to accomplish

=
Adding all tables at once means adding the gigantic tables as well. Di= sk IO and Network traffic are a serious concern, increased CPU usage affect= ing queries of the live system, as well as transaction wraparound.

Initial sync can be a serious concern, depending on the si= ze of the table.

The number= of initial=C2=A0 sync workers can be controlled via=C2=A0max_sync_workers_= per_subscription=C2=A0=C2=A0
see https://www.postgresql.org/doc= s/current/logical-replication-config.html=C2=A0

if you want to = do one table at a time just set sync workers to 1.

If bandwidth is a= problem=C2=A0 either from the disk or network, direct the network traffic= =C2=A0from the subscriber through a proxy or firewall to throttle the netwo= rk speed.=C2=A0 Slowing the copy will cause the WAL to build up=C2=A0 on th= e publisher

CPU load on the publisher is very low its actually hard = to see it doing anything as its just reading the disk, streaming it to the = subscriber..=C2=A0 =C2=A0

For large tables with lots of indexes for = the copy to complete as fast as possible to prevent WAL build up, drop inde= xes.=C2=A0 For me the WAL build up has only been an issue when dealing with= multi-TB sized tables when it takes several days to copy the data=C2=A0 fo= r one table.

One trick is to remove all the indexes during th= e initial sync except for the primary key so the subscriber has less work t= o do.


Here's a nice guide where peo= ple did a logical replication upgrade, explaining why they did it this way:= https://knock.app/blog/zero-downtime-postgres-upgrades

The blog suggests overly complicate= d things.=C2=A0 only doing 100GB chunks of data at one time.=C2=A0 Maybe=C2= =A0 if the publisher was scarce on resources or the table is multi-TB in si= ze it requires days to weeks to copy...

If the publisher is so low o= n resources that Logical Replication is problematic=C2=A0 one can create a = binary replica, promote=C2=A0it and convert it to logical replication skipp= ing the initial sync.=C2=A0 Then upgrade that server. There is a minor outa= ge required to convert a binary replica to a logical replica.=C2=A0 I'v= e done it in under 30 seconds.=C2=A0


=C2=A0

On Wed, Jun 12, 2= 024 at 7:01=E2=80=AFPM Justin <zzzzz.graf@gmail.com> wrote:

=

= On Tue, Jun 11, 2024 at 5:43=E2=80=AFPM Koen De Groote <kdg.dev@gmail.com> wrote:
=
>=20 If there are any errors during the replay of WAL such as missing=20 indexes for Replica Identities during an Update or Delete=C2=A0 this will= =20 cause the main subscriber worker slot on the publisher to start backing=20 up WAL files

And also if the connection break= s, from what I understand, is that correct? Anything that stops the subscri= ption, including disabling the subscription, is that right?
=C2=A0
Yes to all....=C2=A0


>=20 I suggest confirming all tables have replica identities or primary keys bef= ore going any further.

Yes, I am aware of thi= s. I made me a small script that prints which tables I have added to the pu= blication and are done syncing, and which are currently not being replicate= d.
=C2=A0

> With PG 11 avoid REPLICA IDENTITY FULL as this causes full table scan on th= e subscriber for PG 15 and earlier.

I'm a= lso aware of this. My plan is to create a publication with no tables, and a= dd them 1 by 1, refreshing the subscriber each time.
=C2=A0
Why?=C2=A0 what benefit=C2=A0does this provide you?= ?=C2=A0 =C2=A0Add all the tables when creating the publication and be done = with it...=C2=A0 I get this when trying to understand how this all works on= test boxes, but for production NO idea what you're trying to accomplis= h=C2=A0


I'm not planning on using "= ;REPLICA IDENTITY FULL" anywhere.
Goo= d=C2=A0
--000000000000f2178f061ac74602--