Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1t99iM-0085jd-6P for pgsql-general@arkaria.postgresql.org; Thu, 07 Nov 2024 21:06:52 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1t99iJ-002VhM-3k for pgsql-general@arkaria.postgresql.org; Thu, 07 Nov 2024 21:06:47 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1t99iI-002VhE-Hg for pgsql-general@lists.postgresql.org; Thu, 07 Nov 2024 21:06:47 +0000 Received: from mail-wr1-x42d.google.com ([2a00:1450:4864:20::42d]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1t99iF-000gwe-9y for pgsql-general@lists.postgresql.org; Thu, 07 Nov 2024 21:06:45 +0000 Received: by mail-wr1-x42d.google.com with SMTP id ffacd0b85a97d-37ed3bd6114so823267f8f.2 for ; Thu, 07 Nov 2024 13:06:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=leisi.net; s=google; t=1731013601; x=1731618401; darn=lists.postgresql.org; h=to:date:message-id:subject:mime-version:from:from:to:cc:subject :date:message-id:reply-to; bh=6fWrf8l3XrhKY0bkg1uo7RXVqhJISfURihhfFqLbxss=; b=D0btfKqjK+LHZw5P1HbdEDEEBb9XgStUpj4IY9bp4ztBewHkpgRg28OAtXxMPaNlAp /Kxkjy0X0duGGxy+z7LydZ00cUGpf9XrV4mxz/MP7bPYQGcgVJO917aRzRZhLuBAlY3O WMrULSYD5glwrETvYfGcy1d+SNwhLilb0NlrI= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1731013601; x=1731618401; h=to:date:message-id:subject:mime-version:from:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=6fWrf8l3XrhKY0bkg1uo7RXVqhJISfURihhfFqLbxss=; b=AXcc3UXEIaTgJivBXYn31OAd92S4kMr8gTA4nwk63uOYDi6a7laE201xd6hjBQ5HBU leCy4KGsQIw73VgItLzagk19giiMs8MGHguizohRFFMt3OBwuk/XmF8awNXXITg82EPn 0EBToBjlydoZoZx8Dq906fDCMOc7pcUbhrONdE0ew781+gNz0N+cCoXDEh0UKziNs4m4 1xzH/M+Hm+FZrgy8idw6NJ2moJjwTHo/xHpB1QtNVjHUG19DeUO7zj5eGcxLGWULsyKY 4ajBaY8WJ2Fji8FP8hgs5zH1YoxDGhc29NNN/A01sd+l/aV4PvG/zZdU3ei1Wnd0lFFc 52nw== X-Gm-Message-State: AOJu0Yw6A/hN1nfSLgXrgOQVQyTIHoyqNIlDdrue3ZYJGT7YghbqsxwS peeChjV0PprBT03QWdRMPYkMDVozAe8KjTx2qgT/W4Babw6SdKTzqT9fUwMNVVoFvwnHOt314zA = X-Google-Smtp-Source: AGHT+IG9V3YxJ/wqp8JiCQhoeJFmxYLAFGwPrqyVx67hkcjpcQnmhbWC5yxx96+qnEsnTbMtmvq7xA== X-Received: by 2002:a05:6000:18a8:b0:37d:3985:8871 with SMTP id ffacd0b85a97d-381f188130bmr364900f8f.39.1731013600478; Thu, 07 Nov 2024 13:06:40 -0800 (PST) Received: from smtpclient.apple ([2a02:169:ce0e:0:6475:9219:63b5:475b]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-381eda0604bsm2635965f8f.105.2024.11.07.13.06.39 for (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 07 Nov 2024 13:06:40 -0800 (PST) From: Matthias Leisi Content-Type: multipart/alternative; boundary="Apple-Mail=_EEDDBEDE-6C7E-457F-9C33-3ACF9BA50715" Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3774.600.62\)) Subject: Advice on cluster architecture for two related, but distinct, use cases Message-Id: <18DEA0D4-1D33-48CA-A037-FF01AFE37471@leisi.net> Date: Thu, 7 Nov 2024 22:06:29 +0100 To: pgsql-general@lists.postgresql.org X-Mailer: Apple Mail (2.3774.600.62) List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --Apple-Mail=_EEDDBEDE-6C7E-457F-9C33-3ACF9BA50715 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 Dear all, (This is a follow-up to a question I asked almost exactly a year ago, = https://postgrespro.com/list/thread-id/2670756#726F3765-858C-4AC0-A7B0-5CB= 6720E4B37@leisi.net - the requirements have changed since then, and the = platform has changed from OpenBSD to Linux, which may make some things = easier.) I=E2=80=99m looking for advice on Postgres cluster architecture(s) for = two related but distinct use cases. Ideally, the approaches for the two = use cases would not differ too widely. The goal of clustering is low RPO (I guess we need sync clustering) and = RTO (ideally almost-instant failover, but a failover process of up to a = minute in the worst case could be acceptable); throughput is not a = concern (it=E2=80=99s relatively low transaction volume except for some = often-written statistics data, which is still moderate). Latency (due to = the distance between datacenters for georedundancy) is a fact we are = willing to accept. The first use case is in an environment under our own control (and where = eg a DBA could intervene). We can theoretically run any number of = cluster instances, but assume we would use an even number (split over = the two datacenters), or potentially an odd number of nodes (eg with an = arbiter). We could use a load balancer, but I guess this would strongly = deviate from the second use case: In the second use case, the environment is not under our control, so we = can only assume basic network connectivity from the application to the = DB, and between the DBs (the latter potentially through an SSH tunnel if = needed). In this use case, we can not assume a person to intervene if a = node goes down, and would prefer some automated failover to the other = node (this automation would also be welcome for the first use case, eg = if something happens while nobody is watching). We can not assume eg a = load balancer. There could be various ways how the environment in the second use case = is set up, ranging from =E2=80=9Eapplication and database running on the = same box=E2=80=9C (well, no clustering for you then=E2=80=A6), to = dedicated two- or three node database cluster serving a number of = application machines. In both use cases, we have full control over the application and the = database code and environment. =46rom reading various docs, it seems we would need something like = Patroni (/Percona), at least for the first use case. However it seems = relatively complex to set up and operate. I would appreciate your experience and input into which approach would = best fit the two use cases. We are also willing to engage in paid = consulting. Thanks, =E2=80=94 Matthias --=20 Matthias Leisi Katzenr=C3=BCtistrasse 68, 8153 R=C3=BCmlang Mobile +41 79 377 04 43 matthias@leisi.net = --Apple-Mail=_EEDDBEDE-6C7E-457F-9C33-3ACF9BA50715 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=utf-8 Dear = all,

(This is a follow-up to a question I asked = almost exactly a year ago, https://postgrespro.com/list/thread-id/267075= 6#726F3765-858C-4AC0-A7B0-5CB6720E4B37@leisi.net - the = requirements have changed since then, and the platform has changed from = OpenBSD to Linux, which may make some things = easier.)


I=E2=80=99m looking for = advice on Postgres cluster architecture(s) for two related but distinct = use cases. Ideally, the approaches for the two use cases would not = differ too widely.

The goal of clustering is = low RPO (I guess we need sync clustering) and RTO (ideally = almost-instant failover, but a failover process of up to a minute in the = worst case could be acceptable); throughput is not a concern (it=E2=80=99s= relatively low transaction volume except for some often-written = statistics data, which is still moderate). Latency (due to the distance = between datacenters for georedundancy) is a fact we are willing to = accept.


The first use case is in = an environment under our own control (and where eg a DBA could = intervene). We can theoretically run any number of cluster instances, = but assume we would use an even number (split over the two datacenters), = or potentially an odd number of nodes (eg with an arbiter). We could use = a load balancer, but I guess this would strongly deviate from the second = use case:


In the second use = case, the environment is not under our control, so we can only assume = basic network connectivity from the application to the DB, and between = the DBs (the latter potentially through an SSH tunnel if needed). In = this use case, we can not assume a person to intervene if a node goes = down, and would prefer some automated failover to the other node (this = automation would also be welcome for the first use case, eg if something = happens while nobody is watching). We can not assume eg a load = balancer.

There could be various ways how the = environment in the second use case is set up, ranging from = =E2=80=9Eapplication and database running on the same box=E2=80=9C = (well, no clustering for you then=E2=80=A6), to dedicated two- or three = node database cluster serving a number of application = machines.


In both use cases, we = have full control over the application and the database code and = environment.

=46rom reading various docs, it = seems we would need something like Patroni (/Percona), at least for the = first use case. However it seems relatively complex to set up and = operate.

I would appreciate your experience and = input into which approach would best fit the two use cases. We are also = willing to engage in paid = consulting.

Thanks,
=E2=80=94 = Matthias

-- 
Matthias Leisi
Katzenr=C3=BCtistrasse 68, 8153 = R=C3=BCmlang
Mobile +41 79 377 04 43
matthias@leisi.net
= --Apple-Mail=_EEDDBEDE-6C7E-457F-9C33-3ACF9BA50715--