Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tq1WC-00FmOL-A6 for pgsql-general@arkaria.postgresql.org; Thu, 06 Mar 2025 03:03:28 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tq1WB-00DFvR-1F for pgsql-general@arkaria.postgresql.org; Thu, 06 Mar 2025 03:03:27 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tq1WA-00DFvI-Hz for pgsql-general@lists.postgresql.org; Thu, 06 Mar 2025 03:03:26 +0000 Received: from mail-yw1-x1132.google.com ([2607:f8b0:4864:20::1132]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1tq1W7-001Dfo-0A for pgsql-general@lists.postgresql.org; Thu, 06 Mar 2025 03:03:26 +0000 Received: by mail-yw1-x1132.google.com with SMTP id 00721157ae682-6f679788fd1so1454857b3.2 for ; Wed, 05 Mar 2025 19:03:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741230202; x=1741835002; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=gBvAaMDoD8rnRSV8Z601WNapfSayzXLqTY2Io8HzoaU=; b=CIKtzB9+jKSDWTg+ZxvCpF5wySgWXUXeRuAFrGTmSeETmF5L6DHg+/HjO99v6DGj07 aTvQ4ZtTTncRRn86hWUJ0zHyKOKvSgsCiZyUTiLYNm/0/s76Bj89OZd9dFdETwdgmYLz dHpKHwbhRZ49z1ZhxN7zjiYMqt0IF3j55qhD+6AUisPiQhb3z/iEzB+yFQSe03eu9HeT iU3Tqu8wA93zzUpBCM0u2765yIUCa0IteuN8urupGvcPgvPbzbVk3gpjeAGggoAzeV9D ZzWAChbGXBG3NKjSlW5U5uwHjFFov6GdGt7/DHLeObM1yb6pNew6s0JZeNFqU5M48JXo VpmA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741230202; x=1741835002; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=gBvAaMDoD8rnRSV8Z601WNapfSayzXLqTY2Io8HzoaU=; b=IlJ0yUuW7iELFCJjm5vUDkk684iLLcT2yYVoWS3Bitu0BZqWvfZwczD7IrdSU9JXAG U2xVs7fZT24dplk1idW9T3WL9lxypYomniIrwJfJUuOaitm6NpwIal14oh5HdR6FKnOe oOVR/VgCoi4FApifDxxQQyaR8vqWBy+GOvzNtsdKl5lv/e2buGJPY61hYpxN8Y8Iso2x h71cgZuqNgY9XDqxRI0JM1V1l5M6orbINXQ3bG+rITID05zSfZeRs/auZ0rp0XCPzWkW ryH7OvLXDW4xaUrDYlAgUqjR4bCftB24aDFtMfWSio3jFkj9ejycQ/fMecexCx4XE+0n i6Xw== X-Forwarded-Encrypted: i=1; AJvYcCX8Bl5sN4oIU6zpJO3kA9pwtDunYQarE6bUihUI7zsHVrTR2tH7XGRjRCeXroSLNL1jksHN+fmmiavpdHJO@lists.postgresql.org X-Gm-Message-State: AOJu0YwHCJ0p1MesVOu/R8AaQFcxxIhNVJDmWx/5UJs0qGNOLkZwDNx5 5XrZ/OSNeukvqzswU+uo8kbWisyhqLws8DhZbC2s4oljl3s2+qD1uQ4M1H9fDfqWfXiqfIWjuz7 tmZbfGERhI4WDpqFNm5gh9Uohsoc= X-Gm-Gg: ASbGncvhSG6f77n0t/lFB5KkUn8dKSFba7Rmtux15gdttPyFTV5vX12RO0tHPSF57IC cZHcq55SuM/xmCYK3SAl2Sowi3GxgyqrmiA29mygHtSi8Qirr+Nn4Yc47yL9yWlxhLqeWJrhl1I 37K0S8PGXpS9YmHv+ZE/flCstIPRSO9YVqtns0aZdjkPIaB+ygUGjEMwWzprWd X-Google-Smtp-Source: AGHT+IFR/fz++TfKJUjSwyoCjpwpCBjqDJAhKRVBIH/WzcNwbKESPtOm/zpWuige38FfLJvOM8yaq9eHpuLepTwkFu4= X-Received: by 2002:a05:690c:7082:b0:6fb:b907:d965 with SMTP id 00721157ae682-6fda2fe34edmr73804617b3.3.1741230202046; Wed, 05 Mar 2025 19:03:22 -0800 (PST) MIME-Version: 1.0 References: <099b49ebae94e23f19afdad3f8c9c6e702a3a2d5.camel@cybertec.at> <6d7e1022-6404-4dab-8467-8d1f6e8b63cb@aklaver.com> In-Reply-To: From: Igor Korot Date: Wed, 5 Mar 2025 21:03:10 -0600 X-Gm-Features: AQ5f1JqPsmmANosu7RMoRttlUKIrJlWui8wVXKdnCeImsvH6wTf8C6MOPCM-GPo Message-ID: Subject: Re: Quesion about querying distributed databases To: me nefcanto Cc: Adrian Klaver , Laurenz Albe , "pgsql-generallists.postgresql.org" Content-Type: multipart/alternative; boundary="000000000000aa50ba062fa3bfda" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000aa50ba062fa3bfda Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi, On Wed, Mar 5, 2025, 8:44=E2=80=AFPM me nefcanto wrote: > I once worked with a monolithic SQL Server database with more than 10 > billion records and about 8 Terabytes of data. A single backup took us mo= re > than 21 days. It was a nightmare. Almost everybody knows that scaling up > has a ceiling, but scaling out has no boundaries. > But then you did the backup incrementally correct? That should not take the same amount of time... > Therefore I will never choose a monolithic database design unless it's a > small project. But my examples are just examples. We predict 100 million > records per year. So we have to design accordingly. And it's not just sal= es > records. Many applications have requirements that are cheap data but vast > in multitude. Consider a language-learning app that wants to store the > known words of any learner. 10 thousand learners each knowing 2 thousand > words means 20 million records. Convert that to 100 thousand learners eac= h > knowing 7 thousand words and now you almost have a billion records. Cheap= , > but necessary. Let's not dive into telemetry or time-series data. > Can you try and see if 1 server with 3 different databases will do? Having 1 table per database per server is too ugly. Also please understand - every databae is different. And so it works and operates differently. What work good in one may not work good in another... Thank you. > We initially chose to break the database into smaller databases, because > it seemed natural for our modularized monolith architecture. And it worke= d > great for SQL Server. If you're small, we host them all on one server. If > you get bigger, we can put heavy databases on separate machines. > > However, I don't have experience working with other types of database > scaling. I have used table partitioning, but I have never used sharding. > > Anyway, that's why I asked you guys. However, encouraging me to go back t= o > monolith without giving solutions on how to scale, is not helping. To be > honest, I'm somehow disappointed by how the most advanced open source > database does not support cross-database querying just like how SQL Serve= r > does. But if it doesn't, it doesn't. Our team should either drop it as a > choice or find a way (by asking the experts who built it or use it) how t= o > design based on its features. That's why I'm asking. > > One thing that comes to my mind, is to use custom types. Instead of > storing data in ItemCategories and ItemAttributes, store them as arrays i= n > the relevant tables in the same database. But then it seems to me that in > this case, Mongo would become a better choice because I lose the relation= al > nature and normalization somehow. What drawbacks have you experienced in > that sense? > > Regards > Saeed > > On Wed, Mar 5, 2025 at 7:38=E2=80=AFPM Adrian Klaver > wrote: > >> On 3/5/25 04:15, me nefcanto wrote: >> > Dear Laurenz, the point is that I think if we put all databases into >> one >> > database, then we have blocked our growth in the future. >> >> How? >> >> > A monolith database can be scaled only vertically. We have had huge >> > headaches in the past with SQL Server on Windows and a single database= . >> > But when you divide bounded contexts into different databases, then yo= u >> > have the chance to deploy each database on a separate physical machine= . >> > That means a lot in terms of performance. Please correct me if I am >> wrong. >> >> And you add the complexity of talking across machines, as well as >> maintaining separate machines. >> >> > >> > Let's put this physical restriction on ourselves that we have differen= t >> > databases. What options do we have? One option that comes to my mind, >> is >> > to store the ID of the categories in the Products table. This means >> that >> > I don't need FDW anymore. And databases can be on separate machines. I >> > first query the categories database first, get the category IDs, and >> > then add a where clause to limit the product search. That could be an >> > option. Array data type in Postgres is something that I think other >> > RDBMSs do not have. Will that work? And how about attributes? Because >> > attributes are more than a single ID. I should store the attribute key= , >> > alongside its value. It's a key-value pair. What can I do for that? >> >> You seem to be going out of the way to make your life more complicated. >> >> The only way you are going to find an answer is set up test cases and >> experiment. My bet is a single server with a single database and >> multiple schemas is where you end up, after all that is where you are >> starting from. >> >> >> > >> > Thank you for sharing your time. I really appreciate it. >> > Saeed >> > >> > >> > >> > >> > >> > On Wed, Mar 5, 2025 at 3:18=E2=80=AFPM Laurenz Albe > > > wrote: >> > >> > On Wed, 2025-03-05 at 14:18 +0330, me nefcanto wrote: >> > > That means a solid monolith database. We lose many goodies with >> that. >> > > As a real-world example, right now we can import a single >> database >> > > from the production to the development to test and troubleshoot >> data. >> > >> > Well, can't you import a single schema then? >> > >> > > What if we host all databases on the same server and use FDW. >> What >> > > happens in that case? Does it return 100 thousand records and >> join >> > > in the memory? >> > >> > It will do just the same thing. The performance could be better >> > because of the reduced latency. >> > >> > > Because in SQL Server, when you perform a cross-database query >> > > (not cross-server) the performance is extremely good, proving >> that >> > > it does not return 100 thousand ItemId from >> Taxonomy.ItemCategories >> > > to join with ProductId. >> > > >> > > Is that the same case with Postgres too, If databases are locat= ed >> > > on one server? >> > >> > No, you cannot perform cross-database queries without a foreign >> > data wrapper. I don't see a reason why the statement shouldn't >> > perform as well as in SQL Server if you use schemas instead of >> > databases. >> > >> > Yours, >> > Laurenz Albe >> > >> >> -- >> Adrian Klaver >> adrian.klaver@aklaver.com >> >> --000000000000aa50ba062fa3bfda Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi,

On Wed, Mar 5, 2025, 8:44=E2= =80=AFPM me nefcanto <sn.1361@gmail= .com> wrote:
I on= ce worked with a monolithic SQL Server database with more than 10 billion r= ecords and about 8 Terabytes=C2=A0of data. A single backup took us more tha= n 21 days. It was a nightmare. Almost everybody knows that scaling up has a= ceiling, but scaling out has no boundaries.
=

But then you did the ba= ckup incrementally correct?

That should not take the same amount of time...
=

Therefore I will never choose a monolithic database design unless it's= a small project. But my examples are just examples. We predict 100 million= records per year. So we have to design accordingly. And it's not just = sales records. Many applications have requirements that are cheap data but = vast in multitude. Consider a language-learning app that wants to store the= known words of any learner. 10 thousand learners each knowing 2 thousand w= ords means 20 million records. Convert that to 100 thousand learners each k= nowing 7 thousand words and now you almost have a billion records. Cheap, b= ut necessary. Let's not dive into telemetry or time-series data.
<= /div>

Can you try and see if 1 server with 3 different databases will do?
<= div dir=3D"auto">
Having 1 table per database pe= r server is too ugly.

Al= so please understand - every databae is different. And so it works and oper= ates differently. What work good in one may not work good in another...

Thank you.


We initially chose t= o break the database into smaller databases, because it seemed natural for = our modularized monolith architecture. And it worked great for SQL Server. = If you're small, we host them all on one server. If you get bigger, we = can put heavy databases on separate machines.

However, I don't have = experience working with other types of database scaling. I have used table = partitioning, but I have never used sharding.

Anyway, that's why I a= sked you guys. However, encouraging me to go back to monolith without givin= g solutions on how to scale, is not helping. To be honest, I'm somehow = disappointed by how the most advanced open source database does not support= cross-database querying just like how SQL Server does. But if it doesn'= ;t, it doesn't. Our team should either drop it as a choice or find a wa= y (by asking the experts who built it or use it) how to design based on its= features. That's why I'm asking.

One thing that comes to my min= d, is to use custom types. Instead of storing data in ItemCategories and It= emAttributes, store them as arrays in the relevant tables in the same datab= ase. But then it seems to me that in this case, Mongo would become a better= choice because I lose the relational nature and normalization somehow. Wha= t drawbacks have you experienced in that sense?

Regards
Saeed

On We= d, Mar 5, 2025 at 7:38=E2=80=AFPM Adrian Klaver <adrian.klaver@ak= laver.com> wrote:
On 3/5/25 04:15, me nefcanto wrote:
> Dear Laurenz, the point is that I think if we put all databases into o= ne
> database, then we have blocked our growth in the future.

How?

> A monolith database can be scaled only vertically. We have had huge > headaches in the past with SQL Server on Windows and a single database= .
> But when you divide bounded contexts into different databases, then yo= u
> have the chance to deploy each database on a separate physical machine= .
> That means a lot in terms of performance. Please correct me if I am wr= ong.

And you add the complexity of talking across machines, as well as
maintaining separate machines.

>
> Let's put this physical restriction on ourselves that we have diff= erent
> databases. What options do we have? One option that comes to my mind, = is
> to store the ID of the categories in the Products table. This means th= at
> I don't need FDW anymore. And databases can be on separate machine= s. I
> first query the categories database first, get the category IDs, and <= br> > then add a where clause to limit the product search. That could be an =
> option. Array data type in Postgres is something that I think other > RDBMSs do not have. Will that work? And how about attributes? Because =
> attributes are more than a single ID. I should store the attribute key= ,
> alongside its value. It's a key-value pair. What can I do for that= ?

You seem to be going out of the way to make your life more complicated.

The only way you are going to find an answer is set up test cases and
experiment. My bet is a single server with a single database and
multiple schemas is where you end up, after all that is where you are
starting from.


>
> Thank you for sharing your time. I really appreciate it.
> Saeed
>
>
>
>
>
> On Wed, Mar 5, 2025 at 3:18=E2=80=AFPM Laurenz Albe <laurenz.= albe@cybertec.at
> <mailto:laurenz.albe@cybertec.at>> wrote:
>
>=C2=A0 =C2=A0 =C2=A0On Wed, 2025-03-05 at 14:18 +0330, me nefcanto wrot= e:
>=C2=A0 =C2=A0 =C2=A0 > That means a solid monolith database. We lose= many goodies with that.
>=C2=A0 =C2=A0 =C2=A0 > As a real-world example, right now we can imp= ort a single database
>=C2=A0 =C2=A0 =C2=A0 > from the production to the development to tes= t and troubleshoot data.
>
>=C2=A0 =C2=A0 =C2=A0Well, can't you import a single schema then? >
>=C2=A0 =C2=A0 =C2=A0 > What if we host all databases on the same ser= ver and use FDW. What
>=C2=A0 =C2=A0 =C2=A0 > happens in that case? Does it return 100 thou= sand records and join
>=C2=A0 =C2=A0 =C2=A0 > in the memory?
>
>=C2=A0 =C2=A0 =C2=A0It will do just the same thing.=C2=A0 The performan= ce could be better
>=C2=A0 =C2=A0 =C2=A0because of the reduced latency.
>
>=C2=A0 =C2=A0 =C2=A0 > Because in SQL Server, when you perform a cro= ss-database query
>=C2=A0 =C2=A0 =C2=A0 > (not cross-server) the performance is extreme= ly good, proving that
>=C2=A0 =C2=A0 =C2=A0 > it does not return 100 thousand ItemId from T= axonomy.ItemCategories
>=C2=A0 =C2=A0 =C2=A0 > to join with ProductId.
>=C2=A0 =C2=A0 =C2=A0 >
>=C2=A0 =C2=A0 =C2=A0 > Is that the same case with Postgres too, If d= atabases are located
>=C2=A0 =C2=A0 =C2=A0 > on one server?
>
>=C2=A0 =C2=A0 =C2=A0No, you cannot perform cross-database queries witho= ut a foreign
>=C2=A0 =C2=A0 =C2=A0data wrapper.=C2=A0 I don't see a reason why th= e statement shouldn't
>=C2=A0 =C2=A0 =C2=A0perform as well as in SQL Server if you use schemas= instead of
>=C2=A0 =C2=A0 =C2=A0databases.
>
>=C2=A0 =C2=A0 =C2=A0Yours,
>=C2=A0 =C2=A0 =C2=A0Laurenz Albe
>

--
Adrian Klaver
adrian.klaver@aklaver.com

--000000000000aa50ba062fa3bfda--