MIME-Version: 1.0
References: <CAEHBEOBuoMFWuhHM3L_Zr6o1enELju-Vns6Pknt4TT+6MFQOwQ@mail.gmail.com>
 <fd47b28c-8f6b-4bfb-a393-160c5c3de8c0@aklaver.com> <CAEHBEOD969YrbPH_z9OEmThWx3-w4sMMaHLhZLOQwqCwE8Y58Q@mail.gmail.com>
 <caf4e99941b3c83bb9eab91e33b144b826b68f79.camel@cybertec.at>
 <CAEHBEOBXzkGTqxQSYqmEFN5hbc=zsGWFpU9h8zf7AAPv4VdOWQ@mail.gmail.com>
 <c9699a6fd331d33864fc269060d8d961f784d827.camel@cybertec.at>
 <CAEHBEOBNoG8RkKuCcQQWkbYppMLMzA0MXq+s0kZ6wKWgD7+45Q@mail.gmail.com>
 <099b49ebae94e23f19afdad3f8c9c6e702a3a2d5.camel@cybertec.at>
 <CAEHBEODw8svX557pjB_EL-Os7KWtwi-9Uq=RuCkRKgHVZWw8Bw@mail.gmail.com>
 <6d7e1022-6404-4dab-8467-8d1f6e8b63cb@aklaver.com> <CAEHBEOCpxASoNn=u21kaqOn1A-4YPy_mVfgkEjT3wRT5G4ycbg@mail.gmail.com>
 <e9769f673c78bbeeabd82eb8b3054cee4fbd662f.camel@cybertec.at>
In-Reply-To: <e9769f673c78bbeeabd82eb8b3054cee4fbd662f.camel@cybertec.at>
From: me nefcanto <sn.1361@gmail.com>
Date: Thu, 6 Mar 2025 12:15:55 +0330
Message-ID: <CAEHBEODn+-SF5nTArF9_mf968UAKbs0ZPbDpzJK=ZU2vs2DKNA@mail.gmail.com>
Subject: Re: Quesion about querying distributed databases
To: Laurenz Albe <laurenz.albe@cybertec.at>
Cc: Adrian Klaver <adrian.klaver@aklaver.com>, pgsql-general@lists.postgresql.org
Content-Type: multipart/alternative; boundary="00000000000071c20d062fa88920"
Archived-At: <https://www.postgresql.org/message-id/CAEHBEODn%2B-SF5nTArF9_mf968UAKbs0ZPbDpzJK%3DZU2vs2DKNA%40mail.gmail.com>
Precedence: bulk

--00000000000071c20d062fa88920
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Dear Laurenz

> I hear you, and I agree with that.

Thank you. Such a relief.

> If that worked well, then it should also work well with PostgreSQL and
> foreign data wrappers.

You're right. We had problems with cross-server queries on SQL Server and
MariaDB too. It seems that no engine has solved cross-server queries. But
we had no problem with cross-database queries on the same server; those
worked well on both SQL Server and MariaDB. For cross-database queries,
Postgres seems to fetch the entire result set from the other database and
then perform the joins locally. It also seems to make no difference to
Postgres whether the foreign database is on the same machine or on another
one. I say this only from reading the queries and asking questions about
them; I have not run a test yet.
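
A test for this could start from something like the sketch below. It is
only a sketch: the server, database, user, and table names are made up,
and it just prints SQL to run by hand in psql. To my understanding,
EXPLAIN (VERBOSE) prints a "Remote SQL" line for each foreign scan, so
comparing that line with the original query should show how much of the
work is sent to the other database.

```python
# Sketch only: builds SQL for a manual postgres_fdw test; nothing is executed.
# All object names (customers_srv, customers_db, app, orders, customers)
# are hypothetical.

SETUP_SQL = """\
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
CREATE SERVER customers_srv FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'localhost', dbname 'customers_db');
CREATE USER MAPPING FOR CURRENT_USER SERVER customers_srv
    OPTIONS (user 'app', password 'secret');
CREATE FOREIGN TABLE customers (id bigint, name text)
    SERVER customers_srv OPTIONS (table_name 'customers');
"""

# A join between a local table (orders) and the foreign table (customers).
TEST_QUERY = """\
SELECT o.id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.id = 42
"""


def explain_query(query: str) -> str:
    """Wrap a query in EXPLAIN (VERBOSE) so the plan lists each Remote SQL."""
    return f"EXPLAIN (VERBOSE) {query.strip().rstrip(';')};"


if __name__ == "__main__":
    print(SETUP_SQL)
    print(explain_query(TEST_QUERY))
```

Running the same EXPLAIN once against a foreign server on localhost and
once against a remote host would show whether the plan differs at all.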

> Well, if you split the data into several databases, that *was* sharding.

The way I understood it, sharding is when you split the database by rows,
not by tables. Examples usually choose a column such as Tenant, User, or
Date as the sharding key. I have never seen an example that stores Orders
in one database and Customers in another and calls it sharding. I am not
sure of the right term, but we might call it distributed databases.
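
The distinction can be sketched as two routing rules. This is only an
illustration: the shard and database names and the modulo rule are made
up, and real systems often use hashing or range maps instead.

```python
# Sketch only: all names below are hypothetical.

# Split by rows: every shard has the same tables, and a row's key
# (here the tenant id) decides which shard holds it.
SHARDS = ["shard_0", "shard_1", "shard_2"]


def shard_for_tenant(tenant_id: int) -> str:
    """Pick the shard that stores all rows of one tenant."""
    return SHARDS[tenant_id % len(SHARDS)]


# Split by tables: each table lives entirely in one database,
# regardless of which tenant a row belongs to.
TABLE_HOME = {"Orders": "orders_db", "Customers": "customers_db"}


def database_for_table(table: str) -> str:
    """Pick the database that stores the whole table."""
    return TABLE_HOME[table]


if __name__ == "__main__":
    print(shard_for_tenant(4))
    print(database_for_table("Orders"))
```

With the first rule, a join between Orders and Customers of one tenant
stays inside one shard. With the second rule, the same join always
crosses databases.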

> Consider using other, better databases than PostgreSQL (if you can find
> them).

That's the point here. If we can't come up with a good design on Postgres,
we will stick with MariaDB. That's why we're researching and testing. As I
mentioned earlier, Postgres is amazing in some respects but lacks some
simple things that other engines offer out of the box.

> Perhaps you should get a consultant; the mailing list does not seem to be
> the right format for that request.

We have done that over the last decade, first for SQL Server and then for
MariaDB. We came up with some very practical and useful designs:
separating CLOBs from main tables, storing only a UUID as the file name to
match the cloud storage, storing date-times as UTC, using bigint
everywhere (even for small tables) for consistency, denormalizing enum
storage (storing text instead of the numeric value) even in large tables,
and so on.

But we do have enough literacy and experience to choose a technology. All
we need are answers to some simple questions. If I learn that FDW works
differently for same-server databases, then I know that we will migrate.

> Don't ever store arrays in the database.  It will be a nightmare.

This is a very interesting claim. May I ask you to share the problems you
have seen and your experience with it?


On Thu, Mar 6, 2025 at 11:34 AM Laurenz Albe <laurenz.albe@cybertec.at>
wrote:

> On Thu, 2025-03-06 at 06:13 +0330, me nefcanto wrote:
> > I once worked with a monolithic SQL Server database with more than
> > 10 billion records and about 8 Terabytes of data. A single backup
> > took us more than 21 days. It was a nightmare. Almost everybody
> > knows that scaling up has a ceiling, but scaling out has no
> > boundaries.
>
> I hear you, and I agree with that.
>
>
> > We initially chose to break the database into smaller databases,
> > because it seemed natural for our modularized monolith architecture.
> > And it worked great for SQL Server. If you're small, we host them
> > all on one server. If you get bigger, we can put heavy databases on
> > separate machines.
>
> So you mean that you had those databases on different servers?
> How would a cross-database query work in that case?  It must be something
> akin to foreign data in PostgreSQL.
>
> If that worked well, then it should also work well with PostgreSQL and
> foreign data wrappers.  Look at the execution plan you got on SQL Server
> and see where PostgreSQL chooses a different plan.  Then try to improve
> that.
> We can try to help if we see actual plans.
>
> > However, I don't have experience working with other types of
> > database scaling. I have used table partitioning, but I have never
> > used sharding.
>
> Well, if you split the data into several databases, that *was* sharding.
>
> > Anyway, that's why I asked you guys. However, encouraging me to go
> > back to monolith without giving solutions on how to scale, is not
> > helping. To be honest, I'm somehow disappointed by how the most
> > advanced open source database does not support cross-database
> > querying just like how SQL Server does. But if it doesn't, it
> > doesn't. Our team should either drop it as a choice or find a way
> > (by asking the experts who built it or use it) how to design based
> > on its features. That's why I'm asking.
>
> Excluding options from the start is limiting yourself.  Consider using
> other, better databases than PostgreSQL (if you can find them).
>
> It is difficult to come up with a concrete design based on the
> information you provided.  Perhaps you should get a consultant; the
> mailing list does not seem to be the right format for that request.
>
> Typically, you split the data in a ways that they have few
> interconnections, for example per customer, so that you don't
> regularly end up joining data from different databases (shards).
>
> > One thing that comes to my mind, is to use custom types. Instead of
> > storing data in ItemCategories and ItemAttributes, store them as
> > arrays in the relevant tables in the same database.
>
> Don't ever store arrays in the database.  It will be a nightmare.
> You seem to be drawn to questionable data design...
>
> Yours,
> Laurenz Albe
>

--00000000000071c20d062fa88920--