Re: Quesion about querying distributed databases

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Laurenz Albe <[email protected]>
To: me nefcanto <[email protected]>
To: Adrian Klaver <[email protected]>
Cc: [email protected]
Subject: Re: Quesion about querying distributed databases
Date: Thu, 06 Mar 2025 09:04:31 +0100
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAEHBEOCpxASoNn=u21kaqOn1A-4YPy_mVfgkEjT3wRT5G4ycbg@mail.gmail.com>
References: <CAEHBEOBuoMFWuhHM3L_Zr6o1enELju-Vns6Pknt4TT+6MFQOwQ@mail.gmail.com>
	<[email protected]>
	<CAEHBEOD969YrbPH_z9OEmThWx3-w4sMMaHLhZLOQwqCwE8Y58Q@mail.gmail.com>
	<[email protected]>
	<CAEHBEOBXzkGTqxQSYqmEFN5hbc=zsGWFpU9h8zf7AAPv4VdOWQ@mail.gmail.com>
	<[email protected]>
	<CAEHBEOBNoG8RkKuCcQQWkbYppMLMzA0MXq+s0kZ6wKWgD7+45Q@mail.gmail.com>
	<[email protected]>
	<CAEHBEODw8svX557pjB_EL-Os7KWtwi-9Uq=RuCkRKgHVZWw8Bw@mail.gmail.com>
	<[email protected]>
	<CAEHBEOCpxASoNn=u21kaqOn1A-4YPy_mVfgkEjT3wRT5G4ycbg@mail.gmail.com>

On Thu, 2025-03-06 at 06:13 +0330, me nefcanto wrote:
> I once worked with a monolithic SQL Server database with more than 10 billion
> records and about 8 Terabytes of data. A single backup took us more than 21 days.
> It was a nightmare. Almost everybody knows that scaling up has a ceiling, but
> scaling out has no boundaries.

I hear you, and I agree with that.

> We initially chose to break the database into smaller databases, because it
> seemed natural for our modularized monolith architecture. And it worked great
> for SQL Server. If you're small, we host them all on one server. If you get
> bigger, we can put heavy databases on separate machines.

So you mean that you had those databases on different servers?
How would a cross-database query work in that case?  It must be something
akin to foreign data in PostgreSQL.

If that worked well, then it should also work well with PostgreSQL and
foreign data wrappers.  Look at the execution plan you got on SQL Server
and see where PostgreSQL chooses a different plan.  Then try to improve that.
We can try to help if we see actual plans.

> However, I don't have experience working with other types of database
> scaling. I have used table partitioning, but I have never used sharding.

Well, if you split the data into several databases, that *was* sharding.

> Anyway, that's why I asked you guys. However, encouraging me to go back to
> monolith without giving solutions on how to scale, is not helping. To be
> honest, I'm somehow disappointed by how the most advanced open source
> database does not support cross-database querying just like how SQL Server
> does. But if it doesn't, it doesn't. Our team should either drop it as a
> choice or find a way (by asking the experts who built it or use it) how
> to design based on its features. That's why I'm asking.

Excluding options from the start is limiting yourself.  Consider using
other, better databases than PostgreSQL (if you can find them).

It is difficult to come up with a concrete design based on the information
you provided.  Perhaps you should get a consultant; the mailing list does
not seem to be the right format for that request.

Typically, you split the data in a ways that they have few interconnections,
for example per customer, so that you don't regularly end up joining data
from different databases (shards).

> One thing that comes to my mind, is to use custom types. Instead of storing
> data in ItemCategories and ItemAttributes, store them as arrays in the
> relevant tables in the same database.

Don't ever store arrays in the database.  It will be a nightmare.
You seem to be drawn to questionable data design...

Yours,
Laurenz Albe

view thread (19+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Quesion about querying distributed databases
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox