Re: Quesion about querying distributed databases

public inbox for [email protected]  
help / color / mirror / Atom feed

From: me nefcanto <[email protected]>
To: Adrian Klaver <[email protected]>
Cc: [email protected]
Subject: Re: Quesion about querying distributed databases
Date: Wed, 5 Mar 2025 10:12:13 +0330
Message-ID: <CAEHBEOD969YrbPH_z9OEmThWx3-w4sMMaHLhZLOQwqCwE8Y58Q@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <CAEHBEOBuoMFWuhHM3L_Zr6o1enELju-Vns6Pknt4TT+6MFQOwQ@mail.gmail.com>
	<[email protected]>

Adrian Klaver, thank you for the link. I asked the AI to create a query for
me using FDW.

This is the sample query:

with filtered_products as (
    select p.product_id
    from products.product p
    where p.title ilike '%search_term%'
), category_filtered as (
    select ic.product_id
    from taxonomy.item_categories ic
    where ic.category_id = any(array['category_id_1', 'category_id_2'])
), attribute_filtered as (
    select ia.product_id
    from attributes.item_attributes ia
    where ia.attribute_id = any(array['attribute_id_1', 'attribute_id_2'])
), final_products as (
    select f.product_id
    from filtered_products f
    join category_filtered c on f.product_id = c.product_id
    join attribute_filtered a on f.product_id = a.product_id
    order by f.product_id -- replace with relevant sorting column
    limit 50 offset 0
)
select p.*
from products.product p
join final_products fp on p.product_id = fp.product_id;

The problem here is that it collects all of the product_id values from the
ItemCategories table. Let's say each product is put in one category only.
This means that we have 100 thousand records in the ItemCategories table.
Thus, to show a list of 20 products on the website, this query first
fetches 100 thousand product_id values from the remote server.

That's not scalable. Is there a workaround for this?

Thank you
Saeed

On Wed, Mar 5, 2025 at 8:12 AM Adrian Klaver <[email protected]>
wrote:

> On 3/4/25 20:40, me nefcanto wrote:
> > Hello
> >
> > Consider this scenario:
> >
> >   * 3 servers, 3 databases, each on a separate server:
> >       o *Products database*: Contains the *Products* table (with over
> >         100,000 records).
> >       o *Taxonomy database*: Contains the *Categories* and
> >         *ItemCategories (EAV)* tables.
> >       o *Attributes database*: Contains the *Attributes* and
> >         *ItemAttributes (EAV)* tables.
> >
> > How do you find products based on the following criteria?
>
> https://www.postgresql.org/docs/current/postgres-fdw.html
>
> >
> >  1. A search in the title (e.g., "awesome shirts").
> >  2. Selected categories (e.g., "casual" and "sports").
> >  3. Selected attributes (e.g., "color: blue" and "size: large")
> >
> >
> > Regards
> > Saeed
>
> --
> Adrian Klaver
> [email protected]
>
>

view thread (2+ messages)

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: Quesion about querying distributed databases
  In-Reply-To: <CAEHBEOD969YrbPH_z9OEmThWx3-w4sMMaHLhZLOQwqCwE8Y58Q@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox