public inbox for [email protected]  
help / color / mirror / Atom feed
Count child objects for each line of a table: LEFT JOIN, LATERAL JOIN or subqueries ?
7+ messages / 2 participants
[nested] [flat]

* Count child objects for each line of a table: LEFT JOIN, LATERAL JOIN or subqueries ?
@ 2022-05-23 11:55  kimaidou <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: kimaidou @ 2022-05-23 11:55 UTC (permalink / raw)
  To: [email protected]

Hi list,

I have a basic need, often encountered in spatial analysis: I have a list
of cities, parks, childcare centres, schools. I need to count the number of
items for each city (0 if no item exists for this city)

I have tested 3 different SQL queries to achieve this goal:

* one with several LEFT JOINS: http://sqlfiddle.com/#!17/fe902/3
* one with sub-queries: http://sqlfiddle.com/#!17/fe902/4
* one with several LATERAL JOINS: http://sqlfiddle.com/#!17/fe902/6

I love the first one, with LEFT JOINS, because it is concise, seems simple,
and allows querying easily more than one aggregated field from the child
items. But we need to use the DISTINCT clause inside the aggregation
functions (count, string_agg), in order to count each child only once.

The one with the subqueries seems the more common way (I have seen it a
lot) but I find it cumbersome, and I guess it won't scale well for bigger
datasets.

The one with the LATERAL joins seems overcomplicated, but I probably missed
some easier way to use the lateral join. I do not know well how the LATERAL
differs from the subqueries...

I would like to have your opinion on this scenario. What is the best query
for this use case, considering the fact that it should perform well on
heavier datasets (1 million cities and thousands of children).

NB: I used SQL Fiddle to help everyone see the data and SQL queries. Not
sure if everyone can modify it or not. Please try to keep the 3 example
unchanged. It seems SQL Fiddle has not been update since at least 2018, so
PostgreSQL version is 9.6.

Regards
Michaël


^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: Count child objects for each line of a table: LEFT JOIN, LATERAL JOIN or subqueries ?
@ 2022-05-23 13:14  Frank Streitzig <[email protected]>
  parent: kimaidou <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: Frank Streitzig @ 2022-05-23 13:14 UTC (permalink / raw)
  To: kimaidou <[email protected]>; +Cc: [email protected]

Am Mon, May 23, 2022 at 01:55:07PM +0200 schrieb kimaidou:
> Hi list,
>
> I have a basic need, often encountered in spatial analysis: I have a list
> of cities, parks, childcare centres, schools. I need to count the number of
> items for each city (0 if no item exists for this city)
>
> I have tested 3 different SQL queries to achieve this goal:
>
> * one with several LEFT JOINS: http://sqlfiddle.com/#!17/fe902/3
> * one with sub-queries: http://sqlfiddle.com/#!17/fe902/4
> * one with several LATERAL JOINS: http://sqlfiddle.com/#!17/fe902/6

Hello,

Cost of queries see link "View Execution Plan" in fiddle

query 1:  134.62
query 2: 8522.32
query 3:  134.62

query 1 and 3 have wrong count in result (columns nb_school,
nb_childcare, nb_park)

My try has cost of 81.83

select  c.*
        , coalesce(s.cnt,0) as cnt_school
        , s.schools
        , coalesce(cc.cnt,0) as cnt_childcare
        , cc.childcares
        , coalesce(p.cnt,0) as cnt_park
        , p.parks
  from city c
    left outer join
       (select fk_id_city, count(*) as cnt
               ,string_agg(name, ', ') AS schools
           from school
           group by fk_id_city) s
      on s.fk_id_city = c.id
    left outer join
      (select fk_id_city, count(*) as cnt
               ,string_agg(name, ', ') AS childcares
            from childcare
           group by fk_id_city) cc
      on cc.fk_id_city = c.id
    left outer join
      (select fk_id_city, count(*) as cnt
               ,string_agg(name, ', ') AS parks
         from park
         group by fk_id_city) p
      on p.fk_id_city = c.id
  order by c.id
;

IMHO, but without a where clause, the cost will increase with the amount
of data.

Regards,
Frank






^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: Count child objects for each line of a table: LEFT JOIN, LATERAL JOIN or subqueries ?
@ 2022-05-23 14:15  kimaidou <[email protected]>
  parent: Frank Streitzig <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: kimaidou @ 2022-05-23 14:15 UTC (permalink / raw)
  To: Frank Streitzig <[email protected]>; +Cc: [email protected]

So you

Le lun. 23 mai 2022 à 15:14, Frank Streitzig <[email protected]> a écrit :

> Am Mon, May 23, 2022 at 01:55:07PM +0200 schrieb kimaidou:
> > Hi list,
> >
> > I have a basic need, often encountered in spatial analysis: I have a list
> > of cities, parks, childcare centres, schools. I need to count the number
> of
> > items for each city (0 if no item exists for this city)
> >
> > I have tested 3 different SQL queries to achieve this goal:
> >
> > * one with several LEFT JOINS: http://sqlfiddle.com/#!17/fe902/3
> > * one with sub-queries: http://sqlfiddle.com/#!17/fe902/4
> > * one with several LATERAL JOINS: http://sqlfiddle.com/#!17/fe902/6
>
> Hello,
>
> Cost of queries see link "View Execution Plan" in fiddle
>
> query 1:  134.62
> query 2: 8522.32
> query 3:  134.62
>
> query 1 and 3 have wrong count in result (columns nb_school,
> nb_childcare, nb_park)
>
> My try has cost of 81.83
>
> select  c.*
>         , coalesce(s.cnt,0) as cnt_school
>         , s.schools
>         , coalesce(cc.cnt,0) as cnt_childcare
>         , cc.childcares
>         , coalesce(p.cnt,0) as cnt_park
>         , p.parks
>   from city c
>     left outer join
>        (select fk_id_city, count(*) as cnt
>                ,string_agg(name, ', ') AS schools
>            from school
>            group by fk_id_city) s
>       on s.fk_id_city = c.id
>     left outer join
>       (select fk_id_city, count(*) as cnt
>                ,string_agg(name, ', ') AS childcares
>             from childcare
>            group by fk_id_city) cc
>       on cc.fk_id_city = c.id
>     left outer join
>       (select fk_id_city, count(*) as cnt
>                ,string_agg(name, ', ') AS parks
>          from park
>          group by fk_id_city) p
>       on p.fk_id_city = c.id
>   order by c.id
> ;
>
> IMHO, but without a where clause, the cost will increase with the amount
> of data.
>
> Regards,
> Frank
>
>


^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: Count child objects for each line of a table: LEFT JOIN, LATERAL JOIN or subqueries ?
@ 2022-05-23 14:20  kimaidou <[email protected]>
  parent: kimaidou <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: kimaidou @ 2022-05-23 14:20 UTC (permalink / raw)
  To: Frank Streitzig <[email protected]>; +Cc: [email protected]

Hi Frank,

Thanks for your answer !

It seems it would perform better to aggregate as soon as possible, like you
illustrated in your example.
I will rewrite the query with "WITH" clauses to improve readability.

Thanks also for the Coalesce idea. It is better to see 0 instead of NULL.

Michaël

Le lun. 23 mai 2022 à 16:15, kimaidou <[email protected]> a écrit :

> So you
>
> Le lun. 23 mai 2022 à 15:14, Frank Streitzig <[email protected]> a
> écrit :
>
>> Am Mon, May 23, 2022 at 01:55:07PM +0200 schrieb kimaidou:
>> > Hi list,
>> >
>> > I have a basic need, often encountered in spatial analysis: I have a
>> list
>> > of cities, parks, childcare centres, schools. I need to count the
>> number of
>> > items for each city (0 if no item exists for this city)
>> >
>> > I have tested 3 different SQL queries to achieve this goal:
>> >
>> > * one with several LEFT JOINS: http://sqlfiddle.com/#!17/fe902/3
>> > * one with sub-queries: http://sqlfiddle.com/#!17/fe902/4
>> > * one with several LATERAL JOINS: http://sqlfiddle.com/#!17/fe902/6
>>
>> Hello,
>>
>> Cost of queries see link "View Execution Plan" in fiddle
>>
>> query 1:  134.62
>> query 2: 8522.32
>> query 3:  134.62
>>
>> query 1 and 3 have wrong count in result (columns nb_school,
>> nb_childcare, nb_park)
>>
>> My try has cost of 81.83
>>
>> select  c.*
>>         , coalesce(s.cnt,0) as cnt_school
>>         , s.schools
>>         , coalesce(cc.cnt,0) as cnt_childcare
>>         , cc.childcares
>>         , coalesce(p.cnt,0) as cnt_park
>>         , p.parks
>>   from city c
>>     left outer join
>>        (select fk_id_city, count(*) as cnt
>>                ,string_agg(name, ', ') AS schools
>>            from school
>>            group by fk_id_city) s
>>       on s.fk_id_city = c.id
>>     left outer join
>>       (select fk_id_city, count(*) as cnt
>>                ,string_agg(name, ', ') AS childcares
>>             from childcare
>>            group by fk_id_city) cc
>>       on cc.fk_id_city = c.id
>>     left outer join
>>       (select fk_id_city, count(*) as cnt
>>                ,string_agg(name, ', ') AS parks
>>          from park
>>          group by fk_id_city) p
>>       on p.fk_id_city = c.id
>>   order by c.id
>> ;
>>
>> IMHO, but without a where clause, the cost will increase with the amount
>> of data.
>>
>> Regards,
>> Frank
>>
>>


^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: Count child objects for each line of a table: LEFT JOIN, LATERAL JOIN or subqueries ?
@ 2022-05-23 14:22  kimaidou <[email protected]>
  parent: kimaidou <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: kimaidou @ 2022-05-23 14:22 UTC (permalink / raw)
  To: Frank Streitzig <[email protected]>; +Cc: [email protected]

By the way, I was in fact aware of the duplicate count for the "nb_schools"
and other fields, this is why I used a count(DISTINCT ) to have a correct
count in the first example. I kept the nb_schools and 2 other fields to
illustrate the cost of using DISTINCT in the aggregate functions.

Le lun. 23 mai 2022 à 16:20, kimaidou <[email protected]> a écrit :

> Hi Frank,
>
> Thanks for your answer !
>
> It seems it would perform better to aggregate as soon as possible, like
> you illustrated in your example.
> I will rewrite the query with "WITH" clauses to improve readability.
>
> Thanks also for the Coalesce idea. It is better to see 0 instead of NULL.
>
> Michaël
>
> Le lun. 23 mai 2022 à 16:15, kimaidou <[email protected]> a écrit :
>
>> So you
>>
>> Le lun. 23 mai 2022 à 15:14, Frank Streitzig <[email protected]> a
>> écrit :
>>
>>> Am Mon, May 23, 2022 at 01:55:07PM +0200 schrieb kimaidou:
>>> > Hi list,
>>> >
>>> > I have a basic need, often encountered in spatial analysis: I have a
>>> list
>>> > of cities, parks, childcare centres, schools. I need to count the
>>> number of
>>> > items for each city (0 if no item exists for this city)
>>> >
>>> > I have tested 3 different SQL queries to achieve this goal:
>>> >
>>> > * one with several LEFT JOINS: http://sqlfiddle.com/#!17/fe902/3
>>> > * one with sub-queries: http://sqlfiddle.com/#!17/fe902/4
>>> > * one with several LATERAL JOINS: http://sqlfiddle.com/#!17/fe902/6
>>>
>>> Hello,
>>>
>>> Cost of queries see link "View Execution Plan" in fiddle
>>>
>>> query 1:  134.62
>>> query 2: 8522.32
>>> query 3:  134.62
>>>
>>> query 1 and 3 have wrong count in result (columns nb_school,
>>> nb_childcare, nb_park)
>>>
>>> My try has cost of 81.83
>>>
>>> select  c.*
>>>         , coalesce(s.cnt,0) as cnt_school
>>>         , s.schools
>>>         , coalesce(cc.cnt,0) as cnt_childcare
>>>         , cc.childcares
>>>         , coalesce(p.cnt,0) as cnt_park
>>>         , p.parks
>>>   from city c
>>>     left outer join
>>>        (select fk_id_city, count(*) as cnt
>>>                ,string_agg(name, ', ') AS schools
>>>            from school
>>>            group by fk_id_city) s
>>>       on s.fk_id_city = c.id
>>>     left outer join
>>>       (select fk_id_city, count(*) as cnt
>>>                ,string_agg(name, ', ') AS childcares
>>>             from childcare
>>>            group by fk_id_city) cc
>>>       on cc.fk_id_city = c.id
>>>     left outer join
>>>       (select fk_id_city, count(*) as cnt
>>>                ,string_agg(name, ', ') AS parks
>>>          from park
>>>          group by fk_id_city) p
>>>       on p.fk_id_city = c.id
>>>   order by c.id
>>> ;
>>>
>>> IMHO, but without a where clause, the cost will increase with the amount
>>> of data.
>>>
>>> Regards,
>>> Frank
>>>
>>>


^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: Count child objects for each line of a table: LEFT JOIN, LATERAL JOIN or subqueries ?
@ 2022-05-23 14:33  kimaidou <[email protected]>
  parent: kimaidou <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: kimaidou @ 2022-05-23 14:33 UTC (permalink / raw)
  To: Frank Streitzig <[email protected]>; +Cc: [email protected]

Here is the 4th SQL fiddle with your proposal organized with "WITH" clauses
http://sqlfiddle.com/#!17/fe902/31/0

Le lun. 23 mai 2022 à 16:22, kimaidou <[email protected]> a écrit :

> By the way, I was in fact aware of the duplicate count for the
> "nb_schools" and other fields, this is why I used a count(DISTINCT ) to
> have a correct count in the first example. I kept the nb_schools and 2
> other fields to illustrate the cost of using DISTINCT in the aggregate
> functions.
>
> Le lun. 23 mai 2022 à 16:20, kimaidou <[email protected]> a écrit :
>
>> Hi Frank,
>>
>> Thanks for your answer !
>>
>> It seems it would perform better to aggregate as soon as possible, like
>> you illustrated in your example.
>> I will rewrite the query with "WITH" clauses to improve readability.
>>
>> Thanks also for the Coalesce idea. It is better to see 0 instead of NULL.
>>
>> Michaël
>>
>> Le lun. 23 mai 2022 à 16:15, kimaidou <[email protected]> a écrit :
>>
>>> So you
>>>
>>> Le lun. 23 mai 2022 à 15:14, Frank Streitzig <[email protected]> a
>>> écrit :
>>>
>>>> Am Mon, May 23, 2022 at 01:55:07PM +0200 schrieb kimaidou:
>>>> > Hi list,
>>>> >
>>>> > I have a basic need, often encountered in spatial analysis: I have a
>>>> list
>>>> > of cities, parks, childcare centres, schools. I need to count the
>>>> number of
>>>> > items for each city (0 if no item exists for this city)
>>>> >
>>>> > I have tested 3 different SQL queries to achieve this goal:
>>>> >
>>>> > * one with several LEFT JOINS: http://sqlfiddle.com/#!17/fe902/3
>>>> > * one with sub-queries: http://sqlfiddle.com/#!17/fe902/4
>>>> > * one with several LATERAL JOINS: http://sqlfiddle.com/#!17/fe902/6
>>>>
>>>> Hello,
>>>>
>>>> Cost of queries see link "View Execution Plan" in fiddle
>>>>
>>>> query 1:  134.62
>>>> query 2: 8522.32
>>>> query 3:  134.62
>>>>
>>>> query 1 and 3 have wrong count in result (columns nb_school,
>>>> nb_childcare, nb_park)
>>>>
>>>> My try has cost of 81.83
>>>>
>>>> select  c.*
>>>>         , coalesce(s.cnt,0) as cnt_school
>>>>         , s.schools
>>>>         , coalesce(cc.cnt,0) as cnt_childcare
>>>>         , cc.childcares
>>>>         , coalesce(p.cnt,0) as cnt_park
>>>>         , p.parks
>>>>   from city c
>>>>     left outer join
>>>>        (select fk_id_city, count(*) as cnt
>>>>                ,string_agg(name, ', ') AS schools
>>>>            from school
>>>>            group by fk_id_city) s
>>>>       on s.fk_id_city = c.id
>>>>     left outer join
>>>>       (select fk_id_city, count(*) as cnt
>>>>                ,string_agg(name, ', ') AS childcares
>>>>             from childcare
>>>>            group by fk_id_city) cc
>>>>       on cc.fk_id_city = c.id
>>>>     left outer join
>>>>       (select fk_id_city, count(*) as cnt
>>>>                ,string_agg(name, ', ') AS parks
>>>>          from park
>>>>          group by fk_id_city) p
>>>>       on p.fk_id_city = c.id
>>>>   order by c.id
>>>> ;
>>>>
>>>> IMHO, but without a where clause, the cost will increase with the amount
>>>> of data.
>>>>
>>>> Regards,
>>>> Frank
>>>>
>>>>


^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: Count child objects for each line of a table: LEFT JOIN, LATERAL JOIN or subqueries ?
@ 2022-05-23 20:39  Frank Streitzig <[email protected]>
  parent: kimaidou <[email protected]>
  0 siblings, 0 replies; 7+ messages in thread

From: Frank Streitzig @ 2022-05-23 20:39 UTC (permalink / raw)
  To: kimaidou <[email protected]>; +Cc: [email protected]

Am Mon, May 23, 2022 at 04:33:16PM +0200 schrieb kimaidou:
> Here is the 4th SQL fiddle with your proposal organized with "WITH" clauses
> http://sqlfiddle.com/#!17/fe902/31/0

yes, you can do it like this.

Regards
Frank






^ permalink  raw  reply  [nested|flat] 7+ messages in thread


end of thread, other threads:[~2022-05-23 20:39 UTC | newest]

Thread overview: 7+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2022-05-23 11:55 Count child objects for each line of a table: LEFT JOIN, LATERAL JOIN or subqueries ? kimaidou <[email protected]>
2022-05-23 13:14 ` Frank Streitzig <[email protected]>
2022-05-23 14:15   ` kimaidou <[email protected]>
2022-05-23 14:20     ` kimaidou <[email protected]>
2022-05-23 14:22       ` kimaidou <[email protected]>
2022-05-23 14:33         ` kimaidou <[email protected]>
2022-05-23 20:39           ` Frank Streitzig <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox