MIME-Version: 1.0
References: 
 <CAMKXKO6Sam3WrzCb12yRyW+=OT_7K9zz3MfAQy4PNywuqhkGTQ@mail.gmail.com>
 <YouIuUVD9ivC03pj@frastr-dev>
 <CAMKXKO4xgawFHoVpz6hjeV-uGNf2W0NU90PkVBuCxJDLDUVmWg@mail.gmail.com>
 <CAMKXKO4UoPKACXYVWMxDAn-ZukRVD9S3TV52P93QOYHzC6AqbQ@mail.gmail.com>
 <CAMKXKO56Kc9Y32GEscw4F=mjkB9N3+aO8gVX1w9HmNC9T=1OrA@mail.gmail.com>
In-Reply-To: 
 <CAMKXKO56Kc9Y32GEscw4F=mjkB9N3+aO8gVX1w9HmNC9T=1OrA@mail.gmail.com>
From: kimaidou <kimaidou@gmail.com>
Date: Mon, 23 May 2022 16:33:16 +0200
Message-ID: 
 <CAMKXKO67oxVv9mBBEkve9YP7Urv53oFHyWY9qthrmEDCi-5HHA@mail.gmail.com>
Subject: Re: Count child objects for each line of a table: LEFT JOIN, LATERAL
 JOIN or subqueries ?
To: Frank Streitzig <fstreitzig@gmx.net>
Cc: pgsql-sql@lists.postgresql.org
Content-Type: multipart/alternative; boundary="0000000000002f09dc05dfaeb97b"
Archived-At: 
 <https://www.postgresql.org/message-id/CAMKXKO67oxVv9mBBEkve9YP7Urv53oFHyWY9qthrmEDCi-5HHA%40mail.gmail.com>
Precedence: bulk

--0000000000002f09dc05dfaeb97b
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Here is the 4th SQL fiddle, with your proposal reorganized using "WITH" clauses:
http://sqlfiddle.com/#!17/fe902/31/0
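
For readers who cannot open the fiddle, here is a minimal sketch of what such a rewrite might look like. It is not copied from the fiddle; it only restates the query quoted further down, with each per-table aggregate moved into a named CTE. Table and column names are the ones used in that quoted query.

```sql
-- Sketch only: the quoted proposal restated with "WITH" clauses.
-- Names (city, school, childcare, park, id, fk_id_city, name) come from
-- the quoted query; nothing here is taken from the fiddle itself.
WITH s AS (
    -- One row per city that has at least one school
    SELECT fk_id_city,
           count(*) AS cnt,
           string_agg(name, ', ') AS schools
    FROM school
    GROUP BY fk_id_city
),
cc AS (
    SELECT fk_id_city,
           count(*) AS cnt,
           string_agg(name, ', ') AS childcares
    FROM childcare
    GROUP BY fk_id_city
),
p AS (
    SELECT fk_id_city,
           count(*) AS cnt,
           string_agg(name, ', ') AS parks
    FROM park
    GROUP BY fk_id_city
)
SELECT c.*,
       coalesce(s.cnt, 0)  AS cnt_school,     -- 0 instead of NULL when no match
       s.schools,
       coalesce(cc.cnt, 0) AS cnt_childcare,
       cc.childcares,
       coalesce(p.cnt, 0)  AS cnt_park,
       p.parks
FROM city c
LEFT JOIN s  ON s.fk_id_city  = c.id
LEFT JOIN cc ON cc.fk_id_city = c.id
LEFT JOIN p  ON p.fk_id_city  = c.id
ORDER BY c.id;
```

Each CTE returns at most one row per city, so the joins cannot multiply rows and a plain count(*) is enough. Note that PostgreSQL 12 and later inline a side-effect-free CTE that is referenced only once, whereas older versions always materialize CTEs, so the execution plan may differ from the subquery form on older servers.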

On Mon, 23 May 2022 at 16:22, kimaidou <kimaidou@gmail.com> wrote:

> By the way, I was in fact aware of the duplicate counts in "nb_schools" and
> the other fields; that is why I used a count(DISTINCT ) to get a correct
> count in the first example. I kept nb_schools and the 2 other fields to
> illustrate the cost of using DISTINCT in the aggregate functions.
>
> On Mon, 23 May 2022 at 16:20, kimaidou <kimaidou@gmail.com> wrote:
>
>> Hi Frank,
>>
>> Thanks for your answer!
>>
>> It seems it would perform better to aggregate as early as possible, as
>> you illustrated in your example.
>> I will rewrite the query with "WITH" clauses to improve readability.
>>
>> Thanks also for the coalesce idea. It is better to see 0 instead of NULL.
>>
>> Michaël
>>
>> On Mon, 23 May 2022 at 16:15, kimaidou <kimaidou@gmail.com> wrote:
>>
>>> So you
>>>
>>> On Mon, 23 May 2022 at 15:14, Frank Streitzig <fstreitzig@gmx.net> wrote:
>>>
>>>> On Mon, May 23, 2022 at 01:55:07PM +0200, kimaidou wrote:
>>>> > Hi list,
>>>> >
>>>> > I have a basic need, often encountered in spatial analysis: I have a
>>>> > list of cities, parks, childcare centres, schools. I need to count the
>>>> > number of items for each city (0 if no item exists for this city).
>>>> >
>>>> > I have tested 3 different SQL queries to achieve this goal:
>>>> >
>>>> > * one with several LEFT JOINS: http://sqlfiddle.com/#!17/fe902/3
>>>> > * one with sub-queries: http://sqlfiddle.com/#!17/fe902/4
>>>> > * one with several LATERAL JOINS: http://sqlfiddle.com/#!17/fe902/6
>>>>
>>>> Hello,
>>>>
>>>> For the cost of each query, see the "View Execution Plan" link in the fiddle:
>>>>
>>>> query 1:  134.62
>>>> query 2: 8522.32
>>>> query 3:  134.62
>>>>
>>>> Queries 1 and 3 return wrong counts in the result (columns nb_school,
>>>> nb_childcare, nb_park).
>>>>
>>>> My attempt has a cost of 81.83:
>>>>
>>>> select  c.*
>>>>         , coalesce(s.cnt, 0) as cnt_school
>>>>         , s.schools
>>>>         , coalesce(cc.cnt, 0) as cnt_childcare
>>>>         , cc.childcares
>>>>         , coalesce(p.cnt, 0) as cnt_park
>>>>         , p.parks
>>>>   from city c
>>>>     left outer join
>>>>       (select fk_id_city, count(*) as cnt
>>>>               , string_agg(name, ', ') as schools
>>>>          from school
>>>>         group by fk_id_city) s
>>>>       on s.fk_id_city = c.id
>>>>     left outer join
>>>>       (select fk_id_city, count(*) as cnt
>>>>               , string_agg(name, ', ') as childcares
>>>>          from childcare
>>>>         group by fk_id_city) cc
>>>>       on cc.fk_id_city = c.id
>>>>     left outer join
>>>>       (select fk_id_city, count(*) as cnt
>>>>               , string_agg(name, ', ') as parks
>>>>          from park
>>>>         group by fk_id_city) p
>>>>       on p.fk_id_city = c.id
>>>>   order by c.id
>>>> ;
>>>>
>>>> IMHO, though, without a WHERE clause the cost will increase with the amount
>>>> of data.
>>>>
>>>> Regards,
>>>> Frank
>>>>
>>>>
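
The duplicate counts discussed in the thread above come from joining several child tables to the same city before aggregating: every school row is paired with every park row of that city. A small self-contained sketch of the effect follows. The sample rows and the column aliases school_rows, school_distinct, park_rows and park_distinct are made up for illustration; only the table and key names follow the thread.

```sql
-- Hypothetical sample data: city 1 has 2 schools and 3 parks, city 2 has none.
CREATE TEMP TABLE city   (id int PRIMARY KEY, name text);
CREATE TEMP TABLE school (id int PRIMARY KEY, name text, fk_id_city int);
CREATE TEMP TABLE park   (id int PRIMARY KEY, name text, fk_id_city int);

INSERT INTO city   VALUES (1, 'A'), (2, 'B');
INSERT INTO school VALUES (1, 's1', 1), (2, 's2', 1);
INSERT INTO park   VALUES (1, 'p1', 1), (2, 'p2', 1), (3, 'p3', 1);

-- Joining both child tables first yields 2 x 3 = 6 rows for city 1,
-- so a plain count() sees 6; count(DISTINCT ...) recovers the real numbers.
SELECT c.id,
       count(s.id)          AS school_rows,      -- inflated by the park join
       count(DISTINCT s.id) AS school_distinct,  -- actual number of schools
       count(p.id)          AS park_rows,        -- inflated by the school join
       count(DISTINCT p.id) AS park_distinct     -- actual number of parks
FROM city c
LEFT JOIN school s ON s.fk_id_city = c.id
LEFT JOIN park   p ON p.fk_id_city = c.id
GROUP BY c.id
ORDER BY c.id;

-- Result:
--  id | school_rows | school_distinct | park_rows | park_distinct
--   1 |           6 |               2 |         6 |             3
--   2 |           0 |               0 |         0 |             0
```

Aggregating each child table on its own before joining, as in the quoted proposal, avoids the row multiplication altogether, so no DISTINCT is needed.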


--0000000000002f09dc05dfaeb97b--