Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1nt97o-0007p7-Tn for pgsql-sql@arkaria.postgresql.org; Mon, 23 May 2022 14:33:37 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.92) (envelope-from ) id 1nt97m-0007yF-ML for pgsql-sql@arkaria.postgresql.org; Mon, 23 May 2022 14:33:34 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1nt97m-0007wa-7a for pgsql-sql@lists.postgresql.org; Mon, 23 May 2022 14:33:34 +0000 Received: from mail-ej1-x636.google.com ([2a00:1450:4864:20::636]) by makus.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.92) (envelope-from ) id 1nt97h-0002cY-H5 for pgsql-sql@lists.postgresql.org; Mon, 23 May 2022 14:33:33 +0000 Received: by mail-ej1-x636.google.com with SMTP id jx22so15791582ejb.12 for ; Mon, 23 May 2022 07:33:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=TG7e6KoYB6jkePG+fKzrdPKiAy2pa+Rv9otdLMPNKp8=; b=ZsDobcVT1O8c9GTHMMKxHiqLXMlXhQtBQNDeHBUFyW88I8BjzcMsr+bTAgk3GcgkYG aGlDVxrNVScCjlTtvBM06y9zpOoEKCp2pM+oNZgO3HGHeDWoNQIs/4TESqa1qh5SH6m7 cDhm9AObt2K1Zu5/DU/B+IfKH3cEJjQuQ0X5mh0Vnt7XzVtIC/BoclGcFAM2l3tl6b5F sm45o4sehJtRxVe48HsDs3Lqaj0x6jlEPSFY/2CwCRXoMBk+urgj7AnMGi5OnPwrKrIg bN3xhZlLt0Cp2g8Acft6GvtWbae6aOj0mYBB8T7nl0BKjBdcopbbk7tgzabVMDQfvorT Iuhw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=TG7e6KoYB6jkePG+fKzrdPKiAy2pa+Rv9otdLMPNKp8=; b=wrLxZE5rCUANqTh58LSeKuwBNFhKXOSjacyeoNm/Y/9w4BTeMY3400m52Wjrjh/rwi xeT4oiqn1Mj/U6D6APQH1+baAnu+wEF38WJxtjIQmlx/uZxvQ4buAB9U6wVjaeiIRcAZ wE3KyVn5l88U4z5JtuSjuYPkE0Hx48qWXRDDQq87kX7Pq7Kq0hEAwXQJxxYF0qCOpnTi VqyRCFW9lCnL14NdMNOx1iJPG9kFWfuvBVxuC3ROUOswGSwX6Ij7l5ggAlYeQaHx/Wzs dTrw5bUl3q2djhw2ojUoPHzNJmPDk111p33haaYsGPGXDrbHpua2AVCBJcXqJm68QUT0 TaPg== X-Gm-Message-State: AOAM531PjdjohODChneRnm0gC2Ku2vDyEXqFIYkc/39EP5XwWi9Fs4V3 03jCeHwRbd2KMlpK57LuOJiXfw7XRgoPaSTZigY= X-Google-Smtp-Source: ABdhPJyM5x4WPtFZXq4B+dMUzqR3mO7XRvMXpzlfGJPKDyIY66QJot639YhblAZhRqDFAKfafIJjbUP9eHkMsxIaZuE= X-Received: by 2002:a17:907:94c4:b0:6f9:f69f:2fd5 with SMTP id dn4-20020a17090794c400b006f9f69f2fd5mr20258650ejc.347.1653316407716; Mon, 23 May 2022 07:33:27 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: kimaidou Date: Mon, 23 May 2022 16:33:16 +0200 Message-ID: Subject: Re: Count child objects for each line of a table: LEFT JOIN, LATERAL JOIN or subqueries ? To: Frank Streitzig Cc: pgsql-sql@lists.postgresql.org Content-Type: multipart/alternative; boundary="0000000000002f09dc05dfaeb97b" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000002f09dc05dfaeb97b Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Here is the 4th SQL fiddle with your proposal organized with "WITH" clauses http://sqlfiddle.com/#!17/fe902/31/0 Le lun. 23 mai 2022 =C3=A0 16:22, kimaidou a =C3=A9cri= t : > By the way, I was in fact aware of the duplicate count for the > "nb_schools" and other fields, this is why I used a count(DISTINCT ) to > have a correct count in the first example. I kept the nb_schools and 2 > other fields to illustrate the cost of using DISTINCT in the aggregate > functions. > > Le lun. 23 mai 2022 =C3=A0 16:20, kimaidou a =C3=A9c= rit : > >> Hi Frank, >> >> Thanks for your answer ! >> >> It seems it would perform better to aggregate as soon as possible, like >> you illustrated in your example. >> I will rewrite the query with "WITH" clauses to improve readability. >> >> Thanks also for the Coalesce idea. It is better to see 0 instead of NULL= . >> >> Micha=C3=ABl >> >> Le lun. 23 mai 2022 =C3=A0 16:15, kimaidou a =C3=A9= crit : >> >>> So you >>> >>> Le lun. 23 mai 2022 =C3=A0 15:14, Frank Streitzig = a >>> =C3=A9crit : >>> >>>> Am Mon, May 23, 2022 at 01:55:07PM +0200 schrieb kimaidou: >>>> > Hi list, >>>> > >>>> > I have a basic need, often encountered in spatial analysis: I have a >>>> list >>>> > of cities, parks, childcare centres, schools. I need to count the >>>> number of >>>> > items for each city (0 if no item exists for this city) >>>> > >>>> > I have tested 3 different SQL queries to achieve this goal: >>>> > >>>> > * one with several LEFT JOINS: http://sqlfiddle.com/#!17/fe902/3 >>>> > * one with sub-queries: http://sqlfiddle.com/#!17/fe902/4 >>>> > * one with several LATERAL JOINS: http://sqlfiddle.com/#!17/fe902/6 >>>> >>>> Hello, >>>> >>>> Cost of queries see link "View Execution Plan" in fiddle >>>> >>>> query 1: 134.62 >>>> query 2: 8522.32 >>>> query 3: 134.62 >>>> >>>> query 1 and 3 have wrong count in result (columns nb_school, >>>> nb_childcare, nb_park) >>>> >>>> My try has cost of 81.83 >>>> >>>> select c.* >>>> , coalesce(s.cnt,0) as cnt_school >>>> , s.schools >>>> , coalesce(cc.cnt,0) as cnt_childcare >>>> , cc.childcares >>>> , coalesce(p.cnt,0) as cnt_park >>>> , p.parks >>>> from city c >>>> left outer join >>>> (select fk_id_city, count(*) as cnt >>>> ,string_agg(name, ', ') AS schools >>>> from school >>>> group by fk_id_city) s >>>> on s.fk_id_city =3D c.id >>>> left outer join >>>> (select fk_id_city, count(*) as cnt >>>> ,string_agg(name, ', ') AS childcares >>>> from childcare >>>> group by fk_id_city) cc >>>> on cc.fk_id_city =3D c.id >>>> left outer join >>>> (select fk_id_city, count(*) as cnt >>>> ,string_agg(name, ', ') AS parks >>>> from park >>>> group by fk_id_city) p >>>> on p.fk_id_city =3D c.id >>>> order by c.id >>>> ; >>>> >>>> IMHO, but without a where clause, the cost will increase with the amou= nt >>>> of data. >>>> >>>> Regards, >>>> Frank >>>> >>>> --0000000000002f09dc05dfaeb97b Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Here is the 4th SQL fiddle with your proposal organiz= ed with "WITH" clauses

Le=C2=A0lun.= 23 mai 2022 =C3=A0=C2=A016:22, kimaidou <kimaidou@gmail.com> a =C3=A9crit=C2=A0:
By the way, I was i= n fact aware of the duplicate count for the "nb_schools" and othe= r fields, this is why I used a count(DISTINCT ) to have a correct count in = the first example. I kept the nb_schools and 2 other fields to illustrate t= he cost of using DISTINCT in the aggregate functions.

Le=C2=A0lun. 23 ma= i 2022 =C3=A0=C2=A016:20, kimaidou <kimaidou@gmail.com> a =C3=A9crit=C2=A0:
Hi= Frank,

Thanks for your answer !

It seems it would perform better to aggregate as soon as possible, = like you illustrated in your example.
I will rewrite the query wi= th "WITH" clauses to improve readability.

Thanks also for the Coalesce idea. It is better to see 0 instead of NULL.=

Micha=C3=ABl

Le=C2=A0lun. 23 mai 2022 = =C3=A0=C2=A016:15, kimaidou <kimaidou@gmail.com> a =C3=A9crit=C2=A0:
So you

Le= =C2=A0lun. 23 mai 2022 =C3=A0=C2=A015:14, Frank Streitzig <fstreitzig@gmx.net> a =C3= =A9crit=C2=A0:
A= m Mon, May 23, 2022 at 01:55:07PM +0200 schrieb kimaidou:
> Hi list,
>
> I have a basic need, often encountered in spatial analysis: I have a l= ist
> of cities, parks, childcare centres, schools. I need to count the numb= er of
> items for each city (0 if no item exists for this city)
>
> I have tested 3 different SQL queries to achieve this goal:
>
> * one with several LEFT JOINS: http://sqlfiddle.com/#!17/fe902= /3
> * one with sub-queries: http://sqlfiddle.com/#!17/fe902/4<= br> > * one with several LATERAL JOINS: http://sqlfiddle.com/#!17/fe= 902/6

Hello,

Cost of queries see link "View Execution Plan" in fiddle

query 1:=C2=A0 134.62
query 2: 8522.32
query 3:=C2=A0 134.62

query 1 and 3 have wrong count in result (columns nb_school,
nb_childcare, nb_park)

My try has cost of 81.83

select=C2=A0 c.*
=C2=A0 =C2=A0 =C2=A0 =C2=A0 , coalesce(s.cnt,0) as cnt_school
=C2=A0 =C2=A0 =C2=A0 =C2=A0 , s.schools
=C2=A0 =C2=A0 =C2=A0 =C2=A0 , coalesce(cc.cnt,0) as cnt_childcare
=C2=A0 =C2=A0 =C2=A0 =C2=A0 , cc.childcares
=C2=A0 =C2=A0 =C2=A0 =C2=A0 , coalesce(p.cnt,0) as cnt_park
=C2=A0 =C2=A0 =C2=A0 =C2=A0 , p.parks
=C2=A0 from city c
=C2=A0 =C2=A0 left outer join
=C2=A0 =C2=A0 =C2=A0 =C2=A0(select fk_id_city, count(*) as cnt
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0,string_agg(name, &#= 39;, ') AS schools
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0from school
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0group by fk_id_city) s
=C2=A0 =C2=A0 =C2=A0 on s.fk_id_city =3D c.id
=C2=A0 =C2=A0 left outer join
=C2=A0 =C2=A0 =C2=A0 (select fk_id_city, count(*) as cnt
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0,string_agg(name, &#= 39;, ') AS childcares
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 from childcare
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0group by fk_id_city) cc
=C2=A0 =C2=A0 =C2=A0 on cc.fk_id_city =3D c.id
=C2=A0 =C2=A0 left outer join
=C2=A0 =C2=A0 =C2=A0 (select fk_id_city, count(*) as cnt
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0,string_agg(name, &#= 39;, ') AS parks
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0from park
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0group by fk_id_city) p
=C2=A0 =C2=A0 =C2=A0 on p.fk_id_city =3D c.id
=C2=A0 order by c.id
;

IMHO, but without a where clause, the cost will increase with the amount of data.

Regards,
Frank

--0000000000002f09dc05dfaeb97b--