Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1nt8xc-00077u-Ms for pgsql-sql@arkaria.postgresql.org; Mon, 23 May 2022 14:23:04 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.92) (envelope-from ) id 1nt8xb-0001IY-D6 for pgsql-sql@arkaria.postgresql.org; Mon, 23 May 2022 14:23:03 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1nt8xb-0001IM-2y for pgsql-sql@lists.postgresql.org; Mon, 23 May 2022 14:23:03 +0000 Received: from mail-ej1-x62b.google.com ([2a00:1450:4864:20::62b]) by magus.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.92) (envelope-from ) id 1nt8xY-00060i-GX for pgsql-sql@lists.postgresql.org; Mon, 23 May 2022 14:23:02 +0000 Received: by mail-ej1-x62b.google.com with SMTP id f9so29249076ejc.0 for ; Mon, 23 May 2022 07:23:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=UUR2A5XbjxowWsbu37gc3vyG9CS22VM560cfR9FqozE=; b=or2axOA1WRtHKRITfFGWucJsBTJB8MYi9j70IjKVZ1lPeWLQt0FnNcgV4GK0xZerkJ niz+nAKzNuzzluSBZOncCtd1dSImXkczu5FrdtfOyI44IVz0d+k+taB7Yaxr3iSbLLtn t24zsJr/nSxxjHnQBBmAmnnlZ3p/fEEm8fzH36zUL+KAbHcOoUwcdVE98y6q4+j4tE0O 2qgvQQUex21qLy9SJITYSdWHMD89FVUFOyE0vs9/WWbLv7HS394kT90gYP4y4/wDmdS5 yYZoWKU8zbjxA61F89TtawPQw+XbEFSsPHAcp/6V/AE/JJz1fvNQPuNCO6v8UqP7GSxA jRKQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=UUR2A5XbjxowWsbu37gc3vyG9CS22VM560cfR9FqozE=; b=Sv7rQFnA8j1/hMOjSXJlbIMNPQq6borLo/zRiyX7OAVjsK323+EnLSRqKro8cY/SBG zkEPBvNxoOy71S3QilcfXOx3C4SY1L4C2DCMz/Wotwnux9Yqhybchg11w9lhAwsWzynd 2onmN2ZientjeXhLwA6urEVysjSJeJ7YjQz4HkqhmMUOxv0fMs1aJA8lx6oeYmxSfU19 k3cSJGhKdB5aujpaDvyTe/GcjCuIqXeSws4X6aQlz8vPIUqXuSsdLCGFGk6si4jBm8cw g52UCYfJ924t+xyLr8MPBIGXH8jRMvITXTY36zXGSmSfQvhy+sKylmR92Uy61pLHjNQ/ KY0g== X-Gm-Message-State: AOAM530pb3n4NJeyatDlDpn9yBpTQd0NEw9b5K/IZdkCaqdNb08K04yG t04/6tTVCRZ4N64n1dNRgr4e4Z66ltg/mE2F2d1rjLLB X-Google-Smtp-Source: ABdhPJzGffQmunZe1DXr31RzCryfeIP0fRWIqh9tn6U8MmW0ZjxEcJ1E9m0FcBQv5uECteZceHBVhujq6yr9Lwy2Rqs= X-Received: by 2002:a17:906:4fc3:b0:6e0:66a5:d59c with SMTP id i3-20020a1709064fc300b006e066a5d59cmr19614613ejw.131.1653315779966; Mon, 23 May 2022 07:22:59 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: kimaidou Date: Mon, 23 May 2022 16:22:48 +0200 Message-ID: Subject: Re: Count child objects for each line of a table: LEFT JOIN, LATERAL JOIN or subqueries ? To: Frank Streitzig Cc: pgsql-sql@lists.postgresql.org Content-Type: multipart/alternative; boundary="000000000000c439bb05dfae935a" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000c439bb05dfae935a Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable By the way, I was in fact aware of the duplicate count for the "nb_schools" and other fields, this is why I used a count(DISTINCT ) to have a correct count in the first example. I kept the nb_schools and 2 other fields to illustrate the cost of using DISTINCT in the aggregate functions. Le lun. 23 mai 2022 =C3=A0 16:20, kimaidou a =C3=A9cri= t : > Hi Frank, > > Thanks for your answer ! > > It seems it would perform better to aggregate as soon as possible, like > you illustrated in your example. > I will rewrite the query with "WITH" clauses to improve readability. > > Thanks also for the Coalesce idea. It is better to see 0 instead of NULL. > > Micha=C3=ABl > > Le lun. 23 mai 2022 =C3=A0 16:15, kimaidou a =C3=A9c= rit : > >> So you >> >> Le lun. 23 mai 2022 =C3=A0 15:14, Frank Streitzig a >> =C3=A9crit : >> >>> Am Mon, May 23, 2022 at 01:55:07PM +0200 schrieb kimaidou: >>> > Hi list, >>> > >>> > I have a basic need, often encountered in spatial analysis: I have a >>> list >>> > of cities, parks, childcare centres, schools. I need to count the >>> number of >>> > items for each city (0 if no item exists for this city) >>> > >>> > I have tested 3 different SQL queries to achieve this goal: >>> > >>> > * one with several LEFT JOINS: http://sqlfiddle.com/#!17/fe902/3 >>> > * one with sub-queries: http://sqlfiddle.com/#!17/fe902/4 >>> > * one with several LATERAL JOINS: http://sqlfiddle.com/#!17/fe902/6 >>> >>> Hello, >>> >>> Cost of queries see link "View Execution Plan" in fiddle >>> >>> query 1: 134.62 >>> query 2: 8522.32 >>> query 3: 134.62 >>> >>> query 1 and 3 have wrong count in result (columns nb_school, >>> nb_childcare, nb_park) >>> >>> My try has cost of 81.83 >>> >>> select c.* >>> , coalesce(s.cnt,0) as cnt_school >>> , s.schools >>> , coalesce(cc.cnt,0) as cnt_childcare >>> , cc.childcares >>> , coalesce(p.cnt,0) as cnt_park >>> , p.parks >>> from city c >>> left outer join >>> (select fk_id_city, count(*) as cnt >>> ,string_agg(name, ', ') AS schools >>> from school >>> group by fk_id_city) s >>> on s.fk_id_city =3D c.id >>> left outer join >>> (select fk_id_city, count(*) as cnt >>> ,string_agg(name, ', ') AS childcares >>> from childcare >>> group by fk_id_city) cc >>> on cc.fk_id_city =3D c.id >>> left outer join >>> (select fk_id_city, count(*) as cnt >>> ,string_agg(name, ', ') AS parks >>> from park >>> group by fk_id_city) p >>> on p.fk_id_city =3D c.id >>> order by c.id >>> ; >>> >>> IMHO, but without a where clause, the cost will increase with the amoun= t >>> of data. >>> >>> Regards, >>> Frank >>> >>> --000000000000c439bb05dfae935a Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
By the way, I was in fact aware of the duplicate count for= the "nb_schools" and other fields, this is why I used a count(DI= STINCT ) to have a correct count in the first example. I kept the nb_school= s and 2 other fields to illustrate the cost of using DISTINCT in the aggreg= ate functions.

Le=C2=A0lun. 23 mai 2022 =C3=A0=C2=A016:20, kimaidou <= kimaidou@gmail.com> a =C3=A9cr= it=C2=A0:
Hi Frank,

Thanks for your answer !=

It seems it would perform better to aggregate as = soon as possible, like you illustrated in your example.
I will re= write the query with "WITH" clauses to improve readability.
=

Thanks also for the Coalesce idea. It is better to see = 0 instead of NULL.

Micha=C3=ABl
Le=C2=A0l= un. 23 mai 2022 =C3=A0=C2=A016:15, kimaidou <kimaidou@gmail.com> a =C3=A9crit=C2=A0:=
So you

Le=C2=A0lun. 23 mai 2022 =C3=A0=C2=A015:14, Frank Streitzig <= ;fstreitzig@gmx.net= > a =C3=A9crit=C2=A0:
Am Mon, May 23, 2022 at 01:55:07PM +0200 schrieb kimaidou:
> Hi list,
>
> I have a basic need, often encountered in spatial analysis: I have a l= ist
> of cities, parks, childcare centres, schools. I need to count the numb= er of
> items for each city (0 if no item exists for this city)
>
> I have tested 3 different SQL queries to achieve this goal:
>
> * one with several LEFT JOINS: http://sqlfiddle.com/#!17/fe902= /3
> * one with sub-queries: http://sqlfiddle.com/#!17/fe902/4<= br> > * one with several LATERAL JOINS: http://sqlfiddle.com/#!17/fe= 902/6

Hello,

Cost of queries see link "View Execution Plan" in fiddle

query 1:=C2=A0 134.62
query 2: 8522.32
query 3:=C2=A0 134.62

query 1 and 3 have wrong count in result (columns nb_school,
nb_childcare, nb_park)

My try has cost of 81.83

select=C2=A0 c.*
=C2=A0 =C2=A0 =C2=A0 =C2=A0 , coalesce(s.cnt,0) as cnt_school
=C2=A0 =C2=A0 =C2=A0 =C2=A0 , s.schools
=C2=A0 =C2=A0 =C2=A0 =C2=A0 , coalesce(cc.cnt,0) as cnt_childcare
=C2=A0 =C2=A0 =C2=A0 =C2=A0 , cc.childcares
=C2=A0 =C2=A0 =C2=A0 =C2=A0 , coalesce(p.cnt,0) as cnt_park
=C2=A0 =C2=A0 =C2=A0 =C2=A0 , p.parks
=C2=A0 from city c
=C2=A0 =C2=A0 left outer join
=C2=A0 =C2=A0 =C2=A0 =C2=A0(select fk_id_city, count(*) as cnt
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0,string_agg(name, &#= 39;, ') AS schools
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0from school
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0group by fk_id_city) s
=C2=A0 =C2=A0 =C2=A0 on s.fk_id_city =3D c.id
=C2=A0 =C2=A0 left outer join
=C2=A0 =C2=A0 =C2=A0 (select fk_id_city, count(*) as cnt
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0,string_agg(name, &#= 39;, ') AS childcares
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 from childcare
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0group by fk_id_city) cc
=C2=A0 =C2=A0 =C2=A0 on cc.fk_id_city =3D c.id
=C2=A0 =C2=A0 left outer join
=C2=A0 =C2=A0 =C2=A0 (select fk_id_city, count(*) as cnt
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0,string_agg(name, &#= 39;, ') AS parks
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0from park
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0group by fk_id_city) p
=C2=A0 =C2=A0 =C2=A0 on p.fk_id_city =3D c.id
=C2=A0 order by c.id
;

IMHO, but without a where clause, the cost will increase with the amount of data.

Regards,
Frank

--000000000000c439bb05dfae935a--