public inbox for [email protected]help / color / mirror / Atom feed
Count child objects for each line of a table: LEFT JOIN, LATERAL JOIN or subqueries ? 7+ messages / 2 participants [nested] [flat]
* Count child objects for each line of a table: LEFT JOIN, LATERAL JOIN or subqueries ? @ 2022-05-23 11:55 kimaidou <[email protected]> 0 siblings, 1 reply; 7+ messages in thread From: kimaidou @ 2022-05-23 11:55 UTC (permalink / raw) To: [email protected] Hi list, I have a basic need, often encountered in spatial analysis: I have a list of cities, parks, childcare centres, schools. I need to count the number of items for each city (0 if no item exists for this city) I have tested 3 different SQL queries to achieve this goal: * one with several LEFT JOINS: http://sqlfiddle.com/#!17/fe902/3 * one with sub-queries: http://sqlfiddle.com/#!17/fe902/4 * one with several LATERAL JOINS: http://sqlfiddle.com/#!17/fe902/6 I love the first one, with LEFT JOINS, because it is concise, seems simple, and allows querying easily more than one aggregated field from the child items. But we need to use the DISTINCT clause inside the aggregation functions (count, string_agg), in order to count each child only once. The one with the subqueries seems the more common way (I have seen it a lot) but I find it cumbersome, and I guess it won't scale well for bigger datasets. The one with the LATERAL joins seems overcomplicated, but I probably missed some easier way to use the lateral join. I do not know well how the LATERAL differs from the subqueries... I would like to have your opinion on this scenario. What is the best query for this use case, considering the fact that it should perform well on heavier datasets (1 million cities and thousands of children). NB: I used SQL Fiddle to help everyone see the data and SQL queries. Not sure if everyone can modify it or not. Please try to keep the 3 example unchanged. It seems SQL Fiddle has not been update since at least 2018, so PostgreSQL version is 9.6. Regards Michaël ^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: Count child objects for each line of a table: LEFT JOIN, LATERAL JOIN or subqueries ? @ 2022-05-23 13:14 Frank Streitzig <[email protected]> parent: kimaidou <[email protected]> 0 siblings, 1 reply; 7+ messages in thread From: Frank Streitzig @ 2022-05-23 13:14 UTC (permalink / raw) To: kimaidou <[email protected]>; +Cc: [email protected] Am Mon, May 23, 2022 at 01:55:07PM +0200 schrieb kimaidou: > Hi list, > > I have a basic need, often encountered in spatial analysis: I have a list > of cities, parks, childcare centres, schools. I need to count the number of > items for each city (0 if no item exists for this city) > > I have tested 3 different SQL queries to achieve this goal: > > * one with several LEFT JOINS: http://sqlfiddle.com/#!17/fe902/3 > * one with sub-queries: http://sqlfiddle.com/#!17/fe902/4 > * one with several LATERAL JOINS: http://sqlfiddle.com/#!17/fe902/6 Hello, Cost of queries see link "View Execution Plan" in fiddle query 1: 134.62 query 2: 8522.32 query 3: 134.62 query 1 and 3 have wrong count in result (columns nb_school, nb_childcare, nb_park) My try has cost of 81.83 select c.* , coalesce(s.cnt,0) as cnt_school , s.schools , coalesce(cc.cnt,0) as cnt_childcare , cc.childcares , coalesce(p.cnt,0) as cnt_park , p.parks from city c left outer join (select fk_id_city, count(*) as cnt ,string_agg(name, ', ') AS schools from school group by fk_id_city) s on s.fk_id_city = c.id left outer join (select fk_id_city, count(*) as cnt ,string_agg(name, ', ') AS childcares from childcare group by fk_id_city) cc on cc.fk_id_city = c.id left outer join (select fk_id_city, count(*) as cnt ,string_agg(name, ', ') AS parks from park group by fk_id_city) p on p.fk_id_city = c.id order by c.id ; IMHO, but without a where clause, the cost will increase with the amount of data. Regards, Frank ^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: Count child objects for each line of a table: LEFT JOIN, LATERAL JOIN or subqueries ? @ 2022-05-23 14:15 kimaidou <[email protected]> parent: Frank Streitzig <[email protected]> 0 siblings, 1 reply; 7+ messages in thread From: kimaidou @ 2022-05-23 14:15 UTC (permalink / raw) To: Frank Streitzig <[email protected]>; +Cc: [email protected] So you Le lun. 23 mai 2022 à 15:14, Frank Streitzig <[email protected]> a écrit : > Am Mon, May 23, 2022 at 01:55:07PM +0200 schrieb kimaidou: > > Hi list, > > > > I have a basic need, often encountered in spatial analysis: I have a list > > of cities, parks, childcare centres, schools. I need to count the number > of > > items for each city (0 if no item exists for this city) > > > > I have tested 3 different SQL queries to achieve this goal: > > > > * one with several LEFT JOINS: http://sqlfiddle.com/#!17/fe902/3 > > * one with sub-queries: http://sqlfiddle.com/#!17/fe902/4 > > * one with several LATERAL JOINS: http://sqlfiddle.com/#!17/fe902/6 > > Hello, > > Cost of queries see link "View Execution Plan" in fiddle > > query 1: 134.62 > query 2: 8522.32 > query 3: 134.62 > > query 1 and 3 have wrong count in result (columns nb_school, > nb_childcare, nb_park) > > My try has cost of 81.83 > > select c.* > , coalesce(s.cnt,0) as cnt_school > , s.schools > , coalesce(cc.cnt,0) as cnt_childcare > , cc.childcares > , coalesce(p.cnt,0) as cnt_park > , p.parks > from city c > left outer join > (select fk_id_city, count(*) as cnt > ,string_agg(name, ', ') AS schools > from school > group by fk_id_city) s > on s.fk_id_city = c.id > left outer join > (select fk_id_city, count(*) as cnt > ,string_agg(name, ', ') AS childcares > from childcare > group by fk_id_city) cc > on cc.fk_id_city = c.id > left outer join > (select fk_id_city, count(*) as cnt > ,string_agg(name, ', ') AS parks > from park > group by fk_id_city) p > on p.fk_id_city = c.id > order by c.id > ; > > IMHO, but without a where clause, the cost will increase with the amount > of data. > > Regards, > Frank > > ^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: Count child objects for each line of a table: LEFT JOIN, LATERAL JOIN or subqueries ? @ 2022-05-23 14:20 kimaidou <[email protected]> parent: kimaidou <[email protected]> 0 siblings, 1 reply; 7+ messages in thread From: kimaidou @ 2022-05-23 14:20 UTC (permalink / raw) To: Frank Streitzig <[email protected]>; +Cc: [email protected] Hi Frank, Thanks for your answer ! It seems it would perform better to aggregate as soon as possible, like you illustrated in your example. I will rewrite the query with "WITH" clauses to improve readability. Thanks also for the Coalesce idea. It is better to see 0 instead of NULL. Michaël Le lun. 23 mai 2022 à 16:15, kimaidou <[email protected]> a écrit : > So you > > Le lun. 23 mai 2022 à 15:14, Frank Streitzig <[email protected]> a > écrit : > >> Am Mon, May 23, 2022 at 01:55:07PM +0200 schrieb kimaidou: >> > Hi list, >> > >> > I have a basic need, often encountered in spatial analysis: I have a >> list >> > of cities, parks, childcare centres, schools. I need to count the >> number of >> > items for each city (0 if no item exists for this city) >> > >> > I have tested 3 different SQL queries to achieve this goal: >> > >> > * one with several LEFT JOINS: http://sqlfiddle.com/#!17/fe902/3 >> > * one with sub-queries: http://sqlfiddle.com/#!17/fe902/4 >> > * one with several LATERAL JOINS: http://sqlfiddle.com/#!17/fe902/6 >> >> Hello, >> >> Cost of queries see link "View Execution Plan" in fiddle >> >> query 1: 134.62 >> query 2: 8522.32 >> query 3: 134.62 >> >> query 1 and 3 have wrong count in result (columns nb_school, >> nb_childcare, nb_park) >> >> My try has cost of 81.83 >> >> select c.* >> , coalesce(s.cnt,0) as cnt_school >> , s.schools >> , coalesce(cc.cnt,0) as cnt_childcare >> , cc.childcares >> , coalesce(p.cnt,0) as cnt_park >> , p.parks >> from city c >> left outer join >> (select fk_id_city, count(*) as cnt >> ,string_agg(name, ', ') AS schools >> from school >> group by fk_id_city) s >> on s.fk_id_city = c.id >> left outer join >> (select fk_id_city, count(*) as cnt >> ,string_agg(name, ', ') AS childcares >> from childcare >> group by fk_id_city) cc >> on cc.fk_id_city = c.id >> left outer join >> (select fk_id_city, count(*) as cnt >> ,string_agg(name, ', ') AS parks >> from park >> group by fk_id_city) p >> on p.fk_id_city = c.id >> order by c.id >> ; >> >> IMHO, but without a where clause, the cost will increase with the amount >> of data. >> >> Regards, >> Frank >> >> ^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: Count child objects for each line of a table: LEFT JOIN, LATERAL JOIN or subqueries ? @ 2022-05-23 14:22 kimaidou <[email protected]> parent: kimaidou <[email protected]> 0 siblings, 1 reply; 7+ messages in thread From: kimaidou @ 2022-05-23 14:22 UTC (permalink / raw) To: Frank Streitzig <[email protected]>; +Cc: [email protected] By the way, I was in fact aware of the duplicate count for the "nb_schools" and other fields, this is why I used a count(DISTINCT ) to have a correct count in the first example. I kept the nb_schools and 2 other fields to illustrate the cost of using DISTINCT in the aggregate functions. Le lun. 23 mai 2022 à 16:20, kimaidou <[email protected]> a écrit : > Hi Frank, > > Thanks for your answer ! > > It seems it would perform better to aggregate as soon as possible, like > you illustrated in your example. > I will rewrite the query with "WITH" clauses to improve readability. > > Thanks also for the Coalesce idea. It is better to see 0 instead of NULL. > > Michaël > > Le lun. 23 mai 2022 à 16:15, kimaidou <[email protected]> a écrit : > >> So you >> >> Le lun. 23 mai 2022 à 15:14, Frank Streitzig <[email protected]> a >> écrit : >> >>> Am Mon, May 23, 2022 at 01:55:07PM +0200 schrieb kimaidou: >>> > Hi list, >>> > >>> > I have a basic need, often encountered in spatial analysis: I have a >>> list >>> > of cities, parks, childcare centres, schools. I need to count the >>> number of >>> > items for each city (0 if no item exists for this city) >>> > >>> > I have tested 3 different SQL queries to achieve this goal: >>> > >>> > * one with several LEFT JOINS: http://sqlfiddle.com/#!17/fe902/3 >>> > * one with sub-queries: http://sqlfiddle.com/#!17/fe902/4 >>> > * one with several LATERAL JOINS: http://sqlfiddle.com/#!17/fe902/6 >>> >>> Hello, >>> >>> Cost of queries see link "View Execution Plan" in fiddle >>> >>> query 1: 134.62 >>> query 2: 8522.32 >>> query 3: 134.62 >>> >>> query 1 and 3 have wrong count in result (columns nb_school, >>> nb_childcare, nb_park) >>> >>> My try has cost of 81.83 >>> >>> select c.* >>> , coalesce(s.cnt,0) as cnt_school >>> , s.schools >>> , coalesce(cc.cnt,0) as cnt_childcare >>> , cc.childcares >>> , coalesce(p.cnt,0) as cnt_park >>> , p.parks >>> from city c >>> left outer join >>> (select fk_id_city, count(*) as cnt >>> ,string_agg(name, ', ') AS schools >>> from school >>> group by fk_id_city) s >>> on s.fk_id_city = c.id >>> left outer join >>> (select fk_id_city, count(*) as cnt >>> ,string_agg(name, ', ') AS childcares >>> from childcare >>> group by fk_id_city) cc >>> on cc.fk_id_city = c.id >>> left outer join >>> (select fk_id_city, count(*) as cnt >>> ,string_agg(name, ', ') AS parks >>> from park >>> group by fk_id_city) p >>> on p.fk_id_city = c.id >>> order by c.id >>> ; >>> >>> IMHO, but without a where clause, the cost will increase with the amount >>> of data. >>> >>> Regards, >>> Frank >>> >>> ^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: Count child objects for each line of a table: LEFT JOIN, LATERAL JOIN or subqueries ? @ 2022-05-23 14:33 kimaidou <[email protected]> parent: kimaidou <[email protected]> 0 siblings, 1 reply; 7+ messages in thread From: kimaidou @ 2022-05-23 14:33 UTC (permalink / raw) To: Frank Streitzig <[email protected]>; +Cc: [email protected] Here is the 4th SQL fiddle with your proposal organized with "WITH" clauses http://sqlfiddle.com/#!17/fe902/31/0 Le lun. 23 mai 2022 à 16:22, kimaidou <[email protected]> a écrit : > By the way, I was in fact aware of the duplicate count for the > "nb_schools" and other fields, this is why I used a count(DISTINCT ) to > have a correct count in the first example. I kept the nb_schools and 2 > other fields to illustrate the cost of using DISTINCT in the aggregate > functions. > > Le lun. 23 mai 2022 à 16:20, kimaidou <[email protected]> a écrit : > >> Hi Frank, >> >> Thanks for your answer ! >> >> It seems it would perform better to aggregate as soon as possible, like >> you illustrated in your example. >> I will rewrite the query with "WITH" clauses to improve readability. >> >> Thanks also for the Coalesce idea. It is better to see 0 instead of NULL. >> >> Michaël >> >> Le lun. 23 mai 2022 à 16:15, kimaidou <[email protected]> a écrit : >> >>> So you >>> >>> Le lun. 23 mai 2022 à 15:14, Frank Streitzig <[email protected]> a >>> écrit : >>> >>>> Am Mon, May 23, 2022 at 01:55:07PM +0200 schrieb kimaidou: >>>> > Hi list, >>>> > >>>> > I have a basic need, often encountered in spatial analysis: I have a >>>> list >>>> > of cities, parks, childcare centres, schools. I need to count the >>>> number of >>>> > items for each city (0 if no item exists for this city) >>>> > >>>> > I have tested 3 different SQL queries to achieve this goal: >>>> > >>>> > * one with several LEFT JOINS: http://sqlfiddle.com/#!17/fe902/3 >>>> > * one with sub-queries: http://sqlfiddle.com/#!17/fe902/4 >>>> > * one with several LATERAL JOINS: http://sqlfiddle.com/#!17/fe902/6 >>>> >>>> Hello, >>>> >>>> Cost of queries see link "View Execution Plan" in fiddle >>>> >>>> query 1: 134.62 >>>> query 2: 8522.32 >>>> query 3: 134.62 >>>> >>>> query 1 and 3 have wrong count in result (columns nb_school, >>>> nb_childcare, nb_park) >>>> >>>> My try has cost of 81.83 >>>> >>>> select c.* >>>> , coalesce(s.cnt,0) as cnt_school >>>> , s.schools >>>> , coalesce(cc.cnt,0) as cnt_childcare >>>> , cc.childcares >>>> , coalesce(p.cnt,0) as cnt_park >>>> , p.parks >>>> from city c >>>> left outer join >>>> (select fk_id_city, count(*) as cnt >>>> ,string_agg(name, ', ') AS schools >>>> from school >>>> group by fk_id_city) s >>>> on s.fk_id_city = c.id >>>> left outer join >>>> (select fk_id_city, count(*) as cnt >>>> ,string_agg(name, ', ') AS childcares >>>> from childcare >>>> group by fk_id_city) cc >>>> on cc.fk_id_city = c.id >>>> left outer join >>>> (select fk_id_city, count(*) as cnt >>>> ,string_agg(name, ', ') AS parks >>>> from park >>>> group by fk_id_city) p >>>> on p.fk_id_city = c.id >>>> order by c.id >>>> ; >>>> >>>> IMHO, but without a where clause, the cost will increase with the amount >>>> of data. >>>> >>>> Regards, >>>> Frank >>>> >>>> ^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: Count child objects for each line of a table: LEFT JOIN, LATERAL JOIN or subqueries ? @ 2022-05-23 20:39 Frank Streitzig <[email protected]> parent: kimaidou <[email protected]> 0 siblings, 0 replies; 7+ messages in thread From: Frank Streitzig @ 2022-05-23 20:39 UTC (permalink / raw) To: kimaidou <[email protected]>; +Cc: [email protected] Am Mon, May 23, 2022 at 04:33:16PM +0200 schrieb kimaidou: > Here is the 4th SQL fiddle with your proposal organized with "WITH" clauses > http://sqlfiddle.com/#!17/fe902/31/0 yes, you can do it like this. Regards Frank ^ permalink raw reply [nested|flat] 7+ messages in thread
end of thread, other threads:[~2022-05-23 20:39 UTC | newest] Thread overview: 7+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2022-05-23 11:55 Count child objects for each line of a table: LEFT JOIN, LATERAL JOIN or subqueries ? kimaidou <[email protected]> 2022-05-23 13:14 ` Frank Streitzig <[email protected]> 2022-05-23 14:15 ` kimaidou <[email protected]> 2022-05-23 14:20 ` kimaidou <[email protected]> 2022-05-23 14:22 ` kimaidou <[email protected]> 2022-05-23 14:33 ` kimaidou <[email protected]> 2022-05-23 20:39 ` Frank Streitzig <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox