public inbox for [email protected]  
help / color / mirror / Atom feed
From: Jürgen Purtz <[email protected]>
To: David G. Johnston <[email protected]>
Cc: Peter Eisentraut <[email protected]>
Cc: Thomas Munro <[email protected]>
Cc: Thomas Kellerer <[email protected]>
Cc: Pg Docs <[email protected]>
Subject: Re: Change JOIN tutorial to focus more on explicit joins
Date: Thu, 22 Oct 2020 15:32:00 +0200
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAKFQuwZY5EqtMH_9SzTG=wSxq14m3FuwqPNDeeyK=Z6JO5-vUw@mail.gmail.com>
References: <[email protected]>
	<CA+hUKG+K+75fo+cKA2Vb_kmFsxX=9qgi48_dLqPyShfhFdSKMw@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<CAKFQuwZY5EqtMH_9SzTG=wSxq14m3FuwqPNDeeyK=Z6JO5-vUw@mail.gmail.com>

On 22.10.20 01:40, David G. Johnston wrote:
> On Fri, Sep 4, 2020 at 2:36 AM Jürgen Purtz <[email protected] 
> <mailto:[email protected]>> wrote:
>
>     On 04.09.20 08:52, Peter Eisentraut wrote:
>     >
>     > For the remaining patch I have a couple of concerns:
>
>
> This patch should not be changing the formatting choices for these 
> queries, just the addition of a JOIN clause and modification of the 
> WHERE clause.  Specifically, SELECT is left-aligned while all 
> subsequent clauses indent under it.  Forced alignment by adding extra 
> spaces isn't done here either.  I have not altered those in the attached.
>
> Did some word-smithing on the first paragraph.  The part about the 
> cross-join was hurt by "in some way" and "may be" is not needed.
>
> Pointing out that values from both tables doesn't seem like an 
> improvement when the second item covers that and it is more specific 
> in noting that the city name that is joined on appears twice - once 
> from each table.
>
> ON expression is more precise and the reader should be ok with the term.
>
> Removal of the exercise is good.  Not the time to discuss cross join 
> anyway.  Given that "ON true" works the cross join form isn't even 
> required.
>
> In the FROM clause form I would not add table prefixes to the column 
> names.  They are not part of the form changing.  If discussion about 
> table prefixing is desired it should be done explicitly and by 
> itself.  They are used later on, I didn't check to see whether that 
> was covered or might be confusing.
>
> I suggested a wording for why to use join syntax that doesn't involve 
> legacy and points out its merit compared to sticking a join expression 
> into the where clause.
>
> The original patch missed having the syntax for the first left outer 
> join conform to the multi-line query writing standard you introduced.  
> I did not change.
>
> The "AND" ON clause should just go with (not changed):
>
> ON (w1.temp_lo < w2.temp_lo
>     AND w1.temp_hi > w2.temp_high);
>
> Attaching my suggestions made on top of the attached original 
> 0002-query.patch
>
> David J.
>
(Hopefully) I have integrated all of David's suggestions as well as the 
following rules:

- Syntax formatting with the previously used 4 spaces plus newline for JOIN

- Table aliases only when necessary or explicitly discussed

The discussion about the explicit vs. implicit syntax is added to the 
"As join expressions serve a specific purpose ... " sentence and creates 
a paragraph of its own.

The patch is build on top of master.

--

J. Purtz




Attachments:

  [text/x-patch] 0003-query.patch (6.6K, 3-0003-query.patch)
  download | inline diff:
diff --git a/doc/src/sgml/query.sgml b/doc/src/sgml/query.sgml
index e73e805ec4..532427ab4e 100644
--- a/doc/src/sgml/query.sgml
+++ b/doc/src/sgml/query.sgml
@@ -440,13 +440,12 @@ SELECT DISTINCT city
 
    <para>
     Thus far, our queries have only accessed one table at a time.
-    Queries can access multiple tables at once, or access the same
-    table in such a way that multiple rows of the table are being
-    processed at the same time.  A query that accesses multiple rows
-    of the same or different tables at one time is called a
-    <firstterm>join</firstterm> query.  As an example, say you wish to
-    list all the weather records together with the location of the
-    associated city.  To do that, we need to compare the <structfield>city</structfield>
+    Queries which access multiple tables (including repeats) at once are called
+    <firstterm>join</firstterm> queries.  They internally combine
+    each row from one table with each row of a second table.  An expression is
+    specified to limit which pairs of rows are returned.
+    For example, to return all the weather records together with the location of the
+    associated city, the database compare the <structfield>city</structfield>
     column of each row of the <structname>weather</structname> table with the
     <structfield>name</structfield> column of all rows in the <structname>cities</structname>
     table, and select the pairs of rows where these values match.
@@ -461,10 +460,17 @@ SELECT DISTINCT city
 
 <programlisting>
 SELECT *
-    FROM weather, cities
-    WHERE city = name;
+    FROM weather
+    JOIN cities ON (city = name);
 </programlisting>
 
+    After the keyword <command>ON</command> follows the
+    expression comparing their rows. In this case, the
+    definition how to compare their rows. In this case, the
+    column <varname>city</varname> of table <varname>weather</varname>
+    must be equal to the column <varname>name</varname>
+    of table <varname>cities</varname>.
+
 <screen>
      city      | temp_lo | temp_hi | prcp |    date    |     name      | location
 ---------------+---------+---------+------+------------+---------------+-----------
@@ -499,23 +505,14 @@ SELECT *
        <literal>*</literal>:
 <programlisting>
 SELECT city, temp_lo, temp_hi, prcp, date, location
-    FROM weather, cities
-    WHERE city = name;
+    FROM weather
+    JOIN cities ON (city = name);
 </programlisting>
       </para>
      </listitem>
     </itemizedlist>
    </para>
 
-   <formalpara>
-    <title>Exercise:</title>
-
-    <para>
-     Attempt to determine the semantics of this query when the
-     <literal>WHERE</literal> clause is omitted.
-    </para>
-   </formalpara>
-
    <para>
     Since the columns all had different names, the parser
     automatically found which table they belong to.  If there
@@ -526,8 +523,8 @@ SELECT city, temp_lo, temp_hi, prcp, date, location
 <programlisting>
 SELECT weather.city, weather.temp_lo, weather.temp_hi,
        weather.prcp, weather.date, cities.location
-    FROM weather, cities
-    WHERE cities.name = weather.city;
+    FROM weather
+    JOIN cities ON (cities.name = weather.city);
 </programlisting>
 
     It is widely considered good style to qualify all column names
@@ -537,15 +534,29 @@ SELECT weather.city, weather.temp_lo, weather.temp_hi,
 
    <para>
     Join queries of the kind seen thus far can also be written in this
-    alternative form:
+    form:
 
 <programlisting>
 SELECT *
-    FROM weather INNER JOIN cities ON (weather.city = cities.name);
+    FROM weather, cities
+    WHERE city = name;
 </programlisting>
 
-    This syntax is not as commonly used as the one above, but we show
-    it here to help you understand the following topics.
+    This syntax pre-dates the <literal>JOIN</literal> and <literal>ON</literal>
+    keywords.  The tables are simply listed in the <literal>FROM</literal>,
+    comma-separated, and the comparison expression added to the
+    <literal>WHERE</literal> clause.
+   </para>
+
+   <para>
+    As join expressions serve a specific
+    purpose in a multi-table query it is preferable to make them stand-out
+    by using join clauses to introduce additional tables into the query.
+    The results from the old, implicit syntax and the new, explicit
+    JOIN/ON syntax are identical.  But for a reader of the statement
+    its meaning is now easier to understand: the join condition is
+    introduced by its own key word whereas previously the condition was
+    merged into the WHERE clause together with other conditions.
    </para>
 
    <indexterm><primary>join</primary><secondary>outer</secondary></indexterm>
@@ -558,12 +569,13 @@ SELECT *
     found we want some <quote>empty values</quote> to be substituted
     for the <structname>cities</structname> table's columns.  This kind
     of query is called an <firstterm>outer join</firstterm>.  (The
-    joins we have seen so far are inner joins.)  The command looks
-    like this:
+    joins we have seen so far are <firstterm>inner joins</firstterm>.)
+    The command looks like this:
 
 <programlisting>
 SELECT *
-    FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name);
+    FROM weather
+    LEFT OUTER JOIN cities ON (city = name);
 </programlisting>
 
 <screen>
@@ -593,10 +605,9 @@ SELECT *
     </para>
    </formalpara>
 
+   <indexterm><primary>join</primary><secondary>self</secondary></indexterm>
+   <indexterm><primary>alias</primary><secondary>for table name in query</secondary></indexterm>
    <para>
-    <indexterm><primary>join</primary><secondary>self</secondary></indexterm>
-    <indexterm><primary>alias</primary><secondary>for table name in query</secondary></indexterm>
-
     We can also join a table against itself.  This is called a
     <firstterm>self join</firstterm>.  As an example, suppose we wish
     to find all the weather records that are in the temperature range
@@ -610,10 +621,9 @@ SELECT *
 
 <programlisting>
 SELECT w1.city, w1.temp_lo AS low, w1.temp_hi AS high,
-    w2.city, w2.temp_lo AS low, w2.temp_hi AS high
-    FROM weather w1, weather w2
-    WHERE w1.temp_lo &lt; w2.temp_lo
-    AND w1.temp_hi &gt; w2.temp_hi;
+       w2.city, w2.temp_lo AS low, w2.temp_hi AS high
+    FROM weather w1
+    JOIN weather w2 ON (w1.temp_lo &lt; w2.temp_lo AND w1.temp_hi &gt; w2.temp_hi);
 </programlisting>
 
 <screen>
@@ -630,8 +640,8 @@ SELECT w1.city, w1.temp_lo AS low, w1.temp_hi AS high,
     queries to save some typing, e.g.:
 <programlisting>
 SELECT *
-    FROM weather w, cities c
-    WHERE w.city = c.name;
+    FROM weather w
+    JOIN cities  c ON (w.city = c.name);
 </programlisting>
     You will encounter this style of abbreviating quite frequently.
    </para>


reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Change JOIN tutorial to focus more on explicit joins
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox