public inbox for [email protected]
help / color / mirror / Atom feedFrom: David G. Johnston <[email protected]>
To: Jürgen Purtz <[email protected]>
Cc: Peter Eisentraut <[email protected]>
Cc: Thomas Munro <[email protected]>
Cc: Thomas Kellerer <[email protected]>
Cc: Pg Docs <[email protected]>
Subject: Re: Change JOIN tutorial to focus more on explicit joins
Date: Wed, 21 Oct 2020 16:40:18 -0700
Message-ID: <CAKFQuwZY5EqtMH_9SzTG=wSxq14m3FuwqPNDeeyK=Z6JO5-vUw@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
<CA+hUKG+K+75fo+cKA2Vb_kmFsxX=9qgi48_dLqPyShfhFdSKMw@mail.gmail.com>
<[email protected]>
<[email protected]>
<[email protected]>
On Fri, Sep 4, 2020 at 2:36 AM Jürgen Purtz <[email protected]> wrote:
> On 04.09.20 08:52, Peter Eisentraut wrote:
> >
> > For the remaining patch I have a couple of concerns:
>
This patch should not be changing the formatting choices for these queries,
just the addition of a JOIN clause and modification of the WHERE clause.
Specifically, SELECT is left-aligned while all subsequent clauses indent
under it. Forced alignment by adding extra spaces isn't done here either.
I have not altered those in the attached.
Did some word-smithing on the first paragraph. The part about the
cross-join was hurt by "in some way" and "may be" is not needed.
Pointing out that values from both tables doesn't seem like an improvement
when the second item covers that and it is more specific in noting that the
city name that is joined on appears twice - once from each table.
ON expression is more precise and the reader should be ok with the term.
Removal of the exercise is good. Not the time to discuss cross join
anyway. Given that "ON true" works the cross join form isn't even required.
In the FROM clause form I would not add table prefixes to the column
names. They are not part of the form changing. If discussion about table
prefixing is desired it should be done explicitly and by itself. They are
used later on, I didn't check to see whether that was covered or might be
confusing.
I suggested a wording for why to use join syntax that doesn't involve
legacy and points out its merit compared to sticking a join expression into
the where clause.
The original patch missed having the syntax for the first left outer join
conform to the multi-line query writing standard you introduced. I did not
change.
The "AND" ON clause should just go with (not changed):
ON (w1.temp_lo < w2.temp_lo
AND w1.temp_hi > w2.temp_high);
Attaching my suggestions made on top of the attached original
0002-query.patch
David J.
Attachments:
[application/octet-stream] 0002-00-query.patch (6.1K, 3-0002-00-query.patch)
download | inline diff:
diff --git a/doc/src/sgml/query.sgml b/doc/src/sgml/query.sgml
index e73e805ec4..413763691e 100644
--- a/doc/src/sgml/query.sgml
+++ b/doc/src/sgml/query.sgml
@@ -441,11 +441,11 @@ SELECT DISTINCT city
<para>
Thus far, our queries have only accessed one table at a time.
Queries can access multiple tables at once, or access the same
- table in such a way that multiple rows of the table are being
- processed at the same time. A query that accesses multiple rows
- of the same or different tables at one time is called a
- <firstterm>join</firstterm> query. As an example, say you wish to
- list all the weather records together with the location of the
+ table several times. Such queries — they are called
+ <firstterm>join</firstterm> queries — combine
+ rows of one table in some way with rows of the other table
+ and return a single row per combination. An example may be a
+ list of all the weather records together with the location of the
associated city. To do that, we need to compare the <structfield>city</structfield>
column of each row of the <structname>weather</structname> table with the
<structfield>name</structfield> column of all rows in the <structname>cities</structname>
@@ -461,10 +461,17 @@ SELECT DISTINCT city
<programlisting>
SELECT *
- FROM weather, cities
- WHERE city = name;
+FROM weather
+JOIN cities ON (city = name);
</programlisting>
+ The keyword <command>JOIN</command> connects the two tables.
+ Behind the keyword <command>ON</command> follows the
+ definition how to compare their rows. In this case, the
+ column <varname>city</varname> of table <varname>weather</varname>
+ must be equal to the column <varname>name</varname>
+ of table <varname>cities</varname>.
+
<screen>
city | temp_lo | temp_hi | prcp | date | name | location
---------------+---------+---------+------+------------+---------------+-----------
@@ -476,8 +483,14 @@ SELECT *
</para>
<para>
- Observe two things about the result set:
+ Observe some things about the result set:
<itemizedlist>
+ <listitem>
+ <para>
+ The resulting rows contain values from both tables.
+ </para>
+ </listitem>
+
<listitem>
<para>
There is no result row for the city of Hayward. This is
@@ -499,23 +512,14 @@ SELECT *
<literal>*</literal>:
<programlisting>
SELECT city, temp_lo, temp_hi, prcp, date, location
- FROM weather, cities
- WHERE city = name;
+FROM weather
+JOIN cities ON (city = name);
</programlisting>
</para>
</listitem>
</itemizedlist>
</para>
- <formalpara>
- <title>Exercise:</title>
-
- <para>
- Attempt to determine the semantics of this query when the
- <literal>WHERE</literal> clause is omitted.
- </para>
- </formalpara>
-
<para>
Since the columns all had different names, the parser
automatically found which table they belong to. If there
@@ -526,8 +530,8 @@ SELECT city, temp_lo, temp_hi, prcp, date, location
<programlisting>
SELECT weather.city, weather.temp_lo, weather.temp_hi,
weather.prcp, weather.date, cities.location
- FROM weather, cities
- WHERE cities.name = weather.city;
+FROM weather
+JOIN cities ON (cities.name = weather.city);
</programlisting>
It is widely considered good style to qualify all column names
@@ -537,15 +541,18 @@ SELECT weather.city, weather.temp_lo, weather.temp_hi,
<para>
Join queries of the kind seen thus far can also be written in this
- alternative form:
+ form:
<programlisting>
SELECT *
- FROM weather INNER JOIN cities ON (weather.city = cities.name);
+FROM weather, cities
+WHERE weather.city = cities.name;
</programlisting>
- This syntax is not as commonly used as the one above, but we show
- it here to help you understand the following topics.
+ This syntax is mainly used in legacy applications. It dates back
+ to the first days of SQL, avoids the <literal>JOIN</literal>
+ keyword, and uses the <literal>WHERE</literal> clause instead of the
+ <literal>ON</literal> clause.
</para>
<indexterm><primary>join</primary><secondary>outer</secondary></indexterm>
@@ -558,7 +565,7 @@ SELECT *
found we want some <quote>empty values</quote> to be substituted
for the <structname>cities</structname> table's columns. This kind
of query is called an <firstterm>outer join</firstterm>. (The
- joins we have seen so far are inner joins.) The command looks
+ joins we have seen so far are <firstterm>inner joins</firstterm>.) The command looks
like this:
<programlisting>
@@ -593,10 +600,9 @@ SELECT *
</para>
</formalpara>
+ <indexterm><primary>join</primary><secondary>self</secondary></indexterm>
+ <indexterm><primary>alias</primary><secondary>for table name in query</secondary></indexterm>
<para>
- <indexterm><primary>join</primary><secondary>self</secondary></indexterm>
- <indexterm><primary>alias</primary><secondary>for table name in query</secondary></indexterm>
-
We can also join a table against itself. This is called a
<firstterm>self join</firstterm>. As an example, suppose we wish
to find all the weather records that are in the temperature range
@@ -610,10 +616,10 @@ SELECT *
<programlisting>
SELECT w1.city, w1.temp_lo AS low, w1.temp_hi AS high,
- w2.city, w2.temp_lo AS low, w2.temp_hi AS high
- FROM weather w1, weather w2
- WHERE w1.temp_lo < w2.temp_lo
- AND w1.temp_hi > w2.temp_hi;
+ w2.city, w2.temp_lo AS low, w2.temp_hi AS high
+FROM weather w1
+JOIN weather w2 ON (w1.temp_lo < w2.temp_lo AND
+ w1.temp_hi > w2.temp_hi);
</programlisting>
<screen>
@@ -630,8 +636,8 @@ SELECT w1.city, w1.temp_lo AS low, w1.temp_hi AS high,
queries to save some typing, e.g.:
<programlisting>
SELECT *
- FROM weather w, cities c
- WHERE w.city = c.name;
+FROM weather w
+JOIN cities c ON (w.city = c.name);
</programlisting>
You will encounter this style of abbreviating quite frequently.
</para>
[application/octet-stream] 0002-01-DGJ-query.patch (4.0K, 4-0002-01-DGJ-query.patch)
download | inline diff:
commit 4f5f5c6bfc964d9c136b0d3490ddaa6a53a4c25c
Author: David G. Johnston <[email protected]>
Date: Wed Oct 21 23:23:33 2020 +0000
Some word-smithing to go along with my email comments
diff --git a/doc/src/sgml/query.sgml b/doc/src/sgml/query.sgml
index 413763691e..bc42381427 100644
--- a/doc/src/sgml/query.sgml
+++ b/doc/src/sgml/query.sgml
@@ -440,13 +440,12 @@ SELECT DISTINCT city
<para>
Thus far, our queries have only accessed one table at a time.
- Queries can access multiple tables at once, or access the same
- table several times. Such queries — they are called
- <firstterm>join</firstterm> queries — combine
- rows of one table in some way with rows of the other table
- and return a single row per combination. An example may be a
- list of all the weather records together with the location of the
- associated city. To do that, we need to compare the <structfield>city</structfield>
+ Queries which access multiple tables (including repeats) at once are called
+ <firstterm>join</firstterm> queries. They internally combine
+ each row from one table with each row of a second table. A expression is
+ specified to then limit which pairs of rows are returned.
+ For example, to return all the weather records together with the location of the
+ associated city, the database compare the <structfield>city</structfield>
column of each row of the <structname>weather</structname> table with the
<structfield>name</structfield> column of all rows in the <structname>cities</structname>
table, and select the pairs of rows where these values match.
@@ -466,8 +465,8 @@ JOIN cities ON (city = name);
</programlisting>
The keyword <command>JOIN</command> connects the two tables.
- Behind the keyword <command>ON</command> follows the
- definition how to compare their rows. In this case, the
+ After the keyword <command>ON</command> follows the
+ expression comparing their rows. In this case, the
column <varname>city</varname> of table <varname>weather</varname>
must be equal to the column <varname>name</varname>
of table <varname>cities</varname>.
@@ -483,14 +482,8 @@ JOIN cities ON (city = name);
</para>
<para>
- Observe some things about the result set:
+ Observe two things about the result set:
<itemizedlist>
- <listitem>
- <para>
- The resulting rows contain values from both tables.
- </para>
- </listitem>
-
<listitem>
<para>
There is no result row for the city of Hayward. This is
@@ -546,13 +539,15 @@ JOIN cities ON (cities.name = weather.city);
<programlisting>
SELECT *
FROM weather, cities
-WHERE weather.city = cities.name;
+WHERE city = name;
</programlisting>
- This syntax is mainly used in legacy applications. It dates back
- to the first days of SQL, avoids the <literal>JOIN</literal>
- keyword, and uses the <literal>WHERE</literal> clause instead of the
- <literal>ON</literal> clause.
+ This syntax pre-dates the <literal>JOIN</literal> and <literal>ON</literal>
+ keywords. The tables are simply listed in the <literal>FROM</literal>,
+ comma-separated, and the comparison expression added to the
+ <literal>WHERE</literal> clause. As join expressions serve a specific
+ purpose in a multi-table query it is preferable to make them stand-out
+ by using join clauses to introduce additional tables into the query.
</para>
<indexterm><primary>join</primary><secondary>outer</secondary></indexterm>
@@ -565,8 +560,8 @@ WHERE weather.city = cities.name;
found we want some <quote>empty values</quote> to be substituted
for the <structname>cities</structname> table's columns. This kind
of query is called an <firstterm>outer join</firstterm>. (The
- joins we have seen so far are <firstterm>inner joins</firstterm>.) The command looks
- like this:
+ joins we have seen so far are <firstterm>inner joins</firstterm>.)
+ The command looks like this:
<programlisting>
SELECT *
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: Change JOIN tutorial to focus more on explicit joins
In-Reply-To: <CAKFQuwZY5EqtMH_9SzTG=wSxq14m3FuwqPNDeeyK=Z6JO5-vUw@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox