Documentation of EXCEPT ALL may have a bug

public inbox for [email protected]  
help / color / mirror / Atom feed

Documentation of EXCEPT ALL may have a bug
5+ messages / 4 participants
[nested] [flat]

* Documentation of EXCEPT ALL may have a bug
@ 2018-02-09 17:48 PG Doc comments form <[email protected]>
  2018-02-10 11:59 ` Re: Documentation of EXCEPT ALL may have a bug Alvaro Herrera <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: PG Doc comments form @ 2018-02-09 17:48 UTC (permalink / raw)
  To: [email protected]; +Cc: [email protected]

The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/10/static/queries-union.html
Description:

I believe that the documented behavior of EXCEPT is not in agreement
with Postgres behavior (I'm using Postgres 9.5). The documents say:

    EXCEPT returns all rows that are in the result of query1 but not
    in the result of query2. (This is sometimes called the difference
    between two queries.) Again, duplicates are eliminated unless
    EXCEPT ALL is used.

Here is a test script:

    drop table if exists t;
    drop table if exists u;

    create table t(x int);
    create table u(x int);

    insert into t values (1), (2), (2), (3), (3);
    insert into u values (1), (2);

    select * from t except select * from u;
    select * from t except all select * from u;

And here is the output:

    DROP TABLE
    DROP TABLE
    CREATE TABLE
    CREATE TABLE
    INSERT 0 5
    INSERT 0 2

     x 
    ---
     3
    (1 row)

     x 
    ---
     3
     3
     2
    (3 rows)

The output from EXCEPT matches the documented behavior.

The output from EXCEPT ALL makes sense to me, but I think it is at
odds with the documentation: "EXCEPT returns all rows that are in the
result of query1 but not in the result of query2." This general
statement is then modified by the discussion of EXCEPT ALL. That first
sentence does not deal with duplicates in the input tables, and just
discusses set membership. Each occurrence of (3) in query1 is
therefore kept.  For EXCEPT, the duplicates are eliminated (yielding
the output [3]), and EXCEPT ALL should therefore yield [3, 3].  In the
EXCEPT ALL case, both occurrences of (2) in query1 should be
eliminated by the one occurrence in query2. I think this is a fair
interpretation based on the wording.

To match the observed behavior, I think that the description of EXCEPT
ALL needs to be modified to something like this:

EXCEPT ALL returns those rows of query1 in excess of matching rows in
query2, as well as rows of query1 that have no match in query2.

^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Documentation of EXCEPT ALL may have a bug
  2018-02-09 17:48 Documentation of EXCEPT ALL may have a bug PG Doc comments form <[email protected]>
@ 2018-02-10 11:59 ` Alvaro Herrera <[email protected]>
  2018-02-10 13:02   ` Re: Documentation of EXCEPT ALL may have a bug Pantelis Theodosiou <[email protected]>
  2018-02-10 19:06   ` Re: Documentation of EXCEPT ALL may have a bug Tom Lane <[email protected]>
  0 siblings, 2 replies; 5+ messages in thread

From: Alvaro Herrera @ 2018-02-10 11:59 UTC (permalink / raw)
  To: [email protected]; [email protected]

PG Doc comments form wrote:

>     create table t(x int);
>     create table u(x int);
>     
>     insert into t values (1), (2), (2), (3), (3);
>     insert into u values (1), (2);
>     
>     select * from t except all select * from u;

>      x 
>     ---
>      3
>      3
>      2
>     (3 rows)

I find this pretty odd behavior.  Is this not an outright bug?  I
haven't read the SQL standard on this matter[*], but if they define
EXCEPT ALL to work like this, then it seems pretty useless.  (Maybe they
just didn't intend EXCEPT ALL to be useful?)  If this is indeed the
case, maybe we should amend the docs not only to explain this behavior
but also to warn against the construct.

[*] I didn't try terribly hard, but couldn't actually find where the
behavior is defined.  What I have on hand is a draft of SQL:2011 where
this appears to be defined in 7.13 <query expression> but I was unable
to find the rules for set operations.  It refers to 9.12 Grouping
operations but that defines conformance rules only.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Documentation of EXCEPT ALL may have a bug
  2018-02-09 17:48 Documentation of EXCEPT ALL may have a bug PG Doc comments form <[email protected]>
  2018-02-10 11:59 ` Re: Documentation of EXCEPT ALL may have a bug Alvaro Herrera <[email protected]>
@ 2018-02-10 13:02   ` Pantelis Theodosiou <[email protected]>
  2018-02-10 13:11     ` Re: Documentation of EXCEPT ALL may have a bug Pantelis Theodosiou <[email protected]>
  1 sibling, 1 reply; 5+ messages in thread

From: Pantelis Theodosiou @ 2018-02-10 13:02 UTC (permalink / raw)
  To: Alvaro Herrera <[email protected]>; +Cc: [email protected]; [email protected]

On Sat, Feb 10, 2018 at 11:59 AM, Alvaro Herrera <[email protected]>
wrote:

> PG Doc comments form wrote:
>
> >     create table t(x int);
> >     create table u(x int);
> >
> >     insert into t values (1), (2), (2), (3), (3);
> >     insert into u values (1), (2);
> >
> >     select * from t except all select * from u;
>
> >      x
> >     ---
> >      3
> >      3
> >      2
> >     (3 rows)
>
> I find this pretty odd behavior.  Is this not an outright bug?  I
> haven't read the SQL standard on this matter[*], but if they define
> EXCEPT ALL to work like this, then it seems pretty useless.  (Maybe they
> just didn't intend EXCEPT ALL to be useful?)  If this is indeed the
> case, maybe we should amend the docs not only to explain this behavior
> but also to warn against the construct.
>

I think that's the way it is defined (but I agree, I don't remember ever
seeing a use for it)


>
> [*] I didn't try terribly hard, but couldn't actually find where the
> behavior is defined.  What I have on hand is a draft of SQL:2011 where
> this appears to be defined in 7.13 <query expression> but I was unable
> to find the rules for set operations.  It refers to 9.12 Grouping
> operations but that defines conformance rules only.
>
> --
>

I may have a different version but I see:

iii)
T contains the following rows:
1) Let R be a row that is a duplicate of some row in ET1 or of some row in
ET2 or both. Let
m be the number of duplicates of R in ET1 and let n be the number of
duplicates of R in
ET2, where m ≥ 0 and n ≥ 0.
2) If DISTINCT is specified or implicit, then ....
...

3) If ALL is specified, then
Case:
A) If UNION is specified, then the number of duplicates of R that T
contains is (m + n).
B) If EXCEPT is specified, then the number of duplicates of R that T
contains is the
maximum of (m – n) and 0 (zero).
C) If INTERSECT is specified, then the number of duplicates of R that T
contains is the
minimum of m and n.


^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Documentation of EXCEPT ALL may have a bug
  2018-02-09 17:48 Documentation of EXCEPT ALL may have a bug PG Doc comments form <[email protected]>
  2018-02-10 11:59 ` Re: Documentation of EXCEPT ALL may have a bug Alvaro Herrera <[email protected]>
  2018-02-10 13:02   ` Re: Documentation of EXCEPT ALL may have a bug Pantelis Theodosiou <[email protected]>
@ 2018-02-10 13:11     ` Pantelis Theodosiou <[email protected]>
  0 siblings, 0 replies; 5+ messages in thread

From: Pantelis Theodosiou @ 2018-02-10 13:11 UTC (permalink / raw)
  To: Alvaro Herrera <[email protected]>; +Cc: [email protected]; [email protected]

On Sat, Feb 10, 2018 at 1:02 PM, Pantelis Theodosiou <[email protected]>
wrote:

>
>
> On Sat, Feb 10, 2018 at 11:59 AM, Alvaro Herrera <[email protected]>
> wrote:
>
>> PG Doc comments form wrote:
>>
>> >     create table t(x int);
>> >     create table u(x int);
>> >
>> >     insert into t values (1), (2), (2), (3), (3);
>> >     insert into u values (1), (2);
>> >
>> >     select * from t except all select * from u;
>>
>> >      x
>> >     ---
>> >      3
>> >      3
>> >      2
>> >     (3 rows)
>>
>> I find this pretty odd behavior.  Is this not an outright bug?  I
>> haven't read the SQL standard on this matter[*], but if they define
>> EXCEPT ALL to work like this, then it seems pretty useless.  (Maybe they
>> just didn't intend EXCEPT ALL to be useful?)  If this is indeed the
>> case, maybe we should amend the docs not only to explain this behavior
>> but also to warn against the construct.
>>
>
> I think that's the way it is defined (but I agree, I don't remember ever
> seeing a use for it)
>
>
>>
>> [*] I didn't try terribly hard, but couldn't actually find where the
>> behavior is defined.  What I have on hand is a draft of SQL:2011 where
>> this appears to be defined in 7.13 <query expression> but I was unable
>> to find the rules for set operations.  It refers to 9.12 Grouping
>> operations but that defines conformance rules only.
>>
>> --
>>
>
> I may have a different version but I see:
>

That is in:
7.13 <query expression>
General Rules 3, Case b:

>
> iii)
> T contains the following rows:
> 1) Let R be a row that is a duplicate of some row in ET1 or of some row in
> ET2 or both. Let
> m be the number of duplicates of R in ET1 and let n be the number of
> duplicates of R in
> ET2, where m ≥ 0 and n ≥ 0.
> 2) If DISTINCT is specified or implicit, then ....
> ...
>
> 3) If ALL is specified, then
> Case:
> A) If UNION is specified, then the number of duplicates of R that T
> contains is (m + n).
> B) If EXCEPT is specified, then the number of duplicates of R that T
> contains is the
> maximum of (m – n) and 0 (zero).
> C) If INTERSECT is specified, then the number of duplicates of R that T
> contains is the
> minimum of m and n.
>
>


^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Documentation of EXCEPT ALL may have a bug
  2018-02-09 17:48 Documentation of EXCEPT ALL may have a bug PG Doc comments form <[email protected]>
  2018-02-10 11:59 ` Re: Documentation of EXCEPT ALL may have a bug Alvaro Herrera <[email protected]>
@ 2018-02-10 19:06   ` Tom Lane <[email protected]>
  1 sibling, 0 replies; 5+ messages in thread

From: Tom Lane @ 2018-02-10 19:06 UTC (permalink / raw)
  To: Alvaro Herrera <[email protected]>; +Cc: [email protected]; [email protected]

Alvaro Herrera <[email protected]> writes:
> PG Doc comments form wrote:
>> create table t(x int);
>> create table u(x int);
>> 
>> insert into t values (1), (2), (2), (3), (3);
>> insert into u values (1), (2);
>> 
>> select * from t except all select * from u;

>> x 
>> ---
>> 3
>> 3
>> 2
>> (3 rows)

> I find this pretty odd behavior.  Is this not an outright bug?

It's exactly what the standard says to do: if there are M occurrences
of a row value in the LHS, and N occurrences in the RHS, emit
max(M-N, 0) copies of the row.  To my mind that's a reasonable definition
of EXCEPT if you suppose that nominally-identical rows are matched and
discarded one by one, rather than with de-duplication occurring
beforehand.

> [*] I didn't try terribly hard, but couldn't actually find where the
> behavior is defined.

In SQL:2011, it's 7.13 <query expression> general rule 3) b) iii) 3) B),
on page 420 in the draft version I have.

			regards, tom lane





^ permalink  raw  reply  [nested|flat] 5+ messages in thread

end of thread, other threads:[~2018-02-10 19:06 UTC | newest]

Thread overview: 5+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2018-02-09 17:48 Documentation of EXCEPT ALL may have a bug PG Doc comments form <[email protected]>
2018-02-10 11:59 ` Alvaro Herrera <[email protected]>
2018-02-10 13:02   ` Pantelis Theodosiou <[email protected]>
2018-02-10 13:11     ` Pantelis Theodosiou <[email protected]>
2018-02-10 19:06   ` Tom Lane <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox