public inbox for [email protected]  
help / color / mirror / Atom feed
Clarify this MERGE warning? "Only columns from the target table that attempt to match data_source rows should appear in join_condition."
3+ messages / 3 participants
[nested] [flat]

* Clarify this MERGE warning? "Only columns from the target table that attempt to match data_source rows should appear in join_condition."
@ 2024-09-09 13:02  Philip Hazelden <[email protected]>
  0 siblings, 2 replies; 3+ messages in thread

From: Philip Hazelden @ 2024-09-09 13:02 UTC (permalink / raw)
  To: pgsql-general

The MERGE docs[1] give this warning:

> Only columns from the target table that attempt to match
> `data_source` rows should appear in `join_condition`.
> `join_condition` subexpressions that only reference the target
> table's columns can affect which action is taken, often in
> surprising ways.

(The docs for upcoming v17 have the same line.)

But when I tested this, it seems to work fine. For example, consider a
two-level primary key, where the source table implicitly has a fixed
value for one level:

    create table t1 (k1 int, k2 int, v text);
    insert into t1 values
      (1, 1, '1.1'), (1, 2, '1.2'),
      (2, 1, '2.1'), (2, 2, '2.2'), (2, 3, '2.3');

    create table t2 (k2 int, v text);
    insert into t2 values (1, '1.1 v2'), (3, '1.3 v2');

    merge into t1 using t2
      on t1.k2 = t2.k2 and t1.k1 = 1
      when matched then update set v = t2.v
      when not matched then insert values (1, t2.k2, t2.v);

`t1` now contains

     k1 | k2 |   v
    ----+----+--------
      1 |  1 | 1.1 v2
      1 |  2 | 1.2
      1 |  3 | 1.3 v2
      2 |  1 | 2.1
      2 |  2 | 2.2
      2 |  3 | 2.3
    (6 rows)

Which is what I'd expect.

So why should I avoid doing this? It's not clear to me whether the
warning is saying "this likely won't work like you expect because it's
difficult to reason about" or "because the behavior is unspecified" or
"because there's a bug" or what.

I found a thread[2] on the psql-hackers list which has this snippet of
conversation:

>>> * It might make sense to point out in the docs that join_condition
>>> should not filter the target table too much. Like SQL server docs say,
>>> don't put things in the join that filter the target that actually
>>> belong in the WHEN .. AND quals. In a way, this should be obvious,
>>> because it's an outer join. But I don't think it is, and ISTM that the
>>> sensible thing to do is to warn against it.
>>>
>>
>> Hmm, ok. Not sure how exactly to put that in words without confusing users.
>> Do you want to suggest something?
>
> Perhaps a Warning box should say:
>
> Only columns from "target_table_name" that attempt to match
> "data_source" rows should appear in "join_condition".
> "join_condition" subexpressions that only reference
> "target_table_name" columns can only affect which action is taken,
> often in surprising ways.

Notably, the "only affect" became simply "affect" in the docs, which I
think is less clear.

This makes me think the warning is trying to say something like: "if
you can move a subexpression from `join_condition` to `WHEN ... AND`,
you should probably do so". Is that right?

(I still don't know *why* I should do that. It sounds like maybe it's
more efficient that way because this is an outer join? But I don't
know why that matters. If I'd had to guess which would be more
efficient, I'd have weakly guessed "prefer to do it in a join, indexes
will get used better that way".)

And I think it's not always possible to move a subexpression without
changing `data_source`. In the example I posted above, I don't think
it would work, since we'd no longer have a `NOT MATCHED` on `t2`'s
`(3, '1.3 v2')` row.

[1]: https://www.postgresql.org/docs/current/sql-merge.html
[2]: https://www.postgresql.org/message-id/flat/CANP8%2BjKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc%2BXrz8QB0nXA%4...






^ permalink  raw  reply  [nested|flat] 3+ messages in thread

* Re: Clarify this MERGE warning? "Only columns from the target table that attempt to match data_source rows should appear in join_condition."
@ 2024-09-09 16:17  Adrian Klaver <[email protected]>
  parent: Philip Hazelden <[email protected]>
  1 sibling, 0 replies; 3+ messages in thread

From: Adrian Klaver @ 2024-09-09 16:17 UTC (permalink / raw)
  To: Philip Hazelden <[email protected]>; pgsql-general

On 9/9/24 06:02, Philip Hazelden wrote:
> The MERGE docs[1] give this warning:
> 
>> Only columns from the target table that attempt to match
>> `data_source` rows should appear in `join_condition`.
>> `join_condition` subexpressions that only reference the target
>> table's columns can affect which action is taken, often in
>> surprising ways.
> 
> (The docs for upcoming v17 have the same line.)
> 

> 
> So why should I avoid doing this? It's not clear to me whether the
> warning is saying "this likely won't work like you expect because it's
> difficult to reason about" or "because the behavior is unspecified" or
> "because there's a bug" or what.
> 
> I found a thread[2] on the psql-hackers list which has this snippet of
> conversation:
> 
>>>> * It might make sense to point out in the docs that join_condition
>>>> should not filter the target table too much. Like SQL server docs say,
>>>> don't put things in the join that filter the target that actually
>>>> belong in the WHEN .. AND quals. In a way, this should be obvious,
>>>> because it's an outer join. But I don't think it is, and ISTM that the
>>>> sensible thing to do is to warn against it.

FYI the SQL Server note does not shed any more light on this:

https://learn.microsoft.com/en-us/sql/t-sql/statements/merge-transact-sql?view=sql-server-ver16

Caution

It's important to specify only the columns from the target table to use 
for matching purposes. That is, specify columns from the target table 
that are compared to the corresponding column of the source table. Don't 
attempt to improve query performance by filtering out rows in the target 
table in the ON clause; for example, such as specifying AND NOT 
target_table.column_x = value. Doing so can return unexpected and 
incorrect results.


> This makes me think the warning is trying to say something like: "if
> you can move a subexpression from `join_condition` to `WHEN ... AND`,
> you should probably do so". Is that right?
> 
> (I still don't know *why* I should do that. It sounds like maybe it's
> more efficient that way because this is an outer join? But I don't
> know why that matters. If I'd had to guess which would be more
> efficient, I'd have weakly guessed "prefer to do it in a join, indexes
> will get used better that way".)

I would say because you could end with WHEN clauses that are at odds 
with the JOIN clause. In other words you throw away rows in the JOIN 
that you need later in the WHEN(s). Basically keep rows around until you 
are sure they are not needed.

> 
> And I think it's not always possible to move a subexpression without
> changing `data_source`. In the example I posted above, I don't think
> it would work, since we'd no longer have a `NOT MATCHED` on `t2`'s
> `(3, '1.3 v2')` row.
> 
> [1]: https://www.postgresql.org/docs/current/sql-merge.html
> [2]: https://www.postgresql.org/message-id/flat/CANP8%2BjKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc%2BXrz8QB0nXA%4...
> 
> 

-- 
Adrian Klaver
[email protected]







^ permalink  raw  reply  [nested|flat] 3+ messages in thread

* Re: Clarify this MERGE warning? "Only columns from the target table that attempt to match data_source rows should appear in join_condition."
@ 2024-09-21 12:48  Peter J. Holzer <[email protected]>
  parent: Philip Hazelden <[email protected]>
  1 sibling, 0 replies; 3+ messages in thread

From: Peter J. Holzer @ 2024-09-21 12:48 UTC (permalink / raw)
  To: [email protected]

On 2024-09-09 14:02:50 +0100, Philip Hazelden wrote:
> The MERGE docs[1] give this warning:
> 
> > Only columns from the target table that attempt to match
> > `data_source` rows should appear in `join_condition`.
> > `join_condition` subexpressions that only reference the target
> > table's columns can affect which action is taken, often in
> > surprising ways.
> 
> (The docs for upcoming v17 have the same line.)
> 
> But when I tested this, it seems to work fine. For example, consider a
> two-level primary key, where the source table implicitly has a fixed
> value for one level:
[...]

The warning is a bit misleading, IMHO. I think what this is trying to
say is that this is effectively data_source LEFT JOIN target ON
condition, and any row from data_source not matched by condition will
end up in the "NOT MATCHED" set. So you might insert rows from
data_source which you thought you had excluded in the condition.

So it's important to get the match right, and then decide what to do in
the WHEN clauses.


>     merge into t1 using t2
>       on t1.k2 = t2.k2 and t1.k1 = 1
>       when matched then update set v = t2.v
>       when not matched then insert values (1, t2.k2, t2.v);

I think that's ok. The t1.k1 = 1 is used to match rows from the target
to the data source (for each row in the data source, find the rows in
the target which have the same k2 and k1 = 1).

But "columns from the target table that attempt to match data_source`
rows" for me sort of sounds like those columns have to have a counterpart
in the data_source, which k1 hasn't. Also maybe the order is the wrong
way around? "Match rows in the target to rows in the data_source" would
fit my mental model better.

        hp
-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | [email protected]         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"


Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 3+ messages in thread


end of thread, other threads:[~2024-09-21 12:48 UTC | newest]

Thread overview: 3+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-09-09 13:02 Clarify this MERGE warning? "Only columns from the target table that attempt to match data_source rows should appear in join_condition." Philip Hazelden <[email protected]>
2024-09-09 16:17 ` Adrian Klaver <[email protected]>
2024-09-21 12:48 ` Peter J. Holzer <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox