Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1snf9g-005BoL-Og for pgsql-general@arkaria.postgresql.org; Mon, 09 Sep 2024 14:14:13 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1snf9g-00Crax-Bp for pgsql-general@arkaria.postgresql.org; Mon, 09 Sep 2024 14:14:12 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sne2u-00BnFx-E1 for pgsql-general@lists.postgresql.org; Mon, 09 Sep 2024 13:03:08 +0000 Received: from mail-pg1-x52f.google.com ([2607:f8b0:4864:20::52f]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1sne2o-000Ila-TR for pgsql-general@postgresql.org; Mon, 09 Sep 2024 13:03:07 +0000 Received: by mail-pg1-x52f.google.com with SMTP id 41be03b00d2f7-7b0c9bbddb4so2983170a12.3 for ; Mon, 09 Sep 2024 06:03:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725886982; x=1726491782; darn=postgresql.org; h=to:subject:message-id:date:from:mime-version:from:to:cc:subject :date:message-id:reply-to; bh=zs4x3zQZZYspRJ1ABapuYHIbtX0RGEeI+rqwJxP//+c=; b=V+7CHQivKC20xnH//kO4Gn0zJfx+5QTQu4LwlgkMIqoSf0s6O+NHYKhwMx03d1FICh DA6waQkFIK+lWE9flaYm5yiQ7oimmElUYooLTV8XN9WH2giUjqjFJ/J9wybbvYDKUK1P YeQlEa7BAePuQoj7fto4a1nDoaqaWN6KExIg7hNCGs9vVBiKXS8ZgNAskHMKZjsRfH+s z95QcLX+i+w5GmEehTLaqM157HjDb2ItN8nPXts5WWNAAluMUkbqzm8t4oWhVbS0vY9S UJ/RWyTJD39rRf9dHNNv3yLk97AFA8CgqSKFp3VW8azj61kxQrgnmZZRmJX66EoTcd3E dFjg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725886982; x=1726491782; h=to:subject:message-id:date:from:mime-version:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=zs4x3zQZZYspRJ1ABapuYHIbtX0RGEeI+rqwJxP//+c=; b=S0kUtCM0T2WEwkLOBiPaUt7Ir8LsCeKoYXbXRUP5gvKsvKS1WJn1dpPu1mZZusETML j15OYT/6na8uV6pAJPTWyE060yaHZFSv3Oh1w+DUOVtefVKh44zZInwB9kP9/kdTP+UK +PZFpFU/jFRQKu6jrE4jNGh8NZ2CFMUzxzgijyquA1UGMmYG2dI55PsUKtB4JhOxtfOr tIRqd31KcUsghyujwIy5RcfSKd0KpY8vlBbOjdbxzVkgFk2OLS8jC2//qQgpXQU85d1c E29n/yWvjAYQlEbG9E/J4GxvuKnTXviG6cMqStXo1Lkdx0B1WM7A4dJTFWiSPSdbZtvC kLxA== X-Gm-Message-State: AOJu0YxkwI4DQqup3TC5k7/0j3LNDQnhkzy03pns3tmjbnR4UyuJ2ggb VGZo1sw8NZAh5QBvHi90Fv4yBkfSryQPATwga9lLpVpSoQGtD0h1ub6HXMqGvl9SkmmpRj7idKI quZpI9mRtMrI5rLEUDLJm8m5LpcaHTkQt X-Google-Smtp-Source: AGHT+IFaFvcxtONP/WYIKofzrg7gNoiicflj0V3jvSUG8oUvrJvpYs/OOZdMOrMc2n1ekgS8cUrC56qQMUyO5KnuoE0= X-Received: by 2002:a17:903:124d:b0:206:99a8:526c with SMTP id d9443c01a7336-2070c196865mr121057025ad.41.1725886981587; Mon, 09 Sep 2024 06:03:01 -0700 (PDT) MIME-Version: 1.0 From: Philip Hazelden Date: Mon, 9 Sep 2024 14:02:50 +0100 Message-ID: Subject: Clarify this MERGE warning? "Only columns from the target table that attempt to match data_source rows should appear in join_condition." To: pgsql-general@postgresql.org Content-Type: text/plain; charset="UTF-8" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk The MERGE docs[1] give this warning: > Only columns from the target table that attempt to match > `data_source` rows should appear in `join_condition`. > `join_condition` subexpressions that only reference the target > table's columns can affect which action is taken, often in > surprising ways. (The docs for upcoming v17 have the same line.) But when I tested this, it seems to work fine. For example, consider a two-level primary key, where the source table implicitly has a fixed value for one level: create table t1 (k1 int, k2 int, v text); insert into t1 values (1, 1, '1.1'), (1, 2, '1.2'), (2, 1, '2.1'), (2, 2, '2.2'), (2, 3, '2.3'); create table t2 (k2 int, v text); insert into t2 values (1, '1.1 v2'), (3, '1.3 v2'); merge into t1 using t2 on t1.k2 = t2.k2 and t1.k1 = 1 when matched then update set v = t2.v when not matched then insert values (1, t2.k2, t2.v); `t1` now contains k1 | k2 | v ----+----+-------- 1 | 1 | 1.1 v2 1 | 2 | 1.2 1 | 3 | 1.3 v2 2 | 1 | 2.1 2 | 2 | 2.2 2 | 3 | 2.3 (6 rows) Which is what I'd expect. So why should I avoid doing this? It's not clear to me whether the warning is saying "this likely won't work like you expect because it's difficult to reason about" or "because the behavior is unspecified" or "because there's a bug" or what. I found a thread[2] on the psql-hackers list which has this snippet of conversation: >>> * It might make sense to point out in the docs that join_condition >>> should not filter the target table too much. Like SQL server docs say, >>> don't put things in the join that filter the target that actually >>> belong in the WHEN .. AND quals. In a way, this should be obvious, >>> because it's an outer join. But I don't think it is, and ISTM that the >>> sensible thing to do is to warn against it. >>> >> >> Hmm, ok. Not sure how exactly to put that in words without confusing users. >> Do you want to suggest something? > > Perhaps a Warning box should say: > > Only columns from "target_table_name" that attempt to match > "data_source" rows should appear in "join_condition". > "join_condition" subexpressions that only reference > "target_table_name" columns can only affect which action is taken, > often in surprising ways. Notably, the "only affect" became simply "affect" in the docs, which I think is less clear. This makes me think the warning is trying to say something like: "if you can move a subexpression from `join_condition` to `WHEN ... AND`, you should probably do so". Is that right? (I still don't know *why* I should do that. It sounds like maybe it's more efficient that way because this is an outer join? But I don't know why that matters. If I'd had to guess which would be more efficient, I'd have weakly guessed "prefer to do it in a join, indexes will get used better that way".) And I think it's not always possible to move a subexpression without changing `data_source`. In the example I posted above, I don't think it would work, since we'd no longer have a `NOT MATCHED` on `t2`'s `(3, '1.3 v2')` row. [1]: https://www.postgresql.org/docs/current/sql-merge.html [2]: https://www.postgresql.org/message-id/flat/CANP8%2BjKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc%2BXrz8QB0nXA%40mail.gmail.com