Re: Insert works but fails for merge

public inbox for [email protected]  
help / color / mirror / Atom feed

Re: Insert works but fails for merge
8+ messages / 4 participants
[nested] [flat]

* Re: Insert works but fails for merge
@ 2024-08-10 14:52  Adrian Klaver <[email protected]>
  0 siblings, 1 reply; 8+ messages in thread

From: Adrian Klaver @ 2024-08-10 14:52 UTC (permalink / raw)
  To: yudhi s <[email protected]>; David G. Johnston <[email protected]>; +Cc: pgsql-general <[email protected]>

On 8/10/24 05:07, yudhi s wrote:
> 
> 

> Thank You Adrian and David.
> 
> Even converting the merge avoiding the WITH clause/CTE as below , is 
> still making it fail with the same error. So it seems , only 
> direct "insert into values" query can be auto converted/casted but not 
> the other queries.
> 
> In our case , we were using this merge query in application code(in 
> Java) as a framework to dynamically take these values as bind values and 
> do the merge of input data/message. But it seems we have to now cast 
> each and every field which we get from the incoming message to make 
> this merge work in a correct way. I am wondering if the only way now is 
> to get the data types from information_schema.columns and then use the 
> cast function to write the values of the merge query dynamically 
> casted/converted for each of the fields in the application code. Please 
> correct me if my understanding is wrong.

Why not use INSERT ... ON CONFLICT instead of MERGE?

> 
> MERGE INTO tab1 AS target
> USING (VALUES ('5efd4c91-ef93-4477-840c-a723ae212d99', 123, 
> '2024-08-09T11:33:49.402585600Z','2024-08-09T11:33:49.402585600Z')) AS 
> source(id, mid,txn_timestamp, cre_ts)
> ON target.id <http://target.id; = source.id <http://source.id;
> WHEN MATCHED THEN
> UPDATE SET mid  = source.mid
> WHEN NOT MATCHED THEN
> INSERT (id, mid, txn_timestamp, cre_ts)
>      VALUES (source.id <http://source.id>,source.mid, 
>   source.txn_timestamp, source.cre_ts);

-- 
Adrian Klaver
[email protected]







^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: Insert works but fails for merge
@ 2024-08-10 20:23  yudhi s <[email protected]>
  parent: Adrian Klaver <[email protected]>
  0 siblings, 2 replies; 8+ messages in thread

From: yudhi s @ 2024-08-10 20:23 UTC (permalink / raw)
  To: Adrian Klaver <[email protected]>; +Cc: David G. Johnston <[email protected]>; pgsql-general <[email protected]>

On Sat, Aug 10, 2024 at 8:22 PM Adrian Klaver <[email protected]>
wrote:

>
>
> Why not use INSERT ... ON CONFLICT instead of MERGE?
>
> >
> > MERGE INTO tab1 AS target
> > USING (VALUES ('5efd4c91-ef93-4477-840c-a723ae212d99', 123,
> > '2024-08-09T11:33:49.402585600Z','2024-08-09T11:33:49.402585600Z')) AS
> > source(id, mid,txn_timestamp, cre_ts)
> > ON target.id <http://target.id; = source.id <http://source.id;
> > WHEN MATCHED THEN
> > UPDATE SET mid  = source.mid
> > WHEN NOT MATCHED THEN
> > INSERT (id, mid, txn_timestamp, cre_ts)
> >      VALUES (source.id <http://source.id>,source.mid,
> >   source.txn_timestamp, source.cre_ts);
>
>
>
Actually , as per the business logic , we need to merge on a column which
is not unique or having any unique index on it. It's the leading column of
a composite unique key though. And in such scenarios the "INSERT.... ON
CONFLICT" will give an error. So we are opting for a merge statement here,
which will work fine with the column being having duplicate values in it.


^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: Insert works but fails for merge
@ 2024-08-10 21:27  Adrian Klaver <[email protected]>
  parent: yudhi s <[email protected]>
  1 sibling, 1 reply; 8+ messages in thread

From: Adrian Klaver @ 2024-08-10 21:27 UTC (permalink / raw)
  To: yudhi s <[email protected]>; +Cc: David G. Johnston <[email protected]>; pgsql-general <[email protected]>

On 8/10/24 13:23, yudhi s wrote:
> 
> 
> On Sat, Aug 10, 2024 at 8:22 PM Adrian Klaver <[email protected] 
> <mailto:[email protected]>> wrote:
> 
> 
> 
>     Why not use INSERT ... ON CONFLICT instead of MERGE?
> 
>      >
>      > MERGE INTO tab1 AS target
>      > USING (VALUES ('5efd4c91-ef93-4477-840c-a723ae212d99', 123,
>      >
>     '2024-08-09T11:33:49.402585600Z','2024-08-09T11:33:49.402585600Z')) AS
>      > source(id, mid,txn_timestamp, cre_ts)
>      > ON target.id <http://target.id; <http://target.id
>     <http://target.id>; = source.id <http://source.id; <http://source.id
>     <http://source.id>;
>      > WHEN MATCHED THEN
>      > UPDATE SET mid  = source.mid
>      > WHEN NOT MATCHED THEN
>      > INSERT (id, mid, txn_timestamp, cre_ts)
>      >      VALUES (source.id <http://source.id; <http://source.id
>     <http://source.id>>,source.mid,
>      >   source.txn_timestamp, source.cre_ts);
> 
> 
> 
> Actually , as per the business logic , we need to merge on a column 
> which is not unique or having any unique index on it. It's the leading 
> column of a composite unique key though. And in such scenarios the 
> "INSERT.... ON CONFLICT" will give an error. So we are opting for a 
> merge statement here, which will work fine with the column being 
> having duplicate values in it.


Alright it's official I am confused.

You started with:

WITH source_data (id, mid, txn_timestamp, cre_ts) AS (
     VALUES ('5efd4c91-ef93-4477-840c-a723ae212d66', 123,
'2024-08-09T11:33:49.402585600Z', '2024-08-09T11:33:49.402585600Z')
)
INSERT INTO tab1 (id, mid, txn_timestamp, cre_ts)
SELECT id, mid, txn_timestamp, cre_ts
FROM source_data
ON CONFLICT (id) DO UPDATE
SET    mid = EXCLUDED.mid,
     txn_timestamp = EXCLUDED.txn_timestamp,
     cre_ts = EXCLUDED.cre_ts;

That implied that id was unique in of itself. As side note you called it 
a merge, which it is not as in MERGE. At this point I got off track 
thinking of MERGE.

Then you went to the below which is a merge:

MERGE INTO tab1 AS target
USING (VALUES ('5efd4c91-ef93-4477-840c-a723ae212d99', 123,
'2024-08-09T11:33:49.402585600Z','2024-08-09T11:33:49.402585600Z')) AS
source(id, mid,txn_timestamp, cre_ts)
ON target.id = source.id
WHEN MATCHED THEN
UPDATE SET mid  = source.mid
WHEN NOT MATCHED THEN
INSERT (id, mid, txn_timestamp, cre_ts)
     VALUES (source.id,source.mid,  source.txn_timestamp, source.cre_ts);

The question I have now is if id is part of a composite UNIQUE index on 
this:

  CREATE TABLE tab1 (
     id varchar(100) ,
     mid INT,
     txn_timestamp TIMESTAMPTZ NOT NULL,
     cre_ts TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (txn_timestamp);

Then what is the other column in the UNIQUE index?


-- 
Adrian Klaver
[email protected]







^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: Insert works but fails for merge
@ 2024-08-10 21:40  yudhi s <[email protected]>
  parent: Adrian Klaver <[email protected]>
  0 siblings, 0 replies; 8+ messages in thread

From: yudhi s @ 2024-08-10 21:40 UTC (permalink / raw)
  To: Adrian Klaver <[email protected]>; +Cc: David G. Johnston <[email protected]>; pgsql-general <[email protected]>

Apology for the confusion. The other column is the txn_timestamp in the
composite unique key,  which is also the partition key.

But yes we cant use both in the ON clause because of certain business
requirements. We realized it late. And that's why "on conflict " We are
unable to use.

On Sun, 11 Aug, 2024, 2:57 am Adrian Klaver, <[email protected]>
wrote:

> On 8/10/24 13:23, yudhi s wrote:
> >
> >
> > On Sat, Aug 10, 2024 at 8:22 PM Adrian Klaver <[email protected]
> > <mailto:[email protected]>> wrote:
> >
> >
> >
> >     Why not use INSERT ... ON CONFLICT instead of MERGE?
> >
> >      >
> >      > MERGE INTO tab1 AS target
> >      > USING (VALUES ('5efd4c91-ef93-4477-840c-a723ae212d99', 123,
> >      >
> >     '2024-08-09T11:33:49.402585600Z','2024-08-09T11:33:49.402585600Z'))
> AS
> >      > source(id, mid,txn_timestamp, cre_ts)
> >      > ON target.id <http://target.id; <http://target.id
> >     <http://target.id>; = source.id <http://source.id; <http://source.id
> >     <http://source.id>;
> >      > WHEN MATCHED THEN
> >      > UPDATE SET mid  = source.mid
> >      > WHEN NOT MATCHED THEN
> >      > INSERT (id, mid, txn_timestamp, cre_ts)
> >      >      VALUES (source.id <http://source.id; <http://source.id
> >     <http://source.id>>,source.mid,
> >      >   source.txn_timestamp, source.cre_ts);
> >
> >
> >
> > Actually , as per the business logic , we need to merge on a column
> > which is not unique or having any unique index on it. It's the leading
> > column of a composite unique key though. And in such scenarios the
> > "INSERT.... ON CONFLICT" will give an error. So we are opting for a
> > merge statement here, which will work fine with the column being
> > having duplicate values in it.
>
>
> Alright it's official I am confused.
>
> You started with:
>
> WITH source_data (id, mid, txn_timestamp, cre_ts) AS (
>      VALUES ('5efd4c91-ef93-4477-840c-a723ae212d66', 123,
> '2024-08-09T11:33:49.402585600Z', '2024-08-09T11:33:49.402585600Z')
> )
> INSERT INTO tab1 (id, mid, txn_timestamp, cre_ts)
> SELECT id, mid, txn_timestamp, cre_ts
> FROM source_data
> ON CONFLICT (id) DO UPDATE
> SET    mid = EXCLUDED.mid,
>      txn_timestamp = EXCLUDED.txn_timestamp,
>      cre_ts = EXCLUDED.cre_ts;
>
> That implied that id was unique in of itself. As side note you called it
> a merge, which it is not as in MERGE. At this point I got off track
> thinking of MERGE.
>
> Then you went to the below which is a merge:
>
> MERGE INTO tab1 AS target
> USING (VALUES ('5efd4c91-ef93-4477-840c-a723ae212d99', 123,
> '2024-08-09T11:33:49.402585600Z','2024-08-09T11:33:49.402585600Z')) AS
> source(id, mid,txn_timestamp, cre_ts)
> ON target.id = source.id
> WHEN MATCHED THEN
> UPDATE SET mid  = source.mid
> WHEN NOT MATCHED THEN
> INSERT (id, mid, txn_timestamp, cre_ts)
>      VALUES (source.id,source.mid,  source.txn_timestamp, source.cre_ts);
>
> The question I have now is if id is part of a composite UNIQUE index on
> this:
>
>   CREATE TABLE tab1 (
>      id varchar(100) ,
>      mid INT,
>      txn_timestamp TIMESTAMPTZ NOT NULL,
>      cre_ts TIMESTAMPTZ NOT NULL
> ) PARTITION BY RANGE (txn_timestamp);
>
> Then what is the other column in the UNIQUE index?
>
>
> --
> Adrian Klaver
> [email protected]
>
>


^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: Insert works but fails for merge
@ 2024-08-11 10:09  Alban Hertroys <[email protected]>
  parent: yudhi s <[email protected]>
  1 sibling, 1 reply; 8+ messages in thread

From: Alban Hertroys @ 2024-08-11 10:09 UTC (permalink / raw)
  To: yudhi s <[email protected]>; +Cc: Adrian Klaver <[email protected]>; David G Johnston <[email protected]>; pgsql-general <[email protected]>

> On 10 Aug 2024, at 22:23, yudhi s <[email protected]> wrote:
> On Sat, Aug 10, 2024 at 8:22 PM Adrian Klaver <[email protected]> wrote:
> > MERGE INTO tab1 AS target
> > USING (VALUES ('5efd4c91-ef93-4477-840c-a723ae212d99', 123, 
> > '2024-08-09T11:33:49.402585600Z','2024-08-09T11:33:49.402585600Z')) AS 
> > source(id, mid,txn_timestamp, cre_ts)
> > ON target.id <http://target.id; = source.id <http://source.id;
> > WHEN MATCHED THEN
> > UPDATE SET mid  = source.mid
> > WHEN NOT MATCHED THEN
> > INSERT (id, mid, txn_timestamp, cre_ts)
> >      VALUES (source.id <http://source.id>,source.mid, 
> >   source.txn_timestamp, source.cre_ts);
> 
> Actually , as per the business logic , we need to merge on a column which is not unique or having any unique index on it.

Then how is the database supposed to determine which of those duplicate rows it should update? In the best case, it would update all of the duplicates with the same values, which usually is not what you want.

> It's the leading column of a composite unique key though.

Which could be unique of itself, I suppose that isn’t the case here?

In that case, IMHO your best course of action is to do something about those duplicates first.

> And in such scenarios the "INSERT.... ON CONFLICT" will give an error. So we are opting for a merge statement here, which will work fine with the column being having duplicate values in it.

I’m not so sure about that claim…

At least on MSSQL, MERGE has this requirement: "A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times.”. I’ve seen that as an error message on occasion.

The MERGE documentation for PostgreSQL says this: "You should ensure that the join produces at most one candidate change row for each target row.”, which also seems to imply that you shouldn’t have duplicates.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.

^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: Insert works but fails for merge
@ 2024-08-11 15:11  Adrian Klaver <[email protected]>
  parent: Alban Hertroys <[email protected]>
  0 siblings, 1 reply; 8+ messages in thread

From: Adrian Klaver @ 2024-08-11 15:11 UTC (permalink / raw)
  To: Alban Hertroys <[email protected]>; yudhi s <[email protected]>; +Cc: David G Johnston <[email protected]>; pgsql-general <[email protected]>

On 8/11/24 03:09, Alban Hertroys wrote:
> 
>> On 10 Aug 2024, at 22:23, yudhi s <[email protected]> wrote:
>> On Sat, Aug 10, 2024 at 8:22 PM Adrian Klaver <[email protected]> wrote:
>>> MERGE INTO tab1 AS target
>>> USING (VALUES ('5efd4c91-ef93-4477-840c-a723ae212d99', 123,
>>> '2024-08-09T11:33:49.402585600Z','2024-08-09T11:33:49.402585600Z')) AS
>>> source(id, mid,txn_timestamp, cre_ts)
>>> ON target.id <http://target.id; = source.id <http://source.id;
>>> WHEN MATCHED THEN
>>> UPDATE SET mid  = source.mid
>>> WHEN NOT MATCHED THEN
>>> INSERT (id, mid, txn_timestamp, cre_ts)
>>>       VALUES (source.id <http://source.id>,source.mid,
>>>    source.txn_timestamp, source.cre_ts);
>>
>> Actually , as per the business logic , we need to merge on a column which is not unique or having any unique index on it.
> 
> Then how is the database supposed to determine which of those duplicate rows it should update? In the best case, it would update all of the duplicates with the same values, which usually is not what you want.
> 
>> It's the leading column of a composite unique key though.
> 
> Which could be unique of itself, I suppose that isn’t the case here?
> 
> In that case, IMHO your best course of action is to do something about those duplicates first.
> 
>> And in such scenarios the "INSERT.... ON CONFLICT" will give an error. So we are opting for a merge statement here, which will work fine with the column being having duplicate values in it.
> 
> I’m not so sure about that claim…
> 
> At least on MSSQL, MERGE has this requirement: "A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times.”. I’ve seen that as an error message on occasion.
> 
> The MERGE documentation for PostgreSQL says this: "You should ensure that the join produces at most one candidate change row for each target row.”, which also seems to imply that you shouldn’t have duplicates.

The next sentence says:

"In other words, a target row shouldn't join to more than one data 
source row."

In this case the OP's data source is a single VALUES(). As it is written 
I don't it tripping that rule, though it would not take much to change that.


> 
> Alban Hertroys
> --
> If you can't see the forest for the trees,
> cut the trees and you'll find there is no forest.
> 

-- 
Adrian Klaver
[email protected]







^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: Insert works but fails for merge
@ 2024-08-12 01:18  Greg Sabino Mullane <[email protected]>
  parent: Adrian Klaver <[email protected]>
  0 siblings, 1 reply; 8+ messages in thread

From: Greg Sabino Mullane @ 2024-08-12 01:18 UTC (permalink / raw)
  To: Adrian Klaver <[email protected]>; +Cc: Alban Hertroys <[email protected]>; yudhi s <[email protected]>; David G Johnston <[email protected]>; pgsql-general <[email protected]>

So it looks like the OP does not mind updating more than one row. If you
want to keep it simple and not do a lot of casting, consider using a CTE to
do a reverse-upsert and use a prepared statement. Prepare and cast once,
and have your app send the raw uncasted strings many, many times:

prepare foo(text,int,timestamptz) as with x as (update tab1 set mid=$2
where id=$1 returning 1)
  insert into tab1 select $1,$2,$3 where not exists (select 1 from x);

execute foo('5efd4c91-ef93-4477-840c-a723ae212d99', 123,
'2024-08-09T11:33:49.402585600Z');
execute foo('some_other_id', 456, '2024-08-11T21:44:55.8675309Z');
etc.

Your app/driver may or may not already do protocol-level statement
prepare/execute automagically, so test that way first.

It's version 15.4 postgres.

Keep on the latest revision. Right now, that's 15.8. Upgrading revisions is
quick and painless.

Cheers,
Greg

^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: Insert works but fails for merge
@ 2024-08-13 13:55  Greg Sabino Mullane <[email protected]>
  parent: Greg Sabino Mullane <[email protected]>
  0 siblings, 0 replies; 8+ messages in thread

From: Greg Sabino Mullane @ 2024-08-13 13:55 UTC (permalink / raw)
  To: Adrian Klaver <[email protected]>; +Cc: Alban Hertroys <[email protected]>; yudhi s <[email protected]>; David G Johnston <[email protected]>; pgsql-general <[email protected]>

I just remembered that one of the complaints was not wanting to worry about
looking up the data types. In my previous example, you can also leave out
the types and Postgres will do the right thing. I prefer the explicit data
type version for clarity, but though I would provide this one for
completeness:

prepare foo as with x as (update tab1 set mid=$2 where id=$1 returning 1)
  insert into tab1 select $1,$2,$3 where not exists (select 1 from x);

Cheers,
Greg

^ permalink  raw  reply  [nested|flat] 8+ messages in thread

end of thread, other threads:[~2024-08-13 13:55 UTC | newest]

Thread overview: 8+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-08-10 14:52 Re: Insert works but fails for merge Adrian Klaver <[email protected]>
2024-08-10 20:23 ` yudhi s <[email protected]>
2024-08-10 21:27   ` Adrian Klaver <[email protected]>
2024-08-10 21:40     ` yudhi s <[email protected]>
2024-08-11 10:09   ` Alban Hertroys <[email protected]>
2024-08-11 15:11     ` Adrian Klaver <[email protected]>
2024-08-12 01:18       ` Greg Sabino Mullane <[email protected]>
2024-08-13 13:55         ` Greg Sabino Mullane <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox