Re: Order of update

public inbox for [email protected]  
help / color / mirror / Atom feed

Re: Order of update
6+ messages / 3 participants
[nested] [flat]

* Re: Order of update
@ 2025-04-21 08:47 Peter J. Holzer <[email protected]>
  2025-04-21 15:43 ` Re: Order of update Adrian Klaver <[email protected]>
  0 siblings, 1 reply; 6+ messages in thread

From: Peter J. Holzer @ 2025-04-21 08:47 UTC (permalink / raw)
  To: [email protected]

On 2025-04-20 08:28:22 -0700, Adrian Klaver wrote:
> On 4/20/25 02:10, Peter J. Holzer wrote:
> > I've just read Laurenz' blog post about the differences between Oracle
> > and PostgreSQL[1].
> > 
> > One of the differences is that something like
> > 
> >      UPDATE tab SET id = id + 1;
> > 
> > tends to fail on PostgreSQL because the the primary key constraint is
> > checked for every row, so it will stumble over the temporary conflicts.
> > 
> > The solution is to define the constraint as deferrable.
> > 
> > But that got me to thinking about different ways ...
> > 
> > There won't be a conflict if the ids are updated in descending order.
> > Is there a way to force PostgreSQL to update the rows in a specific
> > order?
> > 
> > I came up with
> > 
> >      with a as (select id from t where id > 50 order by id desc)
> >      update t set id = a.id+1 from a where t.id = a.id;
> > 
> > which works in my simple test case, but it doesn't look like it's
> > guaranteed to work. The implicit join in «update t ... from a» could
> > produce rows in any order, especially for large tables.
> 
> My read of this

Your read of the query, the PostgreSQL source or the SQL standard?

> is that for the duration of the query a temporary table a is create
> that is ordered on `id desc` and that '... from a where t.id = a.id'
> will apply that order to the selection of t.id.

Yes, that's the intention. In as I wrote it did work in my simple tests.
But is it guaranteed to work? Is there anything in the standard that
says that the order has to be preserved? Or failing that, is that the
way it's currently implemented and there are reasons to assume that it
will never be changed?


> As example:
> 
> create table id_update(id integer primary key);
> 
> insert into id_update select a from generate_series(1, 100000) as t(a);
> INSERT 0 100000
> 
> -- id(s) are temporarily in order.
> 
> update id_update set id = id where id between 50000 and 60000;
> UPDATE 10001
> 
> -- The above move the 10001 values to 'end' of id_update
> 
> with a as (select id from id_update where id > 100 order by id desc) update
> id_update as t set id = a.id + 1 from a  where t.id = a.id;
> UPDATE 99900

I note that this produces a hash join:

#v+
╔═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║                                                      QUERY PLAN                                                       ║
╟───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
║ Update on id_update t  (cost=3179.42..8662.63 rows=0 width=0)                                                         ║
║   ->  Hash Join  (cost=3179.42..8662.63 rows=99899 width=38)                                                          ║
║         Hash Cond: (a.id = t.id)                                                                                      ║
║         ->  Subquery Scan on a  (cost=0.42..4971.64 rows=99899 width=32)                                              ║
║               ->  Index Only Scan Backward using id_update_pkey on id_update  (cost=0.42..3972.65 rows=99899 width=4) ║
║                     Index Cond: (id > 100)                                                                            ║
║         ->  Hash  (cost=1929.00..1929.00 rows=100000 width=10)                                                        ║
║               ->  Seq Scan on id_update t  (cost=0.00..1929.00 rows=100000 width=10)                                  ║
╚═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝
#v-

If the hash was the other way around it wouldn't work.

So let's try if we can get the optimizer to flip the plan by changing
the number of updated rows.

[a few minutes later]

#v+
hjp=> explain
with a as (select id from id_update where id > 90000 order by id desc)
update id_update as t set id = a.id + 1 from a  where a.id = t.id;
╔════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║                                                         QUERY PLAN                                                         ║
╟────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
║ Update on id_update t  (cost=732.53..2675.61 rows=0 width=0)                                                               ║
║   ->  Hash Join  (cost=732.53..2675.61 rows=10006 width=38)                                                                ║
║         Hash Cond: (t.id = a.id)                                                                                           ║
║         ->  Seq Scan on id_update t  (cost=0.00..1443.00 rows=100000 width=10)                                             ║
║         ->  Hash  (cost=607.46..607.46 rows=10006 width=32)                                                                ║
║               ->  Subquery Scan on a  (cost=0.29..607.46 rows=10006 width=32)                                              ║
║                     ->  Index Only Scan Backward using id_update_pkey on id_update  (cost=0.29..507.40 rows=10006 width=4) ║
║                           Index Cond: (id > 90000)                                                                         ║
╚════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝
#v-

Looks like we got it.

And indeed:

#v+
hjp=> with a as (select id from id_update where id > 90000 order by id desc)
      update id_update as t set id = a.id + 1 from a  where a.id = t.id;
ERROR:  duplicate key value violates unique constraint "id_update_pkey"
DETAIL:  Key (id)=(90002) already exists.
#v-

So, obviously that isn't guaranteed to work.

        hjp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | [email protected]         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"


Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: Order of update
  2025-04-21 08:47 Re: Order of update Peter J. Holzer <[email protected]>
@ 2025-04-21 15:43 ` Adrian Klaver <[email protected]>
  2025-04-21 16:12   ` Re: Order of update Thiemo Kellner <[email protected]>
  0 siblings, 1 reply; 6+ messages in thread

From: Adrian Klaver @ 2025-04-21 15:43 UTC (permalink / raw)
  To: [email protected]

On 4/21/25 01:47, Peter J. Holzer wrote:

> 
> If the hash was the other way around it wouldn't work.
> 
> So let's try if we can get the optimizer to flip the plan by changing
> the number of updated rows.
> 
> [a few minutes later]
> 
> #v+
> hjp=> explain
> with a as (select id from id_update where id > 90000 order by id desc)
> update id_update as t set id = a.id + 1 from a  where a.id = t.id;
> ╔════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
> ║                                                         QUERY PLAN                                                         ║
> ╟────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╢
> ║ Update on id_update t  (cost=732.53..2675.61 rows=0 width=0)                                                               ║
> ║   ->  Hash Join  (cost=732.53..2675.61 rows=10006 width=38)                                                                ║
> ║         Hash Cond: (t.id = a.id)                                                                                           ║
> ║         ->  Seq Scan on id_update t  (cost=0.00..1443.00 rows=100000 width=10)                                             ║
> ║         ->  Hash  (cost=607.46..607.46 rows=10006 width=32)                                                                ║
> ║               ->  Subquery Scan on a  (cost=0.29..607.46 rows=10006 width=32)                                              ║
> ║                     ->  Index Only Scan Backward using id_update_pkey on id_update  (cost=0.29..507.40 rows=10006 width=4) ║
> ║                           Index Cond: (id > 90000)                                                                         ║
> ╚════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝
> #v-
> 
> Looks like we got it.
> 
> And indeed:
> 
> #v+
> hjp=> with a as (select id from id_update where id > 90000 order by id desc)
>        update id_update as t set id = a.id + 1 from a  where a.id = t.id;
> ERROR:  duplicate key value violates unique constraint "id_update_pkey"
> DETAIL:  Key (id)=(90002) already exists.
> #v-
> 
> So, obviously that isn't guaranteed to work.

I read from here:

https://www.postgresql.org/docs/current/sql-update.html

"Use of an ORDER BY clause allows the command to prioritize which rows 
will be updated; it can also prevent deadlock with other update 
operations if they use the same ordering."

I went back to those docs and realized I had missed the FOR UPDATE in 
the example.

explain with a as (select id from id_update where id > 90000 order by id 
desc for update) update id_update as t set id = a.id + 1 from a  where 
a.id = t.id;
                                                  QUERY PLAN 

-------------------------------------------------------------------------------------------------------------
  Update on id_update t  (cost=3609.71..3856.94 rows=0 width=0)
    CTE a
      ->  LockRows  (cost=0.29..872.71 rows=9840 width=10)
            ->  Index Scan Backward using id_update_pkey on id_update 
(cost=0.29..774.31 rows=9840 width=10)
                  Index Cond: (id > 90000)
    ->  Hash Join  (cost=2737.00..2984.23 rows=9840 width=38)
          Hash Cond: (a.id = t.id)
          ->  CTE Scan on a  (cost=0.00..196.80 rows=9840 width=32)
          ->  Hash  (cost=1487.00..1487.00 rows=100000 width=10)
                ->  Seq Scan on id_update t  (cost=0.00..1487.00 
rows=100000 width=10)
(10 rows)

and then:

with a as (select id from id_update where id > 90000 order by id desc 
for update) update id_update as t set id = a.id + 1 from a  where a.id = 
t.id;
UPDATE 10000

Though at this point I would agree with you on the no guarantee point.


> 
>          hjp
> 

-- 
Adrian Klaver
[email protected]







^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: Order of update
  2025-04-21 08:47 Re: Order of update Peter J. Holzer <[email protected]>
  2025-04-21 15:43 ` Re: Order of update Adrian Klaver <[email protected]>
@ 2025-04-21 16:12   ` Thiemo Kellner <[email protected]>
  2025-04-21 16:44     ` Re: Order of update Adrian Klaver <[email protected]>
  2025-04-24 06:26     ` Re: Order of update Peter J. Holzer <[email protected]>
  0 siblings, 2 replies; 6+ messages in thread

From: Thiemo Kellner @ 2025-04-21 16:12 UTC (permalink / raw)
  To: [email protected]

I wonder if that is a corner case. Updating a unique key sounds to me like a design flaw in the first place.






^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: Order of update
  2025-04-21 08:47 Re: Order of update Peter J. Holzer <[email protected]>
  2025-04-21 15:43 ` Re: Order of update Adrian Klaver <[email protected]>
  2025-04-21 16:12   ` Re: Order of update Thiemo Kellner <[email protected]>
@ 2025-04-21 16:44     ` Adrian Klaver <[email protected]>
  2025-04-21 17:58       ` Re: Order of update Thiemo Kellner <[email protected]>
  1 sibling, 1 reply; 6+ messages in thread

From: Adrian Klaver @ 2025-04-21 16:44 UTC (permalink / raw)
  To: Thiemo Kellner <[email protected]>; [email protected]

On 4/21/25 09:12, Thiemo Kellner wrote:
> I wonder if that is a corner case. Updating a unique key sounds to me like a design flaw in the first place.
> 

Check out this the thread below for discussion on that topic:

https://www.postgresql.org/message-id/[email protected]

-- 
Adrian Klaver
[email protected]







^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: Order of update
  2025-04-21 08:47 Re: Order of update Peter J. Holzer <[email protected]>
  2025-04-21 15:43 ` Re: Order of update Adrian Klaver <[email protected]>
  2025-04-21 16:12   ` Re: Order of update Thiemo Kellner <[email protected]>
  2025-04-21 16:44     ` Re: Order of update Adrian Klaver <[email protected]>
@ 2025-04-21 17:58       ` Thiemo Kellner <[email protected]>
  0 siblings, 0 replies; 6+ messages in thread

From: Thiemo Kellner @ 2025-04-21 17:58 UTC (permalink / raw)
  To: pgsql-generallists.postgresql.org <[email protected]>

Thanks for the pointer. I feel my doubts reflected. For such reasons, I prefer the UUID as surrogate key. No point in trying to establish an order or even id arithmetics.

21.04.2025 18:44:27 Adrian Klaver <[email protected]>:

> On 4/21/25 09:12, Thiemo Kellner wrote:
>> I wonder if that is a corner case. Updating a unique key sounds to me like a design flaw in the first place.
>> 
> 
> Check out this the thread below for discussion on that topic:
> 
> https://www.postgresql.org/message-id/[email protected]
> 
> -- 
> Adrian Klaver
> [email protected]






^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: Order of update
  2025-04-21 08:47 Re: Order of update Peter J. Holzer <[email protected]>
  2025-04-21 15:43 ` Re: Order of update Adrian Klaver <[email protected]>
  2025-04-21 16:12   ` Re: Order of update Thiemo Kellner <[email protected]>
@ 2025-04-24 06:26     ` Peter J. Holzer <[email protected]>
  1 sibling, 0 replies; 6+ messages in thread

From: Peter J. Holzer @ 2025-04-24 06:26 UTC (permalink / raw)
  To: [email protected]

On 2025-04-21 18:12:13 +0200, Thiemo Kellner wrote:
> I wonder if that is a corner case. Updating a unique key sounds to me like a design flaw in the first place.

I agree that changing a surrogate key is almost always a mistake.

But there might be situations where a column should be unique but isn't
an id.

For example, many years ago it was a popular[1] programming pattern to
represent trees as nested ranges (i.e. if two children of a parent had
the ranges (a, b) and (b+1, c) then the parent had (a-1, c+1).
Insert-operations then need to update those columns. You want an index
on those columns (since you search for them a lot), and you might want
to make it a unique index, since that covers part of the invariant
(although not the complete invariant). If you do that you run into the
update problem.

There are probably other use-cases. Anything where you need a unique
order which can change, I guess?

Anyway, I don't have a pressing need for this, as I said I was just
curious.

        hjp

[1] Mostly in MySQL I think, since it didn't have recursive queries of
    any kind.

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | [email protected]         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 6+ messages in thread

end of thread, other threads:[~2025-04-24 06:26 UTC | newest]

Thread overview: 6+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-04-21 08:47 Re: Order of update Peter J. Holzer <[email protected]>
2025-04-21 15:43 ` Adrian Klaver <[email protected]>
2025-04-21 16:12   ` Thiemo Kellner <[email protected]>
2025-04-21 16:44     ` Adrian Klaver <[email protected]>
2025-04-21 17:58       ` Thiemo Kellner <[email protected]>
2025-04-24 06:26     ` Peter J. Holzer <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox