public inbox for [email protected]  
help / color / mirror / Atom feed
Re: [BUG?] check_exclusion_or_unique_constraint false negative
3+ messages / 2 participants
[nested] [flat]

* Re: [BUG?] check_exclusion_or_unique_constraint false negative
@ 2024-07-31 20:57  Michail Nikolaev <[email protected]>
  0 siblings, 1 reply; 3+ messages in thread

From: Michail Nikolaev @ 2024-07-31 20:57 UTC (permalink / raw)
  To: pgsql-hackers; Andres Freund <[email protected]>; Amit Kapila <[email protected]>

It seems like I've identified the cause of the issue.

Currently, any DirtySnapshot (or SnapshotSelf) scan over a B-tree index may
skip (not find the TID for) some records in the case of parallel updates.

The following scenario is possible:

* Session 1 reads a B-tree page using SnapshotDirty and copies item X to
the buffer.
* Session 2 updates item X, inserting a new TID Y into the same page.
* Session 2 commits its transaction.
* Session 1 starts to fetch from the heap and tries to fetch X, but it was
already deleted by session 2. So, it goes to the B-tree for the next TID.
* The B-tree goes to the next page, skipping Y.
* Therefore, the search finds nothing, but tuple Y is still alive.

This situation is somewhat controversial. DirtySnapshot might seem to show
more (or more recent, even uncommitted) data than MVCC, but not less. So,
DirtySnapshot scan over a B-tree does not provide any guarantees, as far as
I understand.
Why does it work for MVCC? Because tuple X will be visible due to the
snapshot, making Y unnecessary.
This might be "as designed," but I think it needs to be clearly documented
(I couldn't find any documentation on this particular case, only
_bt_drop_lock_and_maybe_pin - related).

Here are the potential consequences of the issue:

* check_exclusion_or_unique_constraint

It may not find a record in a UNIQUE index during INSERT ON CONFLICT
UPDATE. However, this is just a minor performance issue.

* Exclusion constraints with B-tree, like ADD CONSTRAINT exclusion_data
EXCLUDE USING btree (data WITH =)

It should work correctly because the first inserter may "skip" the TID from
a concurrent inserter, but the second one should still find the TID from
the first.

* RelationFindReplTupleByIndex

Amit, this is why I've included you in this previously solo thread :)
RelationFindReplTupleByIndex uses DirtySnapshot and may not find some
records if they are updated by a parallel transaction. This could lead to
lost deletes/updates, especially in the case of streaming=parallel mode.
I'm not familiar with how parallel workers apply transactions, so maybe
this isn't possible.

Best regards,
Mikhail

>


^ permalink  raw  reply  [nested|flat] 3+ messages in thread

* RE: [BUG?] check_exclusion_or_unique_constraint false negative
@ 2024-08-01 05:54  Hayato Kuroda (Fujitsu) <[email protected]>
  parent: Michail Nikolaev <[email protected]>
  0 siblings, 1 reply; 3+ messages in thread

From: Hayato Kuroda (Fujitsu) @ 2024-08-01 05:54 UTC (permalink / raw)
  To: 'Michail Nikolaev' <[email protected]>; +Cc: pgsql-hackers; Andres Freund <[email protected]>; Amit Kapila <[email protected]>

Dear Michail,

Thanks for pointing out the issue!

>* RelationFindReplTupleByIndex
>
>Amit, this is why I've included you in this previously solo thread :)
>RelationFindReplTupleByIndex uses DirtySnapshot and may not find some records
>if they are updated by a parallel transaction. This could lead to lost
>deletes/updates, especially in the case of streaming=parallel mode. 
>I'm not familiar with how parallel workers apply transactions, so maybe this
>isn't possible.

IIUC, the issue can happen when two concurrent transactions using DirtySnapshot access
the same tuples, which is not specific to the parallel apply. Consider that two
subscriptions exist and publishers modify the same tuple of the same table.
In this case, two workers access the tuple, so one of the changes may be missed
by the scenario you said. I feel we do not need special treatments for parallel
apply.

Best regards,
Hayato Kuroda
FUJITSU LIMITED



^ permalink  raw  reply  [nested|flat] 3+ messages in thread

* Re: [BUG?] check_exclusion_or_unique_constraint false negative
@ 2024-08-01 09:25  Michail Nikolaev <[email protected]>
  parent: Hayato Kuroda (Fujitsu) <[email protected]>
  0 siblings, 0 replies; 3+ messages in thread

From: Michail Nikolaev @ 2024-08-01 09:25 UTC (permalink / raw)
  To: Hayato Kuroda (Fujitsu) <[email protected]>; +Cc: pgsql-hackers; Andres Freund <[email protected]>; Amit Kapila <[email protected]>

Hello, Hayato!

> Thanks for pointing out the issue!

Thanks for your attention!

> IIUC, the issue can happen when two concurrent transactions using
DirtySnapshot access
> the same tuples, which is not specific to the parallel apply

Not exactly, it happens for any DirtySnapshot scan over a B-tree index with
some other transaction updating the same index page (even using the MVCC
snapshot).

So, logical replication related scenario looks like this:

* subscriber worker receives a tuple update\delete from the publisher
* it calls RelationFindReplTupleByIndex to find the tuple in the local table
* some other transaction updates the tuple in the local table (on
subscriber side) in parallel
* RelationFindReplTupleByIndex may not find the tuple because it uses
DirtySnapshot
* update\delete is lost

Parallel apply mode looks like more dangerous because it uses multiple
workers on the subscriber side, so the probability of the issue is higher.
In that case, "some other transaction" is just another worker applying
changes of different transaction in parallel.

Best regards,
Mikhail.


^ permalink  raw  reply  [nested|flat] 3+ messages in thread


end of thread, other threads:[~2024-08-01 09:25 UTC | newest]

Thread overview: 3+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-07-31 20:57 Re: [BUG?] check_exclusion_or_unique_constraint false negative Michail Nikolaev <[email protected]>
2024-08-01 05:54 ` Hayato Kuroda (Fujitsu) <[email protected]>
2024-08-01 09:25   ` Michail Nikolaev <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox