public inbox for [email protected]  
help / color / mirror / Atom feed
From: Michail Nikolaev <[email protected]>
To: PostgreSQL Hackers <[email protected]>
To: Andres Freund <[email protected]>
To: Amit Kapila <[email protected]>
Subject: Re: [BUG?] check_exclusion_or_unique_constraint false negative
Date: Wed, 31 Jul 2024 22:57:00 +0200
Message-ID: <CANtu0ohUB9ky45iiMAYN1fGyt82+cg=+UYBom=P7drb+=97G9w@mail.gmail.com> (raw)
In-Reply-To: <CANtu0oh0tspW-xWzDGWP9ehz96KPt9aUP1c9JYhdBYxKsB0jpA@mail.gmail.com>
References: <CANtu0oiktqQ2pwExoXqDpByXNCJa-KE5vQRodTRnmFHN_+qwHg@mail.gmail.com>
	<CANtu0ohU2XRV9shtu14CffLPDS1x10q7ebOGf-vX0p+45_L8jw@mail.gmail.com>
	<CANtu0oh0tspW-xWzDGWP9ehz96KPt9aUP1c9JYhdBYxKsB0jpA@mail.gmail.com>

It seems like I've identified the cause of the issue.

Currently, any DirtySnapshot (or SnapshotSelf) scan over a B-tree index may
skip (not find the TID for) some records in the case of parallel updates.

The following scenario is possible:

* Session 1 reads a B-tree page using SnapshotDirty and copies item X to
the buffer.
* Session 2 updates item X, inserting a new TID Y into the same page.
* Session 2 commits its transaction.
* Session 1 starts to fetch from the heap and tries to fetch X, but it was
already deleted by session 2. So, it goes to the B-tree for the next TID.
* The B-tree goes to the next page, skipping Y.
* Therefore, the search finds nothing, but tuple Y is still alive.

This situation is somewhat controversial. DirtySnapshot might seem to show
more (or more recent, even uncommitted) data than MVCC, but not less. So,
DirtySnapshot scan over a B-tree does not provide any guarantees, as far as
I understand.
Why does it work for MVCC? Because tuple X will be visible due to the
snapshot, making Y unnecessary.
This might be "as designed," but I think it needs to be clearly documented
(I couldn't find any documentation on this particular case, only
_bt_drop_lock_and_maybe_pin - related).

Here are the potential consequences of the issue:

* check_exclusion_or_unique_constraint

It may not find a record in a UNIQUE index during INSERT ON CONFLICT
UPDATE. However, this is just a minor performance issue.

* Exclusion constraints with B-tree, like ADD CONSTRAINT exclusion_data
EXCLUDE USING btree (data WITH =)

It should work correctly because the first inserter may "skip" the TID from
a concurrent inserter, but the second one should still find the TID from
the first.

* RelationFindReplTupleByIndex

Amit, this is why I've included you in this previously solo thread :)
RelationFindReplTupleByIndex uses DirtySnapshot and may not find some
records if they are updated by a parallel transaction. This could lead to
lost deletes/updates, especially in the case of streaming=parallel mode.
I'm not familiar with how parallel workers apply transactions, so maybe
this isn't possible.

Best regards,
Mikhail

>


view thread (3+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: [BUG?] check_exclusion_or_unique_constraint false negative
  In-Reply-To: <CANtu0ohUB9ky45iiMAYN1fGyt82+cg=+UYBom=P7drb+=97G9w@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox