public inbox for [email protected]  
pgsql: Fix rare assertion failure in standby, if primary is restarted
5+ messages / 1 participant

* pgsql: Fix rare assertion failure in standby, if primary is restarted
@ 2025-03-23 18:45 Heikki Linnakangas <[email protected]>

From: Heikki Linnakangas @ 2025-03-23 18:45 UTC
  To: [email protected]

Fix rare assertion failure in standby, if primary is restarted

During hot standby, the ExpireAllKnownAssignedTransactionIds() and
ExpireOldKnownAssignedTransactionIds() functions mark old transactions
as no longer running, but they failed to update xactCompletionCount
and latestCompletedXid. AFAICS this would not lead to incorrect query
results, because those functions effectively turn in-progress
transactions into aborted transactions, and an MVCC snapshot considers
both "not visible". However, because xactCompletionCount did not
change, GetSnapshotDataReuse() could reuse a stale snapshot whose xmin
is older than the backend's TransactionXmin, making the apparent xmin
move backwards and triggering the
"TransactionIdPrecedesOrEquals(TransactionXmin, RecentXmin)" assertion
in it. We saw this happen when GetCatalogSnapshot() reused an older
catalog snapshot after GetTransactionSnapshot() had already advanced
TransactionXmin.
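The failure mode can be illustrated with a minimal, hypothetical C model (not PostgreSQL code). It assumes that a cached snapshot is reused whenever xactCompletionCount has not changed since it was computed, and that a snapshot's xmin is the oldest still-running xid, or else latestCompletedXid + 1. All function names, the fixed-size running array, and the xid values 100, 101, 105 and 106 are invented for illustration; the real procarray.c code also handles locking, subtransactions and the KnownAssignedXids structure, which the model ignores. The `fixed` flag switches between the buggy expiry and one that also bumps the counter and advances latestCompletedXid, the two updates the commit message says were missing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model; names only loosely mirror PostgreSQL's. */
typedef uint32_t Xid;
#define INVALID_XID 0u

/* "Shared" standby state */
static Xid running[8];             /* models KnownAssignedXids */
static int n_running;
static Xid latest_completed;       /* models latestCompletedXid */
static uint64_t completion_count;  /* models xactCompletionCount */

/* Backend-local state */
static Xid transaction_xmin;       /* models TransactionXmin */
static Xid recent_xmin;            /* models RecentXmin */

typedef struct
{
	bool		valid;
	Xid			xmin;
	uint64_t	count;             /* completion_count when computed */
} Snapshot;

/* Wraparound-aware "a < b"; assumes both are normal xids (>= 3). */
static bool
xid_precedes(Xid a, Xid b)
{
	return (int32_t) (a - b) < 0;
}

/*
 * Reuses *snap if completion_count is unchanged since it was computed,
 * otherwise recomputes it. Returns false where the real code would fail
 * Assert(TransactionIdPrecedesOrEquals(TransactionXmin, RecentXmin)).
 */
static bool
get_snapshot(Snapshot *snap)
{
	if (!snap->valid || snap->count != completion_count)
	{
		Xid			xmin = latest_completed + 1;   /* starts at xmax */

		for (int i = 0; i < n_running; i++)
			if (xid_precedes(running[i], xmin))
				xmin = running[i];
		snap->valid = true;
		snap->xmin = xmin;
		snap->count = completion_count;
	}
	if (transaction_xmin == INVALID_XID)
		transaction_xmin = snap->xmin;
	recent_xmin = snap->xmin;
	return !xid_precedes(recent_xmin, transaction_xmin); /* TransactionXmin <= RecentXmin */
}

/*
 * Expires running xids older than 'horizon' during replay. fixed == false
 * reproduces the bug; fixed == true also updates the bookkeeping.
 */
static void
expire_old_xids(Xid horizon, bool fixed)
{
	int			kept = 0;

	assert(horizon != INVALID_XID);
	for (int i = 0; i < n_running; i++)
		if (!xid_precedes(running[i], horizon))
			running[kept++] = running[i];
	n_running = kept;
	if (fixed)
	{
		if (xid_precedes(latest_completed, horizon - 1))
			latest_completed = horizon - 1;
		completion_count++;        /* expired xids count as aborted */
	}
}

/* Replays the interleaving; returns true if the check always held. */
bool
run_scenario(bool fixed)
{
	Snapshot	catalog = {0};
	Snapshot	xact = {0};
	bool		ok = true;

	running[0] = 100;
	running[1] = 101;
	n_running = 2;
	latest_completed = 105;
	completion_count = 0;
	transaction_xmin = INVALID_XID;

	ok &= get_snapshot(&catalog);   /* first catalog snapshot: xmin 100 */
	expire_old_xids(106, fixed);    /* replay after the primary restarted */
	transaction_xmin = INVALID_XID; /* catalog snapshot invalidated, xmin reset */
	ok &= get_snapshot(&xact);      /* first transaction snapshot: xmin 106 */
	ok &= get_snapshot(&catalog);   /* catalog snapshot requested again */
	return ok;
}
```

In this model, run_scenario(false) returns false: the second catalog snapshot request sees an unchanged completion_count, reuses xmin 100, and finds TransactionXmin already at 106, which is the model's stand-in for the assertion failure. run_scenario(true) returns true, because bumping the counter forces the catalog snapshot to be recomputed with xmin 106.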

The bug goes back to commit 623a9ba79b in v14, which introduced the
snapshot reuse mechanism, but it started to happen more frequently
after commit 952365cded6, which removed a GetTransactionSnapshot()
call from backend startup. That made it more likely for
ExpireOldKnownAssignedTransactionIds() to be called between
GetCatalogSnapshot() and the first GetTransactionSnapshot() in a
backend.

Andres Freund first spotted this assertion failure on buildfarm member
'skink'. Reproduction and analysis by Tomas Vondra.

Backpatch-through: 14
Discussion: https://www.postgresql.org/message-id/oey246mcw43cy4qw2hqjmurbd62lfdpcuxyqiu7botx3typpax%40h7o7mfg5z...

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/2817525f0d56075e1f3a14c0dc6a180b337d8aed

Modified Files
--------------
src/backend/storage/ipc/procarray.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)

* pgsql: Fix rare assertion failure in standby, if primary is restarted
@ 2025-03-23 18:45 Heikki Linnakangas <[email protected]>

From: Heikki Linnakangas @ 2025-03-23 18:45 UTC
  To: [email protected]

Branch
------
REL_17_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/302ce5bd93b48549bf6717512ea92252319dc944

Modified Files
--------------
src/backend/storage/ipc/procarray.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)

* pgsql: Fix rare assertion failure in standby, if primary is restarted
@ 2025-03-23 18:45 Heikki Linnakangas <[email protected]>

From: Heikki Linnakangas @ 2025-03-23 18:45 UTC
  To: [email protected]

Branch
------
REL_14_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/66235baab72b22456c88a28db788048b52712100

Modified Files
--------------
src/backend/storage/ipc/procarray.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)

* pgsql: Fix rare assertion failure in standby, if primary is restarted
@ 2025-03-23 18:45 Heikki Linnakangas <[email protected]>

From: Heikki Linnakangas @ 2025-03-23 18:45 UTC
  To: [email protected]

Branch
------
REL_16_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/2f33de3cdbc814bcc4270aafd98880a12d265777

Modified Files
--------------
src/backend/storage/ipc/procarray.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)

* pgsql: Fix rare assertion failure in standby, if primary is restarted
@ 2025-03-23 18:45 Heikki Linnakangas <[email protected]>

From: Heikki Linnakangas @ 2025-03-23 18:45 UTC
  To: [email protected]

Branch
------
REL_15_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/b30c77a0e480352cce573195af819bd41a3c1b42

Modified Files
--------------
src/backend/storage/ipc/procarray.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)