public inbox for [email protected]
help / color / mirror / Atom feedFrom: Xuneng Zhou <[email protected]>
To: pgsql-hackers <[email protected]>
Cc: Masahiko Sawada <[email protected]>
Subject: Re: Optimize SnapBuild by maintaining committed.xip in sorted order
Date: Tue, 16 Dec 2025 18:52:13 +0800
Message-ID: <CABPTF7Vty_fsJLpj6=Xn5MM5PXiV_n=XeN87S+7JhSmROYwCAA@mail.gmail.com> (raw)
In-Reply-To: <CABPTF7WcZ4_xvPeBh9Ws3_AJ3u67gHdHSAmE5Pp+0pDppY=ZvA@mail.gmail.com>
References: <CABPTF7XiwbA38OZBj5Y2F-q+fZ=03YFN9iFnb_406F4SfE-f4w@mail.gmail.com>
<CABPTF7WcZ4_xvPeBh9Ws3_AJ3u67gHdHSAmE5Pp+0pDppY=ZvA@mail.gmail.com>
Hi,
On Tue, Dec 16, 2025 at 6:20 PM Xuneng Zhou <[email protected]> wrote:
>
> Hi,
>
> On Fri, Nov 7, 2025 at 12:55 PM Xuneng Zhou <[email protected]> wrote:
> >
> > Hi hackers,
> >
> > I'd like to propose an optimization for logical decoding's snapshot building
> > mechanism that eliminates repeated sorting overhead once a replication slot
> > reaches the CONSISTENT state.
> >
> > 1) Problem
> >
> > Currently, SnapBuildBuildSnapshot() sorts the entire committed.xip array on
> > every call using qsort(). Once a logical decoding slot reaches CONSISTENT
> > state, snapshots are built frequently—after every catalog-modifying transaction
> > commit. This repeated O(n log n) sorting becomes a performance bottleneck as
> > the committed transaction array grows.
> >
> > The existing code even has a TODO comment acknowledging this inefficiency:
> > "TODO: It's unclear whether that reasoning has much merit. Every time we add
> > something here after becoming consistent will also require distributing a
> > snapshot."
> >
> > 2) Proposed Solution
> >
> > Maintain sorted order via batch merge on each commit:
> >
> > 1. Collect all relevant XIDs (including subtransactions) into a batch array
> > 2. Sort the batch: O(m log m) where m is typically small (1 + nsubxacts)
> > 3. Merge into the global array: O(n + m) via reverse in-place merge
> >
> > While per-commit cost increases from O(m) to O(m log m + n), this is offset by
> > eliminating O(n log n) sorts at each snapshot build. Since CONSISTENT state
> > builds snapshots after each catalog-modifying commit, the amortized cost is
> > significantly lower.
> >
> > 3) Performance Impact
> >
> > Expected improvements in CONSISTENT state:
> > - Eliminates O(n log n) qsort on every snapshot build
> > - Replaces with O(m log m + n) merge per commit
> > - For typical cases where m << n and snapshots are frequent, this is a win
> >
> >
> > The benchmark results for this patch are shown below.
> >
> > 4) Configuration
> >
> > shared_buffers = '4GB'
> > wal_level = logical
> > max_replication_slots = 10
> > max_wal_senders = 10
> > log_min_messages = warning
> > max_connections = 600
> > autovacuum = off
> > checkpoint_timeout = 15min
> > max_wal_size = 4GB
> >
> >
> > 5) Workloads
> >
> > # Workload 1: DDL - High frequency, small commits
> > create_ddl_workload() {
> > local ROOT="$1"
> > cat >"$ROOT/ddl.sql" <<'SQL'
> > -- High-frequency small catalog commits
> > -- Each commit triggers SnapBuildBuildSnapshot
> > DO $$
> > DECLARE
> > tbl text := format('temp_%s_%s',
> > current_setting('application_name', true),
> > floor(random()*1e9)::int);
> > BEGIN
> > EXECUTE format('CREATE TEMP TABLE %I (id int, data text) ON COMMIT
> > DROP', tbl);
> > EXECUTE format('INSERT INTO %I VALUES (1, ''x'')', tbl);
> > EXECUTE format('SELECT * FROM %I', tbl);
> > END$$;
> > SQL
> > }
> >
> > # Workload 2: MIXED - mix of DDL and DML, 50%-50%
> > create_mixed_workload() {
> > local ROOT="$1"
> > cat >"$ROOT/mixed_ddl.sql" <<'SQL'
> > -- DDL workload (catalog changes)
> > DO $$
> > DECLARE
> > tbl text := format('t_%s_%s',
> > current_setting('application_name', true),
> > floor(random()*1e9)::int);
> > BEGIN
> > EXECUTE format('CREATE TABLE %I (id int, data text) ON COMMIT DROP', tbl);
> > EXECUTE format('INSERT INTO %I VALUES (1, ''x'')', tbl);
> > END$$;
> >
> > SQL
> > cat >"$ROOT/mixed_dml.sql" <<'SQL'
> > -- DML workload (no catalog changes)
> > INSERT INTO app_data (id, data)
> > VALUES (floor(random()*1e6)::int, repeat('x', 100))
> > ON CONFLICT (id) DO UPDATE SET data = repeat('y', 100);
> > SQL
> > }
> >
> > # Workload 3: CONTROL - Pure DML, no catalog changes
> > create_control_workload() {
> > local ROOT="$1"
> > cat >"$ROOT/control.sql" <<'SQL'
> >
> > -- Pure DML, no catalog changes
> > -- Should show no difference between baseline and patched
> > INSERT INTO control_data (id, data)
> > VALUES (floor(random()*1e6)::int, repeat('x', 100))
> > ON CONFLICT (id) DO UPDATE SET data = repeat('y', 100);
> > SQL
> > }
> >
> >
> > 6) Test strategy
> >
> > Clients: 100 concurrent connections per workload
> > Duration: 40 seconds per run
> > Repetitions: 1 run per workload type
> > Replication: Logical replication slot using `test_decoding` plugin
> > with `EXPORT_SNAPSHOT=true`
> >
> > Workload Types:
> > 1. DDL - Pure catalog churn (temp table create/drop)
> > 2. Mixed- 50% DDL + 50% DML (workload)
> > 3. Control - Pure DML (no catalog changes)
> >
> > Measurement:
> > - Metrics captured from pg_stat_replication_slots before/after each run
> > - Primary metrics: total_txns (transactions decoded) and total_bytes
> > (data volume)
> > - Compared baseline (vanilla PostgreSQL) vs patched (sorted
> > committed.xip optimization)
> >
> >
> > 7) Performance results
> >
> > DDL Workload: +235% Decoder Improvement
> > Decoder throughput: 713.76 → 2396.52 txns/sec (+235%)
> > Throughput (MB/s): 672.67 → 1747.22 MB/s (+159%)
> > Decode efficiency: 9.46% → 33.29% (+23.83 points)
> >
> > Mixed Workload: No Change
> > Decoder throughput: 2751.10 → 2730.00 txns/sec (0%)
> > Decode efficiency: 40.47% → 40.47% (unchanged)
> >
> > We can see that the qsort overhead in SnapBuild has been eliminated in
> > the flamegraphs in the mixed workload. However, the performance
> > improvement was not observed as in the DDL workload. My guess is that,
> > for DML workloads, ReorderBufferApplyChange has become the new
> > hotspot.
> >
> > DML Workload: No Change
> > Decoder throughput: 3062.57 → 3066.37 txns/sec (0%)
> > Decode efficiency: 49.97% → 49.97% (unchanged)
> >
> > === Workload: ddl ===
> > Client commits/sec:
> > Baseline: 7545.76 commits/sec
> > Patched: 7198.21 commits/sec
> >
> > Decoder throughput (from pg_stat_replication_slots):
> > Baseline: 713.76 txns/sec (672.67 MB/s)
> > Patched: 2396.52 txns/sec (1747.22 MB/s)
> >
> > Transaction efficiency (decoded vs committed):
> > Baseline: 309376 committed → 29264 decoded (9.46%)
> > Patched: 302325 committed → 100654 decoded (33.29%)
> >
> > Total decoded (all reps):
> > Baseline: 29264 txns (27579.53 MB)
> > Patched: 100654 txns (73383.10 MB)
> >
> > Decoder improvement: +235.00% (txns/sec)
> > Decoder improvement: +159.00% (MB/s)
> > Efficiency improvement: +23.83% points (more transactions decoded
> > per committed)
> >
> > === Workload: mixed ===
> > Client commits/sec:
> > Baseline: 6797.50 commits/sec
> > Patched: 6745.26 commits/sec
> >
> > Decoder throughput (from pg_stat_replication_slots):
> > Baseline: 2751.10 txns/sec (210.35 MB/s)
> > Patched: 2730.00 txns/sec (205.64 MB/s)
> >
> > Transaction efficiency (decoded vs committed):
> > Baseline: 285495 committed → 115546 decoded (40.47%)
> > Patched: 283301 committed → 114660 decoded (40.47%)
> >
> > Total decoded (all reps):
> > Baseline: 115546 txns (8834.71 MB)
> > Patched: 114660 txns (8636.96 MB)
> >
> > ≈ Decoder unchanged: 0.00% (txns/sec)
> >
> > === Workload: DML ===
> > Client commits/sec:
> > Baseline: 6129.24 commits/sec
> > Patched: 6136.93 commits/sec
> >
> > Decoder throughput (from pg_stat_replication_slots):
> > Baseline: 3062.57 txns/sec (0.26 MB/s)
> > Patched: 3066.37 txns/sec (0.26 MB/s)
> >
> > Transaction efficiency (decoded vs committed):
> > Baseline: 257428 committed → 128628 decoded (49.97%)
> > Patched: 251614 committed → 125721 decoded (49.97%)
> >
> > Total decoded (all reps):
> > Baseline: 128628 txns (10.98 MB)
> > Patched: 125721 txns (10.69 MB)
> >
> > ≈ Decoder unchanged: 0.00% (txns/sec)
> >
> > 8) Potential regression
> >
> > The potential regression point could be before the slot reaches the
> > CONSISTENT state, particularly when building_full_snapshot is set to
> > true. In this phase, all transactions including those that don’t
> > modify the catalog — must be added to the committed.xip array. These
> > XIDs don’t require later snapshot builds or sorting, so the
> > batch-insert logic increases the per-insert cost from O(1) to O(m + n)
> > without providing a direct benefit.
> >
> > However, the impact of this regression could be limited. The system
> > remains in the pre-CONSISTENT phase only briefly during initial
> > snapshot building, and the building_full_snapshot = true case is rare,
> > mainly used when creating replication slots with the EXPORT_SNAPSHOT
> > option.
> >
> > Once the slot becomes CONSISTENT, only catalog-modifying transactions
> > are tracked in committed.xip, and the patch reduces overall
> > snapshot-building overhead by eliminating repeated full-array sorts.
> >
> > We could also adopt a two-phase approach — keeping the current
> > behavior before reaching the CONSISTENT state and maintaining a sorted
> > array only after that point. This would preserve the performance
> > benefits while avoiding potential regressions. However, it would
> > introduce additional complexity and potential risks in handling the
> > state transitions.
> >
> > if (builder->state < SNAPBUILD_CONSISTENT)
> > {
> > /* ensure that only commits after this are getting replayed */
> > if (builder->start_decoding_at <= lsn)
> > builder->start_decoding_at = lsn + 1;
> >
> > /*
> > * If building an exportable snapshot, force xid to be tracked, even
> > * if the transaction didn't modify the catalog.
> > */
> > if (builder->building_full_snapshot)
> > {
> > needs_timetravel = true;
> > }
> > }
>
> After some consideration, a two-phase sorting strategy seems feasible
> to implement in a relatively straightforward manner. So it's done in
> v2. I also plan to run benchmarks to evaluate potential regressions of
> the original “sort-at-all-stages” approach of v1.
>
> > 9) Additional benefit
> > With this patch applied, we can optimize the SnapBuildPurgeOlderTxn to use
> > two binary searchs rather than interating and copying to find the interval
> > to keep in the sorted commited.xip array. [1]
> >
> > Feedbacks welcomed.
> >
> > [1] https://www.postgresql.org/message-id/flat/CABPTF7V9gcpTLrOY0fG4YontoHjVg8YrbmiH4XB_5PT6K56xhg%40mai...
>
> V2 fixes several issues in v1, including a potential memory leak, type
> inconsistencies, and applies pgindent to the files.
>
V3 fixes a critical issue where the snapshot->xip array in
SnapBuildBuildSnapshot might not be sorted before reaching the
consistent state. Sorry for the noise here.
--
Best,
Xuneng
Attachments:
[application/octet-stream] v3-0001-Optimize-SnapBuild-by-maintaining-committed.xip-i.patch (11.3K, 2-v3-0001-Optimize-SnapBuild-by-maintaining-committed.xip-i.patch)
download | inline diff:
From a7c8d1da5d0496b2a7d05d52e37511a37075e13d Mon Sep 17 00:00:00 2001
From: alterego655 <[email protected]>
Date: Tue, 16 Dec 2025 15:15:14 +0800
Subject: [PATCH v3 1/2] Optimize SnapBuild by maintaining committed.xip in
sorted order
Maintain committed.xip in xidComparator order using a two-phase approach
that balances performance during slot initialization with efficiency during
steady-state decoding:
Pre-CONSISTENT phase:
- Use fast O(1) append for unsorted XIDs
- Sort once at transition to CONSISTENT state
- Avoids merge overhead during initial snapshot building
Post-CONSISTENT phase:
- Use per-commit batch insertion with sorted merge
- Collect all relevant XIDs for each commit
- Sort the batch once: O(m log m)
- Reverse-merge into the global array: O(N + m)
With this change, snapshot builds can skip the qsort() step and simply
memcpy the sorted array. While per-commit work increases from O(m) (plain
append) to O(m log m + N) (sort-and-merge) after CONSISTENT, eliminating
repeated O(N log N) sorts at each snapshot build significantly reduces
overall steady-state cost once CONSISTENT is reached and snapshots are
built frequently.
---
src/backend/replication/logical/snapbuild.c | 164 +++++++++++++++++--
src/include/replication/snapbuild_internal.h | 12 +-
2 files changed, 151 insertions(+), 25 deletions(-)
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index d6ab1e017eb..e0d7d3e0ae8 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -407,8 +407,25 @@ SnapBuildBuildSnapshot(SnapBuild *builder)
builder->committed.xip,
builder->committed.xcnt * sizeof(TransactionId));
- /* sort so we can bsearch() */
- qsort(snapshot->xip, snapshot->xcnt, sizeof(TransactionId), xidComparator);
+ if (builder->state < SNAPBUILD_CONSISTENT)
+ {
+ /*
+ * Pre-CONSISTENT: committed.xip is unsorted, sort the snapshot copy.
+ */
+ if (snapshot->xcnt > 1)
+ qsort(snapshot->xip, snapshot->xcnt,
+ sizeof(TransactionId), xidComparator);
+ }
+#ifdef USE_ASSERT_CHECKING
+ else
+ {
+ /*
+ * Post-CONSISTENT: committed.xip should already be sorted, verify.
+ */
+ for (int i = 1; i < snapshot->xcnt; i++)
+ Assert(snapshot->xip[i - 1] < snapshot->xip[i]);
+ }
+#endif
/*
* Initially, subxip is empty, i.e. it's a snapshot to be used by
@@ -822,17 +839,33 @@ SnapBuildDistributeSnapshotAndInval(SnapBuild *builder, XLogRecPtr lsn, Transact
}
/*
- * Keep track of a new catalog changing transaction that has committed.
+ * Keep track of new catalog changing transactions that have committed.
+ *
+ * Before reaching CONSISTENT state, we use fast O(1) append since the array
+ * doesn't need to be sorted yet. After CONSISTENT, we maintain sorted order
+ * using a merge approach to avoid repeated full-array sorts at snapshot build.
*/
static void
-SnapBuildAddCommittedTxn(SnapBuild *builder, TransactionId xid)
+SnapBuildAddCommittedTxns(SnapBuild *builder,
+ const TransactionId *batch_xids,
+ int batch_cnt)
{
- Assert(TransactionIdIsValid(xid));
+ TransactionId *committed_xids;
+ size_t old_xcnt;
+
+ old_xcnt = builder->committed.xcnt;
- if (builder->committed.xcnt == builder->committed.xcnt_space)
+ /* Ensure we have space for all elements */
+ if (old_xcnt + batch_cnt > builder->committed.xcnt_space)
{
builder->committed.xcnt_space = builder->committed.xcnt_space * 2 + 1;
+ /*
+ * Repeat if we need more than 2x current space.
+ */
+ while (old_xcnt + batch_cnt > builder->committed.xcnt_space)
+ builder->committed.xcnt_space = builder->committed.xcnt_space * 2 + 1;
+
elog(DEBUG1, "increasing space for committed transactions to %u",
(uint32) builder->committed.xcnt_space);
@@ -840,12 +873,57 @@ SnapBuildAddCommittedTxn(SnapBuild *builder, TransactionId xid)
builder->committed.xcnt_space * sizeof(TransactionId));
}
+ committed_xids = builder->committed.xip;
+
/*
- * TODO: It might make sense to keep the array sorted here instead of
- * doing it every time we build a new snapshot. On the other hand this
- * gets called repeatedly when a transaction with subtransactions commits.
+ * Before CONSISTENT state, just append unsorted for O(1) performance. The
+ * array will be sorted once when transitioning to CONSISTENT.
*/
- builder->committed.xip[builder->committed.xcnt++] = xid;
+ if (builder->state < SNAPBUILD_CONSISTENT)
+ {
+ for (int i = 0; i < batch_cnt; i++)
+ {
+ Assert(TransactionIdIsValid(batch_xids[i]));
+ committed_xids[old_xcnt++] = batch_xids[i];
+ }
+ builder->committed.xcnt = old_xcnt;
+ }
+ else
+ {
+ /*
+ * After CONSISTENT, maintain sorted order via merge. Merge from the
+ * end to avoid overwriting unread data.
+ */
+ int old_idx = old_xcnt - 1;
+ int batch_idx = batch_cnt - 1;
+ int write_idx = old_xcnt + batch_cnt - 1;
+
+ while (old_idx >= 0 && batch_idx >= 0)
+ {
+ Assert(TransactionIdIsValid(batch_xids[batch_idx]));
+
+ if (committed_xids[old_idx] > batch_xids[batch_idx])
+ {
+ committed_xids[write_idx--] = committed_xids[old_idx--];
+ }
+ else
+ {
+ /* Duplicates should never occur */
+ Assert(committed_xids[old_idx] != batch_xids[batch_idx]);
+ committed_xids[write_idx--] = batch_xids[batch_idx--];
+ }
+ }
+
+ /* Copy any remaining batch elements */
+ while (batch_idx >= 0)
+ {
+ Assert(TransactionIdIsValid(batch_xids[batch_idx]));
+ committed_xids[write_idx--] = batch_xids[batch_idx--];
+ }
+
+ /* Old elements that weren't moved are already in correct position */
+ builder->committed.xcnt = old_xcnt + batch_cnt;
+ }
}
/*
@@ -947,6 +1025,13 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
TransactionId xmax = xid;
+ /*
+ * Collect XIDs that need tracking into a batch. We'll sort and merge
+ * them into committed.xip in one pass at the end.
+ */
+ TransactionId *batch_xids = NULL;
+ int batch_cnt = 0;
+
/*
* Transactions preceding BUILDING_SNAPSHOT will neither be decoded, nor
* will they be part of a snapshot. So we don't need to record anything.
@@ -977,6 +1062,12 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
}
}
+ if (nsubxacts > 0 || builder->building_full_snapshot ||
+ SnapBuildXidHasCatalogChanges(builder, xid, xinfo))
+ {
+ batch_xids = palloc((nsubxacts + 1) * sizeof(TransactionId));
+ }
+
for (nxact = 0; nxact < nsubxacts; nxact++)
{
TransactionId subxid = subxacts[nxact];
@@ -993,7 +1084,7 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
elog(DEBUG1, "found subtransaction %u:%u with catalog changes",
xid, subxid);
- SnapBuildAddCommittedTxn(builder, subxid);
+ batch_xids[batch_cnt++] = subxid;
if (NormalTransactionIdFollows(subxid, xmax))
xmax = subxid;
@@ -1007,7 +1098,7 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
*/
else if (needs_timetravel)
{
- SnapBuildAddCommittedTxn(builder, subxid);
+ batch_xids[batch_cnt++] = subxid;
if (NormalTransactionIdFollows(subxid, xmax))
xmax = subxid;
}
@@ -1020,7 +1111,7 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
xid);
needs_snapshot = true;
needs_timetravel = true;
- SnapBuildAddCommittedTxn(builder, xid);
+ batch_xids[batch_cnt++] = xid;
}
else if (sub_needs_timetravel)
{
@@ -1028,15 +1119,33 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
elog(DEBUG2, "forced transaction %u to do timetravel due to one of its subtransactions",
xid);
needs_timetravel = true;
- SnapBuildAddCommittedTxn(builder, xid);
+ batch_xids[batch_cnt++] = xid;
}
else if (needs_timetravel)
{
elog(DEBUG2, "forced transaction %u to do timetravel", xid);
- SnapBuildAddCommittedTxn(builder, xid);
+ batch_xids[batch_cnt++] = xid;
+ }
+
+ /*
+ * Sort and merge the batch into committed.xip. This maintains the
+ * invariant that committed.xip is globally sorted by raw uint32 order
+ * (xidComparator).
+ */
+ if (batch_cnt > 0)
+ {
+ /* Sort by raw uint32 order */
+ qsort(batch_xids, batch_cnt, sizeof(TransactionId), xidComparator);
+
+ /* Merge into global array */
+ SnapBuildAddCommittedTxns(builder, batch_xids, batch_cnt);
}
+ /* Free batch array if allocated (even if nothing was added to it) */
+ if (batch_xids != NULL)
+ pfree(batch_xids);
+
if (!needs_timetravel)
{
/* record that we cannot export a general snapshot anymore */
@@ -1305,6 +1414,15 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
Assert(TransactionIdIsNormal(builder->xmin));
Assert(TransactionIdIsNormal(builder->xmax));
+ /*
+ * Sort committed.xip before transitioning to CONSISTENT. During the
+ * pre-CONSISTENT phase, XIDs were appended unsorted for performance.
+ * Now we need sorted order for efficient snapshot building.
+ */
+ if (builder->committed.xcnt > 1)
+ qsort(builder->committed.xip, builder->committed.xcnt,
+ sizeof(TransactionId), xidComparator);
+
builder->state = SNAPBUILD_CONSISTENT;
builder->next_phase_at = InvalidTransactionId;
@@ -1402,6 +1520,15 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
TransactionIdPrecedesOrEquals(builder->next_phase_at,
running->oldestRunningXid))
{
+ /*
+ * Sort committed.xip before transitioning to CONSISTENT. During the
+ * pre-CONSISTENT phase, XIDs were appended unsorted for performance.
+ * Now we need sorted order for efficient snapshot building.
+ */
+ if (builder->committed.xcnt > 1)
+ qsort(builder->committed.xip, builder->committed.xcnt,
+ sizeof(TransactionId), xidComparator);
+
builder->state = SNAPBUILD_CONSISTENT;
builder->next_phase_at = InvalidTransactionId;
@@ -1889,6 +2016,13 @@ SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
pfree(builder->committed.xip);
builder->committed.xcnt_space = ondisk.builder.committed.xcnt;
builder->committed.xip = ondisk.builder.committed.xip;
+
+ /*
+ * Sort the restored array to ensure it's in xidComparator order. Old
+ * snapshot files may have been written with unsorted arrays.
+ */
+ qsort(builder->committed.xip, builder->committed.xcnt,
+ sizeof(TransactionId), xidComparator);
}
ondisk.builder.committed.xip = NULL;
diff --git a/src/include/replication/snapbuild_internal.h b/src/include/replication/snapbuild_internal.h
index 3b915dc8793..164160e5e5e 100644
--- a/src/include/replication/snapbuild_internal.h
+++ b/src/include/replication/snapbuild_internal.h
@@ -115,16 +115,8 @@ struct SnapBuild
/*
* Array of committed transactions that have modified the catalog.
*
- * As this array is frequently modified we do *not* keep it in
- * xidComparator order. Instead we sort the array when building &
- * distributing a snapshot.
- *
- * TODO: It's unclear whether that reasoning has much merit. Every
- * time we add something here after becoming consistent will also
- * require distributing a snapshot. Storing them sorted would
- * potentially also make it easier to purge (but more complicated wrt
- * wraparound?). Should be improved if sorting while building the
- * snapshot shows up in profiles.
+ * Maintained in sorted order (by raw uint32 value) to allow efficient
+ * snapshot building without repeated sorting overhead.
*/
TransactionId *xip;
} committed;
--
2.51.0
view thread (5+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected]
Subject: Re: Optimize SnapBuild by maintaining committed.xip in sorted order
In-Reply-To: <CABPTF7Vty_fsJLpj6=Xn5MM5PXiV_n=XeN87S+7JhSmROYwCAA@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox