MIME-Version: 1.0
References: 
 <CA+T=_GU-vTxFqRwWJMR4Hz8YUXkpUv_q6Nm3CrTqHNbhCrS5BA@mail.gmail.com>
In-Reply-To: 
 <CA+T=_GU-vTxFqRwWJMR4Hz8YUXkpUv_q6Nm3CrTqHNbhCrS5BA@mail.gmail.com>
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Mon, 23 Mar 2026 07:24:23 -0700
Message-ID: 
 <CALj2ACVc3iYLOkC36VJwoXyVZmGcb0WEMKoc478q+xdRG+2BtA@mail.gmail.com>
Subject: Re: Add logical_decoding_spill_limit to cap spill file disk usage per
 slot
To: shawn wang <shawn.wang.pg@gmail.com>
Cc: pgsql-hackers@lists.postgresql.org
Content-Type: text/plain; charset="UTF-8"
Archived-At: 
 <https://www.postgresql.org/message-id/CALj2ACVc3iYLOkC36VJwoXyVZmGcb0WEMKoc478q%2BxdRG%2B2BtA%40mail.gmail.com>
Precedence: bulk

Hi,

On Mon, Mar 23, 2026 at 6:20 AM shawn wang <shawn.wang.pg@gmail.com> wrote:
>
> Hi hackers,

Thank you for proposing this new feature.

> == Motivation ==
>
> We operate a fleet of PostgreSQL instances with logical replication. On several occasions, we have experienced production incidents where logical decoding spill files (pg_replslot/<slot>/xid-*.spill) grew uncontrollably, consuming tens of gigabytes and eventually filling up the data disk. This caused the entire instance to go read-only, impacting not just replication but all write workloads.
>
> The typical scenario is a large transaction (e.g. bulk data load or a long-running DDL) combined with a subscriber that is either slow or temporarily disconnected. The reorder buffer exceeds logical_decoding_work_mem and starts spilling, but there is no upper bound on how much can be spilled. The only backstop today is the OS returning ENOSPC, at which point the damage is already done.

Having a lot of spill files also increases crash/recovery times.
However, running out of disk space and the downtime that follows is
not specific to spill files. It applies equally to WAL files,
historical catalog snapshot files, subtransaction overflow files, CLOG
(and all the subsystems backed by the SLRU data structure), etc. -
basically any Postgres subsystem that writes files to disk. I'm a bit
worried that we may end up solving disk space issues, which IMHO are
outside of the database's scope, in the database. Others may have
different opinions though.

How common is this issue? Could you please add a test case to the
proposed patch that hits the described issue when this feature is not
in use?

Having said that, were alternatives considered, such as disabling the
subscription when its spill files are seen occupying too much disk
space?

> We looked for existing protections:
>
> max_slot_wal_keep_size: limits WAL retention, but does not affect spill files at all.
> logical_decoding_work_mem: controls *when* spilling starts, but not *how much* can be spilled.
> There is no existing GUC, patch, or commitfest entry that addresses spill file disk quota.

Interesting!

> The "Report reorder buffer size" patch (CF #6053, by Ashutosh Bapat) improves observability of reorder buffer state, which is complementary, but observability alone cannot prevent disk-full incidents.

With the proposed reorder buffer stats above, would it be possible to
have a monitoring solution (an extension or a tool) to disable
subscriptions and notify the admin? Would something like this work?
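
One way such a tool could look, as a rough sketch only: the script
below sums the xid-*.spill files under pg_replslot/<slot>/ (the path
pattern quoted above) and reports slots over a threshold. The function
names, the threshold, and the idea of polling the data directory are
my assumptions for illustration. Nothing here comes from the patch or
from CF #6053, which would presumably expose better numbers than raw
file sizes.

```python
import glob
import os


def spill_bytes_per_slot(data_dir):
    """Return {slot_name: total bytes of xid-*.spill files} for a data directory.

    Hypothetical helper: it only looks at file sizes on disk and knows
    nothing about the reorder buffer's internal state.
    """
    totals = {}
    slot_root = os.path.join(data_dir, "pg_replslot")
    if not os.path.isdir(slot_root):
        return totals
    for slot in sorted(os.listdir(slot_root)):
        slot_dir = os.path.join(slot_root, slot)
        if not os.path.isdir(slot_dir):
            continue
        total = 0
        pattern = os.path.join(glob.escape(slot_dir), "xid-*.spill")
        for path in glob.glob(pattern):
            try:
                total += os.path.getsize(path)
            except OSError:
                # The file was removed between listing and stat; skip it.
                pass
        totals[slot] = total
    return totals


def slots_over_threshold(data_dir, threshold_bytes):
    """Names of slots whose spill files currently exceed threshold_bytes."""
    return [
        slot
        for slot, size in spill_bytes_per_slot(data_dir).items()
        if size > threshold_bytes
    ]
```

For each slot reported, the tool could notify the admin and run
ALTER SUBSCRIPTION ... DISABLE on the subscriber. Mapping a slot name
to its subscription is deliberately left out here.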

> == Proposed solution ==
>
> The attached patch adds a new GUC:
> logical_decoding_spill_limit (integer, unit kB, default 0)
>
> When set to a positive value, it limits the total size of on-disk spill files per replication slot. Key design points:
>
> Tracking: We add two new fields:
>   - ReorderBuffer.spillBytesOnDisk: current total on-disk spill size for this slot (unlike spillBytes, which is a cumulative statistic counter, this is a live gauge).
>   - ReorderBufferTXN.serialized_size: per-transaction on-disk size, so we can accurately decrement the global counter during cleanup.
> Increment: In ReorderBufferSerializeChange(), after a successful write(), both counters are incremented by the size written.
> Decrement: In ReorderBufferRestoreCleanup(), when spill files are unlinked, the global counter is decremented by the transaction's serialized_size.
> Enforcement: In ReorderBufferCheckMemoryLimit(), before calling ReorderBufferSerializeTXN(), we check:
>     if (spillBytesOnDisk + txn->size > spill_limit)
>         ereport(ERROR, ...)
> This is only checked on the spill-to-disk path, not on the streaming path (which involves no disk I/O).
> Behavior on limit exceeded: An ERROR is raised with ERRCODE_CONFIGURATION_LIMIT_EXCEEDED. The walsender exits, but the slot's restart_lsn and confirmed_flush are preserved. The subscriber can reconnect after the DBA:
>
> increases logical_decoding_spill_limit, or
> increases logical_decoding_work_mem (to reduce spilling), or
> switches to a streaming-capable output plugin (which avoids spilling entirely).
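
For reference, the accounting described above can be modeled in a few
lines. This is a standalone Python sketch, not code from the patch:
the class and method names are hypothetical, sizes are in bytes rather
than kB, and it assumes the on-disk size of a spilled transaction
equals its in-memory size (txn->size), which need not hold in the real
reorder buffer.

```python
class SpillLimitExceeded(Exception):
    """Stands in for the ERROR the proposal raises; the name is made up."""


class Txn:
    def __init__(self, xid):
        self.xid = xid
        self.size = 0             # bytes of changes currently held in memory
        self.serialized_size = 0  # bytes this transaction currently has on disk


class ReorderBufferModel:
    def __init__(self, spill_limit):
        self.spill_limit = spill_limit  # bytes; 0 disables the check
        self.spill_bytes_on_disk = 0    # live gauge for the whole slot

    def serialize_txn(self, txn):
        # Enforcement: checked before spilling, mirroring the quoted condition.
        if (self.spill_limit > 0
                and self.spill_bytes_on_disk + txn.size > self.spill_limit):
            raise SpillLimitExceeded(
                f"spilling xid {txn.xid} would exceed {self.spill_limit} bytes"
            )
        # Increment: both counters grow by the amount written.
        written = txn.size  # assumption: on-disk size equals in-memory size
        self.spill_bytes_on_disk += written
        txn.serialized_size += written
        txn.size = 0

    def restore_cleanup(self, txn):
        # Decrement: spill files are unlinked, so give the bytes back.
        self.spill_bytes_on_disk -= txn.serialized_size
        txn.serialized_size = 0
```

In this model, a limit of 100 lets a 60-byte transaction spill but
rejects a following 50-byte one until the first has been cleaned up.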

When logical_decoding_spill_limit is exceeded, ERRORing out in the
walsender is even more problematic, right? The replication slot would
become inactive, causing bloat, preventing tuple freezing, and
retaining WAL files, so the system may eventually hit disk-space
issues anyway - it is like "we avoided disk space issues for one
subsystem, but introduced them for another". This looks a bit
problematic IMHO. Others may have different opinions though.

--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com