From: shawn wang
Date: Mon, 23 Mar 2026 21:19:52 +0800
Subject: Add logical_decoding_spill_limit to cap spill file disk usage per slot
To: pgsql-hackers@lists.postgresql.org

Hi hackers,

== Motivation ==

We operate a fleet of PostgreSQL instances with logical replication. On
several occasions, we have experienced production incidents where logical
decoding spill files (pg_replslot/<slot>/xid-*.spill) grew uncontrollably,
consuming tens of gigabytes and eventually filling up the data disk. This
caused the entire instance to go read-only, impacting not just replication
but all write workloads.

The typical scenario is a large transaction (e.g. bulk data load or a
long-running DDL) combined with a subscriber that is either slow or
temporarily disconnected. The reorder buffer exceeds
logical_decoding_work_mem and starts spilling, but there is no upper bound
on how much can be spilled. The only backstop today is the OS returning
ENOSPC, at which point the damage is already done.

We looked for existing protections:

- max_slot_wal_keep_size: limits WAL retention, but does not affect spill
  files at all.
- logical_decoding_work_mem: controls *when* spilling starts, but not
  *how much* can be spilled.
- There is no existing GUC, patch, or commitfest entry that addresses
  spill file disk quota.

The "Report reorder buffer size" patch (CF #6053, by Ashutosh Bapat)
improves observability of reorder buffer state, which is complementary,
but observability alone cannot prevent disk-full incidents.

== Proposed solution ==

The attached patch adds a new GUC:

    logical_decoding_spill_limit (integer, unit kB, default 0)

When set to a positive value, it limits the total size of on-disk spill
files per replication slot.

Key design points:

1. Tracking: We add two new fields:
   - ReorderBuffer.spillBytesOnDisk: current total on-disk spill size for
     this slot (unlike spillBytes, which is a cumulative statistic counter,
     this is a live gauge).
   - ReorderBufferTXN.serialized_size: per-transaction on-disk size, so we
     can accurately decrement the global counter during cleanup.

2. Increment: In ReorderBufferSerializeChange(), after a successful
   write(), both counters are incremented by the size written.

3. Decrement: In ReorderBufferRestoreCleanup(), when spill files are
   unlinked, the global counter is decremented by the transaction's
   serialized_size.

4. Enforcement: In ReorderBufferCheckMemoryLimit(), before calling
   ReorderBufferSerializeTXN(), we check:

       if (spillBytesOnDisk + txn->size > spill_limit)
           ereport(ERROR, ...)

   This is only checked on the spill-to-disk path, not on the streaming
   path (which involves no disk I/O).

5. Behavior on limit exceeded: An ERROR is raised with
   ERRCODE_CONFIGURATION_LIMIT_EXCEEDED. The walsender exits, but the
   slot's restart_lsn and confirmed_flush are preserved. The subscriber
   can reconnect after the DBA:
   1. increases logical_decoding_spill_limit, or
   2. increases logical_decoding_work_mem (to reduce spilling), or
   3. switches to a streaming-capable output plugin (which avoids
      spilling entirely).

6. Default 0 means unlimited, so the change is fully backward compatible.

== Why per-slot, not global? ==

Each ReorderBuffer instance lives in a single walsender process and
corresponds to exactly one replication slot. A per-slot limit is:

- Lock-free (no shared memory coordination needed)
- Simple to reason about (each slot has its own budget)
- Sufficient to protect against disk-full (the DBA sets the limit based on
  available disk / number of slots)

A global (cross-slot) limit could be layered on top later if needed, but
would require shared-memory counters with spinlock/atomic protection.

== Performance impact ==

- Hot path (in-memory change queuing): zero overhead.
- Spill path: one integer comparison before serialization, one integer
  addition after write(). This is negligible compared to the I/O cost.
- Cleanup path: one integer subtraction after unlink(), also negligible.

Looking forward to feedback.

Thanks,
Shawn.
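
For readers who want the accounting in design points 1-4 in concrete form, here is a minimal standalone C sketch. It is not the patch. SketchBuffer, SketchTxn and the spill_* helpers are hypothetical stand-ins for the ReorderBuffer fields and functions named above. The limit is held in bytes rather than kB, a refused spill returns false where the patch would ereport(ERROR), and the sketch assumes the bytes written equal the transaction's in-memory size, which real serialization would not guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-transaction state (not ReorderBufferTXN). */
typedef struct SketchTxn
{
    uint64_t size;            /* in-memory bytes not yet spilled (like txn->size) */
    uint64_t serialized_size; /* bytes this transaction currently has on disk */
} SketchTxn;

/* Hypothetical stand-in for the per-slot state (not ReorderBuffer). */
typedef struct SketchBuffer
{
    uint64_t spill_bytes_on_disk; /* live gauge over all spilled transactions */
    uint64_t spill_limit;         /* bytes in this sketch; 0 means unlimited */
} SketchBuffer;

/* Enforcement: would spilling this transaction exceed the slot's budget? */
static bool
spill_would_exceed(const SketchBuffer *rb, const SketchTxn *txn)
{
    if (rb->spill_limit == 0)
        return false;
    return rb->spill_bytes_on_disk + txn->size > rb->spill_limit;
}

/* Increment: called after a successful write of `written` bytes. */
static void
spill_account_write(SketchBuffer *rb, SketchTxn *txn, uint64_t written)
{
    rb->spill_bytes_on_disk += written;
    txn->serialized_size += written;
}

/*
 * Spill a whole transaction. Returns false when the limit refuses the spill.
 * Assumption: on-disk size equals in-memory size.
 */
static bool
spill_txn(SketchBuffer *rb, SketchTxn *txn)
{
    if (spill_would_exceed(rb, txn))
        return false;
    spill_account_write(rb, txn, txn->size);
    txn->size = 0;
    return true;
}

/* Decrement: called after the transaction's spill files are unlinked. */
static void
spill_account_cleanup(SketchBuffer *rb, SketchTxn *txn)
{
    assert(rb->spill_bytes_on_disk >= txn->serialized_size);
    rb->spill_bytes_on_disk -= txn->serialized_size;
    txn->serialized_size = 0;
}
```

With a limit of 100 and a 60-byte transaction already spilled, spill_txn() refuses a second 50-byte transaction until spill_account_cleanup() has released the first one. Keeping the per-transaction serialized_size is what lets the cleanup subtract exactly what that transaction added.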