Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w8h8e-000n0q-1K for pgsql-hackers@arkaria.postgresql.org; Fri, 03 Apr 2026 16:12:52 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w8h8d-00CVWH-0g for pgsql-hackers@arkaria.postgresql.org; Fri, 03 Apr 2026 16:12:51 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w8h8c-00CVW9-2Z for pgsql-hackers@lists.postgresql.org; Fri, 03 Apr 2026 16:12:51 +0000 Received: from mail-yx1-xb133.google.com ([2607:f8b0:4864:20::b133]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1w8h8b-00000000NYs-0cou for pgsql-hackers@lists.postgresql.org; Fri, 03 Apr 2026 16:12:50 +0000 Received: by mail-yx1-xb133.google.com with SMTP id 956f58d0204a3-6500040f172so910588d50.1 for ; Fri, 03 Apr 2026 09:12:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1775232768; cv=none; d=google.com; s=arc-20240605; b=FbMA5RkHjPp/tIYt3y4T1579rrDQ93sI7Dn/kbDCBEAFYUx6u3/6sKIpFjav/kE75a hRTDC8Mkh3swDQHvYMAxeMoi79THwUDbPcCZz81gp6IZTLqSaXROa76H2v0BC4qTu7N3 1rkcPqot/4ChBEJMlVFeb7SP4wPSCLwQ0ip3zvCz++aIKOhnCpYI2xWW6tNZ1/iqaPJW NS7Dp8iJDpIXR+/F0wQeoYyV80iWA6OMdg+ITyihkV0XK0hK/eA2IA7PWxsOwDiWE/lo e6lz1VyEfK7TzGWa2inh4Q7vExuf6z2/WY07itqLy0Ro7M1zYbCrEonbNnyE77f89Yu7 ML1Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:dkim-signature; bh=RbDgD0qEj0WhKFRfrkbFPIdki3JCJ/BXDl1hEFLVcYI=; fh=A6HMHLyGSbEbquJXh+uimZim3DVLQyL/mlH2yTwaWzI=; b=L3JN8zLZ8NhnR/9ZwSsloxKlYIzQepAv4+AGH9zPQeVNTM2LvNptoV3F/rT9CDZ1KX pr8w7NnKJfbWo981o1DRyf72EwEdXrLYRHxFlZY7NbzK7DOfbhUYFJvyU0esH3dnkdAM UTGhufgVD4Tv2TJPBaWKMglofucFdpXQSVWXQ2K3vxaoBspjq0Z7rKhpAzHRN37FWOB/ 6GNsaz+wONKRG+XRav+BFqCHcP83pBeciB9DM1FVs720YjOtHGAY/xJyJ2YUPxe+FVhK JwfJS6DBl407jRLE+QgotYVRh07dLnHgIqiMxKHuluEqvmoIZPsJIMs4Rln7ywUALNxj Sh+g==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1775232768; x=1775837568; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=RbDgD0qEj0WhKFRfrkbFPIdki3JCJ/BXDl1hEFLVcYI=; b=aRaaXfLCo+/0x/cNNFFxq5OUcSfe4iCiJCWR9N68FdhDZ5xs2daRckBgQ+gFP4wTH+ KA+r5ovDSAoDtjq0KzO4swWIBl+vlsgUBjtdlTfFFKxs+MwpEatM5VtP1FrQF/rcxq3R MC7BfU066F0KU4CsG+1XvDBUSlNJ0oBzjDLCQ6kxcICz3SkFkBT6fR3OGV2bPXuMm/L6 L64SzBDxRIF7UDIkeYF+NyLizOgOX3fXy0mJEx65geK2vcPrHEtrtoCsAERfpzMDpZ1D Krsa4mWAknkkk7Skp18EhMeQyCTYu7knMptlUXafoQBWtLeNLEqNSugFa5t+aMhHpwgn wHTg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1775232768; x=1775837568; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=RbDgD0qEj0WhKFRfrkbFPIdki3JCJ/BXDl1hEFLVcYI=; b=qasgD2PHmfXYBvuV/HY2K09uZgE7w4Y2poUmdNkvFk7Ri+EM+rTbancxS6EJ6HBqad Enl7jQaJ3suDuyF8NABfujm8u/zatvTg3dDTcyrqbVWj1xcJctwLRmLsP91CbxebsjXY Xd96owF1KwQ0WZD5m0WsOgrgCv4hX2itBydA1aRJ/PlIxNFq2tAZE8lBEwSUyv1xYz6F 9zkxYBppAuWYQQlaT+zZRAcwT7e1xtzAT5ZVBr2hyLAQmuV4v6HJtHCknY/ZaVGNvcvH DaI6pcSOLCTquO+RLf8ZLNGLh5vgU8RDHgpWJ4W8eEc8UGT//vPsH5iHwVNUFT62v3XT seog== X-Gm-Message-State: AOJu0YzD6tg6HO+8zBq+xN9JM/F7kYZ+VKF/njcp6KZZ0u/+2kXTFmyv 4hhT86OTRbofHhbhg5LdMWoMS39tU14raOBDAIZMnIM+lPZhm2sWzHzJri6Td1H3XItEvwxZJh5 SMK7X8HAMQH2/LHwiAIY/nUDaB4uLR+8= X-Gm-Gg: AeBDies/aseHmG+MwHwPa6Pee71DA9xLTmuIsvTRI/hgPZvo4O8IT6Xe4wa4L/FbGHp o7NPrYsOVyxB9Aj8kejcIiw8Aqprn7aO4PNqtuigggE+Y3MXZ/jGUEtoe8GdzErCvQy8+JGhOb5 St7CeSAdlfI774k6X9sm9QYvFqAJddi/adFlnz/AXLUMYV0xP4ROmEIkC1qAaIv7oQ7nRRy5IIS BsFHr1FFPhGYHHAQcJL+PZ3ON2xxSNBHJZmWJQ2vPLzS2uRdlB8LYBV2+yPPCB6DWz/OlUCg5Wn rE4ADzU52EdDD233PQ== X-Received: by 2002:a05:690e:4146:b0:650:1707:c9f5 with SMTP id 956f58d0204a3-65048766dbamr3183482d50.30.1775232768445; Fri, 03 Apr 2026 09:12:48 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: shawn wang Date: Sat, 4 Apr 2026 00:11:52 +0800 X-Gm-Features: AQROBzC4jHXtIIRJ_qMPg0FgwZxzdCawvr_Wj8NHNPOIqkBDaW6Hhz7Y_h2bD9o Message-ID: Subject: Re: Add logical_decoding_spill_limit to cap spill file disk usage per slot To: "Hayato Kuroda (Fujitsu)" Cc: "pgsql-hackers@lists.postgresql.org" Content-Type: multipart/alternative; boundary="0000000000008e900a064e909636" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000008e900a064e909636 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi Kuroda, Thank you for the review and the great questions! > We have provided the subscription option streaming=3Dparallel since PG16.= It > replicates on-going transactions and applies immediately. Does it avoid the > issue? streaming=3Dparallel does significantly reduce publisher-side spill files in the common case =E2=80=94 when enabled, the reorder buffer streams chang= es directly instead of spilling to disk. However, it cannot guarantee 100% avoidance of spilling. There are several fallback scenarios in the code where streaming is not possible and the reorder buffer falls back to spill-to-disk even when streaming=3Dparallel is configured: 1. Snapshot not yet consistent (snapbuild.c =E2=80=94 SnapBuildCurrentSta= te() < SNAPBUILD_CONSISTENT), e.g. right after slot creation. 2. Transaction is being re-decoded after a restart (SnapBuildXactNeedsSkip() returns true). 3. Transaction contains TOAST partial changes (rbtxn_has_partial_change), which cannot be streamed. 4. Transaction contains speculative inserts (INSERT ... ON CONFLICT), also flagged as partial changes. 5. Transaction has no streamable changes yet (!rbtxn_has_streamable_change). 6. Output plugin does not support streaming callbacks (e.g. test_decoding without the streaming option). 7. Parallel apply worker is busy for >10 seconds =E2=80=94 the leader fal= ls back to serializing changes to disk (applyparallelworker.c, SHM_SEND_TIMEOUT_MS). 8. No parallel worker available =E2=80=94 the leader serializes the entir= e streamed transaction to disk (worker.c, get_transaction_apply_action =E2=86=92 TRANS_LEADER_SERIALIZE). Additionally, streaming is a *subscription-level* parameter that only applies to built-in logical replication. Users of pg_recvlogical or third-party CDC tools (Debezium, etc.) consume changes directly from the publisher's walsender and have no subscription to configure. So streaming=3Dparallel and logical_decoding_spill_limit are complementary: streaming reduces spilling in the common case, while the spill limit provides a hard safety net for the cases where spilling is unavoidable. > Not sure, but doesn't it mean the error is repeating till the GUC is increased? Good question. Yes, if the same large transaction is re-decoded without any configuration change, the same ERROR will occur again. This is intentional =E2=80=94 the behavior is analogous to temp_file_limit: once the limit is hit, the operation fails, and it will keep failing until the DBA takes action. The DBA has several options to resolve it: - Increase logical_decoding_spill_limit. - Increase logical_decoding_work_mem (so less data is spilled). - Enable streaming on the subscriber (streaming=3Don or streaming=3Dparallel), which avoids spilling in most cases. - Investigate and address the root cause (e.g. break up the large transaction). The ERROR message includes the current spill size and the configured limit, making it straightforward to diagnose. > Also, is there any difference for the slots's behavior, with the normal walsender's > exit case? No, the slot behavior is the same as a normal walsender exit. Specifically: - The slot remains valid (it is NOT invalidated). - restart_lsn and confirmed_flush are preserved. - The subscriber can reconnect and resume from where it left off. - In v2, spill files are properly cleaned up in the error path (via WalSndErrorCleanup), so no orphaned files are left behind. The only difference is that the walsender's exit reason is logged as an ERROR with ERRCODE_CONFIGURATION_LIMIT_EXCEEDED, rather than a normal shutdown. The slot itself is in exactly the same state as if the walsender had exited normally or the connection was dropped. Best regards, Shawn --0000000000008e900a064e909636 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi Kuroda,

Thank you for the review and the great questions!


> We have provided the subscription option streaming=3Dparallel since PG= 16. It
> replicates on-going transactions and applies immediately. Does it avoi= d the
> issue?

streaming=3Dparallel does significantly reduce publisher-side spill files in the common case =E2=80=94 when enabled, the reorder buffer streams chang= es
directly instead of spilling to disk.

However, it cannot guarantee 100% avoidance of spilling.=C2=A0 There are several fallback scenarios in the code where streaming is not possible
and the reorder buffer falls back to spill-to-disk even when
streaming=3Dparallel is configured:

=C2=A0 1. Snapshot not yet consistent (snapbuild.c =E2=80=94 SnapBuildCurre= ntState()
=C2=A0 =C2=A0 =C2=A0< SNAPBUILD_CONSISTENT), e.g. right after slot creat= ion.

=C2=A0 2. Transaction is being re-decoded after a restart
=C2=A0 =C2=A0 =C2=A0(SnapBuildXactNeedsSkip() returns true).

=C2=A0 3. Transaction contains TOAST partial changes
=C2=A0 =C2=A0 =C2=A0(rbtxn_has_partial_change), which cannot be streamed.
=C2=A0 4. Transaction contains speculative inserts (INSERT ... ON CONFLICT)= ,
=C2=A0 =C2=A0 =C2=A0also flagged as partial changes.

=C2=A0 5. Transaction has no streamable changes yet
=C2=A0 =C2=A0 =C2=A0(!rbtxn_has_streamable_change).

=C2=A0 6. Output plugin does not support streaming callbacks
=C2=A0 =C2=A0 =C2=A0(e.g. test_decoding without the streaming option).

=C2=A0 7. Parallel apply worker is busy for >10 seconds =E2=80=94 the le= ader falls
=C2=A0 =C2=A0 =C2=A0back to serializing changes to disk
=C2=A0 =C2=A0 =C2=A0(applyparallelworker.c, SHM_SEND_TIMEOUT_MS).

=C2=A0 8. No parallel worker available =E2=80=94 the leader serializes the = entire
=C2=A0 =C2=A0 =C2=A0streamed transaction to disk (worker.c,
=C2=A0 =C2=A0 =C2=A0get_transaction_apply_action =E2=86=92 TRANS_LEADER_SER= IALIZE).

Additionally, streaming is a *subscription-level* parameter that only
applies to built-in logical replication.=C2=A0 Users of pg_recvlogical or third-party CDC tools (Debezium, etc.) consume changes directly from
the publisher's walsender and have no subscription to configure.

So streaming=3Dparallel and logical_decoding_spill_limit are
complementary: streaming reduces spilling in the common case, while
the spill limit provides a hard safety net for the cases where
spilling is unavoidable.


> Not sure, but doesn't it mean the error is repeating till the GUC = is increased?

Good question.=C2=A0 Yes, if the same large transaction is re-decoded
without any configuration change, the same ERROR will occur again.
This is intentional =E2=80=94 the behavior is analogous to temp_file_limit:=
once the limit is hit, the operation fails, and it will keep failing
until the DBA takes action.

The DBA has several options to resolve it:

=C2=A0 - Increase logical_decoding_spill_limit.
=C2=A0 - Increase logical_decoding_work_mem (so less data is spilled).
=C2=A0 - Enable streaming on the subscriber (streaming=3Don or
=C2=A0 =C2=A0 streaming=3Dparallel), which avoids spilling in most cases. =C2=A0 - Investigate and address the root cause (e.g. break up the
=C2=A0 =C2=A0 large transaction).

The ERROR message includes the current spill size and the configured
limit, making it straightforward to diagnose.


> Also, is there any difference for the slots's behavior, with the n= ormal walsender's
> exit case?

No, the slot behavior is the same as a normal walsender exit.
Specifically:

=C2=A0 - The slot remains valid (it is NOT invalidated).
=C2=A0 - restart_lsn and confirmed_flush are preserved.
=C2=A0 - The subscriber can reconnect and resume from where it left off. =C2=A0 - In v2, spill files are properly cleaned up in the error path
=C2=A0 =C2=A0 (via WalSndErrorCleanup), so no orphaned files are left behin= d.

The only difference is that the walsender's exit reason is logged as an ERROR with ERRCODE_CONFIGURATION_LIMIT_EXCEEDED, rather than a
normal shutdown.=C2=A0 The slot itself is in exactly the same state as if the walsender had exited normally or the connection was dropped.

Best regards,
Shawn
--0000000000008e900a064e909636--