From: Amit Kapila
Date: Mon, 25 May 2026 19:03:48 -0700
Subject: Re: Bound memory usage during manual slot sync retries
To: Xuneng Zhou
Cc: pgsql-hackers, itsajin@gmail.com, "Zhijie Hou (Fujitsu)"

On Sat, May 16, 2026 at 4:35 AM Xuneng Zhou wrote:
>
> On Fri, May 15, 2026 at 9:20 PM Xuneng Zhou wrote:
> >
> > Hi Amit,
> >
> > On Fri, May 15, 2026 at 5:23 PM Amit Kapila wrote:
> > >
> > > On Fri, May 15, 2026 at 11:02 AM Xuneng Zhou wrote:
> > > >
> > > > pg_sync_replication_slots() now retries inside a single SQL function
> > > > call. Unlike the slotsync worker, it does not get a transaction boundary
> > > > between retry cycles, so allocations made while fetching and synchronizing
> > > > remote slots can accumulate until the function returns.
> > > >
> > > > The existing list_free_deep(remote_slots) is not enough to bound this.
> > > > It frees the List cells and the RemoteSlot structs stored as list
> > > > elements, but it does not recursively free allocations owned by those
> > > > structs, such as the copied slot name, plugin name, and database name.
> > >
> > > Right.
> > >
> > > > It also does not release unrelated per-cycle allocations made while
> > > > fetching and processing the remote slots.
> > > >
> > >
> > > BTW, did you notice via test or otherwise, what and how much other
> > > unrelated memory is being allocated during each sync cycle if we
> > > manually free the allocations you observed? This is mainly to learn
> > > the impact of not doing all these allocations in some short-duration
> > > memory context.
> >
> > I noticed this by reading the feature code while walking through the
> > PG 19 release notes, not by observing actual memory growth in a
> > running system. Besides the RemoteSlot field strings, there seem to
> > be a few smaller per-cycle allocations that also accumulate:
> > quote_literal_cstr() strings used to build the filtered query, a
> > temporary TextDatumGetCString() result for invalidation_reason, the
> > standalone TupleTableSlot in fetch_remote_slots(), and the list
> > container built by get_local_synced_slots(). I chose a per-cycle
> > memory context over manual pfrees to make the retry-cycle lifetime
> > explicit and avoid maintaining a destructor for every current and
> > future allocation in this path. It may be worth measuring how much
> > extra memory actually accumulates during an extended wait to confirm
> > the practical impact.
>
> I did some measurements for the memory growth in the manual
> pg_sync_replication_slots() retry path.
>
> The test used 100 failover logical slots and forced
> pg_sync_replication_slots() to keep retrying on the standby. Memory
> was sampled with pg_log_backend_memory_contexts() on the backend
> running the function.
>
> On HEAD, the short run showed:
>
> timestamp,pid,total_bytes,delta
> 2026-05-16T10:47:54Z,63422,1339920,
> 2026-05-16T10:47:56Z,63422,1405456,65536
> 2026-05-16T10:47:59Z,63422,1405456,0
> 2026-05-16T10:48:01Z,63422,1405456,0
> 2026-05-16T10:48:03Z,63422,1536528,131072
> 2026-05-16T10:48:05Z,63422,1536528,0
>
> So the total increase was about 192 KiB.
>
> After adding a targeted cleanup for the copied RemoteSlot strings, the
> same test showed:
>
> timestamp,pid,total_bytes,delta
> 2026-05-16T11:04:58Z,77000,1339920,
> 2026-05-16T11:05:00Z,77000,1339920,0
> 2026-05-16T11:05:02Z,77000,1405456,65536
> 2026-05-16T11:05:04Z,77000,1405456,0
>
> So the increase dropped to about 64 KiB.
>
> With a per-retry memory context around the fetch/synchronize cycle,
> the same test stayed flat:
>
> timestamp,pid,total_bytes,delta
> 2026-05-16T11:09:10Z,84600,1315344,
> 2026-05-16T11:09:12Z,84600,1315344,0
> 2026-05-16T11:09:14Z,84600,1315344,0
> 2026-05-16T11:09:16Z,84600,1315344,0
> 2026-05-16T11:09:18Z,84600,1315344,0
> 2026-05-16T11:09:20Z,84600,1315344,0
>
> Looking at the memory-context dumps, the growth on HEAD was visible
> under ExprContext. The grand-total increase matched the ExprContext
> increase, which is consistent with retry-cycle allocations surviving
> for the duration of the single SQL function call.
>
> That said, the practical severity looks small. This is mainly because
> the retry loop is not a tight loop. With no slot activity,
> wait_for_slot_activity() doubles the sleep time up to 30 seconds.
>
> So after about 51 seconds it retries only about twice per minute. For
> 100 slots and no slot activity, a rough 1-hour extrapolation from the
> short runs is on the order of a few MB on HEAD, and around 1 MB with
> the manual RemoteSlot cleanup. For installations with fewer slots, it
> should be smaller.
>
> So my read is:
> the accumulation is real;
> it is modest because of the retry backoff;
> manually freeing RemoteSlot's copied strings removes most of the
> observed growth;
> a per-retry memory context fully bounds the retry-cycle lifetime.
>

Okay, then let's go with a per-retry memory context approach.

@@ -579,6 +579,8 @@ drop_local_obsolete_slots(List *remote_slot_list)
                                    local_slot->data.database));
             }
         }
+
+    list_free(local_slots);
 }

Why do we need this retail pfree if the caller is using a memory context?

-- 
With Regards,
Amit Kapila.