From: Ashutosh Bapat
Date: Wed, 8 Apr 2026 10:50:53 +0530
Subject: Re: Better shared data structure management and resizable shared data structures
To: Andres Freund
Cc: Heikki Linnakangas, Matthias van de Meent, Robert Haas, pgsql-hackers, chaturvedipalak1911@gmail.com

On Wed, Apr 8, 2026 at 1:39 AM Andres Freund wrote:
>
> Hi,
>
> On 2026-04-07 22:48:17 +0300, Heikki Linnakangas wrote:
> > > +/*
> > > + * ShmemResizeStruct() --- resize a resizable shared memory structure.
> > > + *
> > > + * The new size must be within [minimum_size, maximum_size]. If the structure
> > > + * is being shrunk, the memory pages that are no longer needed are freed. If
> > > + * the structure is being expanded, the memory pages that are needed for the
> > > + * new size are allocated. See EstimateAllocatedSize() for explanation of which
> > > + * pages are allocated for a resizable structure.
> > > + */
> > > +void
> > > +ShmemResizeStruct(const char *name, Size new_size)
> >
> > This interface only allows shrinking and growing the allocated region at the
> > end, but the underlying mechanism is madvise(MADV_REMOVE) and
> > madvise(MADV_WRITE_POPULATE), which supports also "punching holes", i.e.
> > freeing memory in the middle of a region. Do we gain anything by restricting
> > ourselves to changing the size at the end? It seems to me that it could be
> > handy to punch holes for some use cases.
>
> Agreed. The hard part may be the "communication" with the user about how
> granular the punches can be. Because that will depend on things like
> huge_pages, huge_page_size and may depend on what alignment you happened to
> get.
>

We can extend it that way if there is a valid use case. For now I kept
it simple for two reasons:

1. Buffer manager structures shrink and expand only at the end right
now (a longer note on the buffer lookup table follows below). This effort
started with buffer resizing, and I didn't want to expand the scope
beyond what's needed.

2. Not all the approaches we tried for implementing resizable shared
memory can free memory in the middle. Usually they can only shrink or
expand at the end. If we offer the ability to free memory in the middle
based on facilities available on one platform, we will face big hurdles
when supporting other platforms. I think it's better to avoid it when
it's not needed.

The buffer lookup table is fixed in size. It may benefit from punching
holes in the middle if we can somehow get pages' worth of free entries
together somewhere in the middle. First, it's not easy to perform such
compaction. But even if we implement compaction, we can collect those
entries at the end instead of in the middle; the current API will
still be useful. Is there any other use case you are envisioning?

I also think that, for hole punching, it would be better to introduce
new ShmemFreeStructPart()/ShmemAllocStructPart() functions rather than
change the current ShmemResizeStruct().

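To make the granularity concern concrete, here is a minimal sketch (not
code from the patch) of how the releasable tail could be computed when a
structure shrinks at the end. PageRange, releasable_tail() and
release_tail() are hypothetical names. The sketch assumes the structure
is addressed by a byte offset into a single shared mapping whose base is
aligned to page_size, and that page_size is a power of two (the regular
or huge page size in use). Only pages lying entirely between the new end
and the old end can be handed to madvise(MADV_REMOVE).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE				/* exposes MADV_REMOVE on glibc */
#endif
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#ifdef __linux__
#include <sys/mman.h>
#endif

/* Hypothetical type for this sketch: half-open byte range [start, end). */
typedef struct PageRange
{
	size_t		start;
	size_t		end;
} PageRange;

/*
 * Hypothetical helper: when a structure located at 'offset' shrinks from
 * old_size to new_size, find the pages that lie entirely inside the tail
 * that is no longer needed.  Returns false if no whole page can be released.
 */
static inline bool
releasable_tail(size_t offset, size_t new_size, size_t old_size,
				size_t page_size, PageRange *out)
{
	size_t		mask;
	size_t		start;
	size_t		end;

	assert(out != NULL);

	/* page_size must be a nonzero power of two; only shrinking is handled */
	if (page_size == 0 || (page_size & (page_size - 1)) != 0)
		return false;
	if (new_size > old_size)
		return false;

	mask = page_size - 1;

	/* First page boundary at or after the new end; earlier bytes are live. */
	start = (offset + new_size + mask) & ~mask;

	/*
	 * Last page boundary at or before the old end; a trailing partial page
	 * may hold a neighbouring structure's data, so it is kept.
	 */
	end = (offset + old_size) & ~mask;

	if (start >= end)
		return false;

	out->start = start;
	out->end = end;
	return true;
}

#if defined(__linux__) && defined(MADV_REMOVE)
/* Release the computed range of a shared mapping that starts at 'base'. */
static inline int
release_tail(char *base, PageRange r)
{
	return madvise(base + r.start, r.end - r.start, MADV_REMOVE);
}
#endif
```

With 4kB pages, shrinking a structure at offset 0 from 20000 to 10000
bytes yields the range [12288, 16384). With 2MB huge pages the same
shrink releases nothing, which is the kind of result that would have to
be communicated to the caller.
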
> >
> > What's the portability story? I understand that this is Linux-only at the
> > moment, but what platforms can we support in the future, and what's the
> > effort? I think BSD's have similar capabilities with plain mmap() and
> > MADV_FREE if I read the man pages right.
>
> At least linux' MADV_FREE is only for private mappings. It's not clear in at
> least freebsd's man page, but the described use case makes me suspect it may
> be similar there.
>

Looks so. FreeBSD also has fallocate with PUNCH_HOLES. We could use it
with an fd created using memfd_create(), which means this approach
would depend on memfd_create(). I haven't checked whether that works.

> >
> > What about macOS and Windows? This doesn't necessarily need to be fully
> > portable, if some OS's don't have the capabilities we need, but would be
> > nice to know what's possible.
>
> Looks like windows has OfferVirtualMemory
> https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-offervirtualmemory
> but it's not clear to me if it actually does what we need when multiple
> processes are attached.
>

Those APIs look similar to madvise() with
MADV_REMOVE/MADV_POPULATE_WRITE, but with a more specific and cleaner
interface. At least worth a try.

> I suspect it's going to be a lot easier once we're threaded...  The reason I
> am ok with doing resizing this way before threading is because it's
> architecturally pretty similar to what you'd want to do once threaded, so it's
> not a huge dead end.  But I'm doubtful we'll find facilities that allow this
> across processes in all operating systems...

check

-- 
Best Wishes,
Ashutosh Bapat