From: Matthias van de Meent
Date: Sun, 5 Apr 2026 11:06:27 +0200
Subject: Re: Better shared data structure management and resizable shared data structures
To: Ashutosh Bapat
Cc: Heikki Linnakangas, Robert Haas, Andres Freund, pgsql-hackers, chaturvedipalak1911@gmail.com

On Sun, 5 Apr 2026, 07:59 Ashutosh Bapat wrote:
>
> On Sun, Apr 5, 2026 at 11:18 AM Ashutosh Bapat wrote:
> >
> > I will post my resizable shmem structures patch in a separate email in
> > this thread but continue to review your patches.
> >
>
> Attached is your patchset (0001 - 0014) + resizable shared memory
> structures patchset 0015.
>
> Resizable shared memory structures
> ============================
>
> When allocating memory to the requested shared structures, we allocate
> space for each structure. In mmap'ed shared memory, the memory is
> allocated against those structures only when those structures are
> initialized.
> Resizable shared memory structures are simply allocated maximum space
> when that happens. The function which initializes the structure is
> expected to initialize only the memory worth its initial size. When
> resizing the structure memory is freed or allocated against the
> reserved space depending upon the new size. This allows the structures
> to be resized while keeping their starting address stable which is a
> hard requirement in PostgreSQL.
>
> Resizable shared memory feature depends upon the existence of function
> madvise() and constants MADV_REMOVE and MADV_WRITE_POPULATE.
>
> On the platforms which do not have these, we disable this feature at
> compile time. The commit introduces a compile time flag
> HAVE_RESIZABLE_SHMEM which is defined if MADV_REMOVE and
> MADV_WRITE_POPULATE exist. We don't check the existence of madvise
> separately, since the existence of the constants implies the existence
> of the function.
>
> HAVE_RESIZABLE_SHMEM is not defined in EXEC_BACKEND builds since
> that's largely used for Windows where the APIs to free and allocate
> memory from and to a given address space are not known to the author
> right now. Given that PostgreSQL is used widely on Linux, providing
> this feature on Linux covers benefits most of its users. Once we
> figure out the required Windows APIs, we will support this feature on
> Windows as well.
>
> The feature is also not available when Sys-V shared memory is used
> even on Linux since we do not know whether required Sys-V APIs exist;
> mostly they don't.
> Since that combination is only available for
> development and testing, not supporting the feature isn't going to
> impact PostgreSQL users much.
>
> Using HAVE_RESIZABLE_SHMEM we disable compiling the code related to
> resizable shared memory structures on the platforms which do not
> support the feature. But we also have run time checks to disable this
> feature when Sys-V shared memory is used. In order to know whether a
> given instance of a running server supports resizable structures, we
> have introduced GUC have_resizable_shmem.

I'm not opposed to HAVE_RESIZABLE_SHMEM, but is it universal enough on
its platforms to make it part of the exposed ABI for Shmem? I think
that we should expose the same functions and structs, and just have
the shmem internals throw an error if the configuration used by the
user implies the user wants to update shmem sizing when the system
doesn't support it.
That would avoid extensions having to recompile between have/have-not
systems that have an otherwise compatible ABI, especially when those
extensions don't actually need the resizable part of the shmem
system.

> Following points are up for discussion
> =============================
>
> 1. calculation of allocated_size of resizable structures
> --------------------------------
> The memory allocated to that structure is the
> {maximum size of the structure} - {total size of unallocated pages}. I
> think setting allocated_size to the actually allocated memory is more
> accurate than {current size of the structure} + {alignment} which does
> not reflect the actual memory allocated to the structure. I would like
> to know what others think.

I agree: For allocated_size, it should be the max size of the
structure (+alignment, if any), minus the total size of its
deallocated pages.
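To make that calculation concrete, here is a minimal sketch in C. It is
only an illustration under assumptions: the helper names (align_up,
shmem_allocated_size) and the idea of tracking a count of released
pages are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): round sz up to the next
 * multiple of align, which must be a power of two.
 */
size_t
align_up(size_t sz, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (sz + align - 1) & ~(align - 1);
}

/*
 * Memory actually backing a resizable structure: the aligned maximum
 * reservation minus the pages that have been given back to the OS
 * (for example with MADV_REMOVE).
 */
size_t
shmem_allocated_size(size_t maximum_size, size_t alignment,
                     size_t released_pages, size_t page_size)
{
    size_t      reserved = align_up(maximum_size, alignment);
    size_t      released = released_pages * page_size;

    /* We can never have released more than was reserved. */
    assert(released <= reserved);
    return reserved - released;
}
```

For example, with a 1000000-byte maximum, 4096-byte alignment and page
size, and 100 released pages, this gives 1003520 - 409600 = 593920
bytes.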

Nit: I think "reserved"/"space_reserved" is a better descriptor than
"allocated_space", as "allocated_space" could reasonably imply the
memory isn't available to the OS.

> 2. maximum_size member in various structures and in pg_shmem_allocations view
> -----------------------------------------------------------------------------
> A resizable structure is requested by specifying non-zero maximum_size
> in ShmemStructOpts. It gets copied to the maximum_size member in
> ShmemStructDesc, ShmemIndexEnt. The question is for fixed-size
> structures what should be the value maximum_size in those structures.
> Setting it to the same value as the size member in the respective
> structure is logical since their maximum size is the same as their
> initial size.

Note that currently, your patch rejects the case where resizeable
structs are initialized at their maximum size:

> +++ b/src/backend/storage/ipc/shmem.c
> +#ifdef HAVE_RESIZABLE_SHMEM
> +    if (options->maximum_size > 0 && options->size >= options->maximum_size)
> +        elog(ERROR, "resizable shared memory structure \"%s\" should have maximum size (%zd) greater than size (%zd)",
> +             options->name, options->maximum_size, options->size);

It'd need to check 'options->size > options->maximum_size' to allow
max-sized initialization to succeed here without erroring.

> But if we do so, we need another member in
> ShmemStructDesc and ShmemIndexEnt to indicate whether the structure is
> resizable or not. Instead the patches set maximum_size to 0 for
> fixed-size structures and non-zero for resizable structures. This way
> we can check whether a structure is resizable or not by checking
> whether its maximum_size is zero or not. pg_shmem_allocations view
> also has a maximum_size column which has the similar characteristics.
> I would like to know what others think.

I think that shmem allocations can set .size for the initial size, and
.minimum_size/.maximum_size for configuring resizability; the latter
fields can then be initialized with .size if they're 0.

> 3. allocated_space member in various structures and in pg_shmem_allocations view
> -------------------------------------------------------------------------------
> The patch adds a new member allocated_space to ShmemIndexEnt and
> pg_shmem_allocations view. allocated_space to maximum_size is what
> allocated_size is to size - it's the type aligned value of
> maximum_size. But it also highlights the difference between the
> address space allocation and the actual memory allocation. This
> difference is crucial to resizable structures. However, unlike
> maximum_size, we set it to a non-zero value, allocated_size, for
> fixed-size structures as well since they are allocated the same amount
> of space as their allocated_size. While this seems logically correct
> to me, some may find maximum_size to be zero but allocated_space to be
> non-zero for fixed-size structures a bit weird. I would like to know
> what others think.

I'd prefer to have consistent values; constant-sized structs are no
different from resizable structs whose min/max size equal their
current size. The only alternative that I think could be considered
correct is returning NULL for those, but zero is definitely wrong.
Note that returning min/max=size would also allow for better
aggregations on pg_shmem_allocations columns.

Note: if we expose minimum_size, we may also want to expose
min_allocated_size (i.e., the full reservation minus the size of
MADV_REMOVEd pages when the shmem allocation is min-sized).

> As a side question, do we want to allow users to specify minimum_size
> in ShmemStructOpts for resizable structures? Resizing memory lower
> than that would be prohibited. For fixed sized structures,
> minimum_size would be same as size and also maximum_size.

I think it would be useful, if only to inform users and developers
about this in e.g. pg_shmem_allocations.

> For now, it
> seems only for the sanity checks, but it could be seen as a useful
> safety feature. A difference in maximum_size and minimum_size would
> indicate that the structure is resizable.

I think that's the right approach.

> 4. to mprotect or not to mprotect
> ---------------------------------
> If memory beyond the current size of a resizable structure is
> accessed, it won't cause any segfault or bus error. When writing
> memory will be simply allocated and when reading, it will return
> zeroes if memory is not allocated yet. mprotect'ing the memory beyond
> the current size of a resizable structure to PROT_NONE can prevent
> accidental access to unallocated memory (sans page boundaries), but it
> needs to be done in every backend process which requires a
> synchronization mechanism beyond the scope of shmem.c. Hence the patch
> does not use mprotect.

It seems to me that the synchronization is a crucial component of
resizing; isn't it bad if shmem structs can suddenly, without
synchronization, contain zeroes?

> A subsystem will require some higher level
> synchronization mechanism between users of the structure and the
> process which resizes it. That synchronization mechanism can be used
> to mprotect the memory, if required. I have documented this, but I
> would like to know whether we should provide an API in shmem.c to
> mprotect.

I think we should; it would simplify and deduplicate external code
that needs to mark the pages PROT_NONE, and centralize OS page
calculations within the shmem subsystem. It'd also allow checks that
validate that the pages marked with PROT_NONE are 1) within a shmem
allocation and 2) currently not in use by that shmem allocation.

(Was there a point 5. for discussion? I can't find it)

(This is where I ran out of time for these questions, sorry I didn't
get to point 6)

Kind regards,

Matthias van de Meent
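
As an illustration of the two checks mentioned under point 4, the
following is a minimal sketch of what a centralized helper could look
like. Everything in it is an assumption made for the example: the
ResizableArea struct, its field names, and both function names are
hypothetical and are not part of the patch or of shmem.c. It also
assumes the reservation starts on a page boundary and is a multiple of
the page size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical descriptor; field names are illustrative only. */
typedef struct ResizableArea
{
    char       *base;       /* stable, page-aligned start address */
    size_t      reserved;   /* reserved address space (maximum size) */
    size_t      used;       /* current size of the structure */
} ResizableArea;

/* Round n up to the next multiple of page_size (a power of two). */
size_t
page_align_up(size_t n, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (n + page_size - 1) & ~(page_size - 1);
}

/*
 * Check that [offset, offset + len) is 1) inside the reservation and
 * 2) entirely beyond the pages the structure currently uses.
 */
bool
range_is_protectable(const ResizableArea *area, size_t offset,
                     size_t len, size_t page_size)
{
    size_t      first_free = page_align_up(area->used, page_size);

    if (offset % page_size != 0 || len == 0)
        return false;           /* must start on a page boundary */
    if (offset > area->reserved || len > area->reserved - offset)
        return false;           /* outside the allocation */
    return offset >= first_free;    /* must not overlap live data */
}

/*
 * Mark the unused tail of the reservation PROT_NONE in the calling
 * process. Returns 0 on success (or if there is nothing to protect)
 * and -1 on failure.
 */
int
protect_unused_tail(const ResizableArea *area, size_t page_size)
{
    size_t      first_free = page_align_up(area->used, page_size);
    size_t      len;

    if (first_free >= area->reserved)
        return 0;               /* structure is at its maximum size */
    len = area->reserved - first_free;
    if (!range_is_protectable(area, first_free, len, page_size))
        return -1;
    return mprotect(area->base + first_free, len, PROT_NONE);
}
```

Since mprotect only affects the calling process, each backend would
still have to call something like protect_unused_tail() itself, under
whatever synchronization the owning subsystem provides.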