From: Heikki Linnakangas <[email protected]>
To: Ashutosh Bapat <[email protected]>
To: pgsql-hackers <[email protected]>
To: Andres Freund <[email protected]>
Cc: [email protected]
Subject: Re: Better shared data structure management and resizable shared data structures
Date: Fri, 13 Feb 2026 14:03:12 +0200
Message-ID: <[email protected]>
In-Reply-To: <CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com>
References: <CAExHW5vM1bneLYfg0wGeAa=52UiJ3z4vKd3AJ72X8Fw6k3KKrg@mail.gmail.com>

On 13/02/2026 13:47, Ashutosh Bapat wrote:
> `man madvise` has this:
>         MADV_REMOVE (since Linux 2.6.16)
>                Free up a given range of pages and its associated backing
>                store.  This is equivalent to punching a hole in the
>                corresponding byte range of the backing store (see
>                fallocate(2)).  Subsequent accesses in the specified
>                address range will see bytes containing zero.
>
>                The specified address range must be mapped shared and
>                writable.  This flag cannot be applied to locked pages,
>                Huge TLB pages, or VM_PFNMAP pages.
>
>                In the initial implementation, only tmpfs(5) was supported
>                MADV_REMOVE; but since Linux 3.5, any filesystem which
>                supports the fallocate(2) FALLOC_FL_PUNCH_HOLE mode also
>                supports MADV_REMOVE.  Hugetlbfs fails with the error
>                EINVAL and other filesystems fail with the error
>                EOPNOTSUPP.
> 
> It says the flag cannot be applied to Huge TLB pages. We won't be
> able to make resizable shared memory structures allocated with huge
> pages. That seems like a serious restriction.

Per https://man7.org/linux/man-pages/man2/madvise.2.html:

MADV_REMOVE (since Linux 2.6.16)
               ...

               Support for the Huge TLB filesystem was added in Linux
               v4.3.

> I may be misunderstanding something, but it seems like this is useful
> to free already allocated memory, not necessarily allocate more
> memory. I don't understand how a user would start with a larger
> reserved address space with only small portions of that space being
> backed by memory.

Hmm, I guess you'll need to use MAP_NORESERVE in the first mmap() call
to reserve address space for the maximum size, and then
madvise(MADV_POPULATE_WRITE) using the initial size. Later,
madvise(MADV_REMOVE) to shrink, and madvise(MADV_POPULATE_WRITE) to grow
again.
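
A minimal sketch of that sequence, assuming Linux 5.14 or later (where
MADV_POPULATE_WRITE exists) and an anonymous shared mapping without huge
pages. The helper names (shmem_reserve, shmem_grow, shmem_shrink,
shmem_page_size) are hypothetical and only illustrate the idea; they are
not from any patch in this thread. All offsets are assumed to be
multiples of the page size.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Older libc headers may lack this; 23 is the Linux UAPI value (5.14+). */
#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23
#endif

/* Hypothetical helper: page size, since madvise() needs aligned ranges. */
static size_t
shmem_page_size(void)
{
    return (size_t) sysconf(_SC_PAGESIZE);
}

/*
 * Hypothetical helper: reserve address space for the maximum size.
 * MAP_NORESERVE asks the kernel not to reserve memory up front.
 * Returns NULL on failure.
 */
static void *
shmem_reserve(size_t max_size)
{
    void *p = mmap(NULL, max_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

    return (p == MAP_FAILED) ? NULL : p;
}

/*
 * Hypothetical helper: back the range [old_size, new_size) with memory.
 * Returns 0 on success, -1 with errno set on failure (for example
 * EINVAL on kernels older than 5.14).
 */
static int
shmem_grow(void *base, size_t old_size, size_t new_size)
{
    assert(new_size >= old_size);
    return madvise((char *) base + old_size, new_size - old_size,
                   MADV_POPULATE_WRITE);
}

/*
 * Hypothetical helper: release the memory behind [new_size, old_size).
 * The address range stays mapped; later reads of it see zeros.
 */
static int
shmem_shrink(void *base, size_t old_size, size_t new_size)
{
    assert(old_size >= new_size);
    return madvise((char *) base + new_size, old_size - new_size,
                   MADV_REMOVE);
}
```

A caller would do shmem_reserve(max_size) once, then
shmem_grow(base, 0, initial_size), and afterwards use shmem_shrink() and
shmem_grow() with the current and target sizes. A huge-page variant
would need Linux 4.3 or later for MADV_REMOVE, per the man page text
quoted above, and is not covered by this sketch.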

- Heikki





