public inbox for [email protected]  
From: Tomas Vondra <[email protected]>
To: Jakub Wartak <[email protected]>
To: Christoph Berg <[email protected]>
Cc: [email protected]
Subject: Re: failed NUMA pages inquiry status: Operation not permitted
Date: Tue, 6 Jan 2026 16:36:15 +0100
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAKZiRmwV_O73DdSosD-k62kS2wWPc3C8mRZY8j9ozfOu5OLLjg@mail.gmail.com>
References: <[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<CAKZiRmwV_O73DdSosD-k62kS2wWPc3C8mRZY8j9ozfOu5OLLjg@mail.gmail.com>



On 1/6/26 14:23, Jakub Wartak wrote:
> On Mon, Jan 5, 2026 at 11:30 PM Christoph Berg <[email protected]> wrote:
>>
>> Re: Tomas Vondra
>>> I guess the only solution is to accept -2 as a possible value (unknown
>>> node). But that makes regression testing harder, because it means the
>>> output could change a lot ...
> 
> Hi Tomas! That's pretty wild, nice find about that swapping s_b thing!
> So just to confirm, that was reproduced outside containers/docker,
> right?
> 

Yes, this is a regular bare-metal Debian system.

>> Or just not test that, or do something like
>>
>> select numa_node = -2 or numa_node between 0 and 1000 from pg_shmem_allocations_numa;
> 
> Well, with the huge-pages it should be not swappable, so another idea
> would be simply alter first line of src/test/regress/sql/numa.sql and
> sql/pg_buffercache_numa.sql just like below:
> - SELECT NOT(pg_numa_available()) AS skip_test \gset
> + SELECT (pg_numa_available() is false OR
> current_setting('huge_pages_status')::bool is false) as skip_test
> \gset
> 
> (I'm making assumption that there are buildfarm animals that
> huge_pages enabled, no idea how to check that)
> 

Yes, using huge pages makes this go away.

I'm now even more sure it's about swap, because /proc/PID/smaps for
the postmaster tracks how much of each mapping is in swap, and with
regular memory pages I get values like this for the main shmem segment:

Swap:              90508 kB
Swap:             275272 kB
Swap:             135020 kB
Swap:             116460 kB
Swap:             102388 kB
Swap:              93832 kB
Swap:             155616 kB
Swap:             165692 kB

These are just values from repeated "grep" runs while pgbench is
running. The instance has 16GB of shared buffers, so 200MB is a bit
over 1%. Not a huge part, but still ...
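
For anyone who wants to repeat the measurement without eyeballing grep
output, here is a minimal sketch. It assumes the usual Linux
/proc/PID/smaps layout (a header line per mapping followed by
"Field: value kB" lines). The function names are hypothetical and this
is not part of any patch.

```python
import re

# Header line of a mapping: "start-end perms offset dev inode [path]"
MAPPING_RE = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s")
# Exactly the "Swap:" field; "SwapPss:" is a different field and is skipped
SWAP_RE = re.compile(r"^Swap:\s+(\d+)\s+kB$")


def swap_kb_by_mapping(smaps_text):
    """Return a list of (mapping_header, swap_kb) pairs, one per mapping."""
    results = []
    current = None
    for line in smaps_text.splitlines():
        if MAPPING_RE.match(line):
            current = line.strip()
        elif current is not None:
            m = SWAP_RE.match(line.strip())
            if m:
                results.append((current, int(m.group(1))))
    return results


def swapped_fraction(swap_kb, total_kb):
    """Fraction of a segment that is currently in swap."""
    return swap_kb / total_kb


# Hypothetical usage against a live postmaster (pid is an assumption):
#   text = open(f"/proc/{pid}/smaps").read()
#   for header, kb in swap_kb_by_mapping(text):
#       print(kb, header)
```

With 16GB of shared buffers, swapped_fraction(200 * 1024, 16 * 1024 * 1024)
comes out at roughly 0.012, which is where the "about 1%" comes from.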

I've always "known" shared buffers could be swapped out, but I've never
realized it would affect cases like this one.

I'm not a huge fan of fixing just the tests. Sure, the tests will pass,
but what's the point of that if you then can't run this in production
because it also fails there (I mean, pg_shmem_allocations_numa will fail)?

I think it's clear we need to tweak this to handle the -2 status. And
then also adjust the tests to accept non-deterministic results.
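
To illustrate the kind of handling I mean, here is a rough sketch in
Python rather than C. It assumes the per-page status follows the
move_pages(2) convention: a non-negative value is a node number, and a
negative value is a negated errno, with -ENOENT (-2 on Linux) meaning
the page is not present. The function name and the returned labels are
hypothetical, and this is not the actual fix.

```python
import errno


def classify_numa_status(status):
    """Map a per-page status value to a (kind, detail) pair."""
    if status >= 0:
        # Page is resident on this NUMA node
        return ("node", status)
    if status == -errno.ENOENT:
        # Page not present (e.g. swapped out): report it, don't fail
        return ("not_present", None)
    # Anything else is a real error, e.g. -EPERM
    return ("error", errno.errorcode.get(-status, "unknown"))
```

The point is that only the last branch should raise an error. The
"not_present" case is a legitimate answer, and a test would have to
accept it alongside valid node numbers.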


regards

-- 
Tomas Vondra





