public inbox for [email protected]
pgsql: Introduce pg_shmem_allocations_numa view
83+ messages / 8 participants
* pgsql: Introduce pg_shmem_allocations_numa view
@ 2025-04-07 21:18 Tomas Vondra <[email protected]>
2025-06-12 21:16 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Christoph Berg <[email protected]>
2025-10-16 11:38 ` failed NUMA pages inquiry status: Operation not permitted Christoph Berg <[email protected]>
0 siblings, 2 replies; 83+ messages in thread
From: Tomas Vondra @ 2025-04-07 21:18 UTC (permalink / raw)
To: [email protected]
Introduce pg_shmem_allocations_numa view
Introduce new pg_shmem_allocations_numa view with information about how
shared memory is distributed across NUMA nodes. For each shared memory
segment, the view returns one row for each NUMA node backing it, with
the total amount of memory allocated from that node.
The view may be relatively expensive, especially when executed for the
first time in a backend, as it has to touch all memory pages to get
reliable information about the NUMA node. This may also force allocation
of the shared memory.
Unlike pg_shmem_allocations, the view does not show anonymous shared
memory allocations. It also does not show memory allocated using the
dynamic shared memory infrastructure.
Author: Jakub Wartak <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Bertrand Drouvot <[email protected]>
Reviewed-by: Tomas Vondra <[email protected]>
Discussion: https://postgr.es/m/CAKZiRmxh6KWo0aqRqvmcoaX2jUxZYb4kGp3N%3Dq1w%2BDiH-696Xw%40mail.gmail.com
Branch
------
master
Details
-------
https://git.postgresql.org/pg/commitdiff/8cc139bec34a2971b0682a04eb52ce7b3f5bb425
Modified Files
--------------
doc/src/sgml/system-views.sgml | 95 ++++++++++++++++++
src/backend/catalog/system_views.sql | 8 ++
src/backend/storage/ipc/shmem.c | 159 +++++++++++++++++++++++++++++++
src/include/catalog/catversion.h | 2 +-
src/include/catalog/pg_proc.dat | 8 ++
src/test/regress/expected/numa.out | 13 +++
src/test/regress/expected/numa_1.out | 5 +
src/test/regress/expected/privileges.out | 16 +++-
src/test/regress/expected/rules.out | 4 +
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/numa.sql | 10 ++
src/test/regress/sql/privileges.sql | 6 +-
12 files changed, 322 insertions(+), 6 deletions(-)
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Christoph Berg @ 2025-06-12 21:16 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: [email protected]
Re: Tomas Vondra
> Introduce pg_shmem_allocations_numa view
This is acting up on Debian's 32-bit architectures, namely i386, armel
and armhf:
--- /build/reproducible-path/postgresql-18-18~beta1+20250612/src/test/regress/expected/numa.out 2025-06-12 12:21:21.000000000 +0000
+++ /build/reproducible-path/postgresql-18-18~beta1+20250612/build/src/test/regress/results/numa.out 2025-06-12 20:20:33.124292694 +0000
@@ -6,8 +6,4 @@
-- switch to superuser
\c -
SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_allocations_numa;
- ok
-----
- t
-(1 row)
-
+ERROR: invalid NUMA node id outside of allowed range [0, 0]: -14
The diff is the same on all architectures.
-14 seems to be -EFAULT, and move_pages(2) says:
Page states in the status array
The following values can be returned in each element of the status array.
-EFAULT
This is a zero page or the memory area is not mapped by the process.
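For illustration, a minimal C sketch of how one status entry could be decoded, assuming (as move_pages(2) documents) that non-negative values are node ids and negative values are negated errno codes. The helper name describe_page_status is hypothetical and is not part of PostgreSQL:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format one move_pages(2) status entry.
 * Non-negative values are NUMA node ids; negative values are -errno.
 * Returns 1 if the entry is a node id, 0 if it is an error code.
 */
int
describe_page_status(int status, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);

    if (status >= 0)
    {
        snprintf(buf, buflen, "node %d", status);
        return 1;
    }

    /* Negate to recover the errno value, e.g. -14 is EFAULT on Linux. */
    snprintf(buf, buflen, "error %d (%s)", -status, strerror(-status));
    return 0;
}
```

A caller could use the return value to decide whether the page counts toward a node at all, instead of raising an error on the first negative entry.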
https://buildd.debian.org/status/logs.php?pkg=postgresql-18&ver=18%7Ebeta1%2B20250612-1
https://buildd.debian.org/status/fetch.php?pkg=postgresql-18&arch=armel&ver=18%7Ebeta1%2B202...
Christoph
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Christoph Berg @ 2025-06-23 14:42 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: [email protected]
Re: To Tomas Vondra
> This is acting up on Debian's 32-bit architectures, namely i386, armel
> and armhf:
... and x32 (x86_64 instruction set with 32-bit pointers).
> SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_allocations_numa;
> +ERROR: invalid NUMA node id outside of allowed range [0, 0]: -14
>
> -14 seems to be -EFAULT, and move_pages(2) says:
> -EFAULT
> This is a zero page or the memory area is not mapped by the process.
I did some debugging on i386 and made it print the page numbers:
SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_allocations_numa;
+WARNING: invalid NUMA node id outside of allowed range [0, 0]: -14 for page 35
+WARNING: invalid NUMA node id outside of allowed range [0, 0]: -14 for page 36
...
+WARNING: invalid NUMA node id outside of allowed range [0, 0]: -14 for page 32768
+WARNING: invalid NUMA node id outside of allowed range [0, 0]: -14 for page 32769
So it works for the first few pages and then the rest is EFAULT.
I think the pg_numa_touch_mem_if_required() hack might not be enough
to force the pages to be allocated. Changing that to a memcpy() didn't
help. Is there some optimization that zero pages aren't allocated
until being written to?
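As a rough illustration of what "touching" means here, the sketch below reads one byte from every OS page of a range through a volatile pointer, so the compiler cannot drop the access. It is a simplified stand-in and not the actual pg_numa_touch_mem_if_required() code; the name touch_pages is hypothetical, and the start address is assumed to be page-aligned:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: read one byte from each OS page in
 * [start, start + len) so the calling process accesses every page.
 * Returns the number of pages touched.
 */
size_t
touch_pages(const char *start, size_t len, size_t os_page_size)
{
    size_t touched = 0;

    assert(start != NULL && os_page_size > 0);

    for (size_t off = 0; off < len; off += os_page_size)
    {
        /* volatile keeps the compiler from optimizing the read away */
        const volatile char *p = start + off;
        char c = *p;

        (void) c;
        touched++;
    }
    return touched;
}
```

Whether a plain read like this is enough for the kernel to report a node for the page is exactly the open question in this message; the sketch only shows the access pattern.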
Why do we try to force the pages to be allocated at all? This is just
a monitoring function, it should not change the actual system state.
Why not just skip any page where the status is <0 ?
The attached patch removes that logic. Regression tests pass, but we
probably have to think about whether to report these negative numbers
as-is or perhaps convert them to NULL.
Christoph
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Christoph Berg @ 2025-06-23 14:48 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: [email protected]
Re: To Tomas Vondra
> Why do we try to force the pages to be allocated at all? This is just
> a monitoring function, it should not change the actual system state.
One-time touching might also not be enough, what if the pages later
get swapped out and the monitoring functions are called again? They
will have to deal with these "not in memory" error conditions anyway.
Christoph
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Andres Freund @ 2025-06-23 15:14 UTC (permalink / raw)
To: Christoph Berg <[email protected]>; +Cc: Tomas Vondra <[email protected]>; [email protected]
Hi,
On 2025-06-23 16:48:27 +0200, Christoph Berg wrote:
> Re: To Tomas Vondra
> > Why do we try to force the pages to be allocated at all? This is just
> > a monitoring function, it should not change the actual system state.
The problem is that the kernel function just gives bogus results for pages
that *are* present in memory but that have only been touched in another process
that has mapped the same range of memory.
> One-time touching might also not be enough, what if the pages later
> get swapped out and the monitoring functions are called again?
I don't think that's a problem, the process still has a relevant page table
entry in that case.
Greetings,
Andres Freund
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Christoph Berg @ 2025-06-23 15:20 UTC (permalink / raw)
To: Andres Freund <[email protected]>; +Cc: Tomas Vondra <[email protected]>; [email protected]
Re: Andres Freund
> > > Why do we try to force the pages to be allocated at all? This is just
> > > a monitoring function, it should not change the actual system state.
>
> The problem is that the kernel function just gives bogus results for pages
> that *are* present in memory but that have only been touched in another process
> that has mapped the same range of memory.
Ok, so we leave the touching in, but still defend against negative
status values?
Christoph
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Christoph Berg @ 2025-06-23 15:59 UTC (permalink / raw)
To: Andres Freund <[email protected]>; +Cc: Tomas Vondra <[email protected]>; [email protected]
Re: To Andres Freund
> Ok, so we leave the touching in, but still defend against negative
> status values?
v2 attached.
Christoph
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Andres Freund @ 2025-06-23 16:26 UTC (permalink / raw)
To: Christoph Berg <[email protected]>; +Cc: Tomas Vondra <[email protected]>; [email protected]
Hi,
On 2025-06-23 17:59:24 +0200, Christoph Berg wrote:
> Re: To Andres Freund
> > Ok, so we leave the touching in, but still defend against negative
> > status values?
>
> v2 attached.
How confident are we that this isn't actually because we passed a bogus
address to the kernel or such? With this patch, are *any* pages recognized as
valid on the machines that triggered the error?
I wonder if we ought to report the failures as a separate "numa node"
(e.g. NULL as node id) instead ...
Greetings,
Andres Freund
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Christoph Berg @ 2025-06-23 19:57 UTC (permalink / raw)
To: Andres Freund <[email protected]>; +Cc: Tomas Vondra <[email protected]>; [email protected]
Re: Andres Freund
> How confident are we that this isn't actually because we passed a bogus
> address to the kernel or such? With this patch, are *any* pages recognized as
> valid on the machines that triggered the error?
See upthread - the first 35 pages were ok, then a lot of -14.
> I wonder if we ought to report the failures as a separate "numa node"
> (e.g. NULL as node id) instead ...
Did that now, using N+1 (== 1 here) for errors in this Debian i386
environment (chroot on an amd64 host):
select * from pg_shmem_allocations_numa \crosstabview
name │ 0 │ 1
────────────────────────────────────────────────┼──────────┼──────────
multixact_offset │ 69632 │ 65536
subtransaction │ 139264 │ 131072
notify │ 139264 │ 0
Shared Memory Stats │ 188416 │ 131072
serializable │ 188416 │ 86016
PROCLOCK hash │ 4096 │ 0
FinishedSerializableTransactions │ 4096 │ 0
XLOG Ctl │ 2117632 │ 2097152
Shared MultiXact State │ 4096 │ 0
Proc Header │ 4096 │ 0
Archiver Data │ 4096 │ 0
.... more 0s in the last column ...
AioHandleData │ 1429504 │ 0
Buffer Blocks │ 67117056 │ 67108864
Buffer IO Condition Variables │ 266240 │ 0
Proc Array │ 4096 │ 0
.... more 0s
(73 rows)
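The bucketing used for this output can be sketched roughly as follows, assuming the highest valid node id is known and that every negative or out-of-range status lands in one extra bucket at index N+1. The function name and signature are hypothetical and do not come from the attached patch:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: sum bytes per NUMA node for one shared memory
 * segment. totals[] must have max_node + 2 entries; the last one
 * (index max_node + 1) collects pages whose status is negative or
 * otherwise out of range.
 */
void
sum_pages_by_node(const int *status, size_t count, size_t os_page_size,
                  int max_node, size_t *totals)
{
    assert(status != NULL && totals != NULL && max_node >= 0);

    for (int i = 0; i <= max_node + 1; i++)
        totals[i] = 0;

    for (size_t i = 0; i < count; i++)
    {
        int bucket = status[i];

        if (bucket < 0 || bucket > max_node)
            bucket = max_node + 1;  /* error bucket, "N+1" */

        totals[bucket] += os_page_size;
    }
}
```

With a single node (max_node = 0), column 0 holds the bytes with a known node and column 1 the bytes whose pages returned an error, matching the shape of the table above.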
There is something fishy with pg_buffercache. If I restart PG, I'm
getting "Bad address" (errno 14), this time as return value of
move_pages().
postgres =# select * from pg_buffercache_numa;
DEBUG: 00000: NUMA: NBuffers=16384 os_page_count=32768 os_page_size=4096
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:383
2025-06-23 19:41:41.315 UTC [1331894] ERROR: failed NUMA pages inquiry: Bad address
2025-06-23 19:41:41.315 UTC [1331894] STATEMENT: select * from pg_buffercache_numa;
ERROR: XX000: failed NUMA pages inquiry: Bad address
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:394
Repeated calls are fine.
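For reference, the numbers in the DEBUG line are consistent with 8 kB buffers split into 4 kB OS pages: 16384 * 8192 / 4096 = 32768. The block size of 8192 is an assumption (the PostgreSQL default), since the thread does not state it. A minimal sketch of building such a page-pointer array, with a hypothetical function name:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: collect the start address of every OS page
 * covering nbuffers blocks of blcksz bytes beginning at startptr.
 * Returns the number of entries written to pages[], which must have
 * room for nbuffers * blcksz / os_page_size pointers.
 */
size_t
fill_page_ptrs(char *startptr, size_t nbuffers, size_t blcksz,
               size_t os_page_size, void **pages)
{
    char   *endptr = startptr + nbuffers * blcksz;
    size_t  idx = 0;

    assert(os_page_size > 0 && (nbuffers * blcksz) % os_page_size == 0);

    for (char *ptr = startptr; ptr < endptr; ptr += os_page_size)
        pages[idx++] = ptr;

    return idx;
}
```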
Maybe NUMA is just not supported on 32-bit archs, but I'd rather be
sure about that before I play that card.
Christoph
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Tomas Vondra @ 2025-06-23 20:10 UTC (permalink / raw)
To: Christoph Berg <[email protected]>; Andres Freund <[email protected]>; +Cc: Tomas Vondra <[email protected]>; [email protected]
On 6/23/25 21:57, Christoph Berg wrote:
> Re: Andres Freund
>> How confident are we that this isn't actually because we passed a bogus
>> address to the kernel or such? With this patch, are *any* pages recognized as
>> valid on the machines that triggered the error?
>
> See upthread - the first 35 pages were ok, then a lot of -14.
>
>> I wonder if we ought to report the failures as a separate "numa node"
>> (e.g. NULL as node id) instead ...
>
> Did that now, using N+1 (== 1 here) for errors in this Debian i386
> environment (chroot on an amd64 host):
>
> select * from pg_shmem_allocations_numa \crosstabview
>
> name │ 0 │ 1
> ────────────────────────────────────────────────┼──────────┼──────────
> multixact_offset │ 69632 │ 65536
> subtransaction │ 139264 │ 131072
> notify │ 139264 │ 0
> Shared Memory Stats │ 188416 │ 131072
> serializable │ 188416 │ 86016
> PROCLOCK hash │ 4096 │ 0
> FinishedSerializableTransactions │ 4096 │ 0
> XLOG Ctl │ 2117632 │ 2097152
> Shared MultiXact State │ 4096 │ 0
> Proc Header │ 4096 │ 0
> Archiver Data │ 4096 │ 0
> .... more 0s in the last column ...
> AioHandleData │ 1429504 │ 0
> Buffer Blocks │ 67117056 │ 67108864
> Buffer IO Condition Variables │ 266240 │ 0
> Proc Array │ 4096 │ 0
> .... more 0s
> (73 rows)
>
>
> There is something fishy with pg_buffercache. If I restart PG, I'm
> getting "Bad address" (errno 14), this time as return value of
> move_pages().
>
> postgres =# select * from pg_buffercache_numa;
> DEBUG: 00000: NUMA: NBuffers=16384 os_page_count=32768 os_page_size=4096
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:383
> 2025-06-23 19:41:41.315 UTC [1331894] ERROR: failed NUMA pages inquiry: Bad address
> 2025-06-23 19:41:41.315 UTC [1331894] STATEMENT: select * from pg_buffercache_numa;
> ERROR: XX000: failed NUMA pages inquiry: Bad address
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:394
>
> Repeated calls are fine.
>
Huh. So it's only the first call that does this?
Can you maybe print the addresses passed to pg_numa_query_pages? I
wonder if there's some bug in how we fill that array. Not sure why it
would happen only on 32-bit systems, though.
I'll create a 32-bit VM so that I can try reproducing this.
regards
--
Tomas Vondra
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Christoph Berg @ 2025-06-23 20:31 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
Re: Tomas Vondra
> Huh. So it's only the first call that does this?
The first call after a restart. Reconnecting is not enough.
> Can you maybe print the addresses passed to pg_numa_query_pages? I
The addresses look good:
Breakpoint 1, pg_numa_query_pages (pid=0, count=32768, pages=0xeb44d02c, status=0xeb42c02c) at ../src/port/pg_numa.c:49
49 return numa_move_pages(pid, count, pages, NULL, status, 0);
(gdb) p *pages
$1 = (void *) 0xebc33000
(gdb) p pages[1]
$2 = (void *) 0xebc34000
(gdb) p pages[2]
$3 = (void *) 0xebc35000
> wonder if there's some bug in how we fill that array. Not sure why it
> would happen only on 32-bit systems, though.
I found something, but that should be harmless:
--- a/contrib/pg_buffercache/pg_buffercache_pages.c
+++ b/contrib/pg_buffercache/pg_buffercache_pages.c
@@ -365,7 +365,7 @@ pg_buffercache_numa_pages(PG_FUNCTION_ARGS)
/* Used to determine the NUMA node for all OS pages at once */
os_page_ptrs = palloc0(sizeof(void *) * os_page_count);
- os_page_status = palloc(sizeof(uint64) * os_page_count);
+ os_page_status = palloc(sizeof(int) * os_page_count);
/* Fill pointers for all the memory pages. */
idx = 0;
Christoph
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Tomas Vondra @ 2025-06-23 20:37 UTC (permalink / raw)
To: Christoph Berg <[email protected]>; +Cc: Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
On 6/23/25 22:31, Christoph Berg wrote:
> Re: Tomas Vondra
>> Huh. So it's only the first call that does this?
>
> The first call after a restart. Reconnecting is not enough.
>
>> Can you maybe print the addresses passed to pg_numa_query_pages? I
>
> The addresses look good:
>
> Breakpoint 1, pg_numa_query_pages (pid=0, count=32768, pages=0xeb44d02c, status=0xeb42c02c) at ../src/port/pg_numa.c:49
> 49 return numa_move_pages(pid, count, pages, NULL, status, 0);
> (gdb) p *pages
> $1 = (void *) 0xebc33000
> (gdb) p pages[1]
> $2 = (void *) 0xebc34000
> (gdb) p pages[2]
> $3 = (void *) 0xebc35000
>
Didn't you say the first ~35 addresses succeed? What about the
addresses after that?
>
>> wonder if there's some bug in how we fill that array. Not sure why it
>> would happen only on 32-bit systems, though.
>
> I found something, but that should be harmless:
>
> --- a/contrib/pg_buffercache/pg_buffercache_pages.c
> +++ b/contrib/pg_buffercache/pg_buffercache_pages.c
> @@ -365,7 +365,7 @@ pg_buffercache_numa_pages(PG_FUNCTION_ARGS)
>
> /* Used to determine the NUMA node for all OS pages at once */
> os_page_ptrs = palloc0(sizeof(void *) * os_page_count);
> - os_page_status = palloc(sizeof(uint64) * os_page_count);
> + os_page_status = palloc(sizeof(int) * os_page_count);
>
Yes, good catch. But as you say, that should be benign - we allocate
more memory than needed, I don't think it should break anything.
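One way to keep the element size from drifting away from the array type is to size the allocation from the pointer itself. A minimal sketch using plain malloc() instead of palloc(), with a hypothetical helper name:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical sketch: allocate the status array for move_pages(2),
 * which expects one int per page. sizeof(*status) follows the declared
 * element type, so the allocation size cannot disagree with it.
 */
int *
alloc_page_status(size_t os_page_count)
{
    int *status;

    assert(os_page_count > 0);

    status = malloc(sizeof(*status) * os_page_count);
    return status;      /* NULL on allocation failure; caller frees */
}
```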
--
Tomas Vondra
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Christoph Berg @ 2025-06-23 20:51 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
Re: Tomas Vondra
> Didn't you say the first ~35 addresses succeed? What about the
> addresses after that?
That was pg_shmem_allocations_numa. The pg_numa_query_pages() in there
works (does not return -1), but then some of the status[] values are
-14.
When pg_buffercache_numa fails, pg_numa_query_pages() itself
returns -14.
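To make the two failure modes explicit, here is a small sketch that separates "the call itself failed" from "the call succeeded but some pages report an error". The enum and function names are hypothetical, and treating any negative return value as a failed call is an assumption:

```c
#include <assert.h>
#include <stddef.h>

typedef enum
{
    QUERY_OK,           /* call succeeded, every status is a node id */
    QUERY_PAGE_ERRORS,  /* call succeeded, some status[] entries < 0 */
    QUERY_CALL_FAILED   /* the call itself reported failure */
} QueryOutcome;

/*
 * Hypothetical sketch: classify the result of a move_pages(2)-style
 * query. rc is the call's return value; status[] holds count per-page
 * results and is only inspected when the call succeeded.
 */
QueryOutcome
classify_query(long rc, const int *status, size_t count)
{
    if (rc < 0)
        return QUERY_CALL_FAILED;

    assert(status != NULL || count == 0);

    for (size_t i = 0; i < count; i++)
    {
        if (status[i] < 0)
            return QUERY_PAGE_ERRORS;
    }
    return QUERY_OK;
}
```

In these terms, pg_shmem_allocations_numa hits the per-page case, while the first pg_buffercache_numa call after a restart hits the failed-call case.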
The printed os_page_ptrs[] contents are the same for the failing and
non-failing calls, so the problem is probably elsewhere.
/* Fill pointers for all the memory pages. */
idx = 0;
for (char *ptr = startptr; ptr < endptr; ptr += os_page_size)
{
+ if (idx < 50)
+ elog(DEBUG1, "os_page_ptrs idx %d = %p", idx, ptr);
os_page_ptrs[idx++] = ptr;
20:47 myon@postgres =# select * from pg_buffercache_numa;
DEBUG: 00000: os_page_ptrs idx 0 = 0xebc44000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 1 = 0xebc45000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 2 = 0xebc46000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 3 = 0xebc47000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 4 = 0xebc48000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 5 = 0xebc49000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 6 = 0xebc4a000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 7 = 0xebc4b000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 8 = 0xebc4c000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 9 = 0xebc4d000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 10 = 0xebc4e000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 11 = 0xebc4f000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 12 = 0xebc50000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 13 = 0xebc51000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 14 = 0xebc52000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 15 = 0xebc53000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 16 = 0xebc54000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 17 = 0xebc55000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 18 = 0xebc56000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 19 = 0xebc57000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 20 = 0xebc58000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 21 = 0xebc59000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 22 = 0xebc5a000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 23 = 0xebc5b000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 24 = 0xebc5c000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 25 = 0xebc5d000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 26 = 0xebc5e000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 27 = 0xebc5f000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 28 = 0xebc60000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 29 = 0xebc61000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 30 = 0xebc62000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 31 = 0xebc63000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 32 = 0xebc64000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 33 = 0xebc65000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 34 = 0xebc66000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 35 = 0xebc67000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 36 = 0xebc68000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 37 = 0xebc69000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 38 = 0xebc6a000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 39 = 0xebc6b000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 40 = 0xebc6c000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 41 = 0xebc6d000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 42 = 0xebc6e000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 43 = 0xebc6f000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 44 = 0xebc70000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 45 = 0xebc71000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 46 = 0xebc72000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 47 = 0xebc73000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 48 = 0xebc74000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 49 = 0xebc75000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: NUMA: NBuffers=16384 os_page_count=32768 os_page_size=4096
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:385
2025-06-23 20:47:41.827 UTC [1368080] ERROR: failed NUMA pages inquiry: Bad address
2025-06-23 20:47:41.827 UTC [1368080] STATEMENT: select * from pg_buffercache_numa;
ERROR: XX000: failed NUMA pages inquiry: Bad address
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:396
Time: 92.757 ms
20:47 myon@postgres =# select * from pg_buffercache_numa;
DEBUG: 00000: os_page_ptrs idx 0 = 0xebc44000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 1 = 0xebc45000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 2 = 0xebc46000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 3 = 0xebc47000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 4 = 0xebc48000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 5 = 0xebc49000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 6 = 0xebc4a000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 7 = 0xebc4b000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 8 = 0xebc4c000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 9 = 0xebc4d000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 10 = 0xebc4e000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 11 = 0xebc4f000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 12 = 0xebc50000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 13 = 0xebc51000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 14 = 0xebc52000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 15 = 0xebc53000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 16 = 0xebc54000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 17 = 0xebc55000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 18 = 0xebc56000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 19 = 0xebc57000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 20 = 0xebc58000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 21 = 0xebc59000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 22 = 0xebc5a000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 23 = 0xebc5b000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 24 = 0xebc5c000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 25 = 0xebc5d000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 26 = 0xebc5e000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 27 = 0xebc5f000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 28 = 0xebc60000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 29 = 0xebc61000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 30 = 0xebc62000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 31 = 0xebc63000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 32 = 0xebc64000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 33 = 0xebc65000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 34 = 0xebc66000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 35 = 0xebc67000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 36 = 0xebc68000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 37 = 0xebc69000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 38 = 0xebc6a000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 39 = 0xebc6b000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 40 = 0xebc6c000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 41 = 0xebc6d000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 42 = 0xebc6e000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 43 = 0xebc6f000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 44 = 0xebc70000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 45 = 0xebc71000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 46 = 0xebc72000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 47 = 0xebc73000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 48 = 0xebc74000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: os_page_ptrs idx 49 = 0xebc75000
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
DEBUG: 00000: NUMA: NBuffers=16384 os_page_count=32768 os_page_size=4096
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:385
DEBUG: 00000: NUMA: page-faulting the buffercache for proper NUMA readouts
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:444
Time: 24.547 ms
20:47 myon@postgres =#
Christoph
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Tomas Vondra @ 2025-06-23 21:14 UTC (permalink / raw)
To: Christoph Berg <[email protected]>; +Cc: Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
On 6/23/25 22:51, Christoph Berg wrote:
> Re: Tomas Vondra
>> Didn't you say the first ~35 addresses succeed, right? What about the
>> addresses after that?
>
> That was pg_shmem_allocations_numa. The pg_numa_query_pages() in there
> works (does not return -1), but then some of the status[] values are
> -14.
>
> When pg_buffercache_numa fails, pg_numa_query_pages() itself
> returns -14.
>
> The printed os_page_ptrs[] contents are the same for the failing and
> non-failing calls, so the problem is probably elsewhere.
>
> /* Fill pointers for all the memory pages. */
> idx = 0;
> for (char *ptr = startptr; ptr < endptr; ptr += os_page_size)
> {
> + if (idx < 50)
> + elog(DEBUG1, "os_page_ptrs idx %d = %p", idx, ptr);
> os_page_ptrs[idx++] = ptr;
>
>
> 20:47 myon@postgres =# select * from pg_buffercache_numa;
> DEBUG: 00000: os_page_ptrs idx 0 = 0xebc44000
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
> DEBUG: 00000: os_page_ptrs idx 1 = 0xebc45000
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
> DEBUG: 00000: os_page_ptrs idx 2 = 0xebc46000
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
> DEBUG: 00000: os_page_ptrs idx 3 = 0xebc47000
...
> DEBUG: 00000: os_page_ptrs idx 47 = 0xebc73000
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
> DEBUG: 00000: os_page_ptrs idx 48 = 0xebc74000
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
> DEBUG: 00000: os_page_ptrs idx 49 = 0xebc75000
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
> DEBUG: 00000: NUMA: NBuffers=16384 os_page_count=32768 os_page_size=4096
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:385
> 2025-06-23 20:47:41.827 UTC [1368080] ERROR: failed NUMA pages inquiry: Bad address
> 2025-06-23 20:47:41.827 UTC [1368080] STATEMENT: select * from pg_buffercache_numa;
> ERROR: XX000: failed NUMA pages inquiry: Bad address
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:396
> Time: 92.757 ms
>
> 20:47 myon@postgres =# select * from pg_buffercache_numa;
> DEBUG: 00000: os_page_ptrs idx 0 = 0xebc44000
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
> DEBUG: 00000: os_page_ptrs idx 1 = 0xebc45000
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
> DEBUG: 00000: os_page_ptrs idx 2 = 0xebc46000
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
> DEBUG: 00000: os_page_ptrs idx 3 = 0xebc47000
...
> DEBUG: 00000: os_page_ptrs idx 46 = 0xebc72000
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
> DEBUG: 00000: os_page_ptrs idx 47 = 0xebc73000
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
> DEBUG: 00000: os_page_ptrs idx 48 = 0xebc74000
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
> DEBUG: 00000: os_page_ptrs idx 49 = 0xebc75000
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
> DEBUG: 00000: NUMA: NBuffers=16384 os_page_count=32768 os_page_size=4096
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:385
> DEBUG: 00000: NUMA: page-faulting the buffercache for proper NUMA readouts
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:444
> Time: 24.547 ms
> 20:47 myon@postgres =#
>
True. If it fails on first call, but succeeds on the other, then the
problem is likely somewhere else. But also on the second call we won't
do the memory touching. Can you try setting firstNumaTouch=false, so
that we do this on every call?
At the beginning you mentioned this is happening on i386, armel and
armhf - are all those in qemu? I've tried on my rpi5 (with 32-bit user
space), and there everything seems to work fine. But that's aarch64
kernel, just the user space is 32-bit.
regards
--
Tomas Vondra
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Christoph Berg @ 2025-06-23 21:25 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
Re: Tomas Vondra
> True. If it fails on first call, but succeeds on the other, then the
> problem is likely somewhere else. But also on the second call we won't
> do the memory touching. Can you try setting firstNumaTouch=false, so
> that we do this on every call?
firstNumaTouch=false, it still fails on the first call.
I assume you meant actually keeping firstNumaTouch=true - but it still
fails on the first call.
The memory touching is done for the first call in each backend, but
reconnecting doesn't reset it, I have to restart PG.
> At the beginning you mentioned this is happening on i386, armel and
> armhf - are all those in qemu? I've tried on my rpi5 (with 32-bit user
> space), and there everything seems to work fine. But that's aarch64
> kernel, just the user space is 32-bit.
I'm testing on i386 in a chroot on an amd64 kernel. (same for x32)
armel and armhf are also 32-bit chroots on an arm64 host.
https://buildd.debian.org/status/package.php?p=postgresql-18&suite=experimental
Maybe this is a kernel bug.
Christoph
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Tomas Vondra @ 2025-06-23 21:47 UTC (permalink / raw)
To: Christoph Berg <[email protected]>; +Cc: Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
On 6/23/25 23:25, Christoph Berg wrote:
> Re: Tomas Vondra
>> True. If it fails on first call, but succeeds on the other, then the
>> problem is likely somewhere else. But also on the second call we won't
>> do the memory touching. Can you try setting firstNumaTouch=false, so
>> that we do this on every call?
>
> firstNumaTouch=false, it still fails on the first call.
>
> I assume you meant actually keeping firstNumaTouch=true - but it still
> fails on the first call.
>
No, I meant firstNumaTouch=false, so that the touching happens on every
call. I was wondering if that makes all calls fail.
> The memory touching is done for the first call in each backend, but
> reconnecting doesn't reset it, I have to restart PG.
>
I don't follow. Why wouldn't reconnecting reset it?
>> At the beginning you mentioned this is happening on i386, armel and
>> armhf - are all those in qemu? I've tried on my rpi5 (with 32-bit user
>> space), and there everything seems to work fine. But that's aarch64
>> kernel, just the user space is 32-bit.
>
> I'm testing on i386 in a chroot on an amd64 kernel. (same for x32)
> armel and armhf are also 32-bit chroots on an arm64 host.
>
> https://buildd.debian.org/status/package.php?p=postgresql-18&suite=experimental
>
> Maybe this is a kernel bug.
>
Or maybe the 32-bit chroot on 64-bit host matters and confuses some
calculation.
--
Tomas Vondra
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Tomas Vondra @ 2025-06-24 01:43 UTC (permalink / raw)
To: Christoph Berg <[email protected]>; +Cc: Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
On 6/23/25 23:47, Tomas Vondra wrote:
> ...
>
> Or maybe the 32-bit chroot on 64-bit host matters and confuses some
> calculation.
>
I think it's likely something like this. I noticed that if I modify
pg_buffercache_numa_pages() to query the addresses one by one, it works.
And when I increase the number, it stops working somewhere between 16k
and 17k items.
It may be a coincidence, but I suspect it's related to sizeof(void *)
being 8 in the kernel, but only 4 in the chroot. So the userspace
passes an array of 4-byte items, but the kernel interprets that as
8-byte items. That is, we call
    long move_pages(int pid, unsigned long count, void *pages[.count],
                    const int nodes[.count], int status[.count], int flags);
which (I assume) just passes the parameters to the kernel, and the kernel
interprets them according to its own pointer size.
If this is what's happening, I'm not sure what to do about it ...
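To make the suspicion above concrete, here is a minimal sketch of what such a misreading would look like. It only illustrates the hypothesis; whether the kernel really does this is not established here. The helper names read_as_64bit(), items_readable_as_64bit() and demo_first_misread() are made up for this example, and the addresses are the ones from the debug output earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Read element "idx" of a buffer that was written as 4-byte items, but
 * interpret the buffer as an array of 8-byte items (the hypothesized
 * mismatch between a 32-bit userspace and a 64-bit kernel).
 */
static uint64_t
read_as_64bit(const uint32_t *buf, size_t nitems32, size_t idx)
{
    uint64_t v;

    /* Refuse to read past the end of the 4-byte array. */
    assert((idx + 1) * sizeof(uint64_t) <= nitems32 * sizeof(uint32_t));
    memcpy(&v, (const unsigned char *) buf + idx * sizeof(uint64_t), sizeof(v));
    return v;
}

/* How many 8-byte items fit into an array of nitems32 4-byte items. */
static size_t
items_readable_as_64bit(size_t nitems32)
{
    return nitems32 * sizeof(uint32_t) / sizeof(uint64_t);
}

/* Four 32-bit "pointers" as userspace would store them. */
static uint64_t
demo_first_misread(void)
{
    uint32_t ptrs32[4] = {0xebc44000u, 0xebc45000u, 0xebc46000u, 0xebc47000u};

    /* The first 8-byte read fuses ptrs32[0] and ptrs32[1] into one value. */
    return read_as_64bit(ptrs32, 4, 0);
}
```

Under this assumption each 8-byte read fuses two adjacent 4-byte entries, so the first value is no longer 0xebc44000, and only half as many items fit in the array as userspace announced in count.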
FWIW while looking into this, I tried running this under valgrind (on a
regular 64-bit system, not in the chroot), and I get this report:
==65065== Invalid read of size 8
==65065== at 0x113B0EBE: pg_buffercache_numa_pages (pg_buffercache_pages.c:380)
==65065== by 0x6B539D: ExecMakeTableFunctionResult (execSRF.c:234)
==65065== by 0x6CEB7E: FunctionNext (nodeFunctionscan.c:94)
==65065== by 0x6B6ACA: ExecScanFetch (execScan.h:126)
==65065== by 0x6B6B31: ExecScanExtended (execScan.h:170)
==65065== by 0x6B6C9D: ExecScan (execScan.c:59)
==65065== by 0x6CEF0F: ExecFunctionScan (nodeFunctionscan.c:269)
==65065== by 0x6B29FA: ExecProcNodeFirst (execProcnode.c:469)
==65065== by 0x6A6F56: ExecProcNode (executor.h:313)
==65065== by 0x6A9533: ExecutePlan (execMain.c:1679)
==65065== by 0x6A7422: standard_ExecutorRun (execMain.c:367)
==65065== by 0x6A7330: ExecutorRun (execMain.c:304)
==65065== by 0x934EF0: PortalRunSelect (pquery.c:921)
==65065== by 0x934BD8: PortalRun (pquery.c:765)
==65065== by 0x92E4CD: exec_simple_query (postgres.c:1273)
==65065== by 0x93301E: PostgresMain (postgres.c:4766)
==65065== by 0x92A88B: BackendMain (backend_startup.c:124)
==65065== by 0x85A7C7: postmaster_child_launch (launch_backend.c:290)
==65065== by 0x860111: BackendStartup (postmaster.c:3580)
==65065== by 0x85DE6F: ServerLoop (postmaster.c:1702)
==65065== Address 0x7b6c000 is in a rw- anonymous segment
This fails here (on the pg_numa_touch_mem_if_required call):
    for (char *ptr = startptr; ptr < endptr; ptr += os_page_size)
    {
        os_page_ptrs[idx++] = ptr;
        /* Only need to touch memory once per backend process */
        if (firstNumaTouch)
            pg_numa_touch_mem_if_required(touch, ptr);
    }
The 0x7b6c000 is the very first pointer, and it's the only pointer that
triggers this warning. At first I thought there's something wrong with
how we align the pointer using TYPEALIGN_DOWN(), but then I noticed it's
actually the pointer of BufferGetBlock(1).
So I'm a bit puzzled by this, and I'm not sure it's related to the other
issue at all (it probably is not).
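For reference, rounding an address down to a page boundary can be sketched as below. This is a standalone illustration in the spirit of TYPEALIGN_DOWN(), not the PostgreSQL macro itself; align_down() is a made-up name and the 4096-byte page size is the one reported in the debug output.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round addr down to a multiple of alignment. The alignment has to be a
 * power of two, which holds for OS page sizes such as 4096.
 */
static uintptr_t
align_down(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return addr & ~(alignment - 1);
}
```

With a 4096-byte page, 0x7b6c000 has its low 12 bits clear, so align_down() returns it unchanged. That is consistent with the alignment step not being what produces this address.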
It's a bit too late here, I'll continue investigating this tomorrow.
--
Tomas Vondra
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Bertrand Drouvot @ 2025-06-24 08:24 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Christoph Berg <[email protected]>; Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
Hi,
On Tue, Jun 24, 2025 at 03:43:19AM +0200, Tomas Vondra wrote:
> On 6/23/25 23:47, Tomas Vondra wrote:
> > ...
> >
> > Or maybe the 32-bit chroot on 64-bit host matters and confuses some
> > calculation.
> >
>
> I think it's likely something like this.
I think the same.
> I noticed that if I modify
> pg_buffercache_numa_pages() to query the addresses one by one, it works.
> And when I increase the number, it stops working somewhere between 16k
> and 17k items.
Yeah, same for me with pg_get_shmem_allocations_numa(). It works if
pg_numa_query_pages() is done on chunks <= 16 pages but fails if done on more
than 16 pages.
It's also confirmed by test_chunk_size.c attached:
$ gcc-11 -m32 -o test_chunk_size test_chunk_size.c
$ ./test_chunk_size
1 pages: SUCCESS (0 errors)
2 pages: SUCCESS (0 errors)
3 pages: SUCCESS (0 errors)
4 pages: SUCCESS (0 errors)
5 pages: SUCCESS (0 errors)
6 pages: SUCCESS (0 errors)
7 pages: SUCCESS (0 errors)
8 pages: SUCCESS (0 errors)
9 pages: SUCCESS (0 errors)
10 pages: SUCCESS (0 errors)
11 pages: SUCCESS (0 errors)
12 pages: SUCCESS (0 errors)
13 pages: SUCCESS (0 errors)
14 pages: SUCCESS (0 errors)
15 pages: SUCCESS (0 errors)
16 pages: SUCCESS (0 errors)
17 pages: 1 errors
Threshold: 17 pages
No error if -m32 is not used.
> It may be a coincidence, but I suspect it's related to the sizeof(void
> *) being 8 in the kernel, but only 4 in the chroot. So the userspace
> passes an array of 4-byte items, but kernel interprets that as 8-byte
> items. That is, we call
>
> long move_pages(int pid, unsigned long count, void *pages[.count], const
> int nodes[.count], int status[.count], int flags);
>
> Which (I assume) just passes the parameters to kernel. And it'll
> interpret them per kernel pointer size.
>
I also suspect something in this area...
> If this is what's happening, I'm not sure what to do about it ...
We could work in chunks (16?) on 32-bit builds, but that would probably
degrade performance (we do mention the cost in the docs, though). Also, would
16 always be a correct chunk size?
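A rough sketch of what the chunking could look like, just to show the shape of the loop. Everything here is hypothetical: query_in_chunks() is not an existing function, and fake_query() is a stand-in that marks every entry past the 16th in a single call with -14, mimicking the behaviour seen in the test output above, so that the sketch can be run without NUMA support.

```c
#include <assert.h>
#include <stdlib.h>

#define FAKE_LIMIT 16           /* threshold observed in the test above */

typedef long (*page_query_fn) (unsigned long count, void **pages, int *status);

/* Hypothetical stand-in for the real query: entries past FAKE_LIMIT get -14. */
static long
fake_query(unsigned long count, void **pages, int *status)
{
    (void) pages;
    for (unsigned long i = 0; i < count; i++)
        status[i] = (i < FAKE_LIMIT) ? 0 : -14;
    return 0;
}

/* Issue the query in batches of at most chunk_size pages. */
static long
query_in_chunks(page_query_fn query, unsigned long count,
                void **pages, int *status, unsigned long chunk_size)
{
    assert(chunk_size > 0);
    for (unsigned long off = 0; off < count; off += chunk_size)
    {
        unsigned long n = count - off;
        long        rc;

        if (n > chunk_size)
            n = chunk_size;
        rc = query(n, pages + off, status + off);
        if (rc != 0)
            return rc;          /* give up on the first failing batch */
    }
    return 0;
}

/* Count negative status entries for a given page count and chunk size. */
static int
count_errors(unsigned long count, unsigned long chunk_size)
{
    void      **pages = calloc(count, sizeof(void *));
    int        *status = calloc(count, sizeof(int));
    int         errors = 0;

    if (pages == NULL || status == NULL ||
        query_in_chunks(fake_query, count, pages, status, chunk_size) != 0)
        errors = -1;
    else
        for (unsigned long i = 0; i < count; i++)
            if (status[i] < 0)
                errors++;
    free(pages);
    free(status);
    return errors;
}
```

With the stand-in, count_errors(40, 16) reports no errors while count_errors(17, 17) reports one, matching the threshold in the test output. The cost is one call per 16 pages instead of a single call, which is where the performance concern comes from.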
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
[text/x-csrc] test_chunk_size.c (1.6K, 2-test_chunk_size.c)
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <string.h>
#include <errno.h>

int test_chunk_size(int chunk_size) {
    size_t page_size = sysconf(_SC_PAGESIZE);
    void *mem = malloc(page_size * chunk_size);
    if (!mem) return -1;
    memset(mem, 0xFF, page_size * chunk_size);
    void **ptrs = malloc(sizeof(void*) * chunk_size);
    int *status = malloc(sizeof(int) * chunk_size);
    for (int j = 0; j < chunk_size; j++) {
        ptrs[j] = (char*)mem + (j * page_size);
        status[j] = -999;
    }
    long result = syscall(SYS_move_pages, 0, chunk_size, ptrs, NULL, status, 0);
    int errors = 0;
    if (result == 0) {
        for (int j = 0; j < chunk_size; j++) {
            if (status[j] < 0) errors++;
        }
    }
    free(mem);
    free(ptrs);
    free(status);
    return (result == 0) ? errors : -1;
}

int main() {
    int threshold = -1;
    // Test sizes from 1 to 40 pages
    for (int size = 1; size <= 40; size++) {
        int errors = test_chunk_size(size);
        if (errors == -1) {
            if (threshold == -1) threshold = size;
            break;
        } else if (errors == 0) {
            printf("%2d pages: SUCCESS (0 errors)\n", size);
        } else {
            printf("%2d pages: %d errors\n", size, errors);
            threshold = size;
            break;
        }
    }
    if (threshold > 0)
        printf("Threshold: %d pages\n", threshold);
    else
        printf("No threshold found in range 1-40 pages\n");
    return 0;
}
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Tomas Vondra @ 2025-06-24 09:20 UTC (permalink / raw)
To: Bertrand Drouvot <[email protected]>; +Cc: Christoph Berg <[email protected]>; Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
On 6/24/25 10:24, Bertrand Drouvot wrote:
> Hi,
>
> On Tue, Jun 24, 2025 at 03:43:19AM +0200, Tomas Vondra wrote:
>> On 6/23/25 23:47, Tomas Vondra wrote:
>>> ...
>>>
>>> Or maybe the 32-bit chroot on 64-bit host matters and confuses some
>>> calculation.
>>>
>>
>> I think it's likely something like this.
>
> I think the same.
>
>> I noticed that if I modify
>> pg_buffercache_numa_pages() to query the addresses one by one, it works.
>> And when I increase the number, it stops working somewhere between 16k
>> and 17k items.
>
> Yeah, same for me with pg_get_shmem_allocations_numa(). It works if
> pg_numa_query_pages() is done on chunks <= 16 pages but fails if done on more
> than 16 pages.
>
> It's also confirmed by test_chunk_size.c attached:
>
> $ gcc-11 -m32 -o test_chunk_size test_chunk_size.c
> $ ./test_chunk_size
> 1 pages: SUCCESS (0 errors)
> 2 pages: SUCCESS (0 errors)
> 3 pages: SUCCESS (0 errors)
> 4 pages: SUCCESS (0 errors)
> 5 pages: SUCCESS (0 errors)
> 6 pages: SUCCESS (0 errors)
> 7 pages: SUCCESS (0 errors)
> 8 pages: SUCCESS (0 errors)
> 9 pages: SUCCESS (0 errors)
> 10 pages: SUCCESS (0 errors)
> 11 pages: SUCCESS (0 errors)
> 12 pages: SUCCESS (0 errors)
> 13 pages: SUCCESS (0 errors)
> 14 pages: SUCCESS (0 errors)
> 15 pages: SUCCESS (0 errors)
> 16 pages: SUCCESS (0 errors)
> 17 pages: 1 errors
> Threshold: 17 pages
>
> No error if -m32 is not used.
>
>> It may be a coincidence, but I suspect it's related to the sizeof(void
>> *) being 8 in the kernel, but only 4 in the chroot. So the userspace
>> passes an array of 4-byte items, but kernel interprets that as 8-byte
>> items. That is, we call
>>
>> long move_pages(int pid, unsigned long count, void *pages[.count], const
>> int nodes[.count], int status[.count], int flags);
>>
>> Which (I assume) just passes the parameters to kernel. And it'll
>> interpret them per kernel pointer size.
>>
>
> I also suspect something in this area...
>
>> If this is what's happening, I'm not sure what to do about it ...
>
> We could work by chunks (16?) on 32 bits but would probably produce performance
> degradation (we mention it in the doc though). Also would always 16 be a correct
> chunk size?
I don't see how this would solve anything?
AFAICS the problem is that the two places are confused about how large the
array elements are, and interpret them differently. Using a
smaller array won't solve that. The pg function would still allocate
an array of 16 x 32-bit pointers, and the kernel would interpret this as 16
x 64-bit pointers. And that means the kernel will (a) write into memory
beyond the allocated buffer - a clear buffer overflow, and (b) see bogus
pointers, because it'll concatenate two 32-bit pointers.
I don't see how using a smaller array makes this correct. That it works is
more a matter of luck, and also a consequence of still allocating the
whole array, so there's no overflow (at least I kept that, not sure how
you did the chunks).
If I fix the code to make the entries 64-bit (by treating the pointers
as int64), it suddenly starts working - no bad addresses, etc. Well,
almost, because I get this
 bufferid | os_page_num | numa_node
----------+-------------+-----------
        1 |           0 |         0
        1 |           1 |       -14
        2 |           2 |         0
        2 |           3 |       -14
        3 |           4 |         0
        3 |           5 |       -14
        4 |           6 |         0
        4 |           7 |       -14
 ...
The -14 status is interesting, because that's the same value Christoph
reported as the other issue (in pg_shmem_allocations_numa).
I did an experiment and changed os_page_status to be declared as int64,
not just int. And interestingly, that produced this:
 bufferid | os_page_num | numa_node
----------+-------------+-----------
        1 |           0 |         0
        1 |           1 |         0
        2 |           2 |         0
        2 |           3 |         0
        3 |           4 |         0
        3 |           5 |         0
        4 |           6 |         0
        4 |           7 |         0
 ...
But I don't see how this makes any sense, because "int" should be 4B in
both cases (in 64-bit kernel and 32-bit chroot).
FWIW I realized this applies to "official" systems with 32-bit user
space on 64-bit kernels, like e.g. rpi5 with RPi OS 32-bit. (Fun fact,
rpi5 has 8 NUMA nodes, with all CPUs attached to all NUMA nodes.)
I'm starting to think we need to disable NUMA for setups like this,
mixing 64-bit kernels with 32-bit chroot. Is there a good way to detect
those, so that we can error-out?
FWIW this doesn't explain the strange valgrind issue, though.
--
Tomas Vondra
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Bertrand Drouvot @ 2025-06-24 11:10 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Christoph Berg <[email protected]>; Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
Hi,
On Tue, Jun 24, 2025 at 11:20:15AM +0200, Tomas Vondra wrote:
> On 6/24/25 10:24, Bertrand Drouvot wrote:
> > Yeah, same for me with pg_get_shmem_allocations_numa(). It works if
> > pg_numa_query_pages() is done on chunks <= 16 pages but fails if done on more
> > than 16 pages.
> >
> > It's also confirmed by test_chunk_size.c attached:
> >
> > $ gcc-11 -m32 -o test_chunk_size test_chunk_size.c
> > $ ./test_chunk_size
> > 1 pages: SUCCESS (0 errors)
> > 2 pages: SUCCESS (0 errors)
> > 3 pages: SUCCESS (0 errors)
> > 4 pages: SUCCESS (0 errors)
> > 5 pages: SUCCESS (0 errors)
> > 6 pages: SUCCESS (0 errors)
> > 7 pages: SUCCESS (0 errors)
> > 8 pages: SUCCESS (0 errors)
> > 9 pages: SUCCESS (0 errors)
> > 10 pages: SUCCESS (0 errors)
> > 11 pages: SUCCESS (0 errors)
> > 12 pages: SUCCESS (0 errors)
> > 13 pages: SUCCESS (0 errors)
> > 14 pages: SUCCESS (0 errors)
> > 15 pages: SUCCESS (0 errors)
> > 16 pages: SUCCESS (0 errors)
> > 17 pages: 1 errors
> > Threshold: 17 pages
> >
> > No error if -m32 is not used.
> >
> > We could work by chunks (16?) on 32 bits but would probably produce performance
> > degradation (we mention it in the doc though). Also would always 16 be a correct
> > chunk size?
>
> I don't see how this would solve anything?
>
> AFAICS the problem is the two places are confused about how large the
> array elements are, and get to interpret that differently.
> I don't see how using smaller array makes this correct. That it works is
> more a matter of luck,
Not sure it's luck, maybe the wrong pointer arithmetic has no effect if the batch
size is <= 16.
So we have kernel_move_pages() -> do_pages_stat() (because nodes is NULL here
for us as we call "numa_move_pages(pid, count, pages, NULL, status, 0);").
So, if we look at do_pages_stat() ([1]), we can see that it uses a hardcoded
"#define DO_PAGES_STAT_CHUNK_NR 16UL" and that this pointer arithmetic:
"
pages += chunk_nr;
status += chunk_nr;
"
is done but has no effect if we use a batch size <= 16, since nr_pages then
drops to zero and the loop exits after the first chunk.
So if this pointer arithmetic is not correct (it seems that it should advance
by 16 * sizeof(compat_uptr_t) bytes instead), then it has no effect as long as the batch
size is <= 16.
Does test_chunk_size also fail at 17 for you?
[1]: https://github.com/torvalds/linux/blob/master/mm/migrate.c
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Tomas Vondra @ 2025-06-24 12:33 UTC (permalink / raw)
To: Bertrand Drouvot <[email protected]>; +Cc: Christoph Berg <[email protected]>; Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
On 6/24/25 13:10, Bertrand Drouvot wrote:
> Hi,
>
> On Tue, Jun 24, 2025 at 11:20:15AM +0200, Tomas Vondra wrote:
>> On 6/24/25 10:24, Bertrand Drouvot wrote:
>>> Yeah, same for me with pg_get_shmem_allocations_numa(). It works if
>>> pg_numa_query_pages() is done on chunks <= 16 pages but fails if done on more
>>> than 16 pages.
>>>
>>> It's also confirmed by test_chunk_size.c attached:
>>>
>>> $ gcc-11 -m32 -o test_chunk_size test_chunk_size.c
>>> $ ./test_chunk_size
>>> 1 pages: SUCCESS (0 errors)
>>> 2 pages: SUCCESS (0 errors)
>>> 3 pages: SUCCESS (0 errors)
>>> 4 pages: SUCCESS (0 errors)
>>> 5 pages: SUCCESS (0 errors)
>>> 6 pages: SUCCESS (0 errors)
>>> 7 pages: SUCCESS (0 errors)
>>> 8 pages: SUCCESS (0 errors)
>>> 9 pages: SUCCESS (0 errors)
>>> 10 pages: SUCCESS (0 errors)
>>> 11 pages: SUCCESS (0 errors)
>>> 12 pages: SUCCESS (0 errors)
>>> 13 pages: SUCCESS (0 errors)
>>> 14 pages: SUCCESS (0 errors)
>>> 15 pages: SUCCESS (0 errors)
>>> 16 pages: SUCCESS (0 errors)
>>> 17 pages: 1 errors
>>> Threshold: 17 pages
>>>
>>> No error if -m32 is not used.
>>>
>>> We could work by chunks (16?) on 32 bits but would probably produce performance
>>> degradation (we mention it in the doc though). Also would always 16 be a correct
>>> chunk size?
>>
>> I don't see how this would solve anything?
>>
>> AFAICS the problem is the two places are confused about how large the
>> array elements are, and get to interpret that differently.
>
>> I don't see how using smaller array makes this correct. That it works is
>> more a matter of luck,
>
> Not sure it's luck, maybe the wrong pointers arithmetic has no effect if batch
> size is <= 16.
>
> So we have kernel_move_pages() -> do_pages_stat() (because nodes is NULL here
> for us as we call "numa_move_pages(pid, count, pages, NULL, status, 0);").
>
> So, if we look at do_pages_stat() ([1]), we can see that it uses an hardcoded
> "#define DO_PAGES_STAT_CHUNK_NR 16UL" and that this pointers arithmetic:
>
> "
> pages += chunk_nr;
> status += chunk_nr;
> "
>
> is done but has no effect since nr_pages will exit the loop if we use a batch
> size <= 16.
>
> So if this pointer arithmetic is not correct, (it seems that it should advance
> by 16 * sizeof(compat_uptr_t) instead) then it has no effect as long as the batch
> size is <= 16.
>
> Does test_chunk_size also fails at 17 for you?
Yes, it fails for me at 17 too. So you're saying the access within each
chunk of 16 elements is OK, but that maybe advancing to the next chunk
is not quite right? In which case limiting the access to 16 entries
might be a workaround.
In any case, this sounds like a kernel bug, right? I don't have much
experience with the kernel code, so don't want to rely too much on my
interpretation of it.
regards
--
Tomas Vondra
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Bertrand Drouvot @ 2025-06-24 13:25 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Christoph Berg <[email protected]>; Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
Hi,
On Tue, Jun 24, 2025 at 02:33:59PM +0200, Tomas Vondra wrote:
>
>
> On 6/24/25 13:10, Bertrand Drouvot wrote:
> > So, if we look at do_pages_stat() ([1]), we can see that it uses an hardcoded
> > "#define DO_PAGES_STAT_CHUNK_NR 16UL" and that this pointers arithmetic:
> >
> > "
> > pages += chunk_nr;
> > status += chunk_nr;
> > "
> >
> > is done but has no effect since nr_pages will exit the loop if we use a batch
> > size <= 16.
> >
> > So if this pointer arithmetic is not correct, (it seems that it should advance
> > by 16 * sizeof(compat_uptr_t) instead) then it has no effect as long as the batch
> > size is <= 16.
> >
> > Does test_chunk_size also fails at 17 for you?
>
> Yes, it fails for me at 17 too. So you're saying the access within each
> chunk of 16 elements is OK, but that maybe advancing to the next chunk
> is not quite right?
Yes, I think compat_uptr_t usage is missing in do_pages_stat() (while it's used
in do_pages_move()).
Having a chunk size <= DO_PAGES_STAT_CHUNK_NR ensures we are not affected
by the wrong pointer arithmetic.
> In which case limiting the access to 16 entries
> might be a workaround.
Yes, something like:
diff --git a/src/backend/storage/ipc/shmem.c b/src/backend/storage/ipc/shmem.c
index c9ae3b45b76..070ad2f13e7 100644
--- a/src/backend/storage/ipc/shmem.c
+++ b/src/backend/storage/ipc/shmem.c
@@ -689,8 +689,17 @@ pg_get_shmem_allocations_numa(PG_FUNCTION_ARGS)
 			CHECK_FOR_INTERRUPTS();
 		}
-		if (pg_numa_query_pages(0, shm_ent_page_count, page_ptrs, pages_status) == -1)
-			elog(ERROR, "failed NUMA pages inquiry status: %m");
+		#define NUMA_QUERY_CHUNK_SIZE 16  /* has to be <= DO_PAGES_STAT_CHUNK_NR (do_pages_stat())*/
+
+		for (uint64 chunk_start = 0; chunk_start < shm_ent_page_count; chunk_start += NUMA_QUERY_CHUNK_SIZE) {
+			uint64 chunk_size = Min(NUMA_QUERY_CHUNK_SIZE, shm_ent_page_count - chunk_start);
+
+			if (pg_numa_query_pages(0, chunk_size, &page_ptrs[chunk_start],
+									&pages_status[chunk_start]) == -1)
+				elog(ERROR, "failed NUMA pages inquiry status: %m");
+		}
+
+		#undef NUMA_QUERY_CHUNK_SIZE
> In any case, this sounds like a kernel bug, right?
Yes, it sounds like a kernel bug.
> I don't have much
> experience with the kernel code, so don't want to rely too much on my
> interpretation of it.
I don't have that much experience either, but I think the issue is in do_pages_stat()
and that "pages += chunk_nr" should advance by chunk_nr * sizeof(compat_uptr_t) bytes instead.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Christoph Berg @ 2025-06-24 14:41 UTC (permalink / raw)
To: Bertrand Drouvot <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
Re: Bertrand Drouvot
> Yes, I think compat_uptr_t usage is missing in do_pages_stat() (while it's used
> in do_pages_move()).
I was also reading the kernel source around that place but you spotted
the problem before me. This patch resolves the issue here:
diff --git a/mm/migrate.c b/mm/migrate.c
index 8cf0f9c9599..542c81ec3ed 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2444,7 +2444,13 @@ static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages,
 		if (copy_to_user(status, chunk_status, chunk_nr * sizeof(*status)))
 			break;

-		pages += chunk_nr;
+		if (in_compat_syscall()) {
+			compat_uptr_t __user *pages32 = (compat_uptr_t __user *)pages;
+
+			pages32 += chunk_nr;
+			pages = (const void __user * __user *) pages32;
+		} else
+			pages += chunk_nr;
 		status += chunk_nr;
 		nr_pages -= chunk_nr;
 	}
> Having a chunk size <= DO_PAGES_STAT_CHUNK_NR ensures we are not affected
> by the wrong pointer arithmetic.
Good idea. Buggy kernels will be around for some time.
> + #define NUMA_QUERY_CHUNK_SIZE 16 /* has to be <= DO_PAGES_STAT_CHUNK_NR (do_pages_stat())*/
> +
> + for (uint64 chunk_start = 0; chunk_start < shm_ent_page_count; chunk_start += NUMA_QUERY_CHUNK_SIZE) {
Perhaps optimize it to this:
#if SIZEOF_VOID_P == 4
#define NUMA_QUERY_CHUNK_SIZE 16 /* has to be <= DO_PAGES_STAT_CHUNK_NR (do_pages_stat())*/
#else
#define NUMA_QUERY_CHUNK_SIZE 1024
#endif
... or some other bigger number.
The loop could also include CHECK_FOR_INTERRUPTS();
> > I don't have much
> > experience with the kernel code, so don't want to rely too much on my
> > interpretation of it.
>
> I don't have that much experience too but I think the issue is in do_pages_stat()
> and that "pages += chunk_nr" should be advanced by sizeof(compat_uptr_t) instead.
Me neither, but I'll try to submit this fix.
Christoph
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Tomas Vondra @ 2025-06-24 15:04 UTC (permalink / raw)
To: Christoph Berg <[email protected]>; Bertrand Drouvot <[email protected]>; +Cc: Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
On 6/24/25 16:41, Christoph Berg wrote:
> Re: Bertrand Drouvot
>> Yes, I think compat_uptr_t usage is missing in do_pages_stat() (while it's used
>> in do_pages_move()).
>
> I was also reading the kernel source around that place but you spotted
> the problem before me. This patch resolves the issue here:
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 8cf0f9c9599..542c81ec3ed 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2444,7 +2444,13 @@ static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages,
> if (copy_to_user(status, chunk_status, chunk_nr * sizeof(*status)))
> break;
>
> - pages += chunk_nr;
> + if (in_compat_syscall()) {
> + compat_uptr_t __user *pages32 = (compat_uptr_t __user *)pages;
> +
> + pages32 += chunk_nr;
> + pages = (const void __user * __user *) pages32;
> + } else
> + pages += chunk_nr;
> status += chunk_nr;
> nr_pages -= chunk_nr;
> }
>
>
>> Having a chunk size <= DO_PAGES_STAT_CHUNK_NR ensures we are not affected
>> by the wrong pointer arithmetic.
>
> Good idea. Buggy kernels will be around for some time.
>
If it's a reliable fix, then I guess we can do it like this. But won't
that be a performance penalty on everyone? Or does the system split the
array into 16-element chunks anyway, so this makes no difference?
Anyway, maybe we should start by reporting this to the kernel people. Do
you want me to do that, or shall one of you take care of that? I suppose
that'd be better, as you already wrote a fix / know the code better.
>> + #define NUMA_QUERY_CHUNK_SIZE 16 /* has to be <= DO_PAGES_STAT_CHUNK_NR (do_pages_stat())*/
>> +
>> + for (uint64 chunk_start = 0; chunk_start < shm_ent_page_count; chunk_start += NUMA_QUERY_CHUNK_SIZE) {
>
> Perhaps optimize it to this:
>
> #if SIZEOF_VOID_P == 4
> #define NUMA_QUERY_CHUNK_SIZE 16 /* has to be <= DO_PAGES_STAT_CHUNK_NR (do_pages_stat())*/
> #else
> #define NUMA_QUERY_CHUNK_SIZE 1024
> #endif
>
> ... or some other bigger number.
>
Hmm, maybe. I guess that'd hurt only fully 32-bit systems, but that also
seems like a non-issue.
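A minimal sketch of the chunked loop with a chunk size that depends on the pointer width. The preprocessor cannot evaluate sizeof, so this sketch keys off UINTPTR_MAX from <stdint.h>; that choice is an assumption made for a self-contained example, not what the patch does. query_fn, fake_query, FAKE_NODE and query_in_chunks are hypothetical names, with fake_query standing in for the real page-status query.

```c
#include <stddef.h>
#include <stdint.h>

#if UINTPTR_MAX <= 0xFFFFFFFFu
#define NUMA_QUERY_CHUNK_SIZE 16	/* has to be <= DO_PAGES_STAT_CHUNK_NR */
#else
#define NUMA_QUERY_CHUNK_SIZE 1024
#endif

#define FAKE_NODE 1				/* node id reported by the stand-in below */

/* Hypothetical callback type: fills status[0..count-1], returns 0 on success. */
typedef int (*query_fn) (void **pages, size_t count, int *status);

/* Stand-in for the real query; reports every page as being on FAKE_NODE. */
static int
fake_query(void **pages, size_t count, int *status)
{
	(void) pages;
	for (size_t i = 0; i < count; i++)
		status[i] = FAKE_NODE;
	return 0;
}

/* Query "total" pages in chunks; counts the calls made in *ncalls. */
static int
query_in_chunks(void **pages, size_t total, int *status,
				query_fn query, size_t *ncalls)
{
	for (size_t start = 0; start < total; start += NUMA_QUERY_CHUNK_SIZE)
	{
		size_t		n = total - start;
		int			rc;

		if (n > NUMA_QUERY_CHUNK_SIZE)
			n = NUMA_QUERY_CHUNK_SIZE;

		rc = query(pages + start, n, status + start);
		(*ncalls)++;
		if (rc != 0)
			return rc;
	}
	return 0;
}
```

Only builds with 32-bit pointers pay for the small chunks here; everything else uses the larger step.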
> The loop could also include CHECK_FOR_INTERRUPTS();
>
>>> I don't have much
>>> experience with the kernel code, so don't want to rely too much on my
>>> interpretation of it.
>>
>> I don't have that much experience either, but I think the issue is in do_pages_stat()
>> and that "pages += chunk_nr" should advance by sizeof(compat_uptr_t) per entry instead.
>
> Me neither, but I'll try to submit this fix.
>
+1
Thanks to both of you for the report and the investigation.
regards
--
Tomas Vondra
^ permalink raw reply [nested|flat] 83+ messages in thread
* Re: pgsql: Introduce pg_shmem_allocations_numa view
@ 2025-06-24 15:30 ` Christoph Berg <[email protected]>
2025-06-24 20:32 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Tomas Vondra <[email protected]>
2025-06-25 06:11 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Bertrand Drouvot <[email protected]>
2025-06-25 07:15 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Jakub Wartak <[email protected]>
0 siblings, 3 replies; 83+ messages in thread
From: Christoph Berg @ 2025-06-24 15:30 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Bertrand Drouvot <[email protected]>; Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
Re: Tomas Vondra
> If it's a reliable fix, then I guess we can do it like this. But won't
> that be a performance penalty on everyone? Or does the system split the
> array into 16-element chunks anyway, so this makes no difference?
There's still the overhead of the syscall itself. But no idea how
costly it is to have this 16-step loop in user or kernel space.
We could claim that on 32-bit systems, shared_buffers would be smaller
anyway, so there the overhead isn't that big. And the step size should
be larger (if at all) on 64-bit.
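For a rough feel of the syscall count involved, here is a back-of-the-envelope helper. The function name is hypothetical, and the 4 kB page size in the example below is an assumption (huge pages would reduce the page count a lot).

```c
#include <stdint.h>

/*
 * Number of query calls needed to cover "bytes" of shared memory when each
 * call handles at most "chunk" pages of "page_size" bytes.
 */
static uint64_t
syscalls_needed(uint64_t bytes, uint64_t page_size, uint64_t chunk)
{
	uint64_t	pages = (bytes + page_size - 1) / page_size;

	return (pages + chunk - 1) / chunk;
}
```

For 1 GB of shared memory with 4 kB pages that is 262144 pages, so 16384 calls with a chunk size of 16 versus 256 calls with a chunk size of 1024. How much that matters depends on the per-call cost, which nobody in the thread has measured.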
> Anyway, maybe we should start by reporting this to the kernel people. Do
> you want me to do that, or shall one of you take care of that? I suppose
> that'd be better, as you already wrote a fix / know the code better.
Submitted: https://marc.info/?l=linux-mm&m=175077821909222&w=2
Christoph
^ permalink raw reply [nested|flat] 83+ messages in thread
* Re: pgsql: Introduce pg_shmem_allocations_numa view
@ 2025-06-24 20:32 ` Tomas Vondra <[email protected]>
2025-06-25 06:45 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Bertrand Drouvot <[email protected]>
2025-06-26 06:00 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Bertrand Drouvot <[email protected]>
2 siblings, 2 replies; 83+ messages in thread
From: Tomas Vondra @ 2025-06-24 20:32 UTC (permalink / raw)
To: Christoph Berg <[email protected]>; +Cc: Bertrand Drouvot <[email protected]>; Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
On 6/24/25 17:30, Christoph Berg wrote:
> Re: Tomas Vondra
>> If it's a reliable fix, then I guess we can do it like this. But won't
>> that be a performance penalty on everyone? Or does the system split the
>> array into 16-element chunks anyway, so this makes no difference?
>
> There's still the overhead of the syscall itself. But no idea how
> costly it is to have this 16-step loop in user or kernel space.
>
> We could claim that on 32-bit systems, shared_buffers would be smaller
> anyway, so there the overhead isn't that big. And the step size should
> be larger (if at all) on 64-bit.
>
>> Anyway, maybe we should start by reporting this to the kernel people. Do
>> you want me to do that, or shall one of you take care of that? I suppose
>> that'd be better, as you already wrote a fix / know the code better.
>
> Submitted: https://marc.info/?l=linux-mm&m=175077821909222&w=2
>
Thanks! Now we wait ...
Attached is a minor tweak of the valgrind suppression rules, to add the
two places touching the memory. I was hoping I could add a single rule
for pg_numa_touch_mem_if_required, but that does not work - it's a
macro, not a function. So I had to add one rule for each of the two
functions querying NUMA. That's a bit disappointing, because it means
it'll hide all other failures (of Memcheck:Addr8 type) in those functions.
Perhaps it'd be better to turn pg_numa_touch_mem_if_required into a
proper (inlined) function, at least with USE_VALGRIND defined. Something
like the v2 patch - needs more testing to ensure the inlined function
doesn't break the touching or something silly like that.
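As a standalone illustration of the "touch through a function" idea (not the attached patch): a macro expands into its caller and leaves no frame of its own, so a "fun:" line in a suppression can only name the caller, whereas a function gives the rule something to match. The names touch_mem and touch_pages are hypothetical, and whether valgrind still attributes the access to the function once the compiler inlines it is exactly the part that needs testing.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical function form of the touch: a volatile read the compiler
 * may not optimize away, so the page gets faulted in. The address must be
 * suitably aligned for uint64_t.
 */
static inline void
touch_mem(const void *ptr)
{
	volatile uint64_t sink = *(const volatile uint64_t *) ptr;

	(void) sink;
}

/*
 * Touch the first 8 bytes of every page in [base, base + size) and return
 * the number of pages touched. page_size must be non-zero.
 */
static size_t
touch_pages(const void *base, size_t size, size_t page_size)
{
	size_t		touched = 0;

	for (size_t off = 0; off + sizeof(uint64_t) <= size; off += page_size)
	{
		touch_mem((const char *) base + off);
		touched++;
	}
	return touched;
}
```

With every access funnelled through one function, a single suppression entry could in principle cover all callers instead of hiding every Addr8 report in each of them.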
regards
--
Tomas Vondra
Attachments:
[text/x-patch] fix-valgrind-for-numa.patch (831B, 2-fix-valgrind-for-numa.patch)
download | inline diff:
diff --git a/src/tools/valgrind.supp b/src/tools/valgrind.supp
index 7ea464c8094..36bf3253f76 100644
--- a/src/tools/valgrind.supp
+++ b/src/tools/valgrind.supp
@@ -180,3 +180,22 @@
Memcheck:Cond
fun:PyObject_Realloc
}
+
+# Querying the NUMA node for shared memory requires touching the memory so
+# that it gets allocated in the process. That includes memory backing
+# buffers, which may be marked as noaccess for buffers that are not
+# pinned. Ignore that in both places querying the NUMA status, as we're
+# not really accessing the buffers.
+{
+ pg_buffercache_numa_pages
+ Memcheck:Addr8
+ fun:pg_buffercache_numa_pages
+ fun:ExecMakeTableFunctionResult
+}
+
+{
+ pg_get_shmem_allocations_numa
+ Memcheck:Addr8
+ fun:pg_get_shmem_allocations_numa
+ fun:ExecMakeTableFunctionResult
+}
[text/x-patch] fix-valgrind-for-numa-v2.patch (1.4K, 3-fix-valgrind-for-numa-v2.patch)
download | inline diff:
diff --git a/src/include/port/pg_numa.h b/src/include/port/pg_numa.h
index 40f1d324dcf..3b9a5b42898 100644
--- a/src/include/port/pg_numa.h
+++ b/src/include/port/pg_numa.h
@@ -24,9 +24,22 @@ extern PGDLLIMPORT int pg_numa_get_max_node(void);
* This is required on Linux, before pg_numa_query_pages() as we
* need to page-fault before move_pages(2) syscall returns valid results.
*/
+#ifdef USE_VALGRIND
+
+static inline void
+pg_numa_touch_mem_if_required(uint64 tmp, char *ptr)
+{
+    volatile uint64 ro_volatile_var pg_attribute_unused();
+    ro_volatile_var = *(volatile uint64 *) ptr;
+}
+
+#else
+
#define pg_numa_touch_mem_if_required(ro_volatile_var, ptr) \
ro_volatile_var = *(volatile uint64 *) ptr
+#endif
+
#else
#define pg_numa_touch_mem_if_required(ro_volatile_var, ptr) \
diff --git a/src/tools/valgrind.supp b/src/tools/valgrind.supp
index 7ea464c8094..6b9a8998f82 100644
--- a/src/tools/valgrind.supp
+++ b/src/tools/valgrind.supp
@@ -180,3 +180,14 @@
Memcheck:Cond
fun:PyObject_Realloc
}
+
+# Querying the NUMA node for shared memory requires touching the memory so
+# that it gets allocated in the process. That includes memory backing
+# buffers, which may be marked as noaccess for buffers that are not
+# pinned. Ignore that in all places querying the NUMA status, as we're
+# not really accessing the buffers.
+{
+ pg_numa_touch_mem_if_required
+ Memcheck:Addr8
+ fun:pg_numa_touch_mem_if_required
+}
^ permalink raw reply [nested|flat] 83+ messages in thread
* Re: pgsql: Introduce pg_shmem_allocations_numa view
@ 2025-06-25 06:45 ` Bertrand Drouvot <[email protected]>
1 sibling, 0 replies; 83+ messages in thread
From: Bertrand Drouvot @ 2025-06-25 06:45 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Christoph Berg <[email protected]>; Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
Hi,
On Tue, Jun 24, 2025 at 10:32:25PM +0200, Tomas Vondra wrote:
>
> Attached is a minor tweak of the valgrind suppression rules,
Thanks!
> to add the
> two places touching the memory. I was hoping I could add a single rule
> for pg_numa_touch_mem_if_required, but that does not work - it's a
> macro, not a function. So I had to add one rule for both functions,
> querying the NUMA. That's a bit disappointing, because it means it'll
> hide all other failures (of Memcheck:Addr8 type) in those functions.
>
Shouldn't we add 2 rules for Memcheck:Addr4 too?
> Perhaps it'd be better to turn pg_numa_touch_mem_if_required into a
> proper (inlined) function, at least with USE_VALGRIND defined.
Yeah, I think that's probably better, as it reduces the scope to what we really want.
> Something
> like the v2 patch -
yeah, maybe:
- add a rule for Memcheck:Addr4 too?
- use the same parameter names for the macro and the function?
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 83+ messages in thread
* Re: pgsql: Introduce pg_shmem_allocations_numa view
@ 2025-06-26 06:00 ` Bertrand Drouvot <[email protected]>
2025-06-26 08:53 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Tomas Vondra <[email protected]>
1 sibling, 1 reply; 83+ messages in thread
From: Bertrand Drouvot @ 2025-06-26 06:00 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Christoph Berg <[email protected]>; Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
Hi,
On Tue, Jun 24, 2025 at 10:32:25PM +0200, Tomas Vondra wrote:
> On 6/24/25 17:30, Christoph Berg wrote:
> > Re: Tomas Vondra
> >> If it's a reliable fix, then I guess we can do it like this. But won't
> >> that be a performance penalty on everyone? Or does the system split the
> >> array into 16-element chunks anyway, so this makes no difference?
> >
> > There's still the overhead of the syscall itself. But no idea how
> > costly it is to have this 16-step loop in user or kernel space.
> >
> > We could claim that on 32-bit systems, shared_buffers would be smaller
> > anyway, so there the overhead isn't that big. And the step size should
> > be larger (if at all) on 64-bit.
> >
> >> Anyway, maybe we should start by reporting this to the kernel people. Do
> >> you want me to do that, or shall one of you take care of that? I suppose
> >> that'd be better, as you already wrote a fix / know the code better.
> >
> > Submitted: https://marc.info/?l=linux-mm&m=175077821909222&w=2
> >
>
> Thanks! Now we wait ...
It looks like the bug is "confirmed" and that it will be fixed:
https://marc.info/?l=linux-kernel&m=175088392116841&w=2
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 83+ messages in thread
* Re: pgsql: Introduce pg_shmem_allocations_numa view
@ 2025-06-26 08:53 ` Tomas Vondra <[email protected]>
2025-07-21 20:52 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Christoph Berg <[email protected]>
0 siblings, 1 reply; 83+ messages in thread
From: Tomas Vondra @ 2025-06-26 08:53 UTC (permalink / raw)
To: Bertrand Drouvot <[email protected]>; +Cc: Christoph Berg <[email protected]>; Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
On 6/26/25 08:00, Bertrand Drouvot wrote:
> Hi,
>
> On Tue, Jun 24, 2025 at 10:32:25PM +0200, Tomas Vondra wrote:
>> On 6/24/25 17:30, Christoph Berg wrote:
>>> Re: Tomas Vondra
>>>> If it's a reliable fix, then I guess we can do it like this. But won't
>>>> that be a performance penalty on everyone? Or does the system split the
>>>> array into 16-element chunks anyway, so this makes no difference?
>>>
>>> There's still the overhead of the syscall itself. But no idea how
>>> costly it is to have this 16-step loop in user or kernel space.
>>>
>>> We could claim that on 32-bit systems, shared_buffers would be smaller
>>> anyway, so there the overhead isn't that big. And the step size should
>>> be larger (if at all) on 64-bit.
>>>
>>>> Anyway, maybe we should start by reporting this to the kernel people. Do
>>>> you want me to do that, or shall one of you take care of that? I suppose
>>>> that'd be better, as you already wrote a fix / know the code better.
>>>
>>> Submitted: https://marc.info/?l=linux-mm&m=175077821909222&w=2
>>>
>>
>> Thanks! Now we wait ...
>
> It looks like the bug is "confirmed" and that it will be fixed:
> https://marc.info/?l=linux-kernel&m=175088392116841&w=2
>
Yay! I like how the first response is "you sent the patch wrong" ;-)
cheers
--
Tomas Vondra
^ permalink raw reply [nested|flat] 83+ messages in thread
* Re: pgsql: Introduce pg_shmem_allocations_numa view
@ 2025-07-21 20:52 ` Christoph Berg <[email protected]>
2025-07-22 07:01 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Bertrand Drouvot <[email protected]>
0 siblings, 1 reply; 83+ messages in thread
From: Christoph Berg @ 2025-07-21 20:52 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Bertrand Drouvot <[email protected]>; Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
Re: Tomas Vondra
> >>> Submitted: https://marc.info/?l=linux-mm&m=175077821909222&w=2
> >>>
> >>
> >> Thanks! Now we wait ...
> >
> > It looks like the bug is "confirmed" and that it will be fixed:
> > https://marc.info/?l=linux-kernel&m=175088392116841&w=2
If I'm reading the Linux git log correctly, the fix was merged into
Linux 6.16-rc7. Yay :)
> Yay! I like how the first response is "you sent the patch wrong" ;-)
I would have thought that coming from two major projects that use
email extensively (Debian, PostgreSQL), I would navigate that process
better. But it worked in the end...
Christoph
^ permalink raw reply [nested|flat] 83+ messages in thread
* Re: pgsql: Introduce pg_shmem_allocations_numa view
@ 2025-07-22 07:01 ` Bertrand Drouvot <[email protected]>
0 siblings, 0 replies; 83+ messages in thread
From: Bertrand Drouvot @ 2025-07-22 07:01 UTC (permalink / raw)
To: Christoph Berg <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
Hi,
On Mon, Jul 21, 2025 at 10:52:12PM +0200, Christoph Berg wrote:
> Re: Tomas Vondra
> > >>> Submitted: https://marc.info/?l=linux-mm&m=175077821909222&w=2
> > >>>
> > >>
> > >> Thanks! Now we wait ...
> > >
> > > It looks like that the bug is "confirmed" and that it will be fixed:
> > > https://marc.info/?l=linux-kernel&m=175088392116841&w=2
>
> If I'm reading the Linux git log correctly, the fix was merged into
> Linux 6.16-rc7. Yay :)
Yeah! ;-) https://github.com/torvalds/linux/commit/10d04c26ab2b7
> > Yay! I like how the first response is "you sent the patch wrong" ;-)
>
> I would have thought that coming from two major projects that use
> email extensively (Debian, PostgreSQL), I would navigate that process
> better. But it worked in the end...
Indeed, thanks!
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 83+ messages in thread
* Re: pgsql: Introduce pg_shmem_allocations_numa view
@ 2025-06-25 06:11 ` Bertrand Drouvot <[email protected]>
2 siblings, 0 replies; 83+ messages in thread
From: Bertrand Drouvot @ 2025-06-25 06:11 UTC (permalink / raw)
To: Christoph Berg <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
Hi,
On Tue, Jun 24, 2025 at 05:30:02PM +0200, Christoph Berg wrote:
> Re: Tomas Vondra
> > If it's a reliable fix, then I guess we can do it like this. But won't
> > that be a performance penalty on everyone? Or does the system split the
> > array into 16-element chunks anyway, so this makes no difference?
>
> There's still the overhead of the syscall itself. But no idea how
> costly it is to have this 16-step loop in user or kernel space.
>
> We could claim that on 32-bit systems, shared_buffers would be smaller
> anyway, so there the overhead isn't that big. And the step size should
> be larger (if at all) on 64-bit.
Right, and we already mention in the doc that using those views is "very slow"
or "can take a noticeable amount of time".
> > Anyway, maybe we should start by reporting this to the kernel people. Do
> > you want me to do that, or shall one of you take care of that? I suppose
> > that'd be better, as you already wrote a fix / know the code better.
>
> Submitted: https://marc.info/?l=linux-mm&m=175077821909222&w=2
Thanks! I had in mind to look at how to report such a bug and provide a patch
but you beat me to it.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 83+ messages in thread
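The "16-element chunks" idea discussed above can be sketched roughly as follows. This is only an illustration, not the PostgreSQL patch or the kernel fix: the function name query_pages_chunked, the callback type page_query_fn and the stub are hypothetical names made up for this example. On a real system the callback would presumably wrap move_pages(2) from <numaif.h>, called with nodes = NULL so that it only reports which node each page lives on.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical callback type: fill status[0..count-1] for pages[0..count-1]
 * and return 0 on success, -1 on failure.  A real implementation would
 * likely wrap move_pages(0, count, pages, NULL, status, 0).
 */
typedef int (*page_query_fn) (void **pages, int *status, size_t count);

/*
 * Query 'total' pages in batches of at most 'chunk' entries, so that no
 * single call ever sees more than 'chunk' elements.  Costs one call (one
 * syscall, in the real case) per batch.
 */
int
query_pages_chunked(void **pages, int *status, size_t total,
                    size_t chunk, page_query_fn query)
{
    assert(chunk > 0);

    for (size_t off = 0; off < total; off += chunk)
    {
        size_t n = total - off;

        if (n > chunk)
            n = chunk;          /* last batch may be shorter */

        if (query(pages + off, status + off, n) != 0)
            return -1;          /* give up on the first failing batch */
    }
    return 0;
}

/* Stand-in for the real query, used only to exercise the loop. */
size_t stub_calls = 0;

int
stub_query(void **pages, int *status, size_t count)
{
    (void) pages;
    stub_calls++;
    for (size_t i = 0; i < count; i++)
        status[i] = 0;          /* pretend every page is on node 0 */
    return 0;
}
```

With 40 pages and a chunk size of 16 the callback runs three times (16 + 16 + 8), which is the extra per-call overhead the messages above are weighing against the reliability of the workaround.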
* Re: pgsql: Introduce pg_shmem_allocations_numa view
@ 2025-06-25 07:15 ` Jakub Wartak <[email protected]>
2025-06-25 09:31 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Tomas Vondra <[email protected]>
2 siblings, 1 reply; 83+ messages in thread
From: Jakub Wartak @ 2025-06-25 07:15 UTC (permalink / raw)
To: Christoph Berg <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Bertrand Drouvot <[email protected]>; Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
On Tue, Jun 24, 2025 at 5:30 PM Christoph Berg <[email protected]> wrote:
>
> Re: Tomas Vondra
> > If it's a reliable fix, then I guess we can do it like this. But won't
> > that be a performance penalty on everyone? Or does the system split the
> > array into 16-element chunks anyway, so this makes no difference?
>
> There's still the overhead of the syscall itself. But no idea how
> costly it is to have this 16-step loop in user or kernel space.
>
> We could claim that on 32-bit systems, shared_buffers would be smaller
> anyway, so there the overhead isn't that big. And the step size should
> be larger (if at all) on 64-bit.
>
> > Anyway, maybe we should start by reporting this to the kernel people. Do
> > you want me to do that, or shall one of you take care of that? I suppose
> > that'd be better, as you already wrote a fix / know the code better.
>
> Submitted: https://marc.info/?l=linux-mm&m=175077821909222&w=2
>
Hi all, I'm quite late to the party (just noticed the thread), but
here's some additional context: it technically didn't make any sense to
me to have NUMA on 32-bit due to the small amount of addressable memory
(after all, NUMA is about big iron, probably not even VMs), so in the
first versions of the patchset I excluded 32-bit (and back then for
some reason I couldn't even find libnuma for i386, but Andres pointed
out to me that it exists, so we re-added it, probably just to stay
consistent). The thread has kind of snowballed since then, but I still
believe that NUMA on 32-bit does not make a lot of sense.
Even assuming shm interleaving arrives in some future version,
allocations with small s_b sizes will usually fit in a single NUMA node.
-J.
^ permalink raw reply [nested|flat] 83+ messages in thread
* Re: pgsql: Introduce pg_shmem_allocations_numa view
@ 2025-06-25 09:31 ` Tomas Vondra <[email protected]>
2025-06-25 12:42 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Álvaro Herrera <[email protected]>
0 siblings, 1 reply; 83+ messages in thread
From: Tomas Vondra @ 2025-06-25 09:31 UTC (permalink / raw)
To: Jakub Wartak <[email protected]>; Christoph Berg <[email protected]>; +Cc: Bertrand Drouvot <[email protected]>; Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
On 6/25/25 09:15, Jakub Wartak wrote:
> On Tue, Jun 24, 2025 at 5:30 PM Christoph Berg <[email protected]> wrote:
>>
>> Re: Tomas Vondra
>>> If it's a reliable fix, then I guess we can do it like this. But won't
>>> that be a performance penalty on everyone? Or does the system split the
>>> array into 16-element chunks anyway, so this makes no difference?
>>
>> There's still the overhead of the syscall itself. But no idea how
>> costly it is to have this 16-step loop in user or kernel space.
>>
>> We could claim that on 32-bit systems, shared_buffers would be smaller
>> anyway, so there the overhead isn't that big. And the step size should
>> be larger (if at all) on 64-bit.
>>
>>> Anyway, maybe we should start by reporting this to the kernel people. Do
>>> you want me to do that, or shall one of you take care of that? I suppose
>>> that'd be better, as you already wrote a fix / know the code better.
>>
>> Submitted: https://marc.info/?l=linux-mm&m=175077821909222&w=2
>>
>
> Hi all, I'm quite late to the party (just noticed the thread), but
> here's some additional context: it technically didn't make any sense to
> me to have NUMA on 32-bit due to the small amount of addressable memory
> (after all, NUMA is about big iron, probably not even VMs), so in the
> first versions of the patchset I excluded 32-bit (and back then for
> some reason I couldn't even find libnuma for i386, but Andres pointed
> out to me that it exists, so we re-added it, probably just to stay
> consistent). The thread has kind of snowballed since then, but I still
> believe that NUMA on 32-bit does not make a lot of sense.
>
> Even assuming shm interleaving arrives in some future version,
> allocations with small s_b sizes will usually fit in a single NUMA node.
>
Not sure. I thought NUMA doesn't matter very much on 32-bit systems too,
exactly because those systems tend to use small amounts of memory. But
then while investigating this issue I realized even rpi5 has NUMA, in
fact it has a whopping 8 nodes:
debian@raspberry-32:~ $ numactl --hardware
available: 8 nodes (0-7)
node 0 cpus: 0 1 2 3
node 0 size: 981 MB
node 0 free: 882 MB
node 1 cpus: 0 1 2 3
node 1 size: 1007 MB
node 1 free: 936 MB
node 2 cpus: 0 1 2 3
node 2 size: 1007 MB
node 2 free: 936 MB
node 3 cpus: 0 1 2 3
node 3 size: 943 MB
node 3 free: 873 MB
node 4 cpus: 0 1 2 3
node 4 size: 1007 MB
node 4 free: 936 MB
node 5 cpus: 0 1 2 3
node 5 size: 1007 MB
node 5 free: 935 MB
node 6 cpus: 0 1 2 3
node 6 size: 1007 MB
node 6 free: 936 MB
node 7 cpus: 0 1 2 3
node 7 size: 990 MB
node 7 free: 918 MB
node distances:
node 0 1 2 3 4 5 6 7
0: 10 10 10 10 10 10 10 10
1: 10 10 10 10 10 10 10 10
2: 10 10 10 10 10 10 10 10
3: 10 10 10 10 10 10 10 10
4: 10 10 10 10 10 10 10 10
5: 10 10 10 10 10 10 10 10
6: 10 10 10 10 10 10 10 10
7: 10 10 10 10 10 10 10 10
This is with the 32-bit system (which AFAICS means 64-bit kernel and
32-bit user space). I'm not saying it's a particularly interesting NUMA
system, considering all the costs are 10, and it's not like it's
critical to get the best performance on rpi5. But it's NUMA, and maybe
there are some other (more practical) systems. I find it interesting
mostly for testing purposes.
regards
--
Tomas Vondra
^ permalink raw reply [nested|flat] 83+ messages in thread
* Re: pgsql: Introduce pg_shmem_allocations_numa view
@ 2025-06-25 12:42 ` Álvaro Herrera <[email protected]>
2025-06-25 12:53 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Tomas Vondra <[email protected]>
0 siblings, 1 reply; 83+ messages in thread
From: Álvaro Herrera @ 2025-06-25 12:42 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Jakub Wartak <[email protected]>; Christoph Berg <[email protected]>; Bertrand Drouvot <[email protected]>; Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
On 2025-Jun-25, Tomas Vondra wrote:
> Not sure. I thought NUMA doesn't matter very much on 32-bit systems too,
> exactly because those systems tend to use small amounts of memory. But
> then while investigating this issue I realized even rpi5 has NUMA, in
> fact it has a whopping 8 nodes:
>
> debian@raspberry-32:~ $ numactl --hardware
> available: 8 nodes (0-7)
Interesting. Mine only shows a single node.
alvherre@amras:~ $ uname -a
Linux amras 6.12.25+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.12.25-1+rpt1 (2025-04-30) aarch64 GNU/Linux
alvherre@amras:~ $ sudo numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3
node 0 size: 8051 MB
node 0 free: 202 MB
node distances:
node 0
0: 10
alvherre@amras:~ $ sudo lscpu
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: ARM
Model name: Cortex-A76
Model: 1
Thread(s) per core: 1
Core(s) per cluster: 4
Socket(s): -
Cluster(s): 1
Stepping: r4p1
CPU(s) scaling MHz: 62%
CPU max MHz: 2400.0000
CPU min MHz: 1500.0000
BogoMIPS: 108.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp
asimdhp cpuid asimdrdm lrcpc dcpop asimddp
[...]
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-3
Did you enable something special on it maybe?
... Oh, I found this:
https://www.jeffgeerling.com/blog/2024/numa-emulation-speeds-pi-5-and-other-improvements
Sounds like you have this in your system and I don't in mine.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
^ permalink raw reply [nested|flat] 83+ messages in thread
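The difference between the two machines (8 nodes versus 1) can also be checked without numactl. The following is a minimal sketch, assuming a Linux system that exposes NUMA nodes as nodeN directories under /sys/devices/system/node; the helper names is_node_entry and count_numa_nodes are made up for this example and are not part of libnuma or PostgreSQL.

```c
#include <ctype.h>
#include <dirent.h>
#include <string.h>

/* Return 1 if 'name' looks like "node<digits>" (e.g. "node0"), else 0. */
int
is_node_entry(const char *name)
{
    const char *p;

    if (strncmp(name, "node", 4) != 0)
        return 0;

    p = name + 4;
    if (*p == '\0')
        return 0;               /* bare "node" has no index */

    while (isdigit((unsigned char) *p))
        p++;

    return *p == '\0';          /* reject trailing non-digits */
}

/*
 * Count nodeN entries in 'sysfs_dir', normally "/sys/devices/system/node".
 * Returns -1 if the directory cannot be opened, which may simply mean the
 * kernel was built without NUMA support.
 */
int
count_numa_nodes(const char *sysfs_dir)
{
    DIR *dir = opendir(sysfs_dir);
    struct dirent *de;
    int count = 0;

    if (dir == NULL)
        return -1;

    while ((de = readdir(dir)) != NULL)
    {
        if (is_node_entry(de->d_name))
            count++;
    }

    closedir(dir);
    return count;
}
```

Calling count_numa_nodes("/sys/devices/system/node") should then agree with the "available: N nodes" line that numactl --hardware prints, whether the nodes are physical or emulated.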
* Re: pgsql: Introduce pg_shmem_allocations_numa view
@ 2025-06-25 12:53 ` Tomas Vondra <[email protected]>
0 siblings, 0 replies; 83+ messages in thread
From: Tomas Vondra @ 2025-06-25 12:53 UTC (permalink / raw)
To: Álvaro Herrera <[email protected]>; +Cc: Jakub Wartak <[email protected]>; Christoph Berg <[email protected]>; Bertrand Drouvot <[email protected]>; Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
On 6/25/25 14:42, Álvaro Herrera wrote:
> On 2025-Jun-25, Tomas Vondra wrote:
>
>> Not sure. I thought NUMA doesn't matter very much on 32-bit systems too,
>> exactly because those systems tend to use small amounts of memory. But
>> then while investigating this issue I realized even rpi5 has NUMA, in
>> fact it has a whopping 8 nodes:
>>
>> debian@raspberry-32:~ $ numactl --hardware
>> available: 8 nodes (0-7)
>
> Interesting. Mine only shows a single node.
>
> alvherre@amras:~ $ uname -a
> Linux amras 6.12.25+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.12.25-1+rpt1 (2025-04-30) aarch64 GNU/Linux
> alvherre@amras:~ $ sudo numactl --hardware
> available: 1 nodes (0)
> node 0 cpus: 0 1 2 3
> node 0 size: 8051 MB
> node 0 free: 202 MB
> node distances:
> node 0
> 0: 10
> alvherre@amras:~ $ sudo lscpu
> Architecture: aarch64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 4
> On-line CPU(s) list: 0-3
> Vendor ID: ARM
> Model name: Cortex-A76
> Model: 1
> Thread(s) per core: 1
> Core(s) per cluster: 4
> Socket(s): -
> Cluster(s): 1
> Stepping: r4p1
> CPU(s) scaling MHz: 62%
> CPU max MHz: 2400.0000
> CPU min MHz: 1500.0000
> BogoMIPS: 108.00
> Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp
> asimdhp cpuid asimdrdm lrcpc dcpop asimddp
> [...]
> NUMA:
> NUMA node(s): 1
> NUMA node0 CPU(s): 0-3
>
>
> Did you enable something special on it maybe?
>
> ... Oh, I found this:
> https://www.jeffgeerling.com/blog/2024/numa-emulation-speeds-pi-5-and-other-improvements
> Sounds like you have this in your system and I don't in mine.
>
I don't think I had to enable anything special. On the machine running
32-bit RaspberryPi OS I had to install a newer kernel, but I don't
recall doing anything else. I certainly did not apply any kernel patches
or anything like that.
And it seems one of the rpi machines has exactly the same kernel version:
Linux raspberry-64 6.12.25+rpt-rpi-2712 #1 SMP PREEMPT Debian
1:6.12.25-1+rpt1 (2025-04-30) aarch64 GNU/Linux
So I wonder what's going on, why there's no NUMA on your rpi.
--
Tomas Vondra
Attachments:
[text/x-log] rpi.log (5.8K, 2-rpi.log)
download | inline:
debian@raspberry-32:~ $ uname -a
Linux raspberry-32 6.12.34-v8+ #1889 SMP PREEMPT Mon Jun 23 11:11:06 BST 2025 aarch64 GNU/Linux
debian@raspberry-32:~ $ numactl --hardware
available: 8 nodes (0-7)
node 0 cpus: 0 1 2 3
node 0 size: 981 MB
node 0 free: 881 MB
node 1 cpus: 0 1 2 3
node 1 size: 1007 MB
node 1 free: 935 MB
node 2 cpus: 0 1 2 3
node 2 size: 1007 MB
node 2 free: 936 MB
node 3 cpus: 0 1 2 3
node 3 size: 943 MB
node 3 free: 871 MB
node 4 cpus: 0 1 2 3
node 4 size: 1007 MB
node 4 free: 936 MB
node 5 cpus: 0 1 2 3
node 5 size: 1007 MB
node 5 free: 935 MB
node 6 cpus: 0 1 2 3
node 6 size: 1007 MB
node 6 free: 936 MB
node 7 cpus: 0 1 2 3
node 7 size: 990 MB
node 7 free: 918 MB
node distances:
node 0 1 2 3 4 5 6 7
0: 10 10 10 10 10 10 10 10
1: 10 10 10 10 10 10 10 10
2: 10 10 10 10 10 10 10 10
3: 10 10 10 10 10 10 10 10
4: 10 10 10 10 10 10 10 10
5: 10 10 10 10 10 10 10 10
6: 10 10 10 10 10 10 10 10
7: 10 10 10 10 10 10 10 10
debian@raspberry-32:~ $ lscpu
Architecture: aarch64
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: ARM
Model name: Cortex-A76
Model: 1
Thread(s) per core: 1
Core(s) per cluster: 4
Socket(s): -
Cluster(s): 1
Stepping: r4p1
CPU(s) scaling MHz: 100%
CPU max MHz: 2400.0000
CPU min MHz: 1500.0000
BogoMIPS: 108.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
Caches (sum of all):
L1d: 256 KiB (4 instances)
L1i: 256 KiB (4 instances)
L2: 2 MiB (4 instances)
L3: 2 MiB (1 instance)
NUMA:
NUMA node(s): 8
NUMA node0 CPU(s): 0-3
NUMA node1 CPU(s): 0-3
NUMA node2 CPU(s): 0-3
NUMA node3 CPU(s): 0-3
NUMA node4 CPU(s): 0-3
NUMA node5 CPU(s): 0-3
NUMA node6 CPU(s): 0-3
NUMA node7 CPU(s): 0-3
Vulnerabilities:
Gather data sampling: Not affected
Indirect target selection: Not affected
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Reg file data sampling: Not affected
Retbleed: Not affected
Spec rstack overflow: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; __user pointer sanitization
Spectre v2: Mitigation; CSV2, BHB
Srbds: Not affected
Tsx async abort: Not affected
debian@raspberry-64:~ $ uname -a
Linux raspberry-64 6.12.25+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.12.25-1+rpt1 (2025-04-30) aarch64 GNU/Linux
debian@raspberry-64:~ $ numactl --hardware
available: 8 nodes (0-7)
node 0 cpus: 0 1 2 3
node 0 size: 992 MB
node 0 free: 34 MB
node 1 cpus: 0 1 2 3
node 1 size: 1019 MB
node 1 free: 126 MB
node 2 cpus: 0 1 2 3
node 2 size: 1019 MB
node 2 free: 35 MB
node 3 cpus: 0 1 2 3
node 3 size: 955 MB
node 3 free: 34 MB
node 4 cpus: 0 1 2 3
node 4 size: 1019 MB
node 4 free: 35 MB
node 5 cpus: 0 1 2 3
node 5 size: 1019 MB
node 5 free: 41 MB
node 6 cpus: 0 1 2 3
node 6 size: 1019 MB
node 6 free: 287 MB
node 7 cpus: 0 1 2 3
node 7 size: 1014 MB
node 7 free: 28 MB
node distances:
node 0 1 2 3 4 5 6 7
0: 10 10 10 10 10 10 10 10
1: 10 10 10 10 10 10 10 10
2: 10 10 10 10 10 10 10 10
3: 10 10 10 10 10 10 10 10
4: 10 10 10 10 10 10 10 10
5: 10 10 10 10 10 10 10 10
6: 10 10 10 10 10 10 10 10
7: 10 10 10 10 10 10 10 10
debian@raspberry-64:~ $ lscpu
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: ARM
Model name: Cortex-A76
Model: 1
Thread(s) per core: 1
Core(s) per cluster: 4
Socket(s): -
Cluster(s): 1
Stepping: r4p1
CPU(s) scaling MHz: 100%
CPU max MHz: 2400.0000
CPU min MHz: 1500.0000
BogoMIPS: 108.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
Caches (sum of all):
L1d: 256 KiB (4 instances)
L1i: 256 KiB (4 instances)
L2: 2 MiB (4 instances)
L3: 2 MiB (1 instance)
NUMA:
NUMA node(s): 8
NUMA node0 CPU(s): 0-3
NUMA node1 CPU(s): 0-3
NUMA node2 CPU(s): 0-3
NUMA node3 CPU(s): 0-3
NUMA node4 CPU(s): 0-3
NUMA node5 CPU(s): 0-3
NUMA node6 CPU(s): 0-3
NUMA node7 CPU(s): 0-3
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Reg file data sampling: Not affected
Retbleed: Not affected
Spec rstack overflow: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; __user pointer sanitization
Spectre v2: Mitigation; CSV2, BHB
Srbds: Not affected
Tsx async abort: Not affected
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Bertrand Drouvot @ 2025-06-25 06:05 UTC (permalink / raw)
To: Christoph Berg <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
Hi,
On Tue, Jun 24, 2025 at 04:41:33PM +0200, Christoph Berg wrote:
> Re: Bertrand Drouvot
> > Yes, I think compat_uptr_t usage is missing in do_pages_stat() (while it's used
> > in do_pages_move()).
>
> I was also reading the kernel source around that place but you spotted
> the problem before me. This patch resolves the issue here:
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 8cf0f9c9599..542c81ec3ed 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2444,7 +2444,13 @@ static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages,
> if (copy_to_user(status, chunk_status, chunk_nr * sizeof(*status)))
> break;
>
> - pages += chunk_nr;
> + if (in_compat_syscall()) {
> + compat_uptr_t __user *pages32 = (compat_uptr_t __user *)pages;
> +
> + pages32 += chunk_nr;
> + pages = (const void __user * __user *) pages32;
> + } else
> + pages += chunk_nr;
> status += chunk_nr;
> nr_pages -= chunk_nr;
> }
>
Thanks! Yeah, I had the same kind of patch idea in mind.
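To make the stride problem concrete, here is a minimal user-space sketch (not kernel code; the type and function names are made up for illustration). It assumes the chunk size of 16 mentioned in this thread. A 32-bit caller passes an array of 4-byte entries, but the unpatched code advances a pointer whose element size is 8 bytes, so after one chunk it lands twice as far into the array as intended:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's compat_uptr_t (a 32-bit user pointer). */
typedef uint32_t compat_ptr32;

/*
 * Byte offset reached after skipping chunk_nr entries when the array is
 * (wrongly) walked with the stride of a native 64-bit pointer.
 */
static size_t
offset_native_stride(size_t chunk_nr)
{
    return chunk_nr * sizeof(uint64_t);
}

/* Byte offset reached when the entries are walked as 32-bit pointers. */
static size_t
offset_compat_stride(size_t chunk_nr)
{
    return chunk_nr * sizeof(compat_ptr32);
}

/* Index of the 32-bit entry that a given byte offset actually points at. */
static size_t
compat_index_at(size_t byte_offset)
{
    return byte_offset / sizeof(compat_ptr32);
}
```

With a chunk of 16, the native stride resumes at entry 32 of the caller's array instead of entry 16, so entries 16..31 are never looked at. That would also explain why requests of at most 16 pages are unaffected: the pointer is never advanced.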
> > + #define NUMA_QUERY_CHUNK_SIZE 16 /* has to be <= DO_PAGES_STAT_CHUNK_NR (do_pages_stat())*/
> > +
> > + for (uint64 chunk_start = 0; chunk_start < shm_ent_page_count; chunk_start += NUMA_QUERY_CHUNK_SIZE) {
>
> Perhaps optimize it to this:
>
> #if sizeof(void *) == 4
> #define NUMA_QUERY_CHUNK_SIZE 16 /* has to be <= DO_PAGES_STAT_CHUNK_NR (do_pages_stat())*/
> #else
> #define NUMA_QUERY_CHUNK_SIZE 1024
> #endif
>
> ... or some other bigger number.
I had in mind to split the batch size on the PG side only for 32-bit; what about
the attached?
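As a side note on the quoted snippet: the preprocessor cannot evaluate sizeof, so "#if sizeof(void *) == 4" would not compile as written. A rough standalone sketch of the same idea, assuming SIZE_MAX from <stdint.h> is an acceptable proxy for the pointer width (inside PostgreSQL, SIZEOF_SIZE_T from the configure output serves this purpose); numa_query_chunk_size() is only a hypothetical helper for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Pick the chunk size at compile time. SIZE_MAX is usable in #if
 * expressions, unlike sizeof.
 */
#if SIZE_MAX == 0xFFFFFFFFu
#define NUMA_QUERY_CHUNK_SIZE 16    /* 32-bit build: stay within one kernel chunk */
#else
#define NUMA_QUERY_CHUNK_SIZE 1024  /* 64-bit build: not affected by the bug */
#endif

/* Hypothetical helper so callers (and tests) can read the chosen value. */
static unsigned long
numa_query_chunk_size(void)
{
    return NUMA_QUERY_CHUNK_SIZE;
}
```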
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Christoph Berg @ 2025-06-25 09:00 UTC (permalink / raw)
To: Bertrand Drouvot <[email protected]>; Tomas Vondra <[email protected]>; +Cc: Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
Re: Bertrand Drouvot
> +/*
> + * Work around Linux kernel bug in 32-bit compat mode: do_pages_stat() has
> + * incorrect pointer arithmetic for more than DO_PAGES_STAT_CHUNK_NR pages.
> + */
> +#if SIZEOF_SIZE_T == 4
I was also missing it in my suggested patch draft, but this should
probably include #ifdef __linux__.
Re: Tomas Vondra
> +#ifdef USE_VALGRIND
> +
> +static inline void
> +pg_numa_touch_mem_if_required(uint64 tmp, char *ptr)
Stupid question, if this function gets properly inlined, why not
always use it as there should be no performance difference vs using a
macro?
Christoph
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Tomas Vondra @ 2025-06-25 09:22 UTC (permalink / raw)
To: Christoph Berg <[email protected]>; Bertrand Drouvot <[email protected]>; +Cc: Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
On 6/25/25 11:00, Christoph Berg wrote:
> Re: Bertrand Drouvot
>> +/*
>> + * Work around Linux kernel bug in 32-bit compat mode: do_pages_stat() has
>> + * incorrect pointer arithmetic for more than DO_PAGES_STAT_CHUNK_NR pages.
>> + */
>> +#if SIZEOF_SIZE_T == 4
>
> I was also missing it in my suggested patch draft, but this should
> probably include #ifdef __linux__.
>
>
> Re: Tomas Vondra
>> +#ifdef USE_VALGRIND
>> +
>> +static inline void
>> +pg_numa_touch_mem_if_required(uint64 tmp, char *ptr)
>
> Stupid question, if this function gets properly inlined, why not
> always use it as there should be no performance difference vs using a
> macro?
>
TBH I'm not 100% sure it works correctly, I need to check it actually
touches the memory etc. It's possible it was discussed in one of the
earlier NUMA threads, and there are reasons to do a macro.
I also dislike the ifdefs because it adds subtle differences between the
"normal" code and the code tested with valgrind. So just having the
inlined function would be "nicer".
regards
--
Tomas Vondra
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Bertrand Drouvot @ 2025-06-26 05:28 UTC (permalink / raw)
To: Christoph Berg <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
Hi,
On Wed, Jun 25, 2025 at 11:00:38AM +0200, Christoph Berg wrote:
> Re: Bertrand Drouvot
> > +/*
> > + * Work around Linux kernel bug in 32-bit compat mode: do_pages_stat() has
> > + * incorrect pointer arithmetic for more than DO_PAGES_STAT_CHUNK_NR pages.
> > + */
> > +#if SIZEOF_SIZE_T == 4
>
> I was also missing it in my suggested patch draft, but this should
> probably include #ifdef __linux__.
I'm not sure because the workaround is after this part of the code in pg_numa.c:
"
/*
* At this point we provide support only for Linux thanks to libnuma, but in
* future support for other platforms e.g. Win32 or FreeBSD might be possible
* too. For Win32 NUMA APIs see
* https://learn.microsoft.com/en-us/windows/win32/procthread/numa-support
*/
#ifdef USE_LIBNUMA
"
So I guess that the "#ifdef __linux__" would have to be at a higher level anyway
(should we support NUMA on more than Linux in the future).
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Tomas Vondra @ 2025-06-27 14:52 UTC (permalink / raw)
To: Bertrand Drouvot <[email protected]>; Christoph Berg <[email protected]>; +Cc: Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
Here are three small patches that should handle the issue:
0001 - Adds the batching into pg_numa_query_pages, so that the callers
don't need to do anything.
The batching doesn't seem to cause any performance regression. 32-bit
systems can't use that much memory anyway, and on 64-bit systems the
batch is sufficiently large (1024).
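For readers who want the shape of the batching without reading the diff, here is a simplified standalone sketch. It is not the patch itself: query_fn is a hypothetical callback standing in for numa_move_pages(), BATCH_SIZE is fixed at 16 for illustration, and mock_query() exists only so the loop can be exercised without libnuma.

```c
#include <stddef.h>

#define BATCH_SIZE 16   /* illustrative; the patch uses 16 or 1024 */

/* Hypothetical callback standing in for numa_move_pages(); < 0 means error. */
typedef int (*query_fn) (unsigned long count, void **pages, int *status);

/* Query 'count' pages in slices of at most BATCH_SIZE entries. */
static int
query_in_batches(unsigned long count, void **pages, int *status, query_fn fn)
{
    unsigned long next = 0;

    while (next < count)
    {
        unsigned long n = count - next;
        int           ret;

        if (n > BATCH_SIZE)
            n = BATCH_SIZE;

        /* Each call sees only its own slice of both arrays. */
        ret = fn(n, &pages[next], &status[next]);
        if (ret < 0)
            return ret;     /* bail out on the first failing batch */

        next += n;
    }

    return 0;
}

/* Mock backend: reports every page on node 0 and counts the calls. */
static int mock_calls = 0;

static int
mock_query(unsigned long count, void **pages, int *status)
{
    (void) pages;
    for (unsigned long i = 0; i < count; i++)
        status[i] = 0;
    mock_calls++;
    return 0;
}
```

Querying 40 pages this way results in three calls (16 + 16 + 8), and the caller still sees one filled status array.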
0002 - Silences valgrind about the memory touching. It replaces the
macro with a static inline function, and adds suppressions for both
32-bit and 64-bit. The 32-bit one may be a bit pointless, because on my
rpi5 valgrind produces a bunch of other stuff anyway. But it doesn't
hurt.
The function now looks like this:
    static inline void
    pg_numa_touch_mem_if_required(void *ptr)
    {
        volatile uint64 touch pg_attribute_unused();
        touch = *(volatile uint64 *) ptr;
    }
I did a lot of testing on multiple systems to check replacing the macro
with a static inline function still works - and it seems it does. But if
someone thinks the function won't work, I'd like to know.
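For anyone wanting to poke at this outside the tree, a standalone approximation follows. It assumes plain C99 types instead of the PostgreSQL ones (uint64_t instead of uint64, a void cast instead of pg_attribute_unused()), and touch_page()/touch_range() are hypothetical names. The point it relies on is that a read through a volatile-qualified lvalue counts as an observable access, so the compiler has to emit the load even though the value is never used, inlined or not:

```c
#include <stddef.h>
#include <stdint.h>

/* Read 8 bytes at ptr; the volatile access must not be optimized away. */
static inline void
touch_page(void *ptr)
{
    volatile uint64_t touch;

    touch = *(volatile uint64_t *) ptr;
    (void) touch;   /* silence "set but not used" warnings */
}

/*
 * Touch one location per page in [start, start + len). Assumes start is
 * 8-byte aligned and each touched page has at least 8 readable bytes.
 */
static void
touch_range(char *start, size_t len, size_t page_size)
{
    for (size_t off = 0; off < len; off += page_size)
        touch_page(start + off);
}
```

The touch only reads, so the contents of the memory are left as they were.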
0003 - While working on these patches, it occurred to me we could/should
add CHECK_FOR_INTERRUPTS() into the batch loop. This querying can take
quite a bit of time, so letting people interrupt it seems reasonable.
It wasn't possible with just one call into the kernel, but with the
batching we can add a CFI.
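The idea can be sketched in isolation like this. It is a simplification, not what the patch does internally: CHECK_FOR_INTERRUPTS() is PostgreSQL-specific, so the sketch polls a plain flag and stops early instead. cancel_requested, process_batches() and demo_work() are all hypothetical names.

```c
#include <signal.h>

/* Set asynchronously (e.g. from a signal handler) to request cancellation. */
static volatile sig_atomic_t cancel_requested = 0;

/*
 * Run 'work' over 'total' items in batches. Returns the number of batches
 * completed; stops early once cancellation has been requested.
 */
static unsigned long
process_batches(unsigned long total, unsigned long batch_size,
                void (*work) (unsigned long start, unsigned long n))
{
    unsigned long next = 0;
    unsigned long done = 0;

    while (next < total)
    {
        unsigned long n = total - next;

        if (n > batch_size)
            n = batch_size;

        /* Cancellation point: checked once per batch, before the slow call. */
        if (cancel_requested)
            break;

        work(next, n);
        next += n;
        done++;
    }

    return done;
}

/* Demo callback: pretends a cancel request arrives during the second batch. */
static void
demo_work(unsigned long start, unsigned long n)
{
    (void) n;
    if (start >= 16)
        cancel_requested = 1;
}
```

With a single big call there is no place to put such a check; with batches, the worst-case delay before reacting is the time one batch takes.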
Please, take a look.
regards
--
Tomas Vondra
Attachments:
[text/x-patch] 0001-Add-batching-when-calling-numa_move_pages.patch (3.3K, 2-0001-Add-batching-when-calling-numa_move_pages.patch)
From 3d935f62665a18d96e6bec59cb1f3f7cd7daa068 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <[email protected]>
Date: Fri, 27 Jun 2025 12:43:20 +0200
Subject: [PATCH 1/3] Add batching when calling numa_move_pages
There's a kernel bug in do_pages_stat(), resulting in numa_move_pages()
producing bogus status when querying location of memory pages. The bug
only affects systems combining 64-bit kernel and 32-bit user space. This
may seem uncommon, but we use such systems for building 32-bit Debian
packages (which happens in a 32-bit chroot).
This is a long-standing kernel bug (since 2010), affecting pretty much
all kernels, so it'll take time until all systems get a fixed kernel.
Luckily, we can work around that on our end, by batching the requests
the same way as in do_pages_stat(). On 32-bit systems we use batches of
16 pointers, same as do_pages_stat(). 64-bit systems are not affected,
so we use a much larger batch of 1024.
Reported-by: Christoph Berg <[email protected]>
Author: Christoph Berg <[email protected]>
Author: Bertrand Drouvot <[email protected]>
Discussion: https://postgr.es/m/[email protected]
---
src/include/port/pg_numa.h | 2 +-
src/port/pg_numa.c | 45 +++++++++++++++++++++++++++++++++++++-
2 files changed, 45 insertions(+), 2 deletions(-)
diff --git a/src/include/port/pg_numa.h b/src/include/port/pg_numa.h
index 40f1d324dcf..d707d149a43 100644
--- a/src/include/port/pg_numa.h
+++ b/src/include/port/pg_numa.h
@@ -29,7 +29,7 @@ extern PGDLLIMPORT int pg_numa_get_max_node(void);
#else
-#define pg_numa_touch_mem_if_required(ro_volatile_var, ptr) \
+#define pg_numa_touch_mem_if_required(ptr) \
do {} while(0)
#endif
diff --git a/src/port/pg_numa.c b/src/port/pg_numa.c
index 4b487a2a4e8..54ab9c70d56 100644
--- a/src/port/pg_numa.c
+++ b/src/port/pg_numa.c
@@ -29,6 +29,19 @@
#include <numa.h>
#include <numaif.h>
+/*
+ * numa_move_pages() batch size, has to be <= 16 to work around a kernel bug
+ * in do_pages_stat() (chunked by DO_PAGES_STAT_CHUNK_NR). By using the same
+ * batch size, we make it work even on unfixed kernels.
+ *
+ * 64-bit systems are not affected by the bug, and so use much larger batches.
+ */
+#if SIZEOF_SIZE_T == 4
+#define NUMA_QUERY_BATCH_SIZE 16
+#else
+#define NUMA_QUERY_BATCH_SIZE 1024
+#endif
+
/* libnuma requires initialization as per numa(3) on Linux */
int
pg_numa_init(void)
@@ -46,7 +59,37 @@ pg_numa_init(void)
int
pg_numa_query_pages(int pid, unsigned long count, void **pages, int *status)
{
- return numa_move_pages(pid, count, pages, NULL, status, 0);
+ unsigned long next = 0;
+ int ret = 0;
+
+ /*
+ * Batch pointers passed to numa_move_pages to NUMA_QUERY_BATCH_SIZE
+ * items, to work around a kernel bug in do_pages_stat().
+ */
+ while (next < count)
+ {
+ unsigned long count_batch = Min(count - next,
+ NUMA_QUERY_BATCH_SIZE);
+
+ /*
+ * Bail out if any of the batches errors out (ret<0). We ignore
+ * (ret>0) which is used to return number of nonmigrated pages,
+ * but we're not migrating any pages here.
+ */
+ ret = numa_move_pages(pid, count_batch, &pages[next], NULL, &status[next], 0);
+ if (ret < 0)
+ {
+ /* plain error, return as is */
+ return ret;
+ }
+
+ next += count_batch;
+ }
+
+ /* should have consumed the input array exactly */
+ Assert(next == count);
+
+ return 0;
}
int
--
2.49.0
[text/x-patch] 0002-Silence-valgrind-about-pg_numa_touch_mem_if_required.patch (3.9K, 3-0002-Silence-valgrind-about-pg_numa_touch_mem_if_required.patch)
From 613a85c50a17574fcd34689582ce00c879187463 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <[email protected]>
Date: Fri, 27 Jun 2025 12:47:38 +0200
Subject: [PATCH 2/3] Silence valgrind about pg_numa_touch_mem_if_required
When querying NUMA status of pages in shared memory, we need to touch
the memory first to get valid results. This may trigger valgrind
reports, because some of the memory (e.g. unpinned buffers) may be
marked as noaccess.
Solved by adding a valgrind suppression, to ignore this. An alternative
would be to adjust the access/noaccess status before touching the
memory, but that seems far too invasive - it would require all those
places to have detailed knowledge of what the memory stores.
The pg_numa_touch_mem_if_required() macro is replaced with a plain
function. Macros are invisible to suppressions, so it'd have to suppress
reports for the caller - e.g. pg_get_shmem_allocations_numa(). Which
means we'd suppress reports for the whole function.
Reviewed-by: Christoph Berg <[email protected]>
Reviewed-by: Bertrand Drouvot <[email protected]>
Discussion: https://postgr.es/m/[email protected]
---
contrib/pg_buffercache/pg_buffercache_pages.c | 3 +--
src/backend/storage/ipc/shmem.c | 4 +---
src/include/port/pg_numa.h | 8 ++++++--
src/tools/valgrind.supp | 14 ++++++++++++++
4 files changed, 22 insertions(+), 7 deletions(-)
diff --git a/contrib/pg_buffercache/pg_buffercache_pages.c b/contrib/pg_buffercache/pg_buffercache_pages.c
index 4b007f6e1b0..ae0291e6e96 100644
--- a/contrib/pg_buffercache/pg_buffercache_pages.c
+++ b/contrib/pg_buffercache/pg_buffercache_pages.c
@@ -320,7 +320,6 @@ pg_buffercache_numa_pages(PG_FUNCTION_ARGS)
uint64 os_page_count;
int pages_per_buffer;
int max_entries;
- volatile uint64 touch pg_attribute_unused();
char *startptr,
*endptr;
@@ -375,7 +374,7 @@ pg_buffercache_numa_pages(PG_FUNCTION_ARGS)
/* Only need to touch memory once per backend process lifetime */
if (firstNumaTouch)
- pg_numa_touch_mem_if_required(touch, ptr);
+ pg_numa_touch_mem_if_required(ptr);
}
Assert(idx == os_page_count);
diff --git a/src/backend/storage/ipc/shmem.c b/src/backend/storage/ipc/shmem.c
index c9ae3b45b76..ca3656fc76f 100644
--- a/src/backend/storage/ipc/shmem.c
+++ b/src/backend/storage/ipc/shmem.c
@@ -679,12 +679,10 @@ pg_get_shmem_allocations_numa(PG_FUNCTION_ARGS)
*/
for (i = 0; i < shm_ent_page_count; i++)
{
- volatile uint64 touch pg_attribute_unused();
-
page_ptrs[i] = startptr + (i * os_page_size);
if (firstNumaTouch)
- pg_numa_touch_mem_if_required(touch, page_ptrs[i]);
+ pg_numa_touch_mem_if_required(page_ptrs[i]);
CHECK_FOR_INTERRUPTS();
}
diff --git a/src/include/port/pg_numa.h b/src/include/port/pg_numa.h
index d707d149a43..6c8b7103cc3 100644
--- a/src/include/port/pg_numa.h
+++ b/src/include/port/pg_numa.h
@@ -24,8 +24,12 @@ extern PGDLLIMPORT int pg_numa_get_max_node(void);
* This is required on Linux, before pg_numa_query_pages() as we
* need to page-fault before move_pages(2) syscall returns valid results.
*/
-#define pg_numa_touch_mem_if_required(ro_volatile_var, ptr) \
- ro_volatile_var = *(volatile uint64 *) ptr
+static inline void
+pg_numa_touch_mem_if_required(void *ptr)
+{
+ volatile uint64 touch pg_attribute_unused();
+ touch = *(volatile uint64 *) ptr;
+}
#else
diff --git a/src/tools/valgrind.supp b/src/tools/valgrind.supp
index 7ea464c8094..2ad5b81526d 100644
--- a/src/tools/valgrind.supp
+++ b/src/tools/valgrind.supp
@@ -180,3 +180,17 @@
Memcheck:Cond
fun:PyObject_Realloc
}
+
+# NUMA introspection requires touching memory first, and some of it may
+# be marked as noaccess (e.g. unpinned buffers). So just ignore that.
+{
+ pg_numa_touch_mem_if_required
+ Memcheck:Addr4
+ fun:pg_numa_touch_mem_if_required
+}
+
+{
+ pg_numa_touch_mem_if_required
+ Memcheck:Addr8
+ fun:pg_numa_touch_mem_if_required
+}
--
2.49.0
[text/x-patch] 0003-Add-CHECK_FOR_INTERRUPTS-into-pg_numa_query_pages.patch (1.2K, 4-0003-Add-CHECK_FOR_INTERRUPTS-into-pg_numa_query_pages.patch)
From add3768156d05382b2de1dd64d8c420e8c50f92b Mon Sep 17 00:00:00 2001
From: Tomas Vondra <[email protected]>
Date: Fri, 27 Jun 2025 16:42:43 +0200
Subject: [PATCH 3/3] Add CHECK_FOR_INTERRUPTS into pg_numa_query_pages
Querying the NUMA status can be quite time consuming. Thanks to the
batching, we can do CHECK_FOR_INTERRUPTS(), to allow users to abort
the execution.
Reviewed-by: Christoph Berg <[email protected]>
Reviewed-by: Bertrand Drouvot <[email protected]>
Discussion: https://postgr.es/m/[email protected]
---
src/port/pg_numa.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/port/pg_numa.c b/src/port/pg_numa.c
index 54ab9c70d56..f76876b2906 100644
--- a/src/port/pg_numa.c
+++ b/src/port/pg_numa.c
@@ -16,6 +16,7 @@
#include "c.h"
#include <unistd.h>
+#include "miscadmin.h"
#include "port/pg_numa.h"
/*
@@ -71,6 +72,8 @@ pg_numa_query_pages(int pid, unsigned long count, void **pages, int *status)
unsigned long count_batch = Min(count - next,
NUMA_QUERY_BATCH_SIZE);
+ CHECK_FOR_INTERRUPTS();
+
/*
* Bail out if any of the batches errors out (ret<0). We ignore
* (ret>0) which is used to return number of nonmigrated pages,
--
2.49.0
* Re: pgsql: Introduce pg_shmem_allocations_numa view
2025-04-07 21:18 pgsql: Introduce pg_shmem_allocations_numa view Tomas Vondra <[email protected]>
2025-06-12 21:16 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Christoph Berg <[email protected]>
2025-06-23 14:42 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Christoph Berg <[email protected]>
2025-06-23 14:48 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Christoph Berg <[email protected]>
2025-06-23 15:14 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Andres Freund <[email protected]>
2025-06-23 15:20 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Christoph Berg <[email protected]>
2025-06-23 15:59 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Christoph Berg <[email protected]>
2025-06-23 16:26 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Andres Freund <[email protected]>
2025-06-23 19:57 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Christoph Berg <[email protected]>
2025-06-23 20:10 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Tomas Vondra <[email protected]>
2025-06-23 20:31 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Christoph Berg <[email protected]>
2025-06-23 20:37 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Tomas Vondra <[email protected]>
2025-06-23 20:51 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Christoph Berg <[email protected]>
2025-06-23 21:14 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Tomas Vondra <[email protected]>
2025-06-23 21:25 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Christoph Berg <[email protected]>
2025-06-23 21:47 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Tomas Vondra <[email protected]>
2025-06-24 01:43 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Tomas Vondra <[email protected]>
2025-06-24 08:24 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Bertrand Drouvot <[email protected]>
2025-06-24 09:20 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Tomas Vondra <[email protected]>
2025-06-24 11:10 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Bertrand Drouvot <[email protected]>
2025-06-24 12:33 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Tomas Vondra <[email protected]>
2025-06-24 13:25 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Bertrand Drouvot <[email protected]>
2025-06-24 14:41 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Christoph Berg <[email protected]>
2025-06-25 06:05 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Bertrand Drouvot <[email protected]>
2025-06-25 09:00 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Christoph Berg <[email protected]>
2025-06-26 05:28 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Bertrand Drouvot <[email protected]>
2025-06-27 14:52 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Tomas Vondra <[email protected]>
@ 2025-06-27 17:33 ` Bertrand Drouvot <[email protected]>
2025-06-30 18:56 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Tomas Vondra <[email protected]>
0 siblings, 1 reply; 83+ messages in thread
From: Bertrand Drouvot @ 2025-06-27 17:33 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Christoph Berg <[email protected]>; Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
Hi,
On Fri, Jun 27, 2025 at 04:52:08PM +0200, Tomas Vondra wrote:
> Here are three small patches that should handle the issue
Thanks for the patches!
> 0001 - Adds the batching into pg_numa_query_pages, so that the callers
> don't need to do anything.
>
> The batching doesn't seem to cause any performance regression. 32-bit
> systems can't use that much memory anyway, and on 64-bit systems the
> batch is sufficiently large (1024).
=== 1
-#define pg_numa_touch_mem_if_required(ro_volatile_var, ptr) \
+#define pg_numa_touch_mem_if_required(ptr) \
Looks unrelated, should be in 0002?
=== 2
I thought that it would be better to provide a batch size only in the 32-bit
case (see [1]), but I now think it makes sense to also provide (a larger) one
for non 32-bit (as you did) due to the CFI added in 0003 (as it's also good to
have it for non 32-bit).
> 0002 - Silences valgrind about the memory touching. It replaces the
> macro with a static inline function, and adds suppressions for both
> 32-bit and 64-bit. The 32-bit one may be a bit pointless, because on my
> rpi5 valgrind produces a bunch of other stuff anyway. But it doesn't
> hurt.
>
> The function now looks like this:
>
> static inline void
> pg_numa_touch_mem_if_required(void *ptr)
> {
>     volatile uint64 touch pg_attribute_unused();
>     touch = *(volatile uint64 *) ptr;
> }
>
> I did a lot of testing on multiple systems to check replacing the macro
> with a static inline function still works - and it seems it does. But if
> someone thinks the function won't work, I'd like to know.
LGTM.
> 0003 - While working on these patches, it occurred to me we could/should
> add CHECK_FOR_INTERRUPTS() into the batch loop. This querying can take
> quite a bit of time, so letting people interrupt it seems reasonable.
> It wasn't possible with just one call into the kernel, but with the
> batching we can add a CFI.
Yeah, LGTM.
[1]: https://www.postgresql.org/message-id/aFuRoUieUVh%2BpMfZ%40ip-10-97-1-34.eu-west-3.compute.internal
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Tomas Vondra @ 2025-06-30 18:56 UTC (permalink / raw)
To: Bertrand Drouvot <[email protected]>; +Cc: Christoph Berg <[email protected]>; Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
On 6/27/25 19:33, Bertrand Drouvot wrote:
> Hi,
>
> On Fri, Jun 27, 2025 at 04:52:08PM +0200, Tomas Vondra wrote:
>> Here are three small patches that should handle the issue
>
> Thanks for the patches!
>
>> 0001 - Adds the batching into pg_numa_query_pages, so that the callers
>> don't need to do anything.
>>
>> The batching doesn't seem to cause any performance regression. 32-bit
>> systems can't use that much memory anyway, and on 64-bit systems the
>> batch is sufficiently large (1024).
>
> === 1
>
> -#define pg_numa_touch_mem_if_required(ro_volatile_var, ptr) \
> +#define pg_numa_touch_mem_if_required(ptr) \
>
> Looks unrelated, should be in 0002?
>
Of course, I merged it into the wrong patch. Here's a v2 that fixes
this, and also reworded some of the comments and commit messages a
little bit.
In particular it now uses "chunking" instead of "batching". I believe
batching is "combining multiple requests into a single one", but we're
doing exactly the opposite - splitting a large request into smaller
ones, which is what "chunking" does.
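To make the batching/chunking distinction concrete, here is a minimal standalone sketch of chunking: one large request is split into fixed-size pieces handed to a lower-level call. The names query_chunk(), query_all(), calls_made and CHUNK_SIZE are hypothetical stand-ins for illustration only; the real code calls numa_move_pages() and is in the attached 0001 patch.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK_SIZE 16			/* hypothetical; the patch uses 16 or 1024 */

/* Counts calls into the stand-in below, only to make the splitting visible. */
static int	calls_made = 0;

/*
 * Hypothetical stand-in for numa_move_pages(): reports node 0 for every
 * non-NULL pointer and -1 otherwise.
 */
static int
query_chunk(size_t count, void **pages, int *status)
{
	calls_made++;
	for (size_t i = 0; i < count; i++)
		status[i] = (pages[i] != NULL) ? 0 : -1;
	return 0;
}

/* Split one large request into requests of at most CHUNK_SIZE items. */
static int
query_all(size_t count, void **pages, int *status)
{
	size_t		next = 0;

	while (next < count)
	{
		size_t		n = count - next;
		int			ret;

		if (n > CHUNK_SIZE)
			n = CHUNK_SIZE;

		/* stop at the first chunk that reports an error */
		ret = query_chunk(n, &pages[next], &status[next]);
		if (ret < 0)
			return ret;

		next += n;
	}

	/* the input array should have been consumed exactly */
	assert(next == count);
	return 0;
}
```

With 40 pointers and a chunk size of 16, query_all() issues three lower-level calls (16, 16 and 8 items). Batching would go the other way and merge several small requests into one call.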
> === 2
>
> I thought that it would be better to provide a batch size only in the 32-bit
> case (see [1]), but I now think it makes sense to also provide (a larger) one
> for non 32-bit (as you did) due to the CFI added in 0003 (as it's also good to
> have it for non 32-bit).
>
Agreed, I think the CFI is a good thing to have.
>> 0002 - Silences valgrind about the memory touching. It replaces the
>> macro with a static inline function, and adds suppressions for both
>> 32-bit and 64-bit. The 32-bit one may be a bit pointless, because on my
>> rpi5 valgrind produces a bunch of other stuff anyway. But it doesn't
>> hurt.
>>
>> The function now looks like this:
>>
>> static inline void
>> pg_numa_touch_mem_if_required(void *ptr)
>> {
>>     volatile uint64 touch pg_attribute_unused();
>>     touch = *(volatile uint64 *) ptr;
>> }
>>
>> I did a lot of testing on multiple systems to check replacing the macro
>> with a static inline function still works - and it seems it does. But if
>> someone thinks the function won't work, I'd like to know.
>
> LGTM.
>
>> 0003 - While working on these patches, it occurred to me we could/should
>> add CHECK_FOR_INTERRUPTS() into the batch loop. This querying can take
>> quite a bit of time, so letting people interrupt it seems reasonable.
>> It wasn't possible with just one call into the kernel, but with the
>> batching we can add a CFI.
>
> Yeah, LGTM.
>
Thanks!
I plan to push this tomorrow morning.
--
Tomas Vondra
Attachments:
[text/x-patch] v2-0001-Limit-the-size-of-numa_move_pages-requests.patch (3.8K, 2-v2-0001-Limit-the-size-of-numa_move_pages-requests.patch)
From d5dd0631c5c5233cabe2000f310a3f902230a284 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <[email protected]>
Date: Fri, 27 Jun 2025 12:43:20 +0200
Subject: [PATCH v2 1/3] Limit the size of numa_move_pages requests
There's a kernel bug in do_pages_stat(), affecting systems combining
64-bit kernel and 32-bit user space. The function splits the request
into chunks of 16 pointers, but forgets the pointers are 32-bit when
advancing to the next chunk. Some of the pointers get skipped, and
memory after the array is interpreted as pointers. The result is that
the produced status of memory pages is mostly bogus.
Systems combining 64-bit and 32-bit environments like this might seem
rare, but that's not the case - all 32-bit Debian packages are built in
a 32-bit chroot on a system with 64-bit kernel.
This is a long-standing kernel bug (since 2010), affecting pretty much
all kernels, so it'll take time until all systems get a fixed kernel.
Luckily, we can work around the issue by chunking the requests the same
way do_pages_stat() does, at least on affected systems. We don't know
what kernel a 32-bit build will run on, so all 32-bit builds use chunks
of 16 elements (the largest chunk before hitting the issue).
64-bit builds are not affected by this issue, and so could work without
the chunking. But chunking has other advantages, so we apply chunking
even for 64-bit builds, with chunks of 1024 elements.
Reported-by: Christoph Berg <[email protected]>
Author: Christoph Berg <[email protected]>
Author: Bertrand Drouvot <[email protected]>
Discussion: https://postgr.es/m/[email protected]
---
src/port/pg_numa.c | 50 +++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 49 insertions(+), 1 deletion(-)
diff --git a/src/port/pg_numa.c b/src/port/pg_numa.c
index 4b487a2a4e8..d5935207d0a 100644
--- a/src/port/pg_numa.c
+++ b/src/port/pg_numa.c
@@ -29,6 +29,19 @@
#include <numa.h>
#include <numaif.h>
+/*
+ * numa_move_pages() chunk size, has to be <= 16 to work around a kernel bug
+ * in do_pages_stat() (chunked by DO_PAGES_STAT_CHUNK_NR). By using the same
+ * chunk size, we make it work even on unfixed kernels.
+ *
+ * 64-bit systems are not affected by the bug, and so use much larger chunks.
+ */
+#if SIZEOF_SIZE_T == 4
+#define NUMA_QUERY_CHUNK_SIZE 16
+#else
+#define NUMA_QUERY_CHUNK_SIZE 1024
+#endif
+
/* libnuma requires initialization as per numa(3) on Linux */
int
pg_numa_init(void)
@@ -42,11 +55,46 @@ pg_numa_init(void)
  * We use move_pages(2) syscall here - instead of get_mempolicy(2) - as the
  * first one allows us to batch and query about many memory pages in one single
  * giant system call that is way faster.
+ *
+ * We call numa_move_pages() for smaller chunks of the whole array. The first
+ * reason is to work around a kernel bug, but also to allow interrupting the
+ * query between the calls (for many pointers processing the whole array can
+ * take a lot of time).
  */
 int
 pg_numa_query_pages(int pid, unsigned long count, void **pages, int *status)
 {
-	return numa_move_pages(pid, count, pages, NULL, status, 0);
+	unsigned long next = 0;
+	int			ret = 0;
+
+	/*
+	 * Chunk pointers passed to numa_move_pages to NUMA_QUERY_CHUNK_SIZE
+	 * items, to work around a kernel bug in do_pages_stat().
+	 */
+	while (next < count)
+	{
+		unsigned long count_chunk = Min(count - next,
+										NUMA_QUERY_CHUNK_SIZE);
+
+		/*
+		 * Bail out if any of the chunks errors out (ret<0). We ignore
+		 * (ret>0) which is used to return number of nonmigrated pages,
+		 * but we're not migrating any pages here.
+		 */
+		ret = numa_move_pages(pid, count_chunk, &pages[next], NULL, &status[next], 0);
+		if (ret < 0)
+		{
+			/* plain error, return as is */
+			return ret;
+		}
+
+		next += count_chunk;
+	}
+
+	/* should have consumed the input array exactly */
+	Assert(next == count);
+
+	return 0;
 }
int
--
2.49.0
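The failure mode described in the 0001 commit message can be illustrated with a small standalone model. This is a simplified sketch based only on that description, not kernel code: it tracks which entry of a 32-bit pointer array each status slot would be computed from when the read position advances by a 64-bit stride after each chunk. model_pages_stat(), its stride parameter and KERNEL_CHUNK are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define KERNEL_CHUNK 16			/* chunk size named in the patch comment */

/*
 * For each status slot, record which index of the 32-bit pointer array the
 * modelled loop would read; -1 means "past the end of the array".
 *
 * stride = 1 models a correct advance, stride = 2 models stepping by 64-bit
 * pointers over an array of 32-bit entries.
 */
static void
model_pages_stat(size_t count, int stride, long *source_index)
{
	size_t		done = 0;		/* status slots filled so far */
	size_t		read_pos = 0;	/* read position, in 32-bit entries */

	while (done < count)
	{
		size_t		n = count - done;

		if (n > KERNEL_CHUNK)
			n = KERNEL_CHUNK;

		for (size_t i = 0; i < n; i++)
		{
			size_t		src = read_pos + i;

			source_index[done + i] = (src < count) ? (long) src : -1;
		}

		done += n;
		read_pos += n * (size_t) stride;
	}
}
```

In this model, a 48-entry request with stride 2 fills slots 16..31 from entries 32..47 and slots 32..47 from beyond the array, while a request of at most 16 entries never reaches the faulty advance. That matches the reasoning behind limiting 32-bit builds to chunks of 16.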
[text/x-patch] v2-0002-Silence-valgrind-about-pg_numa_touch_mem_if_requi.patch (4.1K, 3-v2-0002-Silence-valgrind-about-pg_numa_touch_mem_if_requi.patch)
From 9cb095a2061deea7f0a781177f4c89928392d7ce Mon Sep 17 00:00:00 2001
From: Tomas Vondra <[email protected]>
Date: Fri, 27 Jun 2025 12:47:38 +0200
Subject: [PATCH v2 2/3] Silence valgrind about pg_numa_touch_mem_if_required
When querying NUMA status of pages in shared memory, we need to touch
the memory first to get valid results. This may trigger valgrind
reports, because some of the memory (e.g. unpinned buffers) may be
marked as noaccess.
Solved by adding a valgrind suppression. An alternative would be to
adjust the access/noaccess status before touching the memory, but that
seems far too invasive. It would require all those places to have
detailed knowledge of what the shared memory stores.
The pg_numa_touch_mem_if_required() macro is replaced with a function.
Macros are invisible to suppressions, so we'd have to suppress reports
for the caller - e.g. pg_get_shmem_allocations_numa(). So we'd suppress
reports for the whole function, and that seems too heavy-handed. It might
easily hide other valid issues.
Reviewed-by: Christoph Berg <[email protected]>
Reviewed-by: Bertrand Drouvot <[email protected]>
Discussion: https://postgr.es/m/[email protected]
---
contrib/pg_buffercache/pg_buffercache_pages.c | 3 +--
src/backend/storage/ipc/shmem.c | 4 +---
src/include/port/pg_numa.h | 10 +++++++---
src/tools/valgrind.supp | 14 ++++++++++++++
4 files changed, 23 insertions(+), 8 deletions(-)
diff --git a/contrib/pg_buffercache/pg_buffercache_pages.c b/contrib/pg_buffercache/pg_buffercache_pages.c
index 4b007f6e1b0..ae0291e6e96 100644
--- a/contrib/pg_buffercache/pg_buffercache_pages.c
+++ b/contrib/pg_buffercache/pg_buffercache_pages.c
@@ -320,7 +320,6 @@ pg_buffercache_numa_pages(PG_FUNCTION_ARGS)
uint64 os_page_count;
int pages_per_buffer;
int max_entries;
- volatile uint64 touch pg_attribute_unused();
char *startptr,
*endptr;
@@ -375,7 +374,7 @@ pg_buffercache_numa_pages(PG_FUNCTION_ARGS)
/* Only need to touch memory once per backend process lifetime */
if (firstNumaTouch)
- pg_numa_touch_mem_if_required(touch, ptr);
+ pg_numa_touch_mem_if_required(ptr);
}
Assert(idx == os_page_count);
diff --git a/src/backend/storage/ipc/shmem.c b/src/backend/storage/ipc/shmem.c
index c9ae3b45b76..ca3656fc76f 100644
--- a/src/backend/storage/ipc/shmem.c
+++ b/src/backend/storage/ipc/shmem.c
@@ -679,12 +679,10 @@ pg_get_shmem_allocations_numa(PG_FUNCTION_ARGS)
*/
for (i = 0; i < shm_ent_page_count; i++)
{
- volatile uint64 touch pg_attribute_unused();
-
page_ptrs[i] = startptr + (i * os_page_size);
if (firstNumaTouch)
- pg_numa_touch_mem_if_required(touch, page_ptrs[i]);
+ pg_numa_touch_mem_if_required(page_ptrs[i]);
CHECK_FOR_INTERRUPTS();
}
diff --git a/src/include/port/pg_numa.h b/src/include/port/pg_numa.h
index 40f1d324dcf..6c8b7103cc3 100644
--- a/src/include/port/pg_numa.h
+++ b/src/include/port/pg_numa.h
@@ -24,12 +24,16 @@ extern PGDLLIMPORT int pg_numa_get_max_node(void);
  * This is required on Linux, before pg_numa_query_pages() as we
  * need to page-fault before move_pages(2) syscall returns valid results.
  */
-#define pg_numa_touch_mem_if_required(ro_volatile_var, ptr) \
-	ro_volatile_var = *(volatile uint64 *) ptr
+static inline void
+pg_numa_touch_mem_if_required(void *ptr)
+{
+	volatile uint64 touch pg_attribute_unused();
+	touch = *(volatile uint64 *) ptr;
+}
 #else
-#define pg_numa_touch_mem_if_required(ro_volatile_var, ptr) \
+#define pg_numa_touch_mem_if_required(ptr) \
 	do {} while(0)
 #endif
diff --git a/src/tools/valgrind.supp b/src/tools/valgrind.supp
index 7ea464c8094..2ad5b81526d 100644
--- a/src/tools/valgrind.supp
+++ b/src/tools/valgrind.supp
@@ -180,3 +180,17 @@
Memcheck:Cond
fun:PyObject_Realloc
}
+
+# NUMA introspection requires touching memory first, and some of it may
+# be marked as noaccess (e.g. unpinned buffers). So just ignore that.
+{
+ pg_numa_touch_mem_if_required
+ Memcheck:Addr4
+ fun:pg_numa_touch_mem_if_required
+}
+
+{
+ pg_numa_touch_mem_if_required
+ Memcheck:Addr8
+ fun:pg_numa_touch_mem_if_required
+}
--
2.49.0
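For readers unfamiliar with the "touch" idiom used in 0002, the following standalone sketch shows the general technique: a read through a volatile pointer that the compiler may not drop, applied to the first word of each page in a range. It uses plain C types rather than PostgreSQL's uint64 and pg_attribute_unused(); touch_page(), touch_range() and the caller-supplied page_size are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Read one 64-bit word through a volatile pointer so the compiler cannot
 * optimize the load away. ptr must be valid and aligned for uint64_t.
 */
static inline void
touch_page(void *ptr)
{
	volatile uint64_t touch;

	touch = *(volatile uint64_t *) ptr;
	(void) touch;				/* the value itself is intentionally unused */
}

/*
 * Touch the first word of every page in [start, start + len) and return the
 * number of pages touched. page_size is passed in by the caller here; a real
 * caller would ask the operating system for it.
 */
static size_t
touch_range(char *start, size_t len, size_t page_size)
{
	size_t		touched = 0;

	assert(page_size > 0);

	for (size_t off = 0; off < len; off += page_size)
	{
		touch_page(start + off);
		touched++;
	}

	return touched;
}
```

Keeping the load inside a named function, rather than a macro expanded into the caller, is what lets a valgrind suppression match on the function name alone, as the commit message explains.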
[text/x-patch] v2-0003-Add-CHECK_FOR_INTERRUPTS-into-pg_numa_query_pages.patch (1.4K, 4-v2-0003-Add-CHECK_FOR_INTERRUPTS-into-pg_numa_query_pages.patch)
From 690904468235f7093214e1323714d14b8a22a6ca Mon Sep 17 00:00:00 2001
From: Tomas Vondra <[email protected]>
Date: Fri, 27 Jun 2025 16:42:43 +0200
Subject: [PATCH v2 3/3] Add CHECK_FOR_INTERRUPTS into pg_numa_query_pages
Querying the NUMA status can be quite time consuming, especially with
large shared buffers. 8cc139bec34a simply called numa_move_pages(),
which meant we had to wait for the syscall to complete.
But with the chunking, introduced to work around the do_pages_stat()
bug, we can do CHECK_FOR_INTERRUPTS() after each chunk, to allow users
to abort the execution.
Reviewed-by: Christoph Berg <[email protected]>
Reviewed-by: Bertrand Drouvot <[email protected]>
Discussion: https://postgr.es/m/[email protected]
---
src/port/pg_numa.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/port/pg_numa.c b/src/port/pg_numa.c
index d5935207d0a..c65f22020ea 100644
--- a/src/port/pg_numa.c
+++ b/src/port/pg_numa.c
@@ -16,6 +16,7 @@
#include "c.h"
#include <unistd.h>
+#include "miscadmin.h"
#include "port/pg_numa.h"
/*
@@ -76,6 +77,8 @@ pg_numa_query_pages(int pid, unsigned long count, void **pages, int *status)
unsigned long count_chunk = Min(count - next,
NUMA_QUERY_CHUNK_SIZE);
+ CHECK_FOR_INTERRUPTS();
+
/*
* Bail out if any of the chunks errors out (ret<0). We ignore
* (ret>0) which is used to return number of nonmigrated pages,
--
2.49.0
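The idea behind 0003 can be sketched outside PostgreSQL as follows: once the work is split into chunks, a cancellation flag can be polled between chunks. This is only an analogy. CHECK_FOR_INTERRUPTS() in PostgreSQL reports an error rather than returning a count, and the names cancel_requested, handle_sigint(), noop_work() and process_in_chunks() below are hypothetical.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

#define CHUNK_SIZE 1024			/* hypothetical, mirrors the 64-bit chunk size */

/* Set asynchronously by the signal handler, polled between chunks. */
static volatile sig_atomic_t cancel_requested = 0;

static void
handle_sigint(int signo)
{
	(void) signo;
	cancel_requested = 1;
}

/* Placeholder for the per-chunk work (for example one syscall per chunk). */
static void
noop_work(size_t first, size_t n)
{
	(void) first;
	(void) n;
}

/*
 * Process count items in chunks, checking for cancellation before each
 * chunk. Returns the number of items actually processed.
 */
static size_t
process_in_chunks(size_t count, void (*work) (size_t first, size_t n))
{
	size_t		next = 0;

	while (next < count)
	{
		size_t		n = count - next;

		if (n > CHUNK_SIZE)
			n = CHUNK_SIZE;

		/* the only point where a pending cancel request is noticed */
		if (cancel_requested)
			break;

		work(next, n);
		next += n;
	}

	return next;
}
```

With a single giant call there is no such point between chunks, which is why the check only became possible after the chunking in 0001.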
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Bertrand Drouvot @ 2025-07-01 04:06 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Christoph Berg <[email protected]>; Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
Hi,
On Mon, Jun 30, 2025 at 08:56:43PM +0200, Tomas Vondra wrote:
> In particular it now uses "chunking" instead of "batching". I believe
> batching is "combining multiple requests into a single one", but we're
> doing exactly the opposite - splitting a large request into smaller
> ones. Which is what "chunking" does.
I do agree that "chunking" is more appropriate here.
> I plan to push this tomorrow morning.
Thanks!
LGTM, just 2 nits about the commit messages:
For 0001:
Is it worth adding a link to the kernel bug report, or mentioning that it can
be found in the discussion?
For 0003:
"
But with the chunking, introduced to work around the do_pages_stat()
bug"
Do you plan to quote the commit hash that 0001 will get once it is pushed?
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Tomas Vondra @ 2025-07-01 11:03 UTC (permalink / raw)
To: Bertrand Drouvot <[email protected]>; +Cc: Christoph Berg <[email protected]>; Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
On 7/1/25 06:06, Bertrand Drouvot wrote:
> Hi,
>
> On Mon, Jun 30, 2025 at 08:56:43PM +0200, Tomas Vondra wrote:
>> In particular it now uses "chunking" instead of "batching". I believe
>> batching is "combining multiple requests into a single one", but we're
>> doing exactly the opposite - splitting a large request into smaller
>> ones. Which is what "chunking" does.
>
> I do agree that "chunking" is more appropriate here.
>
>> I plan to push this tomorrow morning.
>
> Thanks!
>
> LGTM, just 2 nits about the commit messages:
>
> For 0001:
>
> Is it worth adding a link to the kernel bug report, or mentioning that it can
> be found in the discussion?
>
> For 0003:
>
> "
> But with the chunking, introduced to work around the do_pages_stat()
> bug"
>
> Do you plan to quote the commit hash that 0001 will get once it is pushed?
>
Thanks! Pushed, with both adjustments (link to kernel thread, adding the
commit hash).
But damn it, right after pushing I noticed two typos in the second
commit message :-/
--
Tomas Vondra
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Christoph Berg @ 2025-09-11 11:36 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Bertrand Drouvot <[email protected]>; Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
Re: Tomas Vondra
> Thanks! Pushed, with both adjustments (link to kernel thread, adding the
> commit hash).
The PG18 Debian package is still carrying the contrib complement of
this patch (see attachment).
Should that be addressed before 18.0?
Christoph
Attachments:
[text/plain] move-pages32 (2.3K, 2-move-pages32)
download | inline diff:
Work around a Linux bug in move_pages
In 32-bit mode on 64-bit kernels, move_pages() does not correctly advance to
the next chunk. Work around this by not asking for more than 16 pages at once,
so that the internal loop of move_pages() is not executed more than once.
https://www.postgresql.org/message-id/flat/a3a4fe3d-1a80-4e03-aa8e-150ee15f6c35%40vondra.me#6abe7eaa802b5b07bb70cc3229e63a9f
https://marc.info/?l=linux-mm&m=175077821909222&w=2
--- a/contrib/pg_buffercache/pg_buffercache_pages.c
+++ b/contrib/pg_buffercache/pg_buffercache_pages.c
@@ -390,8 +390,15 @@ pg_buffercache_numa_pages(PG_FUNCTION_AR
memset(os_page_status, 0xff, sizeof(int) * os_page_count);
/* Query NUMA status for all the pointers */
- if (pg_numa_query_pages(0, os_page_count, os_page_ptrs, os_page_status) == -1)
- elog(ERROR, "failed NUMA pages inquiry: %m");
+#define NUMA_QUERY_CHUNK_SIZE 16 /* has to be <= DO_PAGES_STAT_CHUNK_NR (do_pages_stat())*/
+ for (uint64 chunk_start = 0; chunk_start < os_page_count; chunk_start += NUMA_QUERY_CHUNK_SIZE) {
+ uint64 chunk_size = Min(NUMA_QUERY_CHUNK_SIZE, os_page_count - chunk_start);
+
+ if (pg_numa_query_pages(0, chunk_size, &os_page_ptrs[chunk_start],
+ &os_page_status[chunk_start]) == -1)
+ elog(ERROR, "failed NUMA pages inquiry status: %m");
+ }
+#undef NUMA_QUERY_CHUNK_SIZE
/* Initialize the multi-call context, load entries about buffers */
--- a/src/backend/storage/ipc/shmem.c
+++ b/src/backend/storage/ipc/shmem.c
@@ -689,8 +689,15 @@ pg_get_shmem_allocations_numa(PG_FUNCTIO
CHECK_FOR_INTERRUPTS();
}
- if (pg_numa_query_pages(0, shm_ent_page_count, page_ptrs, pages_status) == -1)
- elog(ERROR, "failed NUMA pages inquiry status: %m");
+#define NUMA_QUERY_CHUNK_SIZE 16 /* has to be <= DO_PAGES_STAT_CHUNK_NR (do_pages_stat())*/
+ for (uint64 chunk_start = 0; chunk_start < shm_ent_page_count; chunk_start += NUMA_QUERY_CHUNK_SIZE) {
+ uint64 chunk_size = Min(NUMA_QUERY_CHUNK_SIZE, shm_ent_page_count - chunk_start);
+
+ if (pg_numa_query_pages(0, chunk_size, &page_ptrs[chunk_start],
+ &pages_status[chunk_start]) == -1)
+ elog(ERROR, "failed NUMA pages inquiry status: %m");
+ }
+#undef NUMA_QUERY_CHUNK_SIZE
/* Count number of NUMA nodes used for this shared memory entry */
memset(nodes, 0, sizeof(Size) * (max_nodes + 1));
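The chunking idea in the patch above can be sketched on its own, outside PostgreSQL. This is only an illustration under stated assumptions: page_query_fn, query_pages_chunked() and stub_query() are hypothetical names that do not exist in the tree, and the callback merely stands in for pg_numa_query_pages() / move_pages() so the sketch compiles without libnuma. The limit of 16 is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Chunk limit from the patch: must stay <= DO_PAGES_STAT_CHUNK_NR. */
#define QUERY_CHUNK_SIZE 16

/*
 * Hypothetical callback standing in for the real page-status query.
 * Fills status[0..count-1]; returns 0 on success, -1 on error.
 */
typedef int (*page_query_fn) (unsigned long count, void **pages, int *status);

/* Query "count" pages, never asking for more than one chunk per call. */
static int
query_pages_chunked(page_query_fn query, unsigned long count,
                    void **pages, int *status)
{
    unsigned long next = 0;

    assert(query != NULL);

    while (next < count)
    {
        unsigned long remaining = count - next;
        unsigned long chunk = remaining < QUERY_CHUNK_SIZE ?
            remaining : QUERY_CHUNK_SIZE;

        /* Bail out as soon as any chunk reports an error. */
        if (query(chunk, &pages[next], &status[next]) < 0)
            return -1;

        next += chunk;
    }

    return 0;
}

/* Illustration-only stub: records call sizes and marks every page as node 0. */
static unsigned long stub_calls = 0;
static unsigned long stub_max_count = 0;

static int
stub_query(unsigned long count, void **pages, int *status)
{
    (void) pages;               /* addresses are not inspected by the stub */

    stub_calls++;
    if (count > stub_max_count)
        stub_max_count = count;

    for (unsigned long i = 0; i < count; i++)
        status[i] = 0;

    return 0;
}
```

Passing the pointer and status arrays offset by the same index keeps each result aligned with its page, which is what the &page_ptrs[chunk_start] / &pages_status[chunk_start] pair in the patch does.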
^ permalink raw reply [nested|flat] 83+ messages in thread
* Re: pgsql: Introduce pg_shmem_allocations_numa view
@ 2025-09-11 11:39 ` Christoph Berg <[email protected]>
0 siblings, 0 replies; 83+ messages in thread
From: Christoph Berg @ 2025-09-11 11:39 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Bertrand Drouvot <[email protected]>; Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
Re: To Tomas Vondra
> The PG18 Debian package is still carrying the contrib complement of
> this patch (see attachment).
Ah sorry, I was confused here. I had assumed that the patch was still
required as long as it applied without conflicts, but it only applies
cleanly because the problem was fixed inside pg_numa_query_pages() in git,
while the workaround sits outside it, in the callers.
Christoph
^ permalink raw reply [nested|flat] 83+ messages in thread
* Re: pgsql: Introduce pg_shmem_allocations_numa view
@ 2026-02-12 14:21 ` Heikki Linnakangas <[email protected]>
2026-02-12 16:43 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Álvaro Herrera <[email protected]>
1 sibling, 1 reply; 83+ messages in thread
From: Heikki Linnakangas @ 2026-02-12 14:21 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; Bertrand Drouvot <[email protected]>; +Cc: Christoph Berg <[email protected]>; Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
On 01/07/2025 14:03, Tomas Vondra wrote:
> Thanks! Pushed, with both adjustments (link to kernel thread, adding the
> commit hash).
I just noticed that this (commit bf1119d74a: Add CHECK_FOR_INTERRUPTS
into pg_numa_query_pages) made the function unusable in frontend
programs, because CHECK_FOR_INTERRUPTS is server only. It's not used in
frontend programs today, but it was placed in src/port/ with the idea
that it could be.
That's pretty easy to fix by wrapping it in an "#ifndef FRONTEND" block,
per attached.
- Heikki
Attachments:
[text/x-patch] 0001-Make-pg_numa_query_pages-work-in-frontend-programs.patch (1.1K, 2-0001-Make-pg_numa_query_pages-work-in-frontend-programs.patch)
download | inline diff:
From 92e66ea5f17cacfb3408d2af51b9d24d4413cc9e Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <[email protected]>
Date: Thu, 12 Feb 2026 16:11:44 +0200
Subject: [PATCH 1/1] Make pg_numa_query_pages() work in frontend programs
It's currently only used in the server, but it was placed in src/port
with the idea that it might be useful in client programs too. However,
it will currently fail to link if used in a client program, because
CHECK_FOR_INTERRUPTS() is not usable in client programs. Fix that by
wrapping it in "#ifndef FRONTEND".
---
src/port/pg_numa.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/src/port/pg_numa.c b/src/port/pg_numa.c
index d574a686b42..8954669273a 100644
--- a/src/port/pg_numa.c
+++ b/src/port/pg_numa.c
@@ -87,7 +87,9 @@ pg_numa_query_pages(int pid, unsigned long count, void **pages, int *status)
unsigned long count_chunk = Min(count - next,
NUMA_QUERY_CHUNK_SIZE);
+#ifndef FRONTEND
CHECK_FOR_INTERRUPTS();
+#endif
/*
* Bail out if any of the chunks errors out (ret<0). We ignore (ret>0)
--
2.47.3
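For readers unfamiliar with the convention, the guard in the patch above can be sketched in isolation. This is a rough illustration, not the actual src/port/pg_numa.c code: process_chunks() and interrupt_checks are hypothetical, and CHECK_FOR_INTERRUPTS() is replaced by a counter-bumping stand-in so the sketch compiles without any server headers. Building with -DFRONTEND drops the macro and its call together.

```c
#include <assert.h>

/* Counts how often the stand-in interrupt check ran (illustration only). */
static int interrupt_checks = 0;

#ifndef FRONTEND
/* Stand-in for the server-only macro; the real one lives in miscadmin.h. */
#define CHECK_FOR_INTERRUPTS() ((void) interrupt_checks++)
#endif

/* Walk over nchunks chunks, checking for interrupts only in server builds. */
static int
process_chunks(int nchunks)
{
    int processed = 0;

    assert(nchunks >= 0);

    for (int i = 0; i < nchunks; i++)
    {
#ifndef FRONTEND
        CHECK_FOR_INTERRUPTS();
#endif
        processed++;            /* the per-chunk work would go here */
    }

    return processed;
}
```

Guarding only the call site keeps the loop identical in both builds, so a client program would do the same work and simply lose the ability to be interrupted mid-loop.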
^ permalink raw reply [nested|flat] 83+ messages in thread
* Re: pgsql: Introduce pg_shmem_allocations_numa view
@ 2026-02-12 16:43 ` Álvaro Herrera <[email protected]>
2026-02-12 17:23 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Bertrand Drouvot <[email protected]>
0 siblings, 1 reply; 83+ messages in thread
From: Álvaro Herrera @ 2026-02-12 16:43 UTC (permalink / raw)
To: Heikki Linnakangas <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Bertrand Drouvot <[email protected]>; Christoph Berg <[email protected]>; Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
On 2026-Feb-12, Heikki Linnakangas wrote:
> I just noticed that this (commit bf1119d74a: Add CHECK_FOR_INTERRUPTS into
> pg_numa_query_pages) made the function unusable in frontend programs,
> because CHECK_FOR_INTERRUPTS is server only. It's not used in frontend
> programs today, but it was placed in src/port/ with the idea that it could
> be.
Your patch LGTM.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Ed is the standard text editor."
http://groups.google.com/group/alt.religion.emacs/msg/8d94ddab6a9b0ad3
^ permalink raw reply [nested|flat] 83+ messages in thread
* Re: pgsql: Introduce pg_shmem_allocations_numa view
@ 2026-02-12 17:23 ` Bertrand Drouvot <[email protected]>
2026-02-12 17:42 ` Re: pgsql: Introduce pg_shmem_allocations_numa view Heikki Linnakangas <[email protected]>
0 siblings, 1 reply; 83+ messages in thread
From: Bertrand Drouvot @ 2026-02-12 17:23 UTC (permalink / raw)
To: Álvaro Herrera <[email protected]>; +Cc: Heikki Linnakangas <[email protected]>; Tomas Vondra <[email protected]>; Christoph Berg <[email protected]>; Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
Hi,
On Thu, Feb 12, 2026 at 05:43:47PM +0100, Álvaro Herrera wrote:
> On 2026-Feb-12, Heikki Linnakangas wrote:
>
> > I just noticed that this (commit bf1119d74a: Add CHECK_FOR_INTERRUPTS into
> > pg_numa_query_pages) made the function unusable in frontend programs,
> > because CHECK_FOR_INTERRUPTS is server only.
Good catch! Out of curiosity, how did you find the issue? Were you building a
client tool making use of pg_numa_query_pages()?
> It's not used in frontend
> > programs today, but it was placed in src/port/ with the idea that it could
> > be.
>
> Your patch LGTM.
+1
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 83+ messages in thread
* Re: pgsql: Introduce pg_shmem_allocations_numa view
@ 2026-02-12 17:42 ` Heikki Linnakangas <[email protected]>
0 siblings, 0 replies; 83+ messages in thread
From: Heikki Linnakangas @ 2026-02-12 17:42 UTC (permalink / raw)
To: Bertrand Drouvot <[email protected]>; Álvaro Herrera <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Christoph Berg <[email protected]>; Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
On 12/02/2026 19:23, Bertrand Drouvot wrote:
> Hi,
>
> On Thu, Feb 12, 2026 at 05:43:47PM +0100, Álvaro Herrera wrote:
>> On 2026-Feb-12, Heikki Linnakangas wrote:
>>
>>> I just noticed that this (commit bf1119d74a: Add CHECK_FOR_INTERRUPTS into
>>> pg_numa_query_pages) made the function unusable in frontend programs,
>>> because CHECK_FOR_INTERRUPTS is server only.
>
> Good catch! Out of curiosity how did you find the issue? Were you building a
> client tool making use of pg_numa_query_pages()?
I was working on my "interrupts vs signals" patch, which needed to
change some #includes in pg_numa.c, when I spotted that it already had
that issue.
>> It's not used in frontend
>>> programs today, but it was placed in src/port/ with the idea that it could
>>> be.
>>
>> Your patch LGTM.
>
> +1
Pushed, thanks!
- Heikki
^ permalink raw reply [nested|flat] 83+ messages in thread
* Re: pgsql: Introduce pg_shmem_allocations_numa view
@ 2025-06-24 18:24 ` Christoph Berg <[email protected]>
1 sibling, 0 replies; 83+ messages in thread
From: Christoph Berg @ 2025-06-24 18:24 UTC (permalink / raw)
To: Bertrand Drouvot <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
Re: Bertrand Drouvot
> Yes, something like:
>
> diff --git a/src/backend/storage/ipc/shmem.c b/src/backend/storage/ipc/shmem.c
> index c9ae3b45b76..070ad2f13e7 100644
> --- a/src/backend/storage/ipc/shmem.c
> +++ b/src/backend/storage/ipc/shmem.c
> @@ -689,8 +689,17 @@ pg_get_shmem_allocations_numa(PG_FUNCTION_ARGS)
> CHECK_FOR_INTERRUPTS();
> }
>
> - if (pg_numa_query_pages(0, shm_ent_page_count, page_ptrs, pages_status) == -1)
> - elog(ERROR, "failed NUMA pages inquiry status: %m");
> + #define NUMA_QUERY_CHUNK_SIZE 16 /* has to be <= DO_PAGES_STAT_CHUNK_NR (do_pages_stat())*/
> +
> + for (uint64 chunk_start = 0; chunk_start < shm_ent_page_count; chunk_start += NUMA_QUERY_CHUNK_SIZE) {
> + uint64 chunk_size = Min(NUMA_QUERY_CHUNK_SIZE, shm_ent_page_count - chunk_start);
> +
> + if (pg_numa_query_pages(0, chunk_size, &page_ptrs[chunk_start],
> + &pages_status[chunk_start]) == -1)
> + elog(ERROR, "failed NUMA pages inquiry status: %m");
> + }
> +
> + #undef NUMA_QUERY_CHUNK_SIZE
I uploaded a variant of this patch to Debian and it seems to have fixed the issue:
https://buildd.debian.org/status/package.php?p=postgresql-18&suite=experimental
(No reply from linux-mm yet.)
Christoph
Attachments:
[text/plain] move-pages32 (2.3K, 2-move-pages32)
Work around a Linux bug in move_pages
In 32-bit mode on 64-bit kernels, move_pages() does not correctly advance to
the next chunk. Work around by not asking for more than 16 pages at once so
move_pages() internal loop is not executed more than once.
https://www.postgresql.org/message-id/flat/a3a4fe3d-1a80-4e03-aa8e-150ee15f6c35%40vondra.me#6abe7eaa802b5b07bb70cc3229e63a9f
https://marc.info/?l=linux-mm&m=175077821909222&w=2
--- a/contrib/pg_buffercache/pg_buffercache_pages.c
+++ b/contrib/pg_buffercache/pg_buffercache_pages.c
@@ -390,8 +390,15 @@ pg_buffercache_numa_pages(PG_FUNCTION_AR
memset(os_page_status, 0xff, sizeof(int) * os_page_count);
/* Query NUMA status for all the pointers */
- if (pg_numa_query_pages(0, os_page_count, os_page_ptrs, os_page_status) == -1)
- elog(ERROR, "failed NUMA pages inquiry: %m");
+#define NUMA_QUERY_CHUNK_SIZE 16 /* has to be <= DO_PAGES_STAT_CHUNK_NR (do_pages_stat())*/
+ for (uint64 chunk_start = 0; chunk_start < os_page_count; chunk_start += NUMA_QUERY_CHUNK_SIZE) {
+ uint64 chunk_size = Min(NUMA_QUERY_CHUNK_SIZE, os_page_count - chunk_start);
+
+ if (pg_numa_query_pages(0, chunk_size, &os_page_ptrs[chunk_start],
+ &os_page_status[chunk_start]) == -1)
+ elog(ERROR, "failed NUMA pages inquiry status: %m");
+ }
+#undef NUMA_QUERY_CHUNK_SIZE
/* Initialize the multi-call context, load entries about buffers */
--- a/src/backend/storage/ipc/shmem.c
+++ b/src/backend/storage/ipc/shmem.c
@@ -689,8 +689,15 @@ pg_get_shmem_allocations_numa(PG_FUNCTIO
CHECK_FOR_INTERRUPTS();
}
- if (pg_numa_query_pages(0, shm_ent_page_count, page_ptrs, pages_status) == -1)
- elog(ERROR, "failed NUMA pages inquiry status: %m");
+#define NUMA_QUERY_CHUNK_SIZE 16 /* has to be <= DO_PAGES_STAT_CHUNK_NR (do_pages_stat())*/
+ for (uint64 chunk_start = 0; chunk_start < shm_ent_page_count; chunk_start += NUMA_QUERY_CHUNK_SIZE) {
+ uint64 chunk_size = Min(NUMA_QUERY_CHUNK_SIZE, shm_ent_page_count - chunk_start);
+
+ if (pg_numa_query_pages(0, chunk_size, &page_ptrs[chunk_start],
+ &pages_status[chunk_start]) == -1)
+ elog(ERROR, "failed NUMA pages inquiry status: %m");
+ }
+#undef NUMA_QUERY_CHUNK_SIZE
/* Count number of NUMA nodes used for this shared memory entry */
memset(nodes, 0, sizeof(Size) * (max_nodes + 1));
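The chunking idea in this patch can be sketched in isolation. The following is only an illustration under stated assumptions: `query_pages_chunked`, `page_query_fn`, `stub_query` and `max_count_seen` are hypothetical names, and the stub stands in for pg_numa_query_pages() so the loop can be exercised without libnuma. The only point is that no single call is ever asked for more than 16 pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUERY_CHUNK_SIZE 16		/* mirrors NUMA_QUERY_CHUNK_SIZE in the patch */

/* Hypothetical callback type standing in for pg_numa_query_pages(). */
typedef int (*page_query_fn) (uint64_t count, void **pages, int *status);

/* Largest count ever passed to the stub, kept so a caller can inspect it. */
static uint64_t max_count_seen = 0;

/* Stub query: pretends every page lives on node 0. */
static int
stub_query(uint64_t count, void **pages, int *status)
{
	(void) pages;
	if (count > max_count_seen)
		max_count_seen = count;
	for (uint64_t i = 0; i < count; i++)
		status[i] = 0;
	return 0;
}

/*
 * Query page_count pages in chunks of at most QUERY_CHUNK_SIZE.
 * Returns 0 on success, or -1 as soon as one chunk fails.
 */
static int
query_pages_chunked(page_query_fn query, uint64_t page_count,
					void **pages, int *status)
{
	for (uint64_t start = 0; start < page_count; start += QUERY_CHUNK_SIZE)
	{
		uint64_t	remaining = page_count - start;
		uint64_t	n = remaining < QUERY_CHUNK_SIZE ? remaining : QUERY_CHUNK_SIZE;

		assert(n > 0 && n <= QUERY_CHUNK_SIZE);
		if (query(n, &pages[start], &status[start]) == -1)
			return -1;
	}
	return 0;
}
```

Calling `query_pages_chunked(stub_query, 40, pages, status)` issues three calls (16, 16 and 8 pages), so the kernel-side loop described in the commit message would run at most once per call.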
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Andres Freund @ 2025-06-24 11:10 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Christoph Berg <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
Hi,
On 2025-06-24 03:43:19 +0200, Tomas Vondra wrote:
> FWIW while looking into this, I tried running this under valgrind (on a
> regular 64-bit system, not in the chroot), and I get this report:
>
> ==65065== Invalid read of size 8
> ==65065== at 0x113B0EBE: pg_buffercache_numa_pages
> (pg_buffercache_pages.c:380)
> ==65065== by 0x6B539D: ExecMakeTableFunctionResult (execSRF.c:234)
> ==65065== by 0x6CEB7E: FunctionNext (nodeFunctionscan.c:94)
> ==65065== by 0x6B6ACA: ExecScanFetch (execScan.h:126)
> ==65065== by 0x6B6B31: ExecScanExtended (execScan.h:170)
> ==65065== by 0x6B6C9D: ExecScan (execScan.c:59)
> ==65065== by 0x6CEF0F: ExecFunctionScan (nodeFunctionscan.c:269)
> ==65065== by 0x6B29FA: ExecProcNodeFirst (execProcnode.c:469)
> ==65065== by 0x6A6F56: ExecProcNode (executor.h:313)
> ==65065== by 0x6A9533: ExecutePlan (execMain.c:1679)
> ==65065== by 0x6A7422: standard_ExecutorRun (execMain.c:367)
> ==65065== by 0x6A7330: ExecutorRun (execMain.c:304)
> ==65065== by 0x934EF0: PortalRunSelect (pquery.c:921)
> ==65065== by 0x934BD8: PortalRun (pquery.c:765)
> ==65065== by 0x92E4CD: exec_simple_query (postgres.c:1273)
> ==65065== by 0x93301E: PostgresMain (postgres.c:4766)
> ==65065== by 0x92A88B: BackendMain (backend_startup.c:124)
> ==65065== by 0x85A7C7: postmaster_child_launch (launch_backend.c:290)
> ==65065== by 0x860111: BackendStartup (postmaster.c:3580)
> ==65065== by 0x85DE6F: ServerLoop (postmaster.c:1702)
> ==65065== Address 0x7b6c000 is in a rw- anonymous segment
>
>
> This fails here (on the pg_numa_touch_mem_if_required call):
>
> for (char *ptr = startptr; ptr < endptr; ptr += os_page_size)
> {
> os_page_ptrs[idx++] = ptr;
>
> /* Only need to touch memory once per backend process */
> if (firstNumaTouch)
> pg_numa_touch_mem_if_required(touch, ptr);
> }
That's because we mark unpinned pages as inaccessible / mark them as
accessible when pinning. See logic related to that in PinBuffer():
/*
* Assume that we acquired a buffer pin for the purposes of
* Valgrind buffer client checks (even in !result case) to
* keep things simple. Buffers that are unsafe to access are
* not generally guaranteed to be marked undefined or
* non-accessible in any case.
*/
> The 0x7b6c000 is the very first pointer, and it's the only pointer that
> triggers this warning.
I suspect that that's because valgrind combines different reports or such.
Greetings,
Andres Freund
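For readers following along, the "Invalid read of size 8" is consistent with the touch being an 8-byte volatile read at the start of each OS page. The sketch below assumes exactly that and nothing more; `touch_pages` is a hypothetical helper written for illustration, not the PostgreSQL macro itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: read 8 bytes at the start of every page-sized step
 * in [start, end), so that each page is faulted in. Returns the number of
 * pages touched. Under valgrind, this read is what gets reported when the
 * page has been marked inaccessible.
 */
static size_t
touch_pages(char *start, char *end, size_t page_size)
{
	volatile uint64_t touch;
	size_t		touched = 0;

	assert(page_size >= sizeof(uint64_t));
	for (char *ptr = start; ptr < end; ptr += page_size)
	{
		touch = *(volatile uint64_t *) ptr;	/* the 8-byte read */
		(void) touch;
		touched++;
	}
	return touched;
}
```

With a buffer of four 4096-byte pages obtained from calloc(), `touch_pages(buf, buf + 4 * 4096, 4096)` performs four reads, one per page.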
* Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Tomas Vondra @ 2025-06-24 12:42 UTC (permalink / raw)
To: Andres Freund <[email protected]>; +Cc: Christoph Berg <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
On 6/24/25 13:10, Andres Freund wrote:
> Hi,
>
> On 2025-06-24 03:43:19 +0200, Tomas Vondra wrote:
>> FWIW while looking into this, I tried running this under valgrind (on a
>> regular 64-bit system, not in the chroot), and I get this report:
>>
>> ==65065== Invalid read of size 8
>> ==65065== at 0x113B0EBE: pg_buffercache_numa_pages
>> (pg_buffercache_pages.c:380)
>> ==65065== by 0x6B539D: ExecMakeTableFunctionResult (execSRF.c:234)
>> ==65065== by 0x6CEB7E: FunctionNext (nodeFunctionscan.c:94)
>> ==65065== by 0x6B6ACA: ExecScanFetch (execScan.h:126)
>> ==65065== by 0x6B6B31: ExecScanExtended (execScan.h:170)
>> ==65065== by 0x6B6C9D: ExecScan (execScan.c:59)
>> ==65065== by 0x6CEF0F: ExecFunctionScan (nodeFunctionscan.c:269)
>> ==65065== by 0x6B29FA: ExecProcNodeFirst (execProcnode.c:469)
>> ==65065== by 0x6A6F56: ExecProcNode (executor.h:313)
>> ==65065== by 0x6A9533: ExecutePlan (execMain.c:1679)
>> ==65065== by 0x6A7422: standard_ExecutorRun (execMain.c:367)
>> ==65065== by 0x6A7330: ExecutorRun (execMain.c:304)
>> ==65065== by 0x934EF0: PortalRunSelect (pquery.c:921)
>> ==65065== by 0x934BD8: PortalRun (pquery.c:765)
>> ==65065== by 0x92E4CD: exec_simple_query (postgres.c:1273)
>> ==65065== by 0x93301E: PostgresMain (postgres.c:4766)
>> ==65065== by 0x92A88B: BackendMain (backend_startup.c:124)
>> ==65065== by 0x85A7C7: postmaster_child_launch (launch_backend.c:290)
>> ==65065== by 0x860111: BackendStartup (postmaster.c:3580)
>> ==65065== by 0x85DE6F: ServerLoop (postmaster.c:1702)
>> ==65065== Address 0x7b6c000 is in a rw- anonymous segment
>>
>>
>> This fails here (on the pg_numa_touch_mem_if_required call):
>>
>> for (char *ptr = startptr; ptr < endptr; ptr += os_page_size)
>> {
>> os_page_ptrs[idx++] = ptr;
>>
>> /* Only need to touch memory once per backend process */
>> if (firstNumaTouch)
>> pg_numa_touch_mem_if_required(touch, ptr);
>> }
>
> That's because we mark unpinned pages as inaccessible / mark them as
> accessible when pinning. See logic related to that in PinBuffer():
>
> /*
> * Assume that we acquired a buffer pin for the purposes of
> * Valgrind buffer client checks (even in !result case) to
> * keep things simple. Buffers that are unsafe to access are
> * not generally guaranteed to be marked undefined or
> * non-accessible in any case.
> */
>
>
>> The 0x7b6c000 is the very first pointer, and it's the only pointer that
>> triggers this warning.
>
> I suspect that that's because valgrind combines different reports or such.
>
Thanks. It probably is something like that, although I made sure to not
use any such options when running valgrind (so --error-limit=no). But
maybe there's something else, hiding the reports.
I guess there are two ways to address this: make sure the buffers are
marked as accessible/defined, or add a valgrind suppression. I think the
suppression is the right approach here, otherwise we'd need to worry
about already pinned buffers etc. That seems not great, because the
functions don't even care about buffers right now; they mostly work with
memory pages (especially pg_shmem_allocations_numa).
Barring objections, I'll fix it this way.
regards
--
Tomas Vondra
* failed NUMA pages inquiry status: Operation not permitted
From: Christoph Berg @ 2025-10-16 11:38 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; Jakub Wartak <[email protected]>; +Cc: [email protected]
> src/test/regress/expected/numa.out | 13 +++
> src/test/regress/expected/numa_1.out | 5 +
numa_1.out is catching this error:
ERROR: libnuma initialization failed or NUMA is not supported on this platform
This is what I'm getting when running PG18 in docker on Debian trixie
(libnuma 2.0.19).
However, on older distributions, the error is different:
postgres =# select * from pg_shmem_allocations_numa;
ERROR: XX000: failed NUMA pages inquiry status: Operation not permitted
LOCATION: pg_get_shmem_allocations_numa, shmem.c:691
This makes the numa regression tests fail in Docker on Debian bookworm
(libnuma 2.0.16) and older and all of the Ubuntu LTS releases.
The attached patch makes it accept these errors, but perhaps it would
be better to detect it in pg_numa_available().
Christoph
* Re: failed NUMA pages inquiry status: Operation not permitted
From: Tomas Vondra @ 2025-10-16 14:27 UTC (permalink / raw)
To: Christoph Berg <[email protected]>; Tomas Vondra <[email protected]>; Jakub Wartak <[email protected]>; +Cc: [email protected]
On 10/16/25 13:38, Christoph Berg wrote:
>> src/test/regress/expected/numa.out | 13 +++
>> src/test/regress/expected/numa_1.out | 5 +
>
> numa_1.out is catching this error:
>
> ERROR: libnuma initialization failed or NUMA is not supported on this platform
>
> This is what I'm getting when running PG18 in docker on Debian trixie
> (libnuma 2.0.19).
>
> However, on older distributions, the error is different:
>
> postgres =# select * from pg_shmem_allocations_numa;
> ERROR: XX000: failed NUMA pages inquiry status: Operation not permitted
> LOCATION: pg_get_shmem_allocations_numa, shmem.c:691
>
> This makes the numa regression tests fail in Docker on Debian bookworm
> (libnuma 2.0.16) and older and all of the Ubuntu LTS releases.
>
It's probably more about the kernel version. What kernels are used by
these systems?
> The attached patch makes it accept these errors, but perhaps it would
> be better to detect it in pg_numa_available().
>
Not sure how that would work. It seems this is some sort of permission
check in numa_move_pages, and that's not what pg_numa_available does.
Also, it may depend on the page queried (e.g. whether it's exclusive or
shared by multiple processes).
thanks
--
Tomas Vondra
* Re: failed NUMA pages inquiry status: Operation not permitted
From: Christoph Berg @ 2025-10-16 14:54 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Jakub Wartak <[email protected]>; [email protected]
Re: Tomas Vondra
> It's probably more about the kernel version. What kernels are used by
> these systems?
It's the very same kernel, just different docker containers on the
same system. I did not investigate yet where the problem is coming
from, different libnuma versions seemed like the best bet.
Same (differing) results on both these systems:
Linux turing 6.16.7+deb14-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.16.7-1 (2025-09-11) x86_64 GNU/Linux
Linux jenkins 6.1.0-39-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.148-1 (2025-08-26) x86_64 GNU/Linux
> Not sure how that would work. It seems this is some sort of permission
> check in numa_move_pages, and that's not what pg_numa_available does.
> Also, it may depend on the page queried (e.g. whether it's exclusive or
> shared by multiple processes).
It's probably the lack of some process capability in that environment.
Maybe there is a way to query that, but I don't know much about that
yet.
Christoph
* Re: failed NUMA pages inquiry status: Operation not permitted
From: Christoph Berg @ 2025-10-16 15:06 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Jakub Wartak <[email protected]>; [email protected]
Re: To Tomas Vondra
> It's the very same kernel, just different docker containers on the
> same system. I did not investigate yet where the problem is coming
> from, different libnuma versions seemed like the best bet.
numactl shows the problem already:
Host system:
$ numactl --show
policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
cpubind: 0
nodebind: 0
membind: 0
preferred:
debian:trixie-slim container:
$ numactl --show
physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
No NUMA support available on this system.
debian:bookworm-slim container:
$ numactl --show
get_mempolicy: Operation not permitted
get_mempolicy: Operation not permitted
get_mempolicy: Operation not permitted
get_mempolicy: Operation not permitted
policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
cpubind: 0
nodebind: 0
membind: 0
preferred:
Running with sudo does not change the result.
So maybe all that's needed is a get_mempolicy() call in
pg_numa_available() ?
Christoph
* Re: failed NUMA pages inquiry status: Operation not permitted
From: Christoph Berg @ 2025-10-16 15:08 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Jakub Wartak <[email protected]>; [email protected]
Re: To Tomas Vondra
> So maybe all that's needed is a get_mempolicy() call in
> pg_numa_available() ?
Or perhaps give up on pg_numa_available, and just have two _1.out and
_2.out that just contain the two different error messages, without
trying to catch the problem.
Christoph
* Re: failed NUMA pages inquiry status: Operation not permitted
From: Christoph Berg @ 2025-10-16 15:19 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Jakub Wartak <[email protected]>; [email protected]
> So maybe all that's needed is a get_mempolicy() call in
> pg_numa_available() ?
numactl 2.0.19 --show does this:
    if (numa_available() < 0) {
        show_physcpubind();
        printf("No NUMA support available on this system.\n");
        exit(1);
    }
    int numa_available(void)
    {
        if (get_mempolicy(NULL, NULL, 0, 0, 0) < 0 && (errno == ENOSYS || errno == EPERM))
            return -1;
        return 0;
    }
pg_numa_available is already calling numa_available.
But numactl 2.0.16 has this:
    int numa_available(void)
    {
        if (get_mempolicy(NULL, NULL, 0, 0, 0) < 0 && errno == ENOSYS)
            return -1;
        return 0;
    }
... which does not catch the EPERM ("Operation not permitted") error I am seeing.
So maybe PG should implement numa_available itself like that. (Or
accept the output difference so the regression tests are passing.)
Christoph
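The difference between the two libnuma versions comes down to which errno values of the get_mempolicy() probe are treated as "no NUMA". A minimal sketch of that decision, kept separate from the syscall so it can be checked on any system; `numa_probe_failed` is a hypothetical helper for illustration, not a libnuma or PostgreSQL function.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Decide whether the result of a get_mempolicy() probe means that NUMA
 * should be treated as unavailable. rc is the call's return value, err
 * the errno observed right after it.
 */
static bool
numa_probe_failed(int rc, int err)
{
	assert(rc <= 0);			/* the probe returns 0 or -1 */

	/*
	 * ENOSYS: the kernel does not provide the syscall.
	 * EPERM: the call is denied in this environment, which is the case
	 * the 2.0.16 check quoted above does not cover.
	 */
	return rc < 0 && (err == ENOSYS || err == EPERM);
}
```

With this rule, a probe failing with EPERM is reported as "no NUMA" up front, instead of surfacing later as a failed pages inquiry.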
* Re: failed NUMA pages inquiry status: Operation not permitted
From: Tomas Vondra @ 2025-10-28 15:14 UTC (permalink / raw)
To: Christoph Berg <[email protected]>; +Cc: Jakub Wartak <[email protected]>; [email protected]
On 10/16/25 17:19, Christoph Berg wrote:
>> So maybe all that's needed is a get_mempolicy() call in
>> pg_numa_available() ?
>
> ...
>
> So maybe PG should implement numa_available itself like that. (Or
> accept the output difference so the regression tests are passing.)
>
I'm not sure which of those options is better. I'm a bit worried that
just accepting the alternative output would hide some failures in the
future (although it's a low risk).
So I'm leaning towards adjusting pg_numa_init() to also check EPERM, per
the attached patch. It still calls numa_available(), so that we don't
silently miss future libnuma changes.
Can you check this makes it work inside the docker container?
regards
--
Tomas Vondra
Attachments:
[text/x-patch] 0001-Handle-EPERM-in-pg_numa_init.patch (873B, 2-0001-Handle-EPERM-in-pg_numa_init.patch)
From b5550ae6f5bac3de14a86a0f7677db755b27aa73 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <[email protected]>
Date: Tue, 28 Oct 2025 16:00:07 +0100
Subject: [PATCH] Handle EPERM in pg_numa_init
---
src/port/pg_numa.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/src/port/pg_numa.c b/src/port/pg_numa.c
index 3368a43a338..540ada3f8ef 100644
--- a/src/port/pg_numa.c
+++ b/src/port/pg_numa.c
@@ -47,7 +47,17 @@
int
pg_numa_init(void)
{
- int r = numa_available();
+ int r;
+
+ /*
+ * XXX libnuma versions before 2.0.19 don't handle EPERM by disabling
+ * NUMA, which then leads to unexpected failures later. This affects
+ * containers that disable get_mempolicy by a seccomp profile.
+ */
+ if (get_mempolicy(NULL, NULL, 0, 0, 0) < 0 && (errno == EPERM))
+ r = -1;
+ else
+ r = numa_available();
return r;
}
--
2.51.0
* Re: failed NUMA pages inquiry status: Operation not permitted
From: Christoph Berg @ 2025-11-14 12:52 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Jakub Wartak <[email protected]>; [email protected]
Re: Tomas Vondra
> So I'm leaning to adjust pg_numa_init() to also check EPERM, per the
> attached patch. It still calls numa_available(), so that we don't
> silently miss future libnuma changes.
>
> Can you check this makes it work inside the docker container?
Yes, your patch works. (Sorry, I meant to test earlier, but RL...)
Christoph
* Re: failed NUMA pages inquiry status: Operation not permitted
From: Tomas Vondra @ 2025-11-20 12:53 UTC (permalink / raw)
To: Christoph Berg <[email protected]>; +Cc: Jakub Wartak <[email protected]>; [email protected]
On 11/14/25 13:52, Christoph Berg wrote:
> Re: Tomas Vondra
>> So I'm leaning to adjust pg_numa_init() to also check EPERM, per the
>> attached patch. It still calls numa_available(), so that we don't
>> silently miss future libnuma changes.
>>
>> Can you check this makes it work inside the docker container?
>
> Yes your patch works. (Sorry I meant to test earlier, but RL...)
>
Thanks. I've pushed the fix (and backpatched to 18).
regards
--
Tomas Vondra
* Re: failed NUMA pages inquiry status: Operation not permitted
From: Christoph Berg @ 2025-12-11 12:29 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Jakub Wartak <[email protected]>; [email protected]
Re: Tomas Vondra
> >> So I'm leaning to adjust pg_numa_init() to also check EPERM, per the
> >> attached patch. It still calls numa_available(), so that we don't
> >> silently miss future libnuma changes.
> >>
> >> Can you check this makes it work inside the docker container?
> >
> > Yes your patch works. (Sorry I meant to test earlier, but RL...)
>
> Thanks. I've pushed the fix (and backpatched to 18).
It looks like we are not done here yet :(
postgresql-18 is failing here intermittently with this diff:
12:20:24 --- /build/reproducible-path/postgresql-18-18.1/src/test/regress/expected/numa.out 2025-11-10 21:52:06.000000000 +0000
12:20:24 +++ /build/reproducible-path/postgresql-18-18.1/build/src/test/regress/results/numa.out 2025-12-11 11:20:22.618989603 +0000
12:20:24 @@ -6,8 +6,4 @@
12:20:24 -- switch to superuser
12:20:24 \c -
12:20:24 SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_allocations_numa;
12:20:24 - ok
12:20:24 -----
12:20:24 - t
12:20:24 -(1 row)
12:20:24 -
12:20:24 +ERROR: invalid NUMA node id outside of allowed range [0, 0]: -2
That's REL_18_STABLE @ 580b5c, with the Debian packaging on top.
I've seen it on unstable/amd64, unstable/arm64, and Ubuntu
questing/amd64, where libnuma should take care of this itself, without
the extra patch in PG. There was another case on bullseye/amd64 which
has the old libnuma.
It's been frequent enough that it killed 4 out of the 10 builds
currently visible on
https://jengus.postgresql.org/job/postgresql-18-binaries-snapshot/.
(Though to be fair, only one distribution/arch combination was failing
for each of them.)
There is also one instance of it in
https://jengus.postgresql.org/job/postgresql-19-binaries-snapshot/
I currently have no idea what's happening.
Christoph
* Re: failed NUMA pages inquiry status: Operation not permitted
From: Tomas Vondra @ 2025-12-11 12:46 UTC (permalink / raw)
To: Christoph Berg <[email protected]>; +Cc: Jakub Wartak <[email protected]>; [email protected]
On 12/11/25 13:29, Christoph Berg wrote:
> Re: Tomas Vondra
>>>> So I'm leaning to adjust pg_numa_init() to also check EPERM, per the
>>>> attached patch. It still calls numa_available(), so that we don't
>>>> silently miss future libnuma changes.
>>>>
>>>> Can you check this makes it work inside the docker container?
>>>
>>> Yes your patch works. (Sorry I meant to test earlier, but RL...)
>>
>> Thanks. I've pushed the fix (and backpatched to 18).
>
> It looks like we are not done here yet :(
>
> postgresql-18 is failing here intermittently with this diff:
>
> 12:20:24 --- /build/reproducible-path/postgresql-18-18.1/src/test/regress/expected/numa.out 2025-11-10 21:52:06.000000000 +0000
> 12:20:24 +++ /build/reproducible-path/postgresql-18-18.1/build/src/test/regress/results/numa.out 2025-12-11 11:20:22.618989603 +0000
> 12:20:24 @@ -6,8 +6,4 @@
> 12:20:24 -- switch to superuser
> 12:20:24 \c -
> 12:20:24 SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_allocations_numa;
> 12:20:24 - ok
> 12:20:24 -----
> 12:20:24 - t
> 12:20:24 -(1 row)
> 12:20:24 -
> 12:20:24 +ERROR: invalid NUMA node id outside of allowed range [0, 0]: -2
>
> That's REL_18_STABLE @ 580b5c, with the Debian packaging on top.
>
> I've seen it on unstable/amd64, unstable/arm64, and Ubuntu
> questing/amd64, where libnuma should take care of this itself, without
> the extra patch in PG. There was another case on bullseye/amd64 which
> has the old libnuma.
>
> It's been frequent enough so it killed 4 out of the 10 builds
> currently visible on
> https://jengus.postgresql.org/job/postgresql-18-binaries-snapshot/.
> (Though to be fair, only one distribution/arch combination was failing
> for each of them.)
>
> There is also one instance of it in
> https://jengus.postgresql.org/job/postgresql-19-binaries-snapshot/
>
> I currently have no idea what's happening.
>
Hmmm, strange. -2 is ENOENT, which should mean this:
-ENOENT
The page is not present.
But what does "not present" mean in this context? And why would that be
only intermittent? Presumably this is still running in Docker, so maybe
it's another weird consequence of that?
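
For reference, move_pages(2) reports per-page results through the status
array: a non-negative entry is the NUMA node holding the page, and a
negative entry is a negated errno value. A minimal sketch of decoding
such entries follows. The names PageStatusKind, classify_page_status and
report_page_status are made up for illustration; they are not part of
PostgreSQL or libnuma.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical classification of one move_pages(2) status entry. */
typedef enum
{
    PAGE_ON_NODE,       /* status >= 0: the value is a NUMA node id */
    PAGE_NOT_PRESENT,   /* status == -ENOENT */
    PAGE_OTHER_ERROR    /* any other negated errno */
} PageStatusKind;

PageStatusKind
classify_page_status(int status)
{
    if (status >= 0)
        return PAGE_ON_NODE;
    if (status == -ENOENT)
        return PAGE_NOT_PRESENT;
    return PAGE_OTHER_ERROR;
}

/* Print a one-line description of a status entry to 'out'. */
void
report_page_status(FILE *out, int status)
{
    assert(out != NULL);

    switch (classify_page_status(status))
    {
        case PAGE_ON_NODE:
            fprintf(out, "page is on NUMA node %d\n", status);
            break;
        case PAGE_NOT_PRESENT:
            fprintf(out, "page is not present (-ENOENT)\n");
            break;
        case PAGE_OTHER_ERROR:
            fprintf(out, "page status error: %s\n", strerror(-status));
            break;
    }
}
```

On Linux ENOENT is 2, so the -2 in the error message above falls into
the PAGE_NOT_PRESENT case.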
regards
--
Tomas Vondra
* Re: failed NUMA pages inquiry status: Operation not permitted
From: Christoph Berg @ 2025-12-13 17:36 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Jakub Wartak <[email protected]>; [email protected]
Re: Tomas Vondra
> Hmmm, strange. -2 is ENOENT, which should mean this:
>
> -ENOENT
> The page is not present.
>
> But what does "not present" mean in this context? And why would that be
> only intermittent? Presumably this is still running in Docker, so maybe
> it's another weird consequence of that?
Sorry I forgot to mention that this is now in the normal apt.pg.o
build environment (chroots without any funky permission restrictions).
I have not tried Docker yet.
I think it was not happening before the backport of the Docker fix.
But I have no idea why this should have broken anything, and why it
would only happen like 3% of the time.
Christoph
* Re: failed NUMA pages inquiry status: Operation not permitted
From: Christoph Berg @ 2025-12-16 13:16 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Jakub Wartak <[email protected]>; [email protected]
Re: Tomas Vondra
> Hmmm, strange. -2 is ENOENT, which should mean this:
>
> -ENOENT
> The page is not present.
>
> But what does "not present" mean in this context? And why would that be
> only intermittent? Presumably this is still running in Docker, so maybe
> it's another weird consequence of that?
I've managed to reproduce it once, running this loop on
18-as-of-today. It errored out after a few hundred iterations:
while psql -c 'SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_allocations_numa'; do :; done
2025-12-16 11:49:35.982 UTC [621807] myon@postgres ERROR: invalid NUMA node id outside of allowed range [0, 0]: -2
2025-12-16 11:49:35.982 UTC [621807] myon@postgres STATEMENT: SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_allocations_numa
That was on the apt.pg.o amd64 build machine while a few things were
just building. Maybe ENOENT "The page is not present" means something
was just swapped out because the machine was under heavy load.
I tried reading the kernel source and it sounds related:
* If the source virtual memory range has any unmapped holes, or if
* the destination virtual memory range is not a whole unmapped hole,
* move_pages() will fail respectively with -ENOENT or -EEXIST. This
* provides a very strict behavior to avoid any chance of memory
* corruption going unnoticed if there are userland race conditions.
* Only one thread should resolve the userland page fault at any given
* time for any given faulting address. This means that if two threads
* try to both call move_pages() on the same destination address at the
* same time, the second thread will get an explicit error from this
* command.
...
* The UFFDIO_MOVE_MODE_ALLOW_SRC_HOLES flag can be specified to
* prevent -ENOENT errors to materialize if there are holes in the
* source virtual range that is being remapped. The holes will be
* accounted as successfully remapped in the retval of the
* command. This is mostly useful to remap hugepage naturally aligned
* virtual regions without knowing if there are transparent hugepage
* in the regions or not, but preventing the risk of having to split
* the hugepmd during the remap.
...
ssize_t move_pages(struct userfaultfd_ctx *ctx, unsigned long dst_start,
unsigned long src_start, unsigned long len, __u64 mode)
...
if (!(mode & UFFDIO_MOVE_MODE_ALLOW_SRC_HOLES)) {
err = -ENOENT;
break;
What I don't understand yet is why this move_pages() signature does
not match the one from libnuma and move_pages(2) (note "mode" vs "flags"):
int numa_move_pages(int pid, unsigned long count,
void **pages, const int *nodes, int *status, int flags)
{
return move_pages(pid, count, pages, nodes, status, flags);
}
I guess the answer is somewhere in that gap.
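
The kernel comment and prototype quoted above mention struct
userfaultfd_ctx and UFFDIO_MOVE_MODE_ALLOW_SRC_HOLES, which suggests they
belong to the userfaultfd UFFDIO_MOVE code rather than to the
move_pages(2) system call that numa_move_pages() wraps; the two appear to
share only the name. Per the move_pages(2) man page, passing nodes = NULL
moves nothing and only fills status with each page's node or a negated
errno. A minimal, Linux-only sketch of that query mode, calling the
system call through syscall(2) so that it does not need libnuma;
query_page_node is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Ask the kernel which NUMA node backs the page containing 'addr'.
 * Returns the node id (>= 0), or a negated errno: either the per-page
 * status written by the kernel, or the error of the system call itself.
 */
int
query_page_node(void *addr)
{
    long        pagesize = sysconf(_SC_PAGESIZE);
    void       *page;
    int         status = 0;

    assert(addr != NULL);

    /* move_pages(2) expects page-aligned addresses. */
    page = (void *) ((uintptr_t) addr & ~((uintptr_t) pagesize - 1));

    /* pid 0 = calling process, one page, nodes = NULL selects query mode */
    if (syscall(SYS_move_pages, 0L, 1UL, &page, NULL, &status, 0L) < 0)
        return -errno;

    return status;
}
```

A caller would touch the page first and then treat a negative result,
such as -ENOENT, as "node unknown" rather than as a node id.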
> ERROR: invalid NUMA node id outside of allowed range [0, 0]: -2
Maybe instead of putting sanity checks on what the kernel is
returning, we should just pass that through to the user? (Or perhaps
transform negative numbers to NULL?)
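
A minimal sketch of the second idea: keep the range check for ids above
the highest known node, but turn negative status values into NULL
instead of raising an error. NumaNodeValue and numa_node_from_status are
hypothetical names, not existing PostgreSQL code; in the view, the isnull
flag would presumably end up as the NULL marker of the numa_node column.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical nullable NUMA node value derived from a status entry. */
typedef struct
{
    bool        isnull;     /* true when the kernel reported status < 0 */
    bool        valid;      /* false only for ids above max_node */
    int         node;       /* node id, or -1 when isnull */
} NumaNodeValue;

NumaNodeValue
numa_node_from_status(int status, int max_node)
{
    NumaNodeValue v = {.isnull = false, .valid = true, .node = status};

    assert(max_node >= 0);

    if (status < 0)
    {
        /* negated errno such as -ENOENT: report "unknown", not an error */
        v.isnull = true;
        v.node = -1;
    }
    else if (status > max_node)
    {
        /* still unexpected: the kernel named a node we do not know */
        v.valid = false;
    }

    return v;
}
```

With max_node = 0, as in the error message above, a status of -2 would
yield a NULL node instead of failing the whole query.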
Christoph
* Re: failed NUMA pages inquiry status: Operation not permitted
From: Christoph Berg @ 2025-12-16 14:48 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Jakub Wartak <[email protected]>; [email protected]
Re: To Tomas Vondra
> I've managed to reproduce it once, running this loop on
> 18-as-of-today. It errored out after a few 100 iterations:
>
> while psql -c 'SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_allocations_numa'; do :; done
>
> 2025-12-16 11:49:35.982 UTC [621807] myon@postgres ERROR: invalid NUMA node id outside of allowed range [0, 0]: -2
> 2025-12-16 11:49:35.982 UTC [621807] myon@postgres STATEMENT: SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_allocations_numa
>
> That was on the apt.pg.o amd64 build machine while a few things were
> just building. Maybe ENOENT "The page is not present" means something
> was just swapped out because the machine was under heavy load.
I played a bit more with it.
* It seems to trigger only once for a running cluster. The next one
needs a restart
* If it doesn't trigger within the first 30s, it probably never will
* It seems easier to trigger on a system that is under load (I started
a few pgmodeler compile runs in parallel (C++))
But none of that answers the "why".
Christoph
* Re: failed NUMA pages inquiry status: Operation not permitted
From: Tomas Vondra @ 2025-12-16 15:17 UTC (permalink / raw)
To: Christoph Berg <[email protected]>; +Cc: Jakub Wartak <[email protected]>; [email protected]
On 12/16/25 15:48, Christoph Berg wrote:
> Re: To Tomas Vondra
>> I've managed to reproduce it once, running this loop on
>> 18-as-of-today. It errored out after a few 100 iterations:
>>
>> while psql -c 'SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_allocations_numa'; do :; done
>>
>> 2025-12-16 11:49:35.982 UTC [621807] myon@postgres ERROR: invalid NUMA node id outside of allowed range [0, 0]: -2
>> 2025-12-16 11:49:35.982 UTC [621807] myon@postgres STATEMENT: SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_allocations_numa
>>
>> That was on the apt.pg.o amd64 build machine while a few things were
>> just building. Maybe ENOENT "The page is not present" means something
>> was just swapped out because the machine was under heavy load.
>
> I played a bit more with it.
>
> * It seems to trigger only once for a running cluster. The next one
> needs a restart
> * If it doesn't trigger within the first 30s, it probably never will
> * It seems easier to trigger on a system that is under load (I started
> a few pgmodeler compile runs in parallel (C++))
>
> But none of that answers the "why".
>
Hmmm, so this is interesting. I tried this on my workstation (with a
single NUMA node), and I see this:
1) right after opening a connection, I get this
test=# select numa_node, count(*) from pg_buffercache_numa group by 1;
numa_node | count
-----------+-------
0 | 290
-2 | 32478
(2 rows)
2) but a select from pg_shmem_allocations_numa works fine
test=# select numa_node, count(*) from pg_shmem_allocations_numa group by 1;
numa_node | count
-----------+-------
0 | 72
(1 row)
3) and if I repeat the pg_buffercache_numa query, it now works
test=# select numa_node, count(*) from pg_buffercache_numa group by 1;
numa_node | count
-----------+-------
0 | 32768
(1 row)
That's a bit strange. I have no idea why this is happening. If I
reconnect, I start getting the failures again.
regards
--
Tomas Vondra
* Re: failed NUMA pages inquiry status: Operation not permitted
From: Christoph Berg @ 2025-12-16 17:54 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Jakub Wartak <[email protected]>; [email protected]
Re: Tomas Vondra
> 1) right after opening a connection, I get this
>
> test=# select numa_node, count(*) from pg_buffercache_numa group by 1;
> numa_node | count
> -----------+-------
> 0 | 290
> -2 | 32478
Does that mean that the "touch all pages" logic is missing in some
code paths?
But even with that, it seems to be able to degenerate again, and
accepting -2 in the regression tests would be required to make it
stable.
Christoph
* Re: failed NUMA pages inquiry status: Operation not permitted
From: Tomas Vondra @ 2025-12-17 11:07 UTC (permalink / raw)
To: Christoph Berg <[email protected]>; +Cc: Jakub Wartak <[email protected]>; [email protected]
On 12/16/25 18:54, Christoph Berg wrote:
> Re: Tomas Vondra
>> 1) right after opening a connection, I get this
>>
>> test=# select numa_node, count(*) from pg_buffercache_numa group by 1;
>> numa_node | count
>> -----------+-------
>> 0 | 290
>> -2 | 32478
>
> Does that mean that the "touch all pages" logic is missing in some
> code paths?
>
I did check and AFAICS we are touching the pages in pg_buffercache_numa.
To make it even more confusing, I can no longer reproduce the behavior I
reported yesterday. It just consistently reports "0" and I have no idea
why it changed :-( I did restart since yesterday, so maybe that changed
something.
> But even with that, it seems to be able to degenerate again and
> accepting -2 in the regression tests would be required to make it
> stable.
>
No opinion yet. Either the -2 can happen occasionally, and then we'd
need to adjust the regression tests. Or maybe it's some thinko, and then
it'd be good to figure out why it's happening.
I find it interesting it does not seem to fail on the buildfarm. Or at
least I'm not aware of such failures. Even a rare failure should show
itself on the buildfarm a couple times, so how come it didn't?
regards
--
Tomas Vondra
* Re: failed NUMA pages inquiry status: Operation not permitted
From: Tomas Vondra @ 2026-01-05 21:35 UTC (permalink / raw)
To: Christoph Berg <[email protected]>; +Cc: Jakub Wartak <[email protected]>; [email protected]
On 12/17/25 12:07, Tomas Vondra wrote:
>
>
> On 12/16/25 18:54, Christoph Berg wrote:
>> Re: Tomas Vondra
>>> 1) right after opening a connection, I get this
>>>
>>> test=# select numa_node, count(*) from pg_buffercache_numa group by 1;
>>> numa_node | count
>>> -----------+-------
>>> 0 | 290
>>> -2 | 32478
>>
>> Does that mean that the "touch all pages" logic is missing in some
>> code paths?
>>
>
> I did check and AFAICS we are touching the pages in pg_buffercache_numa.
>
> To make it even more confusing, I can no longer reproduce the behavior I
> reported yesterday. It just consistently reports "0" and I have no idea
> why it changed :-( I did restart since yesterday, so maybe that changed
> something.
>
I kept poking at this, and I managed to reproduce it again. The key
seems to be that the system needs to be under pressure, and then it's
reliably reproducible (at least for me).
What I did is I created two instances - one to keep the system busy, one
for experimentation. The "busy" one is set to use shared_buffers=16GB,
and then runs a read-only pgbench.
pgbench -i -s 4500 test
pgbench -S -j 16 -c 64 -T 600 -P 1 test
The system has 64GB of RAM and 12 cores, so this is a lot of load.
Then, the other instance is set to use shared_buffers=4GB, is started
and immediately queried for NUMA info for buffers (in a loop):
pg_ctl -D data -l pg.log start;
for r in $(seq 1 10); do
psql -p 5001 test -c 'select numa_node, count(*) from
pg_buffercache_numa group by 1';
done;
pg_ctl -D data -l pg.log stop;
And this often fails like this:
----------------------------------------------------------------------
waiting for server to start.... done
server started
 numa_node |  count
-----------+---------
         0 | 1045302
        -2 |    3274
(2 rows)
 numa_node |  count
-----------+---------
         0 | 1048576
(1 row)
 numa_node |  count
-----------+---------
         0 | 1048576
(1 row)
 numa_node |  count
-----------+---------
         0 | 1048576
(1 row)
 numa_node |  count
-----------+---------
         0 | 1048576
(1 row)
 numa_node |  count
-----------+---------
         0 | 1048576
(1 row)
 numa_node |  count
-----------+---------
         0 | 1025321
        -2 |   23255
(2 rows)
 numa_node |  count
-----------+---------
         0 | 1038596
        -2 |    9980
(2 rows)
 numa_node |  count
-----------+---------
         0 | 1048518
        -2 |      58
(2 rows)
 numa_node |  count
-----------+---------
         0 | 1048525
        -2 |      51
(2 rows)
waiting for server to shut down.... done
server stopped
----------------------------------------------------------------------
So, it clearly fails quite often. And it can fail even later, after a
run that returned no "-2" buffers.
Clearly, something behaves differently than we thought. I've only seen
this happen on a system with swap - once I removed it, this behavior
disappeared too. So it seems a page can be moved to swap, in which case
we get -2 for a status.
In hindsight, that's not all that surprising. It's interesting it can
happen even with the "touching", but I guess there's a race condition
and the memory can get paged out before we inspect the status. We're
querying batches of pages, which probably makes the window larger.
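To make the touch-then-query window concrete, here is a minimal sketch of a single-page query on Linux. It is not the PostgreSQL code: the helper name query_page_status is hypothetical, and the sketch assumes the raw move_pages(2) syscall is reachable through syscall(2), so no libnuma is needed. With a NULL node array the kernel only reports where each page currently lives, and a page that is not present (for example because it was swapped out after the touch) comes back as -ENOENT, i.e. -2.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code): report the status of the OS
 * page containing addr, as seen by move_pages(2) in query mode.
 *
 * Returns 0 and fills *status with a node number (>= 0) or a negative errno
 * such as -ENOENT; returns -1 with errno set if the syscall itself fails
 * (for example ENOSYS or EPERM).
 */
int
query_page_status(void *addr, int *status)
{
    uintptr_t   page_size = (uintptr_t) sysconf(_SC_PAGESIZE);
    void       *pages[1];
    long        rc;

    assert(addr != NULL && status != NULL);

    /* Touch the page first, so that it is faulted in at this moment. */
    *(volatile char *) addr = *(volatile char *) addr;

    /* move_pages() expects page-aligned addresses. */
    pages[0] = (void *) ((uintptr_t) addr & ~(page_size - 1));

    /*
     * nodes == NULL means "do not move anything, only report".  Nothing
     * stops the kernel from paging the memory out between the touch above
     * and this call, so -ENOENT remains a possible answer.
     */
    rc = syscall(SYS_move_pages, 0, 1UL, pages, NULL, status, 0);

    return (rc == 0) ? 0 : -1;
}
```

A caller that passes many pages per call does all the touching first and the querying afterwards, which is presumably why batching widens the window.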
FWIW I now realized I don't even need two instances. If I try this on
the "busy" instance, I get the -2 values too. Which I find a bit weird.
Because why should those be paged out?
The question is what to do about this. I don't think we can prevent the
-2 values, and error-ing out does not seem great either (most systems
have swap, so -2 may not be all that rare).
In fact, pg_shmem_allocations_numa probably should not error-out either,
because it's now reliably failing (on the busy instance).
I guess the only solution is to accept -2 as a possible value (unknown
node). But that makes regression testing harder, because it means the
output could change a lot ...
regards
--
Tomas Vondra
* Re: failed NUMA pages inquiry status: Operation not permitted
From: Christoph Berg @ 2026-01-05 22:29 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Jakub Wartak <[email protected]>; [email protected]
Re: Tomas Vondra
> I guess the only solution is to accept -2 as a possible value (unknown
> node). But that makes regression testing harder, because it means the
> output could change a lot ...
Or just not test that, or do something like
select numa_node = -2 or numa_node between 0 and 1000 from pg_shmem_allocations_numa;
Christoph
* Re: failed NUMA pages inquiry status: Operation not permitted
From: Jakub Wartak @ 2026-01-06 13:23 UTC (permalink / raw)
To: Christoph Berg <[email protected]>; +Cc: Tomas Vondra <[email protected]>; [email protected]
On Mon, Jan 5, 2026 at 11:30 PM Christoph Berg <[email protected]> wrote:
>
> Re: Tomas Vondra
> > I guess the only solution is to accept -2 as a possible value (unknown
> > node). But that makes regression testing harder, because it means the
> > output could change a lot ...
Hi Tomas! That's pretty wild, nice find about that swapping s_b thing!
So just to confirm, that was reproduced outside containers/docker,
right?
> Or just not test that, or do something like
>
> select numa_node = -2 or numa_node between 0 and 1000 from pg_shmem_allocations_numa;
Well, with huge pages the memory should not be swappable, so another idea
would be to simply alter the first line of src/test/regress/sql/numa.sql
and sql/pg_buffercache_numa.sql, like below:
- SELECT NOT(pg_numa_available()) AS skip_test \gset
+ SELECT (pg_numa_available() is false OR
+   current_setting('huge_pages_status')::bool is false) as skip_test \gset
(I'm assuming there are buildfarm animals that have huge_pages
enabled; I have no idea how to check that.)
-J.
* Re: failed NUMA pages inquiry status: Operation not permitted
From: Tomas Vondra @ 2026-01-06 15:36 UTC (permalink / raw)
To: Jakub Wartak <[email protected]>; Christoph Berg <[email protected]>; +Cc: [email protected]
On 1/6/26 14:23, Jakub Wartak wrote:
> On Mon, Jan 5, 2026 at 11:30 PM Christoph Berg <[email protected]> wrote:
>>
>> Re: Tomas Vondra
>>> I guess the only solution is to accept -2 as a possible value (unknown
>>> node). But that makes regression testing harder, because it means the
>>> output could change a lot ...
>
> Hi Tomas! That's pretty wild, nice find about that swapping s_b thing!
> So just to confirm, that was reproduced outside containers/docker,
> right?
>
Yes, this is a regular bare-metal Debian system.
>> Or just not test that, or do something like
>>
>> select numa_node = -2 or numa_node between 0 and 1000 from pg_shmem_allocations_numa;
>
> Well, with huge pages the memory should not be swappable, so another idea
> would be to simply alter the first line of src/test/regress/sql/numa.sql
> and sql/pg_buffercache_numa.sql, like below:
> - SELECT NOT(pg_numa_available()) AS skip_test \gset
> + SELECT (pg_numa_available() is false OR
> +   current_setting('huge_pages_status')::bool is false) as skip_test \gset
>
> (I'm assuming there are buildfarm animals that have huge_pages
> enabled; I have no idea how to check that.)
>
Yes, using huge pages makes this go away.
I'm also even more sure it's about swap, because /proc/PID/smaps for
postmaster tracks how much of the mapping is in swap, and with regular
memory pages I get values like this for the main shmem segment:
Swap: 90508 kB
Swap: 275272 kB
Swap: 135020 kB
Swap: 116460 kB
Swap: 102388 kB
Swap: 93832 kB
Swap: 155616 kB
Swap: 165692 kB
These are just values from "grep" while the pgbench is running. The
instance has 16GB shared buffers, so 200MB is close to 1%. Not a huge
part, but still ...
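For reference, 200MB out of 16GB (16384MB) is about 1.2%. The check itself is easy to script. Below is a minimal sketch, assuming smaps-formatted text has already been read into a string; the function name sum_swap_kb is hypothetical. Note that it adds up the "Swap:" field over every mapping in the text, whereas the values above are repeated samples of one mapping, so they should be compared rather than summed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: sum the "Swap:" fields (in kB) found in
 * smaps-formatted text, one "Name:  value kB" field per line.
 */
long
sum_swap_kb(const char *smaps_text)
{
    long        total = 0;
    const char *line = smaps_text;

    assert(smaps_text != NULL);

    while (*line != '\0')
    {
        long        kb;
        const char *nl;

        /* "SwapPss:" is skipped: the literal "Swap:" fails at the colon */
        if (sscanf(line, "Swap: %ld kB", &kb) == 1)
            total += kb;

        /* advance to the start of the next line, if there is one */
        nl = strchr(line, '\n');
        if (nl == NULL)
            break;
        line = nl + 1;
    }

    return total;
}
```

Feeding it the contents of /proc/PID/smaps for the postmaster would give the total swapped-out size across all of its mappings, not only the main shmem segment.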
I've always "known" shared buffers could be swapped out, but I've never
realized it would affect cases like this one.
I'm not a huge fan of fixing just the tests. Sure, the tests will pass,
but what's the point of that if you then can't run this on production
because it also fails (I mean, the pg_shmem_allocations_numa will fail)?
I think it's clear we need to tweak this to handle -2 status. And then
also adjust tests to accept non-deterministic results.
regards
--
Tomas Vondra
* Re: failed NUMA pages inquiry status: Operation not permitted
From: Jakub Wartak @ 2026-01-07 09:01 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Christoph Berg <[email protected]>; [email protected]
Hi Tomas,
On Tue, Jan 6, 2026 at 4:36 PM Tomas Vondra <[email protected]> wrote:
[..]
> I've always "known" shared buffers could be swapped out, but I've never realized it would affect cases like this one.
Same, I'm a little surprised by it, but it makes sense. In my old and
more recent tests I've always reasoned the following way: NUMA (2+
sockets) --> probably a big production system --> huge_pages literally
always enabled to avoid a variety of surprises (locks the region).
Also, this reminds me of our previous discussion about
dividing shm allocations into smaller requests (potentially 4kB shm
regions that are not huge_pages, so in theory swappable) [1].
> I'm not a huge fan of fixing just the tests. Sure, the tests will pass,
> but what's the point of that if you then can't run this on production
> because it also fails (I mean, the pg_shmem_allocations_numa will fail)?
Well, you are probably right.
> I think it's clear we need to tweak this to handle -2 status. And then
> also adjust tests to accept non-deterministic results.
The only remaining question is whether we want to expose it to the user
or not. We could:
a) silently ignore ENOENT in the back branches so that "size" won't
contain it (i.e. just change pg_get_shmem_allocations_numa()). It is
not part of any NUMA node anyway. Maybe we could emit a DEBUG1 message
or add a source code comment noting that we think the page may be
swapped out.
b) not sure if it's a good idea, but in master we could expose it as a
new column "swapped_out_size" (or change the current datatype of the
"numa_node" column from ::integer to something like ::text, to allow
outputting numa_node as an integer but also node="swapped-out"
with the proper size). That sounds like a new minor feature that would
tell users that they have swapped-out shm and really need to
enable huge pages (?)
-J.
[1] - https://www.postgresql.org/message-id/jqg6jd32sw4s6gjkezauer372xrww7xnupvrcsqkegh2uhv6vg%40ppiwoigzz...
* Re: failed NUMA pages inquiry status: Operation not permitted
From: Tomas Vondra @ 2026-01-16 21:29 UTC (permalink / raw)
To: Jakub Wartak <[email protected]>; +Cc: Christoph Berg <[email protected]>; [email protected]
Hi,
Here's WIP fix for the root cause, i.e. handling status -2 in the two
views querying NUMA node for memory pages:
* pg_shmem_allocations_numa
* pg_buffercache_numa
We can't prevent -2 from happening - the kernel can move arbitrary pages
to swap, we have no control over it. So I think we need to handle -2 as
"unknown" node, instead of failing. The patch simply returns NULL
instead of the node, but in principle we might return some other value
(but IMHO we should not return the raw status, the -2 makes no sense in
our context, it's some internal kernel errno).
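Outside of PostgreSQL, the idea of keeping one extra bucket for "unknown node" can be sketched as below. This is only an illustration under assumptions: the function name tally_numa_status and its return convention are hypothetical, and it uses plain size_t counters instead of the backend's types. Valid node numbers are counted per node, -ENOENT (-2) goes to the extra last slot, and any other status is still reported as an error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper: tally per-page status values as reported by
 * move_pages(2) in query mode.
 *
 * counts must have room for max_node + 2 entries: one per node
 * 0..max_node, plus a final slot for pages whose node is unknown
 * (status -ENOENT, e.g. the page was moved to swap).
 *
 * Returns 0 on success, or -1 at the first status that is neither a
 * valid node number nor -ENOENT.
 */
int
tally_numa_status(const int *status, size_t npages, int max_node,
                  size_t *counts)
{
    assert(status != NULL && counts != NULL && max_node >= 0);

    for (int i = 0; i <= max_node + 1; i++)
        counts[i] = 0;

    for (size_t p = 0; p < npages; p++)
    {
        int         s = status[p];

        if (s >= 0 && s <= max_node)
            counts[s]++;                /* valid NUMA node */
        else if (s == -ENOENT)
            counts[max_node + 1]++;     /* unknown node */
        else
            return -1;                  /* unexpected status */
    }

    return 0;
}
```

A caller would then emit one row per node from counts[0..max_node], plus one row with a NULL node for counts[max_node + 1].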
The pg_buffercache_numa was not failing, it just returned the -2 status
verbatim. But I modified it to return NULL, for consistency.
AFAIK this will fix the regression tests too - they only check COUNT(*),
not the actual values.
I'm not sure if we need to mention this in the docs. It probably should
mention the column can be NULL, which means "unknown node".
regards
--
Tomas Vondra
Attachments:
[text/x-patch] 0001-Handle-ENOENT-status-when-querying-NUMA-node.patch (4.8K)
From 8ea0d82a1c72f1fcbf834cfa5a7913fce0778ac8 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <[email protected]>
Date: Fri, 16 Jan 2026 21:55:02 +0100
Subject: [PATCH] Handle ENOENT status when querying NUMA node
We've assumed that touching the memory is sufficient for a page to be
located on one of the NUMA nodes. But that's not quite true, because
a page may be moved to swap after we touch it.
It's not hard to make that happen with commands like CREATE INDEX (which
uses only a small circular buffer in shared buffers, while loading large
amounts of data into page cache). This memory pressure may force a
significant fraction of shared buffers to swap.
We touch the memory before querying the status, but there is no
guarangee it won't be moved to swap in between. We do the touching only
during the first call, so later calls are more likely to be affected.
This only happens with regular memory pages (e.g. 4K). Hugepages cannot
be swapped out under memory pressure.
We can't prevent this - it's up to the kernel to move pages to swap.
Therefore, we have to accept ENOENT (-2) status as a valid result, and
handle it without failing. This patch simply treats -2 as unknown node,
and returns NULL in the two affected views (pg_shmem_allocations_numa
and pg_buffercache_numa).
Reported by Christoph Berg, investigation and fix by me. Backpatch to
18, where the two views were introduced.
Reported-by: Christoph Berg <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 18
---
contrib/pg_buffercache/pg_buffercache_pages.c | 12 +++++--
src/backend/storage/ipc/shmem.c | 32 +++++++++++++++----
2 files changed, 35 insertions(+), 9 deletions(-)
diff --git a/contrib/pg_buffercache/pg_buffercache_pages.c b/contrib/pg_buffercache/pg_buffercache_pages.c
index dcba3fb5473..9ff0eb4b0a0 100644
--- a/contrib/pg_buffercache/pg_buffercache_pages.c
+++ b/contrib/pg_buffercache/pg_buffercache_pages.c
@@ -551,8 +551,16 @@ pg_buffercache_os_pages_internal(FunctionCallInfo fcinfo, bool include_numa)
         if (fctx->include_numa)
         {
-            values[2] = Int32GetDatum(fctx->record[i].numa_node);
-            nulls[2] = false;
+            /* status is valid node number */
+            if (fctx->record[i].numa_node >= 0)
+            {
+                values[2] = Int32GetDatum(fctx->record[i].numa_node);
+                nulls[2] = false;
+            } else {
+                /* some kind of error (e.g. pages moved to swap) */
+                values[2] = (Datum) 0;
+                nulls[2] = true;
+            }
         }
         else
         {
diff --git a/src/backend/storage/ipc/shmem.c b/src/backend/storage/ipc/shmem.c
index d2f4710f141..1b536363152 100644
--- a/src/backend/storage/ipc/shmem.c
+++ b/src/backend/storage/ipc/shmem.c
@@ -599,7 +599,7 @@ pg_get_shmem_allocations_numa(PG_FUNCTION_ARGS)
     InitMaterializedSRF(fcinfo, 0);
     max_nodes = pg_numa_get_max_node();
-    nodes = palloc_array(Size, max_nodes + 1);
+    nodes = palloc_array(Size, max_nodes + 2);
     /*
      * Shared memory allocations can vary in size and may not align with OS
@@ -635,7 +635,6 @@ pg_get_shmem_allocations_numa(PG_FUNCTION_ARGS)
     hash_seq_init(&hstat, ShmemIndex);
     /* output all allocated entries */
-    memset(nulls, 0, sizeof(nulls));
     while ((ent = (ShmemIndexEnt *) hash_seq_search(&hstat)) != NULL)
     {
         int i;
@@ -684,22 +683,33 @@ pg_get_shmem_allocations_numa(PG_FUNCTION_ARGS)
             elog(ERROR, "failed NUMA pages inquiry status: %m");
         /* Count number of NUMA nodes used for this shared memory entry */
-        memset(nodes, 0, sizeof(Size) * (max_nodes + 1));
+        memset(nodes, 0, sizeof(Size) * (max_nodes + 2));
         for (i = 0; i < shm_ent_page_count; i++)
         {
             int s = pages_status[i];
             /* Ensure we are adding only valid index to the array */
-            if (s < 0 || s > max_nodes)
+            if (s >= 0 && s <= max_nodes)
+            {
+                /* valid NUMA node */
+                nodes[s]++;
+                continue;
+            }
+            else if (s == -2)
             {
-                elog(ERROR, "invalid NUMA node id outside of allowed range "
-                     "[0, " UINT64_FORMAT "]: %d", max_nodes, s);
+                /* -2 means ENOENT (e.g. page was moved to swap) */
+                nodes[max_nodes + 1]++;
+                continue;
             }
-            nodes[s]++;
+            elog(ERROR, "invalid NUMA node id outside of allowed range "
+                 "[0, " UINT64_FORMAT "]: %d", max_nodes, s);
         }
+        /* no NULLs for regular nodes */
+        memset(nulls, 0, sizeof(nulls));
+
         /*
          * Add one entry for each NUMA node, including those without allocated
          * memory for this segment.
@@ -713,6 +723,14 @@ pg_get_shmem_allocations_numa(PG_FUNCTION_ARGS)
             tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
                                  values, nulls);
         }
+
+        /* The last entry is used for pages without a NUMA node. */
+        nulls[1] = true;
+        values[0] = CStringGetTextDatum(ent->key);
+        values[2] = Int64GetDatum(nodes[max_nodes + 1] * os_page_size);
+
+        tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+                             values, nulls);
     }
     LWLockRelease(ShmemIndexLock);
--
2.52.0
* Re: failed NUMA pages inquiry status: Operation not permitted
From: Christoph Berg @ 2026-01-17 00:31 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Jakub Wartak <[email protected]>; [email protected]
Re: Tomas Vondra
> Here's WIP fix for the root cause, i.e. handling status -2 in the two
> views querying NUMA node for memory pages:
Thanks!
> I'm not sure if we need to mention this in the docs. It probably should
> mention the column can be NULL, which means "unknown node".
We could simply say
The returned value can be NULL if the NUMA node cannot be
determined, e.g. when the page has been swapped out.
> Subject: [PATCH] Handle ENOENT status when querying NUMA node
...
> We touch the memory before querying the status, but there is no
> guarangee it won't be moved to swap in between. We do the touching only
^ guarantee
Christoph
* Re: failed NUMA pages inquiry status: Operation not permitted
From: Jakub Wartak @ 2026-01-19 11:47 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Christoph Berg <[email protected]>; [email protected]
On Fri, Jan 16, 2026 at 10:29 PM Tomas Vondra <[email protected]> wrote:
>
> Hi,
>
> Here's WIP fix for the root cause, i.e. handling status -2 in the two
> views querying NUMA node for memory pages:
>
> * pg_shmem_allocations_numa
> * pg_buffercache_numa
>
> We can't prevent -2 from happening - the kernel can move arbitrary pages
> to swap, we have no control over it. So I think we need to handle -2 as
> "unknown" node, instead of failing. The patch simply returns NULL
> instead of the node, but in principle we might return some other value
> (but IMHO we should not return the raw status, the -2 makes no sense in
> our context, it's some internal kernel errno).
>
> The pg_buffercache_numa was not failing, it just returned the -2 status
> verbatim. But I modified it to return NULL, for consistency.
>
> AFAIK this will fix the regression tests too - they only check COUNT(*),
> not the actual values.
>
> I'm not sure if we need to mention this in the docs. It probably should
> mention the column can be NULL, which means "unknown node".
Right, OK, so I've reproduced this without the patch (as you stated, it is
enough to cause shared_buffers to be swapped out; in my case a simple
stress-ng -m 16 --vm-bytes SOME_HIGH_VALUE did it).
It hits the ERROR pretty fast. The query: select numa_node, sum(size) from
pg_shmem_allocations_numa group by numa_node;
at first returns:
numa_node | sum
-----------+-------------
0 | 24062603264
(1 row)
and then, pretty soon after:
ERROR: invalid NUMA node id outside of allowed range [0, 0]: -2
With the patch (which, by the way, looks good to me) it does not fail;
instead I get:
numa_node | sum
-----------+-------------
| 10821046272
0 | 13241556992
-J.
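
The behaviour described in this message can be sketched in C. This is only an illustration under stated assumptions, not the committed patch: the type NodeOrNull and the functions status_to_node() and sum_by_node() are hypothetical names, and treating every other out-of-range status as an error is an assumption. The only facts taken from the thread are that a per-page status of -2 (-ENOENT) should become NULL ("unknown node") instead of raising an error, and that the unpatched view rejected any value outside [0, max_node].

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type: a NUMA node id that may be SQL NULL ("unknown"). */
typedef struct NodeOrNull
{
    bool isnull;    /* true when the node could not be determined */
    int  node;      /* meaningful only when isnull is false */
} NodeOrNull;

/*
 * Translate one per-page status value (as reported when querying page
 * locations) into a node id or "unknown".
 *
 * -ENOENT (-2) means the page is not present, e.g. swapped out, so the
 * node is reported as unknown. Returns false for any other value outside
 * [0, max_node], leaving it to the caller to raise an error (assumption).
 */
bool
status_to_node(int status, int max_node, NodeOrNull *out)
{
    if (status == -ENOENT)
    {
        out->isnull = true;
        out->node = -1;
        return true;
    }

    if (status < 0 || status > max_node)
        return false;

    out->isnull = false;
    out->node = status;
    return true;
}

/*
 * Sum page sizes per node, roughly what
 *   select numa_node, sum(size) ... group by numa_node
 * computes. totals must have max_node + 1 entries; bytes of pages with an
 * unknown node are added to *unknown_total (the NULL row in the output).
 * Returns false on the first unexpected status.
 */
bool
sum_by_node(const int *status, size_t npages, size_t page_size,
            int max_node, unsigned long long *totals,
            unsigned long long *unknown_total)
{
    for (size_t i = 0; i < npages; i++)
    {
        NodeOrNull r;

        if (!status_to_node(status[i], max_node, &r))
            return false;

        if (r.isnull)
            *unknown_total += page_size;
        else
            totals[r.node] += page_size;
    }
    return true;
}
```

With a single node (max_node = 0), swapped-out pages end up in the unknown bucket rather than aborting the whole query, which matches the two-row output shown above.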
* Re: failed NUMA pages inquiry status: Operation not permitted
From: Tomas Vondra @ 2026-01-26 23:32 UTC (permalink / raw)
To: Jakub Wartak <[email protected]>; +Cc: Christoph Berg <[email protected]>; [email protected]
On 1/16/26 22:29, Tomas Vondra wrote:
> Hi,
>
> Here's WIP fix for the root cause, i.e. handling status -2 in the two
> views querying NUMA node for memory pages:
>
> * pg_shmem_allocations_numa
> * pg_buffercache_numa
>
> We can't prevent -2 from happening - the kernel can move arbitrary pages
> to swap, we have no control over it. So I think we need to handle -2 as
> "unknown" node, instead of failing. The patch simply returns NULL
> instead of the node, but in principle we might return some other value
> (but IMHO we should not return the raw status, the -2 makes no sense in
> our context, it's some internal kernel errno).
>
> The pg_buffercache_numa was not failing, it just returned the -2 status
> verbatim. But I modified it to return NULL, for consistency.
>
> AFAIK this will fix the regression tests too - they only check COUNT(*),
> not the actual values.
>
> I'm not sure if we need to mention this in the docs. It probably should
> mention the column can be NULL, which means "unknown node".
>
Pushed and backpatched to 18. Hopefully that fixes this.
regards
--
Tomas Vondra
* Re: failed NUMA pages inquiry status: Operation not permitted
From: Bertrand Drouvot @ 2026-01-27 06:36 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Jakub Wartak <[email protected]>; Christoph Berg <[email protected]>; [email protected]
Hi,
On Tue, Jan 27, 2026 at 12:32:28AM +0100, Tomas Vondra wrote:
> On 1/16/26 22:29, Tomas Vondra wrote:
> > Hi,
> >
> > Here's WIP fix for the root cause, i.e. handling status -2 in the two
> > views querying NUMA node for memory pages:
> >
> > * pg_shmem_allocations_numa
> > * pg_buffercache_numa
> >
> > We can't prevent -2 from happening - the kernel can move arbitrary pages
> > to swap, we have no control over it. So I think we need to handle -2 as
> > "unknown" node, instead of failing. The patch simply returns NULL
> > instead of the node, but in principle we might return some other value
> > (but IMHO we should not return the raw status, the -2 makes no sense in
> > our context, it's some internal kernel errno).
> >
> > The pg_buffercache_numa was not failing, it just returned the -2 status
> > verbatim. But I modified it to return NULL, for consistency.
> >
> > AFAIK this will fix the regression tests too - they only check COUNT(*),
> > not the actual values.
> >
> > I'm not sure if we need to mention this in the docs. It probably should
> > mention the column can be NULL, which means "unknown node".
> >
>
> Pushed and backpatched to 18. Hopefully that fixes this.
Should 09c37015d49665c52ae7eabd5852af36851aede4 be added to .git-blame-ignore-revs?
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
* Re: failed NUMA pages inquiry status: Operation not permitted
From: Christoph Berg @ 2025-10-28 15:20 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Jakub Wartak <[email protected]>; [email protected]
Re: To Tomas Vondra
> So maybe PG should implement numa_available itself like that.
Following our discussion at pgconf.eu last week, I just implemented
that. The numa and pg_buffercache tests pass in Docker on Debian
bookworm now.
Christoph
* Re: failed NUMA pages inquiry status: Operation not permitted
From: Tomas Vondra @ 2025-10-16 15:09 UTC (permalink / raw)
To: Christoph Berg <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Jakub Wartak <[email protected]>; [email protected]
On 10/16/25 16:54, Christoph Berg wrote:
> Re: Tomas Vondra
>> It's probably more about the kernel version. What kernels are used by
>> these systems?
>
> It's the very same kernel, just different docker containers on the
> same system. I did not investigate yet where the problem is coming
> from, different libnuma versions seemed like the best bet.
>
> Same (differing) results on both these systems:
> Linux turing 6.16.7+deb14-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.16.7-1 (2025-09-11) x86_64 GNU/Linux
> Linux jenkins 6.1.0-39-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.148-1 (2025-08-26) x86_64 GNU/Linux
>
Hmmm. Those seem like relatively recent kernels.
>> Not sure how would that work. It seems this is some sort of permission
>> check in numa_move_pages, that's not what pg_numa_available does. Also,
>> it may depend on the page queried (e.g. whether it's exclusive or
>> shared by multiple processes).
>
> It's probably the lack of some process capability in that environment.
> Maybe there is a way to query that, but I don't know much about that
> yet.
>
The move_pages() manpage mentions PTRACE_MODE_READ_REALCREDS (see man
ptrace), so maybe that's it.
--
Tomas Vondra
end of thread, other threads:[~2026-02-12 17:42 UTC | newest]
Thread overview: 83+ messages
2025-04-07 21:18 pgsql: Introduce pg_shmem_allocations_numa view Tomas Vondra <[email protected]>
2025-06-12 21:16 ` Christoph Berg <[email protected]>
2025-06-23 14:42 ` Christoph Berg <[email protected]>
2025-06-23 14:48 ` Christoph Berg <[email protected]>
2025-06-23 15:14 ` Andres Freund <[email protected]>
2025-06-23 15:20 ` Christoph Berg <[email protected]>
2025-06-23 15:59 ` Christoph Berg <[email protected]>
2025-06-23 16:26 ` Andres Freund <[email protected]>
2025-06-23 19:57 ` Christoph Berg <[email protected]>
2025-06-23 20:10 ` Tomas Vondra <[email protected]>
2025-06-23 20:31 ` Christoph Berg <[email protected]>
2025-06-23 20:37 ` Tomas Vondra <[email protected]>
2025-06-23 20:51 ` Christoph Berg <[email protected]>
2025-06-23 21:14 ` Tomas Vondra <[email protected]>
2025-06-23 21:25 ` Christoph Berg <[email protected]>
2025-06-23 21:47 ` Tomas Vondra <[email protected]>
2025-06-24 01:43 ` Tomas Vondra <[email protected]>
2025-06-24 08:24 ` Bertrand Drouvot <[email protected]>
2025-06-24 09:20 ` Tomas Vondra <[email protected]>
2025-06-24 11:10 ` Bertrand Drouvot <[email protected]>
2025-06-24 12:33 ` Tomas Vondra <[email protected]>
2025-06-24 13:25 ` Bertrand Drouvot <[email protected]>
2025-06-24 14:41 ` Christoph Berg <[email protected]>
2025-06-24 15:04 ` Tomas Vondra <[email protected]>
2025-06-24 15:30 ` Christoph Berg <[email protected]>
2025-06-24 20:32 ` Tomas Vondra <[email protected]>
2025-06-25 06:45 ` Bertrand Drouvot <[email protected]>
2025-06-26 06:00 ` Bertrand Drouvot <[email protected]>
2025-06-26 08:53 ` Tomas Vondra <[email protected]>
2025-07-21 20:52 ` Christoph Berg <[email protected]>
2025-07-22 07:01 ` Bertrand Drouvot <[email protected]>
2025-06-25 06:11 ` Bertrand Drouvot <[email protected]>
2025-06-25 07:15 ` Jakub Wartak <[email protected]>
2025-06-25 09:31 ` Tomas Vondra <[email protected]>
2025-06-25 12:42 ` Álvaro Herrera <[email protected]>
2025-06-25 12:53 ` Tomas Vondra <[email protected]>
2025-06-25 06:05 ` Bertrand Drouvot <[email protected]>
2025-06-25 09:00 ` Christoph Berg <[email protected]>
2025-06-25 09:22 ` Tomas Vondra <[email protected]>
2025-06-26 05:28 ` Bertrand Drouvot <[email protected]>
2025-06-27 14:52 ` Tomas Vondra <[email protected]>
2025-06-27 17:33 ` Bertrand Drouvot <[email protected]>
2025-06-30 18:56 ` Tomas Vondra <[email protected]>
2025-07-01 04:06 ` Bertrand Drouvot <[email protected]>
2025-07-01 11:03 ` Tomas Vondra <[email protected]>
2025-09-11 11:36 ` Christoph Berg <[email protected]>
2025-09-11 11:39 ` Christoph Berg <[email protected]>
2026-02-12 14:21 ` Heikki Linnakangas <[email protected]>
2026-02-12 16:43 ` Álvaro Herrera <[email protected]>
2026-02-12 17:23 ` Bertrand Drouvot <[email protected]>
2026-02-12 17:42 ` Heikki Linnakangas <[email protected]>
2025-06-24 18:24 ` Christoph Berg <[email protected]>
2025-06-24 11:10 ` Andres Freund <[email protected]>
2025-06-24 12:42 ` Tomas Vondra <[email protected]>
2025-10-16 11:38 ` failed NUMA pages inquiry status: Operation not permitted Christoph Berg <[email protected]>
2025-10-16 14:27 ` Re: failed NUMA pages inquiry status: Operation not permitted Tomas Vondra <[email protected]>
2025-10-16 14:54 ` Re: failed NUMA pages inquiry status: Operation not permitted Christoph Berg <[email protected]>
2025-10-16 15:06 ` Re: failed NUMA pages inquiry status: Operation not permitted Christoph Berg <[email protected]>
2025-10-16 15:08 ` Re: failed NUMA pages inquiry status: Operation not permitted Christoph Berg <[email protected]>
2025-10-16 15:19 ` Re: failed NUMA pages inquiry status: Operation not permitted Christoph Berg <[email protected]>
2025-10-28 15:14 ` Re: failed NUMA pages inquiry status: Operation not permitted Tomas Vondra <[email protected]>
2025-11-14 12:52 ` Re: failed NUMA pages inquiry status: Operation not permitted Christoph Berg <[email protected]>
2025-11-20 12:53 ` Re: failed NUMA pages inquiry status: Operation not permitted Tomas Vondra <[email protected]>
2025-12-11 12:29 ` Re: failed NUMA pages inquiry status: Operation not permitted Christoph Berg <[email protected]>
2025-12-11 12:46 ` Re: failed NUMA pages inquiry status: Operation not permitted Tomas Vondra <[email protected]>
2025-12-13 17:36 ` Re: failed NUMA pages inquiry status: Operation not permitted Christoph Berg <[email protected]>
2025-12-16 13:16 ` Re: failed NUMA pages inquiry status: Operation not permitted Christoph Berg <[email protected]>
2025-12-16 14:48 ` Re: failed NUMA pages inquiry status: Operation not permitted Christoph Berg <[email protected]>
2025-12-16 15:17 ` Re: failed NUMA pages inquiry status: Operation not permitted Tomas Vondra <[email protected]>
2025-12-16 17:54 ` Re: failed NUMA pages inquiry status: Operation not permitted Christoph Berg <[email protected]>
2025-12-17 11:07 ` Re: failed NUMA pages inquiry status: Operation not permitted Tomas Vondra <[email protected]>
2026-01-05 21:35 ` Re: failed NUMA pages inquiry status: Operation not permitted Tomas Vondra <[email protected]>
2026-01-05 22:29 ` Re: failed NUMA pages inquiry status: Operation not permitted Christoph Berg <[email protected]>
2026-01-06 13:23 ` Re: failed NUMA pages inquiry status: Operation not permitted Jakub Wartak <[email protected]>
2026-01-06 15:36 ` Re: failed NUMA pages inquiry status: Operation not permitted Tomas Vondra <[email protected]>
2026-01-07 09:01 ` Re: failed NUMA pages inquiry status: Operation not permitted Jakub Wartak <[email protected]>
2026-01-16 21:29 ` Re: failed NUMA pages inquiry status: Operation not permitted Tomas Vondra <[email protected]>
2026-01-17 00:31 ` Re: failed NUMA pages inquiry status: Operation not permitted Christoph Berg <[email protected]>
2026-01-19 11:47 ` Re: failed NUMA pages inquiry status: Operation not permitted Jakub Wartak <[email protected]>
2026-01-26 23:32 ` Re: failed NUMA pages inquiry status: Operation not permitted Tomas Vondra <[email protected]>
2026-01-27 06:36 ` Re: failed NUMA pages inquiry status: Operation not permitted Bertrand Drouvot <[email protected]>
2025-10-28 15:20 ` Re: failed NUMA pages inquiry status: Operation not permitted Christoph Berg <[email protected]>
2025-10-16 15:09 ` Re: failed NUMA pages inquiry status: Operation not permitted Tomas Vondra <[email protected]>