Re: pgsql: Introduce pg_shmem_allocations_numa view

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Tomas Vondra <[email protected]>
To: Christoph Berg <[email protected]>
Cc: Bertrand Drouvot <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: Tomas Vondra <[email protected]>
Cc: [email protected]
Subject: Re: pgsql: Introduce pg_shmem_allocations_numa view
Date: Tue, 24 Jun 2025 22:32:25 +0200
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<aFqHoNXQ/[email protected]>
	<[email protected]>
	<aFqnM/[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>

On 6/24/25 17:30, Christoph Berg wrote:
> Re: Tomas Vondra
>> If it's a reliable fix, then I guess we can do it like this. But won't
>> that be a performance penalty on everyone? Or does the system split the
>> array into 16-element chunks anyway, so this makes no difference?
> 
> There's still the overhead of the syscall itself. But no idea how
> costly it is to have this 16-step loop in user or kernel space.
> 
> We could claim that on 32-bit systems, shared_buffers would be smaller
> anyway, so there the overhead isn't that big. And the step size should
> be larger (if at all) on 64-bit.
> 
>> Anyway, maybe we should start by reporting this to the kernel people. Do
>> you want me to do that, or shall one of you take care of that? I suppose
>> that'd be better, as you already wrote a fix / know the code better.
> 
> Submitted: https://marc.info/?l=linux-mm&m=175077821909222&w=2
> 

Thanks! Now we wait ...

Attached is a minor tweak of the valgrind suppresion rules, to add the
two places touching the memory. I was hoping I could add a single rule
for pg_numa_touch_mem_if_required, but that does not work - it's a
macro, not a function. So I had to add one rule for both functions,
querying the NUMA. That's a bit disappointing, because it means it'll
hide all other failues (of Memcheck:Addr8 type) in those functions.

Perhaps it'd be be better to turn pg_numa_touch_mem_if_required into a
proper (inlined) function, at least with USE_VALGRIND defined. Something
like the v2 patch - needs more testing to ensure the inlined function
doesn't break the touching or something silly like that.

regards

-- 
Tomas Vondra


Attachments:

  [text/x-patch] fix-valgrind-for-numa.patch (831B, 2-fix-valgrind-for-numa.patch)
  download | inline diff:
diff --git a/src/tools/valgrind.supp b/src/tools/valgrind.supp
index 7ea464c8094..36bf3253f76 100644
--- a/src/tools/valgrind.supp
+++ b/src/tools/valgrind.supp
@@ -180,3 +180,22 @@
    Memcheck:Cond
    fun:PyObject_Realloc
 }
+
+# Querying NUMA node for shared memory requires touching the memory so
+# that it gets allocated in the process. But we'll touch memory backing
+# buffers, but that memory may be marked as noaccess for buffers that
+# are not pinned. So just ignore that, we're not really accessing the
+# buffers, for both places querying the NUMA status.
+{
+   pg_buffercache_numa_pages
+   Memcheck:Addr8
+   fun:pg_buffercache_numa_pages
+   fun:ExecMakeTableFunctionResult
+}
+
+{
+   pg_get_shmem_allocations_numa
+   Memcheck:Addr8
+   fun:pg_get_shmem_allocations_numa
+   fun:ExecMakeTableFunctionResult
+}


  [text/x-patch] fix-valgrind-for-numa-v2.patch (1.4K, 3-fix-valgrind-for-numa-v2.patch)
  download | inline diff:
diff --git a/src/include/port/pg_numa.h b/src/include/port/pg_numa.h
index 40f1d324dcf..3b9a5b42898 100644
--- a/src/include/port/pg_numa.h
+++ b/src/include/port/pg_numa.h
@@ -24,9 +24,22 @@ extern PGDLLIMPORT int pg_numa_get_max_node(void);
  * This is required on Linux, before pg_numa_query_pages() as we
  * need to page-fault before move_pages(2) syscall returns valid results.
  */
+#ifdef USE_VALGRIND
+
+static inline void
+pg_numa_touch_mem_if_required(uint64 tmp, char *ptr)
+{
+	volatile uint64 ro_volatile_var pg_attribute_unused();
+	ro_volatile_var = *(volatile uint64 *) ptr;
+}
+
+#else
+
 #define pg_numa_touch_mem_if_required(ro_volatile_var, ptr) \
 	ro_volatile_var = *(volatile uint64 *) ptr
 
+#endif
+
 #else
 
 #define pg_numa_touch_mem_if_required(ro_volatile_var, ptr) \
diff --git a/src/tools/valgrind.supp b/src/tools/valgrind.supp
index 7ea464c8094..6b9a8998f82 100644
--- a/src/tools/valgrind.supp
+++ b/src/tools/valgrind.supp
@@ -180,3 +180,14 @@
    Memcheck:Cond
    fun:PyObject_Realloc
 }
+
+# Querying NUMA node for shared memory requires touching the memory so
+# that it gets allocated in the process. But we'll touch memory backing
+# buffers, but that memory may be marked as noaccess for buffers that
+# are not pinned. So just ignore that, we're not really accessing the
+# buffers, for all places querying the NUMA status.
+{
+   pg_numa_touch_mem_if_required
+   Memcheck:Addr8
+   fun:pg_numa_touch_mem_if_required
+}

view thread (83+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: pgsql: Introduce pg_shmem_allocations_numa view
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox