public inbox for [email protected]  
help / color / mirror / Atom feed
From: Tomas Vondra <[email protected]>
To: Christoph Berg <[email protected]>
To: Bertrand Drouvot <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: Tomas Vondra <[email protected]>
Cc: [email protected]
Subject: Re: pgsql: Introduce pg_shmem_allocations_numa view
Date: Tue, 24 Jun 2025 17:04:00 +0200
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<aFqHoNXQ/[email protected]>
	<[email protected]>
	<aFqnM/[email protected]>
	<[email protected]>

On 6/24/25 16:41, Christoph Berg wrote:
> Re: Bertrand Drouvot
>> Yes, I think compat_uptr_t usage is missing in do_pages_stat() (while it's used
>> in do_pages_move()).
> 
> I was also reading the kernel source around that place but you spotted
> the problem before me. This patch resolves the issue here:
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 8cf0f9c9599..542c81ec3ed 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2444,7 +2444,13 @@ static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages,
>  		if (copy_to_user(status, chunk_status, chunk_nr * sizeof(*status)))
>  			break;
> 
> -		pages += chunk_nr;
> +		if (in_compat_syscall()) {
> +			compat_uptr_t __user *pages32 = (compat_uptr_t __user *)pages;
> +
> +			pages32 += chunk_nr;
> +			pages = (const void __user * __user *) pages32;
> +		} else
> +			pages += chunk_nr;
>  		status += chunk_nr;
>  		nr_pages -= chunk_nr;
>  	}
> 
> 
>> Having a chunk size <= DO_PAGES_STAT_CHUNK_NR ensures we are not affected
>> by the wrong pointer arithmetic.
> 
> Good idea. Buggy kernels will be around for some time.
> 

If it's a reliable fix, then I guess we can do it like this. But won't
that be a performance penalty on everyone? Or does the system split the
array into 16-element chunks anyway, so this makes no difference?

Anyway, maybe we should start by reporting this to the kernel people. Do
you want me to do that, or shall one of you take care of that? I suppose
that'd be better, as you already wrote a fix / know the code better.

>> +               #define NUMA_QUERY_CHUNK_SIZE 16  /* has to be <= DO_PAGES_STAT_CHUNK_NR (do_pages_stat())*/
>> +
>> +               for (uint64 chunk_start = 0; chunk_start < shm_ent_page_count; chunk_start += NUMA_QUERY_CHUNK_SIZE) {
> 
> Perhaps optimize it to this:
> 
> #if sizeof(void *) == 4
> #define NUMA_QUERY_CHUNK_SIZE 16  /* has to be <= DO_PAGES_STAT_CHUNK_NR (do_pages_stat())*/
> #else
> #define NUMA_QUERY_CHUNK_SIZE 1024
> #endif
> 
> ... or some other bigger number.
> 

Hmm, maybe. I guess that'd hurt only fully 32-bit systems, but that also
seems like a non-issue.

> The loop could also include CHECK_FOR_INTERRUPTS();
> 
>>> I don't have much
>>> experience with the kernel code, so don't want to rely too much on my
>>> interpretation of it.
>>
>> I don't have that much experience too but I think the issue is in do_pages_stat()
>> and that "pages += chunk_nr" should be advanced by sizeof(compat_uptr_t) instead.
> 
> Me neither, but I'll try submit this fix.
> 

+1

Thanks to both of you for the report and the investigation.


regards

-- 
Tomas Vondra






view thread (83+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: pgsql: Introduce pg_shmem_allocations_numa view
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox