Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uU4qG-00FA3y-Fo for pgsql-hackers@arkaria.postgresql.org; Tue, 24 Jun 2025 14:41:44 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1uU4qC-00Chpe-W9 for pgsql-hackers@arkaria.postgresql.org; Tue, 24 Jun 2025 14:41:41 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uU4qC-00ChpV-MK for pgsql-hackers@lists.postgresql.org; Tue, 24 Jun 2025 14:41:41 +0000 Received: from mout-p-102.mailbox.org ([80.241.56.152]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1uU4qA-003qAP-2r for pgsql-hackers@lists.postgresql.org; Tue, 24 Jun 2025 14:41:40 +0000 Received: from smtp102.mailbox.org (smtp102.mailbox.org [IPv6:2001:67c:2050:b231:465::102]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mout-p-102.mailbox.org (Postfix) with ESMTPS id 4bRSM317JQz9sqs; Tue, 24 Jun 2025 16:41:35 +0200 (CEST) Date: Tue, 24 Jun 2025 16:41:33 +0200 From: Christoph Berg To: Bertrand Drouvot Cc: Tomas Vondra , Andres Freund , Tomas Vondra , pgsql-hackers@lists.postgresql.org Subject: Re: pgsql: Introduce pg_shmem_allocations_numa view Message-ID: References: <0643ae61-cf9d-482c-9b2c-fb861b24fd22@vondra.me> <6342f601-77de-4ee0-8c2a-3deb50ceac5b@vondra.me> <8649a4e3-c60d-4f37-aa6f-e7e7c14c581e@vondra.me> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Rspamd-Queue-Id: 4bRSM317JQz9sqs List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Re: Bertrand Drouvot > Yes, I think compat_uptr_t usage is missing in do_pages_stat() (while it's used > in do_pages_move()). I was also reading the kernel source around that place but you spotted the problem before me. This patch resolves the issue here: diff --git a/mm/migrate.c b/mm/migrate.c index 8cf0f9c9599..542c81ec3ed 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -2444,7 +2444,13 @@ static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages, if (copy_to_user(status, chunk_status, chunk_nr * sizeof(*status))) break; - pages += chunk_nr; + if (in_compat_syscall()) { + compat_uptr_t __user *pages32 = (compat_uptr_t __user *)pages; + + pages32 += chunk_nr; + pages = (const void __user * __user *) pages32; + } else + pages += chunk_nr; status += chunk_nr; nr_pages -= chunk_nr; } > Having a chunk size <= DO_PAGES_STAT_CHUNK_NR ensures we are not affected > by the wrong pointer arithmetic. Good idea. Buggy kernels will be around for some time. > + #define NUMA_QUERY_CHUNK_SIZE 16 /* has to be <= DO_PAGES_STAT_CHUNK_NR (do_pages_stat())*/ > + > + for (uint64 chunk_start = 0; chunk_start < shm_ent_page_count; chunk_start += NUMA_QUERY_CHUNK_SIZE) { Perhaps optimize it to this: #if sizeof(void *) == 4 #define NUMA_QUERY_CHUNK_SIZE 16 /* has to be <= DO_PAGES_STAT_CHUNK_NR (do_pages_stat())*/ #else #define NUMA_QUERY_CHUNK_SIZE 1024 #endif ... or some other bigger number. The loop could also include CHECK_FOR_INTERRUPTS(); > > I don't have much > > experience with the kernel code, so don't want to rely too much on my > > interpretation of it. > > I don't have that much experience too but I think the issue is in do_pages_stat() > and that "pages += chunk_nr" should be advanced by sizeof(compat_uptr_t) instead. Me neither, but I'll try submit this fix. Christoph