Date: Tue, 24 Jun 2025 13:25:07 +0000
From: Bertrand Drouvot
To: Tomas Vondra
Cc: Christoph Berg, Andres Freund, Tomas Vondra, pgsql-hackers@lists.postgresql.org
Subject: Re: pgsql: Introduce pg_shmem_allocations_numa view
References: <0643ae61-cf9d-482c-9b2c-fb861b24fd22@vondra.me> <6342f601-77de-4ee0-8c2a-3deb50ceac5b@vondra.me> <8649a4e3-c60d-4f37-aa6f-e7e7c14c581e@vondra.me>
In-Reply-To: <8649a4e3-c60d-4f37-aa6f-e7e7c14c581e@vondra.me>

Hi,

On Tue, Jun 24, 2025 at 02:33:59PM +0200, Tomas Vondra wrote:
>
>
> On 6/24/25 13:10, Bertrand Drouvot wrote:
> > So, if we look at do_pages_stat() ([1]), we can see that it uses a hardcoded
> > "#define DO_PAGES_STAT_CHUNK_NR 16UL" and that this pointer arithmetic:
> >
> > "
> >     pages += chunk_nr;
> >     status += chunk_nr;
> > "
> >
> > is done but has no effect since nr_pages will exit the loop if we use a batch
> > size <= 16.
> >
> > So if this pointer arithmetic is not correct, (it seems that it should advance
> > by 16 * sizeof(compat_uptr_t) instead) then it has no effect as long as the batch
> > size is <= 16.
> >
> > Does test_chunk_size also fail at 17 for you?
>
> Yes, it fails for me at 17 too. So you're saying the access within each
> chunk of 16 elements is OK, but that maybe advancing to the next chunk
> is not quite right?

Yes, I think compat_uptr_t usage is missing in do_pages_stat() (while it's used
in do_pages_move()).

Having a chunk size <= DO_PAGES_STAT_CHUNK_NR ensures we are not affected
by the wrong pointer arithmetic.

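To make the suspected stride mismatch concrete, here is a small standalone
sketch (not kernel code). It assumes the user array holds 4-byte entries, as
for a 32-bit process, while the base pointer is advanced in 8-byte units.
The names compat_slot, native_slot, chunk_start_index and calls_needed are
made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the DO_PAGES_STAT_CHUNK_NR value quoted in this thread. */
#define CHUNK_NR 16

/*
 * Hypothetical stand-ins (assumptions, not kernel types): a 32-bit user
 * pointer slot, similar in size to compat_uptr_t, and a native 64-bit slot.
 */
typedef uint32_t compat_slot;
typedef uint64_t native_slot;

/*
 * Index, counted in compat_slot entries, at which chunk number 'chunk'
 * begins when the base pointer is advanced by CHUNK_NR elements of
 * 'advance_size' bytes between chunks.
 */
static size_t
chunk_start_index(size_t chunk, size_t advance_size)
{
	assert(advance_size >= sizeof(compat_slot));
	return (chunk * CHUNK_NR * advance_size) / sizeof(compat_slot);
}

/*
 * Number of calls needed when the caller never passes more than CHUNK_NR
 * entries per call, so the cross-chunk advance is never exercised.
 */
static size_t
calls_needed(size_t nr_pages)
{
	return (nr_pages + CHUNK_NR - 1) / CHUNK_NR;
}
```

Under these assumptions, chunk 0 starts at index 0 with either stride, so up
to 16 entries are read correctly. With the 8-byte stride, chunk 1 starts at
index 32 instead of 16, which would be consistent with the failure showing up
at 17.
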
> In which case limiting the access to 16 entries
> might be a workaround.

Yes, something like:

diff --git a/src/backend/storage/ipc/shmem.c b/src/backend/storage/ipc/shmem.c
index c9ae3b45b76..070ad2f13e7 100644
--- a/src/backend/storage/ipc/shmem.c
+++ b/src/backend/storage/ipc/shmem.c
@@ -689,8 +689,17 @@ pg_get_shmem_allocations_numa(PG_FUNCTION_ARGS)
 			CHECK_FOR_INTERRUPTS();
 		}
 
-		if (pg_numa_query_pages(0, shm_ent_page_count, page_ptrs, pages_status) == -1)
-			elog(ERROR, "failed NUMA pages inquiry status: %m");
+		#define NUMA_QUERY_CHUNK_SIZE 16  /* has to be <= DO_PAGES_STAT_CHUNK_NR (do_pages_stat()) */
+
+		for (uint64 chunk_start = 0; chunk_start < shm_ent_page_count; chunk_start += NUMA_QUERY_CHUNK_SIZE) {
+			uint64 chunk_size = Min(NUMA_QUERY_CHUNK_SIZE, shm_ent_page_count - chunk_start);
+
+			if (pg_numa_query_pages(0, chunk_size, &page_ptrs[chunk_start],
+									&pages_status[chunk_start]) == -1)
+				elog(ERROR, "failed NUMA pages inquiry status: %m");
+		}
+
+		#undef NUMA_QUERY_CHUNK_SIZE

> In any case, this sounds like a kernel bug, right?

Yes, it sounds like a kernel bug.

> I don't have much
> experience with the kernel code, so don't want to rely too much on my
> interpretation of it.

I don't have that much experience either, but I think the issue is in
do_pages_stat() and that "pages += chunk_nr" should advance by
chunk_nr * sizeof(compat_uptr_t) bytes instead.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com