Message-ID: <0643ae61-cf9d-482c-9b2c-fb861b24fd22@vondra.me>
Date: Mon, 23 Jun 2025 23:14:28 +0200
Subject: Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Tomas Vondra
To: Christoph Berg
Cc: Andres Freund, Tomas Vondra, pgsql-hackers@lists.postgresql.org
References: <6c9f9f7e-947b-4fc3-bdb6-b0696d7492e5@vondra.me>

On 6/23/25 22:51, Christoph Berg wrote:
> Re: Tomas Vondra
>> Didn't you say the first ~35 addresses succeed, right? What about the
>> addresses after that?
>
> That was pg_shmem_allocations_numa. The pg_numa_query_pages() in there
> works (does not return -1), but then some of the status[] values are
> -14.
>
> When pg_buffercache_numa fails, pg_numa_query_pages() itself
> returns -14.
>
> The printed os_page_ptrs[] contents are the same for the failing and
> non-failing calls, so the problem is probably elsewhere.
>
>         /* Fill pointers for all the memory pages. */
>         idx = 0;
>         for (char *ptr = startptr; ptr < endptr; ptr += os_page_size)
>         {
> +               if (idx < 50)
> +                       elog(DEBUG1, "os_page_ptrs idx %d = %p", idx, ptr);
>                 os_page_ptrs[idx++] = ptr;
>
>
> 20:47 myon@postgres =# select * from pg_buffercache_numa;
> DEBUG: 00000: os_page_ptrs idx 0 = 0xebc44000
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
> DEBUG: 00000: os_page_ptrs idx 1 = 0xebc45000
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
> DEBUG: 00000: os_page_ptrs idx 2 = 0xebc46000
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
> DEBUG: 00000: os_page_ptrs idx 3 = 0xebc47000
...
> DEBUG: 00000: os_page_ptrs idx 47 = 0xebc73000
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
> DEBUG: 00000: os_page_ptrs idx 48 = 0xebc74000
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
> DEBUG: 00000: os_page_ptrs idx 49 = 0xebc75000
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
> DEBUG: 00000: NUMA: NBuffers=16384 os_page_count=32768 os_page_size=4096
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:385
> 2025-06-23 20:47:41.827 UTC [1368080] ERROR: failed NUMA pages inquiry: Bad address
> 2025-06-23 20:47:41.827 UTC [1368080] STATEMENT: select * from pg_buffercache_numa;
> ERROR: XX000: failed NUMA pages inquiry: Bad address
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:396
> Time: 92.757 ms
>
> 20:47 myon@postgres =# select * from pg_buffercache_numa;
> DEBUG: 00000: os_page_ptrs idx 0 = 0xebc44000
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
> DEBUG: 00000: os_page_ptrs idx 1 = 0xebc45000
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
> DEBUG: 00000: os_page_ptrs idx 2 = 0xebc46000
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
> DEBUG: 00000: os_page_ptrs idx 3 = 0xebc47000
...
> DEBUG: 00000: os_page_ptrs idx 46 = 0xebc72000
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
> DEBUG: 00000: os_page_ptrs idx 47 = 0xebc73000
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
> DEBUG: 00000: os_page_ptrs idx 48 = 0xebc74000
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
> DEBUG: 00000: os_page_ptrs idx 49 = 0xebc75000
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:375
> DEBUG: 00000: NUMA: NBuffers=16384 os_page_count=32768 os_page_size=4096
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:385
> DEBUG: 00000: NUMA: page-faulting the buffercache for proper NUMA readouts
> LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:444
> Time: 24.547 ms
> 20:47 myon@postgres =#
>

True. If it fails on the first call but succeeds on the next one, then
the problem is likely somewhere else. But also on the second call we
won't do the memory touching. Can you try setting firstNumaTouch=false,
so that we do this on every call?

At the beginning you mentioned this is happening on i386, armel and
armhf - are all those in qemu? I've tried on my rpi5 (with 32-bit user
space), and there everything seems to work fine. But that's an aarch64
kernel, just the user space is 32-bit.


regards

-- 
Tomas Vondra