Date: Mon, 23 Jun 2025 21:57:56 +0200
From: Christoph Berg
To: Andres Freund
Cc: Tomas Vondra, pgsql-hackers@lists.postgresql.org
Subject: Re: pgsql: Introduce pg_shmem_allocations_numa view

Re: Andres Freund
> How confident are we that this isn't actually because we passed a bogus
> address to the kernel or such? With this patch, are *any* pages recognized as
> valid on the machines that triggered the error?

See upthread - the first 35 pages were ok, then a lot of -14.

> I wonder if we ought to report the failures as a separate "numa node"
> (e.g. NULL as node id) instead ...

Did that now, using N+1 (== 1 here) for errors in this Debian i386
environment (chroot on an amd64 host):

select * from pg_shmem_allocations_numa \crosstabview

                      name                      │    0     │    1
────────────────────────────────────────────────┼──────────┼──────────
 multixact_offset                               │    69632 │    65536
 subtransaction                                 │   139264 │   131072
 notify                                         │   139264 │        0
 Shared Memory Stats                            │   188416 │   131072
 serializable                                   │   188416 │    86016
 PROCLOCK hash                                  │     4096 │        0
 FinishedSerializableTransactions               │     4096 │        0
 XLOG Ctl                                       │  2117632 │  2097152
 Shared MultiXact State                         │     4096 │        0
 Proc Header                                    │     4096 │        0
 Archiver Data                                  │     4096 │        0
.... more 0s in the last column ...
 AioHandleData                                  │  1429504 │        0
 Buffer Blocks                                  │ 67117056 │ 67108864
 Buffer IO Condition Variables                  │   266240 │        0
 Proc Array                                     │     4096 │        0
.... more 0s
(73 rows)

There is something fishy with pg_buffercache. If I restart PG, I'm getting
"Bad address" (errno 14), this time as the return value of move_pages().

postgres =# select * from pg_buffercache_numa;
DEBUG: 00000: NUMA: NBuffers=16384 os_page_count=32768 os_page_size=4096
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:383
2025-06-23 19:41:41.315 UTC [1331894] ERROR: failed NUMA pages inquiry: Bad address
2025-06-23 19:41:41.315 UTC [1331894] STATEMENT: select * from pg_buffercache_numa;
ERROR: XX000: failed NUMA pages inquiry: Bad address
LOCATION: pg_buffercache_numa_pages, pg_buffercache_pages.c:394

Repeated calls are fine.

Maybe NUMA is just not supported on 32-bit archs, but I'd rather be sure about
that before I play that card.

Christoph
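
PS: For illustration only, here is a minimal C sketch of the "N+1 for errors"
accounting described above. It is not the code from the patch. The function
name bucket_pages_by_node and its parameters are hypothetical, and it assumes
a move_pages()-style status array in which non-negative entries are node ids
and negative entries are negated errno values (such as -14). Sending
out-of-range node ids to the error bucket as well is my own assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the PostgreSQL patch.
 *
 * Sums page sizes per NUMA node from a move_pages()-style status array.
 * status[i] >= 0 is taken as a node id; status[i] < 0 is taken as a
 * negated errno (e.g. -14 for EFAULT).  Failed pages are accounted to an
 * extra bucket at index max_node + 1, so totals[] must have room for
 * max_node + 2 entries.
 */
static void
bucket_pages_by_node(const int *status, size_t npages, size_t page_size,
                     int max_node, uint64_t *totals)
{
    int error_bucket = max_node + 1;

    assert(max_node >= 0);

    /* Start every bucket, including the error bucket, at zero. */
    for (int n = 0; n <= error_bucket; n++)
        totals[n] = 0;

    for (size_t i = 0; i < npages; i++)
    {
        int node = status[i];

        /* Assumption: unexpected node ids are treated like errors. */
        if (node < 0 || node > max_node)
            node = error_bucket;

        totals[node] += page_size;
    }
}
```

With a single node (max_node = 0), as in the chroot above, totals[0] holds
the bytes on node 0 and totals[1] the bytes whose status was an error.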