Message-ID: <6342f601-77de-4ee0-8c2a-3deb50ceac5b@vondra.me>
Date: Tue, 24 Jun 2025 03:43:19 +0200
Subject: Re: pgsql: Introduce pg_shmem_allocations_numa view
From: Tomas Vondra
To: Christoph Berg
Cc: Andres Freund, Tomas Vondra, pgsql-hackers@lists.postgresql.org

On 6/23/25 23:47, Tomas Vondra wrote:
> ...
>
> Or maybe the 32-bit chroot on 64-bit host matters and confuses some
> calculation.
>

I think it's likely something like this. I noticed that if I modify
pg_buffercache_numa_pages() to query the addresses one by one, it works.
And when I increase the number, it stops working somewhere between 16k
and 17k items.

It may be a coincidence, but I suspect it's related to the
sizeof(void *) being 8 in the kernel, but only 4 in the chroot.
So the userspace passes an array of 4-byte items, but the kernel
interprets that as 8-byte items. That is, we call

    long move_pages(int pid, unsigned long count, void *pages[.count],
                    const int nodes[.count], int status[.count], int flags);

Which (I assume) just passes the parameters to the kernel. And it'll
interpret them per the kernel pointer size.

If this is what's happening, I'm not sure what to do about it ...

FWIW while looking into this, I tried running this under valgrind (on a
regular 64-bit system, not in the chroot), and I get this report:

==65065== Invalid read of size 8
==65065==    at 0x113B0EBE: pg_buffercache_numa_pages (pg_buffercache_pages.c:380)
==65065==    by 0x6B539D: ExecMakeTableFunctionResult (execSRF.c:234)
==65065==    by 0x6CEB7E: FunctionNext (nodeFunctionscan.c:94)
==65065==    by 0x6B6ACA: ExecScanFetch (execScan.h:126)
==65065==    by 0x6B6B31: ExecScanExtended (execScan.h:170)
==65065==    by 0x6B6C9D: ExecScan (execScan.c:59)
==65065==    by 0x6CEF0F: ExecFunctionScan (nodeFunctionscan.c:269)
==65065==    by 0x6B29FA: ExecProcNodeFirst (execProcnode.c:469)
==65065==    by 0x6A6F56: ExecProcNode (executor.h:313)
==65065==    by 0x6A9533: ExecutePlan (execMain.c:1679)
==65065==    by 0x6A7422: standard_ExecutorRun (execMain.c:367)
==65065==    by 0x6A7330: ExecutorRun (execMain.c:304)
==65065==    by 0x934EF0: PortalRunSelect (pquery.c:921)
==65065==    by 0x934BD8: PortalRun (pquery.c:765)
==65065==    by 0x92E4CD: exec_simple_query (postgres.c:1273)
==65065==    by 0x93301E: PostgresMain (postgres.c:4766)
==65065==    by 0x92A88B: BackendMain (backend_startup.c:124)
==65065==    by 0x85A7C7: postmaster_child_launch (launch_backend.c:290)
==65065==    by 0x860111: BackendStartup (postmaster.c:3580)
==65065==    by 0x85DE6F: ServerLoop (postmaster.c:1702)
==65065==  Address 0x7b6c000 is in a rw- anonymous segment

This fails here (on the pg_numa_touch_mem_if_required call):

    for (char *ptr = startptr; ptr < endptr; ptr += os_page_size)
    {
        os_page_ptrs[idx++] = ptr;

        /* Only need to touch memory once per backend process */
        if (firstNumaTouch)
            pg_numa_touch_mem_if_required(touch, ptr);
    }

The 0x7b6c000 is the very first pointer, and it's the only pointer that
triggers this warning. At first I thought there's something wrong with
how we align the pointer using TYPEALIGN_DOWN(), but then I noticed it's
actually the pointer of BufferGetBlock(1).

So I'm a bit puzzled by this, and I'm not sure it's related to the other
issue at all (it probably is not).

It's a bit too late here, I'll continue investigating this tomorrow.

-- 
Tomas Vondra
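
To make the suspected size mismatch concrete, here is a minimal userspace
sketch. It only models the hypothesis above (an array of 4-byte pointers
read as if it held 8-byte pointers); it does not call move_pages() and
says nothing about what the kernel actually does. The helper names
bytes_consumed() and read_as_8byte_entry() are hypothetical, made up for
this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: number of bytes a reader walks over when it
 * expects `count` entries of `entry_size` bytes each.
 */
static size_t
bytes_consumed(size_t count, size_t entry_size)
{
    return count * entry_size;
}

/*
 * Hypothetical helper: read entry `idx` the way a reader assuming 8-byte
 * pointers would, from an array that really holds 4-byte pointers.
 * `n32` is the number of 4-byte entries actually present.
 */
static uint64_t
read_as_8byte_entry(const uint32_t *pages32, size_t n32, size_t idx)
{
    uint64_t    v;

    /* An 8-byte read at idx covers 4-byte slots 2*idx and 2*idx+1. */
    assert(2 * idx + 1 < n32);

    memcpy(&v, (const unsigned char *) pages32 + idx * sizeof(v), sizeof(v));
    return v;
}
```

Under this model, two adjacent 4-byte addresses get fused into one bogus
8-byte value, and a reader asked for `count` entries walks over twice as
many bytes as the caller allocated. Whether that is what really happens
in the 32-bit chroot is still an assumption.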