Date: Tue, 24 Jun 2025 08:24:53 +0000
From: Bertrand Drouvot
To: Tomas Vondra
Cc: Christoph Berg, Andres Freund, Tomas Vondra, pgsql-hackers@lists.postgresql.org
Subject: Re: pgsql: Introduce pg_shmem_allocations_numa view
References: <6c9f9f7e-947b-4fc3-bdb6-b0696d7492e5@vondra.me> <0643ae61-cf9d-482c-9b2c-fb861b24fd22@vondra.me> <6342f601-77de-4ee0-8c2a-3deb50ceac5b@vondra.me>
In-Reply-To: <6342f601-77de-4ee0-8c2a-3deb50ceac5b@vondra.me>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="YnYdx3wnSYItcXOD"
Content-Disposition: inline

--YnYdx3wnSYItcXOD
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Hi,

On Tue, Jun 24, 2025 at 03:43:19AM +0200, Tomas Vondra wrote:
> On 6/23/25 23:47, Tomas Vondra wrote:
> > ...
> >
> > Or maybe the 32-bit chroot on 64-bit host matters and confuses some
> > calculation.
> >
>
> I think it's likely something like this.

I think the same.

> I noticed that if I modify
> pg_buffercache_numa_pages() to query the addresses one by one, it works.
> And when I increase the number, it stops working somewhere between 16k
> and 17k items.

Yeah, same for me with pg_get_shmem_allocations_numa(). It works if
pg_numa_query_pages() is done on chunks <= 16 pages but fails if done on more
than 16 pages.

It's also confirmed by test_chunk_size.c attached:

$ gcc-11 -m32 -o test_chunk_size test_chunk_size.c
$ ./test_chunk_size
 1 pages: SUCCESS (0 errors)
 2 pages: SUCCESS (0 errors)
 3 pages: SUCCESS (0 errors)
 4 pages: SUCCESS (0 errors)
 5 pages: SUCCESS (0 errors)
 6 pages: SUCCESS (0 errors)
 7 pages: SUCCESS (0 errors)
 8 pages: SUCCESS (0 errors)
 9 pages: SUCCESS (0 errors)
10 pages: SUCCESS (0 errors)
11 pages: SUCCESS (0 errors)
12 pages: SUCCESS (0 errors)
13 pages: SUCCESS (0 errors)
14 pages: SUCCESS (0 errors)
15 pages: SUCCESS (0 errors)
16 pages: SUCCESS (0 errors)
17 pages: 1 errors
Threshold: 17 pages

No error if -m32 is not used.

> It may be a coincidence, but I suspect it's related to the sizeof(void
> *) being 8 in the kernel, but only 4 in the chroot. So the userspace
> passes an array of 4-byte items, but kernel interprets that as 8-byte
> items. That is, we call
>
> long move_pages(int pid, unsigned long count, void *pages[.count], const
> int nodes[.count], int status[.count], int flags);
>
> Which (I assume) just passes the parameters to kernel. And it'll
> interpret them per kernel pointer size.
>

I also suspect something in this area...

> If this is what's happening, I'm not sure what to do about it ...

We could work in chunks (16?) on 32-bit builds, but that would probably degrade
performance (we mention it in the doc though). Also, would 16 always be a
correct chunk size?

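To make the chunking idea concrete, here is a minimal sketch (an illustration
only, not a proposed patch). The names query_pages_chunked(), page_query_fn and
fake_query() are hypothetical. In PostgreSQL the callback would presumably wrap
pg_numa_query_pages(); fake_query() merely imitates the failure reported above
by rejecting batches of more than 16 pages:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: query 'count' pages, fill status[], return 0 if OK. */
typedef long (*page_query_fn) (unsigned long count, void **pages, int *status);

/*
 * Query 'count' pages in batches of at most 'chunk' entries.
 * Returns 0 on success, or the first non-zero result reported by 'query'.
 */
long
query_pages_chunked(page_query_fn query, unsigned long count,
                    void **pages, int *status, unsigned long chunk)
{
    assert(chunk > 0);

    for (unsigned long done = 0; done < count; done += chunk)
    {
        unsigned long n = count - done;
        long        rc;

        if (n > chunk)
            n = chunk;          /* the last batch may be shorter */

        rc = query(n, pages + done, status + done);
        if (rc != 0)
            return rc;
    }
    return 0;
}

/*
 * Stand-in for the real query, used only to exercise the loop above.
 * Assumption: it fails for batches larger than 16 pages, as observed.
 */
long
fake_query(unsigned long count, void **pages, int *status)
{
    (void) pages;               /* addresses are not inspected here */

    if (count > 16)
        return -1;
    for (unsigned long i = 0; i < count; i++)
        status[i] = 0;          /* pretend every page is on node 0 */
    return 0;
}
```

With 40 pages and a chunk size of 16, the loop issues batches of 16, 16 and 8
pages and succeeds; with a chunk size of 17 the first batch already fails. The
price is one call per batch, which is where the performance concern comes from.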
Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

--YnYdx3wnSYItcXOD
Content-Type: text/x-csrc; charset=us-ascii
Content-Disposition: attachment; filename="test_chunk_size.c"

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

/*
 * Query the NUMA node of chunk_size pages with a single move_pages() call.
 * Returns the number of pages with a negative status, or -1 on failure.
 */
int test_chunk_size(int chunk_size) {
    size_t page_size = sysconf(_SC_PAGESIZE);
    void *mem = malloc(page_size * chunk_size);
    if (!mem) return -1;

    /* Touch the memory so that every page is actually allocated. */
    memset(mem, 0xFF, page_size * chunk_size);

    void **ptrs = malloc(sizeof(void*) * chunk_size);
    int *status = malloc(sizeof(int) * chunk_size);
    if (!ptrs || !status) {
        free(mem);
        free(ptrs);
        free(status);
        return -1;
    }

    for (int j = 0; j < chunk_size; j++) {
        ptrs[j] = (char*)mem + (j * page_size);
        status[j] = -999;   /* sentinel: still negative if never updated */
    }

    /* nodes == NULL: only report the current node of each page. */
    long result = syscall(SYS_move_pages, 0, chunk_size, ptrs, NULL, status, 0);

    int errors = 0;
    if (result == 0) {
        for (int j = 0; j < chunk_size; j++) {
            if (status[j] < 0) errors++;
        }
    }

    free(mem);
    free(ptrs);
    free(status);

    return (result == 0) ? errors : -1;
}

int main(void) {
    int threshold = -1;

    // Test sizes from 1 to 40 pages
    for (int size = 1; size <= 40; size++) {
        int errors = test_chunk_size(size);

        if (errors == -1) {
            if (threshold == -1) threshold = size;
            break;
        } else if (errors == 0) {
            printf("%2d pages: SUCCESS (0 errors)\n", size);
        } else {
            printf("%2d pages: %d errors\n", size, errors);
            threshold = size;
            break;
        }
    }

    if (threshold > 0)
        printf("Threshold: %d pages\n", threshold);
    else
        printf("No threshold found in range 1-40 pages\n");

    return 0;
}

--YnYdx3wnSYItcXOD--