From: Tom Lane
To: David Mullineux
cc: Postgres General
Subject: Re: Why analyze reports 30000 pages and rows scanned. Why not just rows?
Comments: In-reply-to David Mullineux message dated "Tue, 19 Aug 2025 11:17:58 +0100"
Date: Tue, 19 Aug 2025 10:40:47 -0400
Message-ID: <576573.1755614447@sss.pgh.pa.us>

David Mullineux writes:
> But my question is, why does 'analyze verbose' report that it has scanned
> '30000 of NNNN pages, containing NNNN live rows and 0 dead rows; 30000 rows
> in sample,....'
> As most tables would store more than 1 row per page, I expected that 30000
> rows would require a lot fewer than 30000 *pages* to be scanned. Why is it
> saying it's scanned 30000 pages instead of only 30000 rows ?

If the table is sufficiently large, taking a sample of a single row
from each of 30000 different pages is the correct behavior. Taking
more than one row from each of a smaller set of pages would give a
nonrandom (because clumped) sample.

			regards, tom lane
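
The clumping effect can be seen in a toy simulation. This is only a sketch under stated assumptions, not PostgreSQL's sampling code: the table layout (values physically correlated with page number), the sizes, and every function name below are invented for illustration. It compares how much the sample mean wobbles when the same number of rows is drawn one per page versus as whole pages.

```python
import random
import statistics

def make_table(n_pages, rows_per_page, rng):
    # Hypothetical table: each value is close to its page number, so rows
    # on the same page resemble each other (physical correlation).
    return [[page + rng.random() for _ in range(rows_per_page)]
            for page in range(n_pages)]

def sample_one_per_page(table, n_rows, rng):
    # One random row from each of n_rows distinct, randomly chosen pages.
    pages = rng.sample(range(len(table)), n_rows)
    return [rng.choice(table[p]) for p in pages]

def sample_whole_pages(table, n_rows, rng):
    # Same number of rows, but taken as every row of a few random pages.
    rows_per_page = len(table[0])
    pages = rng.sample(range(len(table)), n_rows // rows_per_page)
    return [value for p in pages for value in table[p]]

def stddev_of_mean(sampler, table, n_rows, trials, rng):
    # Repeat the sampling and measure how much the estimated mean varies.
    means = [statistics.fmean(sampler(table, n_rows, rng))
             for _ in range(trials)]
    return statistics.pstdev(means)

rng = random.Random(42)
table = make_table(n_pages=2000, rows_per_page=50, rng=rng)

spread_err = stddev_of_mean(sample_one_per_page, table, 200, 200, rng)
clumped_err = stddev_of_mean(sample_whole_pages, table, 200, 200, rng)

print(f"one row per page: stddev of estimated mean = {spread_err:.1f}")
print(f"whole pages:      stddev of estimated mean = {clumped_err:.1f}")
```

In this made-up table the whole-page sample touches only 4 pages for its 200 rows, so its estimate swings far more from run to run than the one-row-per-page sample. How strong the effect is on a real table depends on how correlated the data is with its physical position.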