Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sbrYW-001zIr-IQ for pgsql-general@arkaria.postgresql.org; Thu, 08 Aug 2024 01:03:04 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sbrYV-00B0h0-5r for pgsql-general@arkaria.postgresql.org; Thu, 08 Aug 2024 01:03:03 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sbrYU-00B0gs-Qq for pgsql-general@lists.postgresql.org; Thu, 08 Aug 2024 01:03:02 +0000 Received: from mail-wr1-x42f.google.com ([2a00:1450:4864:20::42f]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1sbrYS-003iWg-9f for pgsql-general@lists.postgresql.org; Thu, 08 Aug 2024 01:03:02 +0000 Received: by mail-wr1-x42f.google.com with SMTP id ffacd0b85a97d-3684c05f7afso50878f8f.2 for ; Wed, 07 Aug 2024 18:02:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1723078979; x=1723683779; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=u8006GONge2M6q+kjPzcndDVmngnc82O2rM4JmBth/o=; b=Ow+AoXBYBbzXCOvBHhuZ62051dAG8ZHyebpll0EEtbeJcv8qVM0AveYCF09IWxOXst 3HA28U19SJeot7hGOjel0J+iC4vh6r5AorQKhAoY6iSRZxbdyQC4QABYEx8+nwlEfayd hrGpDTj1y9Wz9ZBDph+GCVQJZp6NoIu62MNqw1uzKcXT//jUPGFmUcUMyB2jIH80k8/R T1B/UdrQU9/MrnYGRAKTCso9vJFPe/uTuQcEVocxEeigwIDjO5TW7RBX/IrR56sEsWYo xpmbcpmgC4ZacdtV8Tio2U1uQMtY5ZqwUkXFCFJCrbGFJRrTuL6xPuWWWDcyRmx17Ql2 /3Ig== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1723078979; x=1723683779; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=u8006GONge2M6q+kjPzcndDVmngnc82O2rM4JmBth/o=; b=hhn/lR+7gREc3owkZjYyix2ipeATbgn4EVUW1l3NTCxuN0RqHS+LLsxzgi7WLqek/s hkTPUOIV4d5pnB04Q5clYENURFjuTSKEpqlPb1qPp0uDQ3rQyaShtdDtjPQbkPz5pwmq RBi0LABtqmb3QU4eCbpnbUAcS54FRYK3MdLsHrMRC/8Bf5OXVzbYNz5jwn5nzZS520fB /CMblhyir9jpraUWH3MgRW4PZ70gosbI5rEY+hpeoNpG3tEtkCqDUsZiJzu4UguHxLpw OkZns4Em7z4A4yhJ/W/w/DP2MY54k84fdoQPl+b5ESA4Bfcyr2d4fPclbZ2yYzzxagxN UPAg== X-Gm-Message-State: AOJu0YzVUJrMJhORhFARcnsrkYtq37iPlL/w4eR7F2PRFZbzpOlYSojL dTwCMgxPxtdLrHNpLaHZkc7Z0/cSquNwcvirbV8d3ePuzxJXBFaQm7QDcZHuby1IG0INpqsssB2 J8neXprCJs4fuMMpGx7NV50q5akc= X-Google-Smtp-Source: AGHT+IFo0Qcv6eNIfecxxYRSS2VJiNGt+UtCZ7c0xbGUtM/4uxXe6tx+rgak2alnmVH8gRa8maVynPB7Lr7uhp+f4ts= X-Received: by 2002:a5d:5f46:0:b0:368:4c5:12ec with SMTP id ffacd0b85a97d-36d2758302emr107118f8f.8.1723078978504; Wed, 07 Aug 2024 18:02:58 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Michael Harris Date: Thu, 8 Aug 2024 11:02:45 +1000 Message-ID: Subject: Re: ANALYZE on partitioned tables vs on individual partitions To: David Rowley Cc: pgsql-general@lists.postgresql.org Content-Type: text/plain; charset="UTF-8" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk > You could probably do some sampling of the pg_stat_progress_analyze > view to figure out what's taking the most time. I did another run, sampling the pg_stat_progress_analyze every 30s. For the first 4 minutes it was working on the partitioned table. After that it began analyzing all the partitions, which took approx 29 minutes. I think you are correct - an ONLY option for ANALYZE would be a huge benefit. In my use case, the autovacuum processes are keeping the partitions analyzed so there would seem to be little benefit to including them in the manual table level ANALYZE. Regards Mike On Wed, 7 Aug 2024 at 18:09, David Rowley wrote: > > On Wed, 7 Aug 2024 at 19:20, Michael Harris wrote: > > I found that running an ANALYZE specifying only those 4 columns only took > > 5 minutes, compared to the 30 minutes for the whole table. > > > > That was a bit of a surprise as I imagined actually reading the table would take > > most of the time and would be the same regardless of the number of columns > > being analyzed, but I guess that is wrong. > > ANALYZE does do sampling of the data in the table. It would only read > all of the rows for fairly small tables. The docs in [1] mention > this: > > "For large tables, ANALYZE takes a random sample of the table > contents, rather than examining every row. This allows even very large > tables to be analyzed in a small amount of time." > > I think the reason it's taking so long is not because of it performing > ANALYZE on the partitioned table which results in gathering statistics > for the partitioned table which means proportionately (based on the > size of the partition) sampling rows from each partition, it's more > likely due to the fact that each partition is also analysed and the > statistics for each of those is updated. There is no "ANALYZE ONLY" > command similar to "FROM ONLY" in SELECT queries. > > You could probably do some sampling of the pg_stat_progress_analyze > view to figure out what's taking the most time. If you find that the > majority of the time is spent analysing the partitions and not the > partitioned table then maybe we should expand ANALYZE to add the ONLY > option... > > David > > [1] https://www.postgresql.org/docs/current/sql-analyze.html > [2] https://www.postgresql.org/docs/current/progress-reporting.html