Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sbbkA-000A6i-Le for pgsql-general@arkaria.postgresql.org; Wed, 07 Aug 2024 08:10:02 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sbbk8-005bc3-NS for pgsql-general@arkaria.postgresql.org; Wed, 07 Aug 2024 08:10:00 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sbbk8-005bbv-CE for pgsql-general@lists.postgresql.org; Wed, 07 Aug 2024 08:10:00 +0000 Received: from mail-lf1-x12f.google.com ([2a00:1450:4864:20::12f]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1sbbk1-003b82-V5 for pgsql-general@lists.postgresql.org; Wed, 07 Aug 2024 08:09:59 +0000 Received: by mail-lf1-x12f.google.com with SMTP id 2adb3069b0e04-52f0277daa5so2095263e87.0 for ; Wed, 07 Aug 2024 01:09:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1723018192; x=1723622992; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=PB8TwYIauhxn3ceSjlEKtyoLFWfpsgk/imr5XH0UkGM=; b=ZTIAWUOGNF3Omy2gzXQsq5M/ssxdVJ8jQH/0d27/CIKgmoNdaR8a7WFfM+KcAXsCt4 l1AAI2J9l/ZIJTxfVupxfHRXsDrvMdF0lQAZpVK5KXbMuUlB/yxQ8cZa3Uv8z4jhk/aZ k0EsUy5DriHjXGPvYQR+ubMCKmf6aw8KbI15ATdosywI99Ya7/HBh962KRWe6oi+9ZnJ rgOkm2t7RXY1pjS6uAASMPWP1KYlPGvZe505HckGbRCptInoqJZHqbOgrTdlmbMSdLdg l9GW9NH0FR2vHxtauovFHuUxVzhYSUHuMdRwUvayv6v6Q36rFfrUMi4G87oWqoxFJBNQ JTfQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1723018192; x=1723622992; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=PB8TwYIauhxn3ceSjlEKtyoLFWfpsgk/imr5XH0UkGM=; b=XdyKbbOo+Qoj3LJUonXchA2EotRKuu++gZoZJEpjGZVLqAjFcTGy14jWqEATyEynqn HyCDh8CqnEiBGKPfHHp5MYfvlM/5kCopvBzKELrp9zXtP1a5tQnLYqOGD3U4FUUD40LY YX55a4XB15OR+iyctjDFBQcaD8x5Lvl8VzyyayOQqzprrfevDnWteyi7q7EbxBp9qjlJ AKwj6XhjM5TsCYvpMD4b0vEflL+gkmu6u09lHdfPyzS4BLWBSCQmEO2S6vzXx8mu/kRJ KnCquWyxQGTtGd2p4BTsYXmWGLVX1rGTLnD/9TTF+ZUY88yBUpviFZxBB1Tc45mKe9Zp KGbw== X-Gm-Message-State: AOJu0Yypn5BjIO2yG0Iu+n613IgSO4W851pE8YMpMCUD0xveunMCGb2y i+n/uIjqXt4ColVHXyrOVBkUDh5FAxcOklhCAiohfjREHISeusKext2tuYpLVCteLfh9/z38b6H psyaMXz7XYANCj2xkr4msrybGkN0= X-Google-Smtp-Source: AGHT+IFrppw85rS8koDa3S3ds7xfGqHgn7b6/fvP/SUzYxIAPhGQNXMf7/mWnCWG7OMc3QjvrY9RE9Mx7kSgrKnk8d8= X-Received: by 2002:a05:6512:1253:b0:52c:dd3d:85af with SMTP id 2adb3069b0e04-530bb379522mr12224524e87.25.1723018191808; Wed, 07 Aug 2024 01:09:51 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: David Rowley Date: Wed, 7 Aug 2024 20:09:39 +1200 Message-ID: Subject: Re: ANALYZE on partitioned tables vs on individual partitions To: Michael Harris Cc: pgsql-general@lists.postgresql.org, Christophe Pettus Content-Type: text/plain; charset="UTF-8" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Wed, 7 Aug 2024 at 19:20, Michael Harris wrote: > I found that running an ANALYZE specifying only those 4 columns only took > 5 minutes, compared to the 30 minutes for the whole table. > > That was a bit of a surprise as I imagined actually reading the table would take > most of the time and would be the same regardless of the number of columns > being analyzed, but I guess that is wrong. ANALYZE does do sampling of the data in the table. It would only read all of the rows for fairly small tables. The docs in [1] mention this: "For large tables, ANALYZE takes a random sample of the table contents, rather than examining every row. This allows even very large tables to be analyzed in a small amount of time." I think the reason it's taking so long is not because of it performing ANALYZE on the partitioned table which results in gathering statistics for the partitioned table which means proportionately (based on the size of the partition) sampling rows from each partition, it's more likely due to the fact that each partition is also analysed and the statistics for each of those is updated. There is no "ANALYZE ONLY" command similar to "FROM ONLY" in SELECT queries. You could probably do some sampling of the pg_stat_progress_analyze view to figure out what's taking the most time. If you find that the majority of the time is spent analysing the partitions and not the partitioned table then maybe we should expand ANALYZE to add the ONLY option... David [1] https://www.postgresql.org/docs/current/sql-analyze.html [2] https://www.postgresql.org/docs/current/progress-reporting.html