Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sbayK-0004cQ-G9 for pgsql-general@arkaria.postgresql.org; Wed, 07 Aug 2024 07:20:36 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sbayI-005DhB-V7 for pgsql-general@arkaria.postgresql.org; Wed, 07 Aug 2024 07:20:34 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sbayI-005Dgf-J7 for pgsql-general@lists.postgresql.org; Wed, 07 Aug 2024 07:20:34 +0000 Received: from mail-lf1-x131.google.com ([2a00:1450:4864:20::131]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1sbayG-003aoA-A0 for pgsql-general@lists.postgresql.org; Wed, 07 Aug 2024 07:20:34 +0000 Received: by mail-lf1-x131.google.com with SMTP id 2adb3069b0e04-52f008b40d7so232025e87.2 for ; Wed, 07 Aug 2024 00:20:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1723015231; x=1723620031; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=VQlqrA0WSog0XdI7q8PbfU6/wvnsLhWXyxPj4Z90rZI=; b=l2ZisiQypqXxlGoJ2y+CNpNDRi/fgKzLBMad5tom8AOYPszPsV+K/6UoiIbbSZVtXT RaYsEylp68AKCAdtCmNYwDT9pdXwL7LqvqBCdjVS3c7ZhA/YgeXG15Z1riR9wTmOSJTa sClHkA5ViiGyKZGWZ1whamaiVOfx/z786bPp22nJDhf8NpRqSHKFa/e77t2gHloYwApg Q68eUScH/XGz7rjp6rNmIdK7+urZXwhrJmrDRGd2j2M7s02igEY5qhcHZEFQD7ZnAgsx 43U80gm4JfXLVTpoU5Mbtb/LRF9RQW7xMtw6TaFY3UuaC1AdHlmMI3husNNTmNxSFSaq A5eA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1723015231; x=1723620031; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=VQlqrA0WSog0XdI7q8PbfU6/wvnsLhWXyxPj4Z90rZI=; b=bZFlZ8kYcMEc7Tin3FZh1pLUYSCgJ33C90ngbi9yL0+YT1l/LRucyCyrORnXCqd7Cd 2gS9pQPXFwBYmygGsBfDQte43ZCBPdBFFPs7FJE6qqFSYx2tuuurqorDLZDrjYapRKEh YWRjOJUUA2yFTNALOz3nzL+HBVAEly1VMjJIGmLVzc5rtzIQwDCKJ55Y0Y0kRvXl6gZt CHgkiOn+A2WyCqJ3yH+qm5oo7THCaz/XJ+GGjQvWusJ+sm6qIaPjAgeXmtqlNefAA4Zf 2YU1LX1B7U8gg4AKgo9ctulCfCfWcu5JPYOi7SCGNTkPDQTKgDkuOEcuXtoURRGn9A8y RVbg== X-Gm-Message-State: AOJu0YxEc/7ADWx5Khoz1BDq2kMCGT0kQFXGsVe3qYhKa9RjfOlFKKgj Vjj5+a0A7rpMlQEbCJ4zs8p72fKKtJuZAEQ+joVMXDHD22MrtmAeD0k9HI1uO2O6zhJuLeM8Yfi BlxQk5gR2K7/PA7JNCjcmMC0oEsI= X-Google-Smtp-Source: AGHT+IGKfzmrQlfj6BpWIP9BVUc4t/TpHnwxLU6218tAGEOtlsyRJx5rJfXa72ZS8sd/CB6cikrLZ6mtfaif4EuPZN4= X-Received: by 2002:a05:6512:3da4:b0:52f:c438:883c with SMTP id 2adb3069b0e04-530bb362e75mr6697370e87.1.1723015230469; Wed, 07 Aug 2024 00:20:30 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Michael Harris Date: Wed, 7 Aug 2024 17:20:19 +1000 Message-ID: Subject: Re: ANALYZE on partitioned tables vs on individual partitions To: David Rowley Cc: pgsql-general@lists.postgresql.org, Christophe Pettus Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Many thanks David for the comprehensive response. > I think the complaint was about no autovacuum on the partitioned > table, not the partitions. Yes, exactly. One other piece of information: these tables contain a lot of columns, of w= hich only 4 are normally used for WHERE clauses or joins. The table I was experimenting with has 150 columns, 156026832 rows and occupies 166GB. I found that running an ANALYZE specifying only those 4 columns only took 5 minutes, compared to the 30 minutes for the whole table. That was a bit of a surprise as I imagined actually reading the table would= take most of the time and would be the same regardless of the number of columns being analyzed, but I guess that is wrong. Regards, Mike On Wed, 7 Aug 2024 at 15:23, David Rowley wrote: > > On Wed, 7 Aug 2024 at 16:44, Christophe Pettus wrote: > > Child partitions should be autovacuumed and autoanalyzed just like any = other table; they are not prohibited from autovacuum in any way by default.= It's probably a good idea to investigate why they are not being picked up= by autovacuum. If they are created by a bulk load process, it's not a bad= idea to do a VACUUM ANALYZE on them once the bulk load is complete. > > I think the complaint was about no autovacuum on the partitioned > table, not the partitions. This is expected as we don't track the > counters (in particular n_mod_since_analyze) shown in > pg_stat_all_tables at the partitioned table level, so the trigger > points that normally cause autovacuum to analyze or vacuum a table > just won't be triggered for a partitioned table. For VACUUM, that's > fine as, as you mentioned, no rows are stored. But for analyze, that > does present a problem. > > To name the aspects of planning that rely on statistics of the > partitioned table, basically anything above the Append or MergeAppend > which joins the partitioned results together. So that doesn't include > the scans of each partition and any quals that are pushed down to the > scan level as those are able to use the partition level statistics. > However, it does include things like joins, group by, distinct as > those require n_distinct estimates for the partitioned table. It's not > all bad though as the row estimates for each individual partition will > be totalled up through the Append / MergeAppend simply by adding up > the row estimates for each Append / MergeAppend child plan. So, it's > really only an estimation problem for any node that comes after a join > node or a group by node as the output rows for those nodes will depend > on a good n_distinct estimate for the partitioned table. > > Partition-wise joins and aggregates do change things a bit as those > features do permit moving those operations below the Append / Merge > Append, in which case the statistics for the individual partition can > be used. > > You could consider manually setting the n_distinct_inherited estimates > for the columns that you join on or group by in the partitioned table. > You might find that you're able to choose a suitable value for that if > you review the documentation for that setting. In particular, please > review what is mentioned about using negative numbers for that > setting. You may be able to choose a value that scales correctly with > the row estimate that doesn't get outdated as you add more rows to the > partitions. You'll need to determine that based on the data you're > storing. > > David