Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1v4F0S-00B7Vh-Sd for pgsql-hackers@arkaria.postgresql.org; Thu, 02 Oct 2025 08:49:45 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1v4F0P-006kja-Sy for pgsql-hackers@arkaria.postgresql.org; Thu, 02 Oct 2025 08:49:42 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1v4F0P-006kjQ-EI for pgsql-hackers@lists.postgresql.org; Thu, 02 Oct 2025 08:49:42 +0000 Received: from mail-yw1-x112e.google.com ([2607:f8b0:4864:20::112e]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1v4F0M-0010gw-1u for pgsql-hackers@lists.postgresql.org; Thu, 02 Oct 2025 08:49:41 +0000 Received: by mail-yw1-x112e.google.com with SMTP id 00721157ae682-723ad237d1eso8599847b3.1 for ; Thu, 02 Oct 2025 01:49:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1759394978; x=1759999778; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=o0UbUKTmgiu2gZkieOYD3V61ZFfnvtQF8Uel/VZ/RWM=; b=gYyMJyfGO8Q74S+GYhPrZsxQn4DejMaYVWNDF/R90qFq82nAiJ5MZ+K2R672Y65Lvl IlduQiDzDHDXDSwnANlnK9yANAhP5sedosJeTeXnjlpzOmaWCf2C6mMctXxt3guDJ2nt Yec4eeErGIvKnFQtz1VTysxQSyQ4vLk8l7eWeUSJweOPIuVzYaTSxFXD5yuSr4Neg/Ni 0ct6w0MOuUs68uNOhnCCIGsWqQhFlDTQ5wn9/q4TiY3tf2EDZVT3AiL+lEyBO3AdeNiF 1zFkbNb660W62OPr9dcmy15sHOn/dVN7d5li8jvNXPZXNO8iiPu4lJ5HZDa/yu/S5FNy Vk5Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1759394978; x=1759999778; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=o0UbUKTmgiu2gZkieOYD3V61ZFfnvtQF8Uel/VZ/RWM=; b=B6Jy50AiAkjT4raHSzSULsWlUT09gAiDtz7T7yZhFoZgV4NvwvA00suZ8AZIdH8GP2 hucmT70XPGEwuPS/Nl9IINbzjqvSzT7HJvWrLPIIKboKV7kxr78cmZF5NjkYXxrwJo3r st88yAHGdy54myJbmjWpKmu9zXB1UQCtFdf46ilXI4IEJIoDiDIAlfbskrqZUPKfV3h3 haWBI9NRUqkfecwe46iBb46ct+1XR+kuvKvxBz1KwxiU5zWd4XHgYBR3ngdfGZP7eenG QFwD9trzX7qE1uI0dsIEOmFVQf7u4D3OcW3VmWPYMrYgQoxnyDNheXMFCBzti9dl68ME XH8A== X-Forwarded-Encrypted: i=1; AJvYcCXbS5xO/GODAZG69JQEG5R2+x8T2QeSgOpXrTq5TvxJhThjzHGrmiC9q825zA8ccdEDJuERyKThYN38mVPF@lists.postgresql.org X-Gm-Message-State: AOJu0Yxwq5w15BaX9GK4lIV9XOIwD2Gzz7rqVV7432EoQedcbeGXvyzA 7ZE6SKXVd01dc1+8YTEm39/tALMFmvSSUg/4blBfqEShY5HXVYZ1rjQCmVBSQI/bvkWCDSIgQvX sFAqQj4Y9j6pw0LsS+c18B5Q9f3DhO5g= X-Gm-Gg: ASbGncuF0eRUMO3cgidw8QzMnM7S6Hzt9KGc7odAtcmsOqjOh8Jd1tbE5ZiLUtPsGlv iAr5IChLAKY7LK58O+Ff3dc3H09qjAnzc4JS6cuJ+hwpWKXmS6zFvMIdhZiLMJqhj3p3RMB08/b yZw7MGDgHDqv5d9BtfA6qncA/NgJZvs3plv9/17GClHBfeZt7YR/XKE4pc7JnKOtViZ5Fpr8URD qWy79g9avEtRVkXM9xtslBxJaYPCwRfYQ== X-Google-Smtp-Source: AGHT+IFJ8rjSuSG27CcWOOIdWivEyQH6J/gmXw9KmCC5zxIOAns+Q+S1px7EiTXIfOKlPAnxMhAychGS0w2YwohVFqE= X-Received: by 2002:a05:690e:28f:b0:633:a145:68f7 with SMTP id 956f58d0204a3-63b6fe8b290mr6375935d50.4.1759394978133; Thu, 02 Oct 2025 01:49:38 -0700 (PDT) MIME-Version: 1.0 References: <87il22cj51.fsf@163.com> In-Reply-To: From: Richard Guo Date: Thu, 2 Oct 2025 17:49:27 +0900 X-Gm-Features: AS18NWB--sy8KqDJlx9tXaLV96MNgO_4NZelxRLfATc8uPYR92wfo4qfM1if2A4 Message-ID: Subject: Re: Eager aggregation, take 3 To: Matheus Alcantara Cc: Robert Haas , Tom Lane , Tender Wang , Paul George , Andy Fan , PostgreSQL-development , pgsql-hackers@lists.postgresql.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Thu, Oct 2, 2025 at 10:39=E2=80=AFAM Richard Guo wrote: > It seems eager aggregation doesn't cope well with parallel plans for > this query. Looking into it. It turns out that this is not related to parallel plans but rather to poor size estimates. Looking at query 31, it involves joining 6 base relations, all of which are CTE references (i.e., RTE_CTE relations) to two different CTEs. Each CTE involves aggregations and GROUP BY clauses. Unfortunately, our size estimates for CTE relations are quite poor, especially when the CTE uses GROUP BY. In these cases, we don't have any ANALYZE statistics available (cf. examine_simple_variable). As a result, when computing the selectivity of the CTE relation's qual clauses, we have to fall back on default values. For example, for quals like "CTE.var =3D const", which are used a lot in query 31, the selectivity is computed as "1.0 / DEFAULT_NUM_DISTINCT(200)", with the assumption that there are DEFAULT_NUM_DISTINCT distinct values in the relation, and that these values are equally common (cf. var_eq_const). The consequence is that the size estimates are significantly different from the actual values. For example, from the EXPLAIN(ANALYZE) output provided by Matheus: -> CTE Scan on ws ws3 (cost=3D0.00..1797.35 rows=3D2 width=3D110) (actual time=3D0.001..74.725 rows=3D1261.00 loops=3D1) Filter: ((d_year =3D 1999) AND (d_qoy =3D 3)) Interestingly, with eager aggregation applied, the row count estimates for the two CTE plans actually become closer to the actual values. -- without eager aggregation CTE ws -> HashAggregate (cost=3D96009.03..114825.35 rows=3D718952 width=3D54) (actual time=3D977.215..1014.889 rows=3D23320.00 loops=3D1) -- with eager aggregation CTE ws -> Finalize GroupAggregate (cost=3D52144.19..62314.79 rows=3D71894 widt= h=3D54) (actual time=3D275.121..340.107 rows=3D23312.00 l= oops=3D1) However, due to the highly underestimated selectivity for the qual clauses, the row count estimates for CTE Scan nodes become worse. This is because: -- without eager aggregation 718952 * (1.0/200) * (1.0/200) ~=3D 18 -- with eager aggregation 71894 * (1.0/200) * (1.0/200) ~=3D 2 ... while the actual row count is 1261.00 as shown above. That is to say, on master, the CTE plan rows are overestimated while the selectivity estimates are severely underestimated. With eager aggregation, the CTE plan rows become closer to the actual values, but the selectivity estimates remain equally underestimated. As a result, the row count estimates for the CTE Scan nodes worsen with eager aggregation. This causes the join order in the final plan to change when eager aggregation is applied, leading to longer execution times in this case. Another point to note is that, due to severely underestimated selectivity estimates (0.000025, sometimes 0.000000125), the size estimates for the CTE relations are very small, causing the planner to tend to choose nestloops. I tried manually disabling nestloop, and here are what I got for query 31. -- on master, set enable_nestloop to on; Planning Time: 4.613 ms Execution Time: 7142.090 ms -- on master, set enable_nestloop to off; Planning Time: 4.315 ms Execution Time: 2262.330 ms -- on patched, set enable_nestloop to off; Planning Time: 4.321 ms Execution Time: 1214.376 ms That is, on master, simply disabling nestloop makes query 31 run more than 3 times faster. Enabling eager aggregation on top of that improves performance further, making it run 1.86 times faster relative to the nested-loop-disabled baseline. I manually disabled nested loops for other TPC-DS queries on master and discovered some additional interesting findings. For query 4, on master: -- set enable_nestloop to on Planning Time: 3.054 ms Execution Time: 3231356.258 ms -- set enable_nestloop to off Planning Time: 4.291 ms Execution Time: 12751.170 ms That is, on master, simply disabling nestloop makes query 4 run more than 253 times faster. For query 11, on master: -- set enable_nestloop to on Planning Time: 1.435 ms Execution Time: 1824860.937 ms -- set enable_nestloop to off Planning Time: 2.479 ms Execution Time: 7984.360 ms Disabling nestloop makes query 11 run more than 228 times faster. I believe you can find more such queries in TPC-DS if you keep looking. Given this, I don't think it makes much sense to debug a performance regression on TPC-DS with nestloop enabled. Matheus, I wonder if you could help run TPC-DS again with this patch, this time with nested loops disabled for all queries. - Richard