Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sjenS-002fd7-CV for pgsql-hackers@arkaria.postgresql.org; Thu, 29 Aug 2024 13:02:42 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sjenQ-001z3g-CA for pgsql-hackers@arkaria.postgresql.org; Thu, 29 Aug 2024 13:02:40 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sjenQ-001z3Y-2V for pgsql-hackers@lists.postgresql.org; Thu, 29 Aug 2024 13:02:40 +0000 Received: from mail-ej1-x631.google.com ([2a00:1450:4864:20::631]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1sjenM-0025ET-Nw for pgsql-hackers@postgresql.org; Thu, 29 Aug 2024 13:02:40 +0000 Received: by mail-ej1-x631.google.com with SMTP id a640c23a62f3a-a868831216cso77625166b.3 for ; Thu, 29 Aug 2024 06:02:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1724936557; x=1725541357; darn=postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=VR5eS4F5rGUO2l8W+TJlY33dZ13CRbSw/PmtU/DgvPY=; b=K7QQtDEOa500otxSEt5zg5phH1TGWUuIdw6hxjBfYIdjSVex8O6oRgQ94f8DU+KqJw 9sShw8Ir3ygRbCKlkdjzQdAjtyOA/0InuY+xyvS9OlwpOiMIXf2bHb9xIT3qQ+ZW2EV2 mI41tzldabGMpD8gsjdEWIXzLNWaUc3QaC6rejZIyfwSy/VxFTAy5Bu+CRW1ghPoCv7R CZR+TQzc0DrLT2lP9OdGpYJ5BVIQhH1aKn2woHzTIIAgEco5OH94D9KLl5nV8AyUWk2f bQ1pAH6tpQccSrNglj/f9Wdqvqw2fbTSnMtOXaP24xuFDvIyC+4oL/89cCpWDrnUn9iX ru/Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1724936557; x=1725541357; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=VR5eS4F5rGUO2l8W+TJlY33dZ13CRbSw/PmtU/DgvPY=; b=WtIgIMP0GnZIe76CPOnJOMdWc59TwxYYJR7x+1y1QgBwdEpJOU/+6dCJTicZOUhB+p Q5laFKs6mMRxXEUETcBSAbkZtizxVOr4BOKblSMQybCyS/BldkXmpYZkoZcQo+Ul2+Qy SpZugLjQVYpwEZOm4ddJMWbJGyjNTycev37sGiByMw9GLKthtFsN3Tn965ckLDfpYwjY VaCIr9LWBqnkKD1hYWlkj5yeF4dT/tXcg+wLS3ZR8J2RBYfaPK0Ni5KQ4bH9KpJgJpEd XJpfOK+zUnHr/jEZeJH7X1W7fktnrxqZCt0rKkbl95wwFOneUZsuBMpUj2eKGAKp5qJv ArxQ== X-Forwarded-Encrypted: i=1; AJvYcCUK4VGh+P3BT9+vnYI4L4ECoECUU39eydQzDA6BdGlCkujzPJVkVBCLxFXAaDN5SeRye8+7iuhbJrALXws5@postgresql.org X-Gm-Message-State: AOJu0YwaCk9ji9VOJwIwRiHAO/g5SYEc4OtTaEyE7ni3om5M7/yHVDdL Jqb7TlglMHSRrb8RckxhG1K50jhgiFIzyS/Bjfuu8RIP2Qjo0FKW11KXQBzuJWeA4ZaEqTwFbK9 KoBJCt1Gc5fkoXzW2XzhBQGUjIQeNePdM X-Google-Smtp-Source: AGHT+IGkH0y8E/0ns3YloX53Np15mSMYQmrvz5NXUTFj2GRcfDtPjSiTuamnlOc8mtpG5eHfkvecCu/eUxaCW5pCOCM= X-Received: by 2002:a05:6402:518b:b0:5be:fa5f:565 with SMTP id 4fb4d7f45d1cf-5c21ed47916mr2538393a12.13.1724936543132; Thu, 29 Aug 2024 06:02:23 -0700 (PDT) MIME-Version: 1.0 References: <87il22cj51.fsf@163.com> In-Reply-To: From: Robert Haas Date: Thu, 29 Aug 2024 09:02:10 -0400 Message-ID: Subject: Re: Eager aggregation, take 3 To: Tender Wang Cc: Richard Guo , Paul George , Andy Fan , PostgreSQL-development , pgsql-hackers@lists.postgresql.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Wed, Aug 28, 2024 at 11:38=E2=80=AFPM Tender Wang w= rote: > I upload EXPLAIN(COSTS ON, ANALYZE) test to [1]. > I ran the same query three times, and I chose the third time result. > You can check 19_off_explain.out and 19_on_explain.out. So, in 19_off_explain.out, we got this: -> Finalize GroupAggregate (cost=3D666986.48..667015.35 rows=3D187 width=3D142) (actual time=3D272.649..334.318 rows=3D900 loops=3D= 1) -> Gather Merge (cost=3D666986.48..667010.21 rows=3D187 width=3D142) (actual time=3D272.644..333.847 rows=3D901 loops=3D1) -> Partial GroupAggregate (cost=3D665986.46..665988.60 rows=3D78 width=3D142) (actual time=3D266.379..267.476 rows=3D300 loops=3D3) -> Sort (cost=3D665986.46..665986.65 rows=3D78 width=3D116) (actual time=3D266.367..266.583 rows=3D5081 loops=3D= 3) And in 19_on_explan.out, we got this: -> Finalize GroupAggregate (cost=3D666987.03..666989.77 rows=3D19 width=3D142) (actual time=3D285.018..357.374 rows=3D900 loops=3D1= ) -> Gather Merge (cost=3D666987.03..666989.25 rows=3D19 width=3D142) (actual time=3D285.000..352.793 rows=3D15242 loops=3D1) -> Sort (cost=3D665987.01..665987.03 rows=3D8 width=3D142) (actual time=3D273.391..273.580 rows=3D5081 loops=3D3) -> Nested Loop (cost=3D665918.00..665986.89 rows=3D8 width=3D142) (actual time=3D252.667..269.719 rows=3D5081 loops=3D3= ) -> Nested Loop (cost=3D665917.85..665985.43 rows=3D8 width=3D157) (actual time=3D252.656..264.755 rows=3D5413 loops=3D3) -> Partial GroupAggregate (cost=3D665917.43..665920.10 rows=3D82 width=3D150) (actual time=3D252.643..255.627 rows=3D5413 loops=3D3) -> Sort (cost=3D665917.43..665917.64 rows=3D82 width=3D124) (actual time=3D252.636..252.927 rows=3D5413 loops=3D3) So, the patch was expected to cause the number of rows passing through the Gather Merge to decrease from 197 to 19, but actually caused the number of rows passing through the Gather Merge to increase from 901 to 15242. When the PartialAggregate was positioned at the top of the join tree, it reduced the number of rows from 5081 to 300; but when it was pushed down below two joins, it didn't reduce the row count at all, and the subsequent two joins reduced it by less than 10%. Now, you could complain about the fact that the Parallel Hash Join isn't well-estimated here, but my question is: why does the planner think that the PartialAggregate should go specifically here? In both plans, the PartialAggregate isn't expected to change the row count. And if that is true, then it's going to be cheapest to do it at the point where the joins have reduced the row count to the minimum value. Here, that would be at the top of the plan tree, where we have only 5081 estimated rows, but instead, the patch chooses to do it as soon as we have all of the grouping columns, when we. still have 5413 rows. I don't understand why that path wins on cost, unless it's just that the paths compare fuzzily the same, in which case it kind of goes to my earlier point about not really having the statistics to know which way is actually going to be better. --=20 Robert Haas EDB: http://www.enterprisedb.com