Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sjUra-001FQl-Av for pgsql-hackers@arkaria.postgresql.org; Thu, 29 Aug 2024 02:26:18 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sjUrX-00DM3p-Pl for pgsql-hackers@arkaria.postgresql.org; Thu, 29 Aug 2024 02:26:16 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sjUrX-00DM3g-CJ for pgsql-hackers@lists.postgresql.org; Thu, 29 Aug 2024 02:26:16 +0000 Received: from mail-yw1-x1129.google.com ([2607:f8b0:4864:20::1129]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1sjUrV-001wSn-55 for pgsql-hackers@lists.postgresql.org; Thu, 29 Aug 2024 02:26:14 +0000 Received: by mail-yw1-x1129.google.com with SMTP id 00721157ae682-6c91f9fb0acso1966927b3.2 for ; Wed, 28 Aug 2024 19:26:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1724898372; x=1725503172; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=lL2lBo0dHbnzqTwhCaQTQNPH7RbHrRXegHlxYa5BBSc=; b=NaJPLeHFUWi4yO9d/Mg0q3eM7rP5majnqxIWAFYEth9MO8yLQ+xqiMIDaEP7h/aSNF 4H4MNowdJ0xhSYqMZXpKaj8aCc/Xf8QWqHUgCGSLqyu6EmZHekMDu3bZ/e8wShpgO+ai 77Dk9jUFHROojT+kGdHgEtcNBZrNLDKTw4gtvMKFof+6oQjRxm+/ggo//L2DiVWNycOK FAw0dYidjPA7fujD7sj4nqowY+uNl3s5yF1WrOqV39oS1ZAmIDry2RvEo9gS1QLLIVhz 1aebNkK1QjAHXb/qKYtcLcNUeNl7rcowFgxBvSwsD3Xc+T5L+OY84we04NxDJWKqYyjZ U6AA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1724898372; x=1725503172; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=lL2lBo0dHbnzqTwhCaQTQNPH7RbHrRXegHlxYa5BBSc=; b=s8Q9/Oit8FrD1tErROopvlQGWDAN26xIDavgGwlrb+UCX+qfmwn3jFytYrTj0yRumX QFzgfAbBm54Cy86svmir7ZnTG4NNIoep+qwd7zjtB3R3L8liUefvarQ8qt0Rg7kxzvm5 wFMweBW/rIgK01MX6isVJMwq3vUWirxSWjrN3neAav2N+Iw6tmHMAAwkYp9Zd8uDWBD0 wlRiVDQ9Y6hnafGLsIZBS3x36nojplvQKlaqCb1plwTpg3vsRTZ7ZF8QF8H8GQCQ1Z+h m1N2o4dkg1WjFetSRJc2Mpj2Ow9rmAQkuSMmQYOudVmAH/usB7JfuBn81cyNyvSE+I4a g/GQ== X-Forwarded-Encrypted: i=1; AJvYcCVtwRBBnQpH8cpm4BeVG+OpxYPweg/JNW81I5HqVktPDnF4ngCSUq9XDEuFYmSMJia9O/QP4UEi/YkTkrP2@lists.postgresql.org X-Gm-Message-State: AOJu0YzVzrMeUOq4McJhHc2cx1DWCC5LVPGYxcfmXU0N4Jg7b2KrxG4B dNdnhfxiFNigyGB153XW9iUOIG0em2RMi6lDlYLTg3PSV5EO6/IQFsv5YkwDWKW9adGFdensvgO KFjCoqGPq3DbJrG75YDzQo0ikfL8= X-Google-Smtp-Source: AGHT+IEGsTixVW/rCgkVWQaQGQyKSEjiUneo2cNQFsZztJlu+72I9xwkwzptl4mr6EwZVUe531pq53w4zlLbjYaIkcU= X-Received: by 2002:a05:690c:4d06:b0:6ad:219a:b687 with SMTP id 00721157ae682-6d27823f08amr15405067b3.39.1724898372291; Wed, 28 Aug 2024 19:26:12 -0700 (PDT) MIME-Version: 1.0 References: <87il22cj51.fsf@163.com> In-Reply-To: From: Richard Guo Date: Thu, 29 Aug 2024 10:26:01 +0800 Message-ID: Subject: Re: Eager aggregation, take 3 To: Robert Haas Cc: Paul George , Andy Fan , PostgreSQL-development , pgsql-hackers@lists.postgresql.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Fri, Aug 23, 2024 at 11:59=E2=80=AFPM Robert Haas wrote: > Here are some initial, high-level thoughts about this patch set. Thank you for your review and feedback! It helps a lot in moving this work forward. > 1. As far as I can see, there's no real performance testing on this > thread. I expect that it's possible to show an arbitrarily large gain > for the patch by finding a case where partial aggregation is way > better than anything we currently know, but that's not very > interesting. What I think would be useful to do is find a corpus of > existing queries on an existing data set and try them with and without > the patch and see which query plans change and whether they're > actually better. For example, maybe TPC-H or the subset of TPC-DS that > we can actually run would be a useful starting point. One could then > also measure how much the planning time increases with the patch to > get a sense of what the overhead of enabling this feature would be. > Even if it's disabled by default, people aren't going to want to > enable it if it causes planning times to become much longer on many > queries for which there is no benefit. Right. I haven=E2=80=99t had time to run any benchmarks yet, but that is something I need to do. > 2. I think there might be techniques we could use to limit planning > effort at an earlier stage when the approach doesn't appear promising. > For example, if the proposed grouping column is already unique, the > exercise is pointless (I think). Ideally we'd like to detect that > without even creating the grouped_rel. But the proposed grouping > column might also be *mostly* unique. For example, consider a table > with a million rows and a column 500,000 distinct values. I suspect it > will be difficult for partial aggregation to work out to a win in a > case like this, because I think that the cost of performing the > partial aggregation will not reduce the cost either of the final > aggregation or of the intervening join steps by enough to compensate. > It would be best to find a way to avoid generating a lot of rels and > paths in cases where there's really not much hope of a win. > > One could, perhaps, imagine going further with this by postponing > eager aggregation planning until after regular paths have been built, > so that we have good cardinality estimates. Suppose the query joins a > single fact table to a series of dimension tables. The final plan thus > uses the fact table as the driving table and joins to the dimension > tables one by one. Do we really need to consider partial aggregation > at every level? Perhaps just where there's been a significant row > count reduction since the last time we tried it, but at the next level > the row count will increase again? > > Maybe there are other heuristics we could use in addition or instead. Yeah, one of my concerns with this work is that it can use significantly more CPU time and memory during planning once enabled. It would be great if we have some efficient heuristics to limit the effort. I'll work on that next and see what happens. > 3. In general, we are quite bad at estimating what will happen to the > row count after an aggregation, and we have no real idea what the > distribution of values will be. That might be a problem for this > patch, because it seems like the decisions we will make about where to > perform the partial aggregation might end up being quite random. At > the top of the join tree, I'll need to compare directly aggregating > the best join path with various paths that involve a finalize > aggregation step at the top and a partial aggregation step further > down. But my cost estimates and row counts for the partial aggregate > steps seem like they will often be quite poor, which means that the > plans that use those partial aggregate steps might also be quite poor. > Even if they're not, I fear that comparing the cost of those > PartialAggregate-Join(s)-FinalizeAggregate paths to the direct > Aggregate path will look too much like comparing random numbers. We > need to know whether the combination of the FinalizeAggregate step and > the PartialAggregate step will be more or less expensive than a plain > old Aggregate, but how can we tell that if we don't have accurate > cardinality estimates? Yeah, I'm concerned about this too. In addition to the inaccuracies in aggregation estimates, our estimates for joins are sometimes not very accurate either. All this are likely to result in regressions with eager aggregation in some cases. Currently I don't have a good answer to this problem. Maybe we can run some benchmarks first and investigate the regressions discovered on a case-by-case basis to better understand the specific issues. Thanks Richard