Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1t6kcU-00FH7Q-Gb for pgsql-hackers@arkaria.postgresql.org; Fri, 01 Nov 2024 05:54:50 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1t6kcR-009juD-Qh for pgsql-hackers@arkaria.postgresql.org; Fri, 01 Nov 2024 05:54:48 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1t6kcR-009jtx-CN for pgsql-hackers@lists.postgresql.org; Fri, 01 Nov 2024 05:54:47 +0000 Received: from mail-yb1-xb2d.google.com ([2607:f8b0:4864:20::b2d]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1t6kcK-003uFn-JL for pgsql-hackers@lists.postgresql.org; Fri, 01 Nov 2024 05:54:46 +0000 Received: by mail-yb1-xb2d.google.com with SMTP id 3f1490d57ef6-e30d212b6b1so1544411276.0 for ; Thu, 31 Oct 2024 22:54:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1730440479; x=1731045279; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=CTXA38G3GjMlokfG5F773Re0fzWRBtsU401+4ykIuUk=; b=Bw4dWwRQfZ6zJwtn4pBS7mYlWMzTenAbhrCvCnQhzu2uK9G8QU2mm9fQU6lAhnQ3g6 Uh6EFKvM711IngMNp0HGb/LNylF2eNdrMQoAlBfbUQJ2L0F7lzSdKgQIeOAeV1Ddq489 FP/MOb+FFBFY0w6Ubcnt+K0QW6uLQGStw9vuDj9by9R7pby79sc7wEbO5R+Z+bHgml/n bGHQ/VqzF7U83wZFmnUiBfUB8mY5iFVEa99heVOFQFKVjafshVoTeIngvLUlPSpatwdZ ypsJG2mSwdG/eWMxTIFatkpRB9iL+vgE3TPSnLF/vR4sGnuzdGgk4Jc960orPiyjAsuu r+MA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1730440479; x=1731045279; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=CTXA38G3GjMlokfG5F773Re0fzWRBtsU401+4ykIuUk=; b=VqzMX6NWtLr3Mu1eBoM5dDUMRnW+mBYdDZaD/Aj5CrZvyOrW28sJgzZqGlHwotg9xn JAxbMSiPgykpgaoUXeEObYct18RacAtSDpenS3egLbx2FHf93GaZcjsWRziIcF2UhiSZ 3FLiDvNH9pnsy/QvAVXOsXcVP/iWCwCAPef+l9W9Rj67lisYf6gbkBxgR3tVjpURPlHG oYN7xr7u15q9x45Fpm0hGz333SpzVel3y2Jw1w5Nlf635/BFArBczQIc07ksopqtgmAS 2USWE8iiC5vlJUbUFiTIRKPHso5lpldvUn6RiZJm70QCKOYgpySVcKZS9oFKq89YUBXM 6lZw== X-Forwarded-Encrypted: i=1; AJvYcCXbK5vpzYBGxy9S4z2up6fw78g6ZKKe0gXIiipFPGzyZz8rcwOsGhDvczAo0U/g076rBBWD69YQvC/ZZw+F@lists.postgresql.org X-Gm-Message-State: AOJu0Yy/B60nJpvTHNcmXH573rUP4/oA0x6NYVGhQWm/F+mvbUoReCrP zeWTbLDBxZyzrqg5DvKL/QC5CPN27N5ASRbaKAchSLNvfzYgZk9D74dF/Gu96V4H4cW5Gt8rHUR dX3f8ef37LYBLsrt+MqjPH0UhqZg= X-Google-Smtp-Source: AGHT+IEslkT1zcyVlUT80EGYGFQuYPutsQDbLpsA7FXH61uqvdxyAv/mf9MCAHtfql5VGQouEh9rPeg0L1oEIP9BUDo= X-Received: by 2002:a05:690c:62c6:b0:6e7:e22b:fb80 with SMTP id 00721157ae682-6e9d89948e2mr254064767b3.19.1730440479072; Thu, 31 Oct 2024 22:54:39 -0700 (PDT) MIME-Version: 1.0 References: <87il22cj51.fsf@163.com> In-Reply-To: From: Richard Guo Date: Fri, 1 Nov 2024 14:54:28 +0900 Message-ID: Subject: Re: Eager aggregation, take 3 To: Robert Haas Cc: Tender Wang , Paul George , Andy Fan , PostgreSQL-development , pgsql-hackers@lists.postgresql.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Wed, Oct 30, 2024 at 5:06=E2=80=AFAM Robert Haas = wrote: > On Tue, Sep 24, 2024 at 11:20=E2=80=AFPM Richard Guo wrote: > > The reason is that it is very tricky to set the size estimates for a > > grouped join relation. For a non-grouped join relation, we know that > > all its paths have the same rowcount estimate (well, in theory). But > > this is not true for a grouped join relation. Suppose we have a > > grouped join relation for t1/t2 join. There might be two paths for > > it: > > What exactly do you mean by "well, in theory" here? My understanding > of how things work today is that every relation is supposed to produce > a specific set of rows and every unparameterized path must produce > that set of rows. The order of the rows may vary but the set of rows > may not. With your proposed design here, that's no longer true. > Instead, what's promised is that the row sets will become equivalent > after a later FinalizeAggregate step. In a sense, this is like > parameterization or partial paths. Yeah, you're correct that currently each relation is expected to produce the same specific set of rows. When I say "well, in theory" I mean that for a join relation, all its unparameterized paths should theoretically have the same row count estimate. However, in practice, because there are more than one way to make a joinrel for more than two base relations, and the selectivity estimation routines don=E2=80=99t handle all cases equally well, we might get different row count estimates depending on the pair provided. And yes, with the grouped relations proposed in this patch, the situation changes. For a grouped join relation, different paths can produce very different row sets, depending on where the partial aggregation is placed in the path tree. This is also why we need to recalculate the row count estimate for a grouped join path using its outer and inner paths provided, rather than using path->parent->rows directly. This is very like the parameterized path case. > I think what you're doing here is roughly equivalent to either of > these two cases. It's more like the parameterized path case. Instead > of having a path for a relation which is parameterized by some input > parameter, you have a path for a relation, say bar, which is partially > aggregated by some grouping column. But there's no guarantee of how > much partial aggregation has been done. In your example, PartialAgg(t1 > JOIN t2) is "more aggregated" than t1 JOIN PartialAgg(t2), so the row > counts are different. This makes me quite nervous. You can't compare a > parameterized path to an unparameterized path, but you can compare it > to another parameterized path if the parameterizations are the same. > You can't compare a partial path to a non-partial path, but you can > compare partial paths to each other. But with this work, > unparameterized, non-partial paths in the same RelOptInfo don't seem > like they are truly comparable. Maybe that's OK, but I'm not sure that > it isn't going to break other things. Perhaps we could introduce a GroupPathInfo into the Path structure to store common information for a grouped path, such as the location of the partial aggregation (which could be the relids of the relation on top of which we place the partial aggregation) and the estimated rowcount for this grouped path, similar to how ParamPathInfo functions for parameterized paths. Then we should be able to compare the grouped paths of the same location apples to apples. I haven=E2=80=99t thought this through in detail yet, though. > It's tempting to wonder if there's some way that we can avoid > generating paths for both PartialAgg(t1 JOIN t2) and t1 JOIN > PartialAgg(t2). Either the former has lower cardinality, or the latter > does. It seems likely that the lower-cardinality set is the winning > strategy. Even if the path has higher cost to generate, we save work > at every subsequent join level and at the final aggregation step. Are > there counterexamples where it's better to use a path from the > higher-cardinality set? This is very appealing if we can retain only the lower-cardinality path, as it would greatly simplify the overall design. I haven't looked for counterexamples yet, but I plan to do so later. > By the way, the work of figuring out what target list should be > produced by partial grouping is done by init_grouping_targets(), but > the comments seem to take it for granted that I know what result we're > trying to produce, and I don't. I think some more high-level > explanation of the goals of this code would be useful. It seems to me > that if I'm looking at a path for an ungrouped relation and it > produces a certain target list, then every column of that target list > is needed somewhere. If those columns are group keys, cool: we pass > those through. If they're inputs to the aggregates, cool: we feed them > to the aggregates. But if they are neither, then what? In the patch, > you either group on those columns or add them to the > possibly_dependent list depending on the result of > is_var_needed_by_join(). I can believe that there are some cases where > we can group on such columns and others where we can't, but find it > difficult to believe that this test reliably distinguishes between > those two cases. If it does, I don't understand why it does. Don't I > need to know something about how those columns are used in the upper > joins? Like, if those columns are connected by a chain of > binary-equality operators back to the user's choice of grouping > columns, that sounds good, but this test doesn't distinguish between > that case and an upper join on the < operator. > create_grouping_expr_infos() does reason based on whether there's an > equal-image operator available, but AIUI that's only reasoning about > the group columns the user mentioned, not the sort of implicit > grouping columns that init_grouping_targets() is creating. Yeah, this patch does not get it correct here. Basically the logic is that for the partial aggregation pushed down to a non-aggregated relation, we need to consider all columns of that relation involved in upper join clauses and include them in the grouping keys. Currently, the patch only checks whether a column is involved in upper join clauses but does not verify how the column is used. We need to ensure that the operator used in the join clause is at least compatible with the grouping operator; otherwise, the grouping operator might interpret the values as the same while the join operator sees them as different. Thanks Richard