Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uxLc4-00DNjI-Tj for pgsql-hackers@arkaria.postgresql.org; Sat, 13 Sep 2025 08:28:05 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1uxLc0-005MMa-Fl for pgsql-hackers@arkaria.postgresql.org; Sat, 13 Sep 2025 08:28:01 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uxLc0-005MM6-0M for pgsql-hackers@lists.postgresql.org; Sat, 13 Sep 2025 08:28:00 +0000 Received: from mail-yb1-xb2b.google.com ([2607:f8b0:4864:20::b2b]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1uxLbt-000AZL-0e for pgsql-hackers@lists.postgresql.org; Sat, 13 Sep 2025 08:27:54 +0000 Received: by mail-yb1-xb2b.google.com with SMTP id 3f1490d57ef6-ea3e82483d3so1078888276.1 for ; Sat, 13 Sep 2025 01:27:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1757752073; x=1758356873; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=qkrsRWbBSDYhLUqwgfFsiD5F9YjBfeRHHhqTOPQmiVY=; b=G1iHeqO1sxbcL7dlYtBjEXSFD7q22KyUyEdZoSM7of7Mhp/kXX2VZZx4unlbxwtMwI Y35j7jOWjo0bA8YPPcReiW6a2D7Vvpq6Qy0ul43i35FJh3aBqMDU7ILDSsk61yYYvKF8 7e7IbonnW4ZZvgj0HCDcw143sjBVtkwdltFigxtT66nvUavfQ09NoWhJ+SYxFmQ5EVrt 4tOgQ4oKqaAmADbOR1qunWERQA/DG9iScRB/8Tx8078VR3hBFyfV+yvBL+99quSjlr5h raqKc6ZL7/nX6z7BrTYAxSiKSqSRq1kvPHJIdLPx7hp7IFq9KV9J9eCN/T8G5wOCVLeA JkMQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1757752073; x=1758356873; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=qkrsRWbBSDYhLUqwgfFsiD5F9YjBfeRHHhqTOPQmiVY=; b=dMzwgY7lTZWLOuBQm3Wr7p5xisSuwGnxymBfzOX5DYxNwJbTHqXQRgKLwOU3Amm3uV 5xNfpW67YBFZDO3OVF2yLIL3GRLZ/HN30vgLNvNDQLT9uWN39a3v0gne+N03H+NLEbUy hx0uDermbz0uIoyLL74LBamHIL7YUVfHTtslasHdi9+OjGSkxqdTcBLedSEcUo2Lvp+a I5vWCX7R9E0/fe/MZs5wiehTghjN7tvpdJdaBn0rW4snSa+mTIJZ0dioAVFHD0njHsME A/vg8cx5dFcBnbeIEvrOv+JYQcVHHuYO6hp1i9Cjkz8GMuOZXZqBjWNIgA4CSlgMegkc 6lVw== X-Forwarded-Encrypted: i=1; AJvYcCWGXMMZu5A+sTpU5FaXgvFuxPWf61+4AaR2vZ8GhfwoxaD1tve19oyCI825VO7X2chQihU4GDbYMoCzCtnW@lists.postgresql.org X-Gm-Message-State: AOJu0Yy4Pa0ZBgqJxHWzjZnNoiiqc7gomrZ944BqQHIuEfZ09SxzIYrA i5kXMwUwhZxV5j4TBxB6cTQRGVx8g6v9tbGIjdtHoTmyZxg58TKGrGwcyBe+6tlrKw58JtPZeJY Wo7dr5Ix3NIwXoecHg1F8/NW1e4CUJdM= X-Gm-Gg: ASbGncvHXfbkTh0KrrEWe64lN8xkSVA2NCK2Aum22a6S9H6aGZSEyJv0U8tvspLuy4J dblm5B9uzQ2dPBEHf614Oon1vZnbtfLIEhVJ4Asw6aZzzePkyWlQ3SkI5ksg+eVHM78RTWvURYd O0cl9SVcxnNa93ASApTWD4QREmAEKbOotMx9+n5rQZ1qQAy7BaFyfdelucr/8Z/Tqd00kSGj+kx lKLU1wQyLBXHSWH7a+c5z1JPEP/VUA= X-Google-Smtp-Source: AGHT+IFNBS69E8PVatmbTP0JRPexWfjFLxiYuFnx7qgSt+5uIer9ydipTjlw4JnhMZMmRw2R0KH63XYvNQ5gZbyV1f0= X-Received: by 2002:a05:690c:7604:b0:730:d8be:abae with SMTP id 00721157ae682-730d8bead63mr33999467b3.17.1757752072549; Sat, 13 Sep 2025 01:27:52 -0700 (PDT) MIME-Version: 1.0 References: <87il22cj51.fsf@163.com> In-Reply-To: From: Richard Guo Date: Sat, 13 Sep 2025 17:27:41 +0900 X-Gm-Features: Ac12FXwKxDnc0pOlA1A9HjaGxkw3eewc3fem-8DfLZh77Mx0fvIMomwj0utg-IU Message-ID: Subject: Re: Eager aggregation, take 3 To: Robert Haas Cc: Tom Lane , Tender Wang , Paul George , Andy Fan , PostgreSQL-development , pgsql-hackers@lists.postgresql.org, Matheus Alcantara Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Sat, Sep 13, 2025 at 3:48=E2=80=AFAM Robert Haas = wrote: > On Fri, Sep 12, 2025 at 5:34=E2=80=AFAM Richard Guo wrote: > > I really like this idea. Currently, aggtransspace represents an > > estimate of the transition state size provided by the aggregate > > definition. If it's set to zero, a default estimate based on the > > state data type is used. Negative values currently have no defined > > meaning. I think it makes perfect sense to reuse this field so that > > a negative value indicates that the transition state data can grow > > unboundedly in size. > > > > Attached 0002 implements this idea. It requires fewer code changes > > than I expected. This is mainly because that our current code uses > > aggtransspace in such a way that if it's a positive value, that value > > is used as it's provided by the aggregate definition; otherwise, some > > heuristics are applied to estimate the size. For the aggregates that > > accumulate input rows (e.g., array_agg, string_agg), I don't currently > > have a better heuristic for estimating their size, so I've chosen to > > keep the current logic. This won't regress anything in estimating > > transition state data size. > This might be OK, but it's not what I was suggesting: I was suggesting > trying to do a calculation like space_used =3D -aggtransspace * > rowcount, not just using a <0 value as a sentinel. I've considered your suggestion, but I'm not sure I'll adopt it in the end. Here's why: 1) At the point where we check whether any aggregates might pose a risk of excessive memory usage during partial aggregation, row count information is not yet available. You could argue that we could reorganize the logic to perform this check after we've had the row count, but that seems quite tricky. If I understand correctly, the "rowcount" in this context actually means the number of rows within one partial group. That would require us to first decide on the grouping expressions for the partial aggregation, then compute the group row counts, then estimate space usage, and only then decide whether memory usage is excessive and fall back. This would come quite late in planning and adds nontrivial overhead, compared to the current approach which checks at the very beginning. 2) Even if we were able to estimate space usage based on the number of rows per partial group and determined that memory usage seems acceptable, we still couldn't guarantee that the transition state data won't grow excessively after further joins. Joins can multiply partial aggregates, potentially causing a blowup in memory usage even if the initial estimate seemed safe. 3) I don't think "-aggtransspace * rowcount" reflects the true memory footprint for aggregates that accumulate input rows. For example, what if we have an aggregate like string_agg(somecolumn, 'a very long delimiter')? 4) AFAICS, the main downside of the current approach compared to yours is that it avoids pushing down aggregates like string_agg() that accumulate input rows, whereas your suggestion might allow pushing them down in some cases where we *think* it wouldn't blow up memory. You might argue that the current implementation is over-conservative. But I prefer to start safe. That said, I appreciate you proposing the idea of reusing aggtransspace, although I ended up using it in a different way than you suggested. - Richard