Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uuXlo-00CDaD-7U for pgsql-hackers@arkaria.postgresql.org; Fri, 05 Sep 2025 14:50:33 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1uuXlm-008HKB-Lc for pgsql-hackers@arkaria.postgresql.org; Fri, 05 Sep 2025 14:50:31 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uuXlm-008HJw-78 for pgsql-hackers@lists.postgresql.org; Fri, 05 Sep 2025 14:50:30 +0000 Received: from mail-ed1-x52a.google.com ([2a00:1450:4864:20::52a]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1uuXlk-000hBA-2x for pgsql-hackers@lists.postgresql.org; Fri, 05 Sep 2025 14:50:29 +0000 Received: by mail-ed1-x52a.google.com with SMTP id 4fb4d7f45d1cf-622b4b14a75so401973a12.1 for ; Fri, 05 Sep 2025 07:50:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1757083827; x=1757688627; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=ONhze3YcnErB7AAYDJrwMhoKBXZtg18ovhbDNJyiPpQ=; b=kdyKSCwpL6pe+ONVHvItiuBMw0sAilAbRNgLgeI8ei0O5rWYQaswW+jeqZ27T2hyPL Ne2r0tnjQoSPkD4aWzI7o8TNNIEcFvvHKqZP4CEXInzhfgoE/mk4zVLEmm4gjt/OCCbX y/0OmKyuDP52Rqp23naVtR2HNomro33lHfzUPWhu8RyslP+cgO0OadUBIsp9ZB64z7jD 1ZE+6MD0qxYad6Eu1ieclWNon3eYKeRpkzEH4WgHOQjwVv2mJIba3h/92RDp+jbfhQWz f5AMGQEuN3/5vuwN4KwYBQIqm3n9ik99m626LXJ8MjOpOAkR7LrqmAeXIxbryemDuX+v Oeyw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1757083827; x=1757688627; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ONhze3YcnErB7AAYDJrwMhoKBXZtg18ovhbDNJyiPpQ=; b=R7l5RnGwqlDVY/z45EgYQjaqYWe34ITkicTO4u48k0GogIJZj/sgpxbhwpd4ypDJm1 pvRMMKNuxa0ZUM4X2FtUewNN5+qiZfTsChtA+FXm1MzOtOevm+5hzWm6MlMwtM8N7+oP GdbD06DEjIqqZWsQNY53mYp04SheJ/CumswivJcKtIm3+Q3vJHlxRxnNL893gfNxvT/q pEoynNI5Z4aMHQUgpld+Rd/MTs1wyuMZ7nxsscBkVtNRovDKbXL6q4wFgj3wTke3xWDK Tu8DsmSWxHlra/TtGNYIpCH2NDTwHn+O1MYh/pLX+VCcjhfaMrdvg9BFk8o95nV43TYQ fjeA== X-Forwarded-Encrypted: i=1; AJvYcCUbnZm4QCGAZmXiEY//fz+iX41F/lLziRyZrkysTEeYJCkCp0Q0DXbH0yVdFWRyydEVHYN/OlJuEX7rHFS7@lists.postgresql.org X-Gm-Message-State: AOJu0YxG/htJkRc38PGoqbS3JhhgiJWKNk1pWVTaEF3PhlMj4PI3rcX/ wIit46dVHHGd69kfRBBI6mEcd9brH9+AOs4SHLh8iVhCSaYUlQ+YCa12jZ2mNOnSeYEcM576PnH 9BxRsK2zRqAg3w5L5NU0RXlONpLwWyJk= X-Gm-Gg: ASbGnctgdTtHjCR3bRMJnyONaS3isVqU3BloyBft5BWvh+MrINKlBtzc1Qr5R/fpLa2 Zi37U4DEYZp/jwDW8pVDEgHBU6PRvVVoeKfOyqDRz40DaCzHYDeDWhuO5uLd9+lb1WqBf/CImt0 RGThcapt5u0I+PGn++fEGo6+fEDhDGRsf3qT3C2JRmb7S31Y2m8jb3PJhZ+dvPd5XTFNEFtbY8Z XqdQWaoOPrUeN+CBRs= X-Google-Smtp-Source: AGHT+IFRUIXs/cjSFveVCn5n02C9nZheVPWmfrCmQKUsc0hoCn0PlhyBTJsZsQPMJOMXryxYgxw6DNOAi8Ko9pPAcs4= X-Received: by 2002:a17:907:1c1e:b0:b04:3302:d7a8 with SMTP id a640c23a62f3a-b043302ef3dmr1734761866b.58.1757083827304; Fri, 05 Sep 2025 07:50:27 -0700 (PDT) MIME-Version: 1.0 References: <87il22cj51.fsf@163.com> In-Reply-To: From: Robert Haas Date: Fri, 5 Sep 2025 10:50:15 -0400 X-Gm-Features: Ac12FXwTKMrsW_lZlMFrT80EPMdD2BL8L1JLKvrfS1H17A3rfSNDmOSIUIoxYik Message-ID: Subject: Re: Eager aggregation, take 3 To: Richard Guo Cc: Tom Lane , Tender Wang , Paul George , Andy Fan , PostgreSQL-development , pgsql-hackers@lists.postgresql.org, Matheus Alcantara Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Wed, Aug 6, 2025 at 3:52=E2=80=AFAM Richard Guo = wrote: > Looking at TPC-DS queries 4 and 11, a threshold of 10 is the minimum > needed to consider eager aggregation for them. The resulting plans > show nice performance improvements without any measurable increase in > planning time. So, I'm inclined to lower the threshold to 10 for now. > (Wondering whether we should make this threshold a GUC, so users can > adjust it based on their needs.) Like Matheus, I think a GUC is reasonable. A significant danger here appears to be the possibility of a performance cliff, where queries are optimized very different when the ratio is 9.99 vs. 10.01, say. It would be nice if there were some way to mitigate that danger, but at least a GUC avoids chaining the performance of the whole system to a hard-coded value. It might be worth considering whether there are heuristics other than the group size that could help here. Possibly that's just making things more complicated to no benefit. It seems to me, for example, that reducing 100 rows to 10 is quite different from reducing a million rows to 100,000. On the whole, the latter seems more likely to work out well, but it's tricky, because the effort expended per group can be arbitrarily high. I think we do want to let the cost model make most of the decisions, and just use this threshold to prune ideas that are obviously bad at an early stage. That said, it's worth thinking about how this interacts with the just-considered-one-eager-agg strategy. Does this threshold apply before or after that rule? For instance, consider AGG(FACT_TABLE JOIN DIMENSION_TABLE), like a count of orders grouped by customer name. Aggregating on the dimension table (in this case, the list of customers) is probably useless, but aggregating on the join column of the fact table has a good chance of being useful. If we consider only one of those strategies, we want it to be the right one. This threshold could be the thing that helps us to get it right. --=20 Robert Haas EDB: http://www.enterprisedb.com