Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sUD8E-001JMN-Od for pgsql-general@arkaria.postgresql.org; Wed, 17 Jul 2024 22:28:18 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sUD8B-00609C-TX for pgsql-general@arkaria.postgresql.org; Wed, 17 Jul 2024 22:28:16 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sUD8B-006094-J8 for pgsql-general@lists.postgresql.org; Wed, 17 Jul 2024 22:28:16 +0000 Received: from mail-lf1-x12a.google.com ([2a00:1450:4864:20::12a]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1sUD89-0005hz-CD for pgsql-general@lists.postgresql.org; Wed, 17 Jul 2024 22:28:14 +0000 Received: by mail-lf1-x12a.google.com with SMTP id 2adb3069b0e04-52ea2ce7abaso256442e87.0 for ; Wed, 17 Jul 2024 15:28:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1721255292; x=1721860092; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=sh/Lht0SKZB+ay+ZQbQHh+5g831XB7JI7B+G8h/vaIc=; b=k74Dip7kHv7EEOukk9gEt8pokDjNdUUpAJ4zYoM6bW7vPRRlFqouEfI9lkLylKFdbY WdVTjzOtbz1OcCkTbNPAeJ/C15lXLnxv/tShkXctDdOekScYqNF3uy2XvQnvl8sTXShn A/+WaVhzl6mR08VY1szxHZmO/z4bnDW6uGjoFWfixPvLsKJM/8QDypEuRgiItSy75PXs W9ugj+QAtASQLyPl5LAfv/HynwkNzR47+nMw+0YmO8mPjE8JaLHyUsSeUqTwFILW+MLs gZmETHM57EcHXlvHjpYdcgt6EX3ubUBCd8UGRZiQLNjW+j2AvPIs2V4gWJqJ1+qPWLfv Bd4w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1721255292; x=1721860092; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=sh/Lht0SKZB+ay+ZQbQHh+5g831XB7JI7B+G8h/vaIc=; b=TVx705/neEpKnxI35cS0mEB/oaQ880Bc7kwnPNBmaICwHAATy77Sm1SNIuy3B4Lsjd bpIoBepTUPnAMVRH+Wo4u0OiYDa1eaavd89HtwT7wZT501Iec6G//clWmnIYFrItQcO5 18CHFtOKBgecMVa5nlhtMAaBcZuTgz8UBKKl3RVoQlogu8KPeb5EB5wJ9U1qB1Jzo4DR L2Z3Mkg/sF+/1BtHnw0DqGo7O83A1SkUYkGQjqJjCIZ7I+c+rYImfyjG1nR5mNHg3xF7 nrlTMGpvaalOYVJsBWgca/PuOifX8/8Pu/oASxIbSLRh0GCfH6WWmYfoziO/bllEmnOI o1YQ== X-Gm-Message-State: AOJu0YyCTjuGZKguC3iDYWNBxe7/bUfCeLxxsNlrDUuz/StCZ0cWzDH0 gbnA60saOVvX9ycRffmeM5BJqV3TqnjDJAWFA3lJSdDQpSbqEIxlUOGi0culSaXbnpI0UGAJUz/ 8JZdEJ6/us+pg1S08a8O5UW932cU= X-Google-Smtp-Source: AGHT+IHjiS+OaqEULlOVkpuB5rQxiZBeKpHg7UzGkHtH3bt3Eiwyghsuq0hGwgala65oa/XpSDre5+pU2y5AV+qOJc0= X-Received: by 2002:a05:6512:12c5:b0:52c:b09e:136d with SMTP id 2adb3069b0e04-52eeaeb09femr489704e87.32.1721255291403; Wed, 17 Jul 2024 15:28:11 -0700 (PDT) MIME-Version: 1.0 References: <3603c380-d094-136e-e333-610914fb3e80@gmx.net> In-Reply-To: From: David Rowley Date: Thu, 18 Jul 2024 10:27:59 +1200 Message-ID: Subject: Re: Query on partitioned table needs memory n_partitions * work_mem To: Dimitrios Apostolou Cc: pgsql-general@lists.postgresql.org Content-Type: text/plain; charset="UTF-8" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Fri, 12 Jul 2024 at 02:08, Dimitrios Apostolou wrote: > I can't help but see this as a bug. I see many issues: > > * postgres is not reading from partitions in parallel, but one after the > other. It shouldn't need all this memory simultaneously. I don't know for Hash Aggregate, but for nodes like Sort, we still hold onto the tuplestore after returning the last tuple as a rescan might want to read those tuples again. There's also a mark/restore that might want to rewind a little to match up to the next outer tuple of a Merge Join. It might be possible to let go of the memory sooner in plans when returning the final tuple means we'll never need the memory again, but that would require figuring out all the cases where that could happen and ensuring we don't ever release memory when it's required again. > * The memory is unnecessarily allocated early on, before any partitions > are actually aggregated. I know this because I/O is slow on this device > and the table sizes are huge, it's simply not possible that postgres > went through all partitions and blew up the memory. That would take > hours, but the OOM happens seconds after I start the query. That's interesting. Certainly, there is some memory allocated during executor startup, but that amount should be fairly small. Are you able to provide a self-contained test case that shows the memory blowing up before execution begins? > Having wasted long time in that, the minimum I can do is submit a > documentation patch. At enable_partitionwise_aggregate someting like > "WARNING it can increase the memory usage by at least > n_partitions * work_mem". How do I move on for such a patch? Pointers > would be appreciated. :-) I think mentioning something about this in enable_partitionwise_join and enable_partitionwise_aggregate is probably wise. I'll propose a patch on pgsql-hackers. David