MIME-Version: 1.0
References: <3603c380-d094-136e-e333-610914fb3e80@gmx.net>
In-Reply-To: <3603c380-d094-136e-e333-610914fb3e80@gmx.net>
From: David Rowley <dgrowleyml@gmail.com>
Date: Thu, 11 Jul 2024 14:17:26 +1200
Message-ID: <CAApHDvp_u6SbyTKtk9m7ssh8H4dXMoLMZrAq8a1sJk1cj6YxzQ@mail.gmail.com>
Subject: Re: Query on partitioned table needs memory n_partitions * work_mem
To: Dimitrios Apostolou <jimis@gmx.net>
Cc: pgsql-general@lists.postgresql.org
Content-Type: text/plain; charset="UTF-8"
Archived-At: <https://www.postgresql.org/message-id/CAApHDvp_u6SbyTKtk9m7ssh8H4dXMoLMZrAq8a1sJk1cj6YxzQ%40mail.gmail.com>
Precedence: bulk

On Thu, 11 Jul 2024 at 13:19, Dimitrios Apostolou <jimis@gmx.net> wrote:
> I have a table with 1000 partitions on PostgreSQL 16.
> I notice that a fairly complicated query of the form:
>
> SELECT ... GROUP BY ... LIMIT ...
>
> causes the postgres backend process to grow insanely very fast, and the
> kernel OOM killer to kill it rather soon.
> It seems it tries to allocate at least 1000 * work_mem.

> ->  Append
>    ->  HashAggregate
>      ->  Seq Scan
>    -> ... 1000 more hashagg+seqscans
>
>
> Is this allocation pattern (workmem * n_partitions) expected under any
> scenario? I can't find it documented.  AFAIU the backend should allocate
> up to (depth_of_execution_plan * work_mem) (putting aside the
> hash_mem_multiplier and the parallel workers).

Not depth of execution plan. It relates to the number of nodes in the
plan which allocate work_mem or work_mem * hash_mem_multiplier.

There is some documentation in [1]:

"Note that a complex query might perform several sort and hash
operations at the same time"

Also, see the warning about execution time memory in [2].

> NOTE: after having written the above message, it occured to me that I have
> enable_partitionwise_aggregate=on. And Turning it off fixes the issue and
> makes the query faster too! Expected behaviour or bug?

enable_partitionwise_aggregate=on causes this query to perform an
aggregate per partition.  If your GROUP BY clause values are
distributed evenly throughout all partitions then you might find it's
not much slower to execute the query with
enable_partitionwise_aggregate=off.

It's understandable that how PostgreSQL uses work_mem isn't ideal
here, but unfortunately, that's the state of affairs, currently. You
might want to reconsider your enable_partitionwise_aggregate setting
and/or how many partitions you have.

David

[1] https://www.postgresql.org/docs/current/runtime-config-resource.html
[2] https://www.postgresql.org/docs/current/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE-BEST-PRACTICES