public inbox for [email protected]  
help / color / mirror / Atom feed
From: Dimitrios Apostolou <[email protected]>
To: David Rowley <[email protected]>
Cc: Ashutosh Bapat <[email protected]>
Cc: [email protected]
Subject: Re: Query on partitioned table needs memory n_partitions * work_mem
Date: Fri, 19 Jul 2024 21:33:35 +0200 (CEST)
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<CAApHDvp_u6SbyTKtk9m7ssh8H4dXMoLMZrAq8a1sJk1cj6YxzQ@mail.gmail.com>
	<[email protected]>
	<CAApHDvpG7Hxb81exMZGuwpTQLbeApMNZvffOmfJ3yJnKAYxqYA@mail.gmail.com>
	<[email protected]>

Ok I reproduced the OOM, with only 200 partitions as opposed to 2K that I
mentioned before. The keys to reproduce it I believe are:

1. Write millions of rows to *all* partitions
2. Run ANALYSE so that the planner knows about the sizes


Here are the correct steps now. Let me know if you manage to reproduce.
Kind reminder to set vm.overcommit_memory to 2, before you lose your
system like it happened to me. :-)


CREATE TABLE   partitioned_table1(
     run_n       bigint   GENERATED ALWAYS AS IDENTITY,
     workitem_n  integer  NOT NULL,
     label_n     smallint,
     result      smallint NOT NULL,
     PRIMARY KEY(workitem_n, run_n)
) PARTITION BY RANGE(workitem_n);

-- Create 200 partitions

DO $$
for i in range(0, 200):
     start = i     * 10 * 1000 * 1000
     end   = (i+1) * 10 * 1000 * 1000
     stmt = f'''
         CREATE TABLE part{i}
             PARTITION OF partitioned_table1
             FOR VALUES FROM ({start}) TO ({end})
     '''
     plpy.execute(stmt)
$$ LANGUAGE plpython3u;

-- Insert 20M rows per partition. This will take a while but I don't know
-- of a way to speed it up.

DO $$
for i in range(0, 2000):
     stmt = f'''
         INSERT INTO partitioned_table1(workitem_n, label_n, result)
             SELECT
                     j,
                     CAST(random()*1000 AS INTEGER),
                     CAST(random()*4    AS INTEGER)
                 FROM generate_series({i}*1000*1000, ({i}+1)*1000*1000 - 1, 1) as j
     '''
     plpy.info(stmt)
     plpy.execute(stmt)
     plpy.commit()
$$ LANGUAGE plpython3u;

-- Important to analyse! and set the right settings...

ANALYSE partitioned_table1;
SET SESSION max_parallel_workers_per_gather TO 0;
SET SESSION enable_incremental_sort         TO off;
SET SESSION enable_partitionwise_aggregate  TO on;
SET SESSION enable_partitionwise_join       TO on;

-- And then reproduce the issue:

\timing on
SET SESSION work_mem TO '4MB';

SELECT
       workitem_n, label_n, bool_or(result IN (2,3))
FROM
       partitioned_table1
GROUP BY
       workitem_n, label_n
LIMIT 10;


Repeat that last query while doubling the work_mem every time. You'll see
the process growing to immense dimensions. In my case, my 16GB of RAM were
exhausted when work mem was set to '128MB'.

Remarkably, the fastest execution is the one with the smallest work_mem,
where HashAggregate splits the work into many "partitions". As we grow the
work_mem, it becomes slower and slower (of course while not reaching the
limits of RAM yet; if it goes to swap then all measurements are off).

Let me know if you manage to reproduce the issue!

Dimitris







view thread (3+ messages)

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Query on partitioned table needs memory n_partitions * work_mem
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox