public inbox for [email protected]
help / color / mirror / Atom feedFrom: Michail Nikolaev <[email protected]>
To: Matthias van de Meent <[email protected]>
Cc: Michael Paquier <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Andrey Borodin <[email protected]>
Cc: Melanie Plageman <[email protected]>
Subject: Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Date: Mon, 6 Jan 2025 14:36:00 +0100
Message-ID: <CANtu0ojHAputNCH73TEYN_RUtjLGYsEyW1aSXmsXyvwf=3U4qQ@mail.gmail.com> (raw)
In-Reply-To: <CAEze2Wj1sY+A6zKJtWhdXSmA9wjmP8jiazHgyH1O37NnSjH5SQ@mail.gmail.com>
References: <CAEze2Wj9SgwOpe_1CWnS_D-txQaQyXArR=dm4DTnha93=yua4g@mail.gmail.com>
<CANtu0ohFr7OzNSbxqBhUpR0mXDYyt0Xt6+=Tbq0EC7as7kr+Lg@mail.gmail.com>
<CANtu0oh4PwBn_h+4p_MxFigRAyJvF-0nA9Tm5NFRwfsWWjZQiA@mail.gmail.com>
<CANtu0ojHEVU9U_bxgViRmtqNTJ92LnF+76-yzn4axYjGsK2kqQ@mail.gmail.com>
<CANtu0ogS871NkdUnZW9P_LVpLzhSJ1+cETK0b55cYjs=v2qbPA@mail.gmail.com>
<CANtu0ohRVBDf4x7Ge3oVzgf4NzMb_DhmTM1ae0u1WUA+CD0UqA@mail.gmail.com>
<CANtu0ogTfyng-H4yWr3Pm_+PXX+XvDx1AM1sXTy1V7DM6jJ+Bw@mail.gmail.com>
<CANtu0oi+nbipJUsMZcoUfodCyuTN_DAXD22UstjMTYWG=tJ4jw@mail.gmail.com>
<CANtu0oiuUF7L0wTGxOHfumyoVge3n7C4rAjdmFo=efeEwobXbg@mail.gmail.com>
<CANtu0oiD-AvXdygYqYP-WkFq=7vSL78Wj8UU-PUX+3huPNqroQ@mail.gmail.com>
<[email protected]>
<CANtu0og-4pvn4+TCWH6U9ghyd7x7NBAZSgi4ZWyBZdBWH6OpWA@mail.gmail.com>
<CANtu0oj6EXRq2US8gkz7ehELeECDsVAXq4R0iRfOaRW9GK=5Dw@mail.gmail.com>
<CAEze2Wj1sY+A6zKJtWhdXSmA9wjmP8jiazHgyH1O37NnSjH5SQ@mail.gmail.com>
Hello, everyone!
Some benchmark results are ready. You can access them via [0] or check the
attachments. The benchmark code is available at [1].
A few words about the environments and tests:
There are two environments:
* local: AMD Ryzen 7 7700X (8-Core), 32GB RAM, local high-performance NVMe
SSD [2].
* io2: AWS t2.2xlarge, 8 vCPUs, 32GB RAM, 300GB io2 with 64,000 IOPS (the
fastest available).
There are few tests:
* btree_abalance - A basic new index on a frequently modified field
query: CREATE INDEX CONCURRENTLY idx ON pgbench_accounts (abalance)
* btree_unique - A simple unique index
query: CREATE UNIQUE INDEX CONCURRENTLY idx ON pgbench_accounts (aid)
* btree_unique_hot - A unique index with multiple tuples sharing the same
value, caused by another index
schema: CREATE INDEX idx2 ON pgbench_accounts (abalance)
query: CREATE UNIQUE INDEX CONCURRENTLY idx ON pgbench_accounts (aid)
* brin - A basic BRIN index
query: CREATE INDEX CONCURRENTLY idx ON pgbench_accounts USING
brin(abalance)
* hash - A basic hash index
query: CREATE INDEX CONCURRENTLY idx ON pgbench_accounts USING hash(bid)
* gist - A simple GiST index
schema: CREATE EXTENSION btree_gist
query: CREATE INDEX CONCURRENTLY idx ON pgbench_accounts using
gist(abalance)
* gin - A simple GIN index
schema: CREATE EXTENSION btree_gin
query: CREATE INDEX CONCURRENTLY idx ON pgbench_accounts using
gin(abalance)
The tests were executed on the pgbench schema with a scale factor of 2000
(approximately 30GB) and a fill factor of 95.
Two types of concurrent loads were tested:
* IO-bound scenario: pgbench with 8 clients.
* CPU-bound scenario: pgbench with 50 clients.
As you can see, the index build time results are quite impressive—up to 4x
faster in some cases!
However, there’s something unusual with the GiST index. Occasionally,
sometimes it takes more time to build. I'll investigate that.
The auxiliary index size is relatively small, typically less than 1MB.
You can also observe the typical comparison results of TPS and oldest xmin
during index builds in the provided images (except for GiST, which shows
some anomalies).
>> (mostly because of a single heap scan).
> Isn't there a second heap scan, or do you consider that an index scan?
It is something between.
First phase: a regular heap scan is performed (with snapshot resetting).
Second phase: we collect all TIDs from target and auxiliary indexes, sort
them, and fetch from heap only records which are not present in the target
index (new tuples created during the first phase).
> I think a good benchmark could show how bloat is actually prevented,
> i.e. through result table size comparisons on an update-heavy
> workload, both with and without the patch.
> I think it shouldn't be too difficult to show how such workloads
> quickly regress to always extending the table as no cleanup can
> happen, while patched they'd have much more leeway due to page
> pruning. Presumably a table with a fillfactor <100 would show the best
> results.
I can’t see any significant differences from these tests so far. However, I
think this might be due to the random selection of tuples—there’s almost
always space available to place a new version on the same page.
I’ll try running the tests with a different distribution. Additionally, to
produce bloat comparable to a ~30GB table, updates will need to run for a
longer period.
Best regards,
Mikhail.
[0]:
https://docs.google.com/spreadsheets/d/1UYaqpsWSfYdZdQxaqY4gVo0RW6KrT0d-U1VDNJB8lVk/edit?usp=sharing
[1]:
https://gist.github.com/michail-nikolaev/b33fb0ac1f35729388c89f72db234b0f
[2]:
https://www.harddrivebenchmark.net/hdd.php?hdd=WD%20PC%20SN810%20SDCPNRZ%202TB&id=29324
Attachments:
[application/pdf] PG benchmark 2 - summary.pdf (417.4K, 3-PG%20benchmark%202%20-%20summary.pdf)
download
[image/png] tps.png (68.8K, 4-tps.png)
download | view image
[image/png] graphs.png (108.8K, 5-graphs.png)
download | view image
[image/png] oldest_xmin_age.png (26.0K, 6-oldest_xmin_age.png)
download | view image
view thread (33+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
In-Reply-To: <CANtu0ojHAputNCH73TEYN_RUtjLGYsEyW1aSXmsXyvwf=3U4qQ@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox