Re: Lowering the default wal_blocksize to 4K

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Matthias van de Meent <[email protected]>
To: Andres Freund <[email protected]>
Cc: [email protected]
Cc: Heikki Linnakangas <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Thomas Munro <[email protected]>
Subject: Re: Lowering the default wal_blocksize to 4K
Date: Tue, 10 Oct 2023 12:57:04 +0200
Message-ID: <CAEze2WjpzNkpbNg_CMFSgndbDRMGgHmU6-49wET8uGELae_nBw@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>

On Tue, 10 Oct 2023 at 01:08, Andres Freund <[email protected]> wrote:
>
> Hi,
>
> I've mentioned this to a few people before, but forgot to start an actual
> thread. So here we go:
>
> I think we should lower the default wal_blocksize / XLOG_BLCKSZ to 4096, from
> the current 8192.

Seems like a good idea.

> It's IMO quite interesting that even at the higher client counts, the number
> of bytes written don't reach parity.
>
> It's fun to see how the total number of writes *decreases* at higher
> concurrency, because it becomes more likely that pages are filled completely.

With higher client counts and short transactions I think it is not too
unexpected to see commit_delay+commit_siblings configured. Did you
measure the impact of this change on such configurations?

> One thing I noticed is that our auto-configuration of wal_buffers leads to
> different wal_buffers settings for different XLOG_BLCKSZ, which doesn't seem
> great.

Hmm.

> Performing the same COPY workload (1024 files, split across N clients) for
> both settings shows no performance difference, but a very slight increase in
> total bytes written (about 0.25%, which is roughly what I'd expect).
>
> Personally I'd say the slight increase in WAL volume is more than outweighed
> by the increase in throughput and decrease in bytes written.

Agreed.

> There's an alternative approach we could take, which is to write in 4KB
> increments, while keeping 8KB pages. With the current format that's not
> obviously a bad idea. But given there aren't really advantages in 8KB WAL
> pages, it seems we should just go for 4KB?

It is not just the disk overhead of blocks, but we also maintain some
other data (currently in the form of XLogRecPtrs) in memory for each
WAL buffer, the overhead of which will also increase when we increase
the number of XLog pages per MB of WAL that we cache.
Additionally, highly concurrent workloads with transactions that write
a high multiple of XLOG_BLCKSZ bytes to WAL may start to see increased
overhead due to the .25% additional WAL getting written and a doubling
of the number of XLog pages being touched (both initialization and the
smaller memcpy for records that would now cross an extra page
boundary).

However, for all of these issues I doubt that they actually matter
much in the grand scheme of things, so I definitely wouldn't mind
moving to 4KiB XLog pages.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

view thread (2+ messages)

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Lowering the default wal_blocksize to 4K
  In-Reply-To: <CAEze2WjpzNkpbNg_CMFSgndbDRMGgHmU6-49wET8uGELae_nBw@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox