Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1qqAQT-008OZj-9k for pgsql-hackers@arkaria.postgresql.org; Tue, 10 Oct 2023 10:57:21 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1qqAQR-00725d-1H for pgsql-hackers@arkaria.postgresql.org; Tue, 10 Oct 2023 10:57:19 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1qqAQQ-007254-Mv for pgsql-hackers@lists.postgresql.org; Tue, 10 Oct 2023 10:57:19 +0000 Received: from mail-lj1-x231.google.com ([2a00:1450:4864:20::231]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1qqAQP-001H7a-02 for pgsql-hackers@postgresql.org; Tue, 10 Oct 2023 10:57:19 +0000 Received: by mail-lj1-x231.google.com with SMTP id 38308e7fff4ca-2b95d5ee18dso67019081fa.1 for ; Tue, 10 Oct 2023 03:57:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1696935436; x=1697540236; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=bhp0OPp5OF1E050p4hSymOh/VMh06MGQUQ3xHIeWrx8=; b=NzF/nR4k46x/X6umgONdxHD7CzskSS9T8ASYXJNo2fNm4yQeeKPRPIvMEb8ZItyGtG sw/oCwYMNWJKCtWncywOWVWkjFmEJlmyM5ufO+Nvkc339FKVkjWlNMaQOMTH55wtXh5D 8pocydhoWXSayAGlLGQGnGx/24/ZrryeDi0+Okdy/XhWm08RltOWE+ZX4YZ2foBAFjWE SRsumfGjpKo4g5y9tsMF1byj0FrAWHFj97wQTHlUz9ZIFuk7XC4xLWWcAX/3puktsPeb LNYWBa74XKeqXsJ86zNw7iZ7WJMdwn4mrrsZOQ9wN/Lx7MjNHtZ5dndH+H3505u3Bq09 Ocbg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1696935436; x=1697540236; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=bhp0OPp5OF1E050p4hSymOh/VMh06MGQUQ3xHIeWrx8=; b=ikt7MLala39y83JrDHBW4RGCvmagniWKaXSrBCHDK2b9X7MNd0MRI5Ub8GGA3BqHC9 vZmmMfsmrDZd8P8uCbQ6OL/CQJK2lFZ4UDqGoGLXQGKNkQn9gB611DWG3oL3NRj+YqVf gM+zGbrRrTjShxEkhdJR9yUBtjtBjycON38e1+3CNBzHxb7wA9GRIQjyFUHEIVeJPN0Y mTvTo3sA9D40HFV+v/rDVj74K0uzRI5mzmi7lDXoJpS3zhR8S694no/MSL9cM3aXljkf pZeevhzFLSVQzvmPQ4g1b8GxyBcd1+5R11OZIRikz9yEJtFx03YeEMlUlB2fOjvygPUF mfvw== X-Gm-Message-State: AOJu0YxwFYKZF8AZC1vsYJ2Rl+tKEdJQ0cGGfBYDIZMgKQfMJUjGovcJ sMMurTlCKk5C15+xQdl8rs5F1RrTLewLH+SNqbw= X-Google-Smtp-Source: AGHT+IH4yhZBTI/yzckJ7BkByHYJrH3WuLYmVSYjYawIRSUUzj39gzP6WOknQZ2fBoKYrp8YFaExEXVzE9TeWeb0pvw= X-Received: by 2002:a2e:8683:0:b0:2c1:522a:8e25 with SMTP id l3-20020a2e8683000000b002c1522a8e25mr15148937lji.32.1696935436052; Tue, 10 Oct 2023 03:57:16 -0700 (PDT) MIME-Version: 1.0 References: <20231009230805.funj5ipoggjyzjz6@awork3.anarazel.de> In-Reply-To: <20231009230805.funj5ipoggjyzjz6@awork3.anarazel.de> From: Matthias van de Meent Date: Tue, 10 Oct 2023 12:57:04 +0200 Message-ID: Subject: Re: Lowering the default wal_blocksize to 4K To: Andres Freund Cc: pgsql-hackers@postgresql.org, Heikki Linnakangas , Robert Haas , Thomas Munro Content-Type: text/plain; charset="UTF-8" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Tue, 10 Oct 2023 at 01:08, Andres Freund wrote: > > Hi, > > I've mentioned this to a few people before, but forgot to start an actual > thread. So here we go: > > I think we should lower the default wal_blocksize / XLOG_BLCKSZ to 4096, from > the current 8192. Seems like a good idea. > It's IMO quite interesting that even at the higher client counts, the number > of bytes written don't reach parity. > > It's fun to see how the total number of writes *decreases* at higher > concurrency, because it becomes more likely that pages are filled completely. With higher client counts and short transactions I think it is not too unexpected to see commit_delay+commit_siblings configured. Did you measure the impact of this change on such configurations? > One thing I noticed is that our auto-configuration of wal_buffers leads to > different wal_buffers settings for different XLOG_BLCKSZ, which doesn't seem > great. Hmm. > Performing the same COPY workload (1024 files, split across N clients) for > both settings shows no performance difference, but a very slight increase in > total bytes written (about 0.25%, which is roughly what I'd expect). > > Personally I'd say the slight increase in WAL volume is more than outweighed > by the increase in throughput and decrease in bytes written. Agreed. > There's an alternative approach we could take, which is to write in 4KB > increments, while keeping 8KB pages. With the current format that's not > obviously a bad idea. But given there aren't really advantages in 8KB WAL > pages, it seems we should just go for 4KB? It is not just the disk overhead of blocks, but we also maintain some other data (currently in the form of XLogRecPtrs) in memory for each WAL buffer, the overhead of which will also increase when we increase the number of XLog pages per MB of WAL that we cache. Additionally, highly concurrent workloads with transactions that write a high multiple of XLOG_BLCKSZ bytes to WAL may start to see increased overhead due to the .25% additional WAL getting written and a doubling of the number of XLog pages being touched (both initialization and the smaller memcpy for records that would now cross an extra page boundary). However, for all of these issues I doubt that they actually matter much in the grand scheme of things, so I definitely wouldn't mind moving to 4KiB XLog pages. Kind regards, Matthias van de Meent Neon (https://neon.tech)