From: "Pierre Barre"
To: "Jeff Ross", pgsql-general@lists.postgresql.org
Date: Sat, 26 Jul 2025 03:16:24 +0200
Subject: Re: PostgreSQL on S3-backed Block Storage with Near-Local Performance

I built Postgres (same version, 16.9) with --with-blocksize=32 (I'd really love for this to be an initdb-time flag!) and did some more testing:

synchronous_commit = off

postgres@zerofs:~$ pgbench -vvv -c 100 -j 40 -t 10000 bench
pgbench (16.9 (Ubuntu 16.10-1))
starting vacuum...end.
starting vacuum pgbench_accounts...end.
transaction type:
scaling factor: 50
query mode: simple
number of clients: 100
number of threads: 40
maximum number of tries: 1
number of transactions per client: 10000
number of transactions actually processed: 1000000/1000000
number of failed transactions: 0 (0.000%)
latency average = 5.727 ms
initial connection time = 59.223 ms
tps = 17460.128835 (without initial connection time)

synchronous_commit = on

postgres@zerofs:/root$ pgbench -vvv -c 100 -j 40 -t 1000 bench
pgbench (16.9 (Ubuntu 16.10-1))
starting vacuum...end.
starting vacuum pgbench_accounts...end.
transaction type:
scaling factor: 50
query mode: simple
number of clients: 100
number of threads: 40
maximum number of tries: 1
number of transactions per client: 1000
number of transactions actually processed: 100000/100000
number of failed transactions: 0 (0.000%)
latency average = 301.800 ms
initial connection time = 62.237 ms
tps = 331.345391 (without initial connection time)

=====================================

Then, using the same setup (same server, same Postgres build), I created a ZeroFS NBD device with ext4 on top:

/dev/nbd0 on /mnt_9p type ext4 (rw,relatime,stripe=32)

synchronous_commit = off

postgres@zerofs:/mnt_9p$ pgbench -vvv -c 100 -j 40 -t 10000 bench
pgbench (16.9 (Ubuntu 16.10-1))
starting vacuum...end.
starting vacuum pgbench_accounts...end.
transaction type:
scaling factor: 50
query mode: simple
number of clients: 100
number of threads: 40
maximum number of tries: 1
number of transactions per client: 10000
number of transactions actually processed: 1000000/1000000
number of failed transactions: 0 (0.000%)
latency average = 3.615 ms
initial connection time = 45.653 ms
tps = 27665.373366 (without initial connection time)

synchronous_commit = on

postgres@zerofs:/root$ pgbench -vvv -c 100 -j 40 -t 1000 bench
pgbench (16.9 (Ubuntu 16.10-1))
starting vacuum...end.
starting vacuum pgbench_accounts...end.
transaction type:
scaling factor: 50
query mode: simple
number of clients: 100
number of threads: 40
maximum number of tries: 1
number of transactions per client: 1000
number of transactions actually processed: 100000/100000
number of failed transactions: 0 (0.000%)
latency average = 337.762 ms
initial connection time = 43.969 ms
tps = 296.066616 (without initial connection time)

Best,
Pierre

On Fri, Jul 25, 2025, at 11:25, Pierre Barre wrote:
> Hi,
>
> I went ahead and did that test.
>
> Here is the PostgreSQL config I used, for reference (note the WAL
> options wal_recycle and wal_init_zero, as well as full_page_writes = off,
> which is possible because ZeroFS cannot have torn writes by design).
>
> https://gist.github.com/Barre/8d68f0d00446389998a31f4e60f3276d
>
> The test ran on an Azure Standard D16ads v5 VM (16 vCPUs, 64 GiB memory).
>
> This time, I didn't run ZFS with L2ARC; I just mounted ZeroFS over 9p.
>
> synchronous_commit = off
>
> postgres@zerofs:~$ pgbench -vvv -c 100 -j 40 -t 1000 bench
> pgbench (16.9 (Ubuntu 16.9-0ubuntu0.24.04.1))
> starting vacuum...end.
> starting vacuum pgbench_accounts...end.
> transaction type:
> scaling factor: 50
> query mode: simple
> number of clients: 100
> number of threads: 40
> maximum number of tries: 1
> number of transactions per client: 1000
> number of transactions actually processed: 100000/100000
> number of failed transactions: 0 (0.000%)
> latency average = 6.239 ms
> initial connection time = 68.922 ms
> tps = 16026.940646 (without initial connection time)
>
>
> synchronous_commit = on
>
> postgres@zerofs:~$ pgbench -vvv -c 50 -j 15 -t 1000 bench
> pgbench (16.9 (Ubuntu 16.9-0ubuntu0.24.04.1))
> starting vacuum...end.
> starting vacuum pgbench_accounts...end.
> transaction type:
> scaling factor: 50
> query mode: simple
> number of clients: 50
> number of threads: 15
> maximum number of tries: 1
> number of transactions per client: 1000
> number of transactions actually processed: 50000/50000
> number of failed transactions: 0 (0.000%)
> latency average = 197.723 ms
> initial connection time = 46.089 ms
> tps = 252.878721 (without initial connection time)
>
>
> Not great bare-bones with synchronous_commit = on, but still usable!
>
> Best,
> Pierre
>
> On Fri, Jul 25, 2025, at 00:44, Pierre Barre wrote:
>>> This then begs the obvious question of how fast is this with
>>> synchronous_commit = on?
>>
>> Probably not awful, especially with commit_delay.
>>
>> I'll try that and report back.
>>
>> Best,
>> Pierre
>>
>> On Fri, Jul 25, 2025, at 00:03, Jeff Ross wrote:
>>> On 7/24/25 13:50, Pierre Barre wrote:
>>>
>>>> It’s not “safe” or “unsafe”; there are mountains of valid workloads
>>>> that don’t require synchronous_commit. synchronous_commit doesn’t
>>>> automatically make your system safe either, and if that’s a
>>>> requirement, there are many workarounds, as you suggested. It
>>>> certainly doesn’t make the setup useless.
>>>>
>>>> Best,
>>>> Pierre
>>>>
>>>> On Thu, Jul 24, 2025, at 21:44, Nico Williams wrote:
>>>>> On Fri, Jul 18, 2025 at 12:57:39PM +0200, Pierre Barre wrote:
>>>>>> - Postgres configured accordingly memory-wise as well as with
>>>>>> synchronous_commit = off, wal_init_zero = off and wal_recycle = off.
>>>>> Bingo. That's why it's fast (synchronous_commit = off). It's also why
>>>>> it's not safe _unless_ you have a local, fast, persistent ZIL device
>>>>> (which I assume you don't).
>>>>>
>>>>> Nico
>>>>> --
>>> This then begs the obvious question of how fast is this with
>>> synchronous_commit = on?