public inbox for [email protected]
regarding PG on ZFS performance
* regarding PG on ZFS performance
From: Scott Ribe @ 2022-04-12 16:18 UTC
To: pgsql-admin <[email protected]>

Just re-ran some older tests with the current version of Ubuntu & ZFS (but older kernel thanks to a multi-way incompatibility with other things). Results are that with proper tuning, ZFS RAIDZ1 on 4 NVMe drives gives higher TPS on pgbench at scale 10,000 than XFS on one of the same NVMe--but the initial population of the db takes 25% longer.

Proper tuning: PG full_page_writes off (for ZFS, on for NVMe); ZFS lz4 compression, 64K recordsize, relatime

db created by: pgbench -i -s 10000 --foreign-keys test
benchmarked as: pgbench -c 100 -j 4 -t 1000 test

NVMe: 31,804 TPS
RAIDZ1: 50,228 TPS

Some other notes:

- the situation is reversed, single NVMe is faster, when using 10 connections instead of 100

- these tests are all from within containers running on Kubernetes--pg server and client in same container, connected over domain sockets

- 256GB and 48 CPU pod limits--running where there's still the cgroup double-counting bug, so CPU is theoretically throttled to ~24, leaving ~20 to PG server

- the container is actually getting very slightly throttled at barely over 20 CPU--so not sure if it's CPU-bound or IO-bound

- PG settings are set up for a larger database: shared_buffers, work_mem, parallel workers, autovacuum, etc.

- I'd read that, because of the way ZFS handles RAIDZ1 compared to RAID5, performance probably didn't suffer relative to RAID10, and this is the case--tests with ZFS RAID10 on the same drives were a tiny bit slower (2-3%) than RAIDZ1 for TPS, but a bit faster on initial population (6-8%)

- as an aside, WekaFS (https://www.aspsys.com/solutions/storage-solutions/weka-io/) is about 10% faster than RAIDZ1 (both TPS and initial fill)

I hope that experience from someone who actually bothered to read up on how to configure ZFS for PG can put to rest some "ZFS is too slow" misinformation. I am certain that ZFS is not nearly the fastest for all configurations (for instance, I am unable to configure the 4 NVMe drives into a hardware RAID10 to test, and it seems that ZFS may not scale well to larger numbers of disks) but "too slow to ever be considered for serious work" is flat-out wrong.

--
Scott Ribe
[email protected]
https://www.linkedin.com/in/scottribe/
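
The tuning listed in the message above (full_page_writes off on ZFS, lz4 compression, 64K recordsize, relatime) could be applied roughly as follows. This is only a sketch: the dataset name `tank/pgdata` is a hypothetical placeholder that does not appear in the original message, and the script prints the commands for review rather than running them. The `zfs set` properties and the `ALTER SYSTEM` statement are standard, but the exact procedure the author used is not stated.

```python
"""Print (not execute) commands matching the tuning described in the message."""

# Hypothetical dataset name; the original message does not name one.
DATASET = "tank/pgdata"

# ZFS properties named in the message.
# Note: recordsize only applies to blocks written after the change,
# and relatime only has an effect while atime is on.
ZFS_PROPERTIES = {
    "compression": "lz4",
    "recordsize": "64K",
    "relatime": "on",
}


def zfs_commands(dataset: str) -> list[str]:
    """Build one `zfs set` command per property for the given dataset."""
    return [
        f"zfs set {prop}={value} {dataset}"
        for prop, value in ZFS_PROPERTIES.items()
    ]


def pg_commands(on_zfs: bool) -> list[str]:
    """Build SQL that turns full_page_writes off on ZFS and on otherwise."""
    value = "off" if on_zfs else "on"
    return [
        f"ALTER SYSTEM SET full_page_writes = {value};",
        # full_page_writes takes effect on a configuration reload.
        "SELECT pg_reload_conf();",
    ]


if __name__ == "__main__":
    for line in zfs_commands(DATASET) + pg_commands(on_zfs=True):
        print(line)
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere; the output can be reviewed and pasted into a shell and psql session once the dataset name is replaced with a real one.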
* Re: regarding PG on ZFS performance
From: Scott Ribe @ 2022-04-12 19:48 UTC
To: Gogala, Mladen (Short Hills) <[email protected]>
Cc: pgsql-admin <[email protected]>

> On Apr 12, 2022, at 10:54 AM, Gogala, Mladen (Short Hills) <[email protected]> wrote:
>
> I wouldn’t call turning off data safety “a proper tuning”. It may be faster but can never be deployed to production.

As I explained, that which it protects against cannot happen on a ZFS volume. Therefore turning it off is just fine.

(ZFS fsync operations are themselves atomic, using its own write-ahead log. So full_page_writes does a double write to avoid corruption, but so does ZFS, thus 4X writes, and a major source of slowness. There is NO danger in letting only one of the two do a double-write in order to avoid a partial write ever being read back.)
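
The "4X writes" arithmetic in the message above can be written out as a small model. This is a deliberately simplified sketch of the argument as stated, not a measurement: it assumes each layer that double-writes contributes a flat factor of 2, whereas in practice PostgreSQL only writes a full page image on the first modification of a page after a checkpoint. The function and parameter names are illustrative.

```python
def physical_writes(logical_writes: int,
                    pg_full_page_writes: bool,
                    fs_double_writes: bool) -> int:
    """Estimate physical writes under a flat 2x-per-layer assumption."""
    # Assumption: full_page_writes doubles writes at the PostgreSQL layer.
    pg_factor = 2 if pg_full_page_writes else 1
    # Assumption: the filesystem's own logging doubles writes again.
    fs_factor = 2 if fs_double_writes else 1
    return logical_writes * pg_factor * fs_factor


if __name__ == "__main__":
    # Both layers double-writing, then only the filesystem double-writing.
    print(physical_writes(1, pg_full_page_writes=True, fs_double_writes=True))
    print(physical_writes(1, pg_full_page_writes=False, fs_double_writes=True))
```

Under these assumptions, turning off full_page_writes on ZFS halves the modeled write volume while one layer still guards against partial writes, which is the point the message makes.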
* Re: regarding PG on ZFS performance
From: Scott Ribe @ 2022-04-12 19:50 UTC
To: Gogala, Mladen (Short Hills) <[email protected]>
Cc: pgsql-admin <[email protected]>

> On Apr 12, 2022, at 1:48 PM, Scott Ribe <[email protected]> wrote:
>
> As I explained...

Well, dang. I previously explained that in a different venue.