From: Thomas Munro
Date: Sat, 15 Nov 2025 17:17:11 +1300
Subject: Re: pg_upgrade reflink support on OpenZFS
To: Marcel Menzel
Cc: pgsql-general@lists.postgresql.org

On Sat, Nov 15, 2025 at 7:16 AM Marcel Menzel wrote:
> For the PostgreSQL upgrade to version 18, I took the opportunity to test
> the reflink support in pg_upgrade (with --clone) on OpenZFS 2.3.4 /
> Linux 6.15.11 and it worked flawlessly, being a huge time saver here.

Nice!

> I've looked into the documentation for pg_upgrade and it's only
> mentioning btrfs and XFS on Linux and not FreeBSD at all, so I thought
> It'd be an interesting heads-up to report that Linux gained a 3rd FS and
> also I think FreeBSD in general the ability for doing reflink copies.

It does mention both Linux and FreeBSD under --copy-file-range. I
didn't try to list all the relevant file systems there though, partly
because I didn't feel like documenting all the quirks (only works if
you created your XFS file system with the feature enabled, might need
to frobnicate ZFS sysctl, which NFS clients and servers can push it
down, likewise for non-COW file systems and device drivers, etc etc).
It might be nice to find a decent reference for all that stuff
somewhere else and point to it, but I don't think we can maintain that
accurately ourselves.

I was actually surprised to hear that ioctl(dest_fd, FICLONE, src_fd)
worked for you. I knew that it was really BTRFS's ioctl and XFS
accepted it too, but I didn't know that ZFS also understood it[1] in
2.3. They apparently didn't really expect anyone to call it, and
since ZFS 2.4 is apparently about to ship without it[2], it seems like
a bad time to add it to the documentation for --clone.

> OpenZFS has been supporting this since 2.2 but has had it disabled due
> to data corruption bugs, now since 2.3 the sysctl (zfs_bclone_enabled on
> Linux, vfs.zfs.bclone_enabled on FreeBSD) has been enabled by default so
> only the zpool feature "block_cloning" has to be enabled, which might be
> the case when running "zpool upgrade".

Yeah, those data corruption reports (which turned out to be
misattributed IIRC?) provided one reason to keep the old BTRFS ioctl()
under --clone but add the new behaviour under --copy-file-range.
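
For readers who have not met the ioctl(dest_fd, FICLONE, src_fd) call
mentioned above, here is a minimal Linux-only sketch of what it does:
ask the file system to make the destination share the source's blocks,
and give up if it refuses. The helper name try_clone is hypothetical,
the request number is hard-coded on the assumption of an x86-64 or
arm64 kernel, and this is an illustration rather than pg_upgrade's
actual code.

```python
import fcntl
import os

# FICLONE request number, _IOW(0x94, 9, int) from <linux/fs.h>.
# Assumption: x86-64 or arm64; a few architectures encode ioctls differently.
FICLONE = 0x40049409


def try_clone(src_path, dst_path):
    """Reflink src_path to dst_path; return True only if blocks were cloned.

    There is no fallback to an ordinary copy. If the kernel or file
    system refuses, the empty destination is removed and False is returned.
    """
    cloned = False
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            # Same argument order as ioctl(dest_fd, FICLONE, src_fd).
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            cloned = True
        except OSError:
            # Typical refusals: file system without reflink support,
            # or source and destination on different file systems.
            pass
    if not cloned:
        os.unlink(dst_path)
    return cloned
```

A caller that gets False back would have to decide for itself whether
to copy the data some other way or to report an error.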
--copy-file-range should work for all COW filesystems on Linux via
proper VFS entrypoints, and is the official way to do this from user
space. Perhaps we should eventually harmonise this under a single
option and drop the ioctl() stuff. One semantic change would be that
copy_file_range() means "copy with your best trick" (could be cloning,
network/driver pushdown or user space buffer copy, silently selecting
the behaviour), while the BTRFS ioctl() means "clone or fail" IIRC, so
that was another reason to want a separate option for now. For
reference, the macOS copyfile() call used for --clone has flags that
should cause it to fail if it can't clone IIUC, while the Windows
CopyFile() call used for --copy might even clone blocks on ReFS even
if you don't specify --clone... huh.

> I haven't had the possibility to check this on FreeBSD yet, but I don't
> see why this should not work as I also can't spot anything in the
> OpenZFS docs regarding reflink / block cloning limitations on FreeBSD.
> Also I saw one of the OpenZFS devs writing on Reddit about block cloning
> being supported on FreeBSD v14.

It always succeeds on FreeBSD, but it only actually clones if you set
vfs.zfs.bclone_enabled=1. I've tested all our "clone" features with
that and they work nicely. The sysctl wasn't on by default in FreeBSD
14.x, but 15 is about to ship and the "experimental" label was removed
in man 4 zfs.

If you haven't seen them yet, you might also like these COW tricks:

Shared storage of basic catalog tables when you have a lot of databases:

SET file_copy_method = CLONE;
CREATE DATABASE ... STRATEGY=FILE_COPY;

Fast database clone/snapshot of very large databases (caveats: users
can't be connected to source, checkpoint forced):

SET file_copy_method = CLONE;
CREATE DATABASE ... STRATEGY=FILE_COPY TEMPLATE=source_db;

Combine a chain of incremental backups and a full backup to produce a
new full backup, sharing disk blocks with the ancestor backups:

pg_combinebackup --copy-file-range

That last one is a really powerful use of copy_file_range()'s subfile
cloning powers. Another subfile cloning trick I've proposed before is
making relation segment size user-controllable, and then allowing
pg_upgrade to migrate between segment sizes by splicing them together.

[1] https://github.com/openzfs/zfs/commit/9927f219f1e9f4ee886d426190500abf5b1d602e
[2] https://github.com/openzfs/zfs/commit/4800181b3b950d67a62aca7c9e28d34c8b303242
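
The subfile splicing idea can be sketched in a few lines of Python
using os.copy_file_range(), which wraps the same system call on Linux.
The sketch builds one output file from byte ranges taken out of
several input files and leaves it to the kernel whether each range is
cloned or physically copied ("copy with your best trick"). The helper
names copy_range and splice_segments are hypothetical, the fallback to
an ordinary buffer copy is an assumption added so the example also
runs where the call is unavailable, and none of this is how
pg_combinebackup or pg_upgrade is actually implemented.

```python
import os


def copy_range(src_fd, dst_fd, count, src_off, dst_off):
    """Copy up to count bytes between open files at explicit offsets."""
    copied = 0
    while copied < count:
        try:
            # May clone blocks, push the copy down, or copy in the kernel.
            n = os.copy_file_range(src_fd, dst_fd, count - copied,
                                   src_off + copied, dst_off + copied)
        except (AttributeError, OSError):
            # Call missing or refused here: plain user space buffer copy.
            buf = os.pread(src_fd, min(count - copied, 1 << 20),
                           src_off + copied)
            n = os.pwrite(dst_fd, buf, dst_off + copied) if buf else 0
        if n == 0:
            break  # source ended before count bytes were available
        copied += n
    return copied


def splice_segments(parts, out_path):
    """Concatenate (path, offset, length) ranges into out_path.

    Returns the total number of bytes written.
    """
    dst_off = 0
    out_fd = os.open(out_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for path, offset, length in parts:
            src_fd = os.open(path, os.O_RDONLY)
            try:
                dst_off += copy_range(src_fd, out_fd, length, offset, dst_off)
            finally:
                os.close(src_fd)
    finally:
        os.close(out_fd)
    return dst_off
```

For example, splice_segments([("a", 0, 4096), ("b", 4096, 4096)], "out")
would write the first 4096 bytes of a followed by the second 4096
bytes of b into out. Whether any blocks end up shared depends on the
file system and on the ranges being suitably aligned.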