public inbox for [email protected]  
help / color / mirror / Atom feed
From: Andrew Kim <[email protected]>
To: John Naylor <[email protected]>
Cc: [email protected]
Cc: Oleg Tselebrovskiy <[email protected]>
Subject: Re: Proposal for enabling auto-vectorization for checksum calculations
Date: Fri, 24 Oct 2025 00:48:45 -0700
Message-ID: <CAK64mnejn9AZMYz03e7HX8Uui35PihUuOy=b+iBG=YtRKx0Log@mail.gmail.com> (raw)
In-Reply-To: <CANWCAZZuS3sNgLRo8Z4AM=uY4zTmz=dH5D4Z9xV6K0CEuJ8Hdw@mail.gmail.com>
References: <[email protected]>
	<CANWCAZYZQw-nzTXbx3Bk332VtY9_D7ksDsuMZ0A-iDZ53yG7Ng@mail.gmail.com>
	<CAK64mnfeWLBRbMfnOsag0vGTDnT84KJzpuei40nG0OHyw4SESw@mail.gmail.com>
	<CANWCAZa1b2rcvoK657SmcKwh2P2cgASQ1D-0JPj5d3LbfaAVgA@mail.gmail.com>
	<CAK64mneN20+sW5WhV+r7hMVo4Rd0z11B6=3L039rWMt1wK3nPg@mail.gmail.com>
	<CANWCAZZuS3sNgLRo8Z4AM=uY4zTmz=dH5D4Z9xV6K0CEuJ8Hdw@mail.gmail.com>

Hi John,

Thank you for your review on the previous patch versions.

I've carefully addressed your concerns and those raised by Oleg,
specifically focusing on patch separation and simplification of the
configure tests. I am submitting the new version (V8) as two distinct
patches:

V8-0001: Pure refactoring (moving files, updating includes).

V8-0002: Adding the AVX2 feature (detection, dispatch, and optimization).

As requested, I've used in-line responses below to clarify how each
point was handled.

On Mon, Oct 20, 2025 at 8:30 PM John Naylor <[email protected]> wrote:
>
> On Fri, Oct 17, 2025 at 2:15 PM Andrew Kim <[email protected]> wrote:
> >
> > Hi John,
> >
> > Thank you for your detailed and constructive feedback on the checksum
> > AVX2 optimization patch.
> > I've carefully addressed all of your concerns and am pleased to share
> > the updated V6 implementation.
>
> Great! I know we're on v7 now, but I'm going to make a request for
> next time you respond to a review: Respond in-line to each point. As I
> mentioned before,
>
> > On Wed, Oct 1, 2025 at 10:26 PM John Naylor <[email protected]> wrote:
> > > (BTW, we discourage top-posting and prefer to cut to size and
> > > use inline responses)
>
> Please don't top-post again, as it clutters our archives in addition
> to making it easy to forget things. I'm now going to copy the things
> that were either not addressed or misunderstood:
>

I apologize for the top-posting in the previous response. I've
switched to the preferred in-line response format for this and all
future correspondence.

> > > I think a good first refactoring patch would be to move
> > > src/backend/storage/checksum.c (which your patch doesn't even touch)
> > > to src/port (and src/include/storage/checksum.h to src/include/port)
> > > and have all callers use that. With that, I imagine only that
> > > checksum.c file would include checksum_impl.h.
> > >
> > > If that poses a problem, let us know -- we may have to further juggle
> > > things. If that works without issue, we can proceed with the
> > > specialization.
>
> That means the first patch moves things around without adding any
> platform-specific code, and the next patch adds the specialization. I
> think that would be a lot easier to review and test, especially to
> avoid breaking external programs (see below for more on this). A
> committer can always squash things together if it make sense to do so.
>

Patch V8-0001 (Move-checksum-functions...): This is now a pure
refactoring patch. It simply moves checksum.c and its headers from
storage/ to port/ and updates the #include paths in all callers
(rawpage.c, pg_checksums.c, etc.). It contains no AVX2 or ISA-specific
code.

Patch V8-0002 (Add-AVX2-optimization...): This patch builds upon the
first, adding all the new AVX2 functionality, detection, and dispatch
logic.

> > > +    #if defined(__has_attribute) && __has_attribute (target)
> > > +    __attribute__((target("avx2")))
> > > +    #endif
> > > +    static int avx2_test(void)
> > > +    {
> > > +      const char buf@<:@sizeof(__m256i)@:>@;
> > > +      __m256i accum = _mm256_loadu_si256((const __m256i *) buf);
> > > +   accum = _mm256_add_epi32(accum, accum);
> > > +      int result = _mm256_extract_epi32(accum, 0);
> > > +      return (int) result;
> > > +    }],
> > >
> > > If we're just testing if the target works, we can just use an empty
> > > function, right?
>
> Oleg mentioned the same thing later. It's a waste of time for us to
> repeat ourselves. I said you didn't have to worry about it yet,
> because I was hoping to see the refactoring first.
>

I have implemented this simplification in Patch V8-0002. The test in
config/c-compiler.m4 is now a simple, empty function with only the
__attribute__((target("avx2"))) to verify compiler support for the
attribute, as suggested.

> Now, aside from that I looked further into this:
>
> > > The top of the checksum_impl.h has this:
> > >
> > >  * This file exists for the benefit of external programs that may wish to
> > >  * check Postgres page checksums.  They can #include this to get the code
> > >  * referenced by storage/checksum.h.  (Note: you may need to redefine
> > >  * Assert() as empty to compile this successfully externally.)
> > >
> > > It's going to be a bit tricky to preserve this ability while allowing
> > > the core server and client programs to dispatch to a specialized
> > > implementation, but we should at least try. That means keeping
> > > pg_checksum_block() and pg_checksum_page() where they live now.
>
> Looking at commit f04216341dd1, we have at least one example of an
> external program, pg_filedump. If we can keep this working with
> minimal fuss, it should be fine everywhere.

The v8 patch series preserves external compatibility. External
programs like pg_filedump will only need to update their include
paths:
/* OLD */
#include "storage/checksum.h"
#include "storage/checksum_impl.h"

/* NEW */
#include "port/checksum.h"
#include "port/checksum_impl.h"

The function signatures (pg_checksum_block, pg_checksum_page) remain
identical, and checksum_impl.h still contains the complete
implementation that external programs can include. The runtime
dispatch only affects internal PostgreSQL usage.

/* OLD */#include "storage/checksum.h"#include
"storage/checksum_impl.h"/* NEW */  #include "port/checksum.h"#include
"port/checksum_impl.h"


> https://github.com/df7cb/pg_filedump/blob/master/pg_filedump.c#L29
>
> ```
> /* checksum_impl.h uses Assert, which doesn't work outside the server */
> #undef Assert
> #define Assert(X)
>
> #include "storage/checksum.h"
> #include "storage/checksum_impl.h"
> ```
>
> Elsewhere they already have to do things like
>
> ```
> #if PG_VERSION_NUM < 110000
>     "   Previous Checkpoint Record: Log File (%u) Offset (0x%08x)\n"
> #endif
> ```
>
> ...so it's probably okay if they have to adjust for a new #include
> path, but I want to verify that actually works, and I don't want to
> make it any more invasive than that. As we proceed, I can volunteer to
> do the work to test that pg_filedump still builds fine with small
> changes. Feel free to try building it yourself, but I'm happy to do
> it.

I appreciate your offer to test pg_filedump compatibility. The changes
in v8 should be minimal for external programs - just the include path
updates. If you're willing to test this, it would be very valuable
validation.

>
> Oleg posted another review recently, so I won't complicate things
> further, but from a brief glance I will suggest for next time not to
> change any comments that haven't been invalidated by the patch.
>

 In v8, I've been much more conservative about comment changes. I only
updated comments that were directly invalidated by the code changes
(like file path references that changed from storage/ to port/). Other
comments remain untouched unless they were factually incorrect due to
the refactoring.

> --
> John Naylor
> Amazon Web Services


Attachments:

  [application/octet-stream] v8-0001-Move-checksum-functions-from-backend-storage-to-port.patch (8.8K, 2-v8-0001-Move-checksum-functions-from-backend-storage-to-port.patch)
  download | inline diff:
From adb9981eae786ed7be0e4f78d51d527da2527402 Mon Sep 17 00:00:00 2001
From: Andrew Kim <[email protected]>
Date: Thu, 23 Oct 2025 10:42:24 -0700
Subject: [PATCH] Move checksum functions from backend storage to port

This refactoring moves checksum implementation from src/backend/storage/page/
to src/port/
---
 contrib/pageinspect/rawpage.c                 | 2 +-
 src/backend/backup/basebackup.c               | 2 +-
 src/backend/storage/page/Makefile             | 6 +-----
 src/backend/storage/page/bufpage.c            | 2 +-
 src/backend/storage/page/meson.build          | 9 ---------
 src/bin/pg_checksums/pg_checksums.c           | 3 +--
 src/bin/pg_upgrade/file.c                     | 3 +--
 src/include/{storage => port}/checksum.h      | 2 +-
 src/include/{storage => port}/checksum_impl.h | 4 ++--
 src/port/Makefile                             | 6 ++++++
 src/{backend/storage/page => port}/checksum.c | 8 ++++----
 src/port/meson.build                          | 4 ++--
 src/test/modules/test_aio/test_aio.c          | 2 +-
 13 files changed, 22 insertions(+), 31 deletions(-)
 rename src/include/{storage => port}/checksum.h (94%)
 rename src/include/{storage => port}/checksum_impl.h (98%)
 rename src/{backend/storage/page => port}/checksum.c (73%)

diff --git a/contrib/pageinspect/rawpage.c b/contrib/pageinspect/rawpage.c
index aef442b5db3..7beb7765da9 100644
--- a/contrib/pageinspect/rawpage.c
+++ b/contrib/pageinspect/rawpage.c
@@ -23,7 +23,7 @@
 #include "miscadmin.h"
 #include "pageinspect.h"
 #include "storage/bufmgr.h"
-#include "storage/checksum.h"
+#include "port/checksum.h"
 #include "utils/builtins.h"
 #include "utils/pg_lsn.h"
 #include "utils/rel.h"
diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c
index bb7d90aa5d9..d84ced4b47c 100644
--- a/src/backend/backup/basebackup.c
+++ b/src/backend/backup/basebackup.c
@@ -39,7 +39,7 @@
 #include "replication/walsender.h"
 #include "replication/walsender_private.h"
 #include "storage/bufpage.h"
-#include "storage/checksum.h"
+#include "port/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/ipc.h"
 #include "storage/reinit.h"
diff --git a/src/backend/storage/page/Makefile b/src/backend/storage/page/Makefile
index da539b113a6..788fee403f6 100644
--- a/src/backend/storage/page/Makefile
+++ b/src/backend/storage/page/Makefile
@@ -12,12 +12,8 @@ subdir = src/backend/storage/page
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS =  \
+OBJS = \
 	bufpage.o \
-	checksum.o \
 	itemptr.o
 
 include $(top_srcdir)/src/backend/common.mk
-
-# Provide special optimization flags for checksum.c
-checksum.o: CFLAGS += ${CFLAGS_UNROLL_LOOPS} ${CFLAGS_VECTORIZE}
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index dbb49ed9197..b8f889efb88 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -18,7 +18,7 @@
 #include "access/itup.h"
 #include "access/xlog.h"
 #include "pgstat.h"
-#include "storage/checksum.h"
+#include "port/checksum.h"
 #include "utils/memdebug.h"
 #include "utils/memutils.h"
 
diff --git a/src/backend/storage/page/meson.build b/src/backend/storage/page/meson.build
index 112f00ff365..cf92a8f55f0 100644
--- a/src/backend/storage/page/meson.build
+++ b/src/backend/storage/page/meson.build
@@ -1,14 +1,5 @@
 # Copyright (c) 2022-2025, PostgreSQL Global Development Group
 
-checksum_backend_lib = static_library('checksum_backend_lib',
-  'checksum.c',
-  dependencies: backend_build_deps,
-  kwargs: internal_lib_args,
-  c_args: vectorize_cflags + unroll_loops_cflags,
-)
-
-backend_link_with += checksum_backend_lib
-
 backend_sources += files(
   'bufpage.c',
   'itemptr.c',
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 46cb2f36efa..2e0212c029c 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -29,8 +29,7 @@
 #include "getopt_long.h"
 #include "pg_getopt.h"
 #include "storage/bufpage.h"
-#include "storage/checksum.h"
-#include "storage/checksum_impl.h"
+#include "port/checksum.h"
 
 
 static int64 files_scanned = 0;
diff --git a/src/bin/pg_upgrade/file.c b/src/bin/pg_upgrade/file.c
index 91ed16acb08..f9a5ed02ee4 100644
--- a/src/bin/pg_upgrade/file.c
+++ b/src/bin/pg_upgrade/file.c
@@ -24,8 +24,7 @@
 #include "common/file_perm.h"
 #include "pg_upgrade.h"
 #include "storage/bufpage.h"
-#include "storage/checksum.h"
-#include "storage/checksum_impl.h"
+#include "port/checksum.h"
 
 
 /*
diff --git a/src/include/storage/checksum.h b/src/include/port/checksum.h
similarity index 94%
rename from src/include/storage/checksum.h
rename to src/include/port/checksum.h
index 25d13a798d1..c2faed83ede 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/port/checksum.h
@@ -6,7 +6,7 @@
  * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
- * src/include/storage/checksum.h
+ * src/include/port/checksum.h
  *
  *-------------------------------------------------------------------------
  */
diff --git a/src/include/storage/checksum_impl.h b/src/include/port/checksum_impl.h
similarity index 98%
rename from src/include/storage/checksum_impl.h
rename to src/include/port/checksum_impl.h
index da87d61ba52..00cb0549f24 100644
--- a/src/include/storage/checksum_impl.h
+++ b/src/include/port/checksum_impl.h
@@ -5,13 +5,13 @@
  *
  * This file exists for the benefit of external programs that may wish to
  * check Postgres page checksums.  They can #include this to get the code
- * referenced by storage/checksum.h.  (Note: you may need to redefine
+ * referenced by port/checksum.h.  (Note: you may need to redefine
  * Assert() as empty to compile this successfully externally.)
  *
  * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
- * src/include/storage/checksum_impl.h
+ * src/include/port/checksum_impl.h
  *
  *-------------------------------------------------------------------------
  */
diff --git a/src/port/Makefile b/src/port/Makefile
index 4274949dfa4..4f1f460bff2 100644
--- a/src/port/Makefile
+++ b/src/port/Makefile
@@ -39,6 +39,7 @@ OBJS = \
 	$(LIBOBJS) \
 	$(PG_CRC32C_OBJS) \
 	bsearch_arg.o \
+	checksum.o \
 	chklocale.o \
 	inet_net_ntop.o \
 	noblock.o \
@@ -90,6 +91,11 @@ pg_crc32c_armv8.o: CFLAGS+=$(CFLAGS_CRC)
 pg_crc32c_armv8_shlib.o: CFLAGS+=$(CFLAGS_CRC)
 pg_crc32c_armv8_srv.o: CFLAGS+=$(CFLAGS_CRC)
 
+# Provide special optimization flags for checksum.c
+checksum.o: CFLAGS += ${CFLAGS_UNROLL_LOOPS} ${CFLAGS_VECTORIZE}
+checksum_shlib.o: CFLAGS += ${CFLAGS_UNROLL_LOOPS} ${CFLAGS_VECTORIZE}
+checksum_srv.o: CFLAGS += ${CFLAGS_UNROLL_LOOPS} ${CFLAGS_VECTORIZE}
+
 #
 # Shared library versions of object files
 #
diff --git a/src/backend/storage/page/checksum.c b/src/port/checksum.c
similarity index 73%
rename from src/backend/storage/page/checksum.c
rename to src/port/checksum.c
index c913459b5a3..de61a46231d 100644
--- a/src/backend/storage/page/checksum.c
+++ b/src/port/checksum.c
@@ -7,16 +7,16 @@
  * Portions Copyright (c) 1994, Regents of the University of California
  *
  * IDENTIFICATION
- *	  src/backend/storage/page/checksum.c
+ *	  src/port/checksum.c
  *
  *-------------------------------------------------------------------------
  */
 #include "postgres.h"
 
-#include "storage/checksum.h"
+#include "port/checksum.h"
 /*
- * The actual code is in storage/checksum_impl.h.  This is done so that
+ * The actual code is in port/checksum_impl.h.  This is done so that
  * external programs can incorporate the checksum code by #include'ing
  * that file from the exported Postgres headers.  (Compare our CRC code.)
  */
-#include "storage/checksum_impl.h"	/* IWYU pragma: keep */
+#include "port/checksum_impl.h"		/* IWYU pragma: keep */
diff --git a/src/port/meson.build b/src/port/meson.build
index fc7b059fee5..d3e63bce9e7 100644
--- a/src/port/meson.build
+++ b/src/port/meson.build
@@ -104,8 +104,8 @@ replace_funcs_pos = [
   ['pg_crc32c_sb8', 'USE_SLICING_BY_8_CRC32C'],
 ]
 
-pgport_cflags = {'crc': cflags_crc}
-pgport_sources_cflags = {'crc': []}
+pgport_cflags = {'crc': cflags_crc, 'checksum': vectorize_cflags + unroll_loops_cflags}
+pgport_sources_cflags = {'crc': [], 'checksum': [files('checksum.c')]}
 
 foreach f : replace_funcs_neg
   func = f.get(0)
diff --git a/src/test/modules/test_aio/test_aio.c b/src/test/modules/test_aio/test_aio.c
index c55cf6c0aac..175e491c0bc 100644
--- a/src/test/modules/test_aio/test_aio.c
+++ b/src/test/modules/test_aio/test_aio.c
@@ -24,7 +24,7 @@
 #include "storage/aio_internal.h"
 #include "storage/buf_internals.h"
 #include "storage/bufmgr.h"
-#include "storage/checksum.h"
+#include "port/checksum.h"
 #include "storage/ipc.h"
 #include "storage/lwlock.h"
 #include "utils/builtins.h"
-- 
2.43.0



  [application/octet-stream] v8-0002-Add-AVX2-optimization-for-page-checksum-calculation.patch (10.7K, 3-v8-0002-Add-AVX2-optimization-for-page-checksum-calculation.patch)
  download | inline diff:
From 8a33e3737360156b862fc75666630f8043f7d285 Mon Sep 17 00:00:00 2001
From: Andrew Kim <[email protected]>
Date: Thu, 23 Oct 2025 16:35:22 -0700
Subject: [PATCH 1/1] Add AVX2 optimization for page checksum calculation

This patch adds runtime AVX2 detection and optimization for PostgreSQL's
page checksum algorithm while maintaining full backward compatibility.

Key changes:
- Add cross-platform AVX2 CPU detection with XSAVE/YMM register checks
- Implement function pointer dispatch pattern following PostgreSQL conventions
- Use compiler auto-vectorization with pg_attribute_target("avx2")
- Add build system support in both autotools and meson
- Maintain external program compatibility (pg_filedump, etc.)

The implementation uses the same algorithm for both default and AVX2 paths,
allowing the compiler to automatically vectorize the AVX2 version while
preserving identical results. Runtime detection ensures optimal performance
on supported hardware with graceful fallback on older systems.

Addresses reviewer feedback on configure test simplification, Windows
compatibility, and PostgreSQL coding conventions.
---
 config/c-compiler.m4             |  26 ++++++
 configure                        |  52 +++++++++++
 configure.ac                     |   9 ++
 meson.build                      |  30 +++++++
 src/include/pg_config.h.in       |   3 +
 src/include/port/checksum_impl.h | 142 ++++++++++++++++++++++++++++++-
 6 files changed, 261 insertions(+), 1 deletion(-)

diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 236a59e8536..40927d56e6a 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -581,6 +581,32 @@ fi
 undefine([Ac_cachevar])dnl
 ])# PGAC_SSE42_CRC32_INTRINSICS
 
+# PGAC_AVX2_SUPPORT
+# ---------------------------
+# Check if the compiler supports AVX2 target attribute.
+# This is used for optimized checksum calculations with runtime detection.
+#
+# If AVX2 target attribute is supported, sets pgac_avx2_support.
+AC_DEFUN([PGAC_AVX2_SUPPORT],
+[define([Ac_cachevar], [AS_TR_SH([pgac_cv_avx2_support])])dnl
+AC_CACHE_CHECK([for AVX2 target attribute support], [Ac_cachevar],
+[AC_COMPILE_IFELSE([AC_LANG_PROGRAM([#include <stdint.h>
+    #if defined(__has_attribute) && __has_attribute (target)
+    __attribute__((target("avx2")))
+    static int avx2_test(void)
+    {
+      return 0;
+    }
+    #endif],
+  [return avx2_test();])],
+  [Ac_cachevar=yes],
+  [Ac_cachevar=no])])
+if test x"$Ac_cachevar" = x"yes"; then
+  pgac_avx2_support=yes
+fi
+undefine([Ac_cachevar])dnl
+])# PGAC_AVX2_SUPPORT
+
 # PGAC_AVX512_PCLMUL_INTRINSICS
 # ---------------------------
 # Check if the compiler supports AVX-512 carryless multiplication
diff --git a/configure b/configure
index 22cd866147b..209849c773c 100755
--- a/configure
+++ b/configure
@@ -17562,6 +17562,58 @@ $as_echo "#define HAVE_XSAVE_INTRINSICS 1" >>confdefs.h
 
 fi
 
+# Check for AVX2 target and intrinsic support
+#
+if test x"$host_cpu" = x"x86_64"; then
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for AVX2 support" >&5
+$as_echo_n "checking for AVX2 support... " >&6; }
+if ${pgac_cv_avx2_support+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <immintrin.h>
+    #include <stdint.h>
+    #if defined(__has_attribute) && __has_attribute (target)
+    __attribute__((target("avx2")))
+    #endif
+    static int avx2_test(void)
+    {
+      const char buf[sizeof(__m256i)];
+      __m256i accum = _mm256_loadu_si256((const __m256i *) buf);
+	  accum = _mm256_add_epi32(accum, accum);
+      int result = _mm256_extract_epi32(accum, 0);
+      return (int) result;
+    }
+int
+main ()
+{
+return avx2_test();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  pgac_cv_avx2_support=yes
+else
+  pgac_cv_avx2_support=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext conftest.$ac_ext
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_avx2_support" >&5
+$as_echo "$pgac_cv_avx2_support" >&6; }
+if test x"$pgac_cv_avx2_support" = x"yes"; then
+  pgac_avx2_support=yes
+fi
+
+  if test x"$pgac_avx2_support" = x"yes"; then
+
+$as_echo "#define USE_AVX2_WITH_RUNTIME_CHECK 1" >>confdefs.h
+
+  fi
+fi
+
 # Check for AVX-512 popcount intrinsics
 #
 if test x"$host_cpu" = x"x86_64"; then
diff --git a/configure.ac b/configure.ac
index e44943aa6fe..ca7205d90ac 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2084,6 +2084,15 @@ else
   fi
 fi
 
+# Check for AVX2 target and intrinsic support
+#
+if test x"$host_cpu" = x"x86_64"; then
+  PGAC_AVX2_SUPPORT()
+  if test x"$pgac_avx2_support" = x"yes"; then
+    AC_DEFINE(USE_AVX2_WITH_RUNTIME_CHECK, 1, [Define to 1 to use AVX2 instructions with a runtime check.])
+  fi
+fi
+
 # Check for XSAVE intrinsics
 #
 PGAC_XSAVE_INTRINSICS()
diff --git a/meson.build b/meson.build
index 395416a6060..5670722944e 100644
--- a/meson.build
+++ b/meson.build
@@ -2293,6 +2293,36 @@ int main(void)
 endif
 
 
+###############################################################
+# Check for the availability of AVX2 support
+###############################################################
+
+if host_cpu == 'x86_64'
+
+  prog = '''
+#include <immintrin.h>
+#include <stdint.h>
+#if defined(__has_attribute) && __has_attribute (target)
+__attribute__((target("avx2")))
+#endif
+static int avx2_test(void)
+{
+    return 0;
+}
+
+int main(void)
+{
+    return avx2_test();
+}
+'''
+
+  if cc.links(prog, name: 'AVX2 support', args: test_c_args)
+    cdata.set('USE_AVX2_WITH_RUNTIME_CHECK', 1)
+  endif
+
+endif
+
+
 ###############################################################
 # Check for the availability of AVX-512 popcount intrinsics.
 ###############################################################
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index c4dc5d72bdb..987f9b5c77c 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -675,6 +675,9 @@
 /* Define to 1 to use AVX-512 CRC algorithms with a runtime check. */
 #undef USE_AVX512_CRC32C_WITH_RUNTIME_CHECK
 
+/* Define to 1 to use AVX2 instructions with a runtime check. */
+#undef USE_AVX2_WITH_RUNTIME_CHECK
+
 /* Define to 1 to use AVX-512 popcount instructions with a runtime check. */
 #undef USE_AVX512_POPCNT_WITH_RUNTIME_CHECK
 
diff --git a/src/include/port/checksum_impl.h b/src/include/port/checksum_impl.h
index 00cb0549f24..0e1eef45249 100644
--- a/src/include/port/checksum_impl.h
+++ b/src/include/port/checksum_impl.h
@@ -100,8 +100,23 @@
  * manually unroll the inner loop.
  */
 
+#include "pg_config.h"
 #include "storage/bufpage.h"
 
+#ifdef USE_AVX2_WITH_RUNTIME_CHECK
+
+#if defined(HAVE__GET_CPUID) || defined(HAVE__GET_CPUID_COUNT)
+#include <cpuid.h>
+#endif
+
+#include <immintrin.h>
+
+#if defined(HAVE__CPUID) || defined(HAVE__CPUIDEX)
+#include <intrin.h>
+#endif
+
+#endif
+
 /* number of checksums to calculate in parallel */
 #define N_SUMS 32
 /* prime multiplier of FNV-1a hash */
@@ -114,6 +129,9 @@ typedef union
 	uint32		data[BLCKSZ / (sizeof(uint32) * N_SUMS)][N_SUMS];
 } PGChecksummablePage;
 
+/* Forward declaration */
+static uint32 pg_checksum_block_choose(const PGChecksummablePage *page);
+
 /*
  * Base offsets to initialize each of the parallel FNV hashes into a
  * different initial state.
@@ -129,6 +147,71 @@ static const uint32 checksumBaseOffsets[N_SUMS] = {
 	0x9FBF8C76, 0x15CA20BE, 0xF2CA9FD3, 0x959BD756
 };
 
+#ifdef USE_AVX2_WITH_RUNTIME_CHECK
+
+/*
+ * Does CPUID say there's support for XSAVE instructions?
+ */
+static inline bool
+xsave_available(void)
+{
+	unsigned int exx[4] = {0, 0, 0, 0};
+
+#if defined(HAVE__GET_CPUID)
+	__get_cpuid(1, &exx[0], &exx[1], &exx[2], &exx[3]);
+#elif defined(HAVE__CPUID)
+	__cpuid(exx, 1);
+#else
+#error cpuid instruction not available
+#endif
+	return (exx[2] & (1 << 27)) != 0;	/* osxsave */
+}
+
+/*
+ * Does XGETBV say the YMM registers are enabled?
+ *
+ * NB: Caller is responsible for verifying that xsave_available() returns true
+ * before calling this.
+ */
+#ifdef HAVE_XSAVE_INTRINSICS
+pg_attribute_target("xsave")
+#endif
+static inline bool
+ymm_regs_available(void)
+{
+#ifdef HAVE_XSAVE_INTRINSICS
+	return (_xgetbv(0) & 0x06) == 0x06;
+#else
+	return false;
+#endif
+}
+
+/*
+ * Check for AVX2 support using manual CPUID detection
+ */
+static inline bool
+avx2_available(void)
+{
+#ifdef USE_AVX2_WITH_RUNTIME_CHECK
+	unsigned int exx[4] = {0, 0, 0, 0};
+
+	if (!xsave_available() || !ymm_regs_available())
+		return false;
+
+#if defined(HAVE__GET_CPUID_COUNT)
+	__get_cpuid_count(7, 0, &exx[0], &exx[1], &exx[2], &exx[3]);
+#elif defined(HAVE__CPUIDEX)
+	__cpuidex(exx, 7, 0);
+#else
+#error cpuid instruction not available
+#endif
+	return (exx[1] & (1 << 5)) != 0; /* avx2 */
+#else
+	return false;
+#endif
+}
+#endif /* USE_AVX2_WITH_RUNTIME_CHECK */
+
 /*
  * Calculate one round of the checksum.
  */
@@ -143,7 +226,7 @@ do { \
  * (at least on 4-byte boundary).
  */
 static uint32
-pg_checksum_block(const PGChecksummablePage *page)
+pg_checksum_block_default(const PGChecksummablePage *page)
 {
 	uint32		sums[N_SUMS];
 	uint32		result = 0;
@@ -173,6 +256,63 @@ pg_checksum_block(const PGChecksummablePage *page)
 	return result;
 }
 
+#ifdef USE_AVX2_WITH_RUNTIME_CHECK
+/*
+ * AVX2-optimized block checksum algorithm.
+ * Same algorithm as default, but compiled with AVX2 target for auto-vectorization.
+ */
+pg_attribute_target("avx2")
+static uint32
+pg_checksum_block_avx2(const PGChecksummablePage *page)
+{
+	uint32		sums[N_SUMS];
+	uint32		result = 0;
+	uint32		i,
+				j;
+
+	/* ensure that the size is compatible with the algorithm */
+	Assert(sizeof(PGChecksummablePage) == BLCKSZ);
+
+	/* initialize partial checksums to their corresponding offsets */
+	memcpy(sums, checksumBaseOffsets, sizeof(checksumBaseOffsets));
+
+	/* main checksum calculation */
+	for (i = 0; i < (uint32) (BLCKSZ / (sizeof(uint32) * N_SUMS)); i++)
+		for (j = 0; j < N_SUMS; j++)
+			CHECKSUM_COMP(sums[j], page->data[i][j]);
+
+	/* finally add in two rounds of zeroes for additional mixing */
+	for (i = 0; i < 2; i++)
+		for (j = 0; j < N_SUMS; j++)
+			CHECKSUM_COMP(sums[j], 0);
+
+	/* xor fold partial checksums together */
+	for (i = 0; i < N_SUMS; i++)
+		result ^= sums[i];
+
+	return result;
+}
+#endif
+
+/* Function pointer - external linkage */
+static uint32 (*pg_checksum_block)(const PGChecksummablePage *page) = pg_checksum_block_choose;
+
+/* Choose the best available checksum implementation */
+static uint32
+pg_checksum_block_choose(const PGChecksummablePage *page)
+{
+#ifdef USE_AVX2_WITH_RUNTIME_CHECK
+	if (avx2_available())
+	{
+		pg_checksum_block = pg_checksum_block_avx2;
+		return pg_checksum_block(page);
+	}
+#endif
+	/* fallback to default implementation */
+	pg_checksum_block = pg_checksum_block_default;
+	return pg_checksum_block(page);
+}
+
 /*
  * Compute the checksum for a Postgres page.
  *
-- 
2.43.0



  [application/octet-stream] v8-0003-Benchmark-code-for-pg_checksum_bench-performance-tes.patch (4.8K, 4-v8-0003-Benchmark-code-for-pg_checksum_bench-performance-tes.patch)
  download | inline diff:
From d8c98e0111eadd1cd698465f63b39e71bcc49985 Mon Sep 17 00:00:00 2001
From: Andrew Kim <[email protected]>
Date: Thu, 23 Oct 2025 16:53:00 -0700
Subject: [PATCH 1/1] Benchmark code for pg_checksum_bench performance testing

This extension provides benchmarking functions to test and compare
the performance of different checksum implementations
---
 contrib/meson.build                           |  1 +
 contrib/pg_checksum_bench/meson.build         | 23 +++++++++++++
 .../pg_checksum_bench--1.0.sql                |  8 +++++
 contrib/pg_checksum_bench/pg_checksum_bench.c | 34 +++++++++++++++++++
 .../pg_checksum_bench.control                 |  4 +++
 .../sql/pg_checksum_bench.sql                 | 17 ++++++++++
 6 files changed, 87 insertions(+)
 create mode 100644 contrib/pg_checksum_bench/meson.build
 create mode 100644 contrib/pg_checksum_bench/pg_checksum_bench--1.0.sql
 create mode 100644 contrib/pg_checksum_bench/pg_checksum_bench.c
 create mode 100644 contrib/pg_checksum_bench/pg_checksum_bench.control
 create mode 100644 contrib/pg_checksum_bench/sql/pg_checksum_bench.sql

diff --git a/contrib/meson.build b/contrib/meson.build
index ed30ee7d639..fe5149aadff 100644
--- a/contrib/meson.build
+++ b/contrib/meson.build
@@ -12,6 +12,7 @@ contrib_doc_args = {
   'install_dir': contrib_doc_dir,
 }
 
+subdir('pg_checksum_bench')
 subdir('amcheck')
 subdir('auth_delay')
 subdir('auto_explain')
diff --git a/contrib/pg_checksum_bench/meson.build b/contrib/pg_checksum_bench/meson.build
new file mode 100644
index 00000000000..32ccd9efa0f
--- /dev/null
+++ b/contrib/pg_checksum_bench/meson.build
@@ -0,0 +1,23 @@
+# Copyright (c) 2022-2025, PostgreSQL Global Development Group
+
+pg_checksum_bench_sources = files(
+  'pg_checksum_bench.c',
+)
+
+if host_system == 'windows'
+  pg_checksum_bench_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'pg_checksum_bench',
+    '--FILEDESC', 'pg_checksum_bench',])
+endif
+
+pg_checksum_bench = shared_module('pg_checksum_bench',
+  pg_checksum_bench_sources,
+  kwargs: contrib_mod_args,
+)
+contrib_targets += pg_checksum_bench
+
+install_data(
+  'pg_checksum_bench--1.0.sql',
+  'pg_checksum_bench.control',
+  kwargs: contrib_data_args,
+)
diff --git a/contrib/pg_checksum_bench/pg_checksum_bench--1.0.sql b/contrib/pg_checksum_bench/pg_checksum_bench--1.0.sql
new file mode 100644
index 00000000000..5f13cbe3c5e
--- /dev/null
+++ b/contrib/pg_checksum_bench/pg_checksum_bench--1.0.sql
@@ -0,0 +1,8 @@
+/* contrib/pg_checksum_bench/pg_checksum_bench--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+-- \echo Use "CREATE EXTENSION pg_checksum_bench" to load this file. \quit
+
+CREATE FUNCTION drive_pg_checksum(page_count int)
+	RETURNS pg_catalog.void
+	AS 'MODULE_PATHNAME' LANGUAGE C;
diff --git a/contrib/pg_checksum_bench/pg_checksum_bench.c b/contrib/pg_checksum_bench/pg_checksum_bench.c
new file mode 100644
index 00000000000..e5b150e6b13
--- /dev/null
+++ b/contrib/pg_checksum_bench/pg_checksum_bench.c
@@ -0,0 +1,34 @@
+#include "postgres.h"
+#include "fmgr.h"
+#include "port/checksum_impl.h"
+
+#include <stdio.h>
+#include <assert.h>
+
+PG_MODULE_MAGIC;
+
+#define REPEATS 1000000
+
+PG_FUNCTION_INFO_V1(drive_pg_checksum);
+Datum
+drive_pg_checksum(PG_FUNCTION_ARGS)
+{
+	int page_count = PG_GETARG_INT32(0);
+
+	PGChecksummablePage * pages = palloc(page_count * sizeof(PGChecksummablePage));
+	srand(0);
+	for (size_t i = 0; i < page_count * sizeof(PGChecksummablePage); i++){
+		char * byte_ptr = (char *) pages;
+		byte_ptr[i] = rand() % 256;
+	}
+
+	for (int i = 0; i < REPEATS; i++){
+		const PGChecksummablePage * test_page = pages + (i % page_count);
+		volatile uint32 result = pg_checksum_block(test_page);
+		(void) result;
+	}
+
+	pfree((void *) pages);
+
+	PG_RETURN_VOID();
+}
diff --git a/contrib/pg_checksum_bench/pg_checksum_bench.control b/contrib/pg_checksum_bench/pg_checksum_bench.control
new file mode 100644
index 00000000000..4a4e2c9363c
--- /dev/null
+++ b/contrib/pg_checksum_bench/pg_checksum_bench.control
@@ -0,0 +1,4 @@
+comment = 'pg_checksum benchmark'
+default_version = '1.0'
+module_pathname = '$libdir/pg_checksum_bench'
+relocatable = true
diff --git a/contrib/pg_checksum_bench/sql/pg_checksum_bench.sql b/contrib/pg_checksum_bench/sql/pg_checksum_bench.sql
new file mode 100644
index 00000000000..4b347699953
--- /dev/null
+++ b/contrib/pg_checksum_bench/sql/pg_checksum_bench.sql
@@ -0,0 +1,17 @@
+CREATE EXTENSION pg_checksum_bench;
+
+SELECT drive_pg_checksum(-1);
+
+\timing on
+
+SELECT drive_pg_checksum(1);
+SELECT drive_pg_checksum(2);
+SELECT drive_pg_checksum(4);
+SELECT drive_pg_checksum(8);
+SELECT drive_pg_checksum(16);
+SELECT drive_pg_checksum(32);
+SELECT drive_pg_checksum(64);
+SELECT drive_pg_checksum(128);
+SELECT drive_pg_checksum(256);
+SELECT drive_pg_checksum(512);
+SELECT drive_pg_checksum(1024);
-- 
2.43.0



view thread (35+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Proposal for enabling auto-vectorization for checksum calculations
  In-Reply-To: <CAK64mnejn9AZMYz03e7HX8Uui35PihUuOy=b+iBG=YtRKx0Log@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox