public inbox for [email protected]  
pgsql: Optimize popcount functions with ARM Neon intrinsics.

* pgsql: Optimize popcount functions with ARM Neon intrinsics.
@ 2025-03-28 19:50  Nathan Bossart <[email protected]>

From: Nathan Bossart @ 2025-03-28 19:50 UTC
  To: [email protected]

Optimize popcount functions with ARM Neon intrinsics.

This commit introduces Neon implementations of pg_popcount{32,64},
pg_popcount(), and pg_popcount_masked().  As in simd.h, we assume
that all available AArch64 hardware supports Neon, so we don't need
any new configure-time or runtime checks.  Some compilers already
emit Neon instructions for these functions, but our hand-rolled
implementations for pg_popcount() and pg_popcount_masked()
performed better in testing, likely due to better instruction-level
parallelism.
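
As a rough illustration of the approach described above (not the code from this commit), a Neon byte-buffer popcount might look like the sketch below. The names sketch_popcount, popcount_scalar, and SKETCH_HAVE_NEON are hypothetical. The two independent accumulators are one assumed way to expose instruction-level parallelism; the loop shape and unroll factor in pg_popcount_aarch64.c may well differ. On non-AArch64 builds only the scalar fallback is compiled.

```c
#include <stddef.h>
#include <stdint.h>

/* As in the commit message, Neon is assumed on every AArch64 target. */
#if defined(__aarch64__)
#include <arm_neon.h>
#define SKETCH_HAVE_NEON 1		/* hypothetical macro, for this sketch only */
#endif

/* Portable fallback: clear the lowest set bit until the byte is zero. */
static uint64_t
popcount_scalar(const uint8_t *buf, size_t len)
{
	uint64_t	count = 0;

	while (len--)
	{
		uint8_t		b = *buf++;

		while (b)
		{
			b &= (uint8_t) (b - 1);
			count++;
		}
	}
	return count;
}

/* Hypothetical entry point: counts the set bits in buf[0 .. len-1]. */
uint64_t
sketch_popcount(const uint8_t *buf, size_t len)
{
	uint64_t	count = 0;

#ifdef SKETCH_HAVE_NEON
	/* Two independent accumulators so the two chains can overlap. */
	uint64x2_t	acc0 = vdupq_n_u64(0);
	uint64x2_t	acc1 = vdupq_n_u64(0);

	while (len >= 32)
	{
		/* vcntq_u8 yields the per-byte bit count of each 16-byte vector. */
		uint8x16_t	v0 = vcntq_u8(vld1q_u8(buf));
		uint8x16_t	v1 = vcntq_u8(vld1q_u8(buf + 16));

		/* Widen 8 -> 16 -> 32 bits, then add pairwise into 64-bit lanes. */
		acc0 = vpadalq_u32(acc0, vpaddlq_u16(vpaddlq_u8(v0)));
		acc1 = vpadalq_u32(acc1, vpaddlq_u16(vpaddlq_u8(v1)));

		buf += 32;
		len -= 32;
	}

	/* Horizontal sum of both accumulators. */
	count = vaddvq_u64(vaddq_u64(acc0, acc1));
#endif

	/* Tail bytes (or the whole buffer when Neon is unavailable). */
	return count + popcount_scalar(buf, len);
}
```

Calling sketch_popcount(buf, len) on any byte buffer should return the same value as a plain bit-by-bit count, which makes it easy to compare against a compiler-generated version when benchmarking.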

Author: "[email protected]" <[email protected]>
Reviewed-by: John Naylor <[email protected]>
Discussion: https://postgr.es/m/010101936e4aaa70-b474ab9e-b9ce-474d-a3ba-a3dc223d295c-000000%40us-west-2.amazons...

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/6be53c27673a5fca64a00a684c36c29db6ca33a5

Modified Files
--------------
src/include/port/pg_bitutils.h |   9 ++
src/port/Makefile              |   1 +
src/port/meson.build           |   1 +
src/port/pg_bitutils.c         |  22 +++--
src/port/pg_popcount_aarch64.c | 208 +++++++++++++++++++++++++++++++++++++++++
5 files changed, 235 insertions(+), 6 deletions(-)




* pgsql: Optimize popcount functions with ARM SVE intrinsics.
@ 2025-03-28 21:20  Nathan Bossart <[email protected]>

From: Nathan Bossart @ 2025-03-28 21:20 UTC
  To: [email protected]

Optimize popcount functions with ARM SVE intrinsics.

This commit introduces SVE implementations of pg_popcount{32,64}.
Unlike the Neon versions, we need an additional configure-time
check to determine if the compiler supports SVE intrinsics, and we
need a runtime check to determine if the current CPU supports SVE
instructions.  Our testing showed that the SVE implementations are
much faster for larger inputs and are comparable to the status
quo for smaller inputs.
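
The two checks described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the committed code: SKETCH_HAVE_SVE_INTRINSICS is a hypothetical macro standing in for the result of the configure-time compiler test, every sketch_* name is made up for this example, and the runtime probe assumes Linux, where getauxval(AT_HWCAP) reports HWCAP_SVE. The function pointer initially targets a chooser that runs the CPU check once and then rebinds itself. When the macro is not defined, only the portable path is compiled.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * SKETCH_HAVE_SVE_INTRINSICS is hypothetical: it stands in for whatever
 * macro the configure-time compiler test would define.
 */
#if defined(SKETCH_HAVE_SVE_INTRINSICS) && defined(__aarch64__) && defined(__linux__)
#include <arm_sve.h>
#include <sys/auxv.h>
#define SKETCH_BUILD_SVE 1
#endif

typedef uint64_t (*sketch_popcount_fn) (const uint8_t *buf, size_t len);

/* Portable implementation, used when SVE is not available. */
static uint64_t
sketch_popcount_portable(const uint8_t *buf, size_t len)
{
	uint64_t	count = 0;

	while (len--)
	{
		uint8_t		b = *buf++;

		while (b)
		{
			b &= (uint8_t) (b - 1);
			count++;
		}
	}
	return count;
}

#ifdef SKETCH_BUILD_SVE
/* Compiled with SVE enabled for this one function only. */
__attribute__((target("arch=armv8-a+sve")))
static uint64_t
sketch_popcount_sve(const uint8_t *buf, size_t len)
{
	uint64_t	count = 0;

	for (uint64_t i = 0; i < len; i += svcntb())
	{
		/* The predicate masks off lanes past the end of the buffer. */
		svbool_t	pg = svwhilelt_b8_u64(i, (uint64_t) len);
		svuint8_t	v = svld1_u8(pg, buf + i);

		count += svaddv_u8(pg, svcnt_u8_x(pg, v));
	}
	return count;
}

/* Runtime check: does the running CPU advertise SVE? (Linux only) */
static int
sketch_cpu_has_sve(void)
{
	return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
}
#endif

static uint64_t sketch_popcount_choose(const uint8_t *buf, size_t len);

/* Starts at the chooser; rebound to a real implementation on first call. */
static sketch_popcount_fn sketch_popcount_impl = sketch_popcount_choose;

static uint64_t
sketch_popcount_choose(const uint8_t *buf, size_t len)
{
	sketch_popcount_impl = sketch_popcount_portable;
#ifdef SKETCH_BUILD_SVE
	if (sketch_cpu_has_sve())
		sketch_popcount_impl = sketch_popcount_sve;
#endif
	return sketch_popcount_impl(buf, len);
}

/* Hypothetical public entry point. */
uint64_t
sketch_popcount_dispatch(const uint8_t *buf, size_t len)
{
	return sketch_popcount_impl(buf, len);
}
```

After the first call, every later call goes straight through the pointer to the selected implementation, so the cost of the CPU probe is paid once per process.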

Author: "[email protected]" <[email protected]>
Co-authored-by: "[email protected]" <[email protected]>
Co-authored-by: "Malladi, Rama" <[email protected]>
Reviewed-by: John Naylor <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/010101936e4aaa70-b474ab9e-b9ce-474d-a3ba-a3dc223d295c-000000%40us-west-2.amazons...
Discussion: https://postgr.es/m/OSZPR01MB84990A9A02A3515C6E85A65B8B2A2%40OSZPR01MB8499.jpnprd01.prod.outlook.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/519338ace410d9b1ffb13176b8802b0307ff0531

Modified Files
--------------
config/c-compiler.m4           |  52 ++++++++
configure                      |  71 +++++++++++
configure.ac                   |   9 ++
meson.build                    |  48 +++++++
src/include/pg_config.h.in     |   3 +
src/include/port/pg_bitutils.h |  17 +++
src/port/pg_popcount_aarch64.c | 281 ++++++++++++++++++++++++++++++++++++++++-
7 files changed, 475 insertions(+), 6 deletions(-)


