public inbox for [email protected]
* pgsql: Optimize popcount functions with ARM Neon intrinsics.
@ 2025-03-28 19:50 Nathan Bossart <[email protected]>
From: Nathan Bossart @ 2025-03-28 19:50 UTC
To: [email protected]
Optimize popcount functions with ARM Neon intrinsics.

This commit introduces Neon implementations of pg_popcount{32,64},
pg_popcount(), and pg_popcount_masked(). As in simd.h, we assume
that all available AArch64 hardware supports Neon, so we don't need
any new configure-time or runtime checks. Some compilers already
emit Neon instructions for these functions, but our hand-rolled
implementations for pg_popcount() and pg_popcount_masked()
performed better in testing, likely due to better instruction-level
parallelism.
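
For readers unfamiliar with the technique, the following is a minimal sketch of a Neon byte-buffer popcount that keeps two independent accumulators, which is one way to expose instruction-level parallelism. It is not the code from this commit: the function names popcount_sketch() and popcount_reference() are hypothetical, the 32-byte loop width is an assumption, and on non-AArch64 builds the sketch simply falls back to the portable loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Portable reference: clear the lowest set bit of each byte until it is zero. */
static uint64_t
popcount_reference(const unsigned char *buf, size_t len)
{
    uint64_t    count = 0;

    for (size_t i = 0; i < len; i++)
    {
        unsigned char b = buf[i];

        while (b)
        {
            b &= (unsigned char) (b - 1);
            count++;
        }
    }
    return count;
}

/* Hypothetical sketch, not the committed implementation. */
static uint64_t
popcount_sketch(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

#if defined(__aarch64__)
    /* Two accumulators, so the two add chains do not depend on each other. */
    uint64x2_t  acc0 = vdupq_n_u64(0);
    uint64x2_t  acc1 = vdupq_n_u64(0);

    while (len >= 32)
    {
        /* vcntq_u8 yields the per-byte bit counts of a 16-byte vector. */
        uint8x16_t  c0 = vcntq_u8(vld1q_u8(buf));
        uint8x16_t  c1 = vcntq_u8(vld1q_u8(buf + 16));

        /* Widen 8 -> 16 -> 32 bits, then add pairwise into 64-bit lanes. */
        acc0 = vpadalq_u32(acc0, vpaddlq_u16(vpaddlq_u8(c0)));
        acc1 = vpadalq_u32(acc1, vpaddlq_u16(vpaddlq_u8(c1)));

        buf += 32;
        len -= 32;
    }

    /* Sum all lanes, then handle the remaining tail bytes portably. */
    return vaddvq_u64(vaddq_u64(acc0, acc1)) + popcount_reference(buf, len);
#else
    return popcount_reference(buf, len);
#endif
}
```

Whether two accumulators are the right number depends on the target core; the commit message only reports that the hand-rolled versions measured faster, not which unrolling factor was used.
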
Author: "[email protected]" <[email protected]>
Reviewed-by: John Naylor <[email protected]>
Discussion: https://postgr.es/m/010101936e4aaa70-b474ab9e-b9ce-474d-a3ba-a3dc223d295c-000000%40us-west-2.amazons...

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/6be53c27673a5fca64a00a684c36c29db6ca33a5

Modified Files
--------------
src/include/port/pg_bitutils.h |   9 ++
src/port/Makefile              |   1 +
src/port/meson.build           |   1 +
src/port/pg_bitutils.c         |  22 +++--
src/port/pg_popcount_aarch64.c | 208 +++++++++++++++++++++++++++++++++++++++++
5 files changed, 235 insertions(+), 6 deletions(-)
* pgsql: Optimize popcount functions with ARM SVE intrinsics.
@ 2025-03-28 21:20 Nathan Bossart <[email protected]>
From: Nathan Bossart @ 2025-03-28 21:20 UTC
To: [email protected]
Optimize popcount functions with ARM SVE intrinsics.

This commit introduces SVE implementations of pg_popcount{32,64}.
Unlike the Neon versions, we need an additional configure-time
check to determine if the compiler supports SVE intrinsics, and we
need a runtime check to determine if the current CPU supports SVE
instructions. Our testing showed that the SVE implementations are
much faster for larger inputs and are comparable to the status
quo for smaller inputs.
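
To illustrate what a runtime check of this kind can look like, here is a minimal sketch that probes for SVE once and then dispatches through a function pointer. It is an assumption-laden illustration, not the committed code: all function names are hypothetical, the probe assumes Linux on AArch64 (getauxval() with AT_HWCAP and HWCAP_SVE) and reports "unavailable" everywhere else, and the "SVE" slot is a stand-in that calls the portable loop, since real SVE intrinsics also need the compiler support that the configure-time check is there to detect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#endif

/* Hypothetical probe: true only if the kernel reports SVE for this CPU. */
static bool
sve_available_sketch(void)
{
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP_SVE)
    return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
    return false;               /* assumption: treat other platforms as "no SVE" */
#endif
}

/* Portable fallback: count bits one byte at a time. */
static uint64_t
popcount_portable(const unsigned char *buf, size_t len)
{
    uint64_t    count = 0;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
    {
        for (unsigned char b = buf[i]; b; b &= (unsigned char) (b - 1))
            count++;
    }
    return count;
}

/* Stand-in for an SVE routine; a real one would use <arm_sve.h> intrinsics. */
static uint64_t
popcount_sve_standin(const unsigned char *buf, size_t len)
{
    return popcount_portable(buf, len);
}

static uint64_t popcount_choose(const unsigned char *buf, size_t len);

/* Starts at the chooser, which replaces itself on the first call. */
static uint64_t (*popcount_dispatch) (const unsigned char *, size_t) = popcount_choose;

static uint64_t
popcount_choose(const unsigned char *buf, size_t len)
{
    popcount_dispatch = sve_available_sketch()
        ? popcount_sve_standin
        : popcount_portable;
    return popcount_dispatch(buf, len);
}
```

Callers always go through popcount_dispatch(); the probe runs only on the first call, so later calls cost just one indirect function call.
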
Author: "[email protected]" <[email protected]>
Co-authored-by: "[email protected]" <[email protected]>
Co-authored-by: "Malladi, Rama" <[email protected]>
Reviewed-by: John Naylor <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/010101936e4aaa70-b474ab9e-b9ce-474d-a3ba-a3dc223d295c-000000%40us-west-2.amazons...
Discussion: https://postgr.es/m/OSZPR01MB84990A9A02A3515C6E85A65B8B2A2%40OSZPR01MB8499.jpnprd01.prod.outlook.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/519338ace410d9b1ffb13176b8802b0307ff0531

Modified Files
--------------
config/c-compiler.m4           |  52 ++++++++
configure                      |  71 +++++++++++
configure.ac                   |   9 ++
meson.build                    |  48 +++++++
src/include/pg_config.h.in     |   3 +
src/include/port/pg_bitutils.h |  17 +++
src/port/pg_popcount_aarch64.c | 281 ++++++++++++++++++++++++++++++++++++++++-
7 files changed, 475 insertions(+), 6 deletions(-)