public inbox for [email protected]  
From: KAZAR Ayoub <[email protected]>
To: Nathan Bossart <[email protected]>
Cc: Manni Wood <[email protected]>
Cc: Andrew Dunstan <[email protected]>
Cc: Nazir Bilal Yavuz <[email protected]>
Cc: Shinya Kato <[email protected]>
Cc: [email protected]
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: Tue, 18 Nov 2025 21:42:39 +0100
Message-ID: <CA+K2RunaLe7Wi1jSMXnNLgL5Bn17==PgrGEAG53HwXDuWpXdXg@mail.gmail.com> (raw)
In-Reply-To: <aRue0D4QQkUf2B_N@nathan>
References: <aPZrg6lxb5bgy_px@nathan>
	<[email protected]>
	<CAN55FZ2GonAeSJHn-c2nJgUO-v6sDMOQzn97evVdZbcHeu3ihw@mail.gmail.com>
	<aPfTiX0HwV42R6Od@nathan>
	<CAN55FZ0AYP4ZEczBJ5ur-=9QuEhMysH9Yfrq5srr0ZakK1M0FA@mail.gmail.com>
	<aPkvi5P7kpA8oQKc@nathan>
	<[email protected]>
	<CAKWEB6qdyhN3EoUNAK23etXX-kXH-_79NNbTsKqtF1g1WkuaBQ@mail.gmail.com>
	<CA+K2RumMC+avYGSX-AWNeod3w+XOGHrVPz8HiqkvJj7AZ5tZXA@mail.gmail.com>
	<CAKWEB6pev=pNVi4qDYWS50N=YFrKRbjH1h=5F1bXpnK7WR5CYg@mail.gmail.com>
	<aRue0D4QQkUf2B_N@nathan>

On Mon, Nov 17, 2025, 11:16 PM Nathan Bossart <[email protected]>
wrote:

> (assuming there is a desire to
> continue with it)?

I'm hoping to start spending more time on it soon.
>
Some things worth noting for future reference (so that nobody else wastes
time on them): I previously tried several extra micro-optimizations
inside and around CopyReadLineText:

SIMD alignment: Forcing 16-byte aligned buffers so that we could use aligned
load instructions (_mm_load_si128 vs _mm_loadu_si128) provided no
measurable benefit on modern CPUs (there's definitely a thread somewhere
discussing this that I haven't come across yet). This likely explains why
simd.h exclusively uses unaligned load intrinsics: the performance
difference has been negligible since Nehalem processors.
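
To make the comparison concrete, here is a minimal sketch of the two load
variants. It is not the patch's code: the function names are hypothetical,
and it assumes GCC or Clang (for __builtin_ctz) with SSE2 available; on
other targets only the scalar loop runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/*
 * Hypothetical helper, not PostgreSQL code: offset of the first '\n' in
 * buf[0..len), or len if there is none.  Works for any buffer alignment.
 */
size_t
first_newline_unaligned(const char *buf, size_t len)
{
    size_t      i = 0;

#if defined(__SSE2__)
    const __m128i nl = _mm_set1_epi8('\n');

    for (; i + 16 <= len; i += 16)
    {
        /* unaligned load: valid for any address */
        __m128i     chunk = _mm_loadu_si128((const __m128i *) (buf + i));
        int         mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, nl));

        if (mask != 0)
            return i + (size_t) __builtin_ctz((unsigned int) mask);
    }
#endif
    /* scalar tail; also the whole search when SSE2 is unavailable */
    for (; i < len; i++)
        if (buf[i] == '\n')
            return i;
    return len;
}

/* Same search, but the caller must pass a 16-byte aligned buffer. */
size_t
first_newline_aligned(const char *buf, size_t len)
{
    size_t      i = 0;

    assert(((uintptr_t) buf & 15) == 0);
#if defined(__SSE2__)
    const __m128i nl = _mm_set1_epi8('\n');

    for (; i + 16 <= len; i += 16)
    {
        /* aligned load: requires a 16-byte aligned address */
        __m128i     chunk = _mm_load_si128((const __m128i *) (buf + i));
        int         mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, nl));

        if (mask != 0)
            return i + (size_t) __builtin_ctz((unsigned int) mask);
    }
#endif
    for (; i < len; i++)
        if (buf[i] == '\n')
            return i;
    return len;
}
```

The only difference between the two is the load intrinsic, which is why
forcing alignment buys nothing once the two loads cost about the same.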

Memory prefetching: Explicit prefetch instructions for the COPY buffer
pipeline (copy_raw_buf, input buffers, etc.) showed either no improvement
or a slight regression. Multiple chunks already fit within a cache line,
the other buffers are too far away to prefetch usefully, and the next part
of the buffer is easily prefetched anyway, so the extra uops turn out not
to be worth it.
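
For reference, the kind of explicit prefetching meant here looks roughly
like the sketch below. It is an illustration only, not the code I tested:
the function name and PREFETCH_DISTANCE are hypothetical, a 64-byte cache
line is assumed, and __builtin_prefetch is the GCC/Clang builtin.

```c
#include <stddef.h>

/* Hypothetical tuning knob: how many bytes ahead to hint. */
#define PREFETCH_DISTANCE 256

/*
 * Hypothetical helper, not PostgreSQL code: count '\n' bytes in
 * buf[0..len) while hinting upcoming cache lines to the CPU.
 */
size_t
count_newlines_prefetch(const char *buf, size_t len)
{
    size_t      count = 0;

    for (size_t i = 0; i < len; i++)
    {
        /*
         * Once per assumed 64-byte line, request the line that is
         * PREFETCH_DISTANCE bytes ahead (read access, high locality).
         * The bounds check keeps the hinted address inside the buffer.
         */
        if ((i & 63) == 0 && i + PREFETCH_DISTANCE < len)
            __builtin_prefetch(buf + i + PREFETCH_DISTANCE, 0, 3);

        if (buf[i] == '\n')
            count++;
    }
    return count;
}
```

On a sequential scan like this the access pattern is already trivially
predictable, so the hint mostly adds instructions to the loop.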

Instruction-level parallelism: Spreading the work across too many
independent vector operations to increase ILP eventually degrades
performance, likely due to backend saturation as observed through perf
(most likely execution port and execution unit contention?)
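
As an illustration of what "independent vector operations" means here, the
sketch below handles two 16-byte chunks per iteration so that the loads and
compares do not depend on each other. It is a hypothetical example, not the
patch's code, and assumes GCC or Clang with SSE2; elsewhere only the scalar
loop runs.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/*
 * Hypothetical helper, not PostgreSQL code: offset of the first '\n' in
 * buf[0..len), or len if there is none, scanning 32 bytes per iteration.
 */
size_t
first_newline_2x(const char *buf, size_t len)
{
    size_t      i = 0;

#if defined(__SSE2__)
    const __m128i nl = _mm_set1_epi8('\n');

    for (; i + 32 <= len; i += 32)
    {
        /* two independent load + compare chains */
        __m128i     a = _mm_loadu_si128((const __m128i *) (buf + i));
        __m128i     b = _mm_loadu_si128((const __m128i *) (buf + i + 16));
        unsigned int lo = (unsigned int) _mm_movemask_epi8(_mm_cmpeq_epi8(a, nl));
        unsigned int hi = (unsigned int) _mm_movemask_epi8(_mm_cmpeq_epi8(b, nl));

        /* bit k of the combined mask corresponds to byte i + k */
        unsigned int mask = lo | (hi << 16);

        if (mask != 0)
            return i + (size_t) __builtin_ctz(mask);
    }
#endif
    /* scalar tail; also the whole search when SSE2 is unavailable */
    for (; i < len; i++)
        if (buf[i] == '\n')
            return i;
    return len;
}
```

Unrolling further follows the same pattern with more chunks per iteration,
which is where the degradation described above started to show up.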
.....

This suggests that further optimization work should focus on the
pipeline as a whole for larger benefits (parallel copy [0], maybe?).

[0]
https://www.postgresql.org/message-id/[email protected]...

--
Regards,
Ayoub Kazar

