From: Nazir Bilal Yavuz <[email protected]>
To: Nathan Bossart <[email protected]>
Cc: Manni Wood <[email protected]>
Cc: KAZAR Ayoub <[email protected]>
Cc: Neil Conway <[email protected]>
Cc: Andrew Dunstan <[email protected]>
Cc: Shinya Kato <[email protected]>
Cc: PostgreSQL-development <[email protected]>
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: Thu, 12 Mar 2026 13:59:53 +0300
Message-ID: <CAN55FZ1sn-2tVX_n9C5UNBCfDPjSDOCT4zkMeDsB7MaZ9SUBTw@mail.gmail.com> (raw)
In-Reply-To: <abHTvkeIK37hj9oS@nathan>
References: <aaXrGSyq4u2d9qEC@nathan>
	<CAN55FZ2DNaKCK3Kf_kHizb2pAbQvULeDYtzaiz97B_xz7YbrkQ@mail.gmail.com>
	<aa8QlTVEDhG1JU0Z@nathan>
	<CAN55FZ08kqmA+B9pzPDy-QstxAd=cK-RqjbR3cWBjPF_8-FXAw@mail.gmail.com>
	<abBQbdFa6OsG8TGu@nathan>
	<CAN55FZ3jXs7XDsP_-v_jUBquRu4uAdheN3xcmW=WhAyKwFLSjg@mail.gmail.com>
	<abGv0ScUWVa6eogw@nathan>
	<CAN55FZ3gdK8dGrEo0M6KFW97OaF8TUbjO_dFoxQKi63davE-jA@mail.gmail.com>
	<abG8R6HkOHyUuyWb@nathan>
	<CAN55FZ0yfETy4UEA5rOJ9S06JSOtiWF8TW_+yi3yjVAcrLqKLA@mail.gmail.com>
	<abHTvkeIK37hj9oS@nathan>

Hi,

On Wed, 11 Mar 2026 at 23:42, Nathan Bossart <[email protected]> wrote:
>
> On Wed, Mar 11, 2026 at 10:22:18PM +0300, Nazir Bilal Yavuz wrote:
> > Here is v14 which is v13-0001 + v13-0002.
>
> Thanks!  It's getting close.
>
> > +             /*
> > +              * Temporary variables are used here instead of passing the actual
> > +              * variables (especially input_buf_ptr) directly to the helper. Taking
> > +              * the address of a local variable might force the compiler to
> > +              * allocate it on the stack rather than in a register.  Because
> > +              * input_buf_ptr is used heavily in the hot scalar path below, keeping
> > +              * it in a register is important for performance.
> > +              */
> > +             int                     temp_input_buf_ptr;
> > +             bool            temp_hit_eof = hit_eof;
>
> A few notes:
>
> * Does using a temporary variable for hit_eof actually make a difference?
> AFAICT that's only updated when loading more data.
>
> * Does inlining the function produce the same results?
>
> * Also, I'm curious what the usual benchmarks look like with and without
> this hack for the latest patch.

I benchmarked each of these questions. Here are the results:

Old master means d841ca2d14 minus the commit that inlines
CopyReadLineText (dc592a4155).

v14 means d841ca2d14 + v14.

v14 + #1 means removing both temporary variables.

v14 + #2 means removing only the temp_hit_eof variable.

v14 + #3 means inlining CopyReadLineTextSIMDHelper().

v14 + #4 means inlining CopyReadLineTextSIMDHelper() + removing
temporary variables (#1).
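
To make the difference between v14 and variant #1 concrete, here is a
minimal sketch of the two call patterns. It is not the patch code:
scan_helper() is a hypothetical stand-in for
CopyReadLineTextSIMDHelper(), whose real signature is not shown in this
message, and the two find_newline_*() wrappers are made up for
illustration. Whether the compiler actually keeps input_buf_ptr in a
register in either form depends on the compiler and on inlining, which
is what the benchmarks below try to measure.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for CopyReadLineTextSIMDHelper(): advances *pos to
 * the next '\n' in buf, or sets *hit_eof if the buffer ends first.
 */
void
scan_helper(const char *buf, int len, int *pos, bool *hit_eof)
{
	int			i = *pos;

	while (i < len && buf[i] != '\n')
		i++;
	*pos = i;
	if (i >= len)
		*hit_eof = true;
}

/* v14 pattern: only the temporaries have their address taken. */
int
find_newline_with_temps(const char *buf, int len)
{
	int			input_buf_ptr = 0;
	bool		hit_eof = false;
	int			temp_input_buf_ptr = input_buf_ptr;
	bool		temp_hit_eof = hit_eof;

	scan_helper(buf, len, &temp_input_buf_ptr, &temp_hit_eof);

	/* Copy the results back into the variables the hot path uses. */
	input_buf_ptr = temp_input_buf_ptr;
	hit_eof = temp_hit_eof;

	return hit_eof ? -1 : input_buf_ptr;
}

/* "#1" pattern: the hot-path variables themselves escape to the helper. */
int
find_newline_direct(const char *buf, int len)
{
	int			input_buf_ptr = 0;
	bool		hit_eof = false;

	scan_helper(buf, len, &input_buf_ptr, &hit_eof);

	return hit_eof ? -1 : input_buf_ptr;
}
```

Both wrappers return the same result; the only difference is which
variables have their address passed to the helper. Variant #2 would
keep temp_input_buf_ptr but pass &hit_eof directly.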

------------------------------------------------------------

Results for default_toast_compression = 'lz4':

+-------------------------------------------+
|             Optimization: -O2             |
+------------+--------------+---------------+
|            |     Text     |      CSV      |
+------------+------+-------+-------+-------+
|    WIDE    | None |  1/3  |  None |  1/3  |
+------------+------+-------+-------+-------+
| Old master | 4260 |  4789 |  5930 |  8276 |
+------------+------+-------+-------+-------+
|     v14    | 2489 |  4439 |  2529 |  8098 |
+------------+------+-------+-------+-------+
|  v14 + #1  | 2472 |  5177 |  2479 |  9285 |
+------------+------+-------+-------+-------+
|  v14 + #2  | 2521 |  4252 |  2481 |  8050 |
+------------+------+-------+-------+-------+
|  v14 + #3  | 2632 |  4569 |  2458 |  8657 |
+------------+------+-------+-------+-------+
|  v14 + #4  | 2476 |  4239 |  2475 | 10544 |
+------------+------+-------+-------+-------+
|            |      |       |       |       |
+------------+------+-------+-------+-------+
|            |      |       |       |       |
+------------+------+-------+-------+-------+
|            |     Text     |      CSV      |
+------------+------+-------+-------+-------+
|   NARROW   | None |  1/3  |  None |  1/3  |
+------------+------+-------+-------+-------+
| Old master | 9955 | 10056 | 10329 | 10872 |
+------------+------+-------+-------+-------+
|     v14    | 9917 | 10080 | 10104 | 10510 |
+------------+------+-------+-------+-------+
|  v14 + #1  | 9913 | 10090 | 10120 | 10532 |
+------------+------+-------+-------+-------+
|  v14 + #2  | 9937 | 10130 | 10072 | 10520 |
+------------+------+-------+-------+-------+
|  v14 + #3  | 9880 | 10258 | 10220 | 10604 |
+------------+------+-------+-------+-------+
|  v14 + #4  | 9827 | 10306 | 10308 | 10734 |
+------------+------+-------+-------+-------+

------------------------------------------------------------

Results for default_toast_compression = 'pglz':

+-------------------------------------------+
|             Optimization: -O2             |
+------------+--------------+---------------+
|            |     Text     |      CSV      |
+------------+------+-------+-------+-------+
|    WIDE    | None |  1/3  |  None |  1/3  |
+------------+------+-------+-------+-------+
| Old master | 4260 |  4789 |  5930 |  8276 |
+------------+------+-------+-------+-------+
|     v14    | 2489 |  4439 |  2529 |  8098 |
+------------+------+-------+-------+-------+
|  v14 + #1  | 2472 |  5177 |  2479 |  9285 |
+------------+------+-------+-------+-------+
|  v14 + #2  | 2521 |  4252 |  2481 |  8050 |
+------------+------+-------+-------+-------+
|  v14 + #3  | 2632 |  4569 |  2458 |  8657 |
+------------+------+-------+-------+-------+
|  v14 + #4  | 2476 |  4239 |  2475 | 10544 |
+------------+------+-------+-------+-------+
|            |      |       |       |       |
+------------+------+-------+-------+-------+
|            |      |       |       |       |
+------------+------+-------+-------+-------+
|            |     Text     |      CSV      |
+------------+------+-------+-------+-------+
|   NARROW   | None |  1/3  |  None |  1/3  |
+------------+------+-------+-------+-------+
| Old master | 9955 | 10056 | 10329 | 10872 |
+------------+------+-------+-------+-------+
|     v14    | 9917 | 10080 | 10104 | 10510 |
+------------+------+-------+-------+-------+
|  v14 + #1  | 9913 | 10090 | 10120 | 10532 |
+------------+------+-------+-------+-------+
|  v14 + #2  | 9937 | 10130 | 10072 | 10520 |
+------------+------+-------+-------+-------+
|  v14 + #3  | 9880 | 10258 | 10220 | 10604 |
+------------+------+-------+-------+-------+
|  v14 + #4  | 9827 | 10306 | 10308 | 10734 |
+------------+------+-------+-------+-------+


------------------------------------------------------------

Looking at these results:

v14 + #1 and v14 + #3 perform worse in the wide & 1/3 cases.

v14 + #4 performs worse in the CSV & wide & 1/3 case.

v14 and v14 + #2 perform very similarly, and neither shows a
regression. I think we can move forward with one of these.
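
For a rough sense of scale, the differences can be expressed as a
percentage of the v14 number. The helper below is just illustrative
arithmetic on values copied from the tables above; pct_change() is a
made-up name, and it assumes lower numbers are better, as the
comparisons above imply.

```c
/*
 * Percentage change of value relative to base.  With lower-is-better
 * numbers, a positive result means value is worse than base.
 */
double
pct_change(double base, double value)
{
	return (value - base) / base * 100.0;
}
```

For example, pct_change(4439, 5177) (v14 vs. v14 + #1, text, wide, 1/3)
is about 16.6, and pct_change(8098, 10544) (v14 vs. v14 + #4, CSV,
wide, 1/3) is about 30.2.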

--
Regards,
Nazir Bilal Yavuz
Microsoft




