public inbox for [email protected]  
help / color / mirror / Atom feed
From: Nathan Bossart <[email protected]>
To: Nazir Bilal Yavuz <[email protected]>
Cc: Manni Wood <[email protected]>
Cc: KAZAR Ayoub <[email protected]>
Cc: Neil Conway <[email protected]>
Cc: Andrew Dunstan <[email protected]>
Cc: Shinya Kato <[email protected]>
Cc: PostgreSQL-development <[email protected]>
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: Mon, 9 Mar 2026 13:25:25 -0500
Message-ID: <aa8QlTVEDhG1JU0Z@nathan> (raw)
In-Reply-To: <CAN55FZ2DNaKCK3Kf_kHizb2pAbQvULeDYtzaiz97B_xz7YbrkQ@mail.gmail.com>
References: <CAKWEB6pq7C0Wv1wT9Y1_c_1fn-+cR8pb210Pj3w2FcEOmNGxbQ@mail.gmail.com>
	<CAN55FZ2DT4-k06umn=7NYG+NoM6gnVJVQCCwRrr2qOraO+Jadw@mail.gmail.com>
	<aZikzQP6WPJ5Rq2S@nathan>
	<CAN55FZ3cBN_TncLVWyXAKm-KfewguN1AUjyRhoR6zL_QCxHh7A@mail.gmail.com>
	<CAKWEB6qzsZEQ4Czo9QBFiMXqdXVJknHUJwg6wjRwNzLn4+Jw0g@mail.gmail.com>
	<CAN55FZ2O2Ls==sdpROHqxWRx-PMBZ0riJ6eVKoHj8=vssTavxw@mail.gmail.com>
	<aZ3kYQnF9_u6sUQp@nathan>
	<CAN55FZ3+NYF1TkKyNtpRQuLiaauSYk9G5tA+fpruOA4-14Y_ZA@mail.gmail.com>
	<aaXrGSyq4u2d9qEC@nathan>
	<CAN55FZ2DNaKCK3Kf_kHizb2pAbQvULeDYtzaiz97B_xz7YbrkQ@mail.gmail.com>

On Wed, Mar 04, 2026 at 06:15:53PM +0300, Nazir Bilal Yavuz wrote:
> +#ifndef USE_NO_SIMD
> +static bool CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
> +									   bool *temp_hit_eof, int *temp_input_buf_ptr);
> +#endif

Should we inline this, too?

> +				/*
> +				 * Do not disable SIMD when we hit EOL or EOF characters. In
> +				 * practice, it does not matter for EOF because parsing ends
> +				 * there, but we keep the behavior consistent.
> +				 */
> +				if (!(simd_hit_eof || simd_hit_eol))
> +					cstate->simd_enabled = false;

nitpick: I would personally avoid disabling it for EOF.  It probably
doesn't amount to much, but I don't see any point in the extra
complexity/work solely for consistency.

> +				/*
> +				 * We encountered a EOL or EOF on the first vector. This means
> +				 * lines are not long enough to skip fully sized vector. If
> +				 * this happens two times consecutively, then disable the
> +				 * SIMD.
> +				 */
> +				if (first_vector)
> +				{
> +					if (cstate->simd_failed_first_vector)
> +						cstate->simd_enabled = false;
> +
> +					cstate->simd_failed_first_vector = true;
> +				}

The first time I saw this, my mind immediately went to the extreme case
where this likely regresses: alternating long and short lines.  We might
just want to disable it the first time we see a short line, like we do for
special characters.  This is another thing that we can improve
independently later on.

> +	/* First try to run SIMD, then continue with the scalar path */
> +	if (cstate->simd_enabled)
> +	{
> +		int			temp_input_buf_ptr = input_buf_ptr;
> +		bool		temp_hit_eof = false;
> +
> +		result = CopyReadLineTextSIMDHelper(cstate, is_csv, &temp_hit_eof,
> +											&temp_input_buf_ptr);
> +		input_buf_ptr = temp_input_buf_ptr;
> +		hit_eof = temp_hit_eof;

Given CopyReadLineTextSIMDHelper() doesn't have too much duplicated code,
moving the SIMD stuff to its own function is nice.  The temp variables seem
a bit too magical to me, though.  If those really make a difference, IMHO
there ought to be a big comment explaining why.

-- 
nathan





view thread (114+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
  In-Reply-To: <aa8QlTVEDhG1JU0Z@nathan>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox