From: Nathan Bossart <[email protected]>
To: Nazir Bilal Yavuz <[email protected]>
Cc: KAZAR Ayoub <[email protected]>
Cc: Neil Conway <[email protected]>
Cc: Manni Wood <[email protected]>
Cc: Andrew Dunstan <[email protected]>
Cc: Shinya Kato <[email protected]>
Cc: PostgreSQL-development <[email protected]>
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: Wed, 11 Feb 2026 16:39:43 -0600
Message-ID: <aY0FL4rXUl6ykn-a@nathan> (raw)
In-Reply-To: <CAN55FZ1=O6TjeZM2CUT7T2tu66uJT+w3G9FiRXVs+gt_ousFxQ@mail.gmail.com>
References: <CAKWEB6oZdQhhBV3ojHLBwjQgKzfDw0fkqncurt9oi7vNsq41ww@mail.gmail.com>
	<CAN55FZ1p5UyUdTRO7iWR_ukjhJDOnpOR2rYNOq=+hcC45OuahQ@mail.gmail.com>
	<CAOW5sYZEx=fPw2wp7y2nK_-ifXFeYW4CTmFx_OQeoHFjG7rbHw@mail.gmail.com>
	<CA+K2Ru=C_woAnd-3-pGHoNSTR8FOf=7eeSWE1xaLt9ojVWndVg@mail.gmail.com>
	<CAN55FZ0FRB2OD6-oEESLvgUT4bLZQVD72pAqUqzdw7Rx5cN0ig@mail.gmail.com>
	<CA+K2Run1VdLnmp-5_Qv2Fax0KgT7LLJMH-uzjaaf-NZD1oU-=w@mail.gmail.com>
	<aYZdKSTw6N3khsVE@nathan>
	<CAN55FZ2DOeLjSXE2Jos99bgHG-Zeo3KjStrSgoA8Rf=2Mu+hFA@mail.gmail.com>
	<aYZvdsXPElQvwWOA@nathan>
	<CAN55FZ1=O6TjeZM2CUT7T2tu66uJT+w3G9FiRXVs+gt_ousFxQ@mail.gmail.com>

On Wed, Feb 11, 2026 at 04:27:50PM +0300, Nazir Bilal Yavuz wrote:
> I am sharing a v6 which implements (1). My benchmark results show
> almost no difference for the special-character cases and a nice
> improvement for the no-special-character cases.

Thanks!

> +	/* Initialize SIMD variables */
> +	cstate->simd_enabled = false;
> +	cstate->simd_initialized = false;

> +	/* Initialize SIMD on the first read */
> +	if (unlikely(!cstate->simd_initialized))
> +	{
> +		cstate->simd_initialized = true;
> +		cstate->simd_enabled = true;
> +	}

Why do we do this initialization in CopyReadLine() as opposed to setting
simd_enabled to true when initializing cstate in BeginCopyFrom()?  If we
can initialize it in BeginCopyFrom(), we could probably remove
simd_initialized.
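
To make the suggestion concrete, here is a rough sketch of the shape I have
in mind. It is not taken from the patch: DemoCopyState,
demo_begin_copy_from() and demo_read_line() are made-up stand-ins for
CopyFromStateData, BeginCopyFrom() and CopyReadLine(), and it assumes
nothing needs to look at the flag before the first line is read.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the relevant part of CopyFromStateData. */
typedef struct DemoCopyState
{
	bool		simd_enabled;	/* may still be cleared later by the reader */
} DemoCopyState;

/* Stand-in for BeginCopyFrom(): per-COPY setup happens here, exactly once. */
static DemoCopyState *
demo_begin_copy_from(void)
{
	DemoCopyState *cstate = calloc(1, sizeof(DemoCopyState));

	if (cstate == NULL)
		return NULL;

	/* Set the flag up front, so no "initialized" companion flag is needed. */
	cstate->simd_enabled = true;
	return cstate;
}

/* Stand-in for CopyReadLine(): no first-call check on the hot path. */
static bool
demo_read_line(DemoCopyState *cstate)
{
	return cstate->simd_enabled;
}
```

With this arrangement the per-line path only ever reads simd_enabled, and
anything that wants to turn SIMD off later can still clear the flag.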

> +	if (cstate->simd_enabled)
> +		result = CopyReadLineText(cstate, is_csv, true);
> +	else
> +		result = CopyReadLineText(cstate, is_csv, false);

I know we discussed this upthread, but I'd like to take a closer look at
this to see whether/why it makes such a big difference.  It's a bit awkward
that CopyReadLineText() needs to manage both its local simd_enabled and
cstate->simd_enabled.
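
For anyone following along, my understanding of why the call is duplicated
with literal true/false is the usual constant-argument trick: if the callee
is inlined, the compiler can emit one specialized copy per call site and
drop the dead branch in each. A minimal sketch of that pattern follows. The
names are made up, the "fast path" is a plain 8-byte loop standing in for
the SIMD code, and whether the compiler really specializes depends on the
function being inlined.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical scanner.  "use_fast_path" plays the role of the third
 * argument to CopyReadLineText().  Returns the offset of the first '\n',
 * or len if there is none.
 */
static inline size_t
demo_find_newline(const char *buf, size_t len, bool use_fast_path)
{
	size_t		i = 0;

	if (use_fast_path)
	{
		/* Skip 8 bytes at a time while the chunk holds no newline. */
		while (len - i >= 8)
		{
			bool		found = false;

			for (int j = 0; j < 8; j++)
				found |= (buf[i + j] == '\n');
			if (found)
				break;
			i += 8;
		}
	}

	/* Byte-at-a-time loop handles the tail and the chunk with the match. */
	for (; i < len; i++)
	{
		if (buf[i] == '\n')
			break;
	}
	return i;
}

/* Each call site passes a constant, so each inlined copy can be specialized. */
static size_t
demo_read_line(const char *buf, size_t len, bool fast_enabled)
{
	if (fast_enabled)
		return demo_find_newline(buf, len, true);
	else
		return demo_find_newline(buf, len, false);
}
```

Both branches return the same answer; only the generated code differs,
which is why measuring the actual difference seems worthwhile.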

> +			/* Load a chunk of data into a vector register */
> +			vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);

As mentioned upthread [0], I think it's worth testing whether processing
multiple vectors' worth of data in each loop iteration improves performance.
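
Roughly, the loop shape I mean looks like the sketch below. It is only an
illustration with made-up names: to keep it portable, a uint64 word stands
in for a Vector8 register and a SWAR byte test stands in for the
vector8_eq()-style comparisons, and it looks only for '\n' and '\\' rather
than the full set of characters the patch cares about.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One 8-byte word stands in for a vector register (portability assumption). */
typedef uint64_t DemoVec;

static inline DemoVec
demo_load(const char *p)
{
	DemoVec		v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Nonzero iff some byte of v equals c (classic SWAR zero-byte test). */
static inline DemoVec
demo_has_byte(DemoVec v, unsigned char c)
{
	DemoVec		x = v ^ (UINT64_C(0x0101010101010101) * c);

	return (x - UINT64_C(0x0101010101010101)) & ~x &
		UINT64_C(0x8080808080808080);
}

/* Returns the offset of the first '\n' or '\\' in buf, or len if none. */
static size_t
demo_scan_special(const char *buf, size_t len)
{
	const size_t stride = 2 * sizeof(DemoVec);
	size_t		i = 0;

	/* Two "vectors" per iteration, with a single combined branch. */
	while (len - i >= stride)
	{
		DemoVec		a = demo_load(buf + i);
		DemoVec		b = demo_load(buf + i + sizeof(DemoVec));
		DemoVec		hit = demo_has_byte(a, '\n') | demo_has_byte(a, '\\') |
			demo_has_byte(b, '\n') | demo_has_byte(b, '\\');

		if (hit)
			break;
		i += stride;
	}

	/* Scalar loop pinpoints the match and handles the tail. */
	for (; i < len; i++)
	{
		if (buf[i] == '\n' || buf[i] == '\\')
			break;
	}
	return i;
}
```

The potential win is fewer branches per byte scanned; the potential cost is
more wasted work on inputs where special characters are frequent, so it
would need benchmarking on both kinds of input.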

[0] https://postgr.es/m/aSTVOe6BIe5f1l3i%40nathan

-- 
nathan





