From: Nathan Bossart <[email protected]>
To: KAZAR Ayoub <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: Pg Hackers <[email protected]>
Cc: Neil Conway <[email protected]>
Cc: Manni Wood <[email protected]>
Cc: Andrew Dunstan <[email protected]>
Cc: Shinya Kato <[email protected]>
Cc: Mark Wong <[email protected]>
Cc: Nazir Bilal Yavuz <[email protected]>
Subject: Re: Speed up COPY TO text/CSV parsing using SIMD
Date: Tue, 17 Mar 2026 13:49:24 -0500
Message-ID: <abmiNPQOqBrRlf_m@nathan> (raw)
In-Reply-To: <CA+K2Rum7+Jm2rm65K5msxaiAM8QTkhSNAYarPBP9O7nBXYo12Q@mail.gmail.com>
References: <CA+K2Runi_H2CBL0yMm3De2KqcR9RMA0HK5cLJjEhoNszC7myeg@mail.gmail.com>
	<[email protected]>
	<CA+K2Rum_QTZqTUrdMOL5hr-OOpCwGR_9Nj1z15BFObjktMOY6A@mail.gmail.com>
	<abBuKalOno33MQFw@nathan>
	<CA+K2Rum7+Jm2rm65K5msxaiAM8QTkhSNAYarPBP9O7nBXYo12Q@mail.gmail.com>

On Sat, Mar 14, 2026 at 11:43:38PM +0100, KAZAR Ayoub wrote:
> Just a small concern about cases where some varlenas have a larger binary
> size than their text representation, e.g.:
> SELECT pg_column_size(to_tsvector('SIMD is GOOD'));
>  pg_column_size
> ----------------
>              32
> 
> Its text representation is shorter than sizeof(Vector8), so currently v3
> would enter the SIMD path and exit right at the beginning (two extra
> branches) because it does this:
> + if (TupleDescAttr(tup_desc, attnum - 1)->attlen == -1 &&
> + VARSIZE_ANY_EXHDR(DatumGetPointer(value)) > sizeof(Vector8))
> 
> I thought maybe we could do * 2 or * 4 its binary size, depends on the type
> really but this is just a proposition if this case is something concerning.

Can we measure the impact of this?  How likely is this case?

> +static pg_attribute_always_inline void CopyAttributeOutText(CopyToState cstate, const char *string,
> +															bool use_simd, size_t len);
> +static pg_attribute_always_inline void CopyAttributeOutCSV(CopyToState cstate, const char *string,
> +														   bool use_quote, bool use_simd, size_t len);

Can you test the pg_attribute_always_inline change on its own, too?  We
might be able to separate it and the change below into a prerequisite
patch, assuming they show benefits.

>  			if (is_csv)
> -				CopyAttributeOutCSV(cstate, string,
> -									cstate->opts.force_quote_flags[attnum - 1]);
> +			{
> +				if (use_simd)
> +					CopyAttributeOutCSV(cstate, string,
> +										cstate->opts.force_quote_flags[attnum - 1],
> +										true, len);
> +				else
> +					CopyAttributeOutCSV(cstate, string,
> +										cstate->opts.force_quote_flags[attnum - 1],
> +										false, len);
> +			}
>  			else
> -				CopyAttributeOutText(cstate, string);
> +			{
> +				if (use_simd)
> +					CopyAttributeOutText(cstate, string, true, len);
> +				else
> +					CopyAttributeOutText(cstate, string, false, len);
> +			}

There isn't a terrible amount of branching on use_simd in these functions,
so I'm a little skeptical this makes much difference.  As above, it would
be good to measure it.
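
For reference, here is a minimal standalone sketch of the pattern in the
hunk above.  It is not the patch's code: first_special(), scan_dispatch(),
scan_plain(), and always_inline_sketch are made-up names, the 8-byte chunk
loop only stands in for the real SIMD path, and the attribute spelling
assumes GCC or Clang.  The idea is that passing a literal true/false into an
always-inline function lets the compiler emit a copy with the flag folded
away, whereas the plain call keeps the flag as a runtime test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for pg_attribute_always_inline (GCC/Clang only). */
#define always_inline_sketch inline __attribute__((always_inline))

static inline bool
is_special(char c)
{
	return c == '\\' || c == '\t' || c == '\n';
}

/*
 * Return the offset of the first byte that needs escaping, or len if there
 * is none.  use_chunks plays the role of use_simd: when true, clean bytes
 * are skipped eight at a time before the byte-at-a-time loop takes over.
 */
static always_inline_sketch size_t
first_special(const char *s, size_t len, bool use_chunks)
{
	size_t		i = 0;

	assert(s != NULL);

	if (use_chunks)
	{
		while (i + 8 <= len)
		{
			bool		found = false;

			for (int j = 0; j < 8; j++)
				found |= is_special(s[i + j]);
			if (found)
				break;
			i += 8;
		}
	}

	for (; i < len; i++)
	{
		if (is_special(s[i]))
			break;
	}
	return i;
}

/* Call-site dispatch, as in the quoted hunk: each call passes a literal. */
static size_t
scan_dispatch(const char *s, size_t len, bool use_chunks)
{
	if (use_chunks)
		return first_special(s, len, true);
	else
		return first_special(s, len, false);
}

/* Plain call: the flag remains a runtime value inside the inlined body. */
static size_t
scan_plain(const char *s, size_t len, bool use_chunks)
{
	return first_special(s, len, use_chunks);
}
```

Both variants return the same results, so any difference would only show up
in timing, which is why measuring is the way to settle it.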

-- 
nathan




