public inbox for [email protected]
help / color / mirror / Atom feedFrom: Nathan Bossart <[email protected]>
To: Andrew Dunstan <[email protected]>
Cc: Nazir Bilal Yavuz <[email protected]>
Cc: KAZAR Ayoub <[email protected]>
Cc: Shinya Kato <[email protected]>
Cc: [email protected]
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Date: Mon, 20 Oct 2025 12:04:03 -0500
Message-ID: <aPZrg6lxb5bgy_px@nathan> (raw)
In-Reply-To: <[email protected]>
References: <CAOzEurR5nFt=-SijfU7y0BHVcrT6RG9ovvdVfKt_uBZfEQew9w@mail.gmail.com>
<CAOzEurSqgA69er9SzhPnXwmsVpO7-piUOuOy3dXcHOi__nSQcg@mail.gmail.com>
<CA+K2RumC79NwWxBdofHOYo8SCSs0YCJic05Du=xOszRmoPf9FA@mail.gmail.com>
<CAN55FZ0houfWHn8_MEEefhprZvc33jr07GrBYo+Bp2yw=TVnKA@mail.gmail.com>
<CA+K2Ru=jHuz_Wpgar4Sobtxeb33qxx=o59ToOhZ=vpmkMqErnA@mail.gmail.com>
<CAN55FZ1J+6eM=F5GreWEBMJcNV_gifYyYY1b6xpYzun=nWPhMQ@mail.gmail.com>
<CAN55FZ109W90Ux_EBEqkkU2TyNqBNhdhN_1XPRGo3iiZ2L9b=A@mail.gmail.com>
<[email protected]>
<CAN55FZ1KF7XNpm2XyG=M-sFUODai=6Z8a11xE3s4YRBeBKY3tA@mail.gmail.com>
<[email protected]>
On Mon, Oct 20, 2025 at 10:02:23AM -0400, Andrew Dunstan wrote:
> On 2025-10-16 Th 10:29 AM, Nazir Bilal Yavuz wrote:
>> With this heuristic the regression is limited by %2 in the worst case.
>
> My worry is that the worst case is actually quite common. Sparse data sets
> dominated by a lot of null values (and hence lots of special characters) are
> very common. Are people prepared to accept a 2% regression on load times for
> such data sets?
Without knowing how common it is, I think it's difficult to judge whether
2% is a reasonable trade-off. If <5% of workloads might see a small
regression while the other >95% see double-digit percentage improvements,
then I might argue that it's fine. But I'm not sure we have any way to
know those sorts of details at the moment.
I'm also at least a little skeptical about the 2% number. IME that's
generally within the noise range and can vary greatly between machines and
test runs.
--
nathan
view thread (99+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
In-Reply-To: <aPZrg6lxb5bgy_px@nathan>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox