From: Manni Wood
Date: Fri, 13 Feb 2026 21:34:13 -0600
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
To: Nathan Bossart
Cc: Nazir Bilal Yavuz, KAZAR Ayoub, Neil Conway, Andrew Dunstan, Shinya Kato, PostgreSQL-development

Hello!

I ran some COPY FROM tests using master and then Nazir's v7-0001 and v7-0002 patches applied to master.

x86 master
TXT :                 29222.524250 ms
CSV :                 36162.588500 ms
TXT with 1/3 escapes: 32922.649750 ms
CSV with 1/3 quotes:  47631.423750 ms

x86 v7-0001
TXT :                 23247.834250 ms  20.445496% improvement
CSV :                 23162.711750 ms  35.948413% improvement
TXT with 1/3 escapes: 31786.386000 ms  3.451313% improvement
CSV with 1/3 quotes:  43330.475500 ms  9.029645% improvement

x86 v7-0002
TXT :                 22394.812500 ms  23.364552% improvement
CSV :                 22374.645750 ms  38.127643% improvement
TXT with 1/3 escapes: 32378.929750 ms  1.651507% improvement
CSV with 1/3 quotes:  47139.171750 ms  1.033461% improvement

arm master
TXT :                 9448.900500 ms
CSV :                 11135.871500 ms
TXT with 1/3 escapes: 10786.418750 ms
CSV with 1/3 quotes:  14115.335500 ms

arm v7-0001
TXT :                 7271.170500 ms  23.047443% improvement
CSV :                 7259.866750 ms  34.806479% improvement
TXT with 1/3 escapes: 10894.445500 ms  -1.001507% regression
CSV with 1/3 quotes:  13398.444000 ms  5.078813% improvement

arm v7-0002
TXT :                 7165.707250 ms  24.163587% improvement
CSV :                 7140.497250 ms  35.878416% improvement
TXT with 1/3 escapes: 10308.782250 ms  4.428129% improvement
CSV with 1/3 quotes:  12576.179500 ms  10.904140% improvement

v7-0001 + v7-0002 applied to master certainly seems promising: nice to see speed improvements across the board on both x86 and arm!
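
For anyone rechecking the percentages, here is a minimal sketch of the arithmetic they are consistent with: the saving relative to the master timing on the same platform. The function name percent_improvement is made up for illustration and is not part of any test script.

```c
#include <assert.h>

/*
 * Percent improvement of a patched timing over a baseline timing.
 * Positive means the patched build was faster; negative means a regression.
 * (Hypothetical helper, named for illustration only.)
 */
double
percent_improvement(double baseline_ms, double patched_ms)
{
    assert(baseline_ms > 0.0);
    return (baseline_ms - patched_ms) / baseline_ms * 100.0;
}
```

Called as percent_improvement(29222.524250, 23247.834250) this gives roughly 20.4455, in line with the x86 v7-0001 TXT row. The arm v7-0001 "TXT with 1/3 escapes" pair (10786.418750, 10894.445500) comes out negative, which is the one row reported as a regression.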

On Fri, Feb 13, 2026 at 5:09 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> On Fri, Feb 13, 2026 at 02:45:30PM +0300, Nazir Bilal Yavuz wrote:
> > Also, if I change this code to:
> >
> >     if (cstate->simd_enabled)
> >     {
> >         if (is_csv)
> >             result = CopyReadLineText(cstate, true, true);
> >         else
> >             result = CopyReadLineText(cstate, false, true);
> >     }
> >     else
> >     {
> >         if (is_csv)
> >             result = CopyReadLineText(cstate, true, false);
> >         else
> >             result = CopyReadLineText(cstate, false, false);
> >     }
> >
> > then I see ~5% performance improvement in scalar path compared to master.

> Hm.  What difference do you see if you just do
>
>         if (is_csv)
>                 result = CopyReadLineText(cstate, true);
>         else
>                 result = CopyReadLineText(cstate, false);
>
> both with and without the SIMD stuff?  IIUC this is allowing the compiler
> to remove several branches in CopyReadLineText(), which might be a nice
> improvement on its own.  That being said, I'm less convinced that adding a
> simd_enabled parameter to CopyReadLineText() helps, because 1) it's
> involved in fewer branches and 2) we change it within the function, so the
> compiler can't remove the branches, anyway.  But perhaps I'm missing
> something.
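
To make the branch-removal point concrete, here is a minimal self-contained sketch. It assumes the callee gets inlined at each call site; scan_line and scan_line_dispatch are hypothetical stand-ins, not the real CopyReadLineText(). Because each call passes a literal for is_csv, an optimizing compiler is free to emit two specialized copies of the loop with the per-byte mode test folded away. A flag that the function itself modifies at run time would not fold this way, which is the concern raised about simd_enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a line scanner whose behavior depends on a mode
 * flag.  Returns the offset of the first line terminator, or len if none.
 */
static inline size_t
scan_line(const char *buf, size_t len, bool is_csv)
{
    bool        in_quote = false;
    size_t      i;

    assert(buf != NULL);
    for (i = 0; i < len; i++)
    {
        char        c = buf[i];

        if (is_csv)
        {
            /* CSV mode: newlines inside quotes do not end the line. */
            if (c == '"')
                in_quote = !in_quote;
            if (in_quote)
                continue;
        }
        if (c == '\n' || c == '\r')
            break;
    }
    return i;
}

/* Branch once per call so the flag is a compile-time constant in each arm. */
size_t
scan_line_dispatch(const char *buf, size_t len, bool is_csv)
{
    if (is_csv)
        return scan_line(buf, len, true);
    else
        return scan_line(buf, len, false);
}
```

Whether the compiler really specializes both copies depends on inlining decisions and optimization level, so this only illustrates the mechanism and says nothing about the size of the gain.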

> Some other random thoughts:
>
> +                    match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
>
> +                match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
>
> Since \n and \r are well below "normal" ASCII values, I wonder if we could
> simplify these to something like
>
>         match = vector8_gt(... vector with all lanes set to \r + 1 ..., chunk);
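
A scalar model of the two forms may help when weighing this. It is only a sketch: each array element stands for one vector lane, the function names are invented, and it uses an unsigned comparison. Whether the real vector helper compares signed or unsigned bytes is not confirmed here. That matters because a signed byte compare (SSE2's _mm_cmpgt_epi8, for instance) would treat bytes 0x80 and above as negative and flag them too.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LANES 16

/* Two equality tests per lane, OR-ed together (the form in the patch). */
void
match_eq(const uint8_t chunk[LANES], bool match[LANES])
{
    assert(chunk != 0 && match != 0);
    for (int i = 0; i < LANES; i++)
        match[i] = (chunk[i] == '\n') || (chunk[i] == '\r');
}

/*
 * One unsigned "less than \r + 1" test per lane.  This also flags every
 * other byte at or below \r (for example \t and \0), so it is a superset
 * of match_eq() and the caller must recheck the flagged byte.
 */
void
match_threshold(const uint8_t chunk[LANES], bool match[LANES])
{
    const uint8_t limit = '\r' + 1;

    assert(chunk != 0 && match != 0);
    for (int i = 0; i < LANES; i++)
        match[i] = chunk[i] < limit;
}
```

The threshold form never misses a \n or \r, but it does report \t, which is the default delimiter for text-format COPY, so the scalar recheck could run more often on such data. That cost would need measuring.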

> +            /* Check if we found any special characters */
> +            mask = vector8_highbit_mask(match);
> +            if (mask != 0)
>
> vector8_highbit_mask() is somewhat expensive on AArch64, so I wonder if
> waiting until we enter the "if" block to calculate it has any benefit.
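
The control flow being suggested could look roughly like the scalar model below: test cheaply whether any lane matched, and only then build the bitmask. All three functions are hypothetical stand-ins operating on a plain byte array, not the simd.h API. PostgreSQL's simd.h does have vector8_is_highbit_set(), which might fill the role of the cheap test, but whether it is actually cheaper in this loop would need measuring.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LANES 16

/* Cheap "is any lane's high bit set?" test (stand-in for a reduction). */
bool
any_lane_set(const uint8_t match[LANES])
{
    uint8_t     acc = 0;

    for (int i = 0; i < LANES; i++)
        acc |= match[i];
    return (acc & 0x80) != 0;
}

/* Costlier step: pack the high bit of every lane into one integer mask. */
uint32_t
highbit_mask(const uint8_t match[LANES])
{
    uint32_t    mask = 0;

    for (int i = 0; i < LANES; i++)
        mask |= (uint32_t) (match[i] >> 7) << i;
    return mask;
}

/*
 * Return the index of the first matching lane, or -1 if there is none.
 * The mask is only built once the cheap test says there is a match.
 */
int
first_match(const uint8_t match[LANES])
{
    uint32_t    mask;
    int         pos = 0;

    if (!any_lane_set(match))
        return -1;

    mask = highbit_mask(match);
    assert(mask != 0);
    while ((mask & 1) == 0)
    {
        mask >>= 1;
        pos++;
    }
    return pos;
}
```

On input where most chunks contain no special characters, the common path then pays only for the cheap test.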

> +                simd_hit_eol = (c1 == '\r' || c1 == '\n') && (!is_csv || !in_quote);
>
> If (is_csv && in_quote), we shouldn't have picked up \r or \n in the first
> place, right?
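
If that premise holds (it is posed as a question above and is not verified here), the second half of the condition is redundant. The small check below compares the full expression with the shortened one over every flag combination, skipping only the state the premise rules out. The function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* The expression as written in the patch. */
bool
hit_eol_full(char c1, bool is_csv, bool in_quote)
{
    return (c1 == '\r' || c1 == '\n') && (!is_csv || !in_quote);
}

/* The shorter form, valid only under the assumption described above. */
bool
hit_eol_short(char c1)
{
    return c1 == '\r' || c1 == '\n';
}

/*
 * Count the inputs on which the two forms disagree.  The one state the
 * assumption excludes (an EOL byte reported while inside a CSV quote) is
 * skipped.
 */
int
count_disagreements(void)
{
    const char  samples[] = {'\r', '\n', 'a', '"'};
    int         bad = 0;

    for (int i = 0; i < 4; i++)
        for (int csv = 0; csv <= 1; csv++)
            for (int quote = 0; quote <= 1; quote++)
            {
                bool        eol = hit_eol_short(samples[i]);

                if (eol && csv && quote)
                    continue;   /* excluded by the assumption */
                if (hit_eol_full(samples[i], csv, quote) != eol)
                    bad++;
            }
    assert(bad >= 0);
    return bad;
}
```

The two forms differ only in the excluded state, so dropping the extra test is safe exactly when the SIMD loop never reports \r or \n inside a CSV quote.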

> +                simd_hit_eof = c1 == '\\' && c2 == '.' && !is_csv;
> +
> +                /*
> +                 * Do not disable SIMD when we hit EOL or EOF characters. In
> +                 * practice, it does not matter for EOF because parsing ends
> +                 * there, but we keep the behavior consistent.
> +                 */
> +                if (!(simd_hit_eof || simd_hit_eol))
>
> I'd think that doing less unnecessary work would outweigh the benefits of
> consistency for the EOF case.

> --
> nathan


--
-- Manni Wood
EDB: https://www.enterprisedb.com