From: KAZAR Ayoub
Date: Thu, 14 Aug 2025 15:59:55 +0100
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
To: Nazir Bilal Yavuz
Cc: Shinya Kato, pgsql-hackers@postgresql.org

> Hi,
>
> On Thu, 14 Aug 2025 at 05:25, KAZAR Ayoub <ma_kazar@esi.dz> wrote:
> >
> > Following Nazir's findings about 4096 bytes being the performant line
> > length, I did more benchmarks on my side, on both TEXT and CSV formats,
> > with two different cases: normal data (no special characters) and data
> > with many special characters.
> >
> > Results are good, as expected, and similar to previous benchmarks:
> >   ~30.9% faster copy in TEXT format
> >   ~32.4% faster copy in CSV format
> >   20%-30% fewer cycles per instruction
> >
> > With a lot of special characters in the lines (e.g., tables with a
> > large number of columns, maybe), we obviously expect regressions
> > because of the overhead of many fallbacks to scalar processing.
> > Results with special characters making up 1/3 of the line length:
> >   ~43.9% slower copy in TEXT format
> >   ~16.7% slower copy in CSV format
> > So even with fewer special characters, or a wider distance between
> > them, there might still be some regression. This may be a
> > non-significant case, but it can be treated in other patches if we
> > consider not using the SIMD path in some situations.
> >
> > I hope this helps more and confirms the patch.
>
> Thanks for running that benchmark! Would you mind sharing a reproducer
> for the regression you observed?
>
> --
> Regards,
> Nazir Bilal Yavuz
> Microsoft

Of course, I attached the SQL to generate the text and CSV test files.
Having special characters make up 1/3 of the line length may be an
exaggeration, but a lower ratio might still reproduce some regression,
for the same reason.
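
To make the "fallback to scalar" cost concrete, here is a rough sketch of
the general idea. It is not the patch's code: the names (ScanStats,
scan_buffer, chunk_has_special, is_special) and the 16-byte chunk size are
made up for illustration, and it assumes SSE2 where available, with a plain
loop otherwise. Each chunk is first checked with one vector comparison;
only chunks that contain a special byte are walked byte by byte.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define CHUNK 16				/* illustrative chunk size, one SSE2 register */

/* Hypothetical counters, only here to show where the time goes. */
typedef struct ScanStats
{
	size_t		special_bytes;	/* bytes needing individual handling */
	size_t		fast_chunks;	/* chunks skipped after one vector check */
	size_t		fallback_chunks;	/* chunks re-scanned byte by byte */
} ScanStats;

/* Scalar test for the bytes a text-format parser has to look at. */
static int
is_special(unsigned char c, char delim, char esc)
{
	return c == '\n' || c == '\r' ||
		c == (unsigned char) delim || c == (unsigned char) esc;
}

/* Does the CHUNK-byte block at p contain any special byte? */
static int
chunk_has_special(const unsigned char *p, char delim, char esc)
{
#if defined(__SSE2__)
	__m128i		v = _mm_loadu_si128((const __m128i *) p);
	__m128i		hit = _mm_or_si128(
		_mm_or_si128(_mm_cmpeq_epi8(v, _mm_set1_epi8('\n')),
					 _mm_cmpeq_epi8(v, _mm_set1_epi8('\r'))),
		_mm_or_si128(_mm_cmpeq_epi8(v, _mm_set1_epi8(delim)),
					 _mm_cmpeq_epi8(v, _mm_set1_epi8(esc))));

	return _mm_movemask_epi8(hit) != 0;
#else
	/* Portable stand-in when SSE2 is not available. */
	for (int i = 0; i < CHUNK; i++)
		if (is_special(p[i], delim, esc))
			return 1;
	return 0;
#endif
}

/* Scan buf[0..len) and report how often the fast path was usable. */
ScanStats
scan_buffer(const char *buf, size_t len, char delim, char esc)
{
	ScanStats	st = {0, 0, 0};
	const unsigned char *p = (const unsigned char *) buf;
	size_t		i = 0;

	assert(buf != NULL);

	for (; i + CHUNK <= len; i += CHUNK)
	{
		if (!chunk_has_special(p + i, delim, esc))
		{
			st.fast_chunks++;	/* whole chunk skipped at once */
			continue;
		}
		/* Fallback: the vector check was wasted work for this chunk. */
		st.fallback_chunks++;
		for (size_t j = i; j < i + CHUNK; j++)
			st.special_bytes += is_special(p[j], delim, esc);
	}
	/* Tail shorter than one chunk is always handled byte by byte. */
	for (; i < len; i++)
		st.special_bytes += is_special(p[i], delim, esc);
	return st;
}
```

In this sketch, a buffer of 4096 'A' bytes never leaves the fast path, while
the 'A\A' pattern puts a backslash in every 16-byte chunk, so every chunk
pays for the vector check and then the scalar walk as well. A lower ratio of
special characters only helps once whole chunks start to be free of them.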

Best regards,
Ayoub Kazar

Attachment: simd-copy-from-bench.sql

DROP TABLE IF EXISTS t;
CREATE UNLOGGED TABLE t (id INT PRIMARY KEY, filler TEXT);

-- Text and CSV, no special characters
INSERT INTO t
SELECT s, repeat('A', 4096)
FROM generate_series(1, 100000) AS s;
COPY t TO '~/coding/postgres/t_4096_none.txt' (FORMAT text);
COPY t TO '~/coding/postgres/t_4096_none.csv' (FORMAT csv, QUOTE '"');

-- Text, with ~1/3 escapes (\)
TRUNCATE t;
INSERT INTO t
SELECT s, repeat('A\A', 1365) -- 1365 * 3 = 4095 bytes, 1365 \ chars
FROM generate_series(1, 100000) AS s;
COPY t TO '~/coding/postgres/t_4096_escape.txt' (FORMAT text);

-- CSV, with ~1/3 quotes (")
TRUNCATE t;
INSERT INTO t
SELECT s, repeat('A"A', 1365) -- 1365 * 3 = 4095 bytes, 1365 " chars
FROM generate_series(1, 100000) AS s;
COPY t TO '~/coding/postgres/t_4096_quote.csv' (FORMAT csv, QUOTE '"');

DROP TABLE t;