From: KAZAR Ayoub
Date: Thu, 2 Apr 2026 20:07:38 +0200
Subject: Re: Speed up COPY TO text/CSV parsing using SIMD
To: Nathan Bossart
Cc: Andres Freund, Pg Hackers, Neil Conway, Manni Wood, Andrew Dunstan, Shinya Kato, Mark Wong, Nazir Bilal Yavuz

On Tue, Mar 31, 2026 at 6:30 PM Nathan Bossart wrote:

> On Fri, Mar 27, 2026 at 07:48:38PM +0100, KAZAR Ayoub wrote:
> > I added a prescan loop inside the SIMD helpers that tries to catch special
> > characters within the first sizeof(Vector8) characters. I measured how well
> > this reduces the overhead of starting SIMD and then exiting at the first
> > vector: the scalar loop beats SIMD on a single vector if it finds a special
> > character before the 6th character. The worst case is a vector that is not
> > clean, where the scalar loop needs 20 more cycles than SIMD.
> > This helps mitigate the JSON(B)-in-CSV case, which is why I added it for
> > the CSV case only.
>
> Interesting.
>
> > In a benchmark with 10M early SIMD exits, as in the JSONB case, the
> > previous 3% regression is gone.
>
> While these are nice results, I think it's best that we target v20 for this
> patch so that we have more time to benchmark and explore edge cases.

Thanks for the review.
Fair enough; I'll try many more cases in the coming weeks to make sure we're
not missing anything.

> --
> nathan

Regards,
Ayoub
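The prescan idea in the quoted message can be illustrated with a small, portable C sketch. It is not the patch's code: the names `find_special_csv`, `PRESCAN_WIDTH`, `broadcast8`, `has_zero_byte`, and `is_special` are hypothetical; `PRESCAN_WIDTH` assumes a 16-byte `Vector8` as on SSE2 or NEON; the special set is assumed to be the delimiter, the quote character, CR, and LF; and a 64-bit word-at-a-time (SWAR) loop stands in for the real `Vector8` helpers. A cheap scalar loop checks the first vector's worth of bytes, so input such as a JSON(B) object, which typically has a quote at its second byte, returns before the wide loop is ever entered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: assumes a 16-byte Vector8 (SSE2/NEON), not the patch's value. */
#define PRESCAN_WIDTH 16

/* Copy one byte into every byte lane of a 64-bit word. */
static inline uint64_t
broadcast8(unsigned char c)
{
    return UINT64_C(0x0101010101010101) * c;
}

/* Nonzero iff at least one byte of v is zero (classic SWAR test). */
static inline uint64_t
has_zero_byte(uint64_t v)
{
    return (v - UINT64_C(0x0101010101010101)) & ~v &
        UINT64_C(0x8080808080808080);
}

/* Assumed special set for CSV output: delimiter, quote, CR, LF. */
static inline int
is_special(unsigned char c, unsigned char delim, unsigned char quote)
{
    return c == delim || c == quote || c == '\n' || c == '\r';
}

/* Return the offset of the first special byte in s[0..len), or len if none. */
static size_t
find_special_csv(const char *s, size_t len, char delim, char quote)
{
    size_t      i = 0;
    size_t      prescan_end = len < PRESCAN_WIDTH ? len : PRESCAN_WIDTH;
    uint64_t    vd, vq, vn, vr;

    assert(len == 0 || s != NULL);

    /*
     * Scalar prescan over the first vector's worth of bytes: cheapest when
     * a special byte shows up early, as with JSON(B) text.
     */
    for (; i < prescan_end; i++)
        if (is_special((unsigned char) s[i], delim, quote))
            return i;

    /* Word-at-a-time scan, a portable stand-in for the Vector8 loop. */
    vd = broadcast8((unsigned char) delim);
    vq = broadcast8((unsigned char) quote);
    vn = broadcast8('\n');
    vr = broadcast8('\r');
    for (; i + sizeof(uint64_t) <= len; i += sizeof(uint64_t))
    {
        uint64_t    w;

        memcpy(&w, s + i, sizeof(w));   /* alignment-safe load */
        if (has_zero_byte(w ^ vd) | has_zero_byte(w ^ vq) |
            has_zero_byte(w ^ vn) | has_zero_byte(w ^ vr))
            break;                      /* a match is somewhere in this word */
    }

    /*
     * Scalar tail: pinpoints the match inside the word we stopped at, or
     * checks the final bytes that do not fill a whole word.
     */
    for (; i < len; i++)
        if (is_special((unsigned char) s[i], delim, quote))
            return i;
    return len;
}
```

For example, with `,` as the delimiter and `"` as the quote, `find_special_csv("{\"a\": 1}", 8, ',', '"')` returns 1 from the prescan alone, while for the 41-byte string `"abcdefghijklmnopqrstuvwxyz0123456789,tail"` the prescan finds nothing, the word loop stops at the word starting at offset 32, and the scalar tail returns 36. Falling back to a scalar loop to locate the exact byte keeps the sketch simple; a real SIMD version could instead derive the position from a match bitmask.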