Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vJC57-00E6O2-0S for pgsql-hackers@arkaria.postgresql.org; Wed, 12 Nov 2025 14:44:20 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vJC54-00C7hR-1g for pgsql-hackers@arkaria.postgresql.org; Wed, 12 Nov 2025 14:44:18 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vJC54-00C7hJ-0V for pgsql-hackers@lists.postgresql.org; Wed, 12 Nov 2025 14:44:18 +0000 Received: from mail-ed1-x52d.google.com ([2a00:1450:4864:20::52d]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vJC51-007MNW-0v for pgsql-hackers@postgresql.org; Wed, 12 Nov 2025 14:44:17 +0000 Received: by mail-ed1-x52d.google.com with SMTP id 4fb4d7f45d1cf-64166a57f3bso1393202a12.1 for ; Wed, 12 Nov 2025 06:44:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=esi.dz; s=google; t=1762958654; x=1763563454; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=9QYAMZ1k2GS45+WYcNjF6JTEbkz4XsNVa9nHw47MLB0=; b=JlzpB8l9O8Qy1EDsA+M8FoMDtNv8co5sdRosi3080Ui9FvrYqVLlqwR9KY0kzDNQi9 5DxJo2y0xouH1h4bMXhDU0mDoQEn2OAmVse8OfZpzkOWKas0uQIFylB2B5LNha0/wTmT xwD67chT/y2fNQxrlH7j1MftNA2F4AGN2u0Mj8EsYxRD+F+Tv/okxqe/oS/fxNvmkffA SVNtUiHu7p29Af+Gyvc0cQgmpE4IDrp+YBPE9z6raRTc6/lsFsIq6aUWA4HcM+WL8vbi LCDvPFD62xcfI3SlvqIsu49GhbfA92niOHwRAXbJ8yFxxxgJHd22E5+8KF55j+gx7SL7 Gz4Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1762958654; x=1763563454; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=9QYAMZ1k2GS45+WYcNjF6JTEbkz4XsNVa9nHw47MLB0=; b=hEunnKx1gjG5gZGpD8PSQvbHnUVi2w61xk7H0Zbjonv1JgndQfGgWl5p9ittdDwG+h S1AjRewzQuYqt2LgGcmgTdiqwU/0tQjemTskyjeecpqr/tgqGrCP/FJf+OvoveUvb45w FHGpmEceC4rQslI/hxIbZ8Va5wMEMrXPWut2vFqM4oxev6h5DN4NnO3smO3C01Z5RdBK EvGp/2nKYNnpNeSks/Z2YKWu8Q2SNCW+xO+MCboBQbuxVRMDr3EbJ0r4yjRyyPDEnyAf j0FljwOCaXK0vQ6NCe2R6At9ruBADz8PjCltQt4VKQNR8jVYzmFBFCKzKlz/uVn9KXW1 f8fQ== X-Forwarded-Encrypted: i=1; AJvYcCWTaagBToYa7Fe4MxdHzLsTfg+kI+arSvEGVWFIGkC3vTFhyP7uWyq+i7z3xAh6Jja4D5c0e58FnBw8/5fP@postgresql.org X-Gm-Message-State: AOJu0YxXpRIFS7i0aGv2f2Rpo6ki5qgQw6xFkRUDQvcM4JVhPF3V41yv JSVyRbKNDCIwlrfc65Vhm7xC939s7nXDvimgxyvEd+ngxQstoA8/actl4vIbgB9cM5AgiJPMcjh 1jDuthVvK5xV10m3AzTnThrZuul6f5/PPXfHpmiyF X-Gm-Gg: ASbGncvi2n4LIaxhOWAcjSzSUUJttP7ICSePTQKsitj7aGcmZvUcAVqQAKAip87UMZu kYY7AHiurlQxB1y3WYh1OjX34cR+8hZhQlXSkWABBGlmyqMusjp3MXRCv6Pd5CRBaxMiQdLtKnH b00fUJvE2WjsDkHh3uoHWdOGLp43tcYFedLYsq4T/Kxy2xK9TiiHogVVtFO1PGM98A1g/4FgKiG RK4gB3wcME3vRBbG6W047Xr61918N4LJr+E5h5Kzcj0NzpDACfWLfggi6QG X-Google-Smtp-Source: AGHT+IGBTUsF+W9HYji7H7pfp1hns/jWRG7LALdn/UEqH5FJzTTqgIwOKHQ+OSLgvOMzDRFpM29sXwq1/NiWKFR9j7M= X-Received: by 2002:a05:6402:40cd:b0:640:abd5:8642 with SMTP id 4fb4d7f45d1cf-6431a52a420mr2754983a12.21.1762958653956; Wed, 12 Nov 2025 06:44:13 -0800 (PST) MIME-Version: 1.0 References: <8615c983-1662-43b4-b0c9-49d194ac33aa@dunslane.net> <673d92f7-2489-475f-a208-9414ea35d4d8@dunslane.net> <8e045899-2023-48b1-bd91-f8cdffeb511d@dunslane.net> <5d81fbbb-7609-4445-9bc4-8af211fb7674@dunslane.net> In-Reply-To: From: KAZAR Ayoub Date: Wed, 12 Nov 2025 15:44:02 +0100 X-Gm-Features: AWmQ_bnMvHHUt0eDTLBgvLQR0Qm0nyUeKx7sM6_wZtOYIeTcOijXcj7t_RKFVtE Message-ID: Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD To: Manni Wood Cc: Andrew Dunstan , Nathan Bossart , Nazir Bilal Yavuz , Shinya Kato , pgsql-hackers@postgresql.org Content-Type: multipart/alternative; boundary="00000000000052b976064366ccf2" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000052b976064366ccf2 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Tue, Nov 11, 2025 at 11:23=E2=80=AFPM Manni Wood wrote: > Hello! > > I wanted reproduce the results using files attached by Shinya Kato and > Ayoub Kazar. I installed a postgres compiled from master, and then I > installed a postgres built from master plus Nazir Bilal Yavuz's v3 patche= s > applied. > > The master+v3patches postgres naturally performed better on copying into > the database: anywhere from 11% better for the t.csv file produced by > Shinyo's test.sql, to 35% better copying in the t_4096_none.csv file > created by Ayoub Kazar's simd-copy-from-bench.sql. > > But here's where it gets weird. The two files created by Ayoub Kazar's > simd-copy-from-bench.sql that are supposed to be slower, t_4096_escape.tx= t, > and t_4096_quote.csv, actually ran faster on my machine, by 11% and 5% > respectively. > > This seems impossible. > > A few things I should note: > > I timed the commands using the Unix time command, like so: > > time psql -X -U mwood -h localhost -d postgres -c '\copy t from > /tmp/t_4096_escape.txt' > > For each file, I timed the copy 6 times and took the average. > > This was done on my work Linux machine while also running Chrome and an > Open Office spreadsheet; not a dedicated machine only running postgres. > Hello, I think if you do a perf benchmark (if it still reproduces) it would probably be possible to explain why it's performing like that looking at the CPI and other metrics and compare it to my findings. What i also suggest is to make the data close even closer to the worst case i.e: more special characters where it hurts the switching between SIMD and scalar processing (in simd-copy-from-bench.sql file), if still does a good job then there's something to look at. > > > All of the copy results took between 4.5 seconds (Shinyo's t.csv copied > into postgres compiled from master) to 2 seconds (Ayoub > Kazar's t_4096_none.csv copied into postgres compiled from master plus > Nazir's v3 patches). > > Perhaps I need to fiddle with the provided SQL to produce larger files to > get longer run times? Maybe sub-second differences won't tell as > interesting a story as minutes-long copy commands? > I did try it on some GBs (around 2-5GB only), the differences were not that much, but if you can run this on more GBs (at least 10GB) it would be good to look at, although i don't suspect anything interesting since the shape of data is the same for the totality of the COPY. > > Thanks for reading this. > -- > -- Manni Wood EDB: https://www.enterprisedb.com > Thanks for the info. Regards, Ayoub Kazar. --00000000000052b976064366ccf2 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
On Tue, Nov 11, 2025 at 11:23=E2=80=AFPM = Manni Wood <manni.wood@en= terprisedb.com> wrote:
Hello!

I wanted reproduce th= e results using files attached by Shinya Kato and Ayoub Kazar. I installed = a postgres compiled from master, and then I installed a postgres built from= master plus Nazir Bilal Yavuz's v3 patches applied.

The master+v3patches postgres naturally performed better on copying = into the database: anywhere from 11% better for the t.csv file produced by = Shinyo's test.sql, to 35% better copying in the t_4096_none.csv file cr= eated by=C2=A0Ayoub Kazar's simd-copy-from-bench.sql.

But here's where it gets weird. The two files created by Ayoub = Kazar's simd-copy-from-bench.sql that are supposed to be slower,=C2=A0t= _4096_escape.txt, and=C2=A0t_4096_quote.csv, actually ran faster on my mach= ine, by 11% and 5% respectively.

This seems imposs= ible.

A few things I should note:

I timed the commands using the Unix time command, like so:

time psql -X -U mwood -h localhost -d postgres -c '\c= opy t from /tmp/t_4096_escape.txt'

For each fi= le, I timed the copy 6 times and took the average.

This was done on my work Linux machine while also running Chrome and an Op= en Office spreadsheet; not a dedicated machine only running postgres.
=
Hello,
I think if you do a perf bench= mark (if it still reproduces) it would probably be possible to explain why = it's performing like that looking at the CPI and other metrics and comp= are it to my findings.
What i also suggest is to make the data cl= ose even closer to the worst case i.e: more special characters where it hur= ts the switching between SIMD and scalar processing (in simd-copy-from-benc= h.sql file), if still does a good job then there's something to look at= .
<= div>=C2=A0

All of the copy results too= k between 4.5 seconds (Shinyo's t.csv copied into postgres compiled fro= m master) to 2 seconds (Ayoub Kazar's=C2=A0t_4096_none.csv copied into = postgres compiled from master plus Nazir's v3 patches).

<= /div>
Perhaps I need to fiddle with the provided SQL to produce larger = files to get longer run times? Maybe sub-second differences won't tell = as interesting a story as minutes-long copy commands?
I did try it on some GBs (around 2-5GB only), the differences were = not that much, but if you can run this on more GBs (at least 10GB) it would= be good to look at, although i don't suspect anything interesting sinc= e the shape of data is the same for the totality of the COPY.

<= div>Thanks for reading this.
--=
-- Manni Wood EDB: https://www.enterprisedb.com
Th= anks for the info.


Regards,
Ayoub Kazar.=C2=A0
--00000000000052b976064366ccf2--