From: Manni Wood
Date: Mon, 29 Dec 2025 11:03:17 -0600
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
To: KAZAR Ayoub
Cc: Nazir Bilal Yavuz, Mark Wong, Nathan Bossart, Andrew Dunstan, Shinya Kato, PostgreSQL-development
References: <8e226753-57af-489a-bfbe-caa23dd71286@dunslane.net>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="000000000000e2a83806471a3893"
Precedence: bulk

--000000000000e2a83806471a3893
Content-Type: text/plain; charset="UTF-8"

On Wed, Dec 24, 2025 at 9:08 AM KAZAR Ayoub wrote:

> Hello,
> Following the same path of optimizing COPY FROM using SIMD, I found that
> COPY TO can also benefit from this.
>
> I attached a small patch that uses SIMD to skip data and advance as far as
> the first special character is found, then falls back to scalar processing
> for that character and re-enters the SIMD path again...
> There are two ways to do this:
> 1) Essentially we do SIMD until we find a special character, then continue
> on the scalar path without re-entering SIMD again.
> - This gives from 10% to 30% speedups depending on the weight of special
> characters in the attribute; we don't lose anything here since it advances
> with SIMD until it can't (using the previous scripts: 1/3, 2/3 special
> chars).
>
> 2) Do the SIMD path, then use the scalar path when we hit a special
> character, and keep re-entering the SIMD path each time.
> - This is equivalent to the COPY FROM story; we'll need to find the same
> heuristic to use for both COPY FROM/TO to reduce the regressions (same
> regressions: around 20% to 30% with 1/3, 2/3 special chars).
>
> Something else to note is that the scalar path for COPY TO isn't as heavy
> as the state machine in COPY FROM.
>
> So if we find the sweet spot for the heuristic, doing the same for COPY TO
> will be trivial and always beneficial.
> Attached is 0004, which is option 1 (SIMD without re-entering); 0005 is the
> second one.
>
>
> Regards,
> Ayoub
>

Hello, Nazir and Ayoub!

Nazir, sorry for the late reply; I am on holiday. :-) I wanted to thank you
for the tips on using cpupower to get less variance in my test results.

Ayoub, I suppose it was inevitable the SIMD patch would work for copying out
as well as copying in!

I am back at work on 5 Jan 2026, so I will try to carve out time to test
this then, using Nazir's tips.

Happy Holidays!

-Manni

-- 
-- Manni Wood
EDB: https://www.enterprisedb.com

--000000000000e2a83806471a3893
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable


--000000000000e2a83806471a3893--
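
For readers skimming the thread, the two strategies in Ayoub's quoted message can be sketched roughly as below. This is not the attached 0004/0005 patches and does not use PostgreSQL's SIMD helpers: the 16-byte chunk check is a plain loop standing in for a vector comparison, and `escape_attr`, `chunk_has_special`, `is_special`, the `reenter` flag, and the reduced set of special characters are all hypothetical names and simplifications chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 16                /* assumed width of one vector register */

/* Simplified set of characters that need escaping (an assumption). */
static int is_special(unsigned char c)
{
    return c == '\\' || c == '\t' || c == '\n' || c == '\r';
}

static char escape_letter(unsigned char c)
{
    switch (c)
    {
        case '\t': return 't';
        case '\n': return 'n';
        case '\r': return 'r';
        default:   return '\\';
    }
}

/* Stand-in for a SIMD test: does this CHUNK-byte block hold any special byte? */
static int chunk_has_special(const char *p)
{
    int found = 0;

    for (int i = 0; i < CHUNK; i++)
        found |= is_special((unsigned char) p[i]);
    return found;
}

/*
 * Copy in[0..len) to out, escaping special bytes. out needs room for
 * 2 * len bytes. reenter = 0 models option 1 (stay scalar after the first
 * special byte); reenter = 1 models option 2 (retry the fast path after
 * every special byte). Returns the number of bytes written.
 */
static size_t escape_attr(const char *in, size_t len, char *out, int reenter)
{
    size_t i = 0, o = 0;
    int fast = 1;

    assert(in != NULL && out != NULL);

    while (i < len)
    {
        /* Fast path: copy a whole chunk when it contains nothing special. */
        if (fast && len - i >= CHUNK && !chunk_has_special(in + i))
        {
            memcpy(out + o, in + i, CHUNK);
            i += CHUNK;
            o += CHUNK;
            continue;
        }

        /* Scalar path: byte by byte, escaping as needed. */
        while (i < len)
        {
            unsigned char c = (unsigned char) in[i++];

            if (!is_special(c))
            {
                out[o++] = (char) c;
                continue;
            }
            out[o++] = '\\';
            out[o++] = escape_letter(c);
            if (reenter)
                break;          /* option 2: go back and try the fast path */
            fast = 0;           /* option 1: scalar for the rest of the input */
        }
    }
    return o;
}
```

Both modes produce identical output; they differ only in how often the chunk check runs. With `reenter = 1`, an attribute dense in special characters pays for a chunk scan after nearly every escape, which is one plausible reading of where the regressions mentioned for option 2 come from.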