From: KAZAR Ayoub
Date: Thu, 21 Aug 2025 20:36:42 +0100
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
To: Nazir Bilal Yavuz
Cc: Shinya Kato, pgsql-hackers@postgresql.org

On Thu, 14 Aug 2025 at 18:00, KAZAR Ayoub <ma_kazar@esi.dz> wrote:
>> Thanks for running that benchmark! Would you mind sharing a reproducer
>> for the regression you observed?
>
> Of course, I attached the sql to generate the text and csv test files.
> Having special characters make up 1/3 of the line length may be an exaggeration; a lower ratio might still reproduce some regression, for the same reason.

Thank you so much!

I am able to reproduce the regression you mentioned, but both
regressions are 20% on my end. I found (by experimenting) that SIMD
causes a regression if it advances fewer than 5 characters.

So, I implemented a small heuristic. It works like this:

- If advance < 5 -> insert a sleep penalty (n cycles).
- Each time advance < 5, n is doubled.
- Each time advance ≥ 5, n is halved.
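
The three rules above can be sketched in C roughly as follows. This is only an illustration, not the POC patch: it assumes that the "sleep penalty" means skipping the SIMD path for n iterations of the parsing loop, and the names (SimdBackoff, simd_backoff_should_try, simd_backoff_report), the starting value of 1 and the cap on n are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SIMD_MIN_ADVANCE 5      /* threshold quoted in the message */
#define SIMD_MAX_PENALTY 1024   /* hypothetical cap, not from the thread */

typedef struct SimdBackoff
{
    uint32_t penalty;    /* n: size of the current penalty */
    uint32_t remaining;  /* iterations left before SIMD is tried again */
} SimdBackoff;

/*
 * Returns true if the next chunk should be attempted with SIMD.
 * While a penalty is being served, one iteration is consumed and the
 * caller is expected to fall back to the scalar path.
 */
static bool
simd_backoff_should_try(SimdBackoff *b)
{
    if (b->remaining > 0)
    {
        b->remaining--;
        return false;
    }
    return true;
}

/* Record how many characters the last SIMD attempt advanced. */
static void
simd_backoff_report(SimdBackoff *b, int advance)
{
    if (advance < SIMD_MIN_ADVANCE)
    {
        /* Short advance: double n (starting from 1), then serve it. */
        if (b->penalty == 0)
            b->penalty = 1;
        else if (b->penalty < SIMD_MAX_PENALTY)
            b->penalty *= 2;
        b->remaining = b->penalty;
    }
    else
    {
        /* Long advance: halve n so SIMD is retried sooner. */
        b->penalty /= 2;
    }
}
```

With this shape, input dense in special characters quickly pushes n up so that SIMD is attempted only rarely, while input with long plain runs brings n back to zero after a few successful advances.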

I am sharing a POC patch to show the heuristic; it can be applied on top
of v1-0001. The heuristic version has the same performance improvements
as v1-0001, but the regression is 5% instead of 20% compared to
master.

--
Regards,
Nazir Bilal Yavuz
Microsoft

Yes, this is good; I'm now also seeing only about a 5% regression.



Regards,
Ayoub Kazar