Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1vB5gO-00AO2k-SU for pgsql-hackers@arkaria.postgresql.org; Tue, 21 Oct 2025 06:17:21 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1vB5gN-006yQO-Q7 for pgsql-hackers@arkaria.postgresql.org; Tue, 21 Oct 2025 06:17:18 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1vB5gN-006yQG-EN for pgsql-hackers@lists.postgresql.org; Tue, 21 Oct 2025 06:17:18 +0000 Received: from mail-ed1-x52b.google.com ([2a00:1450:4864:20::52b]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vB5gJ-003NPS-2p for pgsql-hackers@postgresql.org; Tue, 21 Oct 2025 06:17:18 +0000 Received: by mail-ed1-x52b.google.com with SMTP id 4fb4d7f45d1cf-63bdfd73e6eso10949900a12.0 for ; Mon, 20 Oct 2025 23:17:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=esi.dz; s=google; t=1761027433; x=1761632233; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=ZdkfSz3NM5nMz8DUpOVI/izQzj3xJNQvj8mxz8vqoLA=; b=fXYFK5eYZ/ZDIFpG2sXFnY3Q4xDNB8YrGZ0H4pnAJ5rZJeG9yR0maAKmP+5bzlGgKq Pt7bWPLN5epEruGhVCVq6RmqLiG9X3Q8q9EvrQ7NGGmdHNRZan5ovOf9tJ2swDPUHhbg 7pbCpS2CUWoacQDSRQbLHjV6pt4CK/4ZdtTFZ1VC3Gg9XtUpYKaEEblqRjdV2M+SFunw m7ML/B2VPonjtQJbBwFpZkpwJOjkWfJthtafoYig58Co0eC1u5FZUi9YHiKVvG+NKtyG 23FRuCko8GDRIhXRuZ/rm/LtYGTFz/kzR4aMerxL4A3DX6SBt7VJkbtTYEqkrBxRpfF7 25Ag== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1761027433; x=1761632233; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=ZdkfSz3NM5nMz8DUpOVI/izQzj3xJNQvj8mxz8vqoLA=; b=jSrzNwju1k6FiBsL1mXdMOHWFky/lNp9nBm/xjSz2ucfyXbeOlrzmy29Y8C5mQnm4o MLpD7lJkSD9GwFtkoTAmePGGFd380TinHakJfMgC/rnaWvjoYiKRxYW2LyFtR5FvJujZ nK/TcK5Lj7YMA0/OuBXl6TT20sL1wR9xA99XcdLmTf2IlISd5nZ0JmZvM1OO/C0rB6MW y2GjlELJ0UXCFlGLrgDeyBOzWW0IV9RKsNba2vsIdljdN0izp0htkNu7iRwYyhhRwnxm FqDeh5X0zP2z0dARO0Hes6LX6u3W9G4Z2bmHtQtbL2aG6Ykqmi2ktLgvuR/Rb58jxmf0 PxWw== X-Forwarded-Encrypted: i=1; AJvYcCW7o1xdyACxI6FuyiDjSqTP9PvTnrlAW+CX5wnL9TONTJbSb6Lu6dDOadqliyxZPvARkoIcXso1HTqgOlGa@postgresql.org X-Gm-Message-State: AOJu0Yxig05SNkM03bCIyJ233QYfUD8NRkBQwIefcRbRhOXvmuRYDCl0 zTQ0ZxjZs3K88Eb5X0fAilLEYUPqoEiBbguVAZ4JeR60iIk3+YdJqDY1Mx6cfxzKrX4cEM5ExVZ H5uIN7C14nn9FaP96Re2TOnkmLbzCvq2G7cTezZ/9 X-Gm-Gg: ASbGncvge22TYrzETILOzFSgyQMqx3Wm3w+vrtyDF/gUKz1JGRiygKv5oW1s80inosQ pW3c7X3sP5Gxsb/jx2tiutp51RuhoQ9nlV2DuzTXFFYRU9TJ2b89182ppz/e1lb1gBwMDxKGnDq fuTkmJDn/9hXtyjLh0TZn6Qz4y253RyrPOCPc1ZmXjWGsVeSIgmuc/WYP7XGbFgWeMDjjCchF9H tRVLpkrIURz3+Ntg77NbZbDKENNkBarbpZBJh2gKnbJOas/oHptDWMRLtzZkmARbpdKVVD1tC1z 1GMMc8lzuBMvNpCaqUJddLFI4TsoY9Fkbzcuy1hD X-Google-Smtp-Source: AGHT+IFIPshpuQ3MzxGCd9RZZX6BJYT/2DhTou6ZyDAss/XzSbqbXPKEvsYg53aDDY1zaOyK7dX4lGo3nNDhKFh+9IA= X-Received: by 2002:a05:6402:4316:b0:636:2699:3812 with SMTP id 4fb4d7f45d1cf-63c1e0c3341mr16002553a12.0.1761027432825; Mon, 20 Oct 2025 23:17:12 -0700 (PDT) MIME-Version: 1.0 References: <8615c983-1662-43b4-b0c9-49d194ac33aa@dunslane.net> In-Reply-To: From: KAZAR Ayoub Date: Tue, 21 Oct 2025 08:17:01 +0200 X-Gm-Features: AS18NWCtWRfHRX2VJCdCqvpcwsufEniCaBUqROPAEcmxKDx19gXQ9uDD4sKwHy8 Message-ID: Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD To: Nazir Bilal Yavuz , "nathandbossart@gmail.com" , "ants.aasma@cybertec.at" Cc: Andrew Dunstan , Shinya Kato , pgsql-hackers@postgresql.org Content-Type: multipart/alternative; boundary="00000000000092db450641a52696" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000092db450641a52696 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Sat, Oct 18, 2025 at 10:01=E2=80=AFPM Nazir Bilal Yavuz wrote: > Thank you so much for doing this! The results look nice, do you think > there are any other benchmarks that might be interesting to try? > > > I'm also trying the idea of doing SIMD inside quotes with prefix XOR > using carry less multiplication avoiding the slow path in all cases even > with weird looking input, but it needs to take into consideration the > availability of PCLMULQDQ instruction set with and here we > go, it quickly starts to become dirty OR we can wait for the decision to > start requiring x86-64-v2 or v3 which has SSE4.2 and AVX2. > > I can not quite picture this, would you mind sharing a few examples or > patches? > The idea aims to avoid stopping at characters that are not actually special in their position (inside quote, escaped ..etc) This is done by creating a lot of masks from the original chunk, masks like: quote_mask, escape_mask, odd escape sequences mask ; from these we can deduce which quotes are not special to stop at Then for inside quotes, we aim to know which characters in our chunk are inside quotes (also keeping in track the previous chunk's quote state) and there's a clever/fast way to do it [1]. After this you start to match with LF and CR ..etc, all this while maintaining the state of what you've seen (the annoying part). At the end you only reach the scalar path advancing by the position of first real special character that requires special treatment. However, after trying to implement this on the existing pipeline way of COPY command [2] (broken hopeless try, but has the idea), It becomes very unreasonable for a lot of reasons: - It is very challenging to correctly handle commas inside quoted fields, and tracking quoted vs. unquoted state (especially across chunk boundaries, or with escaped quotes) .... - Using carry less multiplication (CLMUL) for prefix xor on a 16 bytes chunk is overkill for some architectures where PCLMULQDQ latency is high [3][4] to a point where it performs worse than an unrolled shifts + xor (5 cycles). - It starts to feel that handling these cases is inherently scalar, doing all that work for a 16 bytes chunk would be unreasonable since it's not free, compared to a simple help using SIMD and heuristic of Nazir which is way nicer in general. Currently we are at 200-400Mbps which isn't that terrible compared to production and non production grade parsers (of course we don't only parse in our case), also we are using SSE2 only so theoretically if we add support for avx later on we'll have even better numbers. Maybe more micro optimizations to the current heuristic can squeeze it more= . [1] https://branchfree.org/2019/03/06/code-fragment-finding-quote-pairs-with-ca= rry-less-multiply-pclmulqdq/ [2] https://github.com/AyoubKaz07/postgres/commit/73c6ecfedae4cce5c3f375fd6074b= 1ca9dfe1daf [3] https://agner.org/optimize/instruction_tables.pdf [4] https://www.uops.info/table.html Regards, Ayoub Kazar. --00000000000092db450641a52696 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


On Sat, Oct 18, 2025= at 10:01=E2=80=AFPM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
Thank you so much for doing this! The results look nice, do you think
there are any other benchmarks that might be interesting to try?

> I'm also trying the idea of doing SIMD inside quotes with prefix X= OR using carry less multiplication avoiding the slow path in all cases even= with weird looking input, but it needs to take into consideration the avai= lability of PCLMULQDQ instruction set with <wmmintrin.h> and here we = go, it quickly starts to become dirty OR we can wait for the decision to st= art requiring x86-64-v2 or v3 which has SSE4.2 and AVX2.

I can not quite picture this, would you mind sharing a few examples or patc= hes?
The idea aims to avoid stopping at characters tha= t are not actually special in their position=C2=A0(inside quote, escaped ..= etc)
This is done by creating a lot of masks from the origina= l chunk, masks like: quote_mask, escape_mask, odd escape sequences mask ; f= rom these we can deduce which quotes are not special to stop at
Then for inside quotes, we aim to know which characters in our chunk are= inside quotes (also keeping in track the previous chunk's quote state)= and there's a clever/fast way to do it [1].
After this y= ou start to match with LF and CR ..etc, all this while maintaining the stat= e of what you've seen (the annoying part).
At the end you onl= y reach the scalar path advancing by the position of first real special cha= racter that requires special treatment.

However, after tr= ying to implement this on the existing pipeline way of COPY command [2] (br= oken hopeless=C2=A0try, but has the idea), It becomes very unreasonable for= a lot of reasons:
- It is very challenging to correctly hand= le=C2=A0commas inside quoted fields, and tracking quoted vs. unquoted state= =20 (especially across chunk boundaries, or with escaped quotes) ....
=
- Using carry less multiplication (CLMUL) for prefix xor on a 16 bytes= chunk is overkill for some architectures where=C2=A0PCLMULQDQ latency is h= igh [3][4] to a point where it performs worse than an unrolled shifts=C2=A0= + xor (5 cycles).
- It starts to feel that handling these cases i= s inherently scalar, doing all that work for a 16 bytes chunk would be unre= asonable since it's=C2=A0not free, compared to a simple help using SIMD= and heuristic of Nazir which is way nicer in general.

Currently we are at 200-400Mbps which isn't that terrible comp= ared to production and non production grade parsers (of course we don't= only parse in our case), also we are using SSE2 only so theoretically if w= e add support for avx later on we'll have even better numbers.
Maybe more micro optimizations to the current heuristic can squeeze i= t more.

--00000000000092db450641a52696--