From: Nazir Bilal Yavuz
Date: Wed, 11 Mar 2026 21:49:22 +0300
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
To: Nathan Bossart
Cc: Manni Wood, KAZAR Ayoub, Neil Conway, Andrew Dunstan, Shinya Kato, PostgreSQL-development

Hi,

On Wed, 11 Mar 2026 at 21:09, Nathan Bossart wrote:
>
> On Wed, Mar 11, 2026 at 02:36:46PM +0300, Nazir Bilal Yavuz wrote:
> > 0002 has an attempt to remove some branches from SIMD code but since
> > it is kind of functional change, I wanted to attach that as another
> > patch. I think we can apply some parts of this, if not all.
>
> Could you describe what this is doing and what the performance impact is?

The SIMD code checks for these characters:

csv mode: nl, cr, quote and possibly escape.
text mode: nl, cr and bs.

v12 checks them like this:

    if (is_csv)
    {
        match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
        match = vector8_or(match, vector8_eq(chunk, quote));
        if (unique_escapec)
            match = vector8_or(match, vector8_eq(chunk, escape));
    }
    else
    {
        match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
        match = vector8_or(match, vector8_eq(chunk, bs));
    }

But we already know that the code will always check nl, cr, and either
quote or bs.
So we can introduce a new variable named bs_or_quote, which is equal to
bs in text mode and to quote in csv mode. Then we can remove the
'if (is_csv)' check and keep only the escape check
('if (unique_escapec)'). The code then looks like this:

    match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
    match = vector8_or(match, vector8_eq(chunk, bs_or_quote));
    if (unique_escapec)
        match = vector8_or(match, vector8_eq(chunk, escape));

That is what v13-0002 does. I saw 1%-2% speedups with this change and
no regressions.

Even without introducing the bs_or_quote variable, we can move
'match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));'
outside of the if checks, though.

--
Regards,
Nazir Bilal Yavuz
Microsoft
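
For illustration only, here is a minimal scalar sketch of why the two forms
of the check should agree. It is not the patch: it replaces the vector8_*
operations with a plain byte loop, and CHUNK_SIZE and the function names
chunk_has_special_branchy, chunk_has_special_merged and check_paths_agree
are hypothetical. It also assumes that unique_escapec is never set in text
mode, which the merged form appears to rely on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one SIMD chunk; the width is illustrative. */
#define CHUNK_SIZE 16

/* v12-style check: branch on the mode first, then on the escape character. */
static bool
chunk_has_special_branchy(const char *chunk, bool is_csv, bool unique_escapec,
                          char quote, char escape)
{
    for (size_t i = 0; i < CHUNK_SIZE; i++)
    {
        char c = chunk[i];
        bool match;

        if (is_csv)
        {
            match = (c == '\n') || (c == '\r');
            match = match || (c == quote);
            if (unique_escapec)
                match = match || (c == escape);
        }
        else
        {
            match = (c == '\n') || (c == '\r');
            match = match || (c == '\\');
        }
        if (match)
            return true;
    }
    return false;
}

/*
 * v13-0002-style check: the caller folds the mode into bs_or_quote once,
 * so only the escape test remains conditional.
 * Assumption: unique_escapec is false whenever the mode is text.
 */
static bool
chunk_has_special_merged(const char *chunk, bool unique_escapec,
                         char bs_or_quote, char escape)
{
    for (size_t i = 0; i < CHUNK_SIZE; i++)
    {
        char c = chunk[i];
        bool match = (c == '\n') || (c == '\r');

        match = match || (c == bs_or_quote);
        if (unique_escapec)
            match = match || (c == escape);
        if (match)
            return true;
    }
    return false;
}

/* Compare both variants on chunks that contain one interesting byte each. */
static void
check_paths_agree(void)
{
    const char candidates[] = {'\n', '\r', '\\', '"', '\'', 'a'};
    char chunk[CHUNK_SIZE];

    for (size_t i = 0; i < sizeof(candidates); i++)
    {
        memset(chunk, 'x', sizeof(chunk));
        chunk[i] = candidates[i];

        /* text mode: bs_or_quote is the backslash, no unique escape */
        assert(chunk_has_special_branchy(chunk, false, false, '"', '"') ==
               chunk_has_special_merged(chunk, false, '\\', '"'));

        /* csv mode with quote '"' and a distinct escape '\'' */
        assert(chunk_has_special_branchy(chunk, true, true, '"', '\'') ==
               chunk_has_special_merged(chunk, true, '"', '\''));

        /* csv mode where escape equals quote, so unique_escapec is false */
        assert(chunk_has_special_branchy(chunk, true, false, '"', '"') ==
               chunk_has_special_merged(chunk, false, '"', '"'));
    }
}
```

Calling check_paths_agree() from a test program exercises text mode and both
csv cases. The point of the merged form is that the mode decision is made
once, when bs_or_quote is chosen, rather than inside the per-chunk check.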