Date: Fri, 13 Feb 2026 17:09:25 -0600
From: Nathan Bossart
To: Nazir Bilal Yavuz
Cc: KAZAR Ayoub, Neil Conway, Manni Wood, Andrew Dunstan, Shinya Kato,
    PostgreSQL-development
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD

On Fri, Feb 13, 2026 at 02:45:30PM +0300, Nazir Bilal Yavuz wrote:
> Also, if I change this code to:
>
> 	if (cstate->simd_enabled)
> 	{
> 		if (is_csv)
> 			result = CopyReadLineText(cstate, true, true);
> 		else
> 			result = CopyReadLineText(cstate, false, true);
> 	}
> 	else
> 	{
> 		if (is_csv)
> 			result = CopyReadLineText(cstate, true, false);
> 		else
> 			result = CopyReadLineText(cstate, false, false);
> 	}
>
> then I see a ~5% performance improvement in the scalar path compared
> to master.

Hm.  What difference do you see if you just do

	if (is_csv)
		result = CopyReadLineText(cstate, true);
	else
		result = CopyReadLineText(cstate, false);

both with and without the SIMD stuff?  IIUC this allows the compiler to
remove several branches in CopyReadLineText(), since is_csv becomes a
compile-time constant at each call site, which might be a nice
improvement on its own.  That being said, I'm less convinced that adding
a simd_enabled parameter to CopyReadLineText() helps, because 1) it's
involved in fewer branches and 2) we change it within the function, so
the compiler can't remove those branches anyway.  But perhaps I'm
missing something.
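To illustrate the branch-removal point, here is a minimal standalone sketch, not the actual copyfromparse.c code. read_line_len() is a hypothetical, heavily simplified stand-in for CopyReadLineText(), and read_line_len_dispatch() mirrors the if/else suggested above. Because each call passes a literal true or false, an optimizing compiler that inlines the helper can fold away every is_csv test, leaving a text-mode loop with no quote handling at all. Whether that really happens for the real function depends on the compiler's inlining decisions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, heavily simplified stand-in for CopyReadLineText(): return
 * the length of the first line in buf, excluding the terminator.  In CSV
 * mode, a newline inside double quotes does not end the line.
 */
static inline size_t
read_line_len(const char *buf, size_t len, bool is_csv)
{
	bool		in_quote = false;

	for (size_t i = 0; i < len; i++)
	{
		char		c = buf[i];

		/* When is_csv is a constant false, this test can fold away. */
		if (is_csv && c == '"')
			in_quote = !in_quote;

		/* Likewise, in text mode this reduces to the plain \n/\r check. */
		if ((c == '\n' || c == '\r') && !(is_csv && in_quote))
			return i;
	}
	return len;
}

/*
 * Branch on is_csv once, so each call site passes a compile-time constant
 * and the compiler is free to specialize the inlined loop for each mode.
 */
static size_t
read_line_len_dispatch(const char *buf, size_t len, bool is_csv)
{
	if (is_csv)
		return read_line_len(buf, len, true);
	else
		return read_line_len(buf, len, false);
}
```

Comparing the generated assembly (e.g., at -O2) for the two call sites is an easy way to check whether the specialization actually took place.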
Some other random thoughts:

+	match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));

+	match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));

Since \n and \r are well below "normal" ASCII values, I wonder if we
could simplify these to something like

	match = vector8_gt(... vector with all lanes set to \r + 1 ..., chunk);

+	/* Check if we found any special characters */
+	mask = vector8_highbit_mask(match);
+	if (mask != 0)

vector8_highbit_mask() is somewhat expensive on AArch64, so I wonder if
waiting until we enter the "if" block to calculate it has any benefit.

+	simd_hit_eol = (c1 == '\r' || c1 == '\n') && (!is_csv || !in_quote);

If (is_csv && in_quote), we shouldn't have picked up \r or \n in the
first place, right?

+	simd_hit_eof = c1 == '\\' && c2 == '.' && !is_csv;
+
+	/*
+	 * Do not disable SIMD when we hit EOL or EOF characters.  In
+	 * practice, it does not matter for EOF because parsing ends
+	 * there, but we keep the behavior consistent.
+	 */
+	if (!(simd_hit_eof || simd_hit_eol))

I'd think that doing less unnecessary work would outweigh the benefits
of consistency for the EOF case.

-- 
nathan
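Regarding the vector8_gt() and vector8_highbit_mask() remarks above, the following is a rough plain-C model (one loop iteration per byte lane), not real SIMD code and not a cost benchmark. The vec8 type and all helper names are hypothetical stand-ins for the simd.h primitives. It shows two things, assuming unsigned byte lanes: a single "('\r' + 1) > chunk" comparison catches \n and \r but also other low control bytes such as \t, so a hit still needs scalar confirmation; and the full lane bitmask can be computed only after a cheaper any-match test has succeeded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NLANES 16

/* Hypothetical plain-C model of a 16-lane vector of unsigned bytes. */
typedef struct
{
	uint8_t		b[NLANES];
} vec8;

static vec8
vec_load(const char *p)
{
	vec8		v;

	memcpy(v.b, p, NLANES);		/* caller guarantees NLANES readable bytes */
	return v;
}

/* Current approach: (chunk == '\n') | (chunk == '\r'); matching lanes = 0xFF. */
static vec8
match_eq_nl_cr(vec8 chunk)
{
	vec8		m;

	for (int i = 0; i < NLANES; i++)
		m.b[i] = (chunk.b[i] == '\n' || chunk.b[i] == '\r') ? 0xFF : 0;
	return m;
}

/*
 * Suggested simplification: ('\r' + 1) > chunk, as an UNSIGNED compare.
 * This also matches NUL, '\t', '\v', '\f' and other bytes below '\r'.  A
 * signed byte compare (e.g. SSE2's _mm_cmpgt_epi8) would additionally flag
 * every byte >= 0x80.
 */
static vec8
match_lt_cr_plus_one(vec8 chunk)
{
	vec8		m;

	for (int i = 0; i < NLANES; i++)
		m.b[i] = ((uint8_t) ('\r' + 1) > chunk.b[i]) ? 0xFF : 0;
	return m;
}

/* Cheap "did any lane match?" test. */
static bool
any_highbit(vec8 v)
{
	for (int i = 0; i < NLANES; i++)
		if (v.b[i] & 0x80)
			return true;
	return false;
}

/* Bitmask of lanes whose high bit is set (the potentially costly step). */
static uint32_t
highbit_mask(vec8 v)
{
	uint32_t	mask = 0;

	for (int i = 0; i < NLANES; i++)
		if (v.b[i] & 0x80)
			mask |= (uint32_t) 1 << i;
	return mask;
}

/* Index of the first matching lane, or -1; mask built only on a hit. */
static int
first_candidate(vec8 match)
{
	uint32_t	mask;

	if (!any_highbit(match))
		return -1;
	mask = highbit_mask(match);
	for (int i = 0; i < NLANES; i++)
		if (mask & ((uint32_t) 1 << i))
			return i;
	return -1;					/* not reached */
}
```

For the chunk "ab\tcd\nefghijklmn", the equality version reports lane 5 (the \n), while the range version reports lane 2 first (the \t), which the scalar code would then have to skip.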