Date: Wed, 11 Feb 2026 16:39:43 -0600
From: Nathan Bossart
To: Nazir Bilal Yavuz
Cc: KAZAR Ayoub, Neil Conway, Manni Wood, Andrew Dunstan, Shinya Kato, PostgreSQL-development
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD

On Wed, Feb 11, 2026 at 04:27:50PM +0300, Nazir Bilal Yavuz wrote:
> I am sharing a v6 which implements (1). My benchmark results show
> almost no difference for the special-character cases and a nice
> improvement for the no-special-character cases.

Thanks!

> +	/* Initialize SIMD variables */
> +	cstate->simd_enabled = false;
> +	cstate->simd_initialized = false;

> +	/* Initialize SIMD on the first read */
> +	if (unlikely(!cstate->simd_initialized))
> +	{
> +		cstate->simd_initialized = true;
> +		cstate->simd_enabled = true;
> +	}

Why do we do this initialization in CopyReadLine() as opposed to setting
simd_enabled to true when initializing cstate in BeginCopyFrom()?  If we
can initialize it in BeginCopyFrom, we could probably remove
simd_initialized.

> +	if (cstate->simd_enabled)
> +		result = CopyReadLineText(cstate, is_csv, true);
> +	else
> +		result = CopyReadLineText(cstate, is_csv, false);

I know we discussed this upthread, but I'd like to take a closer look at
this to see whether/why it makes such a big difference.  It's a bit awkward
that CopyReadLineText() needs to manage both its local simd_enabled and
cstate->simd_enabled.
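
For anyone following along, here is a minimal standalone sketch of the
dispatch pattern in the hunk above.  It assumes the point of passing literal
true/false is to let an always-inlined function be specialized per call
site; that is an assumption, not something the patch confirms.  Nothing
here is PostgreSQL code: scan_for_newline() and find_newline() are
hypothetical names, memchr() merely stands in for the SIMD path, and
__attribute__((always_inline)) assumes GCC or Clang.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for CopyReadLineText().  Returns the offset of the
 * first '\n' in buf, or len if there is none.  When every caller passes a
 * literal for use_fast_path, the compiler can drop the untaken branch from
 * each inlined copy.
 */
static inline __attribute__((always_inline)) size_t
scan_for_newline(const char *buf, size_t len, bool use_fast_path)
{
	if (use_fast_path)
	{
		/* Stand-in for the SIMD path: let memchr() do the search. */
		const char *p = memchr(buf, '\n', len);

		return p ? (size_t) (p - buf) : len;
	}
	else
	{
		size_t		i = 0;

		/* Byte-at-a-time path. */
		while (i < len && buf[i] != '\n')
			i++;
		return i;
	}
}

/* Hypothetical stand-in for the caller that picks a variant at run time. */
static size_t
find_newline(const char *buf, size_t len, bool fast_path_enabled)
{
	/* Literal arguments give the compiler two specialized bodies. */
	if (fast_path_enabled)
		return scan_for_newline(buf, len, true);
	else
		return scan_for_newline(buf, len, false);
}
```

Both variants return the same offset for the same input; only the code
generated for each call site differs.  Comparing the generated assembly with
and without the literal arguments would be one way to see how much the
specialization matters.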

> +		/* Load a chunk of data into a vector register */
> +		vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);

As mentioned upthread [0], I think it's worth testing whether processing
multiple vectors worth of data in each loop iteration is worthwhile.

[0] https://postgr.es/m/aSTVOe6BIe5f1l3i%40nathan

-- 
nathan
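
A minimal standalone sketch of what "multiple vectors per loop iteration"
could look like, under loud assumptions: a uint64_t stands in for the vector
register so that the example is portable, only '\n' is treated as a special
character, and Chunk, chunk_load(), chunk_has_byte(), and find_newline_2x()
are hypothetical names that do not appear in the patch.  Whether the
unrolling helps at all is exactly what would need benchmarking.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t Chunk;		/* hypothetical stand-in for a vector register */

/* Load sizeof(Chunk) bytes without alignment requirements. */
static inline Chunk
chunk_load(const char *p)
{
	Chunk		c;

	memcpy(&c, p, sizeof(c));
	return c;
}

/* Nonzero iff at least one byte of c equals b (classic zero-byte test). */
static inline Chunk
chunk_has_byte(Chunk c, uint8_t b)
{
	const Chunk ones = UINT64_C(0x0101010101010101);
	const Chunk highs = UINT64_C(0x8080808080808080);
	Chunk		x = c ^ (ones * b);	/* bytes equal to b become zero */

	return (x - ones) & ~x & highs;
}

/* Returns the offset of the first '\n' in buf, or len if there is none. */
static size_t
find_newline_2x(const char *buf, size_t len)
{
	size_t		i = 0;

	/* Two chunks per iteration: the results are OR-ed into one branch. */
	while (len - i >= 2 * sizeof(Chunk))
	{
		Chunk		a = chunk_load(buf + i);
		Chunk		b = chunk_load(buf + i + sizeof(Chunk));

		if (chunk_has_byte(a, '\n') | chunk_has_byte(b, '\n'))
			break;
		i += 2 * sizeof(Chunk);
	}

	/* Scalar scan over the block that matched, or over the tail. */
	while (i < len && buf[i] != '\n')
		i++;
	return i;
}
```

The trade-off being tested is fewer branches per byte against a coarser
fallback: when a special character is found, up to two chunks' worth of
bytes are rescanned one at a time.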