Date: Fri, 6 Feb 2026 15:29:13 -0600
From: Nathan Bossart
To: KAZAR Ayoub
Cc: Nazir Bilal Yavuz, Neil Conway, Manni Wood, Andrew Dunstan, Shinya Kato, PostgreSQL-development
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD

Sorry for disappearing from this thread for a while.

It looks like a lot of energy has been put into benchmarking and refining the heuristic for deciding when to use the SIMD path so that we avoid large regressions when there are special characters. I think this is all valuable work, but I'm a bit concerned that we are putting the cart before the horse. IMHO it would be better to first get the SIMD code committed with the absolute simplest heuristic we can think of (e.g., as soon as we see a special character, switch to the scalar path for the remainder of COPY FROM).

My hope is that this would be far easier to reason about from a performance angle. If we immediately fall back to the existing code path, we don't need to worry about how many special characters there are and whether they are sparse or clustered or whatever. We just need to measure the overhead of the new branches and ensure they don't produce meaningful regressions. Assuming that all looks good, we can then focus on the SIMD code itself and make sure that is correct and optimal. And once we get that portion committed, we could then consider more sophisticated heuristics.

FWIW I'm hoping to get something in this area committed for v19, and IMO now is a good time to start thinking about how to get things over the finish line.
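
To make the "simplest heuristic" concrete, here is a rough, self-contained sketch of the idea. It is not code from the patch. ScanState, needs_scalar_handling(), find_line_end(), and CHUNK_SIZE are hypothetical names, the byte-by-byte chunk check is only a portable stand-in for a real vector compare, and treating the backslash as the only special character is a simplifying assumption (CSV mode would also care about the quote and escape characters). The point is the sticky flag: once any special character is seen, the fast path stays off for the rest of the COPY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CHUNK_SIZE 16			/* stand-in for a vector register width */

/* Hypothetical per-COPY state; not PostgreSQL's actual CopyFromState. */
typedef struct ScanState
{
	bool		use_fast_path;	/* cleared permanently on first special char */
	size_t		fast_chunks;	/* chunks skipped by the fast path so far */
} ScanState;

/* Assumption: in this sketch only the backslash needs scalar handling. */
static bool
needs_scalar_handling(char c)
{
	return c == '\\';
}

/*
 * Return the offset of the first unescaped newline in buf[0..len), or len
 * if there is none.  Each call is assumed to start at the beginning of a line.
 */
static size_t
find_line_end(ScanState *st, const char *buf, size_t len)
{
	size_t		i = 0;

	assert(st != NULL && buf != NULL);

	/* Fast path: skip whole chunks that contain nothing interesting. */
	while (st->use_fast_path && i + CHUNK_SIZE <= len)
	{
		size_t		j;

		/* A real implementation would do this with one vector compare. */
		for (j = 0; j < CHUNK_SIZE; j++)
		{
			if (buf[i + j] == '\n' || needs_scalar_handling(buf[i + j]))
				break;
		}

		if (j == CHUNK_SIZE)
		{
			i += CHUNK_SIZE;
			st->fast_chunks++;
			continue;
		}

		if (buf[i + j] == '\n')
			return i + j;		/* no special char seen, so it is unescaped */

		/* Special character: give up on the fast path for good. */
		st->use_fast_path = false;
		i += j;
	}

	/* Scalar path: simplified version of the existing per-byte loop. */
	for (; i < len; i++)
	{
		if (needs_scalar_handling(buf[i]))
		{
			st->use_fast_path = false;	/* sticky, even if found here */
			i++;				/* skip the escaped byte */
			continue;
		}
		if (buf[i] == '\n')
			return i;
	}
	return len;
}
```

With this shape, input that never contains a special character pays only for the chunk loop, and input that does contain one pays for the extra branches up to the first occurrence and nothing afterwards. That is the overhead that would need to be measured.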
Thanks for working on it.

-- 
nathan