Date: Mon, 20 Oct 2025 12:04:03 -0500
From: Nathan Bossart
To: Andrew Dunstan
Cc: Nazir Bilal Yavuz, KAZAR Ayoub, Shinya Kato, pgsql-hackers@postgresql.org
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD

On Mon, Oct 20, 2025 at 10:02:23AM -0400, Andrew Dunstan wrote:
> On 2025-10-16 Th 10:29 AM, Nazir Bilal Yavuz wrote:
>> With this heuristic the regression is limited by %2 in the worst case.
>
> My worry is that the worst case is actually quite common. Sparse data sets
> dominated by a lot of null values (and hence lots of special characters) are
> very common. Are people prepared to accept a 2% regression on load times for
> such data sets?

Without knowing how common it is, I think it's difficult to judge whether
2% is a reasonable trade-off.  If <5% of workloads might see a small
regression while the other >95% see double-digit percentage improvements,
then I might argue that it's fine.  But I'm not sure we have any way to
know those sorts of details at the moment.

I'm also at least a little skeptical about the 2% number.  IME that's
generally within the noise range and can vary greatly between machines and
test runs.

-- 
nathan