Date: Fri, 6 Feb 2026 16:47:18 -0600
From: Nathan Bossart
To: Nazir Bilal Yavuz
Cc: KAZAR Ayoub, Neil Conway, Manni Wood, Andrew Dunstan, Shinya Kato, PostgreSQL-development
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD

On Sat, Feb 07, 2026 at 01:19:16AM +0300, Nazir Bilal Yavuz wrote:
> I have three possible approaches in my mind, they are actually similar
> to each other.
>
> 1- After encountering a special character, disable SIMD for the rest
> of the current line and also for the rest of the data.
>
> 2- It is a mixed version of the current heuristic and #1. After
> encountering a special character, skip SIMD for the current line (let's
> say line 1) and for the next line (line 2). Then try running SIMD for
> the next line (line 3), if there is no special character continue to
> run SIMD but if there is a special character then skip running SIMD
> for two lines this time. And it goes like that, every time a special
> character is encountered in the SIMD run, skipped SIMD lines are
> doubled.
>
> 3- This version is a bit different from #2. Instead of calculating the
> number of lines to skip dynamically, skip the constant N number of
> lines and then try to run SIMD again after these lines. N could be
> something like 100, 1000, or 10000 etc. Actually, you and Andrew
> suggested this approach before [1].
>
> I think what you suggested is closer to #1 or #3. I just wanted to
> hear your opinions, and whether you think any of these approaches are
> good to implement / work on.

Yeah, I think either (1) or (3) would be a good starting point.  (1) is
basically just (3) with N set to infinity, anyway.  I imagine there's some
value less than infinity that is acceptable, but if I had to pick an
approach right now, I'd probably go with (1) to essentially remove the
heuristic from the discussion until we're ready to focus on it.

-- 
nathan
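
For readers following the thread, the three skip policies discussed above can be modeled as one small per-line gate. This is only a rough sketch and is not taken from the patch: every name here (SimdGate, SkipPolicy, gate_begin_line, gate_special_seen, count_simd_lines) is hypothetical, and it assumes the doubling variant starts by skipping one following line and never resets its skip length. It also shows why (1) can be treated as (3) with N set to "infinity".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; an illustration of the policies, not PostgreSQL code. */
#define SKIP_INFINITE UINT64_MAX

typedef enum
{
	SKIP_FOREVER,				/* approach 1 */
	SKIP_DOUBLING,				/* approach 2 */
	SKIP_CONSTANT				/* approach 3 */
} SkipPolicy;

typedef struct
{
	SkipPolicy	policy;
	uint64_t	constant_n;		/* N for approaches 1 and 3 */
	uint64_t	next_skip;		/* next skip length for approach 2 */
	uint64_t	lines_to_skip;	/* upcoming lines with SIMD disabled */
} SimdGate;

static void
gate_init(SimdGate *g, SkipPolicy policy, uint64_t n)
{
	g->policy = policy;
	/* Approach 1 is approach 3 with N = infinity. */
	g->constant_n = (policy == SKIP_FOREVER) ? SKIP_INFINITE : n;
	g->next_skip = 1;			/* assumption: first skip covers one line */
	g->lines_to_skip = 0;
}

/* Returns true if the SIMD path should be tried on the line about to start. */
static bool
gate_begin_line(SimdGate *g)
{
	if (g->lines_to_skip == SKIP_INFINITE)
		return false;
	if (g->lines_to_skip > 0)
	{
		g->lines_to_skip--;
		return false;
	}
	return true;
}

/* Called when the SIMD path hits a special character on the current line. */
static void
gate_special_seen(SimdGate *g)
{
	if (g->policy == SKIP_DOUBLING)
	{
		g->lines_to_skip = g->next_skip;
		/* Double for next time, staying below the "infinite" sentinel. */
		if (g->next_skip < SKIP_INFINITE / 2)
			g->next_skip *= 2;
	}
	else
		g->lines_to_skip = g->constant_n;
}

/*
 * Counts the lines on which SIMD would be attempted.  special[i] says
 * whether line i contains a special character.
 */
uint64_t
count_simd_lines(SkipPolicy policy, uint64_t n,
				 const bool *special, size_t nlines)
{
	SimdGate	g;
	uint64_t	attempted = 0;

	gate_init(&g, policy, n);
	for (size_t i = 0; i < nlines; i++)
	{
		if (!gate_begin_line(&g))
			continue;			/* scalar path only for this line */
		attempted++;
		if (special[i])
			gate_special_seen(&g);
	}
	return attempted;
}
```

In this model the rest of the triggering line is assumed to fall back to the scalar path in all three cases, so the policies differ only in how many following lines are skipped. Calling count_simd_lines with SKIP_CONSTANT and n set to SKIP_INFINITE gives the same result as SKIP_FOREVER.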