Date: Tue, 21 Oct 2025 13:55:07 -0500
From: Nathan Bossart
To: KAZAR Ayoub
Cc: Nazir Bilal Yavuz, "ants.aasma@cybertec.at", Andrew Dunstan, Shinya Kato, pgsql-hackers@postgresql.org
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
References: <8615c983-1662-43b4-b0c9-49d194ac33aa@dunslane.net>

On Tue, Oct 21, 2025 at 08:17:01AM +0200, KAZAR Ayoub wrote:
>>> I'm also trying the idea of doing SIMD inside quotes with prefix XOR
>>> using carry less multiplication avoiding the slow path in all cases even
>>> with weird looking input, but it needs to take into consideration the
>>> availability of PCLMULQDQ instruction set with and here we
>>> go, it quickly starts to become dirty OR we can wait for the decision to
>>> start requiring x86-64-v2 or v3 which has SSE4.2 and AVX2.
>
> [...]
>
> Currently we are at 200-400Mbps which isn't that terrible compared to
> production and non production grade parsers (of course we don't only parse
> in our case), also we are using SSE2 only so theoretically if we add
> support for avx later on we'll have even better numbers.
> Maybe more micro optimizations to the current heuristic can squeeze it more.

I'd greatly prefer that we stick with SSE2/Neon (i.e., simd.h) unless the
gains are extraordinary.  Beyond the inherent complexity of using
architecture-specific intrinsics, you also have to deal with configure-time
checks, runtime checks, and function pointer overhead juggling.  That tends
to be a lot of work for the amount of gain.

-- 
nathan
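
For readers unfamiliar with the "prefix XOR" idea mentioned in the quoted text, here is a minimal portable sketch. It is not taken from any posted patch: the function names (prefix_xor, quote_bits, in_quote_mask) are hypothetical, and it uses a plain shift-and-XOR cascade rather than PCLMULQDQ. The cascade yields the same 64-bit result as the low half of a carry-less multiplication by an all-ones word, so it shows the technique without any architecture-specific intrinsics. It also assumes a single quote character and ignores escape handling, which a real CSV parser would need.

```c
#include <assert.h>   /* only needed for sanity checks like those in the note below */
#include <stddef.h>
#include <stdint.h>

/*
 * Prefix XOR: bit i of the result is the XOR of bits 0..i of x.
 * Equivalent to the low 64 bits of a carry-less multiply of x by ~0.
 */
static uint64_t
prefix_xor(uint64_t x)
{
    x ^= x << 1;
    x ^= x << 2;
    x ^= x << 4;
    x ^= x << 8;
    x ^= x << 16;
    x ^= x << 32;
    return x;
}

/*
 * Scalar stand-in for a SIMD compare + movemask: set bit i when
 * buf[i] is the quote character.  Handles at most 64 bytes.
 */
static uint64_t
quote_bits(const char *buf, size_t len, char quote)
{
    uint64_t bits = 0;

    for (size_t i = 0; i < len && i < 64; i++)
    {
        if (buf[i] == quote)
            bits |= UINT64_C(1) << i;
    }
    return bits;
}

/*
 * Bit i is set when byte i lies inside a quoted region.  The opening
 * quote is included in the region; the closing quote is not.
 * Assumes the chunk starts outside of quotes.
 */
static uint64_t
in_quote_mask(const char *buf, size_t len, char quote)
{
    return prefix_xor(quote_bits(buf, len, quote));
}
```

For the 9-byte input a,"b,c",d the quote characters sit at offsets 2 and 6, so in_quote_mask() sets bits 2 through 5. The comma at offset 4 is therefore flagged as quoted, while the commas at offsets 1 and 7 are not. Carrying the in-quote state across 64-byte chunks would need one extra bit of state, which this sketch leaves out.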