Date: Tue, 17 Mar 2026 13:49:24 -0500
From: Nathan Bossart
To: KAZAR Ayoub
Cc: Andres Freund, Pg Hackers, Neil Conway, Manni Wood, Andrew Dunstan,
    Shinya Kato, Mark Wong, Nazir Bilal Yavuz
Subject: Re: Speed up COPY TO text/CSV parsing using SIMD

On Sat, Mar 14, 2026 at 11:43:38PM +0100, KAZAR Ayoub wrote:
> Just a small concern about where some varlenas have a larger binary size
> than its text representation ex:
> SELECT pg_column_size(to_tsvector('SIMD is GOOD'));
>  pg_column_size
> ----------------
>              32
>
> its text representation is less than sizeof(Vector8) so currently v3 would
> enter SIMD path and exit out just from the beginning (two extra branches)
> because it does this:
> + if (TupleDescAttr(tup_desc, attnum - 1)->attlen == -1 &&
> +     VARSIZE_ANY_EXHDR(DatumGetPointer(value)) > sizeof(Vector8))
>
> I thought maybe we could do * 2 or * 4 its binary size, depends on the type
> really but this is just a proposition if this case is something concerning.

Can we measure the impact of this?  How likely is this case?

> +static pg_attribute_always_inline void CopyAttributeOutText(CopyToState cstate, const char *string,
> +                                 bool use_simd, size_t len);
> +static pg_attribute_always_inline void CopyAttributeOutCSV(CopyToState cstate, const char *string,
> +                                 bool use_quote, bool use_simd, size_t len);

Can you test this on its own, too?  We might be able to separate this and
the change below into a prerequisite patch, assuming they show benefits.

>         if (is_csv)
> -           CopyAttributeOutCSV(cstate, string,
> -                               cstate->opts.force_quote_flags[attnum - 1]);
> +       {
> +           if (use_simd)
> +               CopyAttributeOutCSV(cstate, string,
> +                                   cstate->opts.force_quote_flags[attnum - 1],
> +                                   true, len);
> +           else
> +               CopyAttributeOutCSV(cstate, string,
> +                                   cstate->opts.force_quote_flags[attnum - 1],
> +                                   false, len);
> +       }
>         else
> -           CopyAttributeOutText(cstate, string);
> +       {
> +           if (use_simd)
> +               CopyAttributeOutText(cstate, string, true, len);
> +           else
> +               CopyAttributeOutText(cstate, string, false, len);
> +       }

There isn't a terrible amount of branching on use_simd in these functions,
so I'm a little skeptical this makes much difference.  As above, it would
be good to measure it.

-- 
nathan
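
For readers following the thread, the dispatch pattern in the quoted hunk can be looked at in isolation. The sketch below is not taken from the patch: count_escapes(), needs_escape(), count_escapes_dispatch(), CHUNK and the always_inline macro are all hypothetical stand-ins, and the "chunk" path is plain scalar code rather than real SIMD. It only illustrates the idea that passing a literal true/false to an always-inline function lets the compiler emit two specialized copies with the flag test removed. Whether that is measurably faster is exactly the open question above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: GCC/Clang attribute syntax; other compilers get plain inline. */
#if defined(__GNUC__) || defined(__clang__)
#define always_inline __attribute__((always_inline)) inline
#else
#define always_inline inline
#endif

#define CHUNK 16				/* hypothetical stand-in for sizeof(Vector8) */

/* Hypothetical helper: does c need escaping in text-format output? */
static inline bool
needs_escape(char c)
{
	return c == '\\' || c == '\t' || c == '\n' || c == '\r';
}

/*
 * Count the characters that need escaping.  When use_chunks is a literal at
 * the call site and the function is inlined, the compiler can discard the
 * untaken side of the "if (use_chunks)" test.
 */
static always_inline size_t
count_escapes(const char *s, size_t len, bool use_chunks)
{
	size_t		i = 0;
	size_t		n = 0;

	if (use_chunks)
	{
		/* Skip whole chunks that contain no special characters. */
		while (i + CHUNK <= len)
		{
			bool		found = false;

			for (size_t j = 0; j < CHUNK; j++)
				found |= needs_escape(s[i + j]);
			if (found)
				break;			/* finish byte-at-a-time from this chunk */
			i += CHUNK;
		}
	}

	/* Scalar tail; also the whole loop when use_chunks is false. */
	for (; i < len; i++)
		n += needs_escape(s[i]);

	return n;
}

/* Dispatcher shaped like the quoted hunk: branch once, pass a literal. */
static size_t
count_escapes_dispatch(const char *s, size_t len)
{
	bool		use_chunks = (len > CHUNK);

	assert(s != NULL);

	if (use_chunks)
		return count_escapes(s, len, true);
	else
		return count_escapes(s, len, false);
}
```

Comparing the generated assembly (or a benchmark) of this dispatcher against a version that simply calls count_escapes(s, len, use_chunks) is one way to check whether the duplicated call sites buy anything.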