From: Nazir Bilal Yavuz
Date: Thu, 12 Mar 2026 13:59:53 +0300
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
To: Nathan Bossart
Cc: Manni Wood, KAZAR Ayoub, Neil Conway, Andrew Dunstan, Shinya Kato, PostgreSQL-development

Hi,

On Wed, 11 Mar 2026 at 23:42, Nathan Bossart wrote:
>
> On Wed, Mar 11, 2026 at 10:22:18PM +0300, Nazir Bilal Yavuz wrote:
> > Here is v14 which is v13-0001 + v13-0002.
>
> Thanks!  It's getting close.
>
> > +    /*
> > +     * Temporary variables are used here instead of passing the actual
> > +     * variables (especially input_buf_ptr) directly to the helper. Taking
> > +     * the address of a local variable might force the compiler to
> > +     * allocate it on the stack rather than in a register.  Because
> > +     * input_buf_ptr is used heavily in the hot scalar path below, keeping
> > +     * it in a register is important for performance.
> > +     */
> > +    int         temp_input_buf_ptr;
> > +    bool        temp_hit_eof = hit_eof;
>
> A few notes:
>
> * Does using a temporary variable for hit_eof actually make a difference?
>   AFAICT that's only updated when loading more data.
>
> * Does inlining the function produce the same results?
>
> * Also, I'm curious what the usual benchmarks look like with and without
>   this hack for the latest patch.

I tried to benchmark all of these questions. Here are the results:

Old master means d841ca2d14 without the commit that inlines
CopyReadLineText (dc592a4155).
v14 means d841ca2d14 + v14.
v14 + #1 means removing the temporary variables.
v14 + #2 means removing the temp_hit_eof variable only.
v14 + #3 means inlining CopyReadLineTextSIMDHelper().
v14 + #4 means inlining CopyReadLineTextSIMDHelper() + removing the
temporary variables (#1).

------------------------------------------------------------

Results for default_toast_compression = 'lz4':

+-------------------------------------------+
|             Optimization: -O2             |
+------------+--------------+---------------+
|            |     Text     |      CSV      |
+------------+------+-------+-------+-------+
| WIDE       | None |  1/3  | None  |  1/3  |
+------------+------+-------+-------+-------+
| Old master | 4260 | 4789  | 5930  | 8276  |
+------------+------+-------+-------+-------+
| v14        | 2489 | 4439  | 2529  | 8098  |
+------------+------+-------+-------+-------+
| v14 + #1   | 2472 | 5177  | 2479  | 9285  |
+------------+------+-------+-------+-------+
| v14 + #2   | 2521 | 4252  | 2481  | 8050  |
+------------+------+-------+-------+-------+
| v14 + #3   | 2632 | 4569  | 2458  | 8657  |
+------------+------+-------+-------+-------+
| v14 + #4   | 2476 | 4239  | 2475  | 10544 |
+------------+------+-------+-------+-------+

+------------+--------------+---------------+
|            |     Text     |      CSV      |
+------------+------+-------+-------+-------+
| NARROW     | None |  1/3  | None  |  1/3  |
+------------+------+-------+-------+-------+
| Old master | 9955 | 10056 | 10329 | 10872 |
+------------+------+-------+-------+-------+
| v14        | 9917 | 10080 | 10104 | 10510 |
+------------+------+-------+-------+-------+
| v14 + #1   | 9913 | 10090 | 10120 | 10532 |
+------------+------+-------+-------+-------+
| v14 + #2   | 9937 | 10130 | 10072 | 10520 |
+------------+------+-------+-------+-------+
| v14 + #3   | 9880 | 10258 | 10220 | 10604 |
+------------+------+-------+-------+-------+
| v14 + #4   | 9827 | 10306 | 10308 | 10734 |
+------------+------+-------+-------+-------+

------------------------------------------------------------

Results for default_toast_compression = 'pglz':

+-------------------------------------------+
|             Optimization: -O2             |
+------------+--------------+---------------+
|            |     Text     |      CSV      |
+------------+------+-------+-------+-------+
| WIDE       | None |  1/3  | None  |  1/3  |
+------------+------+-------+-------+-------+
| Old master | 4260 | 4789  | 5930  | 8276  |
+------------+------+-------+-------+-------+
| v14        | 2489 | 4439  | 2529  | 8098  |
+------------+------+-------+-------+-------+
| v14 + #1   | 2472 | 5177  | 2479  | 9285  |
+------------+------+-------+-------+-------+
| v14 + #2   | 2521 | 4252  | 2481  | 8050  |
+------------+------+-------+-------+-------+
| v14 + #3   | 2632 | 4569  | 2458  | 8657  |
+------------+------+-------+-------+-------+
| v14 + #4   | 2476 | 4239  | 2475  | 10544 |
+------------+------+-------+-------+-------+

+------------+--------------+---------------+
|            |     Text     |      CSV      |
+------------+------+-------+-------+-------+
| NARROW     | None |  1/3  | None  |  1/3  |
+------------+------+-------+-------+-------+
| Old master | 9955 | 10056 | 10329 | 10872 |
+------------+------+-------+-------+-------+
| v14        | 9917 | 10080 | 10104 | 10510 |
+------------+------+-------+-------+-------+
| v14 + #1   | 9913 | 10090 | 10120 | 10532 |
+------------+------+-------+-------+-------+
| v14 + #2   | 9937 | 10130 | 10072 | 10520 |
+------------+------+-------+-------+-------+
| v14 + #3   | 9880 | 10258 | 10220 | 10604 |
+------------+------+-------+-------+-------+
| v14 + #4   | 9827 | 10306 | 10308 | 10734 |
+------------+------+-------+-------+-------+

------------------------------------------------------------

Looking at these results:

v14 + #1 and v14 + #3 perform worse than v14 in the wide & 1/3 cases.
v14 + #4 performs worse in the CSV & wide & 1/3 case.
v14 and v14 + #2 perform very similarly, and neither shows a regression
against old master. I think we can move forward with one of these.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft
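
For readers following the quoted comment about temporary variables, here is a minimal standalone sketch of the pattern being benchmarked. It is not the patch's code: skip_to_newline() and count_lines() are hypothetical names, and the scalar loop only stands in for the SIMD helper. The idea is that the address of a temporary is passed to the helper, so the hot position variable never has its address taken. Whether the compiler then keeps it in a register depends on the compiler and optimization level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper standing in for a SIMD fast path: advances *pos to
 * the next '\n' in buf (or to len if there is none) and reports whether a
 * newline was found.  It takes a pointer because it must update the
 * caller's position.
 */
static bool
skip_to_newline(const char *buf, size_t len, size_t *pos)
{
    assert(buf != NULL && pos != NULL);

    while (*pos < len && buf[*pos] != '\n')
        (*pos)++;

    return *pos < len;
}

/*
 * Counts newline-terminated lines in buf.  The hot variable "pos" is never
 * passed by address; only the temporary "tmp_pos" is, and its value is
 * copied back after the call.
 */
static size_t
count_lines(const char *buf, size_t len)
{
    size_t      pos = 0;        /* hot variable, address never taken */
    size_t      nlines = 0;

    while (pos < len)
    {
        size_t      tmp_pos = pos;  /* only the temporary's address escapes */
        bool        found = skip_to_newline(buf, len, &tmp_pos);

        pos = tmp_pos;          /* copy the helper's result back */

        if (!found)
            break;              /* trailing data without a newline */

        nlines++;
        pos++;                  /* step past the newline */
    }

    return nlines;
}
```

Variant #1 in the tables above corresponds to passing the address of the hot variable directly instead of going through a temporary like tmp_pos.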