Message-ID: <91acb778-42c4-44ef-8888-f18ad9b12a5b@dunslane.net>
Date: Thu, 5 Mar 2026 16:25:14 -0500
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
To: Nazir Bilal Yavuz, Nathan Bossart
Cc: Manni Wood, KAZAR Ayoub, Neil Conway, Shinya Kato, PostgreSQL-development
From: Andrew Dunstan

On 2026-03-04 We 10:15 AM, Nazir Bilal Yavuz wrote:
> Hi,
>
> On Mon, 2 Mar 2026 at 22:55, Nathan Bossart wrote:
>> On Wed, Feb 25, 2026 at 05:24:27PM +0300, Nazir Bilal Yavuz wrote:
>>> If anyone has any suggestions/ideas, please let me know!
> I am able to fix the problem. My first assumption was that the
> branching of SIMD code caused that problem, so I moved SIMD code to
> the CopyReadLineTextSIMDHelper() function. Then I moved this
> CopyReadLineTextSIMDHelper() to top of CopyReadLineText(), by doing
> that we won't have any branching in the non-SIMD (scalar) code path.
> This didn't solve the problem and then I realized that even though I
> disable SIMD code path with 'if (false)', there is still regression
> but if I comment all of the 'if (cstate->simd_enabled)' branch, then
> there is no regression at all.
>
> To find out more, I compared assembly outputs of both and found out
> the possible reason. What I understood is that the compiler can't
> promote a variable to register, instead these variables live in the
> stack; which is slower.
> Please see the two different assembly outputs:
>
> Slow code:
>
> c = copy_input_buf[input_buf_ptr++];
>  db0:   48 8b 55 b8       mov    -0x48(%rbp),%rdx
>  db4:   48 63 c6          movslq %esi,%rax
>  db7:   44 8d 66 01       lea    0x1(%rsi),%r12d
>  dbb:   44 89 65 cc       mov    %r12d,-0x34(%rbp)
>  dbf:   0f be 14 02       movsbl (%rdx,%rax,1),%edx
>
> Fast code:
>
> c = copy_input_buf[input_buf_ptr++];
>  d80:   49 63 c4          movslq %r12d,%rax
>  d83:   45 8d 5c 24 01    lea    0x1(%r12),%r11d
>  d88:   41 0f be 04 06    movsbl (%r14,%rax,1),%eax
>
> And the reason for that is sending the address of input_buf_ptr to a
> CopyReadLineTextSIMDHelper(..., &input_buf_ptr). If I change it to
> this:
>
> int temp_input_buf_ptr = input_buf_ptr;
> CopyReadLineTextSIMDHelper(..., &temp_input_buf_ptr);
>
> Then there is no regression. However, I am still not completely sure
> if that is the same problem in the v10, I am planning to spend more
> time debugging this.
>
>> A couple of random ideas:
>>
>> * Additional inlining for callers. I looked around a little bit and didn't
>> see any great candidates, so I don't have much faith in this, but maybe
>> you'll see something I don't.
> I agree with you. CopyReadLineText() is already quite a big function.
>
>> * Disable SIMD if we are consistently getting small rows. That won't help
>> your "wide & CSV 1/3" case in all likelihood, but perhaps it'll help with
>> the regression for narrow rows described elsewhere.
> I implemented this, two consecutive small rows disables SIMD.
>
>> * Surround the variable initializations with "if (simd_enabled)".
>> Presumably compilers are smart enough to remove those in the non-SIMD paths
>> already, but it could be worth a try.
> Done.
>
>> * Add simd_enabled function parameter to CopyReadLine(),
>> NextCopyFromRawFieldsInternal(), and CopyFromTextLikeOneRow(), and do the
>> bool literal trick in CopyFrom{Text,CSV}OneRow(). That could encourage the
>> compiler to do some additional optimizations to reduce branching.
> I think we don't need this.
> At least the implementation with
> CopyReadLineTextSIMDHelper() doesn't need this since branching will be
> at the top and it will be once per line.
>
> I think v11 looks better compared to v10. I liked the
> CopyReadLineTextSIMDHelper() helper function. I also liked it being at
> the top of CopyReadLineText(), not being in the scalar path. This
> gives us more optimization options without affecting the scalar path.
>
> Here are the new benchmark results, I benchmarked the changes with
> both -O2 and -O3 and also both with and without 'changing
> default_toast_compression to lz4' commit (65def42b1d5). Benchmark
> results show that there is no regression and the performance
> improvement is much bigger with 65def42b1d5, it is close to 2x for
> text format and more than 2x for the csv format.

I spent some time exploring different ideas for improving this, but
found none that didn't cause regression in some cases, so good to go
from my POV.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
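
The register-promotion issue described in the quoted analysis can be sketched in isolation. The C fragment below is an illustration only, not code from the patch. The names skip_plain_bytes(), count_newlines_direct() and count_newlines_with_temp() are hypothetical, the helper is a plain scalar loop rather than SIMD, and copying the temporary back into the loop index is an assumption about how the workaround is completed. Whether a given compiler really keeps the index on the stack in the first variant depends on inlining and optimization level.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a helper such as CopyReadLineTextSIMDHelper():
 * advances *pos past bytes that need no special handling. A scalar loop is
 * used here; no SIMD is involved in this sketch.
 */
static void
skip_plain_bytes(const char *buf, int len, int *pos)
{
	assert(buf != NULL && pos != NULL);
	while (*pos < len && buf[*pos] != '\n')
		(*pos)++;
}

/*
 * Variant A: the address of the loop index itself is passed to the helper.
 * Once the address escapes, a compiler may have to keep "pos" in memory
 * instead of a register for the rest of the function.
 */
static int
count_newlines_direct(const char *buf, int len)
{
	int			pos = 0;
	int			count = 0;

	skip_plain_bytes(buf, len, &pos);
	while (pos < len)
	{
		char		c = buf[pos++];

		if (c == '\n')
			count++;
	}
	return count;
}

/*
 * Variant B: only a temporary has its address taken. The copy back into
 * "pos" is an assumption of this sketch; "pos" never escapes, so it stays
 * eligible for register promotion.
 */
static int
count_newlines_with_temp(const char *buf, int len)
{
	int			pos = 0;
	int			count = 0;
	int			temp_pos = pos;

	skip_plain_bytes(buf, len, &temp_pos);
	pos = temp_pos;
	while (pos < len)
	{
		char		c = buf[pos++];

		if (c == '\n')
			count++;
	}
	return count;
}
```

Both variants compute the same result, for example 2 for the 5-byte buffer "a\nbc\n". Any difference would show up only in the generated code for the loop, which is what the quoted assembly listings compare.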