From: Andrew Dunstan
Date: Wed, 29 Oct 2025 18:22:46 -0400
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
To: Nathan Bossart, Nazir Bilal Yavuz
Cc: KAZAR Ayoub, Shinya Kato, pgsql-hackers@postgresql.org
Message-ID: <5d81fbbb-7609-4445-9bc4-8af211fb7674@dunslane.net>

On 2025-10-22 We 3:24 PM, Nathan Bossart wrote:
> On Wed, Oct 22, 2025 at 03:33:37PM +0300, Nazir Bilal Yavuz wrote:
>> On Tue, 21 Oct 2025 at 21:40, Nathan Bossart wrote:
>>> I wonder if we could mitigate the regression further by spacing out the
>>> checks a bit more. It could be worth comparing a variety of values to
>>> identify what works best with the test data.
>> Do you mean that instead of doubling the SIMD sleep, we should
>> multiply it by 3 (or another factor)? Or are you referring to
>> increasing the maximum sleep from 1024? Or possibly both?
> I'm not sure of the precise details, but the main thrust of my suggestion
> is to assume that whatever sampling you do to determine whether to use SIMD
> is good for a larger chunk of data. That is, if you are sampling 1K lines
> and then using the result to choose whether to use SIMD for the next 100K
> lines, we could instead bump the latter number to 1M lines (or something).
> That way we minimize the regression for relatively uniform data sets while
> retaining some ability to adapt in case things change halfway through a
> large table.
>

I'd be ok with numbers like this, although I suspect the number of cases
where we see shape shifts like this in the middle of a data set would be
vanishingly small.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
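The "sample, then commit" idea Nathan describes can be illustrated with a minimal C sketch. A short sampling phase decides whether SIMD is worth using, and that decision is then reused for a much longer commit phase before the next re-sample. All names here (`SimdChooser`, `chooser_init`, `chooser_use_simd`, `chooser_record`) are hypothetical. The majority-vote rule is also an assumption, and so is the idea that the caller can report whether SIMD "paid off" on a line. None of this is taken from the patch under discussion, and the sketch does not model the doubling "SIMD sleep" mentioned earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical "sample, then commit" controller for choosing between a
 * SIMD and a scalar line-parsing path.  Illustrative only; not the
 * actual COPY FROM patch.
 */
typedef struct SimdChooser
{
    long    sample_lines;   /* lines observed per sampling phase */
    long    commit_lines;   /* lines that reuse one sampling decision */
    long    lines_left;     /* lines remaining in the current phase */
    long    wins;           /* sampled lines where SIMD paid off */
    bool    sampling;       /* true while in a sampling phase */
    bool    use_simd;       /* decision applied during the commit phase */
} SimdChooser;

static void
chooser_init(SimdChooser *c, long sample_lines, long commit_lines)
{
    assert(sample_lines > 0 && commit_lines > 0);
    c->sample_lines = sample_lines;
    c->commit_lines = commit_lines;
    c->lines_left = sample_lines;
    c->wins = 0;
    c->sampling = true;
    c->use_simd = true;
}

/* Decide whether the next line should be parsed with the SIMD path. */
static bool
chooser_use_simd(const SimdChooser *c)
{
    /* Always try SIMD while sampling so that its benefit can be observed. */
    return c->sampling || c->use_simd;
}

/*
 * Record the outcome for the line just parsed and advance the phase.
 * simd_paid_off is an assumed caller-supplied signal (for example, the
 * SIMD scan skipped a long run of bytes without special characters); it
 * is only consulted during sampling.
 */
static void
chooser_record(SimdChooser *c, bool simd_paid_off)
{
    if (c->sampling && simd_paid_off)
        c->wins++;

    if (--c->lines_left > 0)
        return;

    if (c->sampling)
    {
        /* Assumed rule: use SIMD if it paid off on at least half the sample. */
        c->use_simd = (c->wins * 2 >= c->sample_lines);
        c->sampling = false;
        c->lines_left = c->commit_lines;
    }
    else
    {
        /* Commit phase over: re-sample in case the data changed shape. */
        c->sampling = true;
        c->wins = 0;
        c->lines_left = c->sample_lines;
    }
}
```

The split from the mail corresponds to `chooser_init(&c, 1000, 1000000)`. A larger `commit_lines` spends proportionally less time sampling on uniform data. The trade-off is that a change in the data's shape partway through a large table is noticed up to `commit_lines` lines late.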