Message-ID: <8615c983-1662-43b4-b0c9-49d194ac33aa@dunslane.net>
Date: Thu, 21 Aug 2025 11:47:30 -0400
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
To: Nazir Bilal Yavuz, KAZAR Ayoub
Cc: Shinya Kato, pgsql-hackers@postgresql.org
From: Andrew Dunstan

On 2025-08-19 Tu 10:14 AM, Nazir Bilal Yavuz wrote:
> Hi,
>
> On Tue, 19 Aug 2025 at 15:33, Nazir Bilal Yavuz wrote:
>> I am able to reproduce the regression you mentioned but both
>> regressions are %20 on my end. I found that (by experimenting) SIMD
>> causes a regression if it advances less than 5 characters.
>>
>> So, I implemented a small heuristic. It works like that:
>>
>> - If advance < 5 -> insert a sleep penalty (n cycles).
> 'sleep' might be a poor word choice here. I meant skipping SIMD for n
> number of times.
>

I was thinking a bit about that this morning. I wonder whether, instead of having a constantly applied heuristic like this, it might be better to do a little extra accounting in the first, say, 1000 lines of an input file, and if less than some portion of the input is found to be special characters then switch to the SIMD code. What that portion should be would need to be determined by some experimentation with a variety of typical workloads, but given your findings 20% seems like a good starting point.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
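
To make the sampling idea above concrete, here is a minimal sketch of what the extra accounting might look like. It is not taken from any patch in this thread: the names CopySampleState, sample_line, SAMPLE_LINES and SPECIAL_PCT_THRESHOLD are hypothetical, and treating only the delimiter, quote and escape characters as "special" is an assumption. The 1000-line window and the 20% cutoff are just the starting points mentioned above and would need tuning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tuning knobs: sample window and special-character cutoff. */
#define SAMPLE_LINES          1000
#define SPECIAL_PCT_THRESHOLD 20

/* Hypothetical per-COPY accounting state; zero-initialize before use. */
typedef struct CopySampleState
{
    size_t lines_seen;     /* lines accounted for so far */
    size_t total_chars;    /* characters seen in sampled lines */
    size_t special_chars;  /* of those, how many were special */
    bool   decided;        /* true once the sample window is complete */
    bool   use_simd;       /* valid only when decided is true */
} CopySampleState;

/* Assumption: "special" means delimiter, quote or escape character. */
static bool
is_special(char c, char delim, char quote, char escape)
{
    return c == delim || c == quote || c == escape;
}

/*
 * Account for one input line (without its line terminator). After
 * SAMPLE_LINES lines, decide once whether to switch to the SIMD path:
 * only if special characters make up less than SPECIAL_PCT_THRESHOLD
 * percent of the sampled input. An all-empty sample yields no switch.
 */
static void
sample_line(CopySampleState *st, const char *line, size_t len,
            char delim, char quote, char escape)
{
    assert(st != NULL);

    if (st->decided)
        return;             /* decision is sticky; no further accounting */

    for (size_t i = 0; i < len; i++)
    {
        if (is_special(line[i], delim, quote, escape))
            st->special_chars++;
    }
    st->total_chars += len;

    if (++st->lines_seen >= SAMPLE_LINES)
    {
        /* Integer comparison avoids floating point and division by zero. */
        st->use_simd = st->special_chars * 100 <
                       st->total_chars * SPECIAL_PCT_THRESHOLD;
        st->decided = true;
    }
}
```

The point of this shape is that the cost is paid only during the sample window; once decided is set, the per-line check is a single branch. Whether a one-time decision copes with files whose character mix changes after the first 1000 lines is an open question this sketch does not address.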