From: Andrew Dunstan
Date: Fri, 21 Nov 2025 09:48:53 -0500
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
To: Nazir Bilal Yavuz, Nathan Bossart
Cc: Shinya Kato, Manni Wood, KAZAR Ayoub, PostgreSQL-development

On 2025-11-20 Th 7:55 AM, Nazir Bilal Yavuz wrote:
> Hi,
>
> Thank you for looking into this!
>
> On Thu, 20 Nov 2025 at 00:01, Nathan Bossart wrote:
>
>> IMHO we should be looking for ways to simplify this should-we-use-SIMD
>> code. For example, perhaps we could just disable the SIMD path for 10K or
>> 100K lines any time a special character is found. I'm dubious that a lot
>> of complexity is warranted.
> I think this is a bit too harsh since SIMD is still worth it if SIMD
> can advance more than ~5 character average. I am trying to use SIMD as
> much as possible when it is worth it but what you said can remove the
> regression completely, perhaps that is the correct way.
>

Perhaps a very small regression (say under 1%) in the worst case would
be OK. But the closer you can get that to zero the more acceptable this
will be. Very large loads of sparse data, which will often have lots of
special characters AIUI, are very common, so we should not dismiss the
worst case as an outlier.

I still like the idea of testing, say, a thousand lines every million,
or something like that.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
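
For illustration only, here is a minimal sketch of the "test a thousand lines every million" idea. It is not taken from any patch in this thread. The type SimdSampler, the sampler_* functions, and the three constants are hypothetical names; the window and period sizes come from the message above, and the threshold of 5 is an assumption based on the "~5 character average" figure in the quoted text. The sketch assumes the sampled lines go through the scalar path, which counts special characters as it parses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_LINES    1000     /* lines measured at the start of each period */
#define PERIOD_LINES    1000000  /* lines per period */
#define MIN_AVG_ADVANCE 5        /* assumed break-even: bytes per special char */

/* Hypothetical per-COPY bookkeeping; not an existing PostgreSQL struct. */
typedef struct SimdSampler
{
    uint64_t line_in_period;   /* 0 .. PERIOD_LINES - 1 */
    uint64_t sample_bytes;     /* bytes seen in the current sample window */
    uint64_t sample_specials;  /* special characters seen in the window */
    bool     use_simd;         /* decision for the rest of the period */
} SimdSampler;

static void
sampler_init(SimdSampler *s)
{
    s->line_in_period = 0;
    s->sample_bytes = 0;
    s->sample_specials = 0;
    s->use_simd = false;
}

/* True while the current line belongs to the sample window. */
static bool
sampler_in_window(const SimdSampler *s)
{
    return s->line_in_period < SAMPLE_LINES;
}

/* Sampled lines use the scalar path; the rest follow the last decision. */
static bool
sampler_use_simd(const SimdSampler *s)
{
    return !sampler_in_window(s) && s->use_simd;
}

/*
 * Called once per parsed line. The bytes and specials arguments are only
 * read inside the sample window, so outside it the caller may pass zeros.
 */
static void
sampler_end_line(SimdSampler *s, size_t bytes, size_t specials)
{
    if (sampler_in_window(s))
    {
        s->sample_bytes += bytes;
        s->sample_specials += specials;

        /* Last line of the window: decide for the rest of the period. */
        if (s->line_in_period + 1 == SAMPLE_LINES)
            s->use_simd = s->sample_bytes >
                (uint64_t) MIN_AVG_ADVANCE * s->sample_specials;
    }

    /* Start a fresh period, and a fresh sample, after PERIOD_LINES lines. */
    if (++s->line_in_period == PERIOD_LINES)
    {
        s->line_in_period = 0;
        s->sample_bytes = 0;
        s->sample_specials = 0;
    }
}
```

A caller would check sampler_use_simd() before parsing each line and call sampler_end_line() afterwards. With these numbers, the counting overhead is limited to 0.1% of the lines, and a wrong decision lasts at most one period before it is re-evaluated.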