Date: Wed, 22 Oct 2025 14:24:59 -0500
From: Nathan Bossart
To: Nazir Bilal Yavuz
Cc: Andrew Dunstan, KAZAR Ayoub, Shinya Kato, pgsql-hackers@postgresql.org
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD

On Wed, Oct 22, 2025 at 03:33:37PM +0300, Nazir Bilal Yavuz wrote:
> On Tue, 21 Oct 2025 at 21:40, Nathan Bossart wrote:
>> I wonder if we could mitigate the regression further by spacing out the
>> checks a bit more. It could be worth comparing a variety of values to
>> identify what works best with the test data.
>
> Do you mean that instead of doubling the SIMD sleep, we should
> multiply it by 3 (or another factor)? Or are you referring to
> increasing the maximum sleep from 1024? Or possibly both?

I'm not sure of the precise details, but the main thrust of my suggestion
is to assume that whatever sampling you do to determine whether to use SIMD
is good for a larger chunk of data. That is, if you are sampling 1K lines
and then using the result to choose whether to use SIMD for the next 100K
lines, we could instead bump the latter number to 1M lines (or something).
That way we minimize the regression for relatively uniform data sets while
retaining some ability to adapt in case things change halfway through a
large table.

-- 
nathan
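
To make the suggestion concrete, here is a minimal sketch of the sample-then-commit idea. It is not the patch's code. The names (SimdChooser, chooser_init, chooser_use_simd, chooser_line_done), the majority-vote rule, and the choice to start on the scalar path are all assumptions made for illustration; only the 1K and 1M figures come from the message above. The caller is assumed to have some per-line signal for whether the SIMD path would have been profitable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values taken from the numbers floated in the email. */
enum { SAMPLE_LINES = 1000, COMMIT_LINES = 1000000 };

/* Hypothetical state; none of these names come from the patch. */
typedef struct SimdChooser
{
	bool		sampling;		/* true while collecting a sample */
	bool		use_simd;		/* choice currently in effect */
	uint64_t	lines_left;		/* lines remaining in the current phase */
	uint64_t	favorable;		/* sampled lines that favored SIMD */
} SimdChooser;

void
chooser_init(SimdChooser *c)
{
	assert(c != NULL);
	c->sampling = true;
	c->use_simd = false;		/* assumption: start on the scalar path */
	c->lines_left = SAMPLE_LINES;
	c->favorable = 0;
}

/* Which path should the parser use for the next line? */
bool
chooser_use_simd(const SimdChooser *c)
{
	assert(c != NULL);
	return c->use_simd;
}

/*
 * Report one parsed line.  favored_simd is the caller's (assumed) per-line
 * signal; it is only counted while a sample is being collected.
 */
void
chooser_line_done(SimdChooser *c, bool favored_simd)
{
	assert(c != NULL);

	if (c->sampling && favored_simd)
		c->favorable++;

	if (--c->lines_left > 0)
		return;

	if (c->sampling)
	{
		/* Sample complete: commit to the majority result for a long run. */
		c->use_simd = c->favorable * 2 > (uint64_t) SAMPLE_LINES;
		c->sampling = false;
		c->lines_left = COMMIT_LINES;
	}
	else
	{
		/* Run complete: take a fresh sample, keeping the old choice meanwhile. */
		c->sampling = true;
		c->favorable = 0;
		c->lines_left = SAMPLE_LINES;
	}
}
```

With these constants, the sampling bookkeeping touches roughly one line in a thousand on a uniform data set, and the choice can still flip at the next sample if the data changes partway through a large table. Raising or lowering COMMIT_LINES is the knob to compare against the test data.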