From: Nazir Bilal Yavuz
Date: Thu, 14 Aug 2025 13:29:35 +0300
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
To: KAZAR Ayoub
Cc: Shinya Kato, pgsql-hackers@postgresql.org

Hi,

On Thu, 14 Aug 2025 at 05:25, KAZAR Ayoub wrote:
>
> Following Nazir's finding that 4096 bytes is the line length where the
> patch performs best, I ran more benchmarks on my side with both TEXT and
> CSV formats, using two kinds of data: normal data (no special
> characters) and data with many special characters.
>
> For normal data the results are as good as expected and similar to the
> previous benchmarks:
> ~30.9% faster COPY in TEXT format
> ~32.4% faster COPY in CSV format
> 20%-30% fewer cycles per instruction
>
> When lines contain many special characters (e.g., tables with a large
> number of columns), we expect regressions because of the overhead of
> frequent fallbacks to scalar processing.
> Results with special characters making up 1/3 of the line length:
> ~43.9% slower COPY in TEXT format
> ~16.7% slower COPY in CSV format
> There may still be some regression with fewer special characters or
> wider spacing between them. This may not be a significant case, but it
> could be handled in a separate patch if we decide to skip the SIMD path
> in some situations.
>
> I hope this helps and confirms the benefits of the patch.

Thanks for running that benchmark! Would you mind sharing a reproducer for
the regression you observed?

--
Regards,
Nazir Bilal Yavuz
Microsoft
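The regression mechanism in the quoted benchmark can be illustrated with a small C model. In that mechanism, a chunk that contains a special character falls back to scalar processing. This is a hypothetical sketch, not the patch's code. It uses portable 64-bit SWAR words in place of real SIMD registers. The set of special bytes (backslash, newline, carriage return, and the delimiter) is an assumption. So are the helper names `broadcast`, `has_zero_byte`, `chunk_has_special` and `count_fallback_chunks`. The sketch only counts how many 8-byte chunks would have to be re-examined byte by byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy byte b into all 8 lanes of a 64-bit word. */
static uint64_t broadcast(unsigned char b)
{
    return 0x0101010101010101ULL * b;
}

/* Nonzero iff at least one byte of x is zero (classic SWAR "haszero" test). */
static uint64_t has_zero_byte(uint64_t x)
{
    return (x - 0x0101010101010101ULL) & ~x & 0x8080808080808080ULL;
}

/*
 * Hypothetical: does this 8-byte chunk contain any byte the parser must
 * handle specially?  XOR with a broadcast value turns matching bytes into 0.
 * The special set (\\, \n, \r, delimiter) is an assumption for TEXT format.
 */
static int chunk_has_special(uint64_t w, char delim)
{
    return has_zero_byte(w ^ broadcast('\\')) ||
           has_zero_byte(w ^ broadcast('\n')) ||
           has_zero_byte(w ^ broadcast('\r')) ||
           has_zero_byte(w ^ broadcast((unsigned char) delim));
}

/*
 * Count chunks that would fall back to byte-at-a-time processing.  A trailing
 * partial chunk (fewer than 8 bytes) is always handled scalarly here.
 */
static size_t count_fallback_chunks(const char *buf, size_t len, char delim)
{
    size_t fallbacks = 0;
    size_t i = 0;

    assert(buf != NULL || len == 0);
    for (; i + 8 <= len; i += 8)
    {
        uint64_t w;

        memcpy(&w, buf + i, sizeof w);  /* unaligned-safe load */
        if (chunk_has_special(w, delim))
            fallbacks++;                /* vector test was wasted work */
    }
    if (i < len)
        fallbacks++;                    /* tail bytes */
    return fallbacks;
}
```

On a 4096-byte line with no special bytes, none of the 512 chunks falls back. When every third byte is a tab delimiter, all 512 chunks do. In that case the vector test is pure overhead on top of the scalar work, which matches the direction of the reported slowdown. A real implementation could resume vector scanning after each special byte instead of abandoning the whole chunk, so this model likely overstates the cost.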