MIME-Version: 1.0
References: <8e045899-2023-48b1-bd91-f8cdffeb511d@dunslane.net> <5d81fbbb-7609-4445-9bc4-8af211fb7674@dunslane.net>
From: KAZAR Ayoub
Date: Tue, 18 Nov 2025 21:42:39 +0100
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
To: Nathan Bossart
Cc: Manni Wood, Andrew Dunstan, Nazir Bilal Yavuz, Shinya Kato, pgsql-hackers@postgresql.org
Content-Type: multipart/alternative; boundary="000000000000e173190643e4816f"

--000000000000e173190643e4816f
Content-Type: text/plain; charset="UTF-8"

On Mon, Nov 17, 2025, 11:16 PM Nathan Bossart wrote:

> (assuming there is a desire to
> continue with it)? I'm hoping to start spending more time on it soon.

Some things worth noting for future reference (so that nobody else wastes time on them): I previously tried several additional micro-optimizations inside and around CopyReadLineText:

SIMD alignment: Forcing 16-byte-aligned buffers so that we could use aligned load instructions (_mm_load_si128 instead of _mm_loadu_si128) provided no measurable benefit on modern CPUs (there is surely a thread somewhere discussing this that I haven't come across yet). This likely explains why simd.h exclusively uses unaligned load intrinsics: the performance difference has been negligible since Nehalem processors.

Memory prefetching: Explicit prefetch instructions for the COPY buffer pipeline (copy_raw_buf, input buffers, etc.) showed either no improvement or a slight regression. Multiple chunks already fit within a cache line, the other buffers are too far away to prefetch, and the next part of the buffer is prefetched easily anyway (nothing special about it), so the extra uops turn out not to be worth it.

Instruction-level parallelism: Issuing too many independent vector operations to increase ILP eventually degrades performance, likely due to backend saturation as observed through perf (most likely contention on execution ports and execution units?).
.....

This suggests that further optimization work should focus on the pipeline as a whole to get large benefits (parallel copy [0], maybe?).

[0] https://www.postgresql.org/message-id/CAA4eK1+kpddvvLxWm4BuG_AhVvYz8mKAEa7osxp_X0d4ZEiV=g@mail.gmail.com

--
Regards,
Ayoub Kazar

--000000000000e173190643e4816f
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Mon, Nov 17, 2025, 11:16 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> (assuming there is a desire to
> continue with it)? I'm hoping to start spending more time on it soon.

Some things worth noting for future reference (so that nobody else wastes time on them): I previously tried several additional micro-optimizations inside and around CopyReadLineText:

SIMD alignment: Forcing 16-byte-aligned buffers so that we could use aligned load instructions (_mm_load_si128 instead of _mm_loadu_si128) provided no measurable benefit on modern CPUs (there is surely a thread somewhere discussing this that I haven't come across yet). This likely explains why simd.h exclusively uses unaligned load intrinsics: the performance difference has been negligible since Nehalem processors.
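
For readers unfamiliar with the two intrinsics, here is a minimal sketch of a newline scan built on the unaligned load. It is not the patch's code: find_newline is a hypothetical helper, __builtin_ctz assumes GCC or Clang, and a scalar loop stands in on non-SSE2 targets. The only line that would differ in the aligned variant is the load, which would additionally require buf + i to be 16-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/*
 * Hypothetical helper (not from the patch): return the offset of the first
 * '\n' in buf, or len if there is none.
 */
static size_t
find_newline(const char *buf, size_t len)
{
    size_t      i = 0;

    assert(buf != NULL || len == 0);

#if defined(__SSE2__)
    {
        const __m128i nl = _mm_set1_epi8('\n');

        for (; i + 16 <= len; i += 16)
        {
            /*
             * Unaligned load: valid for any address.  The aligned variant,
             * _mm_load_si128, would require buf + i to be 16-byte aligned.
             */
            __m128i     chunk = _mm_loadu_si128((const __m128i *) (buf + i));
            int         mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, nl));

            if (mask != 0)
                return i + (size_t) __builtin_ctz((unsigned int) mask);
        }
    }
#endif

    /* Scalar tail, and the whole scan on targets without SSE2. */
    for (; i < len; i++)
    {
        if (buf[i] == '\n')
            return i;
    }
    return len;
}
```

Because the load accepts any address, the caller can pass a pointer at an arbitrary offset into the raw buffer without first copying or padding it.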

Memory prefetching: Explicit prefetch instructions for the COPY buffer pipeline (copy_raw_buf, input buffers, etc.) showed either no improvement or a slight regression. Multiple chunks already fit within a cache line, the other buffers are too far away to prefetch, and the next part of the buffer is prefetched easily anyway (nothing special about it), so the extra uops turn out not to be worth it.
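
To make the prefetching experiment concrete, here is a rough sketch of the kind of loop meant, not the code that was actually benchmarked. count_newlines_prefetch, PREFETCH_DISTANCE and CACHE_LINE are hypothetical names and values chosen for illustration, and __builtin_prefetch assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE          64      /* assumed cache line size, in bytes */
#define PREFETCH_DISTANCE   256     /* hypothetical look-ahead, in bytes */

/*
 * Hypothetical helper: count '\n' bytes in buf while hinting the cache
 * about data PREFETCH_DISTANCE bytes ahead, once per cache line.
 */
static size_t
count_newlines_prefetch(const char *buf, size_t len)
{
    size_t      count = 0;

    assert(buf != NULL || len == 0);

    for (size_t i = 0; i < len; i++)
    {
        /* One hint per cache line, and only while the target is in range. */
        if ((i % CACHE_LINE) == 0 && i + PREFETCH_DISTANCE < len)
            __builtin_prefetch(buf + i + PREFETCH_DISTANCE, 0, 3);

        count += (buf[i] == '\n');
    }
    return count;
}
```

The hint is an extra instruction on every cache line of an access pattern that is already purely sequential, which is consistent with it not paying for itself here.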

Instruction-level parallelism: Issuing too many independent vector operations to increase ILP eventually degrades performance, likely due to backend saturation as observed through perf (most likely contention on execution ports and execution units?).
.....
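
As an illustration of what "independent vector operations" means here, this is a sketch with two chunks per iteration whose loads and compares do not depend on each other. Again find_newline_unrolled is a hypothetical helper rather than the benchmarked code, __builtin_ctz assumes GCC or Clang, and the scalar loop covers non-SSE2 targets.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/*
 * Hypothetical helper: return the offset of the first '\n' in buf, or len
 * if there is none, examining two 16-byte chunks per loop iteration.
 */
static size_t
find_newline_unrolled(const char *buf, size_t len)
{
    size_t      i = 0;

    assert(buf != NULL || len == 0);

#if defined(__SSE2__)
    {
        const __m128i nl = _mm_set1_epi8('\n');

        for (; i + 32 <= len; i += 32)
        {
            /* The two load/compare/movemask chains are independent. */
            __m128i     c0 = _mm_loadu_si128((const __m128i *) (buf + i));
            __m128i     c1 = _mm_loadu_si128((const __m128i *) (buf + i + 16));
            unsigned int m0 = (unsigned int) _mm_movemask_epi8(_mm_cmpeq_epi8(c0, nl));
            unsigned int m1 = (unsigned int) _mm_movemask_epi8(_mm_cmpeq_epi8(c1, nl));

            /* Combine into one 32-bit mask; bit k means buf[i + k] == '\n'. */
            unsigned int mask = m0 | (m1 << 16);

            if (mask != 0)
                return i + (size_t) __builtin_ctz(mask);
        }
    }
#endif

    /* Scalar tail, and the whole scan on targets without SSE2. */
    for (; i < len; i++)
    {
        if (buf[i] == '\n')
            return i;
    }
    return len;
}
```

Going from one to two chains is the mild form of the idea; the regression described above showed up when the number of independent operations per iteration was pushed further.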

This suggests that further optimization work should focus on the pipeline as a whole to get large benefits (parallel copy [0], maybe?).

[0] https://www.postgresql.org/message-id/CAA4eK1+kpddvvLxWm4BuG_AhVvYz8mKAEa7osxp_X0d4ZEiV=g@mail.gmail.com

--
Regards,
Ayoub Kazar

--000000000000e173190643e4816f--