Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vM4Cf-000B4j-2P for pgsql-hackers@arkaria.postgresql.org; Thu, 20 Nov 2025 12:56:01 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vM4Ce-001dJR-0v for pgsql-hackers@arkaria.postgresql.org; Thu, 20 Nov 2025 12:56:00 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vM4Cd-001dJI-38 for pgsql-hackers@lists.postgresql.org; Thu, 20 Nov 2025 12:56:00 +0000 Received: from mail-pg1-x529.google.com ([2607:f8b0:4864:20::529]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vM4Cc-000YGf-03 for pgsql-hackers@postgresql.org; Thu, 20 Nov 2025 12:55:59 +0000 Received: by mail-pg1-x529.google.com with SMTP id 41be03b00d2f7-b55517e74e3so593368a12.2 for ; Thu, 20 Nov 2025 04:55:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1763643355; x=1764248155; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=Yvc84zG1O6h6T9wqO6ro4PPH5CYqwrnflHCqc5eBpk4=; b=EaWEDv2e5Ykd9CEzPBP+nGaVlbxPJhvEvPsJy18o/CZuDOYOVn776LKKJvZiMCtopM bjFCIUpqD6hiQWen5Uuwm+NDLxanlnkruI8po2DsUf6OufMioltruHGTabOsww+LzqwL JLv9+/rHO0K3Ll3M/9Iv1D2zl5dD2oNXG0GknwLV8fpBTftbTDxcbWQv0EPpUSxqXJbW 7UrmX1kZJYGC+bM97nlUU6fZSHOgJkHqmKsTgy6R34ZI+FeHNXwTpcbaRGFQzileucvM vCVZV9QwobG3v1l3fhSzj7DzLFqwQb3cusQaUdSu0T51heyk/cBIWTRoa+iJDm0iFDFT 35sw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1763643355; x=1764248155; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=Yvc84zG1O6h6T9wqO6ro4PPH5CYqwrnflHCqc5eBpk4=; b=B64hDA0q3ueiS34XHmdD/0CX6gzBy5uULikjHUslk/NVyFmO8W/skxbM0EuSZ6TaOu EDQQjTBwC1XwfS0apdAQyL/4i8NiMdFSPC2PwZ6JvP+evylsSKugev1V1drCslSvmEXE sajZeYWcH5OC8Spfep4X4s4x76BVKyw9m2c7+ETdnzQgEvMCeqzyWF2R/gN9Kk1lULTs UnKbkbH+kpwRS1iXys2J4HrMCjARzdQpczr+PJpWb4Dmp3Bpm6fy1iZsui4NJUiZIpZl XeMinXh7OxQzu9Iyn6P3CLjfcgAbav2AIwIgZnmGN+xBOHhaUygfJQ6BREpOzOQsTOlE qCUQ== X-Forwarded-Encrypted: i=1; AJvYcCVnQuWFDptOrkUzTxGlLYWX/k1Rgnyb73QKEtXyOURbBD/8NO9ljc2hx8rwsSfdMsN+Fw9tqBV8kZXq70D1@postgresql.org X-Gm-Message-State: AOJu0YzTwGlXShtJhW5E4YKH43d23DcK/54/7qNpLLcjBpviAOBRr1d1 jsJNIILXUqLg22U835XMorw0YvilyAXgCE2G7PJ5/YO8zyvO4wABJAGxap9gTn9yMhfXmxpza9f eGuv6mqxeUwdKfg3OvFxEmysuau8zAjE= X-Gm-Gg: ASbGncs9qWcsFQoEg13vMG6Ea7a6r7N9tq8C2m9vC3IT2lGOBtnaHubg4DmflyWDPgo iiOwN9bCgIBFkp6uczYCUA6xxPrQO3nw9IKtddEsbxBVstTFzXC2xJ5OPfsNGZTTcl6G3ahaIoN I2p3iREgy4RLUr5zPCklZQinzAqiaI62V6m3EmaduC4jazNHOOHVqruBOleh9K8wzoOLFxAaSuS QHzF3IZ+RCTVDUwYEtKl1cSqdKd9kvzWVagjCNlleGV83lhkwfVVPQVsEDp166wUF7oBhrEC7yP LrnK X-Google-Smtp-Source: AGHT+IFLpPb5IWBHteAp/0Zmx9jOOUBxo0VGLkXXTh4yG4cop1oMnJYkWKI8jKGj1ej7du1M+Sjz970KMaU+fCGA1dk= X-Received: by 2002:a05:693c:6006:b0:2a4:3593:c7d6 with SMTP id 5a478bee46e88-2a6fd106662mr1172435eec.22.1763643355082; Thu, 20 Nov 2025 04:55:55 -0800 (PST) MIME-Version: 1.0 References: <5d81fbbb-7609-4445-9bc4-8af211fb7674@dunslane.net> <8e226753-57af-489a-bfbe-caa23dd71286@dunslane.net> In-Reply-To: From: Nazir Bilal Yavuz Date: Thu, 20 Nov 2025 15:55:43 +0300 X-Gm-Features: AWmQ_bmJyG3JvnHBtCr9BWi1n9ZZlhHbCyGzqZ8zNHa43Z_0m7drkoS2A9zW_wU Message-ID: Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD To: Nathan Bossart Cc: Andrew Dunstan , Shinya Kato , Manni Wood , KAZAR Ayoub , PostgreSQL-development Content-Type: text/plain; charset="UTF-8" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hi, Thank you for looking into this! On Thu, 20 Nov 2025 at 00:01, Nathan Bossart wrote: > > On Tue, Nov 18, 2025 at 05:20:05PM +0300, Nazir Bilal Yavuz wrote: > > Thanks, done. > > I took a look at the v3 patches. Here are my high-level thoughts: > > + /* > + * Parse data and transfer into line_buf. To get benefit from inlining, > + * call CopyReadLineText() with the constant boolean variables. > + */ > + if (cstate->simd_continue) > + result = CopyReadLineText(cstate, is_csv, true); > + else > + result = CopyReadLineText(cstate, is_csv, false); > > I'm curious whether this actually generates different code, and if it does, > if it's actually faster. We're already branching on cstate->simd_continue > here. I had the same doubts before but my benchmark shows nice speedup. I used a test which is full of delimiters. The current code gives 2700 ms but when I changed these lines with the 'result = CopyReadLineText(cstate, is_csv, cstate->simd_continue);', the result was 2920 ms. I compiled code with both -O3 and -O2 and the results were similar. > > + /* Load a chunk of data into a vector register */ > + vector8_load(&chunk, (const uint8 *) ©_input_buf[input_buf_ptr]); > > In other places, processing 2 or 4 vectors of data at a time has proven > faster. Have you tried that here? Sorry, I could not find the related code piece. I only saw the vector8_load() inside of hex_decode_safe() function and its comment says: /* * We must process 2 vectors at a time since the output will be half the * length of the input. */ But this does not mention any speedup from using 2 vectors at a time. Could you please show the related code? > > + /* \n and \r are not special inside quotes */ > + if (!in_quote) > + match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr)); > + > + if (is_csv) > + { > + match = vector8_or(match, vector8_eq(chunk, quote)); > + if (escapec != '\0') > + match = vector8_or(match, vector8_eq(chunk, escape)); > + } > + else > + match = vector8_or(match, vector8_eq(chunk, bs)); > > The amount of branching here catches my eye. Some branching might be > unavoidable, but in general we want to keep these SIMD paths as branch-free > as possible. You are right, I will check these branches and will try to remove as many branches as possible. > > + /* > + * Found a special character. Advance up to that point and let > + * the scalar code handle it. > + */ > + int advance = pg_rightmost_one_pos32(mask); > + > + input_buf_ptr += advance; > + simd_total_advance += advance; > > Do we actually need to advance here? Or could we just fall through to the > scalar path? My suspicion is that this extra code doesn't gain us much. My testing shows that if we advance more than ~5 characters then SIMD is worth it, but if we advance less than ~5; then code causes a regression. I used this information while writing a heuristic. > > + if (simd_last_sleep_cycle == 0) > + simd_last_sleep_cycle = 1; > + else if (simd_last_sleep_cycle >= SIMD_SLEEP_MAX / 2) > + simd_last_sleep_cycle = SIMD_SLEEP_MAX; > + else > + simd_last_sleep_cycle <<= 1; > + cstate->simd_current_sleep_cycle = simd_last_sleep_cycle; > + cstate->simd_last_sleep_cycle = simd_last_sleep_cycle; > > IMHO we should be looking for ways to simplify this should-we-use-SIMD > code. For example, perhaps we could just disable the SIMD path for 10K or > 100K lines any time a special character is found. I'm dubious that a lot > of complexity is warranted. I think this is a bit too harsh since SIMD is still worth it if SIMD can advance more than ~5 character average. I am trying to use SIMD as much as possible when it is worth it but what you said can remove the regression completely, perhaps that is the correct way. -- Regards, Nazir Bilal Yavuz Microsoft