Date: Mon, 9 Mar 2026 13:25:25 -0500
From: Nathan Bossart
To: Nazir Bilal Yavuz
Cc: Manni Wood, KAZAR Ayoub, Neil Conway, Andrew Dunstan, Shinya Kato, PostgreSQL-development
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD

On Wed, Mar 04, 2026 at 06:15:53PM +0300, Nazir Bilal Yavuz wrote:
> +#ifndef USE_NO_SIMD
> +static bool CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
> +                                       bool *temp_hit_eof, int *temp_input_buf_ptr);
> +#endif

Should we inline this, too?

> +            /*
> +             * Do not disable SIMD when we hit EOL or EOF characters. In
> +             * practice, it does not matter for EOF because parsing ends
> +             * there, but we keep the behavior consistent.
> +             */
> +            if (!(simd_hit_eof || simd_hit_eol))
> +                cstate->simd_enabled = false;

nitpick: I would personally avoid disabling it for EOF. It probably
doesn't amount to much, but I don't see any point in the extra
complexity/work solely for consistency.

> +            /*
> +             * We encountered an EOL or EOF on the first vector. This means
> +             * lines are not long enough to skip a fully sized vector. If
> +             * this happens two times consecutively, then disable the
> +             * SIMD.
> +             */
> +            if (first_vector)
> +            {
> +                if (cstate->simd_failed_first_vector)
> +                    cstate->simd_enabled = false;
> +
> +                cstate->simd_failed_first_vector = true;
> +            }

The first time I saw this, my mind immediately went to the extreme case
where this likely regresses: alternating long and short lines. We might
just want to disable it the first time we see a short line, like we do for
special characters.
This is another thing that we can improve independently later on.

> +        /* First try to run SIMD, then continue with the scalar path */
> +        if (cstate->simd_enabled)
> +        {
> +            int         temp_input_buf_ptr = input_buf_ptr;
> +            bool        temp_hit_eof = false;
> +
> +            result = CopyReadLineTextSIMDHelper(cstate, is_csv, &temp_hit_eof,
> +                                                &temp_input_buf_ptr);
> +            input_buf_ptr = temp_input_buf_ptr;
> +            hit_eof = temp_hit_eof;

Given CopyReadLineTextSIMDHelper() doesn't have too much duplicated code,
moving the SIMD stuff to its own function is nice. The temp variables seem
a bit too magical to me, though. If those really make a difference, IMHO
there ought to be a big comment explaining why.

-- 
nathan