From: Ants Aasma
Date: Tue, 19 Aug 2025 12:09:20 +0300
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
To: Nazir Bilal Yavuz
Cc: Shinya Kato, pgsql-hackers@postgresql.org

On Thu, 7 Aug 2025 at 14:15, Nazir Bilal Yavuz wrote:
> I have a couple of ideas that I was working on:
> ---
>
> + * However, SIMD optimization cannot be applied in the following cases:
> + * - Inside quoted fields, where escape sequences and closing quotes
> + *   require sequential processing to handle correctly.
>
> I think you can continue SIMD inside quoted fields. Only important
> thing is you need to set last_was_esc to false when SIMD skipped the
> chunk.

There is a trick with doing carryless multiplication with -1 that can
be used to SIMD process transitions between quoted/not-quoted. [1]
This is able to convert a bitmask of unescaped quote character
positions to a quote mask in a single operation. I last looked at it 5
years ago, but I remember coming to the conclusion that it would work
for implementing PostgreSQL's interpretation of CSV.

[1] https://github.com/geofflangdale/simdcsv/blob/master/src/main.cpp#L76

--
Ants
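
To illustrate the quote-mask idea above, here is a minimal sketch in portable C. It is not the simdcsv or PostgreSQL code, and every function name in it is hypothetical. A shift-and-XOR ladder stands in for the carryless multiplication by an all-ones operand (on x86 that would be PCLMULQDQ, reachable through the _mm_clmulepi64_si128 intrinsic); both compute a prefix XOR over the 64-bit quote bitmask. The mask building is scalar here, where real code would use a SIMD compare, and the sketch assumes quotes are escaped only by doubling them, so a separate escape character is not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit i is set when chunk[i] == c. Scalar stand-in for a SIMD compare
 * followed by a movemask; handles chunks of at most 64 bytes. */
static uint64_t match_mask(const char *chunk, size_t len, char c)
{
    uint64_t m = 0;

    assert(len <= 64);
    for (size_t i = 0; i < len; i++)
        if (chunk[i] == c)
            m |= UINT64_C(1) << i;
    return m;
}

/* Prefix XOR: result bit i is the XOR of input bits 0..i. This equals the
 * low 64 bits of a carryless multiplication by an all-ones value. */
static uint64_t prefix_xor(uint64_t x)
{
    x ^= x << 1;
    x ^= x << 2;
    x ^= x << 4;
    x ^= x << 8;
    x ^= x << 16;
    x ^= x << 32;
    return x;
}

/* Bit i of the result is set when byte i lies inside a quoted region
 * (opening quote included, closing quote excluded). *carry is 0 or
 * all-ones and records whether the previous chunk ended inside quotes. */
static uint64_t quoted_region_mask(uint64_t quote_bits, uint64_t *carry)
{
    uint64_t mask = prefix_xor(quote_bits) ^ *carry;

    /* Broadcast the top bit so the next chunk starts in the right state. */
    *carry = (uint64_t) 0 - (mask >> 63);
    return mask;
}

/* Positions of ',' delimiters that are not inside quotes. */
static uint64_t unquoted_delims(const char *chunk, size_t len, uint64_t *carry)
{
    uint64_t quotes = match_mask(chunk, len, '"');
    uint64_t inside = quoted_region_mask(quotes, carry);

    return match_mask(chunk, len, ',') & ~inside;
}
```

With carry initialized to 0, calling unquoted_delims() on the 9-byte chunk a,"b,c",d reports only the commas at offsets 1 and 7; the comma at offset 4 is masked out because it sits between the two quotes. A doubled quote inside a quoted field toggles the state off and immediately back on, so the surrounding bytes still count as quoted.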