Date: Thu, 26 Mar 2026 16:23:48 -0500
From: Nathan Bossart
To: KAZAR Ayoub
Cc: Andres Freund, Pg Hackers, Neil Conway, Manni Wood, Andrew Dunstan, Shinya Kato, Mark Wong, Nazir Bilal Yavuz
Subject: Re: Speed up COPY TO text/CSV parsing using SIMD

On Wed, Mar 18, 2026 at 03:29:32AM +0100, KAZAR Ayoub wrote:
> If we have some json(b) column like : {"key1":"val1","key2":"val2"}, for
> CSV format this would immediately exit the SIMD path because of quote
> character, for json(b) this is going to be always the case.
> I measured the overhead of exiting the SIMD path a lot (8 million times for
> one COPY TO command), i only found 3% regression for this case, sometimes
> 2%.

I'm a little worried that we might be dismissing small-yet-measurable
regressions for extremely common workloads.  Unlike the COPY FROM work,
this operates on a per-attribute level, meaning we only use SIMD when an
attribute is at least 16 bytes.  The extra branching for each attribute
might not be something we can just ignore.

> For cases where we do a false commitment on SIMD because we read a binary
> size >= sizeof(Vector8), which i found very niche too, the short circuit to
> scalar each time is even more negligible (the above CSV JSON case is the
> absolute worst case).

That's good to hear.

-- 
nathan
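
For readers following along, here is a rough, self-contained sketch of the control flow being discussed. It is not the patch's code: `CHUNK_SIZE` stands in for `sizeof(Vector8)` (assumed to be 16 here), the chunk check is a plain loop where a real implementation would use vector compares, and `is_special`, `chunk_has_special`, and `first_special` are hypothetical names made up for illustration. It only aims to show the per-attribute length branch and the early exit to the scalar loop when a chunk contains a quote or delimiter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption: stand-in for sizeof(Vector8); 16 bytes in this sketch. */
#define CHUNK_SIZE 16

/* Hypothetical helper: does this byte need quoting/escaping in CSV output? */
static bool
is_special(unsigned char c, unsigned char delim, unsigned char quote)
{
	return c == delim || c == quote || c == '\n' || c == '\r';
}

/*
 * Check one CHUNK_SIZE-byte block for any special byte.  This is a scalar
 * emulation; a vectorized version would do the same with a few compares.
 */
static bool
chunk_has_special(const unsigned char *p, unsigned char delim,
				  unsigned char quote)
{
	bool		found = false;

	for (int i = 0; i < CHUNK_SIZE; i++)
		found |= is_special(p[i], delim, quote);
	return found;
}

/*
 * Return the offset of the first special byte in attr, or len if there is
 * none.  *chunks_checked reports how many chunked-path iterations ran.
 */
static size_t
first_special(const char *attr, size_t len, char delim, char quote,
			  size_t *chunks_checked)
{
	const unsigned char *p = (const unsigned char *) attr;
	size_t		i = 0;

	*chunks_checked = 0;

	/* Per-attribute branch: short attributes never enter the chunked path. */
	if (len >= CHUNK_SIZE)
	{
		while (i + CHUNK_SIZE <= len)
		{
			(*chunks_checked)++;
			if (chunk_has_special(p + i, (unsigned char) delim,
								  (unsigned char) quote))
				break;			/* exit; the scalar loop finds the exact byte */
			i += CHUNK_SIZE;
		}
	}

	/* Scalar path: handles the tail, short attributes, and early exits. */
	for (; i < len; i++)
	{
		if (is_special(p[i], (unsigned char) delim, (unsigned char) quote))
			return i;
	}
	return len;
}
```

With the JSON value quoted above, the first chunk already contains a quote character, so the chunked path is entered and abandoned after a single iteration for every such attribute. A three-byte attribute skips it entirely but still pays for the length test.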