Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1v1sIj-00DafL-Uv for pgsql-general@arkaria.postgresql.org; Thu, 25 Sep 2025 20:10:50 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1v1sIi-005eCF-LU for pgsql-general@arkaria.postgresql.org; Thu, 25 Sep 2025 20:10:48 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1v1sIh-005e7Y-PD for pgsql-general@lists.postgresql.org; Thu, 25 Sep 2025 20:10:48 +0000 Received: from sonic310-24.consmr.mail.ne1.yahoo.com ([66.163.186.205]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1v1sId-0005Jx-2R for pgsql-general@lists.postgresql.org; Thu, 25 Sep 2025 20:10:47 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s2048; t=1758831040; bh=gBjqYN/dsM/pc8hF7TemNkBe1CpHRzIls9UOolijGOo=; h=Date:From:To:In-Reply-To:References:Subject:From:Subject:Reply-To; b=C1eClEnud+EiuU1o6gtwFUX/YolwuVssb3JIH5QqnXmuUxNBIAtd41JuS26SFqiZjr2uRP0aqw9jpupJPjTwmYfoVNPiIxvLVhVcBbfN0ymsP1aKyh1J7/fVzc6CginxtsvJ9AM3iExfI1rmKYuZSm5/GSdtx0DFyjk0y+KzeQxRH9IZTyv2JnaKRttio2HRRdmrbxlY3iszupveH1vb63UdSBuKzJjjlvn9NUXLjxu+wgtU9hyNl2yACXxNv+syErGxipeK+MPG/UarMic7OF+v+bR0RN616O5fuL7Apix8wRhUxAVWUBUMHuIieEWLZKVaxJDPjetiPG3PGLX2iw== X-SONIC-DKIM-SIGN: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s2048; t=1758831040; bh=pHE6hxz9AdoCJEEitbT0EEnIMUtAsJTZjAKszTShtZk=; h=X-Sonic-MF:Date:From:To:Subject:From:Subject; b=lmze3OseVoFxPxG/VNZCHQ7Do6Cco/k7b1PrTLJ+tpgZi1dQvbAdGpR8EN9r+Dgbv/r99HNUW1Wcl+iKt4ODVe0NuBYS5KX1FuNVKasV3ucbfJtKllY9t4gzFh+1dqk+fxx8cnM+/0GfFkBFAC2Jz8GyJ5TQkUIeanwp8RC2sHc+2hPNq1KYZVCRFoPCKu4YMj4WoG//IA4B0SWSgpKfTcsUudZsT0y4DzsRrOWqf9ERaxbVzEoq8MRsPL2XWXQYiP5i7invK5fhMYK6E4qA+fD0HDl7ebLPlCOj5CGCcGOWNaGOBzcKcytZygyPg+sYJDXwtoyE6sCY3R/7p/T3Vw== X-YMail-OSG: NeU5wegVM1nm0gFpG7oha94470IWHCMaoP60FrNzGOx3G5tDgnFVbKP_QLdjgEM GgIvK4gSizxVcY3EzF.NqDwoInydMYuuIQZNsXK8C8w0YIIKU9OQtdUQRr1s2snTDMZT4g5AcT8q 6BNc5D4YwEfoXgVJdjPlyBzpcbOL0eZ3m0OxFx7garEU.gUApdKI7_kBA_MgvCsgxh4XqVePVjvk 05g5zok_c7F0FSRUd5AD3kjRuo5MZA1polzffKLovbBwpKkrUdEe05ZFKhNlJ_9v8v1tYiJodR.. rIzLfoEM90tKc2FNddksoqz3FlIqFWwnFfSc5HRqWj38QLPo3m0POGsa4KySPCjhYe_s8sfptgaL 74qUUjObjWs8Z3.yqHvtC9ZeMTJYQZrnmMejvxOUJC5vdVv4gic76uemUkpQwPfWW04OaC_bENYx nTSHPJfbJadpeMT0klOhaXC0q.6gIDkjUNCLcyH1trpyS64cFzqhdFd0rJvm0eL2wuxXcxEQp2In Pd5wFfCGmrh.wfzEihsvVn53WNN2MBQ72cw0AEZmcZ4NbYfqLc2GphxOpY3PPoVYQlxrc1IbZUQ4 2g21jlZkQqMphZa6x8q.tv1lip3IlCVLHCY3ym7m78KB8F6V2d6qkYhYukzJrb0PhQJreyAFMNtD EAdPwMB83hu8._7cox0ZG5j4VrjKMfHIuHArdPRMcAwBOriGBAsAeQQ3kwAwiO2SbqA9c..ixEGG AJ8ZeTB50OtAaIOzyvKOOaZ6Pr2CaQae8Wed48VfsLJt9B7hHiB.CyHkRtTJ8fAoZjg0ND59JT0b cbdLdDLCy5BFLO5h1wUCcApg1hM1UyNUkwLtc9z0yr7LPeJVztIOlBa0DRptwZcIxf73yawFAS8_ daIf3TD3Fg2gMDcAIz_yoVEbKg6.GIJkosx0jq6IbgYRAAr7uA5d0gGPzehDVDi.wlz.DBEFXvFo Mo91loi0721DMiADjoFYOr.Ad4S9745mgHKO97uC0xFxzk1NFR6nIw.toiasdKBMqKxPmxet5msK b8AdnHkWhpXAQDbvMux9bm28nIAYtXnTiLBV72Vq0n7I7rJS8PB8Pl4p1ToSM1SztNtO_PWOhO2f jTDHqX0tCvk1qkvhqNrP1iXEC55_0kzidjrO3ZxKuZYp_oh5dxW52xloq.0yc5g0h4OHiYrALkN4 oWwpNML1admoeWl6PJMTx.m4OLX4znv4oM1ksnV5oRuVpEOzIKCSoSaXw.obmYU4KaPmGVYSwq8c hk8tNvLYX2nqa7bezZdBgPOFnSKNGlkmBNSQTjX6Eyb9dgfl7WFoXwohX47W8pMprFb_VQF2E3mE u63xgW0jNnEm0W9WCZffruED48NAG8Tr15iSCF7xnmHyvgirjy8ktL_oc6PmS8O8amikyZBsVWr_ iohQReBrZzH6KHYfngfb_AV7dWzKrEtW_3TnzicV6P41wFrtfxi9I3WYRTY7AgXsIp7VwLxsRaCg Ehu1_byA3EOIVW.g4oMoPzn_oVFvbfnTZHoRX2ckg6iNJ2HUn5nBG_WTVotUuYoJQ1BXmLcbybZ7 BWFzS1akSptOYwSyuKOc.m8gxtsmHDus38OIMQsZ1Erj4uBxR6fmEsqPEI7aCQRN9i2wVBA5NkYT z_BkS1nwQD.15LV.qHClntvcE3id4x._XuCSUkTVFu3twLLAkk4li.Z1bYRb0wWu99SghO0rtZGq 8mhiRPzeKjdXAH.1bRzIT9KABPSOoYqF54ce0nP.hmXgxgTcRH2IB4lLPpPdfu6t5c1DS9kTrkgg Omg3c4TSwJjh6p5AQesTEo57Na974uWMF83Q4Dq14pT192XT3CdSKOdzxnhfIR2_rMhb8epvWDhB DVUGWwSx8DhJggOZtf5h8qYH0HdFiZKpphj1dGv4LKY7U17wcO_rxlIw_OUwwhhKtsMRR8_i05fV kA9pM2T_Cb1j8xE3CddFNixoOlJo26yhcRh2OdtWKvX_P2h0seR1qnOm8T0lntzQOzBfGqIwsCGs W5TI9khblrrrnJ0WGo.k_mTAb_RK9NYwGcQ31gPaApxWernaLR9L3lwFz7D8JU6KgMFQtEMY8KQD rAZ.mKQhmuWBZYVqwDKeE.wbQDNXvDyfiUu85cvl50lqVkpjMSAyZYzStolylhaWAqOt0euPhLsy QY3cEEfr.HGuiG490wiptzsVIuAqTaDhGG8KbsboGhnO_41bicBEiscGLsxbxk.T72ldcrYwOFPc NYgPmuIr8PUZkVXgNrjdrh8GUNGecelIqp1.lbIYHjXcPv2IAZ5OI X-Sonic-MF: X-Sonic-ID: 6e8707b2-9d85-458b-b34d-1fdd28b4dc51 Received: from sonic.gate.mail.ne1.yahoo.com by sonic310.consmr.mail.ne1.yahoo.com with HTTP; Thu, 25 Sep 2025 20:10:40 +0000 Date: Thu, 25 Sep 2025 20:10:39 +0000 (UTC) From: Brent Wood To: "pgsql-general@lists.postgresql.org" , Bryan Sayer Message-ID: <1729237457.837174.1758831039332@mail.yahoo.com> In-Reply-To: <80c44b4d-1185-478b-939c-62e0d4377106@gmail.com> References: <80c44b4d-1185-478b-939c-62e0d4377106@gmail.com> Subject: Re: Additional options for COPY from MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_837173_1925973751.1758831039330" X-Mailer: WebService/1.1.24485 YMailNorrin Content-Length: 5679 List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk ------=_Part_837173_1925973751.1758831039330 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable As I understand it, COPY is intended as a simple & fast way to get clean da= ta into Postgres. It is not intended to groom data during the process. There are a variety of= ETL tools that can apply filters & rules to data streams to do what you ar= e asking, so no need to invent another Postgres specific wheel. That said, there are some ways to do this without a formal ETL application. If the data rules you want to apply can be provided in a script, an approac= h I use is to pipe the data file through a set of checks & filters into a C= OPY command reading from STDIN. Very effective on Linux, Windows doesn't re= ally do this very well. eg: to change any instance of Grent in your data to great, use: cat | sed 's/Grent/great/' | psql -c "copy from STDIN w= ith delimiter ',' null '';"=C2=A0 You can chain as many awk, sed, tr, grep, etc commands as you need together= to transform the data as required on the way to COPY A Postgres only approach that=C2=A0I've also used is to create a loading ta= ble with columns to match the incoming data, but each column is an unconstr= ained text or varchar. Then modify these data to remove any errors using SQ= L, finishing up with a select to insert the cleaned data into the final tab= le. If the final table has appropriate constraints to validate the data, on= ly valid data will get there. As you find errors, you fix them & rerun the = insert until it works. HTH, Brent Wood =20 =20 On Friday, September 26, 2025 at 07:49:09 AM GMT+12, Bryan Sayer wrote:=20 =20 =20 Hi, I'm not sure if this is the best list to ask this (and I am very new to Pos= tgreSQL) but I think more options are needed in the COPY from command, in o= rder to better deal with exceptions. By exceptions I mean data not consiste= nt with the format, empty rows, extra delimiters at the end of rows, etc. Is there someone or a party that deals with the details of the COPY command= that I could discuss this with? Just for context, I am dealing with delimited data of millions of rows and = perhaps 60 columns or so that I wish to read into tables in a Postgres 17 d= atabase under Windows 11, with everything on local drives. --=20 Bryan Sayer Retired Demographer/Statistician In a world in which you can be anything, be kind=20 ------=_Part_837173_1925973751.1758831039330 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
As I understand it,= COPY is intended as a simple & fast way to get clean data into Postgre= s.

It is not intended to groom data during the pro= cess. There are a variety of ETL tools that can apply filters & rules t= o data streams to do what you are asking, so no need to invent another Post= gres specific wheel.

That said, there are some ways to do this without a formal ETL applica= tion.

If the data rules you want to apply can be p= rovided in a script, an approach I use is to pipe the data file through a s= et of checks & filters into a COPY command reading from STDIN. Very eff= ective on Linux, Windows doesn't really do this very well.

eg: to change any instance of Grent in your data to great, use:

cat <file> | sed 's/Grent/great/' | psql &l= t;db> -c "copy <table> from STDIN with delimiter ',' null '';"&nbs= p;

You can chain as many awk, sed, tr, grep, etc commands= as you need together to transform the data as required on the way to COPY<= /div>

A Postgres only approach that I've also used i= s to create a loading table with columns to match the incoming data, but ea= ch column is an unconstrained text or varchar. Then modify these data to re= move any errors using SQL, finishing up with a select to insert the cleaned= data into the final table. If the final table has appropriate constraints = to validate the data, only valid data will get there. As you find errors, y= ou fix them & rerun the insert until it works.

HTH,

Brent Wood=




On Friday, September 26, 2025 at 07:49:09 AM GMT+1= 2, Bryan Sayer <brysayer@gmail.com> wrote:





<= br>
Hi,

I'm not sure if this is the best list to ask this (and I= am very new to PostgreSQL) but I think more options are needed in the COPY= from command, in order to better deal with exceptions. By exceptions I mea= n data not consistent with the format, empty rows, extra delimiters at the = end of rows, etc.

Is there someone or a party that deals with the de= tails of the COPY command that I could discuss this with?

Just for c= ontext, I am dealing with delimited data of millions of rows and perhaps 60= columns or so that I wish to read into tables in a Postgres 17 database un= der Windows 11, with everything on local drives.

--
Bryan Sayer<= br>Retired Demographer/Statistician
In a world in which you can be anyth= ing, be kind




------=_Part_837173_1925973751.1758831039330--