To: Paul Lindner
cc: Bruce Momjian, Neil Conway, pgsql-hackers@postgresql.org
Subject: Re: Upcoming PG re-releases
In-reply-to: <20051204164054.GE10317@inuus.com>
References: <1133625371.9297.3.camel@localhost.localdomain> <200512031554.jB3Fs8h10927@candle.pha.pa.us> <20051204162520.GD10317@inuus.com> <8284.1133714056@sss.pgh.pa.us> <20051204164054.GE10317@inuus.com>
Date: Sun, 04 Dec 2005 11:52:45 -0500
Message-ID: <8437.1133715165@sss.pgh.pa.us>
From: Tom Lane

Paul Lindner writes:
> On Sun, Dec 04, 2005 at 11:34:16AM -0500, Tom Lane wrote:
>> Paul Lindner writes:
>>> iconv -c -f UTF8 -t UTF8 -o fixed.sql dump.sql
>>
>> Is that really a one-size-fits-all solution?  Especially with -c?

> I'd say yes, and the -c flag is needed so iconv strips out the
> invalid characters.

That's exactly what's bothering me about it.  If we recommend that,
we had better put a large THIS WILL DESTROY YOUR DATA warning first.
The problem is that the data is not "invalid" from the user's point of view --- more likely, it's in some non-UTF8 encoding --- and so just throwing away some of the characters is unlikely to make people happy.

			regards, tom lane
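
To make the concern concrete, here is a minimal Python sketch, not part of the original thread. It assumes, purely for illustration, that the dump was really written in Latin-1; the function names and the sample row are hypothetical. It contrasts dropping undecodable bytes (roughly what `iconv -c` does) with reporting the offending lines and converting from the encoding the data is actually in.

```python
def find_invalid_utf8(data: bytes):
    """Return (line_number, raw_line) pairs for lines that are not valid UTF-8."""
    bad = []
    # Splitting on b"\n" is safe: byte 0x0A never occurs inside a multibyte
    # UTF-8 sequence.
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((lineno, line))
    return bad


def strip_invalid(data: bytes) -> bytes:
    """Silently drop undecodable bytes (an approximation of `iconv -c`)."""
    return data.decode("utf-8", errors="ignore").encode("utf-8")


def reinterpret(data: bytes, source_encoding: str) -> bytes:
    """Convert from the encoding the data is really in (assumed known) to UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


# Hypothetical dump line that was written in Latin-1, not UTF-8.
sample = "INSERT INTO t VALUES ('café');\n".encode("latin-1")

if __name__ == "__main__":
    print(find_invalid_utf8(sample))       # lines a user would need to inspect
    print(strip_invalid(sample))           # the accented character is gone
    print(reinterpret(sample, "latin-1"))  # the character survives as UTF-8
```

In this sketch, stripping turns the stored value into 'caf', while converting from the assumed source encoding keeps it intact. That is the case for reporting the bad lines and letting the user pick the right encoding rather than discarding bytes.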