Date: Wed, 7 Dec 2005 14:45:51 +1100 (EST)
From: Gavin Sherry
To: Bruce Momjian
cc: Peter Eisentraut, pgsql-hackers@postgresql.org
Subject: Re: Upcoming PG re-releases
In-Reply-To: <200512062100.jB6L0ZF08711@candle.pha.pa.us>

Hi,

On Tue, 6 Dec 2005, Bruce Momjian wrote:

>
> Nice, updated.
>
> ---------------------------------------------------------------------------
>

I think my suggestion from the other day is useful also.

---

Omar Kilani and I have spent a few hours looking at the problem. For
situations where there is a lot of invalid encoding, manual fixing is just
not viable. The vim project has a kind of fuzzy encoding conversion which
accounts for a lot of the non-UTF8 sequences in UTF8 data.
You can use vim to modify your text dump as follows:

	vim -c ":wq! ++enc=utf8 fixed.dump" original.dump

---

I think this is a viable option for people who have a non-trivial amount
of data and don't consider manual fixing or potential data loss
acceptable.

Thanks,

Gavin
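
P.S. To gauge how much invalid encoding a dump contains before choosing
between manual fixing and the vim conversion above, a rough check along
the following lines might help. This is only a sketch and not part of the
suggestion itself: the function name find_invalid_utf8_lines is
hypothetical, and it assumes the dump is a newline-delimited text file
small enough to scan line by line.

```python
def find_invalid_utf8_lines(path):
    """Return (line_number, raw_bytes) for each line that is not valid UTF-8."""
    bad = []
    # Open in binary mode so Python does not decode anything implicitly.
    with open(path, "rb") as dump:
        for lineno, raw in enumerate(dump, start=1):
            try:
                # Strict decoding raises on any invalid byte sequence.
                raw.decode("utf-8")
            except UnicodeDecodeError:
                bad.append((lineno, raw))
    return bad
```

Calling find_invalid_utf8_lines("original.dump") and looking at the length
of the result gives a count of affected lines. A handful may be fixable by
hand; a large number is the case where an automated conversion looks more
attractive.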