From: Bruce Momjian
Message-Id: <200512061926.jB6JQcS23646@candle.pha.pa.us>
Subject: Re: Upcoming PG re-releases
In-Reply-To: <20051204162520.GD10317@inuus.com>
To: Paul Lindner
Date: Tue, 6 Dec 2005 14:26:38 -0500 (EST)
CC: Neil Conway, Tom Lane, pgsql-hackers@postgresql.org

I have added your suggestions to the 8.1.X release notes.

---------------------------------------------------------------------------

Paul Lindner wrote:
-- Start of PGP signed section.
> On Sat, Dec 03, 2005 at 10:54:08AM -0500, Bruce Momjian wrote:
> > Neil Conway wrote:
> > > On Wed, 2005-11-30 at 10:56 -0500, Tom Lane wrote:
> > > > It's been about a month since 8.1.0 was released, and we've found about
> > > > the usual number of bugs for a new release, so it seems like it's time
> > > > for 8.1.1.
> > > >
> > > I think one fix that should be made in time for 8.1.1 is adding a note
> > > to the "version migration" section of the 8.1 release notes describing
> > > the "invalid UTF-8 byte sequence" problems that some people have run
> > > into when upgrading from prior versions. I'm not familiar enough with
> > > the problem or its remedies to add the note myself, though.
> >
> > Agreed, but I don't understand the problem well enough either. Does
> > anyone?
>
> There was a thread a couple of weeks back about this problem. Here's
> my sample writeup -- I give my permission for anyone to use it as they
> see fit:
>
>
> Upgrading UNICODE databases to 8.1
>
> Postgres 8.1 includes a number of bug fixes and improvements to
> Unicode and UTF-8 character handling. Unfortunately, previous releases
> would accept character sequences that were not valid UTF-8. This
> may cause problems when upgrading your database using
> pg_dump/pg_restore, resulting in an error message like this:
>
>   Invalid UNICODE byte sequence detected near byte ...
>
> To convert your pre-8.1 database to 8.1 you may have to remove and/or
> fix the offending characters. One simple way to fix the problem is to
> run your pg_dump output through the iconv command like this:
>
>   iconv -c -f UTF8 -t UTF8 -o fixed.sql dump.sql
>
> The -c flag tells iconv to omit invalid characters from output.
>
> There is one problem with this. Most versions of iconv try to read
> the entire input file into memory. If your dump is quite large you
> will need to split the dump into multiple files and convert each one
> individually. You must use the -l flag for split to ensure that the
> Unicode byte sequences are not split across files.
>
>   split -l 10000 dump.sql
>
> Another possible solution is to use the --inserts flag to pg_dump.
> When you load the resulting data dump in 8.1 this will result in the
> problem rows showing up in your error log.
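
For dumps too large for iconv, the same clean-up can be sketched as a
streaming filter. This is a minimal sketch and not part of the writeup
above: the function names strip_invalid_utf8 and find_invalid_lines are
hypothetical, and it assumes that Python's UTF-8 decoder rejects at least
the sequences the 8.1 server rejects, which may not hold exactly. The
first function drops invalid bytes much as iconv -c does, reading a fixed
amount at a time; the second only reports which lines are affected, for
cases where the characters should be fixed rather than removed.

```python
import codecs


def strip_invalid_utf8(src, dst, chunk_size=1 << 20):
    """Copy binary stream src to dst, dropping bytes that are not valid UTF-8.

    Only chunk_size bytes are held in memory at a time. The incremental
    decoder keeps a partial multi-byte sequence until the next chunk
    arrives, so sequences that straddle a chunk boundary are preserved.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="ignore")
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decoder.decode(chunk).encode("utf-8"))
    # Final flush: a truncated sequence at the very end is dropped too.
    dst.write(decoder.decode(b"", final=True).encode("utf-8"))


def find_invalid_lines(src):
    """Yield 1-based numbers of lines in binary stream src that fail to decode.

    Splitting on newlines is safe because no byte of a multi-byte UTF-8
    sequence can equal the newline byte.
    """
    for lineno, line in enumerate(src, start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            yield lineno
```

To use it, open dump.sql in "rb" mode and fixed.sql in "wb" mode and pass
both to strip_invalid_utf8. Running find_invalid_lines on dump.sql first
shows how many lines would be altered before anything is dropped.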
>
> --
> Paul Lindner        ||||| | | | | | | | | |
> lindner@inuus.com
-- End of PGP section, PGP failed!

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073