From: Bruce Momjian <[email protected]>
To: Bruce Momjian <[email protected]>
Cc: Tom Lane <[email protected]>
Cc: Paul Lindner <[email protected]>
Cc: Neil Conway <[email protected]>
Cc: [email protected]
Subject: Re: Upcoming PG re-releases
Date: Tue, 6 Dec 2005 15:25:13 -0500 (EST)
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <[email protected]> writes:
> > > I have added your suggestions to the 8.1.X release notes.
> > 
> > Did you read the followup discussion?  Recommending -c without a large
> > warning seems a very bad idea.
> 
> Well, I said it would remove invalid sequences.  What else should we
> say?
> 
> 	This will remove invalid character sequences.
> 
> I saw no clear solution that allowed sequences to be corrected.

The release note text is:

	Some users are having problems loading <literal>UTF8</> data into 8.1.X.
	 This is because previous versions allowed invalid <literal>UTF8</>
	sequences to be entered into the database, and this release properly
	accepts only valid <literal>UTF8</> sequences.  One way to correct a
	dumpfile is to use <command>iconv -c -f UTF-8 -t UTF-8</>. This will
	remove invalid character sequences. <command>iconv</> reads the entire
	input file into memory so it might be necessary to <command>split</> the
	dump into multiple smaller files for processing.

A nicer solution would be for iconv to report the lines with errors so
that you could correct them, but I see no way to do that.  The only thing
you can do is diff the old and new files to see the problems.  Is
that helpful?  Here is the new text I have used:

	Some users are having problems loading <literal>UTF8</> data into 8.1.X.
	 This is because previous versions allowed invalid <literal>UTF8</>
	sequences to be entered into the database, and this release properly
	accepts only valid <literal>UTF8</> sequences.  One way to correct a
	dumpfile is to use <command>iconv -c -f UTF-8 -t UTF-8 -o cleanfile.sql
	dumpfile.sql</>.  The <literal>-c</> option removes invalid character
	sequences.  A diff of the two files will show the sequences that are
	invalid.  <command>iconv</> reads the entire input file into memory so
	it might be necessary to <command>split</> the dump into multiple
	smaller files for processing.

It highlights the 'diff' idea.
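
As an illustration of the "report the lines with errors" idea, here is a
minimal Python sketch (not part of the release notes, and not something
iconv itself does).  It reads the dump one line at a time, so it does not
need the whole file in memory, and it lists each line that fails strict
UTF-8 decoding.  The function names are hypothetical, and Python's notion
of invalid UTF-8 may differ in edge cases from what iconv or the 8.1.X
server rejects.

```python
import sys


def find_invalid_utf8_lines(path):
    """Yield (line_number, raw_bytes) for every line that is not valid UTF-8.

    The file is opened in binary mode and read line by line, so memory use
    stays proportional to the longest line rather than the whole dump.
    """
    with open(path, "rb") as dump:
        for lineno, raw in enumerate(dump, start=1):
            try:
                raw.decode("utf-8")  # strict decoding raises on bad bytes
            except UnicodeDecodeError:
                yield lineno, raw


def strip_invalid(raw):
    """Drop undecodable bytes, similar in spirit to what iconv -c does."""
    return raw.decode("utf-8", errors="ignore")


# Assumed usage: python this_script.py dumpfile.sql
if __name__ == "__main__" and len(sys.argv) == 2:
    for lineno, raw in find_invalid_utf8_lines(sys.argv[1]):
        print(f"line {lineno}: {raw!r} -> {strip_invalid(raw)!r}")
```

The output pairs each offending line with its stripped form, which is
roughly the information the diff of dumpfile.sql and cleanfile.sql gives,
plus the line number.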

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [email protected]               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
