public inbox for [email protected]
help / color / mirror / Atom feedFrom: Bruce Momjian <[email protected]>
To: Martijn van Oosterhout <[email protected]>
Cc: Gregory Maxwell <[email protected]>
Cc: Gavin Sherry <[email protected]>
Cc: Peter Eisentraut <[email protected]>
Cc: [email protected]
Subject: Re: Upcoming PG re-releases
Date: Fri, 9 Dec 2005 11:34:22 -0500 (EST)
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
Martijn van Oosterhout wrote:
-- Start of PGP signed section.
> On Thu, Dec 08, 2005 at 05:54:35PM -0500, Gregory Maxwell wrote:
> > No, what is needed for people who care about fixing their data is a
> > loadable strip_invalid_utf8() that works in older versions.. then just
> > select * from bar where foo != strip_invalid_utf8(foo); The function
> > would be useful in general, for example, if you have an application
> > which doesn't already have much utf8 logic, you want to use a text
> > field, and stripping is the behaviour you want. For example, lots of
> > simple web applications.
>
> Would something like the following work? It's written in pl/pgsql and
> does (AFAICS) the same checking as the backend in recent releases.
> Except the backend only supports up to 4-byte UTF-8 whereas this
> function checks upto six byte. For a six byte UTF-8 character, who is
> wrong?
>
> In any case, people should be able to do something like:
>
> SELECT field FROM table WHERE NOT utf8_verify(field,4);
>
> To check conformance with PostgreSQL 8.1. Note, I don't have large
> chunks of UTF-8 to test with but it works for the characters I tried
> with. Tested with 7.4.
I think the problem with any kind of function-call detection is that the
data has to get into the database first, and it isn't clear how someone
loading a failed dump would do that aside from modifying the column to
bytea in the dump, loading it in, then fixing it. The iconv idea has
the advantage that it can be fixed before loading into the database.
--
Bruce Momjian | http://candle.pha.pa.us
[email protected] | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
view thread (55+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: Upcoming PG re-releases
In-Reply-To: <[email protected]>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox