Date: Thu, 8 Dec 2005 17:54:35 -0500
From: Gregory Maxwell
To: Bruce Momjian
Subject: Re: Upcoming PG re-releases
Cc: Gavin Sherry, Peter Eisentraut, pgsql-hackers@postgresql.org
In-Reply-To: <200512082244.jB8MiYT02161@candle.pha.pa.us>

On 12/8/05, Bruce Momjian wrote:
> > A script which identifies non-utf-8 characters and provides some
> > context, line numbers, etc, will greatly speed up the process of
> > remedying the situation.
>
> I think the best we can do is the "iconv -c with the diff" idea, which
> is already in the release notes. I suppose we could merge the iconv and
> diff into a single command, but I don't see a portable way to output the
> iconv output to stdout, /dev/stdin not being portable.

No, what is needed for people who care about fixing their data is a
loadable strip_invalid_utf8() that works in older versions. Then just run:

    select * from bar where foo != strip_invalid_utf8(foo);

The function would be useful in general. For example, you might have an
application which doesn't already have much utf8 logic, you want to use a
text field, and stripping is the behaviour you want. Lots of simple web
applications fit that description.
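
To make the comparison idea concrete, here is a minimal sketch in Python rather than as a loadable PostgreSQL function. It only illustrates the "strip, then compare with the original" technique applied to a dump file. The name strip_invalid_utf8 comes from the message above; find_invalid_lines is a hypothetical helper, and treating the dump as newline-separated bytes is an assumption.

```python
def strip_invalid_utf8(raw: bytes) -> bytes:
    """Return raw with every byte that is not part of valid UTF-8 removed."""
    # errors="ignore" drops undecodable bytes; re-encoding gives clean UTF-8.
    return raw.decode("utf-8", errors="ignore").encode("utf-8")


def find_invalid_lines(dump: bytes):
    """Yield (line_number, original_line) for lines that stripping would change.

    Hypothetical helper: mirrors "where foo != strip_invalid_utf8(foo)",
    but over the lines of a dump instead of the rows of a table.
    """
    for number, line in enumerate(dump.split(b"\n"), start=1):
        if line != strip_invalid_utf8(line):
            yield number, line


if __name__ == "__main__":
    sample = b"good line\nbad \xff byte\nanother good line"
    for number, line in find_invalid_lines(sample):
        print(number, line)
```

Reporting the line number together with the untouched bytes gives the kind of context asked for earlier in the thread, without needing to pipe iconv output into diff.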