Message-ID: <489303E6.40508@hagander.net>
Date: Fri, 01 Aug 2008 14:39:02 +0200
From: Magnus Hagander
To: "Joshua D. Drake"
CC: Tom Lane, Bruce Momjian, PostgreSQL www
Subject: Re: Email search failure
References: <200807312019.m6VKJvR02505@momjian.us> <23897.1217539560@sss.pgh.pa.us> <48927FEF.8070200@commandprompt.com>
In-Reply-To: <48927FEF.8070200@commandprompt.com>

Joshua D. Drake wrote:
> Tom Lane wrote:
>> Bruce Momjian writes:
>>> Why is the email below now appearing in a search?
>>
>> Probably because nothing has gotten indexed for a month or more.
>> Whoever is supposed to maintain the archive indexer has been
>> on vacation since it broke ...
>
> That would be Magnus and you are correct. He just got back.
> The problem (last I checked) is an issue with Russian emails. Looking at
> it now.

That clearly wasn't the only problem, because there was a "sleep 1800"
process that had been running since July 3. Logfiles weren't touched etc.
Just restarting it fixed that part, which clearly somebody else could've
done as well ;)

I found the bug with the Russian emails, btw. It seems mhonarc encoded
the invalid UTF8 sequences inside valid HTML escape entities. And the
code applied the "fix broken UTF8" logic *before* it unescaped the HTML
entities. Now it does it both before and after.

Oh, and this should never have affected messages on -hackers, for example,
because it was always processed before ru-general. It would hit the PUG
lists, -www, -patches and a few others.

//Magnus
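
For illustration only, here is a rough Python sketch of the ordering bug described above. It is not the actual indexer code. The function names are hypothetical, and it assumes each numeric entity such as "&#208;" stands for one raw byte, which may not match exactly what mhonarc emits. The point is that a sanitizing pass run only before unescaping sees plain ASCII and lets the bad bytes through, while running it again afterwards catches them.

```python
import re

# Assumption for this sketch: a numeric entity "&#NNN;" stands for one raw byte.
_ENTITY = re.compile(rb"&#(\d{1,3});")


def fix_broken_utf8(data: bytes) -> bytes:
    """Replace every invalid UTF-8 sequence with U+FFFD and re-encode."""
    return data.decode("utf-8", errors="replace").encode("utf-8")


def unescape_byte_entities(data: bytes) -> bytes:
    """Turn "&#NNN;" (NNN <= 255) back into the single byte it stands for."""
    def repl(match):
        value = int(match.group(1))
        # Values above 255 cannot be one byte; leave those entities untouched.
        return bytes([value]) if value <= 255 else match.group(0)
    return _ENTITY.sub(repl, data)


def clean_old(data: bytes) -> bytes:
    # Old order: sanitize, then unescape. Bad bytes hidden in entities survive.
    return unescape_byte_entities(fix_broken_utf8(data))


def clean_new(data: bytes) -> bytes:
    # New order: sanitize both before and after unescaping.
    return fix_broken_utf8(unescape_byte_entities(fix_broken_utf8(data)))


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0xD0 is a UTF-8 lead byte; here it has no continuation byte after it.
sample = b"ok &#208; end"
print(is_valid_utf8(clean_old(sample)))  # False
print(is_valid_utf8(clean_new(sample)))  # True
```

A well-formed pair such as b"&#208;&#159;" passes through clean_new unchanged in meaning, since the second pass only touches sequences that are actually invalid.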