Date: Sun, 5 Sep 2004 11:16:14 -0300 (ADT)
From: "Marc G. Fournier"
To: John Hansen
Cc: Oleg Bartunov, Greg Sabino Mullane, pgsql-www@postgresql.org
Subject: Re: Suggestion for improving Archives
Message-ID: <20040905111403.F76678@ganymede.hub.org>
In-Reply-To: <5066E5A966339E42AA04BA10BA706AE56190@rodrick.geeknet.com.au>

On Sun, 5 Sep 2004, John Hansen wrote:

>> Marc again dropped last time modification header, so it's
>> impossible to sort results by date (in general case) without
>> specific parser.
>
> Yes, that is unfortunate, but the code required to make this happen puts
> stress on the archives to some degree.
>
>> Also, he changed template for message. These changes cause
>> recrawling the whole archive each time and overloading
>> archives.postgresql.org. More specific search engine could use
>> another source of information which messages to crawl, but
>> one we use at pgsql.ru is a general search engine and it
>> can't get modification date without proper header.
>
> There should be no need to reindex the entire archive because of a
> template change, since if you honor the embedded
> .. tags, the body text never changes.
> Unless of course, you want to keep an up-to-date cached copy.

I think what Oleg is referring to is that search engines generally compare
the Last-Modified header before pulling in the whole file, to see if they
are the same or not ... PHP, unfortunately, sets that to now(), so as far as
SE's are concerned, every time they index it is a new file :(

I'm going to play with mhonarc this week to see if I can get it to properly
set Last-Modified to Date based on the message itself ... that will clean up
that mess ...

Oleg, is there anything that I can put into for this? To avoid having to
use PHP to do it?

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664
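
For illustration only, here is a minimal sketch in Python (not the PHP or mhonarc setup discussed in this thread) of the Last-Modified comparison described above. It assumes the archive page's Last-Modified value is derived from the message's own Date header rather than from now(). The function names last_modified_from_message and needs_refetch are hypothetical, invented for this example.

```python
from datetime import timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def last_modified_from_message(date_header: str) -> str:
    """Convert a message's Date: header into an HTTP-style GMT timestamp."""
    sent = parsedate_to_datetime(date_header)
    # HTTP dates are always expressed in GMT, so normalise the offset first.
    return format_datetime(sent.astimezone(timezone.utc), usegmt=True)


def needs_refetch(last_modified: str, if_modified_since: Optional[str]) -> bool:
    """Decide whether a crawler holding an older copy must pull the page again.

    Returns False when the crawler's If-Modified-Since value is at least as
    recent as the page's Last-Modified value (the server could answer 304).
    """
    if if_modified_since is None:
        return True  # the crawler has never seen this page
    page_time = parsedate_to_datetime(last_modified)
    crawler_time = parsedate_to_datetime(if_modified_since)
    return page_time > crawler_time


# Date of this very message, taken from its header (timezone comment omitted).
stamp = last_modified_from_message("Sun, 5 Sep 2004 11:16:14 -0300")
print(stamp)
print(needs_refetch(stamp, stamp))
```

With a stable timestamp like this, a general-purpose crawler that already holds the page gets a "not modified" answer and skips it; with Last-Modified set to now(), the same check reports a change on every visit, which is the recrawling problem Oleg describes.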