Subject: Re: Suggestion for improving Archives
Date: Sun, 5 Sep 2004 18:47:31 +1000
Message-ID: <5066E5A966339E42AA04BA10BA706AE56190@rodrick.geeknet.com.au>
From: "John Hansen"
To: "Oleg Bartunov"
Cc: "Greg Sabino Mullane"

> Marc again dropped last time modification header, so it's
> impossible to sort results by date (in general case) without
> specific parser.

Yes, that is unfortunate, but the code required to make this happen puts stress on the archives to some degree.

> Also, he changed template for message. These changes cause
> recrawling the whole archive each time and overloading
> archives.postgresql.org. More specific search engine could use
> another source of information which messages to crawl, but
> one we use at pgsql.ru is a general search engine and it
> can't get modification date without proper header.

There should be no need to reindex the entire archive because of a template change: if you honor the embedded .. tags, the body text never changes. Unless, of course, you want to keep an up-to-date cached copy.

>
> I suggest:
>
> 1. Use 3-server architecture (image server, frontend, backend) which
>    could be reduced to 2 servers (image+frontend, backend) -
>    frontend could be plain apache+mod_accel and serve/cache
>    all backends outputs, backend is a modperl or/and php enabled apache.
> 2. return last modification header - be friendly to crawlers
>    and browsers

Though an accelerator would only work if the Last-Modified header is returned by the backend, this might be worth looking into.

> 3. stop changing message template
>

Template changes are inevitable, they're part of progress :)

... John
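
For reference, a rough sketch of what "returning the last modification header" amounts to on the backend, in Python rather than the modperl/php mentioned above. This is not how archives.postgresql.org works; the function name `respond` and the idea that each message has a stored modification time are assumptions made purely for illustration. It shows the backend sending Last-Modified and answering 304 when a crawler's If-Modified-Since is still current, so an unchanged message costs no body transfer.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(last_modified, if_modified_since=None):
    """Return (status, headers) for a message page (hypothetical helper).

    last_modified: timezone-aware datetime of the message's last change
                   (assumed to be available from the archive's storage).
    if_modified_since: raw If-Modified-Since request header value, or None.
    """
    # HTTP dates have one-second resolution, so drop microseconds first.
    last_modified = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: fall through to a full response
        if since is not None and since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        if since is not None and last_modified <= since:
            return 304, headers  # client's copy is still current; send no body

    return 200, headers
```

A crawler that stores the Last-Modified value from its first fetch and sends it back as If-Modified-Since would get 304 on every later visit until the message actually changes. That only helps if a template change does not bump the stored modification time.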