From: Oleg Bartunov
To: pgsql-www@postgresql.org
Date: Tue, 13 Jan 2004 20:52:52 +0300 (MSK)
Subject: incomplete headers: archives.postgresql.org

Hi there,

Crawling archives.postgresql.org is a pain, because the pages carry no last-modification information in their headers, so a crawler has to download every message again. For example:

megera@mira:~$ curl -I http://archives.postgresql.org/pgsql-hackers/2004-01/msg00282.php
HTTP/1.1 200 OK
Date: Tue, 13 Jan 2004 17:38:26 GMT
Server: Apache/1.3.28 (Unix) PHP/4.3.3RC1
X-Powered-By: PHP/4.3.3RC1
Content-Type: text/html

Is it possible to add at least a 'Last-Modified' header, so that a crawler can tell whether a page needs to be downloaded again? It would save bandwidth and crawling time.

I think the best way is to set the 'Last-Modified' header to the date in the message's 'Date:' field. Of course, there should be protection against senders with bad clocks, so the arrival time could be used as the default. It could also be useful to add an 'Expires' header.
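The proposal above can be sketched roughly as follows. This is an illustration in Python, not the archive's actual PHP code, and several details are assumptions not in the original: the function names `last_modified_for` and `message_page_response`, the plausibility window used to detect a bad clock (1 hour into the future, 30 days into the past relative to arrival), and the 30-day `Expires` lifetime are all hypothetical. The sketch takes 'Last-Modified' from the message's 'Date:' header, falls back to the arrival time when that date is missing or implausible, adds 'Expires' and 'Content-Length', and answers a crawler's 'If-Modified-Since' with 304 Not Modified.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Hypothetical plausibility window for a message's Date: header.
MAX_FUTURE_SKEW = timedelta(hours=1)
MAX_PAST_SKEW = timedelta(days=30)
# Hypothetical freshness lifetime for an individual message page.
MAX_AGE = timedelta(days=30)


def parse_http_date(value):
    """Parse an RFC 2822 / HTTP date; return an aware datetime or None."""
    if not value:
        return None
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # older Pythons raise TypeError
        return None
    # A "-0000" zone yields a naive datetime; treat it as unusable.
    return parsed if parsed.tzinfo is not None else None


def last_modified_for(date_header, arrival_time):
    """Use the message's Date: unless it looks like a bad clock.

    arrival_time must be an aware datetime (when the archive got the mail).
    """
    sent = parse_http_date(date_header)
    if sent is None:
        return arrival_time
    if not (arrival_time - MAX_PAST_SKEW <= sent <= arrival_time + MAX_FUTURE_SKEW):
        return arrival_time  # implausible date: fall back to arrival time
    return sent


def message_page_response(date_header, arrival_time, body, if_modified_since, now):
    """Return (status, headers, body) for one archived message page."""
    last_modified = last_modified_for(date_header, arrival_time).astimezone(timezone.utc)
    last_modified = last_modified.replace(microsecond=0)  # HTTP dates have 1 s resolution
    headers = {
        "Last-Modified": format_datetime(last_modified, usegmt=True),
        "Expires": format_datetime(now.astimezone(timezone.utc) + MAX_AGE, usegmt=True),
    }
    since = parse_http_date(if_modified_since)
    if since is not None and last_modified <= since:
        return 304, headers, b""  # crawler's copy is current: send no body
    headers["Content-Length"] = str(len(body))
    return 200, headers, body
```

For example, with the 'Date:' of this very message and its arrival time at svr1, `message_page_response("Tue, 13 Jan 2004 20:52:52 +0300", arrival, page, None, arrival)` (where `arrival` is 2004-01-13 17:53:34 UTC) returns status 200 with `Last-Modified: Tue, 13 Jan 2004 17:52:52 GMT`; a repeat request sending that value as 'If-Modified-Since' gets a 304 with no body.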
I think these headers should be added only to individual message pages, not to the index pages, because the index pages really do change. I don't think it's very difficult, and it would help both the site and its users.

BTW, I use cacheability to check whether a page can be cached:
http://www.sai.msu.su/admin/cacheability/?query=http%3A%2F%2Farchives.postgresql.org%2Fpgsql-hackers%2F2004-01%2Fmsg00282.php&descend=on

http://archives.postgresql.org/pgsql-hackers/2004-01/msg00282.php
Expires          -
Cache-Control    -
Last-Modified    -
ETag             -
Content-Length   -  (actual size: 13277)
Server           Apache/1.3.28 (Unix) PHP/4.3.3RC1

This object will be considered stale, because it doesn't have any freshness information assigned. It doesn't have a validator present. It doesn't have a Content-Length header present, so it can't be used in a HTTP/1.0 persistent connection.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83