Subject: Re: Mirror.php performance
Date: Thu, 4 Nov 2004 13:16:45 -0000
From: "Dave Page"
To: "Alexey Borzov"

> -----Original Message-----
> From: Alexey Borzov [mailto:borz_off@cs.msu.su]
> Sent: 04 November 2004 13:03
> To: Dave Page
> Cc: pgsql-www@postgresql.org
> Subject: Re: [pgsql-www] Mirror.php performance
>
> Hi,
>
> Alexey Borzov wrote:
> >> Nov 04 09:16:04 mirror [info] Mirroring finished. 1027 page(s) saved,
> >> 1346 second(s) spent
> >>
> >> It appears to have saved everything in the root directory afaict, and
> >> the 7.4 static docs, but nothing else.
> >>
> >> Any ideas?
> >
> > Ouch. It did the same for me, will look into this: seems as if some
> > links are dropped / not followed.
>
> Fixed. Turned out the regexes to extract links from pages
> were broken and some of the links (including the main menu,
> unfortunately) were thus not crawled.

Thanks, I'll give it a try.

Regards, Dave.