Date: Sat, 17 Jan 2004 07:57:36 +0300 (MSK)
From: Oleg Bartunov
To: "Marc G. Fournier"
Cc: Robert Treat, pgsql-www@postgresql.org, "Joshua D. Drake"
Subject: Re: [pgsql-advocacy] New PostgreSQL search resource

On Fri, 16 Jan 2004, Marc G. Fournier wrote:

> On Fri, 16 Jan 2004, Robert Treat wrote:
>
> > Ok, this is now the second site that has come to be in the last few days that
> > is using FTS and Tsearch for site search, and providing something that seems
> > a lot better than the search available on the main website... what is
> > stopping us from implementing this and dumping mnogosearch? at least for the
> > main site if not for the archives?
>
> actually, Dave is working on an improved search ... but, Oleg just
> announced a crawler using tsearch that I'm going to take a look at
> implementing as well ...

I suggest following Robert's suggestion about indexing the main site.
The archives are a somewhat different matter. As I already wrote, they need to
be optimized for crawlers (headers). Also, it would be much better to be able
to index just the content, without headers, footers, etc.

If you have access to the files :), you can index much faster without any
crawler! The OpenFTS distribution contains example scripts for indexing file
collections. It's very, very easy. Quoting from the "Survival Guide":

APOD collection consists of 1757 articles (about 7 MB) and is ideally suited
for OpenFTS. Indexing took about 29 seconds on my IBM ThinkPad T21 notebook
(Linux 2.4.17, 256 MB RAM, 20 GB IDE HD). The total number of lexemes is
131,310, while the number of unique lexemes is only 8,806 (using Porter's
stemmer).

The official PostgreSQL documentation is about the same size.

> >
> > Robert Treat
> >
> > On Friday 16 January 2004 19:33, Joshua D. Drake wrote:
> > > Hello,
> > >
> > > Took an hour today and made the 7.3.4, 7.4.1 and Practical PostgreSQL
> > > documentation
> > > all searchable using OpenFTS and Tsearch2. You can take a look at:
> > >
> > > http://www.commandprompt.com/community/
> > >
> > > Sincerely,
> > >
> > > Joshua Drake
> >
> > --
> > Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
> >
>
> ----
> Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
> Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664
>

	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83