Message-ID: <44F44EF5.3090102@commandprompt.com>
Date: Tue, 29 Aug 2006 07:28:05 -0700
From: "Joshua D. Drake"
Organization: Command Prompt, Inc.
To: Dave Page
CC: PostgreSQL WWW
Subject: Re: A counter productive conversation about search.

>> Other options include lucene, and rolling our own.
>
> Is Lucene capable of handling the size of our index?
> This has always

I am going to say "yes" without any actual knowledge of Lucene, but that
is because I am putting more trust in the fact that it is an Apache
project than anything. I will check.

> been the problem we've had with other projects like MnogoSearch. They
> work well until you load them up with the archives after which they
> simply can't cope without ridiculous amounts of hardware.
>
>> Rolling our own really wouldn't be that hard "if" we can create a
>> reasonably smart web page grabber. We have all the tools
>> (tsearch2 and pg_trgm) to easily do the searches.
>>
>> So is anyone up for helping develop a page grabber?
>
> We have one - it builds the static version of the main site by spidering
> it hourly.

Should we look at that then?

>
> Regards, Dave.
>

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/