From: Oleg Bartunov <[email protected]>
To: Tino Wildenhain <[email protected]>
Cc: Joshua D. Drake <[email protected]>
Cc: PostgreSQL WWW <[email protected]>
Subject: Re: A counter productive conversation about search.
Date: Tue, 29 Aug 2006 09:37:04 +0400 (MSD)
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>

On Tue, 29 Aug 2006, Tino Wildenhain wrote:

> Joshua D. Drake wrote:
> ...
>> Rolling our own really wouldn't be that hard "if" we can create a
>> reasonably smart web page grabber. We have all the tools (tsearch2 and
>> pg_trgm) to easily do the searches.
>>
>> So is anyone up for helping develop a page grabber?
>
> That's not the hardest part, but why do we need to grab if the contents
> of the pages could be in the database? Admittedly, I don't know
> any good CMS w/ a PostgreSQL backend. But anyway, grabbing the sources
> of the pages while they are published (like the docbook stuff
> for the documentation) makes a lot more sense imho. Ditto for the
> archives. It's much easier to get an idea of the structure and nature
> of the data when you don't have to deal with the final result (e.g. HTML).
>
> So a couple of scripts that fire when mail comes in, documentation
> is compiled and when some other publishing takes place could
> really help to keep the index in sync w/o having to crawl all sites
> over and over again.

This is exactly what we have on pgsql.ru/db/mw. We use procmail to fire
our backend to process each incoming message. That part is not a problem;
the most complex thing is the backend.
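
To make the procmail step concrete, here is a minimal sketch of the kind of
script a procmail recipe could pipe each message into. It is not the
pgsql.ru backend: the function name extract_record and the record layout
are hypothetical, and the sketch only parses a message into fields that a
full-text index might store.

```python
from email import message_from_string


def extract_record(raw):
    """Parse one raw RFC 2822 message into a flat dict for indexing.

    Hypothetical helper: the field names are assumptions for illustration,
    not the schema used by any real archive backend.
    """
    msg = message_from_string(raw)

    # Collect every text/plain part; a non-multipart message yields itself.
    body_parts = []
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)  # bytes, transfer-decoded
            charset = part.get_content_charset() or "utf-8"
            body_parts.append(payload.decode(charset, errors="replace"))

    return {
        "message_id": msg.get("Message-ID", ""),
        "in_reply_to": msg.get("In-Reply-To", ""),
        "subject": msg.get("Subject", ""),
        "date": msg.get("Date", ""),
        "body": "\n".join(body_parts),
    }
```

A delivery recipe would hand the message to such a script on standard
input, so the entry point would call extract_record(sys.stdin.read()) and
pass the result on to whatever stores and indexes it.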

>
> Regards
> Tino Wildenhain
>
>

 	Regards,
 		Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [email protected], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
