public inbox for [email protected]  
Re: Postgresql.org search engine.
13+ messages / 5 participants

* Re: Postgresql.org search engine.
@ 2004-01-30 20:53  Dave Page <[email protected]>
  0 siblings, 2 replies; 13+ messages in thread

From: Dave Page @ 2004-01-30 20:53 UTC (permalink / raw)
  To: Marc G. Fournier <[email protected]>; +Cc: Oleg Bartunov <[email protected]>; [email protected]; pgsql-www

 

> -----Original Message-----
> From: Marc G. Fournier [mailto:[email protected]] 
> Sent: 30 January 2004 20:43
> To: Dave Page
> Cc: Oleg Bartunov; [email protected]; [email protected]
> Subject: Re: [pgsql-www] Postgresql.org search engine.
> 
> 
> k, before I regenerate the lists, is this stuff you want me 
> to add to the META DATA part?

I don't think there's much point. It's the XML feed that might make
use of it, not the standard indexer.

What I really want to see is the absolute bare minimum in the msg files
(not even the titles that are there at the moment; speaking of which, it
might be worth including them as a PHP variable we can pick up from
top_config.php), as per the example I emailed you. Then we should be
able to change anything by editing the header and footer PHP include files.

Regards, Dave.




* Re: Postgresql.org search engine.
@ 2004-01-30 21:01  Marc G. Fournier <[email protected]>
  parent: Dave Page <[email protected]>
  1 sibling, 0 replies; 13+ messages in thread

From: Marc G. Fournier @ 2004-01-30 21:01 UTC (permalink / raw)
  To: Dave Page <[email protected]>; +Cc: Marc G. Fournier <[email protected]>; Oleg Bartunov <[email protected]>; [email protected]; pgsql-www

On Fri, 30 Jan 2004, Dave Page wrote:

>
>
> > -----Original Message-----
> > From: Marc G. Fournier [mailto:[email protected]]
> > Sent: 30 January 2004 20:43
> > To: Dave Page
> > Cc: Oleg Bartunov; [email protected]; [email protected]
> > Subject: Re: [pgsql-www] Postgresql.org search engine.
> >
> >
> > k, before I regenerate the lists, is this stuff you want me
> > to add to the META DATA part?
>
> I don't think there's much point. It's the XML feed that might make
> use of it, not the standard indexer.
>
> What I really want to see is the absolute bare minimum in the msg files
> (not even the titles that are there at the moment; speaking of which, it
> might be worth including them as a PHP variable we can pick up from
> top_config.php), as per the example I emailed you. Then we should be
> able to change anything by editing the header and footer PHP include files.

D'oh ... I was going to say that I didn't think that was possible, but it
just might be ... it seems I have a section declared twice (note that someone
else wrote this originally; I've only just begun to understand it well enough
to modify it), so the second section is overriding the first, but I was only
ever seeing the first ...

Let me play with this over the weekend. I'll generate a 'small sample set' of
messages for you to look at, and we can go from there ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: [email protected]           Yahoo!: yscrappy              ICQ: 7615664




* Re: Postgresql.org search engine.
@ 2004-01-31 05:47  Oleg Bartunov <[email protected]>
  parent: Dave Page <[email protected]>
  1 sibling, 1 reply; 13+ messages in thread

From: Oleg Bartunov @ 2004-01-31 05:47 UTC (permalink / raw)
  To: Dave Page <[email protected]>; +Cc: Marc G. Fournier <[email protected]>; [email protected]; pgsql-www

On Fri, 30 Jan 2004, Dave Page wrote:

>
>
> > -----Original Message-----
> > From: Marc G. Fournier [mailto:[email protected]]
> > Sent: 30 January 2004 20:43
> > To: Dave Page
> > Cc: Oleg Bartunov; [email protected]; [email protected]
> > Subject: Re: [pgsql-www] Postgresql.org search engine.
> >
> >
> > k, before I regenerate the lists, is this stuff you want me
> > to add to the META DATA part?
>
> I don't think there's much point. It's the XML feed that might make
> use of it, not the standard indexer.
>
> What I really want to see is the absolute bare minimum in the msg files
> (not even the titles that are there at the moment; speaking of which, it
> might be worth including them as a PHP variable we can pick up from
> top_config.php), as per the example I emailed you. Then we should be
> able to change anything by editing the header and footer PHP include files.


I don't understand what the problem is with storing postings in raw format
in the filesystem and metadata in Postgres, with a display component that
combines both sources into a nice HTML page. Dave could get raw postings from
the filesystem using the metadata and index them without any problem. Marc
could change the HTML wrapping every day, and everybody is happy :)
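
A minimal sketch of that split, in Python, might look like the following. It
is only an illustration under stated assumptions: sqlite3 stands in for
Postgres so the example runs on its own, and the table name, column names,
file name, and the render_posting() helper are all hypothetical.

```python
import html
import sqlite3
import tempfile
from pathlib import Path


def render_posting(db: sqlite3.Connection, msg_id: str, header: str, footer: str) -> str:
    """Look up a posting's metadata, read its raw file, and wrap it in HTML."""
    row = db.execute(
        "SELECT subject, path FROM postings WHERE msg_id = ?", (msg_id,)
    ).fetchone()
    if row is None:
        raise KeyError(msg_id)
    subject, path = row
    raw = Path(path).read_text(encoding="utf-8", errors="replace")
    # Escape the raw text so that only the wrapper controls the markup.
    body = f"<h1>{html.escape(subject)}</h1>\n<pre>{html.escape(raw)}</pre>"
    return f"{header}\n{body}\n{footer}"


# Demo with a throwaway raw file and an in-memory metadata table.
with tempfile.TemporaryDirectory() as tmp:
    raw_file = Path(tmp) / "msg00001.txt"  # hypothetical file name
    raw_file.write_text("Hello <list>, this is a raw posting.\n", encoding="utf-8")

    db = sqlite3.connect(":memory:")  # assumption: stand-in for Postgres
    db.execute("CREATE TABLE postings (msg_id TEXT PRIMARY KEY, subject TEXT, path TEXT)")
    db.execute(
        "INSERT INTO postings VALUES (?, ?, ?)",
        ("<1@example.org>", "Re: search engine", str(raw_file)),
    )
    page = render_posting(db, "<1@example.org>", "<html><body>", "</body></html>")
    print(page)
```

Because the header and footer are passed in, the HTML wrapping can change
without touching the stored postings, and an indexer can read the raw files
directly.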



>
> Regards, Dave.

	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: [email protected], http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83




* Re: Postgresql.org search engine.
@ 2004-01-31 05:49  Marc G. Fournier <[email protected]>
  parent: Oleg Bartunov <[email protected]>
  0 siblings, 1 reply; 13+ messages in thread

From: Marc G. Fournier @ 2004-01-31 05:49 UTC (permalink / raw)
  To: Oleg Bartunov <[email protected]>; +Cc: Dave Page <[email protected]>; Marc G. Fournier <[email protected]>; [email protected]; pgsql-www

On Sat, 31 Jan 2004, Oleg Bartunov wrote:

> I don't understand what the problem is with storing postings in raw format
> in the filesystem and metadata in Postgres, with a display component that
> combines both sources into a nice HTML page. Dave could get raw postings
> from the filesystem using the metadata and index them without any problem.
> Marc could change the HTML wrapping every day, and everybody is happy :)

Do you have software to do this, including all the inter-posting
references and followups?  Or do you propose we write this all from
scratch?

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: [email protected]           Yahoo!: yscrappy              ICQ: 7615664




* Re: Postgresql.org search engine.
@ 2004-01-31 06:01  Josh Berkus <[email protected]>
  parent: Marc G. Fournier <[email protected]>
  0 siblings, 1 reply; 13+ messages in thread

From: Josh Berkus @ 2004-01-31 06:01 UTC (permalink / raw)
  To: Marc G. Fournier <[email protected]>; Oleg Bartunov <[email protected]>; +Cc: Dave Page <[email protected]>; Marc G. Fournier <[email protected]>; pgsql-www

Guys,

> Do you have software to do this, including all the inter-posting
> references and followups?  Or do you propose we write this all from
> scratch?

Robert Bernier apparently wrote something to break up mail for inclusion in a 
database, and should be able to help in a couple of months.  Josh Drake is also 
willing to help, and has already done a prototype without header searching.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco





* Re: Postgresql.org search engine.
@ 2004-01-31 06:14  Marc G. Fournier <[email protected]>
  parent: Josh Berkus <[email protected]>
  0 siblings, 1 reply; 13+ messages in thread

From: Marc G. Fournier @ 2004-01-31 06:14 UTC (permalink / raw)
  To: Josh Berkus <[email protected]>; +Cc: Oleg Bartunov <[email protected]>; Dave Page <[email protected]>; pgsql-www

On Fri, 30 Jan 2004, Josh Berkus wrote:

> Guys,
>
> > Do you have software to do this, including all the inter-posting
> > references and followups?  Or do you propose we write this all from
> > scratch?
>
> Robert Bernier apparently wrote something to break up mail for inclusion in a
> database, and should be able to help in a couple of months.  Josh Drake is also
> willing to help, and has already done a prototype without header searching.

Dumping mail into a database isn't that hard to do ... there are several
projects on the 'Net right now doing that, including one that connects a
POP3 daemon into the database to download the mail ... in fact, from what
I recall of fts.postgresql.org, isn't that what Oleg/Teodor's stuff does?

I'm kinda curious here ... exactly what problem are we trying to solve
here?

Me, I'm just trying to clean up the archives so that when someone gets
their search results, they don't all show the same 'text', which I've
already accomplished ... Dave is working on improving the speed of the
searches, which he has accomplished with ASPseek ...

If I can figure out how to get the Date: of the posting into the
Last-Modified field (I know *how* it should work, but last time I tried it
ended up generating a whack of errors), then that should satisfy Oleg's
beef ...
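
For illustration only, the conversion from a posting's Date: value to a
Last-Modified value could be sketched in Python as below. This is not the
mhonarc resource-file approach itself; the function name and the sample date
are assumptions made up for the example.

```python
from datetime import timezone
from email.utils import format_datetime, parsedate_to_datetime


def last_modified_from_date(date_header: str) -> str:
    """Convert an RFC 2822 Date: value into an HTTP Last-Modified value."""
    posted = parsedate_to_datetime(date_header)
    if posted.tzinfo is None:
        # Assumption: treat dates without a usable offset as UTC.
        posted = posted.replace(tzinfo=timezone.utc)
    # HTTP dates are always expressed in GMT, so normalise the offset first.
    return format_datetime(posted.astimezone(timezone.utc), usegmt=True)


# Hypothetical Date: value with a -0400 offset.
print(last_modified_from_date("Sat, 31 Jan 2004 02:14:00 -0400"))
# prints "Sat, 31 Jan 2004 06:14:00 GMT"
```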

Oleg, one question ... what do you recommend setting max-age to for
Cache-control?  Right now, I have it set to 30 days ... too long?  not
long enough?

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: [email protected]           Yahoo!: yscrappy              ICQ: 7615664




* Re: Postgresql.org search engine.
@ 2004-01-31 12:45  Oleg Bartunov <[email protected]>
  parent: Marc G. Fournier <[email protected]>
  0 siblings, 2 replies; 13+ messages in thread

From: Oleg Bartunov @ 2004-01-31 12:45 UTC (permalink / raw)
  To: Marc G. Fournier <[email protected]>; +Cc: Josh Berkus <[email protected]>; Dave Page <[email protected]>; pgsql-www

On Sat, 31 Jan 2004, Marc G. Fournier wrote:

> On Fri, 30 Jan 2004, Josh Berkus wrote:
>
> > Guys,
> >
> > > Do you have software to do this, including all the inter-posting
> > > references and followups?  Or do you propose we write this all from
> > > scratch?
> >
> > Robert Bernier apparently wrote something to break up mail for inclusion in a
> > database, and should be able to help in a couple of months.  Josh Drake is also
> > willing to help, and has already done a prototype without header searching.
>
> Dumping mail into a database isn't that hard to do ... there are several
> projects on the 'Net right now doing that, including one that connects a
> POP3 daemon into the database to download the mail ... in fact, from what
> I recall of fts.postgresql.org, isn't that what Oleg/Teodor's stuff does?
>
> I'm kinda curious here ... exactly what problem are we trying to solve
> here?
>
> Me, I'm just trying to clean up the archives so that when someone gets
> their search results, they don't all show the same 'text', which I've
> already accomplished ... Dave is working on improving the speed of the
> searches, which he has accomplished with ASPseek ...
>
> If I can figure out how to get the Date: of the posting into the
> Last-Modified field (I know *how* it should work, but last time I tried it
> ended up generating a whack of errors), then that should satisfy Oleg's
> beef ...
>
> Oleg, one question ... what do you recommend setting max-age to for
> Cache-control?  Right now, I have it set to 30 days ... too long?  not
> long enough?

In my experience Cache-Control is not effective, because it's an
HTTP/1.1 feature and a lot of users come through proxies that still
don't support HTTP/1.1.
The Last-Modified header is the most universal way.
See http://www.mnot.net/cache_docs/#CACHE-CONTROL
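
To show why Last-Modified is useful to caches and proxies, here is a rough
Python sketch of how a server might answer a conditional request that carries
If-Modified-Since. The status_for() helper is hypothetical and not taken from
any software discussed in this thread.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def status_for(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Return 304 if the client's copy is still current, else 200.

    last_modified is expected to be a timezone-aware datetime.
    """
    if not if_modified_since:
        return 200
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: fall back to a full response
    if client_time.tzinfo is None:
        client_time = client_time.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds first.
    return 304 if last_modified.replace(microsecond=0) <= client_time else 200


posted = datetime(2004, 1, 31, 6, 14, tzinfo=timezone.utc)  # hypothetical posting date
print(status_for(posted, "Sat, 31 Jan 2004 06:14:00 GMT"))
print(status_for(posted, "Fri, 30 Jan 2004 00:00:00 GMT"))
```

Since an archived posting does not change after it is sent, nearly every
revisit can be answered with the short 304 response.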


	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: [email protected], http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83




* Re: Postgresql.org search engine.
@ 2004-02-01 22:12  Marc G. Fournier <[email protected]>
  parent: Oleg Bartunov <[email protected]>
  1 sibling, 1 reply; 13+ messages in thread

From: Marc G. Fournier @ 2004-02-01 22:12 UTC (permalink / raw)
  To: Oleg Bartunov <[email protected]>; +Cc: Marc G. Fournier <[email protected]>; Josh Berkus <[email protected]>; Dave Page <[email protected]>; pgsql-www

On Sat, 31 Jan 2004, Oleg Bartunov wrote:

> > If I can figure out how to get the Date: of the posting into the
> > Last-Modified field (I know *how* it should work, but last time I tried it
> > ended up generating a whack of errors), then that should satisfy Oleg's
> > beef ...

'k, figured out my error with the mhonarc resource file, and now have
posting date in as last-modified ... I'm doing this off to the side right
now, while I work out the noindex stuff for Dave, but check out:

http://archives.postgresql.org/dev

And let me know if the headers look right to you ... I took out the
Cache-control stuff ...

Let me know if there is anything else you'd like to see in there ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: [email protected]           Yahoo!: yscrappy              ICQ: 7615664




* META Tags on Archives
@ 2004-02-01 22:34  Marc G. Fournier <[email protected]>
  parent: Oleg Bartunov <[email protected]>
  1 sibling, 2 replies; 13+ messages in thread

From: Marc G. Fournier @ 2004-02-01 22:34 UTC (permalink / raw)
  To: Oleg Bartunov <[email protected]>; +Cc: pgsql-www


Oleg ... as the "resident pro" here ... does this make sense:

Messages have:

     <META NAME="robots" CONTENT="nofollow, index, archive">

And indexes have:

     <META NAME="robots" CONTENT="follow, noindex, noarchive">
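
As a sketch only, the two cases above could be generated from a single lookup
table. The robots_meta() name and the "message" / "index" keys are
assumptions for this example, and the archive / noarchive directives are
copied from the tags above without any claim that every robot honours them.

```python
def robots_meta(page_kind: str) -> str:
    """Build the robots META tag for a message page or an index page."""
    directives = {
        # Message pages: index the text, but do not follow its links.
        "message": ["nofollow", "index", "archive"],
        # Index pages: follow the links to messages, but do not index the list.
        "index": ["follow", "noindex", "noarchive"],
    }
    content = ", ".join(directives[page_kind])
    return f'<META NAME="robots" CONTENT="{content}">'


print(robots_meta("message"))
print(robots_meta("index"))
```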



----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: [email protected]           Yahoo!: yscrappy              ICQ: 7615664




* Re: Postgresql.org search engine.
@ 2004-02-02 12:57  Oleg Bartunov <[email protected]>
  parent: Marc G. Fournier <[email protected]>
  0 siblings, 0 replies; 13+ messages in thread

From: Oleg Bartunov @ 2004-02-02 12:57 UTC (permalink / raw)
  To: Marc G. Fournier <[email protected]>; +Cc: Josh Berkus <[email protected]>; Dave Page <[email protected]>; pgsql-www

On Sun, 1 Feb 2004, Marc G. Fournier wrote:

> On Sat, 31 Jan 2004, Oleg Bartunov wrote:
>
> > > If I can figure out how to get the Date: of the posting into the
> > > Last-Modified field (I know *how* it should work, but last time I tried it
> > > ended up generating a whack of errors), then that should satisfy Oleg's
> > > beef ...
>
> 'k, figured out my error with the mhonarc resource file, and now have
> posting date in as last-modified ... I'm doing this off to the side right
> now, while I work out the noindex stuff for Dave, but check out:
>
> http://archives.postgresql.org/dev
>
> And let me know if the headers look right to you ... I took out the
> Cache-control stuff ...
>
> Let me know if there is anything else you'd like to see in there ...
>

The HTTP headers look fine!


	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: [email protected], http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83




* Re: META Tags on Archives
@ 2004-02-02 13:02  Oleg Bartunov <[email protected]>
  parent: Marc G. Fournier <[email protected]>
  1 sibling, 0 replies; 13+ messages in thread

From: Oleg Bartunov @ 2004-02-02 13:02 UTC (permalink / raw)
  To: Marc G. Fournier <[email protected]>; +Cc: pgsql-www

On Sun, 1 Feb 2004, Marc G. Fournier wrote:

>
> Oleg ... as the "resident pro" here ... does this make sense:
>
> Messages have:
>
>      <META NAME="robots" CONTENT="nofollow, index, archive">
>
> And indexes have:
>
>      <META NAME="robots" CONTENT="follow, noindex, noarchive">
>

I don't know 'archive, noarchive', but the others look OK. I'm rather
sceptical about this tag, because I don't know of any robots that recognize it :)


	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: [email protected], http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83




* Re: META Tags on Archives
@ 2004-02-02 14:46  Jeroen Ruigrok/asmodai <[email protected]>
  parent: Marc G. Fournier <[email protected]>
  1 sibling, 1 reply; 13+ messages in thread

From: Jeroen Ruigrok/asmodai @ 2004-02-02 14:46 UTC (permalink / raw)
  To: Marc G. Fournier <[email protected]>; +Cc: Oleg Bartunov <[email protected]>; pgsql-www

-On [20040201 23:43], Marc G. Fournier ([email protected]) wrote:
>     <META NAME="robots" CONTENT="nofollow, index, archive">

According to http://www.robotstxt.org/wc/meta-user.html
archive|noarchive does not exist.

Where'd you find it?

-- 
Jeroen Ruigrok van der Werven <asmodai(at)wxs.nl> / asmodai / kita no mono
PGP fingerprint: 2D92 980E 45FE 2C28 9DB7  9D88 97E6 839B 2EAC 625B
http://www.tendra.org/   | http://diary.in-nomine.org/
The human race is challenged more than ever before to demonstrate our
mastery -- not over nature but of ourselves...




* Re: META Tags on Archives
@ 2004-02-02 14:52  Marc G. Fournier <[email protected]>
  parent: Jeroen Ruigrok/asmodai <[email protected]>
  0 siblings, 0 replies; 13+ messages in thread

From: Marc G. Fournier @ 2004-02-02 14:52 UTC (permalink / raw)
  To: Jeroen Ruigrok/asmodai <[email protected]>; +Cc: Marc G. Fournier <[email protected]>; Oleg Bartunov <[email protected]>; pgsql-www

On Mon, 2 Feb 2004, Jeroen Ruigrok/asmodai wrote:

> -On [20040201 23:43], Marc G. Fournier ([email protected]) wrote:
> >     <META NAME="robots" CONTENT="nofollow, index, archive">
>
> According to http://www.robotstxt.org/wc/meta-user.html
> archive|noarchive does not exist.
>
> Where'd you find it?

Actually, that one was in the original .resource file, but a quick search
on Google shows:

http://www.bauser.com/websnob/meta/robots.html

and

http://www.google.com/webmasters/faq.html#cached

the funny thing is that this one:

http://www.katpatuka.org/pub/doc/robotexclusion.html

refers to NOARCHIVE, but points to:

http://www.w3.org/Search/9605-Indexing-Workshop/ReportOutcomes/Spidering.txt

which doesn't include it ...

I love standards that everyone follows *roll eyes*

I'm gathering it's something that some use (Google does, apparently), and
some don't ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: [email protected]           Yahoo!: yscrappy              ICQ: 7615664





