Re: Postgresql.org search engine.
22+ messages / 5 participants
* Re: Postgresql.org search engine.
@ 2004-01-30 20:53 Dave Page <[email protected]>
0 siblings, 2 replies; 22+ messages in thread
From: Dave Page @ 2004-01-30 20:53 UTC (permalink / raw)
To: Marc G. Fournier <[email protected]>; +Cc: Oleg Bartunov <[email protected]>; [email protected]; pgsql-www
> -----Original Message-----
> From: Marc G. Fournier [mailto:[email protected]]
> Sent: 30 January 2004 20:43
> To: Dave Page
> Cc: Oleg Bartunov; [email protected]; [email protected]
> Subject: Re: [pgsql-www] Postgresql.org search engine.
>
>
> k, before I regenerate the lists, is this stuff you want me
> to add to the META DATA part?
There's not much point I don't think. It's the XML feed that might make
use of it, not the standard indexer.
What I really want to see is the absolute bare minimum in the msg files
(not even the titles that are there at the moment - speaking of which,
might be worth including them as a php var we can pick up from the
top_config.php) - as per the example I emailed you. Then, we should be
able to do anything by editing the header and footer php include files.
Regards, Dave.
* Re: Postgresql.org search engine.
@ 2004-01-30 21:01 Marc G. Fournier <[email protected]>
parent: Dave Page <[email protected]>
1 sibling, 0 replies; 22+ messages in thread
From: Marc G. Fournier @ 2004-01-30 21:01 UTC (permalink / raw)
To: Dave Page <[email protected]>; +Cc: Marc G. Fournier <[email protected]>; Oleg Bartunov <[email protected]>; [email protected]; pgsql-www
On Fri, 30 Jan 2004, Dave Page wrote:
>
>
> > -----Original Message-----
> > From: Marc G. Fournier [mailto:[email protected]]
> > Sent: 30 January 2004 20:43
> > To: Dave Page
> > Cc: Oleg Bartunov; [email protected]; [email protected]
> > Subject: Re: [pgsql-www] Postgresql.org search engine.
> >
> >
> > k, before I regenerate the lists, is this stuff you want me
> > to add to the META DATA part?
>
> There's not much point I don't think. It's the XML feed that might make
> use of it, not the standard indexer.
>
> What I really want to see is the absolute bare minimum in the msg files
> (not even the titles that are there at the moment - speaking of which,
> might be worth including them as a php var we can pick up from the
> top_config.php) - as per the example I emailed you. Then, we should be
> able to do anything by editing the header and footer php include files.
D'oh ... I was going to say that I didn't think that was possible, but, it
just might be ... seems I have a section declared twice (note that someone
else wrote this originally, I've only just begun to understand it to
modify it), so the second section is overriding the first, but I was only
ever seeing the first ...
Let me play with this over the weekend, I'll do a 'small sample set' that
you can look at the messages in, and we can go from there ...
----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: [email protected] Yahoo!: yscrappy ICQ: 7615664
* Re: Postgresql.org search engine.
@ 2004-01-31 05:47 Oleg Bartunov <[email protected]>
parent: Dave Page <[email protected]>
1 sibling, 1 reply; 22+ messages in thread
From: Oleg Bartunov @ 2004-01-31 05:47 UTC (permalink / raw)
To: Dave Page <[email protected]>; +Cc: Marc G. Fournier <[email protected]>; [email protected]; pgsql-www
On Fri, 30 Jan 2004, Dave Page wrote:
>
>
> > -----Original Message-----
> > From: Marc G. Fournier [mailto:[email protected]]
> > Sent: 30 January 2004 20:43
> > To: Dave Page
> > Cc: Oleg Bartunov; [email protected]; [email protected]
> > Subject: Re: [pgsql-www] Postgresql.org search engine.
> >
> >
> > k, before I regenerate the lists, is this stuff you want me
> > to add to the META DATA part?
>
> There's not much point I don't think. It's the XML feed that might make
> use of it, not the standard indexer.
>
> What I really want to see is the absolute bare minimum in the msg files
> (not even the titles that are there at the moment - speaking of which,
> might be worth including them as a php var we can pick up from the
> top_config.php) - as per the example I emailed you. Then, we should be
> able to do anything by editing the header and footer php include files.
I don't understand what the problem is with having postings stored in raw
format in the filesystem, metadata in postgres, and a show component which
combines both sources into a nice html page. Dave could get raw postings from
the filesystem using the metadata and index them without any problem. Marc
could change the html wrapping every day and everybody is happy :)
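
As a rough illustration of the metadata half of this split, here is a minimal Python sketch that pulls the threading headers out of one raw posting. It assumes postings are stored as plain RFC 2822 text; the function name `extract_metadata` and the dictionary keys are hypothetical, and loading the result into postgres is left out.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def extract_metadata(raw: str) -> dict:
    """Return the headers needed to thread one raw posting (hypothetical helper)."""
    msg = message_from_string(raw)
    # References lists the ancestors of a posting, oldest first.
    references = msg.get("References", "").split()
    in_reply_to = msg.get("In-Reply-To", "").strip() or None
    # Prefer In-Reply-To; fall back to the last entry in References.
    parent_id = in_reply_to or (references[-1] if references else None)
    date_header = msg.get("Date")
    return {
        "message_id": msg.get("Message-ID", "").strip(),
        "parent_id": parent_id,
        "references": references,
        "subject": msg.get("Subject", ""),
        "date": parsedate_to_datetime(date_header) if date_header else None,
    }
```

With `message_id` and `parent_id` stored per posting, a show component could rebuild the follow-up links at display time instead of baking them into each HTML file.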
>
> Regards, Dave.
>
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: [email protected], http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
* Re: Postgresql.org search engine.
@ 2004-01-31 05:49 Marc G. Fournier <[email protected]>
parent: Oleg Bartunov <[email protected]>
0 siblings, 1 reply; 22+ messages in thread
From: Marc G. Fournier @ 2004-01-31 05:49 UTC (permalink / raw)
To: Oleg Bartunov <[email protected]>; +Cc: Dave Page <[email protected]>; Marc G. Fournier <[email protected]>; [email protected]; pgsql-www
On Sat, 31 Jan 2004, Oleg Bartunov wrote:
> I don't understand what the problem is with having postings stored in raw
> format in the filesystem, metadata in postgres, and a show component which
> combines both sources into a nice html page. Dave could get raw postings
> from the filesystem using the metadata and index them without any problem.
> Marc could change the html wrapping every day and everybody is happy :)
Do you have software to do this, including all the inter-posting
references and followups? Or do you propose we write this all from
scratch?
----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: [email protected] Yahoo!: yscrappy ICQ: 7615664
* Re: Postgresql.org search engine.
@ 2004-01-31 06:01 Josh Berkus <[email protected]>
parent: Marc G. Fournier <[email protected]>
0 siblings, 1 reply; 22+ messages in thread
From: Josh Berkus @ 2004-01-31 06:01 UTC (permalink / raw)
To: Marc G. Fournier <[email protected]>; Oleg Bartunov <[email protected]>; +Cc: Dave Page <[email protected]>; Marc G. Fournier <[email protected]>; pgsql-www
Guys,
> Do you have software to do this, including all the inter-posting
> references and followups? Or do you propose we write this all from
> scratch?
Robert Bernier apparently wrote something to break up mail for inclusion in a
database, and should be able to help in a couple months. Josh Drake is also
willing to help, and has already done a prototype without header searching.
--
-Josh Berkus
Aglio Database Solutions
San Francisco
* Re: Postgresql.org search engine.
@ 2004-01-31 06:14 Marc G. Fournier <[email protected]>
parent: Josh Berkus <[email protected]>
0 siblings, 1 reply; 22+ messages in thread
From: Marc G. Fournier @ 2004-01-31 06:14 UTC (permalink / raw)
To: Josh Berkus <[email protected]>; +Cc: Oleg Bartunov <[email protected]>; Dave Page <[email protected]>; pgsql-www
On Fri, 30 Jan 2004, Josh Berkus wrote:
> Guys,
>
> > Do you have software to do this, including all the inter-posting
> > references and followups? Or do you propose we write this all from
> > scratch?
>
> Robert Bernier apparently wrote something to break up mail for inclusion in a
> database, and should be able to help in a couple months. Josh Drake is also
> willing to help, and has already done a prototype without header searching.
Dumping mail into a database isn't that hard to do ... there are several
projects on the 'Net right now doing that, including one that connects a
POP3 daemon into the database to download the mail ... in fact, from what
I recall of fts.postgresql.org, isn't that what Oleg/Teodor's stuff does?
I'm kinda curious here ... exactly what problem are we trying to solve
here?
Me, I'm just trying to clean up the archives so that when someone gets
their search results, they don't all show the same 'text', which I've
already accomplished ... Dave is working on improving the speed of the
searches, which he has accomplished with ASPseek ...
If I can figure out how to get the Date: of the posting into the
Last-Modified field (I know *how* it should work, but last time I tried it
ended up generating a whack of errors), then that should satisfy Oleg's
beef ...
Oleg, one question ... what do you recommend setting max-age to for
Cache-control? Right now, I have it set to 30 days ... too long? not
long enough?
----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: [email protected] Yahoo!: yscrappy ICQ: 7615664
* Re: Postgresql.org search engine.
@ 2004-01-31 12:45 Oleg Bartunov <[email protected]>
parent: Marc G. Fournier <[email protected]>
0 siblings, 2 replies; 22+ messages in thread
From: Oleg Bartunov @ 2004-01-31 12:45 UTC (permalink / raw)
To: Marc G. Fournier <[email protected]>; +Cc: Josh Berkus <[email protected]>; Dave Page <[email protected]>; pgsql-www
On Sat, 31 Jan 2004, Marc G. Fournier wrote:
> On Fri, 30 Jan 2004, Josh Berkus wrote:
>
> > Guys,
> >
> > > Do you have software to do this, including all the inter-posting
> > > references and followups? Or do you propose we write this all from
> > > scratch?
> >
> > Robert Bernier apparently wrote something to break up mail for inclusion in a
> > database, and should be able to help in a couple months. Josh Drake is also
> > willing to help, and has already done a prototype without header searching.
>
> Dumping mail into a database isn't that hard to do ... there are several
> projects on the 'Net right now doing that, including one that connects a
> POP3 daemon into the database to download the mail ... in fact, from what
> I recall of fts.postgresql.org, isn't that what Oleg/Teodor's stuff does?
>
> I'm kinda curious here ... exactly what problem are we trying to solve
> here?
>
> Me, I'm just trying to clean up the archives so that when someone gets
> their search results, they don't all show the same 'text', which I've
> already accomplished ... Dave is working on improving the speed of the
> searches, which he has accomplished with ASPseek ...
>
> If I can figure out how to get the Date: of the posting into the
> Last-Modified field (I know *how* it should work, but last time I tried it
> ended up generating a whack of errors), then that should satisfy Oleg's
> beef ...
>
> Oleg, one question ... what do you recommend setting max-age to for
> Cache-control? Right now, I have it set to 30 days ... too long? not
> long enough?
In my experience Cache-control is not effective, because it's an
HTTP/1.1 feature and a lot of users come through proxies which still
don't support HTTP/1.1.
The Last-Modified header is the most universal way.
Check http://www.mnot.net/cache_docs/#CACHE-CONTROL
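
For illustration, a minimal Python sketch of turning a posting's Date: header into a Last-Modified value. HTTP dates are always expressed in GMT, so the mail date has to be converted rather than copied across. The function name `last_modified_from_date` is hypothetical, and treating a zone-less date as UTC is an assumption.

```python
from datetime import timezone
from email.utils import format_datetime, parsedate_to_datetime


def last_modified_from_date(date_header: str) -> str:
    """Convert an RFC 2822 mail date into an HTTP-date string in GMT."""
    posted = parsedate_to_datetime(date_header)
    if posted.tzinfo is None:
        # A "-0000" zone parses as a naive datetime; assume it is UTC.
        posted = posted.replace(tzinfo=timezone.utc)
    # usegmt=True emits the literal "GMT" zone that HTTP headers require.
    return format_datetime(posted.astimezone(timezone.utc), usegmt=True)
```

A posting dated `Fri, 30 Jan 2004 16:53:00 -0400` would be served with `Last-Modified: Fri, 30 Jan 2004 20:53:00 GMT`.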
>
> ----
> Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
> Email: [email protected] Yahoo!: yscrappy ICQ: 7615664
>
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: [email protected], http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
* Re: Postgresql.org search engine.
@ 2004-02-01 22:12 Marc G. Fournier <[email protected]>
parent: Oleg Bartunov <[email protected]>
1 sibling, 1 reply; 22+ messages in thread
From: Marc G. Fournier @ 2004-02-01 22:12 UTC (permalink / raw)
To: Oleg Bartunov <[email protected]>; +Cc: Marc G. Fournier <[email protected]>; Josh Berkus <[email protected]>; Dave Page <[email protected]>; pgsql-www
On Sat, 31 Jan 2004, Oleg Bartunov wrote:
> > If I can figure out how to get the Date: of the posting into the
> > Last-Modified field (I know *how* it should work, but last time I tried it
> > ended up generating a whack of errors), then that should satisfy Oleg's
> > beef ...
'k, figured out my error with the mhonarc resource file, and now have
posting date in as last-modified ... I'm doing this off to the side right
now, while I work out the noindex stuff for Dave, but check out:
http://archives.postgresql.org/dev
And let me know if the headers look right to you ... I took out the
Cache-control stuff ...
Let me know if there is anything else you'd like to see in there ...
----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: [email protected] Yahoo!: yscrappy ICQ: 7615664
* META Tags on Archives
@ 2004-02-01 22:34 Marc G. Fournier <[email protected]>
parent: Oleg Bartunov <[email protected]>
1 sibling, 2 replies; 22+ messages in thread
From: Marc G. Fournier @ 2004-02-01 22:34 UTC (permalink / raw)
To: Oleg Bartunov <[email protected]>; +Cc: pgsql-www
Oleg ... as the "resident pro" here ... does this make sense:
Messages have:
<META NAME="robots" CONTENT="nofollow, index, archive">
And indexes have:
<META NAME="robots" CONTENT="follow, noindex, noarchive">
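
To make the intent of the two tags concrete, here is a minimal Python sketch of how a crawler that honours the robots META tag might read them. The class and function names are hypothetical, and the `archive`/`noarchive` values are ignored because not every robot recognises them.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <META NAME="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_policy(html: str) -> dict:
    """Return whether a page may be indexed and whether its links may be followed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return {
        "index": "noindex" not in parser.directives,
        "follow": "nofollow" not in parser.directives,
    }
```

Under this reading, message pages are indexed but their links are not followed, while index pages are only used to discover links.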
----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: [email protected] Yahoo!: yscrappy ICQ: 7615664
* Re: Postgresql.org search engine.
@ 2004-02-02 12:57 Oleg Bartunov <[email protected]>
parent: Marc G. Fournier <[email protected]>
0 siblings, 0 replies; 22+ messages in thread
From: Oleg Bartunov @ 2004-02-02 12:57 UTC (permalink / raw)
To: Marc G. Fournier <[email protected]>; +Cc: Josh Berkus <[email protected]>; Dave Page <[email protected]>; pgsql-www
On Sun, 1 Feb 2004, Marc G. Fournier wrote:
> On Sat, 31 Jan 2004, Oleg Bartunov wrote:
>
> > > If I can figure out how to get the Date: of the posting into the
> > > Last-Modified field (I know *how* it should work, but last time I tried it
> > > ended up generating a whack of errors), then that should satisfy Oleg's
> > > beef ...
>
> 'k, figured out my error with the mhonarc resource file, and now have
> posting date in as last-modified ... I'm doing this off to the side right
> now, while I work out the noindex stuff for Dave, but check out:
>
> http://archives.postgresql.org/dev
>
> And let me know if the headers look right to you ... I took out the
> Cache-control stuff ...
>
> Let me know if there is anything else you'd like to see in there ...
>
http headers look fine!
> ----
> Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
> Email: [email protected] Yahoo!: yscrappy ICQ: 7615664
>
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: [email protected], http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
* Re: META Tags on Archives
@ 2004-02-02 13:02 Oleg Bartunov <[email protected]>
parent: Marc G. Fournier <[email protected]>
1 sibling, 0 replies; 22+ messages in thread
From: Oleg Bartunov @ 2004-02-02 13:02 UTC (permalink / raw)
To: Marc G. Fournier <[email protected]>; +Cc: pgsql-www
On Sun, 1 Feb 2004, Marc G. Fournier wrote:
>
> Oleg ... as the "resident pro" here ... does this make sense:
>
> Messages have:
>
> <META NAME="robots" CONTENT="nofollow, index, archive">
>
> And indexes have:
>
> <META NAME="robots" CONTENT="follow, noindex, noarchive">
>
I don't know 'archive, noarchive', but the others look ok. I'm rather
sceptical about this tag, because I don't know of robots which recognize it :)
>
>
> ----
> Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
> Email: [email protected] Yahoo!: yscrappy ICQ: 7615664
>
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: [email protected], http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
* Re: Postgresql.org search engine.
@ 2004-02-02 13:39 Dave Page <[email protected]>
0 siblings, 2 replies; 22+ messages in thread
From: Dave Page @ 2004-02-02 13:39 UTC (permalink / raw)
To: Marc G. Fournier <[email protected]>; Oleg Bartunov <[email protected]>; +Cc: Josh Berkus <[email protected]>; pgsql-www
> -----Original Message-----
> From: Marc G. Fournier [mailto:[email protected]]
> Sent: 01 February 2004 22:12
> To: Oleg Bartunov
> Cc: Marc G. Fournier; Josh Berkus; Dave Page; [email protected]
> Subject: Re: [pgsql-www] Postgresql.org search engine.
>
> And let me know if the headers look right to you ... I took
> out the Cache-control stuff ...
>
> Let me know if there is anything else you'd like to see in there ...
It looks even more complex to me now - there are what, 6 include files?
How about something more simple:
==============================================================================
<?php
// Posting date as an HTTP-date (always GMT); header.php sends it as Last-Modified
$last_modified = "Fri, 09 Jan 2004 19:00:28 GMT";
$subject = "Re: IMPORTANT: A temporary list for Strategic Marketing";
require($_SERVER['DOCUMENT_ROOT'] . "/includes/header.php");
?>
<pre>Joshua D. Drake wrote:
&gt;
&gt; &gt;There shouldn't be any tangents or general discussion on -advocacy
&gt; &gt;either -- that's what -general is for. A one-time incident should not
&gt; &gt;lead to such drastic measures. If the marketing plan is no longer
&gt; &gt;discussed on -advocacy, what is?
&gt; &gt;
&gt; &gt;
&gt; I disagree whole heartedly. If you look at general, it is basically
&gt; PostgreSQL-Support.
</pre>
<!--noindex-->
<HR>
<UL>
<li>Prev by Date:
<strong><a href="msg00116.php">Re: IMPORTANT: A temporary list for
Strategic Marketing</a></strong>
</li>
<li>Next by Date:
<strong><a href="msg00118.php">Re: IMPORTANT: A temporary list for
Strategic Marketing</a></strong>
</li>
<li>Previous by thread:
<strong><a href="msg00126.php">Re: IMPORTANT: A temporary list for
Strategic</a></strong>
</li>
<li>Next by thread:
<strong><a href="msg00125.php">Re: IMPORTANT: A temporary list for
Strategic</a></strong>
</li>
<LI>Index(es):
<UL>
<LI><A HREF="mail2.php#00117"><STRONG>Main</STRONG></A></LI>
<LI><A HREF="thrd2.php#00117"><STRONG>Thread</STRONG></A></LI>
</UL>
</LI>
</UL>
<!--/noindex-->
<?php
require($_SERVER['DOCUMENT_ROOT'] . "/includes/footer.php");
?>
==============================================================================
header.php might then look something like:
==============================================================================
<?php
// Prefer the posting date set by the message file; otherwise use the file's mtime
if (isset($last_modified)) {
    header("Last-Modified: $last_modified");
} else {
    // HTTP dates must be in GMT, so use gmdate() rather than date("r")
    header("Last-Modified: " . gmdate("D, d M Y H:i:s", filemtime($_SERVER['SCRIPT_FILENAME'])) . " GMT");
}
// Other stuff here
?>
<HTML>
<HEAD>
<TITLE><?php echo $subject ?></TITLE>
<META NAME="robots" CONTENT="nofollow, index, archive">
</HEAD>
<BODY>
<!--noindex-->
<!-- HTML code for search form etc. -->
<!--/noindex-->
==============================================================================
And footer.php (minus any footers we might add):
==============================================================================
</BODY>
</HTML>
==============================================================================
In addition, there is an awful lot of HTML comments that mhonarc has
added:
<!--X-Head-Body-Sep-End-->
<!--X-Body-of-Message-->
As examples. These seem somewhat extraneous and could be removed for ease
of reading and disk space/bandwidth usage reduction.
Oh, and on the current version the noindex tags seem to be in the wrong
places. On the index/thread pages for example, they should enclose all
the hyperlinks. The noindex tags do not stop links being followed, just
the text within them from being included in the index.
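
The distinction can be sketched in a few lines of Python: text between the noindex markers is dropped before indexing, but links are still collected from the whole page. This is only a model of the behaviour described above, not the indexer's actual code, and the function name `split_for_indexer` is hypothetical.

```python
import re

# Everything between the markers is excluded from the indexed text.
NOINDEX = re.compile(r"<!--noindex-->.*?<!--/noindex-->", re.DOTALL | re.IGNORECASE)
HREF = re.compile(r'href="([^"]+)"', re.IGNORECASE)
TAG = re.compile(r"<[^>]+>")


def split_for_indexer(html: str):
    """Return (indexable_text, links) for one page."""
    # Links are gathered from the full page, including noindex regions.
    links = HREF.findall(html)
    # Text is taken only from what remains once noindex regions are removed.
    visible = NOINDEX.sub(" ", html)
    text = " ".join(TAG.sub(" ", visible).split())
    return text, links
```

So wrapping the navigation links in the markers keeps "Prev by Date" and similar labels out of the search results without stopping the crawler from reaching the linked messages.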
Regards, Dave.
* Re: Postgresql.org search engine.
@ 2004-02-02 14:11 Marc G. Fournier <[email protected]>
parent: Dave Page <[email protected]>
1 sibling, 1 reply; 22+ messages in thread
From: Marc G. Fournier @ 2004-02-02 14:11 UTC (permalink / raw)
To: Dave Page <[email protected]>; +Cc: Marc G. Fournier <[email protected]>; Oleg Bartunov <[email protected]>; Josh Berkus <[email protected]>; pgsql-www
On Mon, 2 Feb 2004, Dave Page wrote:
>
>
> > -----Original Message-----
> > From: Marc G. Fournier [mailto:[email protected]]
> > Sent: 01 February 2004 22:12
> > To: Oleg Bartunov
> > Cc: Marc G. Fournier; Josh Berkus; Dave Page; [email protected]
> > Subject: Re: [pgsql-www] Postgresql.org search engine.
> >
> > And let me know if the headers look right to you ... I took
> > out the Cache-control stuff ...
> >
> > Let me know if there is anything else you'd like to see in there ...
>
> It looks even more complex to me now - there are what, 6 include files?
>
> How about something more simple:
if you can figure out how to do it in the .resource file, please let me
know ... I've strip'd out everything that I believe can be done without
making the .resource file itself majorly confusing ...
> In addition, there is an awful lot of HTML comments that mhonarc has
> added:
>
> <!--X-Head-Body-Sep-End-->
> <!--X-Body-of-Message-->
Nothing I can do about these, there are no configuration directives that
I've found to strip those ...
> Oh, and on the current version the noindex tags seem to be in the wrong
> places. On the index/thread pages for example, they should enclose all
> the hyperlinks. The noindex tags do not stop links being followed, just
> the text within them from being included in the index.
The index/thread pages all have noindex,follow set in the META TAG ...
isn't that what that META TAG is supposed to be for?
----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: [email protected] Yahoo!: yscrappy ICQ: 7615664
* Re: Postgresql.org search engine.
@ 2004-02-02 14:25 Marc G. Fournier <[email protected]>
parent: Marc G. Fournier <[email protected]>
0 siblings, 0 replies; 22+ messages in thread
From: Marc G. Fournier @ 2004-02-02 14:25 UTC (permalink / raw)
To: Marc G. Fournier <[email protected]>; +Cc: Dave Page <[email protected]>; Oleg Bartunov <[email protected]>; Josh Berkus <[email protected]>; pgsql-www
I just cleaned up the <HEAD></HEAD> section of the message layout, so that
shrinks the msg*.php files by a few more lines ...
On Mon, 2 Feb 2004, Marc G. Fournier wrote:
> On Mon, 2 Feb 2004, Dave Page wrote:
>
> >
> >
> > > -----Original Message-----
> > > From: Marc G. Fournier [mailto:[email protected]]
> > > Sent: 01 February 2004 22:12
> > > To: Oleg Bartunov
> > > Cc: Marc G. Fournier; Josh Berkus; Dave Page; [email protected]
> > > Subject: Re: [pgsql-www] Postgresql.org search engine.
> > >
> > > And let me know if the headers look right to you ... I took
> > > out the Cache-control stuff ...
> > >
> > > Let me know if there is anything else you'd like to see in there ...
> >
> > It looks even more complex to me now - there are what, 6 include files?
> >
> > How about something more simple:
>
> if you can figure out how to do it in the .resource file, please let me
> know ... I've strip'd out everything that I believe can be done without
> making the .resource file itself majorly confusing ...
>
> > In addition, there is an awful lot of HTML comments that mhonarc has
> > added:
> >
> > <!--X-Head-Body-Sep-End-->
> > <!--X-Body-of-Message-->
>
> Nothing I can do about these, there are no configuration directives that
> I've found to strip those ...
>
> > Oh, and on the current version the noindex tags seem to be in the wrong
> > places. On the index/thread pages for example, they should enclose all
> > the hyperlinks. The noindex tags do not stop links being followed, just
> > the text within them from being included in the index.
>
> The index/thread pages all have noindex,follow set in the META TAG ...
> isn't that what that META TAG is supposed to be for?
>
> ----
> Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
> Email: [email protected] Yahoo!: yscrappy ICQ: 7615664
>
----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: [email protected] Yahoo!: yscrappy ICQ: 7615664
* Re: META Tags on Archives
@ 2004-02-02 14:46 Jeroen Ruigrok/asmodai <[email protected]>
parent: Marc G. Fournier <[email protected]>
1 sibling, 1 reply; 22+ messages in thread
From: Jeroen Ruigrok/asmodai @ 2004-02-02 14:46 UTC (permalink / raw)
To: Marc G. Fournier <[email protected]>; +Cc: Oleg Bartunov <[email protected]>; pgsql-www
-On [20040201 23:43], Marc G. Fournier ([email protected]) wrote:
> <META NAME="robots" CONTENT="nofollow, index, archive">
According to http://www.robotstxt.org/wc/meta-user.html
archive|noarchive does not exist.
Where'd you find it?
--
Jeroen Ruigrok van der Werven <asmodai(at)wxs.nl> / asmodai / kita no mono
PGP fingerprint: 2D92 980E 45FE 2C28 9DB7 9D88 97E6 839B 2EAC 625B
http://www.tendra.org/ | http://diary.in-nomine.org/
The human race is challenged more than ever before to demonstrate our
mastery -- not over nature but of ourselves...
* Re: Postgresql.org search engine.
@ 2004-02-02 14:48 Dave Page <[email protected]>
0 siblings, 1 reply; 22+ messages in thread
From: Dave Page @ 2004-02-02 14:48 UTC (permalink / raw)
To: Marc G. Fournier <[email protected]>; +Cc: Oleg Bartunov <[email protected]>; Josh Berkus <[email protected]>; pgsql-www
> -----Original Message-----
> From: Marc G. Fournier [mailto:[email protected]]
> Sent: 02 February 2004 14:11
> To: Dave Page
> Cc: Marc G. Fournier; Oleg Bartunov; Josh Berkus;
> [email protected]
> Subject: RE: [pgsql-www] Postgresql.org search engine.
>>
> if you can figure out how to do it in the .resource file,
> please let me know ... I've strip'd out everything that I
> believe can be done without making the .resource file itself
> majorly confusing ...
If I make a copy of the directory to play with, how do I re-run mhonarc?
(probably won't be today though, I have a screaming headache and a broken
pbx).
>
> The index/thread pages all have noindex,follow set in the META TAG ...
> isn't that what that META TAG is supposed to be for?
Yes, but then why include the <!--noindex--> tags as well if they are in
the wrong place?
Regards, Dave.
* Re: META Tags on Archives
@ 2004-02-02 14:52 Marc G. Fournier <[email protected]>
parent: Jeroen Ruigrok/asmodai <[email protected]>
0 siblings, 0 replies; 22+ messages in thread
From: Marc G. Fournier @ 2004-02-02 14:52 UTC (permalink / raw)
To: Jeroen Ruigrok/asmodai <[email protected]>; +Cc: Marc G. Fournier <[email protected]>; Oleg Bartunov <[email protected]>; pgsql-www
On Mon, 2 Feb 2004, Jeroen Ruigrok/asmodai wrote:
> -On [20040201 23:43], Marc G. Fournier ([email protected]) wrote:
> > <META NAME="robots" CONTENT="nofollow, index, archive">
>
> According to http://www.robotstxt.org/wc/meta-user.html
> archive|noarchive does not exist.
>
> Where'd you find it?
Actually, that one was in the original .resource file, but a quick search
on google shows:
http://www.bauser.com/websnob/meta/robots.html
and
http://www.google.com/webmasters/faq.html#cached
the funny thing is that this one:
http://www.katpatuka.org/pub/doc/robotexclusion.html
refers to the NOARCHIVE, but points to:
http://www.w3.org/Search/9605-Indexing-Workshop/ReportOutcomes/Spidering.txt
which doesn't include it ...
I love standards that everyone follows *roll eyes*
I'm gathering it's something that some use (Google does, apparently), and
some don't ...
----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: [email protected] Yahoo!: yscrappy ICQ: 7615664
* Re: Postgresql.org search engine.
@ 2004-02-02 14:56 Marc G. Fournier <[email protected]>
parent: Dave Page <[email protected]>
0 siblings, 0 replies; 22+ messages in thread
From: Marc G. Fournier @ 2004-02-02 14:56 UTC (permalink / raw)
To: Dave Page <[email protected]>; +Cc: Marc G. Fournier <[email protected]>; Oleg Bartunov <[email protected]>; Josh Berkus <[email protected]>; pgsql-www
On Mon, 2 Feb 2004, Dave Page wrote:
>
>
> > -----Original Message-----
> > From: Marc G. Fournier [mailto:[email protected]]
> > Sent: 02 February 2004 14:11
> > To: Dave Page
> > Cc: Marc G. Fournier; Oleg Bartunov; Josh Berkus;
> > [email protected]
> > Subject: RE: [pgsql-www] Postgresql.org search engine.
> >>
> > if you can figure out how to do it in the .resource file,
> > please let me know ... I've strip'd out everything that I
> > believe can be done without making the .resource file itself
> > majorly confusing ...
>
> If I make a copy of the directory to play with, how do I re-run mhonarc?
> (probably won't be today though, I have a screaming headache and a broken
> pbx).
there is a mk-mhonarc script in the directory that you can run ...
> > The index/thread pages all have noindex,follow set in the META TAG ...
> > isn't that what that META TAG is supposed to be for?
>
> Yes, but then why include the <!--noindex--> tags as well if they are in
> the wrong place?
removed from the index page(s) ... in fact, I hadn't even put them into
the thread index pages ...
----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: [email protected] Yahoo!: yscrappy ICQ: 7615664
* Re: Postgresql.org search engine.
@ 2004-02-02 15:38 Marc G. Fournier <[email protected]>
parent: Dave Page <[email protected]>
1 sibling, 0 replies; 22+ messages in thread
From: Marc G. Fournier @ 2004-02-02 15:38 UTC (permalink / raw)
To: Dave Page <[email protected]>; +Cc: Marc G. Fournier <[email protected]>; Oleg Bartunov <[email protected]>; Josh Berkus <[email protected]>; pgsql-www
On Mon, 2 Feb 2004, Dave Page wrote:
> In addition, there is an awful lot of HTML comments that mhonarc has
> added:
>
> <!--X-Head-Body-Sep-End-->
> <!--X-Body-of-Message-->
>
> As examples. These seem somewhat extraneous and could be removed for ease
> of reading and disk space/bandwidth usage reduction.
I've put a note out to the mhonarc list to see if there is something I'm
missing in the docs that allows one to turn those off ... it seems to add
about 1k worth of data to each file, which, when dealing with 100's of
thousands of messages, is a fair amount of disk space ...
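
To put a number on that, a small Python sketch that measures the marker comments in one page and scales the overhead up to a whole archive. The function names are hypothetical, and the 300,000-message figure in the usage note is an assumed value standing in for "100's of thousands".

```python
import re

# mhonarc's marker comments all start with "<!--X-".
MARKER = re.compile(r"<!--X-.*?-->", re.DOTALL)


def marker_overhead(html: str) -> int:
    """Return the number of characters one page spends on marker comments."""
    return sum(len(match) for match in MARKER.findall(html))


def archive_overhead_mib(bytes_per_file: int, file_count: int) -> float:
    """Scale a per-file overhead up to a whole archive, in MiB."""
    return bytes_per_file * file_count / (1024 * 1024)
```

At 1 KiB per file, an assumed 300,000 messages would carry roughly 293 MiB of marker comments (`archive_overhead_mib(1024, 300_000)`).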
----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: [email protected] Yahoo!: yscrappy ICQ: 7615664
* Re: Postgresql.org search engine.
@ 2004-02-04 12:07 Dave Page <[email protected]>
0 siblings, 1 reply; 22+ messages in thread
From: Dave Page @ 2004-02-04 12:07 UTC (permalink / raw)
To: Marc G. Fournier <[email protected]>; +Cc: Oleg Bartunov <[email protected]>; Josh Berkus <[email protected]>; pgsql-www
> -----Original Message-----
> From: Marc G. Fournier [mailto:[email protected]]
> Sent: 02 February 2004 14:11
> To: Dave Page
> Cc: Marc G. Fournier; Oleg Bartunov; Josh Berkus;
> [email protected]
> Subject: RE: [pgsql-www] Postgresql.org search engine.
>
> > How about something more simple:
>
> if you can figure out how to do it in the .resource file,
> please let me know ... I've strip'd out everything that I
> believe can be done without making the .resource file itself
> majorly confusing ...
OK, well frankly mhonarc looks like a nightmare to setup. I've had a
play with hypermail instead. I realise that I've yet to drop in your
search engine detection code and there is still work to be done, but how
does this look:
http://archives.postgresql.org/dave/pgsql-advocacy/
It's all pretty self contained at the moment - feel free to have a play
with it.
(mk-hypermail to rebuild, you may need to clear old files first if you
make drastic changes).
Regards, Dave
* Re: Postgresql.org search engine.
@ 2004-02-04 14:11 Marc G. Fournier <[email protected]>
parent: Dave Page <[email protected]>
0 siblings, 0 replies; 22+ messages in thread
From: Marc G. Fournier @ 2004-02-04 14:11 UTC (permalink / raw)
To: Dave Page <[email protected]>; +Cc: Marc G. Fournier <[email protected]>; Oleg Bartunov <[email protected]>; Josh Berkus <[email protected]>; pgsql-www
On Wed, 4 Feb 2004, Dave Page wrote:
>
>
> > -----Original Message-----
> > From: Marc G. Fournier [mailto:[email protected]]
> > Sent: 02 February 2004 14:11
> > To: Dave Page
> > Cc: Marc G. Fournier; Oleg Bartunov; Josh Berkus;
> > [email protected]
> > Subject: RE: [pgsql-www] Postgresql.org search engine.
> >
> > > How about something more simple:
> >
> > if you can figure out how to do it in the .resource file,
> > please let me know ... I've strip'd out everything that I
> > believe can be done without making the .resource file itself
> > majorly confusing ...
>
> OK, well frankly mhonarc looks like a nightmare to setup.
Actually, it's quite easy once you read through the docs ... there are
formats in the .resource file for the Date Index, Thread Index and Message
Page ... and each of those is broken down into sub-sections ... best place
to start is:
http://www.mhonarc.org/MHonArc/doc/layout.html
and then look at each subsection as you need to modify it ...
> I've had a
> play with hypermail instead. I realise that I've yet to drop in your
> search engine detection code and there is still work to be done, but how
> does this look:
>
> http://archives.postgresql.org/dave/pgsql-advocacy/
>
> It's all pretty self contained at the moment - feel free to have a play
> with it.
k, first thing that is missing is the last-modified date isn't set right,
which makes it a no go option ... looking at the hypermail.conf file, it
more reminds me of setting up a web stats program than a list archiver ...
options are either ... you can add footers and headers, but scanning
through the docs that it points to, there doesn't seem to be any way of
adding a Last-Modified header, since they don't even seem to define
VARIABLES (ie. $DATE$ for date of posting) that you can use when
generating the archives ...
let me know if I've missed something .. ?
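For reference, the value such a header would need can be derived from the message's own Date header. This is a minimal sketch in Python of that one step only, not anything hypermail or mhonarc does: the function name is hypothetical, and it assumes the Date header carries an explicit UTC offset. How the value would actually get into the generated page is left open.

```python
from datetime import timezone
from email.utils import format_datetime, parsedate_to_datetime


def last_modified_value(date_header: str) -> str:
    """Convert an RFC 2822 Date header into an HTTP-style GMT date string.

    Assumption: date_header includes an explicit offset such as -0400.
    Without one, the parsed time is naive and would be read as local time.
    """
    posted = parsedate_to_datetime(date_header)
    # HTTP dates are always expressed in GMT, so normalise to UTC first.
    return format_datetime(posted.astimezone(timezone.utc), usegmt=True)


# Hypothetical posting date, written with a -0400 offset.
example = last_modified_value("Wed, 4 Feb 2004 10:11:00 -0400")
print("Last-Modified:", example)
```

Taking the date from the message rather than from the time the archive was rebuilt means the value stays the same across regenerations, which is the point of setting it at all.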
----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: [email protected] Yahoo!: yscrappy ICQ: 7615664
* Re: Postgresql.org search engine.
@ 2004-02-04 19:34 Dave Page <[email protected]>
0 siblings, 0 replies; 22+ messages in thread
From: Dave Page @ 2004-02-04 19:34 UTC (permalink / raw)
To: Marc G. Fournier <[email protected]>; +Cc: Oleg Bartunov <[email protected]>; Josh Berkus <[email protected]>; pgsql-www
> -----Original Message-----
> From: Marc G. Fournier [mailto:[email protected]]
> Sent: 04 February 2004 14:12
> To: Dave Page
> Cc: Marc G. Fournier; Oleg Bartunov; Josh Berkus;
> [email protected]
> Subject: RE: [pgsql-www] Postgresql.org search engine.
>
>
> k, first thing that is missing is the last-modified date
> isn't set right, which makes it a no go option ... looking at
> the hypermail.conf file, it more reminds me of setting up a
> web stats program than a list archiver ...
I thought the only archiver you knew was mhonarc? Not the biggest frame
of reference :-)
> options are either ... you can add footers and headers, but
> scanning through the docs that it points to, there doesn't
> seem to be any way of adding a Last-Modified header, since
> they don't even seem to define VARIABLES (ie. $DATE$ for date
> of posting) that you can use when generating the archives ...
>
> let me know if I've missed something .. ?
I was looking at this after I posted my last message. Hypermail supports
HTML templates which may be used instead of headers and footers. These
do have variables, however I couldn't see one for posting date :-(
Probably not the hardest mod in the world to add it to the program, but
unfortunately I just started a new module at Uni so am somewhat short of
spare time again...
Regards, Dave.
Thread overview: 22+ messages
2004-01-30 20:53 Re: Postgresql.org search engine. Dave Page <[email protected]>
2004-01-30 21:01 ` Marc G. Fournier <[email protected]>
2004-01-31 05:47 ` Oleg Bartunov <[email protected]>
2004-01-31 05:49 ` Marc G. Fournier <[email protected]>
2004-01-31 06:01 ` Josh Berkus <[email protected]>
2004-01-31 06:14 ` Marc G. Fournier <[email protected]>
2004-01-31 12:45 ` Oleg Bartunov <[email protected]>
2004-02-01 22:12 ` Marc G. Fournier <[email protected]>
2004-02-02 12:57 ` Oleg Bartunov <[email protected]>
2004-02-01 22:34 ` META Tags on Archives Marc G. Fournier <[email protected]>
2004-02-02 13:02 ` Re: META Tags on Archives Oleg Bartunov <[email protected]>
2004-02-02 14:46 ` Re: META Tags on Archives Jeroen Ruigrok/asmodai <[email protected]>
2004-02-02 14:52 ` Re: META Tags on Archives Marc G. Fournier <[email protected]>
2004-02-02 13:39 Re: Postgresql.org search engine. Dave Page <[email protected]>
2004-02-02 14:11 ` Marc G. Fournier <[email protected]>
2004-02-02 14:25 ` Marc G. Fournier <[email protected]>
2004-02-02 15:38 ` Marc G. Fournier <[email protected]>
2004-02-02 14:48 Re: Postgresql.org search engine. Dave Page <[email protected]>
2004-02-02 14:56 ` Marc G. Fournier <[email protected]>
2004-02-04 12:07 Re: Postgresql.org search engine. Dave Page <[email protected]>
2004-02-04 14:11 ` Marc G. Fournier <[email protected]>
2004-02-04 19:34 Re: Postgresql.org search engine. Dave Page <[email protected]>