Date: Sat, 14 Jan 2006 02:51:35 +0000
From: Guido Barosio
To: "Marc G. Fournier"
Cc: Josh Berkus, John Hansen, pgsql-www@postgresql.org, "Jim C. Nasby"
Subject: Re: Infrastructure monitoring

Actually, it seems to be as easy as sending a GET request to search.postgresql.org.
If a script checks the HTTP status codes, then alerts could be triggered when something goes wrong.
 
The search failure was due to the server returning a 503 (Service Unavailable) error.
 
 
GET http://search.postgresql.org

Am I wrong?
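A minimal sketch of such a check, assuming Python 3 and its standard-library `urllib.request`. The function names `check_url` and `needs_alert`, the 10-second timeout, and the "alert on anything but 2xx" policy are illustrative assumptions, not an existing script on the postgresql.org servers:

```python
import urllib.error
import urllib.request


def check_url(url, timeout=10.0):
    """Send a GET to url; return the HTTP status code, or None if unreachable."""
    try:
        # urlopen follows redirects, so this is the status of the final response.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (such as the 503 from search) raise HTTPError,
        # which still carries the status code.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, etc.
        return None


def needs_alert(status):
    """Hypothetical policy: alert on no response or any non-2xx status."""
    return status is None or not (200 <= status < 300)
```

For example, `needs_alert(check_url("http://search.postgresql.org"))` would have returned `True` during the outage. Run from cron, a wrapper that prints a line only when an alert is needed would have its output mailed to whoever receives cron's email.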
 
Though, thinking about content monitoring, there is an open-source tool that does a job similar to siteconf (http://www.siteconfidence.com), but I can't remember its name at the moment.
 
But I understand that the search problem was not a *content* problem itself.
 
G.-

 
On 1/14/06, Marc G. Fournier <scrappy@postgresql.org> wrote:
On Fri, 13 Jan 2006, Josh Berkus wrote:

> Jim,
>
>> Search has been down for at least 2 days now, and this certainly isn't
>> the first time it's happened. There have also been cases of archives
>> getting stuck, and probably other outages besides those that went on
>> until someone emailed about it.
>>
>> Would it be difficult to set up something to monitor these various
>> services? I know there's at least one OSS tool to do it, though I have
>> no idea how hard it would be to tie that into the current
>> infrastructure.
>
> We have an open offer of Hyperic licenses, and they support FreeBSD now.

Not to discount the offer ... but what exactly would that provide us?  We
already monitor the *servers*; it's what is inside of the servers that
needs better monitoring ... knowing nothing about Hyperic, does it
provide something for that?

In the case of the archives, for instance, the problem was a Perl process
that for some unknown reason got stuck randomly ... I removed that in favor
of an awk script, and it hasn't happened since ... I also redirected cron's
email to scrappy@postgresql.org, so that any errors show up in my mailbox
instead of root's, so I get an hourly reminder that things are running well
...

In the case of search ... John would be better at answering that, but when
he and I talked this past week, he mentioned that he was moving it all
over to two new servers, which I changed the DNS for on Wednesday ...

As I've said above ... physical servers are being monitored, so if anyone
has some ideas on how we can improve "content monitoring", for lack of a
better word, I know I'm all ears ...

Again, if Hyperic can offer something for this, let me know ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster



--
/"\   ASCII Ribbon Campaign  .
\ / - NO HTML/RTF in e-mail  .
 X  - NO Word docs in e-mail .
/ \ -----------------------------------------------------------------