From: "Dave Page"
Subject: Re: [pgadmin-hackers] developer.pgadmin.org/nagios.pgadmin.org - Diskfailure
Date: Sat, 13 May 2006 20:29:19 +0100
Message-ID: <001501c676c3$891ddd89$6a01a8c0@valehousing.co.uk>

-----Original Message-----
From: "Raphaël Enrici"
Sent: 13/05/06 12:45:59
To: "Dave Page"
Cc: "Jeff MacDonald", "pgadmin-hackers@postgresql.org", "PostgreSQL WWW"
Subject: Re:
[pgadmin-hackers] [pgsql-www] developer.pgadmin.org/nagios.pgadmin.org - Diskfailure

Hi Raph,

> I recently (2 months ago) experienced kernel crash with reiserfs after
> some electrical failure. I solved the problem by doing a full fsck (I
> mean fsck and then a reiserfs rebuild of the tree [dangerous]). It
> worked, at least for me.

Thanks - I'm leaning towards the memory issue atm as it seems to be OK again following a reboot, and the svn repo which previously wouldn't tar or rsync now verifies perfectly and can be tarred up.

I'll swap the sticks on Monday, and if that doesn't work, then consider a 'full fsck'. If that fails, I guess I'll just move it into the new chassis, and use scp backup to another box until the new scsi cable arrives.

Cheers, Dave

-----Unmodified Original Message-----

Dave Page wrote:
>
>> -----Original Message-----
>> From: Jeff MacDonald [mailto:jam@zoidtechnologies.com]
>> Sent: 12 May 2006 23:19
>> To: Dave Page
>> Cc: Jeff MacDonald
>> Subject: Re: [pgsql-www]
>> developer.pgadmin.org/nagios.pgadmin.org - Diskfailure
>>
>> On Fri, 2006-05-12 at 22:47 +0100, Dave Page wrote:
>>
>>> The machine hosting the developer.pgadmin.org and nagios.pgadmin.org
>>> vservers is currently having serious filesystem problems, which are
>>> causing disk intensive operations (like rsync, tar) to segfault for
>>> currently unknown reasons.
>>
>> do a memory test, swap as needed, see if that solves the
>> problem..
>
> I'll try just replacing it - I have some unopened sticks for that mobo.
> FWIW, a reboot with a forced fsck found no errors at all and the box is
> currently working OK, but I have now found errors similar to the
> following:
>
> May 12 21:11:29 barbas rsyncd[32134]: rsync: writefd_unbuffered failed to write 4 bytes: phase "send_file_entry" [sender]: Broken pipe (32)
> May 12 21:11:29 barbas rsyncd[32134]: rsync error: error in rsync protocol data stream (code 12) at io.c(1126) [sender]
> May 12 22:13:52 barbas kernel: kernel BUG at page_alloc.c:142!
> May 12 22:13:52 barbas kernel: invalid operand: 0000
> May 12 22:13:52 barbas kernel: CPU: 1
> May 12 22:13:52 barbas kernel: EIP: 0010:[] Not tainted
> May 12 22:13:52 barbas kernel: EFLAGS: 00010286
> May 12 22:13:52 barbas kernel: eax: d9e18100 ebx: c262c140 ecx: c262c140 edx: 00000000
> May 12 22:13:52 barbas kernel: esi: c262c140 edi: 00000000 ebp: 00000000 esp: d50d5edc
> May 12 22:13:52 barbas kernel: ds: 0018 es: 0018 ss: 0018
> May 12 22:13:52 barbas kernel: Process rsync (pid: 32141, stackpage=d50d5000)
> May 12 22:13:52 barbas kernel: Stack: d50d5ee8 c0133ab0 00001000 c262c140 e3a59d44 00006000 c01348e9 00000000
> May 12 22:13:52 barbas kernel: 00000000 00001000 c262c140 e3a59d44 00000000 c013423d d50d5f7c c262c140
> May 12 22:13:52 barbas kernel: 00000000 00001000 00001000 00000001 00000000 0000013b e3a59c80 c01347f0
> May 12 22:13:52 barbas kernel: Call Trace: [] [] [] [] []
> May 12 22:13:52 barbas kernel: [] [] [] []
> May 12 22:13:52 barbas kernel:
> May 12 22:13:52 barbas kernel: Code: 0f 0b 8e 00 6b ba 37 c0 e9 ba fd ff ff 8b 69 60 85 ed 0f 85

Dave,

I recently (2 months ago) experienced kernel crash with reiserfs after some electrical failure. I solved the problem by doing a full fsck (I mean fsck and then a reiserfs rebuild of the tree [dangerous]). It worked, at least for me.

Regards,
Raphaël