Subject: Re: Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL
From: Simon Riggs
To: Heikki Linnakangas
Cc: Tom Lane, Fujii Masao, Aidan Van Dyk, PostgreSQL-development
Date: Thu, 25 Mar 2010 19:48:06 +0000
Message-Id: <1269546486.3684.106.camel@ebony>
In-Reply-To: <4BAB3A51.5050707@enterprisedb.com>

On Thu, 2010-03-25 at 12:26 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Thu, 2010-03-25 at 10:11 +0200, Heikki Linnakangas wrote:
> >
> >> PANIC seems like the appropriate solution for now.
> >
> > It definitely is not. Think some more.
>
> Well, what happens now in previous versions with pg_standby et al is
> that the standby starts up. That doesn't seem appropriate either.

Agreed. I said that also, immediately upthread.

Bottom line is I am against anyone being allowed to PANIC the server just
because their piece of it ain't working. The whole purpose of all of
this is High Availability and we don't get that if everybody keeps
stopping for a tea break every time things get tricky. Staying up when
problems occur is the only way to avoid a falling domino taking out the
whole farm.

> I'm worried that the administrator won't notice the error promptly
> because at a quick glance the server is up and running, while it's
> actually stuck at the error and falling indefinitely behind the master.
> Maybe if we make it a WARNING, that's enough to alleviate that. It's
> true that if the standby is actively being used for read-only queries,
> shutting it down to just get the administrators attention isn't good either.

That's what monitoring is for. Let's just make sure this state is
accessible, so people will notice.

--
 Simon Riggs           www.2ndQuadrant.com