Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtp (Exim 4.80) (envelope-from ) id 1YkS3o-00041O-6D for pgsql-hackers@arkaria.postgresql.org; Tue, 21 Apr 2015 06:53:32 +0000 Received: from localhost ([127.0.0.1] helo=postgresql.org) by malur.postgresql.org with smtp (Exim 4.80) (envelope-from ) id 1YkS3n-00074z-EH for pgsql-hackers@arkaria.postgresql.org; Tue, 21 Apr 2015 06:53:31 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.2:DHE_RSA_AES_256_CBC_SHA256:256) (Exim 4.80) (envelope-from ) id 1YkS3k-00074n-Th for pgsql-hackers@postgresql.org; Tue, 21 Apr 2015 06:53:29 +0000 Received: from mail-qg0-x229.google.com ([2607:f8b0:400d:c04::229]) by magus.postgresql.org with esmtps (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.80) (envelope-from ) id 1YkS3g-0001zR-Ox for pgsql-hackers@postgresql.org; Tue, 21 Apr 2015 06:53:27 +0000 Received: by qgfi89 with SMTP id i89so64220656qgf.1 for ; Mon, 20 Apr 2015 23:53:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=9nz4Yoh1z7fjdXO9iKe551c1cxElN3TqN/Ka+72x9PM=; b=cSf9S/BjFinQCN3uSTi5JZKKNVRRtNS9ThOOO6pR+w7RDMNYq0YlgqiZnaQOTo5TTe RHILCTxBk+FKqXiIWhzed0aWRHv1abzEGaWBNmGoTjleV11+qBWXy/9+5nX1qO64Ab0M p7a+GdHfKqJYh8v/BWVh0ugIuEuOumcy0aDXxiBzGMppBU8g2gvCYJixmOPLAN659gYd nTliGSQ1AvHrPPjcaTg5WeWe62mpSojL32y6qAKgxzNiYDFFMyBIOaRt+5fNjHQRrVV2 wk1+2Cqoe05obOpqKoOFx/d9ISiaFZFwIjUm2Vy2nIR1gztEpJ4QUTfFELkPcVuWSr6I daMQ== MIME-Version: 1.0 X-Received: by 10.140.232.3 with SMTP id d3mr22503961qhc.46.1429599202927; Mon, 20 Apr 2015 23:53:22 -0700 (PDT) Received: by 10.140.97.69 with HTTP; Mon, 20 Apr 2015 23:53:22 -0700 (PDT) In-Reply-To: <552FA38F.9060005@iki.fi> References: <548AF1CB.80702@vmware.com> <689EB259-44C2-4820-B901-4F6B1C55A1E4@simply.name> <549083D6.1000301@vmware.com> <54949108.3030109@vmware.com> <552FA38F.9060005@iki.fi> Date: Tue, 21 Apr 2015 15:53:22 +0900 Message-ID: Subject: Re: Streaming replication and WAL archive interactions From: Michael Paquier To: hlinnaka@iki.fi Cc: Venkata Balaji N , Andres Freund , Fujii Masao , Borodin Vladimir , PostgreSQL-development Content-Type: multipart/alternative; boundary=001a11c128b0a96a4f0514368098 X-Pg-Spam-Score: -2.0 (--) List-Archive: List-Help: List-ID: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Mailing-List: pgsql-hackers Precedence: bulk Sender: pgsql-hackers-owner@postgresql.org --001a11c128b0a96a4f0514368098 Content-Type: text/plain; charset=ISO-8859-1 On Thu, Apr 16, 2015 at 8:57 PM, Heikki Linnakangas wrote: > Oh, hang on, that's not necessarily true. On promotion, the standby archives > the last, partial WAL segment from the old timeline. That's just wrong > (http://www.postgresql.org/message-id/52FCD37C.3070806@vmware.com), and in > fact I somehow thought I changed that already, but apparently not. So let's > stop doing that. Er. Are you planning to prevent the standby from archiving the last partial segment from the old timeline at promotion? I thought from previous discussions that we should do it as master (be it crashed, burned, burried or dead) may not have the occasion to do it. By preventing its archiving you close the door to the case where master did not have the occasion to archive it. +/* */ +static char primary_last_archived[MAX_XFN_CHARS + 1]; This is visibly missing a comment. As primary_last_archived is used only by ProcessArchivalReport(), wouldn't it be better to pass it as argument to this function? + /* Check that the filename the primary reported looks valid */ + if (strlen(primary_last_archived) < 24 || + strspn(primary_last_archived, "0123456789ABCDEF") != 24) + return; Not related to this patch, but we had better have a macro doing this job I think... It keeps spreading around. People may be surprised that a base backup taken from a node that has archive_mode = on set (that's the case in a very large number of cases) will not be able to work as-is as node startup will fail as follows: FATAL: archive_mode='on' cannot be used in archive recovery HINT: Use 'shared' or 'always' mode instead. One idea would be to simply ignore the fact that archive_mode = on on nodes in recovery instead of dropping an error. Note that I like the fact that it drops an error as that's clear, I just point the fact that people may be surprised that base backups are not working anymore now in this case. Are both WalSndArchivalReport() and WalSndArchivalReportIfNecessary() really necessary? I think that for simplicity you could merge them and use last_archival_report as a local variable. Creating a dependency between the pgstat machinery and the WAL sender looks weak to me. For example with this patch a master cannot stop, as it waits indefinitely: LOG: using stale statistics instead of current ones because stats collector is not responding LOG: sending archival report: You could scan archive_status/ but that would be costly if there are many entries to scan and I think that walsender should be highly responsive. Or you could directly store the name of the lastly archived WAL segment marked as .done in let's say archive_status/last_archived. An entry for that in the control file does not seem the right place as a node may not have archive_mode enabled that's why I am not mentioning it. Regards, -- Michael --001a11c128b0a96a4f0514368098 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable


On Thu, Apr 16, 2015 at 8:57 PM, Heikki Linnakanga= s wrote:
> Oh, hang on, that's not necessarily true. On promotion= , the standby archives
> the last, partial WAL segment from the old t= imeline. That's just wrong
> (http://www.postgresql.org/message= -id/52FCD37C.3070806@vmware.com), and in
> fact I somehow thought= I changed that already, but apparently not. So let's
> stop doin= g that.

Er. Are you planning to prevent the standby from archiving t= he last partial segment from the old timeline at promotion? I thought from = previous discussions that we should do it as master (be it crashed, burned,= burried or dead) may not have the occasion to do it. By preventing its arc= hiving you close the door to the case where master did not have the occasio= n to archive it.

+/* */
+static char primary_last_archived[MAX_XF= N_CHARS + 1];
This is visibly missing a comment.

As primary_last_= archived is used only by ProcessArchivalReport(), wouldn't it be better= to pass it as argument to this function?

+ =A0 =A0 =A0 /* Check tha= t the filename the primary reported looks valid */
+ =A0 =A0 =A0 if (str= len(primary_last_archived) < 24 ||
+ =A0 =A0 =A0 =A0 =A0 =A0 =A0 strs= pn(primary_last_archived, "0123456789ABCDEF") !=3D 24)
+ =A0 = =A0 =A0 =A0 =A0 =A0 =A0 return;
Not related to this patch, but we had be= tter have a macro doing this job I think... It keeps spreading around.
<= br>People may be surprised that a base backup taken from a node that has ar= chive_mode =3D on set (that's the case in a very large number of cases)= will not be able to work as-is as node startup will fail as follows:
FA= TAL: =A0archive_mode=3D'on' cannot be used in archive recovery
H= INT: =A0Use 'shared' or 'always' mode instead.
One idea = would be to simply ignore the fact that archive_mode =3D on on nodes in rec= overy instead of dropping an error. Note that I like the fact that it drops= an error as that's clear, I just point the fact that people may be sur= prised that base backups are not working anymore now in this case.

A= re both WalSndArchivalReport() and WalSndArchivalReportIfNecessary() really= necessary? I think that for simplicity you could merge them and use last_a= rchival_report as a local variable.

Creating a dependency between th= e pgstat machinery and the WAL sender looks weak to me. For example with th= is patch a master cannot stop, as it waits indefinitely:
LOG: =A0using s= tale statistics instead of current ones because stats collector is not resp= onding
LOG: =A0sending archival report:
You could scan archive_statu= s/ but that would be costly if there are many entries to scan and I think t= hat walsender should be highly responsive. Or you could directly store the = name of the lastly archived WAL segment marked as .done in let's say ar= chive_status/last_archived. An entry for that in the control file does not = seem the right place as a node may not have archive_mode enabled that's= why I am not mentioning it.

Regards,
--
Michael
--001a11c128b0a96a4f0514368098--