splitting pg_resetwal output strings

public inbox for [email protected]  
help / color / mirror / Atom feed

splitting pg_resetwal output strings
8+ messages / 3 participants
[nested] [flat]

* splitting pg_resetwal output strings
@ 2026-01-31 09:41  Álvaro Herrera <[email protected]>
  0 siblings, 1 reply; 8+ messages in thread

From: Álvaro Herrera @ 2026-01-31 09:41 UTC (permalink / raw)
  To: Pg Hackers <[email protected]>

Hello,

For a long time I've been annoyed that certain tools such as (but not
only) pg_resetwal have translatable strings full of whitespace that
translators have to be careful about so that things look nice.
For Spanish I have made the lines a bit wider so that everything fits,
which means that every time someone adds a new line that's not yet
translated, the output looks bad; and if I want to add one more space
to align a new longer string, then I have to edit every single one of
them.  This is horrible.

Here's a proposal.  The idea is to have a separate file (entries.h right
now but proposal for better names are welcome) which lists those
strings, together with the printf specifiers needed to actually print
them.  This way, we can measure the length of each exactly as they
translate before printing anything, and then line up everything to the
same output length.

One curious thing of note is that I had to add an internal_wcswidth()
function, to avoid having to link libpq just to be able to do
pg_wcswidth().

This is not complete.  It modifies PrintControlValues(), which is easy
because I just print each item in the entries.h file in the same order
they appear there.   But for PrintNewControlValues() I'll need to add a
symbolic identifier to each string, that the code can use to scan the
array and print just those elements.  I'll do that if there are no
objections to this idea here.  Also, pg_controldata could probably
something very similar or maybe the same entries.h file.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"If you want to have good ideas, you must have many ideas.  Most of them
will be wrong, and what you have to learn is which ones to throw away."
                                                         (Linus Pauling)

^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: splitting pg_resetwal output strings
@ 2026-01-31 15:08  Álvaro Herrera <[email protected]>
  parent: Álvaro Herrera <[email protected]>
  0 siblings, 2 replies; 8+ messages in thread

From: Álvaro Herrera @ 2026-01-31 15:08 UTC (permalink / raw)
  To: Pg Hackers <[email protected]>

On 2026-Jan-31, Álvaro Herrera wrote:

> This is not complete.  It modifies PrintControlValues(), which is easy
> because I just print each item in the entries.h file in the same order
> they appear there.   But for PrintNewControlValues() I'll need to add a
> symbolic identifier to each string, that the code can use to scan the
> array and print just those elements.  I'll do that if there are no
> objections to this idea here.

It looks more or less like this.  Patch 0001 is the same as before, and
0002 adds the symbolic names, which is used to create an enum and then
to search for the correct lines to print in PrintNewControlValues.

I decided to reuse simple_oid_list to store the integers values, which
is kinda icky, so getting to something committable I think would have me
add simple_int_list.  Also, there's a few blocks that are duplicate
cases of the same line measuring and printing logic; I suppose I should
have a routine to simplify.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Most hackers will be perfectly comfortable conceptualizing users as entropy
 sources, so let's move on."                               (Nathaniel Smith)
      https://mail.gnu.org/archive/html/monotone-devel/2007-01/msg00080.html

^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: splitting pg_resetwal output strings
@ 2026-02-02 03:58  Chao Li <[email protected]>
  parent: Álvaro Herrera <[email protected]>
  1 sibling, 0 replies; 8+ messages in thread

From: Chao Li @ 2026-02-02 03:58 UTC (permalink / raw)
  To: Álvaro Herrera <[email protected]>; +Cc: Pg Hackers <[email protected]>



> On Jan 31, 2026, at 23:08, Álvaro Herrera <[email protected]> wrote:
> 
> On 2026-Jan-31, Álvaro Herrera wrote:
> 
>> This is not complete.  It modifies PrintControlValues(), which is easy
>> because I just print each item in the entries.h file in the same order
>> they appear there.   But for PrintNewControlValues() I'll need to add a
>> symbolic identifier to each string, that the code can use to scan the
>> array and print just those elements.  I'll do that if there are no
>> objections to this idea here.
> 
> It looks more or less like this.  Patch 0001 is the same as before, and
> 0002 adds the symbolic names, which is used to create an enum and then
> to search for the correct lines to print in PrintNewControlValues.
> 
> I decided to reuse simple_oid_list to store the integers values, which
> is kinda icky, so getting to something committable I think would have me
> add simple_int_list.  Also, there's a few blocks that are duplicate
> cases of the same line measuring and printing logic; I suppose I should
> have a routine to simplify.
> 

Hi Alvaro,

I don't think we necessarily need to add a simple_int_list. Since OIDs are not globally unique across all object types, we can reasonably treat ControlData items as "objects" and assign them OIDs within this context.

If we take that approach, the enum could be named ControlDataOid:
```
/* Define the string enums */
#define CONTROLDATA_LINE(symbol, description, fmt, ...) \
symbol,
enum {
#include "entries.h”
} ControlDataOid;
#undef CONTROLDATA_LINE
```

This makes reusing simple_oid_list feel appropriate. If you aren't fond of that idea, I’d suggest naming the enum ControlDataSymbol rather than ControlDataStrings, as the former better reflects what the members actually are.

Regarding patch 0002:
```
+#define CONTROLDATA_LINE(symbol, description, fmt, ...)			\
+	if (simple_oid_list_member(&toprint, symbol))				\
+	{															\
+	thislen = internal_wcswidth(_(description),					\
+								strlen(_(description)),			\
+								encoding);						\
+	if (thislen > maxlen)										\
+		maxlen = thislen;										\
 	}
+#include "entries.h"
+#undef CONTROLDATA_LINE
```
The code block inside the first if statement is missing an indentation level.

Otherwise, the patch looks good to me. I've tested it and confirmed that calculating the padding at runtime generates better-looking output.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/










^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: splitting pg_resetwal output strings
@ 2026-05-23 15:31  Jonathan Gonzalez V. <[email protected]>
  parent: Álvaro Herrera <[email protected]>
  1 sibling, 1 reply; 8+ messages in thread

From: Jonathan Gonzalez V. @ 2026-05-23 15:31 UTC (permalink / raw)
  To: Álvaro Herrera <[email protected]>; Pg Hackers <[email protected]>

Hello!

I've tested the patch, and I found that in the first patch, the nsl.mk
file is changed to match the first argument of CONTROLDATA_LINE as the
string to match:

+GETTEXT_TRIGGERS = $(FRONTEND_COMMON_GETTEXT_TRIGGERS) \
+				  CONTROLDATA_LINE:1

but in the second patch the CONTROLDATA_LINE moves the strings to the
second argument:

-CONTROLDATA_LINE("pg_control version number",
+CONTROLDATA_LINE(CD_CONTROL_VERSION, "pg_control version number",
 				"%u", ControlFile.pg_control_version)

While keeping the argument as it is, I suggest to switch to
CONTROLDATA_LINE:2 like this:

+GETTEXT_TRIGGERS = $(FRONTEND_COMMON_GETTEXT_TRIGGERS) \
+				  CONTROLDATA_LINE:2

This will make `make update-po` take the strings from entries.h, which
is what I was expecting.

During testing, I found a couple of things that may be interesting, or
at least worth keeping in mind for the future implementation already
mentioned [1].

Even if the spacing calculation is correct, I found that the font size
is not the same for different languages, so if a strings or part of it,
is translated into, for example, Hindi, and most of it it's in English,
it will affect how it look in the terminal, meaning that it's probably
that a the second column will be moved to the left generating a
misalignment in the printed data

Regards!

[1]
https://www.postgresql.org/message-id/202509300530.jnwbfyrdxaah%40alvherre.pgsql
[2] https://sw.kovidgoyal.net/kitty/

-- 
Jonathan Gonzalez V. 
EDB: https://www.enterprisedb.com

Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: splitting pg_resetwal output strings
@ 2026-05-24 03:50  Álvaro Herrera <[email protected]>
  parent: Jonathan Gonzalez V. <[email protected]>
  0 siblings, 1 reply; 8+ messages in thread

From: Álvaro Herrera @ 2026-05-24 03:50 UTC (permalink / raw)
  To: Jonathan Gonzalez V. <[email protected]>; +Cc: Pg Hackers <[email protected]>

On 2026-May-23, Jonathan Gonzalez V. wrote:

> Hello!
> 
> I've tested the patch, and I found that in the first patch, the nsl.mk
> file is changed to match the first argument of CONTROLDATA_LINE as the
> string to match:
[...]
> While keeping the argument as it is, I suggest to switch to
> CONTROLDATA_LINE:2 like this:

Yeah, you're correct, this is a bug in 0002.  Thanks!

> During testing, I found a couple of things that may be interesting, or
> at least worth keeping in mind for the future implementation already
> mentioned [1].
> 
> Even if the spacing calculation is correct, I found that the font size
> is not the same for different languages, so if a strings or part of it,
> is translated into, for example, Hindi, and most of it it's in English,
> it will affect how it look in the terminal, meaning that it's probably
> that a the second column will be moved to the left generating a
> misalignment in the printed data

I don't think varying font width is something we should care about.  The
multibyte system has a mechanism to tell us how wide the characters are,
in units of some monospace font.  This is what the pg_wcswidth calls are
there for (or internal_wcslength in the patch).  If the terminal doesn't
align the characters correctly according to the declared width because
different fonts are used, that's not our bug.

The idea behind the patch is to measure the length of all strings after
translation.  If a number of the strings are not translated, then we
still measure them in their final presentation form.  Then we align the
two columns based on the maximum length of all strings.  I don't think
this is any worse than what you get with the original code; if a
translator has chosen a different alignment than what English has, and
some strings are not translated, then the output will appear misaligned.
In fact, with the code it will be better (assuming no font issues),
because we will try to align correctly even the strings that aren't
translated.  In fact, if fonts differ, there's nothing the translator
can do, because there's no way to know what's the user going to do at
runtime -- users are going to have different terminals and different
fonts.

> [2] https://sw.kovidgoyal.net/kitty/

Hmm, I'm not sure what you wanted to show with this URL -- can you
elaborate?

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/

^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: splitting pg_resetwal output strings
@ 2026-05-24 09:48  Jonathan Gonzalez V. <[email protected]>
  parent: Álvaro Herrera <[email protected]>
  0 siblings, 1 reply; 8+ messages in thread

From: Jonathan Gonzalez V. @ 2026-05-24 09:48 UTC (permalink / raw)
  To: Álvaro Herrera <[email protected]>; +Cc: Pg Hackers <[email protected]>

[...]

> I don't think varying font width is something we should care about. 
> The
> multibyte system has a mechanism to tell us how wide the characters
> are,
> in units of some monospace font.  This is what the pg_wcswidth calls
> are
> there for (or internal_wcslength in the patch).  If the terminal
> doesn't
> align the characters correctly according to the declared width
> because
> different fonts are used, that's not our bug.

Exactly, I forgot to clarify that this is to keep in mind if someone
else want to test and reach the same point and had the same questions.

> > [2] https://sw.kovidgoyal.net/kitty/
> 
> Hmm, I'm not sure what you wanted to show with this URL -- can you
> elaborate?

Just an example of a terminal if someone wants to test something
different, but yeah lack of context.

-- 
Jonathan Gonzalez V. 
EDB: https://www.enterprisedb.com

Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: splitting pg_resetwal output strings
@ 2026-05-24 10:10  Jonathan Gonzalez V. <[email protected]>
  parent: Jonathan Gonzalez V. <[email protected]>
  0 siblings, 1 reply; 8+ messages in thread

From: Jonathan Gonzalez V. @ 2026-05-24 10:10 UTC (permalink / raw)
  To: Álvaro Herrera <[email protected]>; +Cc: Pg Hackers <[email protected]>

Hello,

On a second round a notice these two strings:

+               char *str = "First log segment after reset";
+               thislen = internal_wcswidth(_(str), strlen(_(str)),
encoding);
+               if (thislen > maxlen)

and 

+               char *str = "First log segment after reset";
+               thislen = internal_wcswidth(_(str), strlen(_(str)),
encoding);
+               if (thislen > maxlen)

Are never catch by the `make update-po` command, instead, the strings
are there in the `.po.new` file commented.

Should we put them between `_()`? or is expected that it's being catch
by the second line with the `_()` ?

-- 
Jonathan Gonzalez V. 
EDB: https://www.enterprisedb.com

Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: splitting pg_resetwal output strings
@ 2026-05-24 11:35  Álvaro Herrera <[email protected]>
  parent: Jonathan Gonzalez V. <[email protected]>
  0 siblings, 0 replies; 8+ messages in thread

From: Álvaro Herrera @ 2026-05-24 11:35 UTC (permalink / raw)
  To: Jonathan Gonzalez V. <[email protected]>; +Cc: Pg Hackers <[email protected]>

Hi,

On 2026-May-24, Jonathan Gonzalez V. wrote:

> On a second round a notice these two strings:
> 
> +               char *str = "First log segment after reset";
> +               thislen = internal_wcswidth(_(str), strlen(_(str)),
> encoding);
> +               if (thislen > maxlen)
> 
> and [...]

> Are never catch by the `make update-po` command, instead, the strings
> are there in the `.po.new` file commented.
> 
> Should we put them between `_()`? or is expected that it's being catch
> by the second line with the `_()` ?

Hmm, yeah, they need to be wrapped in gettext_noop() so that they are
not immediately translated (because that will be done by the _() call in
the other lines) but are picked up by msgmerge.  (I think the second one
you reference is "NextXID epoch" but it was victim of a copy-paste
failure.)

Thanks again for looking,

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/

^ permalink  raw  reply  [nested|flat] 8+ messages in thread

end of thread, other threads:[~2026-05-24 11:35 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-01-31 09:41 splitting pg_resetwal output strings Álvaro Herrera <[email protected]>
2026-01-31 15:08 ` Álvaro Herrera <[email protected]>
2026-02-02 03:58   ` Chao Li <[email protected]>
2026-05-23 15:31   ` Jonathan Gonzalez V. <[email protected]>
2026-05-24 03:50     ` Álvaro Herrera <[email protected]>
2026-05-24 09:48       ` Jonathan Gonzalez V. <[email protected]>
2026-05-24 10:10         ` Jonathan Gonzalez V. <[email protected]>
2026-05-24 11:35           ` Álvaro Herrera <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox