gen_guc_tables improvements

public inbox for [email protected]  
help / color / mirror / Atom feed

gen_guc_tables improvements
4+ messages / 2 participants
[nested] [flat]

* gen_guc_tables improvements
@ 2026-03-15 12:18 Zsolt Parragi <[email protected]>
  2026-03-15 22:27 ` Re: gen_guc_tables improvements Michael Paquier <[email protected]>
  2026-03-16 05:57 ` Re: gen_guc_tables improvements Zsolt Parragi <[email protected]>
  0 siblings, 2 replies; 4+ messages in thread

From: Zsolt Parragi @ 2026-03-15 12:18 UTC (permalink / raw)
  To: PostgreSQL Hackers <[email protected]>

Hello

While reviewing a patch, I noticed a typo in guc_params.dat. The code
compiled and seemingly worked, and I was very surprised that the
generator script didn't catch the mistake.

I looked into it, and I found several missing checks in
gen_guc_tables. I attached fixes for 4 that I think would definitely
improve the script (for now as separate patches, so it is easy to
select only some of them):

* 0001 fixes the issue that started this, it validates the allowed
field names, preventing typos in their names
* 0002 goes a step further and validates that fields specific to some
types can only appear for those types
* 0003 just improves the error reported by duplicate names, previously
this was confusing (it referred to incorrect ordering)
* 0004 adds basic checks about allowed characters in GUC names

I was also thinking about adding validations for the enum/define
values (config group, flags, guc context), but that requires a
somewhat fragile extraction code, and I decided to leave that out for
now.

What do you think about these changes?


Attachments:

  [application/octet-stream] 0001-gen_guc_tables-reject-unrecognized-field-names-in-gu.patch (1.5K, 2-0001-gen_guc_tables-reject-unrecognized-field-names-in-gu.patch)
  download | inline diff:
From 2dea67bdc43b93c8d85979cc4f4a0ed839dad6e5 Mon Sep 17 00:00:00 2001
From: Zsolt Parragi <[email protected]>
Date: Sun, 15 Mar 2026 11:21:33 +0000
Subject: [PATCH 1/4] gen_guc_tables: reject unrecognized field names in
 guc_parameters.dat

Previously, a typo in a field name (e.g. "contex" instead of "context")
would be silently ignored, potentially leading to subtle bugs or missing
configuration.  Add a whitelist of all recognized fields and report an
error if any entry contains an unknown key.
---
 src/backend/utils/misc/gen_guc_tables.pl | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/src/backend/utils/misc/gen_guc_tables.pl b/src/backend/utils/misc/gen_guc_tables.pl
index a285c62f98d..ba2e6f001f1 100644
--- a/src/backend/utils/misc/gen_guc_tables.pl
+++ b/src/backend/utils/misc/gen_guc_tables.pl
@@ -53,6 +53,25 @@ sub validate_guc_entry
 		string => [],    # no extra required fields
 	);
 
+	# All fields recognized by the generator.  "line_number" is injected
+	# by Catalog::ParseData and is not a user-facing field.
+	my %valid_fields = map { $_ => 1 } (
+		@required_common,
+		qw(long_desc flags ifdef min max options
+		  check_hook assign_hook show_hook
+		  line_number));
+
+	for my $f (sort keys %$entry)
+	{
+		unless ($valid_fields{$f})
+		{
+			die sprintf(
+				qq{%s:%d: error: entry "%s" has unrecognized field "%s"\n},
+				$input_fname, $entry->{line_number},
+				$entry->{name} // '<unknown>', $f);
+		}
+	}
+
 	for my $f (@required_common)
 	{
 		unless (defined $entry->{$f})
-- 
2.43.0



  [application/octet-stream] 0002-gen_guc_tables-reject-type-inappropriate-fields-in-g.patch (2.6K, 3-0002-gen_guc_tables-reject-type-inappropriate-fields-in-g.patch)
  download | inline diff:
From 33634107d72866cb85544f274a217ad592f61fe0 Mon Sep 17 00:00:00 2001
From: Zsolt Parragi <[email protected]>
Date: Sun, 15 Mar 2026 11:22:00 +0000
Subject: [PATCH 2/4] gen_guc_tables: reject type-inappropriate fields in
 guc_parameters.dat

Detect when a GUC entry has fields that don't apply to its type, such
as "min" or "max" on a bool or string, or "options" on an int.  These
would be silently ignored, likely indicating a copy-paste mistake.

The existing required_by_type structure is replaced with a single
type_specific_fields hash that drives both the required-field and
inappropriate-field checks.
---
 src/backend/utils/misc/gen_guc_tables.pl | 32 ++++++++++++++++++------
 1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/src/backend/utils/misc/gen_guc_tables.pl b/src/backend/utils/misc/gen_guc_tables.pl
index ba2e6f001f1..b1c1b7afc6c 100644
--- a/src/backend/utils/misc/gen_guc_tables.pl
+++ b/src/backend/utils/misc/gen_guc_tables.pl
@@ -45,12 +45,12 @@ sub validate_guc_entry
 	my @required_common =
 	  qw(name type context group short_desc variable boot_val);
 
-	my %required_by_type = (
-		int => [qw(min max)],
-		real => [qw(min max)],
-		enum => [qw(options)],
-		bool => [],      # no extra required fields
-		string => [],    # no extra required fields
+	my %type_specific_fields = (
+		int => { map { $_ => 1 } qw(min max) },
+		real => { map { $_ => 1 } qw(min max) },
+		enum => { map { $_ => 1 } qw(options) },
+		bool => {},
+		string => {},
 	);
 
 	# All fields recognized by the generator.  "line_number" is injected
@@ -83,7 +83,7 @@ sub validate_guc_entry
 		}
 	}
 
-	unless (exists $required_by_type{ $entry->{type} })
+	unless (exists $type_specific_fields{ $entry->{type} })
 	{
 		die sprintf(
 			qq{%s:%d: error: entry "%s" has unrecognized GUC type "%s"\n},
@@ -91,7 +91,9 @@ sub validate_guc_entry
 			$entry->{name}, $entry->{type} // '<unknown>');
 	}
 
-	for my $f (@{ $required_by_type{ $entry->{type} } })
+	my $fields_for_type = $type_specific_fields{ $entry->{type} };
+
+	for my $f (sort keys %$fields_for_type)
 	{
 		unless (defined $entry->{$f})
 		{
@@ -101,6 +103,20 @@ sub validate_guc_entry
 				$entry->{type}, $f);
 		}
 	}
+
+	my %all_type_specific;
+	$all_type_specific{$_} = 1
+	  for map { keys %$_ } values %type_specific_fields;
+	for my $f (sort keys %$entry)
+	{
+		if ($all_type_specific{$f} && !$fields_for_type->{$f})
+		{
+			die sprintf(
+				qq{%s:%d: error: entry "%s" of type "%s" must not have field "%s"\n},
+				$input_fname, $entry->{line_number}, $entry->{name},
+				$entry->{type}, $f);
+		}
+	}
 }
 
 # Print GUC table.
-- 
2.43.0



  [application/octet-stream] 0003-gen_guc_tables-report-duplicate-entry-names-distinct.patch (1.5K, 4-0003-gen_guc_tables-report-duplicate-entry-names-distinct.patch)
  download | inline diff:
From 163c196f4043b97a27ff79b7d77cc9954f35a63c Mon Sep 17 00:00:00 2001
From: Zsolt Parragi <[email protected]>
Date: Sun, 15 Mar 2026 11:22:15 +0000
Subject: [PATCH 3/4] gen_guc_tables: report duplicate entry names distinctly

The existing alphabetical-order check catches duplicates, but reports
them as "not in alphabetical order" which is confusing.  Split the
check to give a clear "duplicate entry" message, and also include the
file name and line number in the ordering error.
---
 src/backend/utils/misc/gen_guc_tables.pl | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/misc/gen_guc_tables.pl b/src/backend/utils/misc/gen_guc_tables.pl
index b1c1b7afc6c..9b6e3ed6622 100644
--- a/src/backend/utils/misc/gen_guc_tables.pl
+++ b/src/backend/utils/misc/gen_guc_tables.pl
@@ -133,10 +133,17 @@ sub print_table
 	{
 		validate_guc_entry($entry);
 
-		if (defined($prev_name) && lc($prev_name) ge lc($entry->{name}))
+		if (defined($prev_name) && lc($prev_name) eq lc($entry->{name}))
 		{
 			die sprintf(
-				"entries are not in alphabetical order: \"%s\", \"%s\"\n",
+				qq{%s:%d: error: duplicate entry "%s"\n},
+				$input_fname, $entry->{line_number}, $entry->{name});
+		}
+		if (defined($prev_name) && lc($prev_name) gt lc($entry->{name}))
+		{
+			die sprintf(
+				qq{%s:%d: error: entries are not in alphabetical order: "%s", "%s"\n},
+				$input_fname, $entry->{line_number},
 				$prev_name, $entry->{name});
 		}
 
-- 
2.43.0



  [application/octet-stream] 0004-gen_guc_tables-validate-GUC-name-format.patch (1.1K, 5-0004-gen_guc_tables-validate-GUC-name-format.patch)
  download | inline diff:
From 9f92321ce09309de1b25b04567195045235d5134 Mon Sep 17 00:00:00 2001
From: Zsolt Parragi <[email protected]>
Date: Sun, 15 Mar 2026 11:22:34 +0000
Subject: [PATCH 4/4] gen_guc_tables: validate GUC name format

Verify that each entry's name consists of letters, digits, and
underscores, starting with a letter.  This catches accidental spaces
or special characters that would produce invalid C code.
---
 src/backend/utils/misc/gen_guc_tables.pl | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/backend/utils/misc/gen_guc_tables.pl b/src/backend/utils/misc/gen_guc_tables.pl
index 9b6e3ed6622..2e8a3391046 100644
--- a/src/backend/utils/misc/gen_guc_tables.pl
+++ b/src/backend/utils/misc/gen_guc_tables.pl
@@ -83,6 +83,13 @@ sub validate_guc_entry
 		}
 	}
 
+	unless ($entry->{name} =~ /^[a-zA-Z][a-zA-Z0-9_]*$/)
+	{
+		die sprintf(
+			qq{%s:%d: error: entry name "%s" is not a valid GUC name (must start with a letter, contain only letters, digits, and underscores)\n},
+			$input_fname, $entry->{line_number}, $entry->{name});
+	}
+
 	unless (exists $type_specific_fields{ $entry->{type} })
 	{
 		die sprintf(
-- 
2.43.0



^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: gen_guc_tables improvements
  2026-03-15 12:18 gen_guc_tables improvements Zsolt Parragi <[email protected]>
@ 2026-03-15 22:27 ` Michael Paquier <[email protected]>
  1 sibling, 0 replies; 4+ messages in thread

From: Michael Paquier @ 2026-03-15 22:27 UTC (permalink / raw)
  To: Zsolt Parragi <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>

On Sun, Mar 15, 2026 at 12:18:53PM +0000, Zsolt Parragi wrote:
> While reviewing a patch, I noticed a typo in guc_params.dat. The code
> compiled and seemingly worked, and I was very surprised that the
> generator script didn't catch the mistake.

Nice.

> I was also thinking about adding validations for the enum/define
> values (config group, flags, guc context), but that requires a
> somewhat fragile extraction code, and I decided to leave that out for
> now.

I'd say no on this one, which would mean going back-and-forth with the
input.  If these fail due to a compilation failure, it's not that bad
either, it would be easy to pin-point that the failures are related to
the values of the GUC, not the overall shape of the data in the input
file.  Saying that, it would depend on how much complexity this adds
and what kind of validation we'd get out of it.  If the additions to
the script are simple like the ones you are proposing here, that would
be probably OK if these improvements catch a bunch of ground-breaking
inconsistencies.  If the whole script would need to be refactored,
probably not.

> What do you think about these changes?

Such sanity checks can save from stupid mistakes, for new or even
seasoned developers of the tree.  When I worked on the wait event
parsing scripts or even the query jumbling for DDLs, I have noticed
that checking ahead the shape of the input files to make sure that
correct C code is produced may not catch all the mistakes, but
catching most of them is more productive than catching none of them.
The GUC script is new as of HEAD, so it seems more useful to do such
adjustments now.
--
Michael


Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: gen_guc_tables improvements
  2026-03-15 12:18 gen_guc_tables improvements Zsolt Parragi <[email protected]>
@ 2026-03-16 05:57 ` Zsolt Parragi <[email protected]>
  2026-03-17 08:40   ` Re: gen_guc_tables improvements Michael Paquier <[email protected]>
  1 sibling, 1 reply; 4+ messages in thread

From: Zsolt Parragi @ 2026-03-16 05:57 UTC (permalink / raw)
  To: Chao Li <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>

On Sun, Mar 15, 2026 at 10:27 PM Michael Paquier <[email protected]> wrote:

> I'd say no on this one, which would mean going back-and-forth with the
> input.  If these fail due to a compilation failure, it's not that bad
> either, it would be easy to pin-point that the failures are related to
> the values of the GUC, not the overall shape of the data in the input
> file.  Saying that, it would depend on how much complexity this adds
> and what kind of validation we'd get out of it.  If the additions to
> the script are simple like the ones you are proposing here, that would
> be probably OK if these improvements catch a bunch of ground-breaking
> inconsistencies.  If the whole script would need to be refactored,
> probably not.

We can have two kinds of errors:

1. A typo in an enum name - this will result in a compilation failure
most likely, and the compiler will provide nice error messages with
the fix suggestion (e.g. GUC_NOT_IN_SAMPLE -> GUC_NOT_NI_SAMPLE, or
PGC_SUSET -> PGC_SUSTE)
2. a mismatched enum because of a copy paste mistake (group says
PGC_SUSET instead of WAL_ARCHIVING, or boot_val says WAL_ARCHIVING
instead of ARCHIVE_MODE_OFF). The examples I provided there aren't
likely copy-paste errors, that one is when you copy-paste an existing
enum guc and forget to change its boot value.

(2) currently compiles without warnings in both cases, but it is also
usually easier to spot during review.

The patch itself would be well contained: one or two utility functions
(GUC_NOT_IN_SAMPLE and other flags are defines, not enums), and then
calling them at a few places. So it's not a significant refactoring.

But I'm also not a fan of parsing C source code in perl, that's what I
meant by fragile. It seems to work, but I wouldn't guarantee that it
can properly parse all edge cases. Thinking about this more, these
checks would better be implemented as a custom clang-tidy check for
example, which I already started to look into for another issue.

On Mon, Mar 16, 2026 at 3:13 AM Chao Li <[email protected]> wrote:

> I think dot is allowed in a GUC name. Looking at the C code:

Yes, but by convention dot is only used by extensions to mark the
prefix/namespace of the extension (<extension_name>.<guc_name>). None
of the core server GUCs use it. The point of this script isn't to
ensure that the generated C code will compile, but to prevent hidden
errors and ensure we follow existing proper coding conventions.

> BTW, I have similar patch [1] that verify duplicate GUC enum values.

I forgotten about that, thanks for the reminder! I didn't like the
runtime check approach of it, and I think there are more exceptions
than the whitelisted items in the patch, but adding the ability to
mark a guc value list unique could be another good candidate for a
custom clang-tidy check. (assuming people like my clang-tidy idea,
which I didn't even start a thread about yet)

^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: gen_guc_tables improvements
  2026-03-15 12:18 gen_guc_tables improvements Zsolt Parragi <[email protected]>
  2026-03-16 05:57 ` Re: gen_guc_tables improvements Zsolt Parragi <[email protected]>
@ 2026-03-17 08:40   ` Michael Paquier <[email protected]>
  0 siblings, 0 replies; 4+ messages in thread

From: Michael Paquier @ 2026-03-17 08:40 UTC (permalink / raw)
  To: Zsolt Parragi <[email protected]>; +Cc: Chao Li <[email protected]>; PostgreSQL Hackers <[email protected]>

On Mon, Mar 16, 2026 at 05:57:39AM +0000, Zsolt Parragi wrote:
> Yes, but by convention dot is only used by extensions to mark the
> prefix/namespace of the extension (<extension_name>.<guc_name>). None
> of the core server GUCs use it. The point of this script isn't to
> ensure that the generated C code will compile, but to prevent hidden
> errors and ensure we follow existing proper coding conventions.

I am not sure that it is a good idea to enforce that in the script
this way, to be honest.  There may be a point about this new rule
being annoying for forks of the core code, at least, where they would
like to add their own parameters with dots in the names.  I have done
that in the past, as one example, and I am sure that there are
projects out there that do so, meaning the requirement to live with
one more custom patch to make things work if this rule is enforced in
gen_guc_tables.pl.

Saying that, as far as I can see 0001 and 0003 will save some time
when inserting some incorrect data.  The case of duplicated names was
indeed confusing, and 0003 can save from typos when specifying
non-mandatory fields.  So I have applied these two.

0004 has some value here, to save the .dat from copy-paste bloat.  I
am wondering what others think about it.
--
Michael

Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 4+ messages in thread

end of thread, other threads:[~2026-03-17 08:40 UTC | newest]

Thread overview: 4+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-03-15 12:18 gen_guc_tables improvements Zsolt Parragi <[email protected]>
2026-03-15 22:27 ` Michael Paquier <[email protected]>
2026-03-16 05:57 ` Zsolt Parragi <[email protected]>
2026-03-17 08:40   ` Michael Paquier <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox