public inbox for [email protected]help / color / mirror / Atom feed
gen_guc_tables improvements 4+ messages / 2 participants [nested] [flat]
* gen_guc_tables improvements @ 2026-03-15 12:18 Zsolt Parragi <[email protected]> 0 siblings, 2 replies; 4+ messages in thread From: Zsolt Parragi @ 2026-03-15 12:18 UTC (permalink / raw) To: PostgreSQL Hackers <[email protected]> Hello While reviewing a patch, I noticed a typo in guc_params.dat. The code compiled and seemingly worked, and I was very surprised that the generator script didn't catch the mistake. I looked into it, and I found several missing checks in gen_guc_tables. I attached fixes for 4 that I think would definitely improve the script (for now as separate patches, so it is easy to select only some of them): * 0001 fixes the issue that started this, it validates the allowed field names, preventing typos in their names * 0002 goes a step further and validates that fields specific to some types can only appear for those types * 0003 just improves the error reported by duplicate names, previously this was confusing (it referred to incorrect ordering) * 0004 adds basic checks about allowed characters in GUC names I was also thinking about adding validations for the enum/define values (config group, flags, guc context), but that requires a somewhat fragile extraction code, and I decided to leave that out for now. What do you think about these changes? Attachments: [application/octet-stream] 0001-gen_guc_tables-reject-unrecognized-field-names-in-gu.patch (1.5K, 2-0001-gen_guc_tables-reject-unrecognized-field-names-in-gu.patch) download | inline diff: From 2dea67bdc43b93c8d85979cc4f4a0ed839dad6e5 Mon Sep 17 00:00:00 2001 From: Zsolt Parragi <[email protected]> Date: Sun, 15 Mar 2026 11:21:33 +0000 Subject: [PATCH 1/4] gen_guc_tables: reject unrecognized field names in guc_parameters.dat Previously, a typo in a field name (e.g. "contex" instead of "context") would be silently ignored, potentially leading to subtle bugs or missing configuration. Add a whitelist of all recognized fields and report an error if any entry contains an unknown key. --- src/backend/utils/misc/gen_guc_tables.pl | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/src/backend/utils/misc/gen_guc_tables.pl b/src/backend/utils/misc/gen_guc_tables.pl index a285c62f98d..ba2e6f001f1 100644 --- a/src/backend/utils/misc/gen_guc_tables.pl +++ b/src/backend/utils/misc/gen_guc_tables.pl @@ -53,6 +53,25 @@ sub validate_guc_entry string => [], # no extra required fields ); + # All fields recognized by the generator. "line_number" is injected + # by Catalog::ParseData and is not a user-facing field. + my %valid_fields = map { $_ => 1 } ( + @required_common, + qw(long_desc flags ifdef min max options + check_hook assign_hook show_hook + line_number)); + + for my $f (sort keys %$entry) + { + unless ($valid_fields{$f}) + { + die sprintf( + qq{%s:%d: error: entry "%s" has unrecognized field "%s"\n}, + $input_fname, $entry->{line_number}, + $entry->{name} // '<unknown>', $f); + } + } + for my $f (@required_common) { unless (defined $entry->{$f}) -- 2.43.0 [application/octet-stream] 0002-gen_guc_tables-reject-type-inappropriate-fields-in-g.patch (2.6K, 3-0002-gen_guc_tables-reject-type-inappropriate-fields-in-g.patch) download | inline diff: From 33634107d72866cb85544f274a217ad592f61fe0 Mon Sep 17 00:00:00 2001 From: Zsolt Parragi <[email protected]> Date: Sun, 15 Mar 2026 11:22:00 +0000 Subject: [PATCH 2/4] gen_guc_tables: reject type-inappropriate fields in guc_parameters.dat Detect when a GUC entry has fields that don't apply to its type, such as "min" or "max" on a bool or string, or "options" on an int. These would be silently ignored, likely indicating a copy-paste mistake. The existing required_by_type structure is replaced with a single type_specific_fields hash that drives both the required-field and inappropriate-field checks. --- src/backend/utils/misc/gen_guc_tables.pl | 32 ++++++++++++++++++------ 1 file changed, 24 insertions(+), 8 deletions(-) diff --git a/src/backend/utils/misc/gen_guc_tables.pl b/src/backend/utils/misc/gen_guc_tables.pl index ba2e6f001f1..b1c1b7afc6c 100644 --- a/src/backend/utils/misc/gen_guc_tables.pl +++ b/src/backend/utils/misc/gen_guc_tables.pl @@ -45,12 +45,12 @@ sub validate_guc_entry my @required_common = qw(name type context group short_desc variable boot_val); - my %required_by_type = ( - int => [qw(min max)], - real => [qw(min max)], - enum => [qw(options)], - bool => [], # no extra required fields - string => [], # no extra required fields + my %type_specific_fields = ( + int => { map { $_ => 1 } qw(min max) }, + real => { map { $_ => 1 } qw(min max) }, + enum => { map { $_ => 1 } qw(options) }, + bool => {}, + string => {}, ); # All fields recognized by the generator. "line_number" is injected @@ -83,7 +83,7 @@ sub validate_guc_entry } } - unless (exists $required_by_type{ $entry->{type} }) + unless (exists $type_specific_fields{ $entry->{type} }) { die sprintf( qq{%s:%d: error: entry "%s" has unrecognized GUC type "%s"\n}, @@ -91,7 +91,9 @@ sub validate_guc_entry $entry->{name}, $entry->{type} // '<unknown>'); } - for my $f (@{ $required_by_type{ $entry->{type} } }) + my $fields_for_type = $type_specific_fields{ $entry->{type} }; + + for my $f (sort keys %$fields_for_type) { unless (defined $entry->{$f}) { @@ -101,6 +103,20 @@ sub validate_guc_entry $entry->{type}, $f); } } + + my %all_type_specific; + $all_type_specific{$_} = 1 + for map { keys %$_ } values %type_specific_fields; + for my $f (sort keys %$entry) + { + if ($all_type_specific{$f} && !$fields_for_type->{$f}) + { + die sprintf( + qq{%s:%d: error: entry "%s" of type "%s" must not have field "%s"\n}, + $input_fname, $entry->{line_number}, $entry->{name}, + $entry->{type}, $f); + } + } } # Print GUC table. -- 2.43.0 [application/octet-stream] 0003-gen_guc_tables-report-duplicate-entry-names-distinct.patch (1.5K, 4-0003-gen_guc_tables-report-duplicate-entry-names-distinct.patch) download | inline diff: From 163c196f4043b97a27ff79b7d77cc9954f35a63c Mon Sep 17 00:00:00 2001 From: Zsolt Parragi <[email protected]> Date: Sun, 15 Mar 2026 11:22:15 +0000 Subject: [PATCH 3/4] gen_guc_tables: report duplicate entry names distinctly The existing alphabetical-order check catches duplicates, but reports them as "not in alphabetical order" which is confusing. Split the check to give a clear "duplicate entry" message, and also include the file name and line number in the ordering error. --- src/backend/utils/misc/gen_guc_tables.pl | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/src/backend/utils/misc/gen_guc_tables.pl b/src/backend/utils/misc/gen_guc_tables.pl index b1c1b7afc6c..9b6e3ed6622 100644 --- a/src/backend/utils/misc/gen_guc_tables.pl +++ b/src/backend/utils/misc/gen_guc_tables.pl @@ -133,10 +133,17 @@ sub print_table { validate_guc_entry($entry); - if (defined($prev_name) && lc($prev_name) ge lc($entry->{name})) + if (defined($prev_name) && lc($prev_name) eq lc($entry->{name})) { die sprintf( - "entries are not in alphabetical order: \"%s\", \"%s\"\n", + qq{%s:%d: error: duplicate entry "%s"\n}, + $input_fname, $entry->{line_number}, $entry->{name}); + } + if (defined($prev_name) && lc($prev_name) gt lc($entry->{name})) + { + die sprintf( + qq{%s:%d: error: entries are not in alphabetical order: "%s", "%s"\n}, + $input_fname, $entry->{line_number}, $prev_name, $entry->{name}); } -- 2.43.0 [application/octet-stream] 0004-gen_guc_tables-validate-GUC-name-format.patch (1.1K, 5-0004-gen_guc_tables-validate-GUC-name-format.patch) download | inline diff: From 9f92321ce09309de1b25b04567195045235d5134 Mon Sep 17 00:00:00 2001 From: Zsolt Parragi <[email protected]> Date: Sun, 15 Mar 2026 11:22:34 +0000 Subject: [PATCH 4/4] gen_guc_tables: validate GUC name format Verify that each entry's name consists of letters, digits, and underscores, starting with a letter. This catches accidental spaces or special characters that would produce invalid C code. --- src/backend/utils/misc/gen_guc_tables.pl | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/src/backend/utils/misc/gen_guc_tables.pl b/src/backend/utils/misc/gen_guc_tables.pl index 9b6e3ed6622..2e8a3391046 100644 --- a/src/backend/utils/misc/gen_guc_tables.pl +++ b/src/backend/utils/misc/gen_guc_tables.pl @@ -83,6 +83,13 @@ sub validate_guc_entry } } + unless ($entry->{name} =~ /^[a-zA-Z][a-zA-Z0-9_]*$/) + { + die sprintf( + qq{%s:%d: error: entry name "%s" is not a valid GUC name (must start with a letter, contain only letters, digits, and underscores)\n}, + $input_fname, $entry->{line_number}, $entry->{name}); + } + unless (exists $type_specific_fields{ $entry->{type} }) { die sprintf( -- 2.43.0 ^ permalink raw reply [nested|flat] 4+ messages in thread
* Re: gen_guc_tables improvements @ 2026-03-15 22:27 Michael Paquier <[email protected]> parent: Zsolt Parragi <[email protected]> 1 sibling, 0 replies; 4+ messages in thread From: Michael Paquier @ 2026-03-15 22:27 UTC (permalink / raw) To: Zsolt Parragi <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]> On Sun, Mar 15, 2026 at 12:18:53PM +0000, Zsolt Parragi wrote: > While reviewing a patch, I noticed a typo in guc_params.dat. The code > compiled and seemingly worked, and I was very surprised that the > generator script didn't catch the mistake. Nice. > I was also thinking about adding validations for the enum/define > values (config group, flags, guc context), but that requires a > somewhat fragile extraction code, and I decided to leave that out for > now. I'd say no on this one, which would mean going back-and-forth with the input. If these fail due to a compilation failure, it's not that bad either, it would be easy to pin-point that the failures are related to the values of the GUC, not the overall shape of the data in the input file. Saying that, it would depend on how much complexity this adds and what kind of validation we'd get out of it. If the additions to the script are simple like the ones you are proposing here, that would be probably OK if these improvements catch a bunch of ground-breaking inconsistencies. If the whole script would need to be refactored, probably not. > What do you think about these changes? Such sanity checks can save from stupid mistakes, for new or even seasoned developers of the tree. When I worked on the wait event parsing scripts or even the query jumbling for DDLs, I have noticed that checking ahead the shape of the input files to make sure that correct C code is produced may not catch all the mistakes, but catching most of them is more productive than catching none of them. The GUC script is new as of HEAD, so it seems more useful to do such adjustments now. -- Michael Attachments: [application/pgp-signature] signature.asc (833B, 2-signature.asc) download ^ permalink raw reply [nested|flat] 4+ messages in thread
* Re: gen_guc_tables improvements @ 2026-03-16 05:57 Zsolt Parragi <[email protected]> parent: Zsolt Parragi <[email protected]> 1 sibling, 1 reply; 4+ messages in thread From: Zsolt Parragi @ 2026-03-16 05:57 UTC (permalink / raw) To: Chao Li <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]> On Sun, Mar 15, 2026 at 10:27 PM Michael Paquier <[email protected]> wrote: > I'd say no on this one, which would mean going back-and-forth with the > input. If these fail due to a compilation failure, it's not that bad > either, it would be easy to pin-point that the failures are related to > the values of the GUC, not the overall shape of the data in the input > file. Saying that, it would depend on how much complexity this adds > and what kind of validation we'd get out of it. If the additions to > the script are simple like the ones you are proposing here, that would > be probably OK if these improvements catch a bunch of ground-breaking > inconsistencies. If the whole script would need to be refactored, > probably not. We can have two kinds of errors: 1. A typo in an enum name - this will result in a compilation failure most likely, and the compiler will provide nice error messages with the fix suggestion (e.g. GUC_NOT_IN_SAMPLE -> GUC_NOT_NI_SAMPLE, or PGC_SUSET -> PGC_SUSTE) 2. a mismatched enum because of a copy paste mistake (group says PGC_SUSET instead of WAL_ARCHIVING, or boot_val says WAL_ARCHIVING instead of ARCHIVE_MODE_OFF). The examples I provided there aren't likely copy-paste errors, that one is when you copy-paste an existing enum guc and forget to change its boot value. (2) currently compiles without warnings in both cases, but it is also usually easier to spot during review. The patch itself would be well contained: one or two utility functions (GUC_NOT_IN_SAMPLE and other flags are defines, not enums), and then calling them at a few places. So it's not a significant refactoring. But I'm also not a fan of parsing C source code in perl, that's what I meant by fragile. It seems to work, but I wouldn't guarantee that it can properly parse all edge cases. Thinking about this more, these checks would better be implemented as a custom clang-tidy check for example, which I already started to look into for another issue. On Mon, Mar 16, 2026 at 3:13 AM Chao Li <[email protected]> wrote: > I think dot is allowed in a GUC name. Looking at the C code: Yes, but by convention dot is only used by extensions to mark the prefix/namespace of the extension (<extension_name>.<guc_name>). None of the core server GUCs use it. The point of this script isn't to ensure that the generated C code will compile, but to prevent hidden errors and ensure we follow existing proper coding conventions. > BTW, I have similar patch [1] that verify duplicate GUC enum values. I forgotten about that, thanks for the reminder! I didn't like the runtime check approach of it, and I think there are more exceptions than the whitelisted items in the patch, but adding the ability to mark a guc value list unique could be another good candidate for a custom clang-tidy check. (assuming people like my clang-tidy idea, which I didn't even start a thread about yet) ^ permalink raw reply [nested|flat] 4+ messages in thread
* Re: gen_guc_tables improvements @ 2026-03-17 08:40 Michael Paquier <[email protected]> parent: Zsolt Parragi <[email protected]> 0 siblings, 0 replies; 4+ messages in thread From: Michael Paquier @ 2026-03-17 08:40 UTC (permalink / raw) To: Zsolt Parragi <[email protected]>; +Cc: Chao Li <[email protected]>; PostgreSQL Hackers <[email protected]> On Mon, Mar 16, 2026 at 05:57:39AM +0000, Zsolt Parragi wrote: > Yes, but by convention dot is only used by extensions to mark the > prefix/namespace of the extension (<extension_name>.<guc_name>). None > of the core server GUCs use it. The point of this script isn't to > ensure that the generated C code will compile, but to prevent hidden > errors and ensure we follow existing proper coding conventions. I am not sure that it is a good idea to enforce that in the script this way, to be honest. There may be a point about this new rule being annoying for forks of the core code, at least, where they would like to add their own parameters with dots in the names. I have done that in the past, as one example, and I am sure that there are projects out there that do so, meaning the requirement to live with one more custom patch to make things work if this rule is enforced in gen_guc_tables.pl. Saying that, as far as I can see 0001 and 0003 will save some time when inserting some incorrect data. The case of duplicated names was indeed confusing, and 0003 can save from typos when specifying non-mandatory fields. So I have applied these two. 0004 has some value here, to save the .dat from copy-paste bloat. I am wondering what others think about it. -- Michael Attachments: [application/pgp-signature] signature.asc (833B, 2-signature.asc) download ^ permalink raw reply [nested|flat] 4+ messages in thread
end of thread, other threads:[~2026-03-17 08:40 UTC | newest] Thread overview: 4+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2026-03-15 12:18 gen_guc_tables improvements Zsolt Parragi <[email protected]> 2026-03-15 22:27 ` Michael Paquier <[email protected]> 2026-03-16 05:57 ` Zsolt Parragi <[email protected]> 2026-03-17 08:40 ` Michael Paquier <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox