public inbox for [email protected]  
Re: [PATCH] Refactor SLRU to always use long file names
7+ messages / 2 participants

* Re: [PATCH] Refactor SLRU to always use long file names
@ 2024-11-08 04:21  Michael Paquier <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: Michael Paquier @ 2024-11-08 04:21 UTC (permalink / raw)
  To: Aleksander Alekseev <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>

On Thu, Sep 12, 2024 at 12:33:14PM +0300, Aleksander Alekseev wrote:
> It wouldn't hurt re-checking the segment file names in the TAP test
> but this would mean hardcoding catalog names which as I understand you
> want to avoid. With high probability PG wouldn't start if the
> corresponding piece of pg_upgrade is wrong (I checked more than once
> :). So I'm not entirely sure if it's worth the effort, but let's see
> what others think.

+        segno = strtoi64(de->d_name, NULL, 16);
+        snprintf(new_path, MAXPGPATH, "%s/%015llX", dir_path,
+                    (long long) segno);
+        snprintf(old_path, MAXPGPATH, "%s/%s", dir_path, de->d_name);
+
+        if (pg_mv_file(old_path, new_path) != 0)
+            pg_fatal("could not rename file \"%s\" to \"%s\": %m",
+                     old_path, new_path);

Your patch is just doing a rename() of the files from short to long
names.  How about adding a new TAP script in pg_upgrade that creates a
couple of empty files with short file names in each path that needs
to do the transfer?  Then the test could run one pg_upgrade command
and check that the new names are in place.  You could use an array of
objects, with the base path, the old name and the new name, then loop
over it.  With the check in check_slru_segment_filenames() based on
SLRU_SEG_FILENAMES_CHANGE_CAT_VER, that should work.
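
To illustrate the old-name/new-name mapping such a test would check, here is a minimal C sketch. The helper name `slru_long_name` is hypothetical and not part of the patch. The 4-to-6-character input rule is an assumption, taken from the short-name lengths that the pre-patch slru.c accepted.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): convert a short SLRU segment
 * file name such as "1234" into the 15-character long form.
 *
 * Returns 0 on success, -1 if the input is not a 4..6 character upper-case
 * hex name or if the output buffer cannot hold 15 characters plus NUL.
 */
static int
slru_long_name(const char *short_name, char *out, size_t outlen)
{
    size_t      len = strlen(short_name);
    unsigned long long segno;

    if (len < 4 || len > 6 ||
        strspn(short_name, "0123456789ABCDEF") != len)
        return -1;
    if (outlen < 16)
        return -1;

    segno = strtoull(short_name, NULL, 16);
    snprintf(out, outlen, "%015llX", segno);
    return 0;
}
```

A test table of (directory, old name, new name) entries could then be filled by calling this helper once per directory instead of hardcoding both names.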

+	static const char* dirs[] = {
+		"pg_xact",
+		"pg_commit_ts",
+		"pg_multixact/offsets",
+		"pg_multixact/members",
+		"pg_subtrans",
+		"pg_serial",
+	};

Hardcoding that is always annoying, but well, that's not going to
change.  So living with this is not going to be a maintenance burden.
--
Michael



* Re: [PATCH] Refactor SLRU to always use long file names
@ 2024-11-12 14:15  Aleksander Alekseev <[email protected]>
  parent: Michael Paquier <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: Aleksander Alekseev @ 2024-11-12 14:15 UTC (permalink / raw)
  To: PostgreSQL Hackers <[email protected]>; +Cc: Michael Paquier <[email protected]>

Hi Michael,

Thanks for your feedback!

> Your patch is just doing a rename() of the files from short to long
> names.  How about adding a new TAP script in pg_upgrade that creates a
> couple of empty files with short file names in each path that needs
> to do the transfer?  Then the test could run one pg_upgrade command
> and check that the new names are in place.  You could use an array of
> objects, with the base path, the old name and the new name, then loop
> over it.  With the check in check_slru_segment_filenames() based on
> SLRU_SEG_FILENAMES_CHANGE_CAT_VER, that should work.

OK, I gave it a try and discovered that the test becomes very ugly very
fast. Attached is a draft of the test to give you an idea of what
it's going to look like.

In order to trigger renaming of SLRU segments first we have to
downgrade the catalog version in the pg_control file. Otherwise the
check in check_slru_segment_filenames() is not going to pass and
pg_upgrade will do nothing (*). This per se is easily done with
binmode() and pack(); however, the file has a CRC. Calculating it is not
difficult since we have the pg_read_binary_file() and crc32c() SQL
functions, although personally I don't find having to start a cluster
just for this quite satisfactory. The CRC is stored at the offset
`sizeof(ControlData)-4`, which unless I'm wrong is platform-dependent.
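
For reference, the checksum can also be computed without a running server. Below is a minimal, unoptimized C sketch of CRC-32C (the Castagnoli polynomial). The function name `crc32c_sw` is made up for this example, and it is an assumption here that the CRC in pg_control uses the standard initial value and final XOR of this algorithm.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
 * Slow but dependency-free; meant only as an illustration.
 */
static uint32_t
crc32c_sw(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t    crc = 0xFFFFFFFFu;      /* standard initial value */

    while (len--)
    {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
        {
            /* mask is all ones when the low bit is set, zero otherwise */
            uint32_t    mask = 0u - (crc & 1u);

            crc = (crc >> 1) ^ (0x82F63B78u & mask);
        }
    }
    return crc ^ 0xFFFFFFFFu;           /* standard final XOR */
}
```

The well-known check value for this algorithm is 0xE3069283 for the ASCII string "123456789", which is a quick way to validate any implementation before pointing it at a control file.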

I see several solutions for this problem:

* We could add sizeof(ControlData) to the output of `pg_controldata`
so we could use it from Perl
* We could teach `initdb` to override the catalog version
* We could implement a new tool for editing pg_control file

On top of that I should add that the test takes about 7 seconds on my
laptop. Apparently executing two initdb's and one pg_upgrade is not
very cheap. This makes me wonder if instead of writing a new test we
should modify 002_pg_upgrade.pl. This however will make the test even
less readable and maintainable.

What do you think?

(*) BTW I noticed a mistake in the commented code. The condition
should be `>=`, not `<`, i.e:

```
    if(new_cluster.controldata.cat_ver >= SLRU_SEG_FILENAMES_CHANGE_CAT_VER)
        return;
```

-- 
Best regards,
Aleksander Alekseev

# Copyright (c) 2024, PostgreSQL Global Development Group

use strict;
use warnings FATAL => 'all';

use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;

# This test ensures that pg_upgrade renames SLRU segments.
# After the upgrade all segments should have long file names.

# equals SLRU_SEG_FILENAMES_CHANGE_CAT_VER in pg_upgrade.h
my $slru_seg_filenames_change_cat_ver = 202411121;

my @slru_dirs = (
	"pg_xact",
	"pg_commit_ts",
	"pg_multixact/offsets",
	"pg_multixact/members",
	"pg_subtrans",
	"pg_serial",
);

my $short_segment_name = "1234";
my $long_segment_name = "000000000001234";

my $oldnode = PostgreSQL::Test::Cluster->new('old_node');
$oldnode->init();
my $oldbindir = $oldnode->config_data('--bindir');

my $newnode = PostgreSQL::Test::Cluster->new('new_node');
$newnode->init();
my $newbindir = $newnode->config_data('--bindir');

# Fill data_dir of the old node with SLRU segments that use short file names.
# pg_upgrade renames the files without looking at the content, so the content
# is not important.
foreach my $dir (@slru_dirs)
{
	my $fname = $oldnode->data_dir."/".$dir."/".$short_segment_name;
	open my $fh, ">", $fname or die $!;
	close $fh;
}

# Modify pg_control of the old node to make it look like a version that needs
# migration (decrease ControlData->cat_ver). Otherwise pg_upgrade will skip it.
my $pg_control_fname = $oldnode->data_dir."/global/pg_control";

open my $fh, "+<", $pg_control_fname or die $!;
binmode($fh);
sysseek($fh, 12, 0);
my $binval = pack("L!", $slru_seg_filenames_change_cat_ver - 1);
syswrite($fh, $binval, 4);
close($fh);

# Calculate CRC of the updated file using pg_read_binary_file() and crc32c()
my $fsize = -s $pg_control_fname; # WRONG! should be sizeof(ControlFile)
$newnode->start;
my $newcrc;
$newnode->psql(
	"postgres",
	"SELECT crc32c(pg_read_binary_file('".$pg_control_fname."',0,".($fsize-4)."));",
	stdout => \$newcrc,
	on_error_die => 1,
	);
$newnode->stop;

# Update CRC
open $fh, "+<", $pg_control_fname or die $!;
binmode($fh);
sysseek($fh, $fsize-4, 0);
my $bincrc = pack("L!", $newcrc);
syswrite($fh, $bincrc, 4);
close($fh);

$newnode->command_ok(
	[
		'pg_upgrade',
		'--old-datadir', $oldnode->data_dir,
		'--new-datadir', $newnode->data_dir,
		'--old-bindir', $oldbindir,
		'--new-bindir', $newbindir,
	],
	'run of pg_upgrade');

# Check that pg_upgrade renamed the SLRU segments we created
foreach my $dir (@slru_dirs)
{
	ok(-e $newnode->data_dir."/".$dir."/".$long_segment_name);
}

done_testing();


Attachments:

  [text/plain] 005_slru.pl.txt (2.5K, 2-005_slru.pl.txt)
  download


* Re: [PATCH] Refactor SLRU to always use long file names
@ 2024-11-12 14:37  Aleksander Alekseev <[email protected]>
  parent: Aleksander Alekseev <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: Aleksander Alekseev @ 2024-11-12 14:37 UTC (permalink / raw)
  To: PostgreSQL Hackers <[email protected]>; +Cc: Michael Paquier <[email protected]>

Hi again,

Just a quick follow-up.

> (*) BTW I noticed a mistake in the commented code. The condition
> should be `>=`, not `<`, i.e:
>
> ```
>     if(new_cluster.controldata.cat_ver >= SLRU_SEG_FILENAMES_CHANGE_CAT_VER)
>         return;
> ```

The concentration of caffeine in my blood is a bit low right now. I
suspect I may need to re-check this statement with a fresh head.

Also it occurred to me that as a 4th option we could just get rid of
this check. Users however will pay the price every time they execute
pg_upgrade so I doubt we are going to do this.

-- 
Best regards,
Aleksander Alekseev







* Re: [PATCH] Refactor SLRU to always use long file names
@ 2024-11-14 01:11  Michael Paquier <[email protected]>
  parent: Aleksander Alekseev <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: Michael Paquier @ 2024-11-14 01:11 UTC (permalink / raw)
  To: Aleksander Alekseev <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>

On Tue, Nov 12, 2024 at 05:37:02PM +0300, Aleksander Alekseev wrote:
> Also it occurred to me that as a 4th option we could just get rid of
> this check. Users however will pay the price every time they execute
> pg_upgrade so I doubt we are going to do this.

We cannot remove the check, or Nathan will come after us as he's
working hard on reducing the time pg_upgrade takes.  We should not
make it longer if there is no need to.

The scans may be quite long as well, actually, which could be a
bottleneck.  Did you measure the runtime with a maximized (still
realistic) pool of files for these SLRUs at upgrade time?  For
upgrades, the data would be the bottleneck.

# equals SLRU_SEG_FILENAMES_CHANGE_CAT_VER in pg_upgrade.h
my $slru_seg_filenames_change_cat_ver = 202411121;
[...]
open my $fh, "+<", $pg_control_fname or die $!;
binmode($fh);
sysseek($fh, 12, 0);
my $binval = pack("L!", $slru_seg_filenames_change_cat_ver - 1);
syswrite($fh, $binval, 4);
close($fh);

Control file manipulation may be useful as a routine in Cluster.pm,
based on an offset in the file and a format to pack as argument?  Note
that this also depends on the system endianness, see 039_end_of_wal.pl.
It's one of these things I could see myself reusing to force a state in
the cluster and make a test cheaper, for example.  You don't really
need the lookup part, actually?  You would just need the part where
the control file is rewritten, which should be OK as long as the
cluster is freshly initdb'd meaning that there should be nothing that
interacts with the new value set.  pg_upgrade only has CAT_VER flags
for some multixact changes and the jsonb check from 9.4.
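
A routine of that shape, taking an offset and a value, could look roughly like the following sketch. It is written in C rather than Perl purely for illustration; the names `poke_uint32` and `host_is_little_endian` are hypothetical, and the offset 12 used in the usage note below is simply the one from the draft test, not something verified here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int
host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/*
 * Hypothetical helper: overwrite a 4-byte field at "offset" inside an
 * in-memory copy of a file, using the host's native byte order.
 *
 * Returns 0 on success, -1 if the field would not fit in the buffer.
 */
static int
poke_uint32(unsigned char *buf, size_t buflen, size_t offset, uint32_t value)
{
    if (offset > buflen || buflen - offset < sizeof(value))
        return -1;
    memcpy(buf + offset, &value, sizeof(value));
    return 0;
}
```

With a buffer holding the file contents, `poke_uint32(buf, len, 12, 202411121 - 1)` would mirror what the draft test does with sysseek() and syswrite(). Because the bytes are written in native order, the result is only valid for a cluster created on a host with the same endianness.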
--
Michael



* Re: [PATCH] Refactor SLRU to always use long file names
@ 2024-11-14 11:04  Aleksander Alekseev <[email protected]>
  parent: Michael Paquier <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: Aleksander Alekseev @ 2024-11-14 11:04 UTC (permalink / raw)
  To: PostgreSQL Hackers <[email protected]>; +Cc: Michael Paquier <[email protected]>

Hi Michael,

> The scans may be quite long as well, actually, which could be a
> bottleneck.  Did you measure the runtime with a maximized (still
> realistic) pool of files for these SLRUs at upgrade time?  For
> upgrades, the data would be the bottleneck.

Good question.

In theory SLRUs are not supposed to grow large and their size is a
small fraction of the rest of the database. As an example CLOG (
pg_xact/ ) stores 2 bits per transaction. Since every SLRU has a
dedicated directory and we scan just it, non-SLRU files don't affect
the scan time.
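
To put numbers on the "2 bits per transaction" figure, here is a small C sketch of the arithmetic. The constants are assumptions matching a default build (8 kB pages, 32 pages per SLRU segment), and the `SKETCH_` macros and function names are invented for this example.

```c
#include <stdint.h>

/* Assumed defaults; a custom build may use a different block size. */
#define SKETCH_BLCKSZ               8192
#define SKETCH_PAGES_PER_SEGMENT    32
#define SKETCH_CLOG_BITS_PER_XACT   2

/* Number of transaction statuses that fit in one pg_xact segment. */
static uint64_t
clog_xacts_per_segment(void)
{
    return (uint64_t) SKETCH_BLCKSZ * SKETCH_PAGES_PER_SEGMENT *
        (8 / SKETCH_CLOG_BITS_PER_XACT);
}

/* Number of pg_xact segments needed to cover nxacts transactions. */
static uint64_t
clog_segments_for(uint64_t nxacts)
{
    uint64_t    per_seg = clog_xacts_per_segment();

    return (nxacts + per_seg - 1) / per_seg;
}
```

Under these assumptions one segment covers 1,048,576 transactions, so even the full 32-bit XID space fits in 4096 pg_xact segments.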

To make sure, I asked several people to check how many SLRU segments they
have in their production environments. The typical response looked like this:

```
$PGDATA/pg_xact: 191 segments
$PGDATA/pg_commit_ts: 3
$PGDATA/pg_multixact/offsets: 148
$PGDATA/pg_multixact/members: 400
$PGDATA/pg_subtrans: 4
$PGDATA/pg_serial: 3
```

This is an 800 GB database. Interestingly, larger databases (4.2 TB) may
have far fewer SLRU segments (220 in total, most of them in pg_xact).

And here is the *worst* case that was reported to me:

```
$PGDATA/pg_xact: 171 segments
$PGDATA/pg_commit_ts: 3
$PGDATA/pg_multixact/offsets: 4864
$PGDATA/pg_multixact/members: 40996
$PGDATA/pg_subtrans: 5
$PGDATA/pg_serial: 3
```

I was told this is a "1Tb+" database. For this user pg_upgrade will
rename 45 000 files. I wrote a little script to check how much time it
will take:

```
#!/usr/bin/env perl

use strict;

my $from = "test_0001.tmp";
my $to = "test_0002.tmp";

system("touch $from");

for my $i (1..45000) {
    rename($from, $to);
    ($from, $to) = ($to, $from);
}
```

On my laptop I get 0.5 seconds. Note that I don't do scanning, only
renaming, assuming that the latter should take most of the time. I
think this should be multiplied by 10 to take into account the role of
the filesystem cache and other factors.

All in all, in the absolute worst case this shouldn't take more than
5 seconds; in reality it will probably be orders of magnitude less.

> Note that this also depends on the system endianness, see 039_end_of_wal.pl.

Sure, I think I took it into account when using pack("L!"). My
understanding is that "L" takes care of the endianness since I see
special flags to force little- or big-endianness independently of the
platform [1]. This of course should be tested in practice on different
machines. Using an exclamation mark in "L!" was a mistake since
cat_ver is not an int, but rather a uint32.

> You don't really need the lookup part, actually?

For lookup we already have the pg_controldata tool, that's not a problem.

> Control file manipulation may be useful as a routine in Cluster.pm,
> based on an offset in the file and a format to pack as argument?
> [...]
> It's one of these things I could see myself reusing to force a state in
> the cluster and make a test cheaper, for example.

> You would just need the part where
> the control file is rewritten, which should be OK as long as the
> cluster is freshly initdb'd meaning that there should be nothing that
> interacts with the new value set.

Agree. Still I don't see a good way of figuring out
sizeof(ControlFileData) from Perl. The structure has ints in it (e.g.
wal_level, MaxConnections, etc) thus the size is platform-dependent.
The CRC should be placed at the end of the structure. If we want to
manipulate MaxConnections etc their offsets are going to be
platform-dependent as well. And my understanding is that the alignment
is platform/compiler dependent too.
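
The alignment concern can be shown with a toy structure. The sketch below is not the real ControlFileData layout; `ToyControl` and the two helper functions are invented for illustration, and only mimic the property that the CRC is the last member.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a control structure; NOT the real pg_control layout. */
typedef struct ToyControl
{
    uint64_t    system_identifier;
    uint32_t    crc;            /* last member, as the CRC is in pg_control */
} ToyControl;

/* Offset of the CRC member as the compiler laid it out. */
static size_t
toy_crc_offset(void)
{
    return offsetof(ToyControl, crc);
}

/* Bytes of padding the compiler added after the last member. */
static size_t
toy_tail_padding(void)
{
    return sizeof(ToyControl) -
        (offsetof(ToyControl, crc) + sizeof(uint32_t));
}
```

On ABIs that align 64-bit integers to 8 bytes, the compiler typically pads this structure to 16 bytes, so `sizeof(ToyControl) - 4` would point at padding rather than at the CRC. That is why an offsetof()-style value exported by a C tool is more dependable than arithmetic done from Perl.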

I guess we are going to need either a `pg_writecontoldata` tool or
`pg_controldata -w` flag. I wonder which option you find more
attractive, or maybe you have better ideas?

[1]: https://perldoc.perl.org/functions/pack

-- 
Best regards,
Aleksander Alekseev







* Re: [PATCH] Refactor SLRU to always use long file names
@ 2024-12-05 08:08  Aleksander Alekseev <[email protected]>
  parent: Aleksander Alekseev <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: Aleksander Alekseev @ 2024-12-05 08:08 UTC (permalink / raw)
  To: PostgreSQL Hackers <[email protected]>; +Cc: Michael Paquier <[email protected]>

Hi,

> I guess we are going to need either a `pg_writecontoldata` tool or
> `pg_controldata -w` flag. I wonder which option you find more
> attractive, or maybe you have better ideas?

For the record, Michael and I had a brief discussion about this
offlist and decided to abandon the idea of adding TAP tests, relying
only on the buildfarm. Also I will check whether we have a clear error
message in case a user forgets to run pg_upgrade and runs the new slru.c
with old filenames. If the user doesn't get such an error message I
will see if it's possible to add it somewhere in slru.c without
introducing much performance overhead.

Also I'm going to submit precise steps to test this migration manually
for the reviewers' convenience.

-- 
Best regards,
Aleksander Alekseev







* Re: [PATCH] Refactor SLRU to always use long file names
@ 2025-01-06 13:23  Aleksander Alekseev <[email protected]>
  parent: Aleksander Alekseev <[email protected]>
  0 siblings, 0 replies; 7+ messages in thread

From: Aleksander Alekseev @ 2025-01-06 13:23 UTC (permalink / raw)
  To: PostgreSQL Hackers <[email protected]>; +Cc: Michael Paquier <[email protected]>

Hi,

> For the record, Michael and I had a brief discussion about this
> offlist and decided to abandon the idea of adding TAP tests, relying
> only on the buildfarm. Also I will check whether we have a clear error
> message in case a user forgets to run pg_upgrade and runs the new slru.c
> with old filenames. If the user doesn't get such an error message I
> will see if it's possible to add it somewhere in slru.c without
> introducing much performance overhead.
>
> Also I'm going to submit precise steps to test this migration manually
> for the reviewers' convenience.

Here is an updated patch. The steps to test it manually are as follows.

Compile and install PostgreSQL from the REL_17_STABLE branch:

```
git checkout REL_17_STABLE
git fetch origin
git rebase -i origin/REL_17_STABLE
git clean -dfx
meson setup --buildtype debug -Dcassert=true -Dtap_tests=enabled \
    -Dprefix=/home/eax/pginstall-17 build
ninja -C build
meson install -C build
~/pginstall-17/bin/initdb --data-checksums -D ~/pginstall-17/data
~/pginstall-17/bin/pg_ctl -D ~/pginstall-17/data \
    -l ~/pginstall-17/data/logfile start
~/pginstall-17/bin/createdb $(whoami)

# fill DB (or even better - use a copy of an existing one), e.g:
~/pginstall-17/bin/pgbench -i -s 100
~/pginstall-17/bin/pgbench -j 16 -c 16 -T 10 -P 5

# should see 4-digit SLRU segment filenames, more files is better
 ls -la ~/pginstall-17/data/pg_xact/ \
    ~/pginstall-17/data/pg_commit_ts/ \
    ~/pginstall-17/data/pg_multixact/members/ \
    ~/pginstall-17/data/pg_multixact/offsets/ \
    ~/pginstall-17/data/pg_subtrans/ \
    ~/pginstall-17/data/pg_serial/

~/pginstall-17/bin/pg_ctl -D ~/pginstall-17/data stop
```

Apply the patch to the `master` branch, recompile PostgreSQL, and install
it to a different location:

```
git checkout slru_pg_upgrade_v2
git clean -dfx
meson setup --buildtype debug -Dcassert=true -Dtap_tests=enabled \
    -Dprefix=/home/eax/pginstall-18 build
ninja -C build
meson install -C build
```

Try to start PostgreSQL without running pg_upgrade:

```
cp -r ~/pginstall-17/data ~/pginstall-18/data
~/pginstall-18/bin/pg_ctl -D ~/pginstall-18/data \
    -l ~/pginstall-18/data/logfile start
```

You should get:

```
waiting for server to start.... stopped waiting
pg_ctl: could not start server
Examine the log output.

$ tail ~/pginstall-18/data/logfile

FATAL:  database files are incompatible with server
DETAIL:  The data directory was initialized by PostgreSQL version 17, which is not compatible with this version 18devel
```

Run pg_upgrade:

```
rm -r ~/pginstall-18/data
~/pginstall-18/bin/initdb --data-checksums -D ~/pginstall-18/data
~/pginstall-18/bin/pg_upgrade \
    --old-datadir=/home/eax/pginstall-17/data \
    --new-datadir=/home/eax/pginstall-18/data \
    --old-bindir=/home/eax/pginstall-17/bin \
    --new-bindir=/home/eax/pginstall-18/bin
```

Make sure the output contains:

```
Renaming SLRU segments in pg_xact                             ok
Renaming SLRU segments in pg_commit_ts                        ok
Renaming SLRU segments in pg_multixact/offsets                ok
Renaming SLRU segments in pg_multixact/members                ok
Renaming SLRU segments in pg_subtrans                         ok
Renaming SLRU segments in pg_serial                           ok
```

Make sure PostgreSQL starts after the upgrade:

```
~/pginstall-18/bin/pg_ctl -D ~/pginstall-18/data \
    -l ~/pginstall-18/data/logfile start
~/pginstall-18/bin/psql -c 'select count(*) from pgbench_accounts'
~/pginstall-18/bin/pg_ctl -D ~/pginstall-18/data stop

# should see 15-digit SLRU segment filenames
ls -la ~/pginstall-18/data/pg_xact/ \
    ~/pginstall-18/data/pg_commit_ts/ \
    ~/pginstall-18/data/pg_multixact/members/ \
    ~/pginstall-18/data/pg_multixact/offsets/ \
    ~/pginstall-18/data/pg_subtrans/ \
    ~/pginstall-18/data/pg_serial/
```
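
Instead of eyeballing the `ls` output, the file names could be checked mechanically. The C sketch below applies the same rule the patched SlruScanDirectory() uses, namely a length of exactly 15 and only upper-case hex digits. The function name `is_long_slru_name` is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if "name" looks like a long SLRU
 * segment file name, i.e. exactly 15 upper-case hexadecimal digits.
 */
static bool
is_long_slru_name(const char *name)
{
    size_t      len = strlen(name);

    return len == 15 &&
        strspn(name, "0123456789ABCDEF") == len;
}
```

Running such a check over every entry of the six SLRU directories after the upgrade would flag any segment that still has a short name.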

Make sure that the second run of pg_upgrade doesn't produce "Renaming
SLRU segments" messages:

```
mv ~/pginstall-18/data ~/pginstall-18/data.bak
~/pginstall-18/bin/initdb --data-checksums -D ~/pginstall-18/data
~/pginstall-18/bin/pg_upgrade \
    --old-datadir=/home/eax/pginstall-18/data.bak \
    --new-datadir=/home/eax/pginstall-18/data \
    --old-bindir=/home/eax/pginstall-18/bin \
    --new-bindir=/home/eax/pginstall-18/bin
```

As always, your feedback and suggestions are most welcome.

-- 
Best regards,
Aleksander Alekseev


Attachments:

  [application/octet-stream] v2-0001-Always-use-long-SLRU-segment-file-names.patch (14.9K, 2-v2-0001-Always-use-long-SLRU-segment-file-names.patch)
  download | inline diff:
From 7ae61fbb2fb310fafc5360222e28c881524ba83b Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <[email protected]>
Date: Wed, 11 Sep 2024 13:17:33 +0300
Subject: [PATCH v2] Always use long SLRU segment file names

PG17 introduced long SLRU segment file names (commit 4ed8f0913bfd). We used
short or long file names depending on SlruCtl->long_segment_names. This commit
refactors SLRU to always use long file names in order to simplify the code.

Aleksander Alekseev, reviewed by Michael Paquier
Discussion: https://postgr.es/m/CAJ7c6TOy7fUW9MuNeOWor3cSFnQg9tgz=mjXHDb94GORtM_Eyg@mail.gmail.com

(!!!) bump catversion and change the corresponding TODO FIXME line in pg_upgrade.h
---
 src/backend/access/transam/clog.c           |  2 +-
 src/backend/access/transam/commit_ts.c      |  3 +-
 src/backend/access/transam/multixact.c      |  6 +-
 src/backend/access/transam/slru.c           | 73 ++++----------------
 src/backend/access/transam/subtrans.c       |  2 +-
 src/backend/commands/async.c                |  2 +-
 src/backend/storage/lmgr/predicate.c        |  2 +-
 src/bin/pg_upgrade/pg_upgrade.c             | 74 +++++++++++++++++++++
 src/bin/pg_upgrade/pg_upgrade.h             |  6 ++
 src/bin/pg_verifybackup/t/003_corruption.pl |  2 +-
 src/include/access/slru.h                   | 10 +--
 src/test/modules/test_slru/test_slru.c      |  8 +--
 12 files changed, 104 insertions(+), 86 deletions(-)

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 0d556c00b8c..7a238efc227 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -810,7 +810,7 @@ CLOGShmemInit(void)
 	XactCtl->PagePrecedes = CLOGPagePrecedes;
 	SimpleLruInit(XactCtl, "transaction", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
 				  "pg_xact", LWTRANCHE_XACT_BUFFER,
-				  LWTRANCHE_XACT_SLRU, SYNC_HANDLER_CLOG, false);
+				  LWTRANCHE_XACT_SLRU, SYNC_HANDLER_CLOG);
 	SlruPagePrecedesUnitTests(XactCtl, CLOG_XACTS_PER_PAGE);
 }
 
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 95049acd0b5..99252cd9b87 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -556,8 +556,7 @@ CommitTsShmemInit(void)
 	SimpleLruInit(CommitTsCtl, "commit_timestamp", CommitTsShmemBuffers(), 0,
 				  "pg_commit_ts", LWTRANCHE_COMMITTS_BUFFER,
 				  LWTRANCHE_COMMITTS_SLRU,
-				  SYNC_HANDLER_COMMIT_TS,
-				  false);
+				  SYNC_HANDLER_COMMIT_TS);
 	SlruPagePrecedesUnitTests(CommitTsCtl, COMMIT_TS_XACTS_PER_PAGE);
 
 	commitTsShared = ShmemInitStruct("CommitTs shared",
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 27ccdf9500f..2cc64289054 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1974,15 +1974,13 @@ MultiXactShmemInit(void)
 				  "multixact_offset", multixact_offset_buffers, 0,
 				  "pg_multixact/offsets", LWTRANCHE_MULTIXACTOFFSET_BUFFER,
 				  LWTRANCHE_MULTIXACTOFFSET_SLRU,
-				  SYNC_HANDLER_MULTIXACT_OFFSET,
-				  false);
+				  SYNC_HANDLER_MULTIXACT_OFFSET);
 	SlruPagePrecedesUnitTests(MultiXactOffsetCtl, MULTIXACT_OFFSETS_PER_PAGE);
 	SimpleLruInit(MultiXactMemberCtl,
 				  "multixact_member", multixact_member_buffers, 0,
 				  "pg_multixact/members", LWTRANCHE_MULTIXACTMEMBER_BUFFER,
 				  LWTRANCHE_MULTIXACTMEMBER_SLRU,
-				  SYNC_HANDLER_MULTIXACT_MEMBER,
-				  false);
+				  SYNC_HANDLER_MULTIXACT_MEMBER);
 	/* doesn't call SimpleLruTruncate() or meet criteria for unit tests */
 
 	/* Initialize our shared state struct */
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 7eeaafe2cb3..7e2a12d6a0d 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -77,42 +77,22 @@
  *
  * "path" should point to a buffer at least MAXPGPATH characters long.
  *
- * If ctl->long_segment_names is true, segno can be in the range [0, 2^60-1].
- * The resulting file name is made of 15 characters, e.g. dir/123456789ABCDEF.
- *
- * If ctl->long_segment_names is false, segno can be in the range [0, 2^24-1].
- * The resulting file name is made of 4 to 6 characters, as of:
- *
- *  dir/1234   for [0, 2^16-1]
- *  dir/12345  for [2^16, 2^20-1]
- *  dir/123456 for [2^20, 2^24-1]
+ * segno can be in the range [0, 2^60-1]. The resulting file name is made
+ * of 15 characters, e.g. dir/123456789ABCDEF.
  */
 static inline int
 SlruFileName(SlruCtl ctl, char *path, int64 segno)
 {
-	if (ctl->long_segment_names)
-	{
-		/*
-		 * We could use 16 characters here but the disadvantage would be that
-		 * the SLRU segments will be hard to distinguish from WAL segments.
-		 *
-		 * For this reason we use 15 characters. It is enough but also means
-		 * that in the future we can't decrease SLRU_PAGES_PER_SEGMENT easily.
-		 */
-		Assert(segno >= 0 && segno <= INT64CONST(0xFFFFFFFFFFFFFFF));
-		return snprintf(path, MAXPGPATH, "%s/%015llX", ctl->Dir,
-						(long long) segno);
-	}
-	else
-	{
-		/*
-		 * Despite the fact that %04X format string is used up to 24 bit
-		 * integers are allowed. See SlruCorrectSegmentFilenameLength()
-		 */
-		Assert(segno >= 0 && segno <= INT64CONST(0xFFFFFF));
-		return snprintf(path, MAXPGPATH, "%s/%04X", (ctl)->Dir,
-						(unsigned int) segno);
-	}
+	/*
+	 * We could use 16 characters here but the disadvantage would be that
+	 * the SLRU segments will be hard to distinguish from WAL segments.
+	 *
+	 * For this reason we use 15 characters. It is enough but also means
+	 * that in the future we can't decrease SLRU_PAGES_PER_SEGMENT easily.
+	 */
+	Assert(segno >= 0 && segno <= INT64CONST(0xFFFFFFFFFFFFFFF));
+	return snprintf(path, MAXPGPATH, "%s/%015llX", ctl->Dir,
+					(long long) segno);
 }
 
 /*
@@ -251,7 +231,7 @@ SimpleLruAutotuneBuffers(int divisor, int max)
 void
 SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 			  const char *subdir, int buffer_tranche_id, int bank_tranche_id,
-			  SyncRequestHandler sync_handler, bool long_segment_names)
+			  SyncRequestHandler sync_handler)
 {
 	SlruShared	shared;
 	bool		found;
@@ -342,7 +322,6 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 	 */
 	ctl->shared = shared;
 	ctl->sync_handler = sync_handler;
-	ctl->long_segment_names = long_segment_names;
 	ctl->bank_mask = (nslots / SLRU_BANK_SIZE) - 1;
 	strlcpy(ctl->Dir, subdir, sizeof(ctl->Dir));
 }
@@ -1748,30 +1727,6 @@ SlruScanDirCbDeleteAll(SlruCtl ctl, char *filename, int64 segpage, void *data)
 	return false;				/* keep going */
 }
 
-/*
- * An internal function used by SlruScanDirectory().
- *
- * Returns true if a file with a name of a given length may be a correct
- * SLRU segment.
- */
-static inline bool
-SlruCorrectSegmentFilenameLength(SlruCtl ctl, size_t len)
-{
-	if (ctl->long_segment_names)
-		return (len == 15);		/* see SlruFileName() */
-	else
-
-		/*
-		 * Commit 638cf09e76d allowed 5-character lengths. Later commit
-		 * 73c986adde5 allowed 6-character length.
-		 *
-		 * Note: There is an ongoing plan to migrate all SLRUs to 64-bit page
-		 * numbers, and the corresponding 15-character file names, which may
-		 * eventually deprecate the support for 4, 5, and 6-character names.
-		 */
-		return (len == 4 || len == 5 || len == 6);
-}
-
 /*
  * Scan the SimpleLru directory and apply a callback to each file found in it.
  *
@@ -1803,7 +1758,7 @@ SlruScanDirectory(SlruCtl ctl, SlruScanCallback callback, void *data)
 
 		len = strlen(clde->d_name);
 
-		if (SlruCorrectSegmentFilenameLength(ctl, len) &&
+		if ((len == 15) &&
 			strspn(clde->d_name, "0123456789ABCDEF") == len)
 		{
 			segno = strtoi64(clde->d_name, NULL, 16);
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 15153618fad..58a5ef657ea 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -243,7 +243,7 @@ SUBTRANSShmemInit(void)
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
 	SimpleLruInit(SubTransCtl, "subtransaction", SUBTRANSShmemBuffers(), 0,
 				  "pg_subtrans", LWTRANCHE_SUBTRANS_BUFFER,
-				  LWTRANCHE_SUBTRANS_SLRU, SYNC_HANDLER_NONE, false);
+				  LWTRANCHE_SUBTRANS_SLRU, SYNC_HANDLER_NONE);
 	SlruPagePrecedesUnitTests(SubTransCtl, SUBTRANS_XACTS_PER_PAGE);
 }
 
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 4bd37d5beb5..373b0357fad 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -537,7 +537,7 @@ AsyncShmemInit(void)
 	NotifyCtl->PagePrecedes = asyncQueuePagePrecedes;
 	SimpleLruInit(NotifyCtl, "notify", notify_buffers, 0,
 				  "pg_notify", LWTRANCHE_NOTIFY_BUFFER, LWTRANCHE_NOTIFY_SLRU,
-				  SYNC_HANDLER_NONE, true);
+				  SYNC_HANDLER_NONE);
 
 	if (!found)
 	{
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 5b21a053981..bc83e8e859d 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -814,7 +814,7 @@ SerialInit(void)
 	SimpleLruInit(SerialSlruCtl, "serializable",
 				  serializable_buffers, 0, "pg_serial",
 				  LWTRANCHE_SERIAL_BUFFER, LWTRANCHE_SERIAL_SLRU,
-				  SYNC_HANDLER_NONE, false);
+				  SYNC_HANDLER_NONE);
 #ifdef USE_ASSERT_CHECKING
 	SerialPagePrecedesLogicallyUnitTests();
 #endif
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 36c7f3879d5..6d3dcc63d2b 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -38,6 +38,7 @@
 
 #include "postgres_fe.h"
 
+#include <dirent.h>
 #include <time.h>
 
 #include "catalog/pg_class_d.h"
@@ -59,6 +60,8 @@ static void prepare_new_cluster(void);
 static void prepare_new_globals(void);
 static void create_new_objects(void);
 static void copy_xact_xlog_xid(void);
+static void check_slru_segment_filenames(void);
+static void rename_slru_segments(const char *dirname);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0);
@@ -154,6 +157,7 @@ main(int argc, char **argv)
 	 */
 
 	copy_xact_xlog_xid();
+	check_slru_segment_filenames();
 
 	/* New now using xids of the old system */
 
@@ -806,6 +810,76 @@ copy_xact_xlog_xid(void)
 	check_ok();
 }
 
+static void
+rename_slru_segments(const char *dirname)
+{
+	DIR		   *dir;
+	struct dirent *de;
+	size_t		len;
+	int64 		segno;
+	char		dir_path[MAXPGPATH];
+	char		old_path[MAXPGPATH];
+	char		new_path[MAXPGPATH];
+
+	prep_status("Renaming SLRU segments in %s", dirname);
+	snprintf(dir_path, sizeof(dir_path), "%s/%s", new_cluster.pgdata, dirname);
+
+	dir = opendir(dir_path);
+	if (dir == NULL)
+		pg_fatal("could not open directory \"%s\": %m", dir_path);
+
+	while (errno = 0, (de = readdir(dir)) != NULL)
+	{
+		/*
+		 * ignore '.', '..' and everything else that doesn't look
+		 * like an SLRU segment with a short file name
+		 */
+
+		len = strlen(de->d_name);
+		if (len != 4 && len != 5 && len != 6)
+			continue;
+
+		if (strspn(de->d_name, "0123456789ABCDEF") != len)
+			continue;
+
+		segno = strtoi64(de->d_name, NULL, 16);
+		snprintf(new_path, MAXPGPATH, "%s/%015llX", dir_path,
+					(long long) segno);
+		snprintf(old_path, MAXPGPATH, "%s/%s", dir_path, de->d_name);
+
+		if (pg_mv_file(old_path, new_path) != 0)
+			pg_fatal("could not rename file \"%s\" to \"%s\": %m",
+					 old_path, new_path);
+	}
+
+	if (errno)
+		pg_fatal("could not read directory \"%s\": %m", dir_path);
+
+	if (closedir(dir))
+		pg_fatal("could not close directory \"%s\": %m", dir_path);
+
+	check_ok();
+}
+
+static void
+check_slru_segment_filenames(void)
+{
+	int i;
+	static const char *dirs[] = {
+		"pg_xact",
+		"pg_commit_ts",
+		"pg_multixact/offsets",
+		"pg_multixact/members",
+		"pg_subtrans",
+		"pg_serial",
+	};
+
+	if (old_cluster.controldata.cat_ver >= SLRU_SEG_FILENAMES_CHANGE_CAT_VER)
+		return;
+
+	for (i = 0; i < lengthof(dirs); i++)
+		rename_slru_segments(dirs[i]);
+}
 
 /*
  *	set_frozenxids()
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 0cdd675e4f1..a839f19e310 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -125,6 +125,12 @@ extern char *output_files[];
  */
 #define JSONB_FORMAT_CHANGE_CAT_VER 201409291
 
+/*
+ * change of SLRU segment filenames length in 18.0
+ * TODO FIXME CHANGE TO THE ACTUAL VALUE BEFORE COMMITTING
+ */
+#define SLRU_SEG_FILENAMES_CHANGE_CAT_VER 202412201
+
 
 /*
  * Each relation is represented by a relinfo structure.
diff --git a/src/bin/pg_verifybackup/t/003_corruption.pl b/src/bin/pg_verifybackup/t/003_corruption.pl
index 1111b09637d..9d1dbb93d72 100644
--- a/src/bin/pg_verifybackup/t/003_corruption.pl
+++ b/src/bin/pg_verifybackup/t/003_corruption.pl
@@ -237,7 +237,7 @@ sub mutilate_extra_tablespace_file
 sub mutilate_missing_file
 {
 	my ($backup_path) = @_;
-	my $pathname = "$backup_path/pg_xact/0000";
+	my $pathname = "$backup_path/pg_xact/000000000000000";
 	unlink($pathname) || die "$pathname: $!";
 	return;
 }
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index ae871b640f8..490ea85c5e3 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -133,13 +133,6 @@ typedef struct SlruCtlData
 	 */
 	bits16		bank_mask;
 
-	/*
-	 * If true, use long segment file names.  Otherwise, use short file names.
-	 *
-	 * For details about the file name format, see SlruFileName().
-	 */
-	bool		long_segment_names;
-
 	/*
 	 * Which sync handler function to use when handing sync requests over to
 	 * the checkpointer.  SYNC_HANDLER_NONE to disable fsync (eg pg_notify).
@@ -187,8 +180,7 @@ extern Size SimpleLruShmemSize(int nslots, int nlsns);
 extern int	SimpleLruAutotuneBuffers(int divisor, int max);
 extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 						  const char *subdir, int buffer_tranche_id,
-						  int bank_tranche_id, SyncRequestHandler sync_handler,
-						  bool long_segment_names);
+						  int bank_tranche_id, SyncRequestHandler sync_handler);
 extern int	SimpleLruZeroPage(SlruCtl ctl, int64 pageno);
 extern int	SimpleLruReadPage(SlruCtl ctl, int64 pageno, bool write_ok,
 							  TransactionId xid);
diff --git a/src/test/modules/test_slru/test_slru.c b/src/test/modules/test_slru/test_slru.c
index 3ea5ceb8552..cbd5173015a 100644
--- a/src/test/modules/test_slru/test_slru.c
+++ b/src/test/modules/test_slru/test_slru.c
@@ -213,11 +213,6 @@ test_slru_page_precedes_logically(int64 page1, int64 page2)
 static void
 test_slru_shmem_startup(void)
 {
-	/*
-	 * Short segments names are well tested elsewhere so in this test we are
-	 * focusing on long names.
-	 */
-	const bool	long_segment_names = true;
 	const char	slru_dir_name[] = "pg_test_slru";
 	int			test_tranche_id;
 	int			test_buffer_tranche_id;
@@ -241,8 +236,7 @@ test_slru_shmem_startup(void)
 	TestSlruCtl->PagePrecedes = test_slru_page_precedes_logically;
 	SimpleLruInit(TestSlruCtl, "TestSLRU",
 				  NUM_TEST_BUFFERS, 0, slru_dir_name,
-				  test_buffer_tranche_id, test_tranche_id, SYNC_HANDLER_NONE,
-				  long_segment_names);
+				  test_buffer_tranche_id, test_tranche_id, SYNC_HANDLER_NONE);
 }
 
 void
-- 
2.47.1
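
For readers following the rename logic in rename_slru_segments(), here is a
minimal standalone sketch of the name mapping only: a 4-, 5- or 6-character
upper-case hex name is parsed as a segment number and reprinted as 15 hex
digits. The helper names is_short_slru_name() and long_slru_name() are
hypothetical and do not appear in the patch. The sketch uses plain C library
calls (strtoull, snprintf) in place of the strtoi64() and pg_mv_file() calls
in the patch, and it does not touch the file system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: returns true if "name" looks
 * like a short SLRU segment file name, i.e. 4 to 6 upper-case hex digits.
 */
static bool
is_short_slru_name(const char *name)
{
	size_t		len = strlen(name);

	if (len != 4 && len != 5 && len != 6)
		return false;

	return strspn(name, "0123456789ABCDEF") == len;
}

/*
 * Hypothetical helper, not part of the patch: writes the 15-digit form of a
 * short segment name into dst.  Returns false if "name" is not a short
 * segment name or if dst cannot hold 15 digits plus the terminating NUL.
 */
static bool
long_slru_name(const char *name, char *dst, size_t dstlen)
{
	unsigned long long segno;
	int			n;

	assert(name != NULL && dst != NULL);

	if (!is_short_slru_name(name))
		return false;

	segno = strtoull(name, NULL, 16);
	n = snprintf(dst, dstlen, "%015llX", segno);

	return n > 0 && (size_t) n < dstlen;
}
```

Under these assumptions, "0000" maps to "000000000000000", which is the name
the pg_verifybackup test in this patch now expects under pg_xact. Names that
are already 15 characters long are rejected by the length check, so files
renamed earlier in a directory scan are not picked up a second time.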






Thread overview: 7+ messages
2024-11-08 04:21 Re: [PATCH] Refactor SLRU to always use long file names Michael Paquier <[email protected]>
2024-11-12 14:15 ` Aleksander Alekseev <[email protected]>
2024-11-12 14:37   ` Aleksander Alekseev <[email protected]>
2024-11-14 01:11     ` Michael Paquier <[email protected]>
2024-11-14 11:04       ` Aleksander Alekseev <[email protected]>
2024-12-05 08:08         ` Aleksander Alekseev <[email protected]>
2025-01-06 13:23           ` Aleksander Alekseev <[email protected]>
