public inbox for [email protected]
help / color / mirror / Atom feedRe: relfilenode statistics
25+ messages / 4 participants
[nested] [flat]
* Re: relfilenode statistics
@ 2024-11-29 15:52 Kirill Reshke <[email protected]>
2024-12-03 10:31 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
0 siblings, 1 reply; 25+ messages in thread
From: Kirill Reshke @ 2024-11-29 15:52 UTC (permalink / raw)
To: Bertrand Drouvot <[email protected]>; +Cc: Robert Haas <[email protected]>; Michael Paquier <[email protected]>; [email protected]
On Fri, 29 Nov 2024 at 20:20, Bertrand Drouvot
<[email protected]> wrote:
> On Fri, Nov 29, 2024 at 11:23:12AM +0500, Kirill Reshke wrote:
> > If we don’t have the relation OID when writing buffers out, can we
> > just store oid to buffertag mapping somewhere and use it?
>
> Do you mean add the relation OID into the BufferTag? While that could probably
> be done from a technical point of view (with probably non negligible amount
> of refactoring), I can see those cons:
Not exactly, what i had in mind was a separate hashmap into shared
memory, mapping buffertag<>oid.
> 2. Probably lot of refactoring
> 3. This new member would be there "only" for stats and reporting purpose as
> it is not needed at all for buffer related operations
To this design, your points 2&3 apply.
--
Best regards,
Kirill Reshke
^ permalink raw reply [nested|flat] 25+ messages in thread
* Re: relfilenode statistics
2024-11-29 15:52 Re: relfilenode statistics Kirill Reshke <[email protected]>
@ 2024-12-03 10:31 ` Bertrand Drouvot <[email protected]>
2025-01-03 16:18 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
0 siblings, 1 reply; 25+ messages in thread
From: Bertrand Drouvot @ 2024-12-03 10:31 UTC (permalink / raw)
To: Kirill Reshke <[email protected]>; +Cc: Robert Haas <[email protected]>; Michael Paquier <[email protected]>; [email protected]
Hi,
On Fri, Nov 29, 2024 at 08:52:13PM +0500, Kirill Reshke wrote:
> On Fri, 29 Nov 2024 at 20:20, Bertrand Drouvot
> <[email protected]> wrote:
> > On Fri, Nov 29, 2024 at 11:23:12AM +0500, Kirill Reshke wrote:
> > > If we don’t have the relation OID when writing buffers out, can we
> > > just store oid to buffertag mapping somewhere and use it?
> >
> > Do you mean add the relation OID into the BufferTag? While that could probably
> > be done from a technical point of view (with probably non negligible amount
> > of refactoring), I can see those cons:
>
> Not exactly, what i had in mind was a separate hashmap into shared
> memory, mapping buffertag<>oid.
I see.
> > 2. Probably lot of refactoring
> > 3. This new member would be there "only" for stats and reporting purpose as
> > it is not needed at all for buffer related operations
>
> To this design, your points 2&3 apply.
That said, it might also help for DropRelationBuffers() where we need to scan
the entire buffer pool (there is an optimization in place though). We could
imagine buffertag as key and the value could be the relation OID and each entry
would have next/prev pointers linking to other BufferTags with same OID.
That's probably much more refactoring (and more invasive) that the initial idea
in this thread but could lead to multiple pros though. I'm not very familar with
the "buffer" area of the code and would also need to study the performance impact
to maintain this new hash map.
Do you and/or others have any thoughts/ideas about it?
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 25+ messages in thread
* Re: relfilenode statistics
2024-11-29 15:52 Re: relfilenode statistics Kirill Reshke <[email protected]>
2024-12-03 10:31 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
@ 2025-01-03 16:18 ` Bertrand Drouvot <[email protected]>
2025-03-13 09:00 ` Re: relfilenode statistics Kirill Reshke <[email protected]>
0 siblings, 1 reply; 25+ messages in thread
From: Bertrand Drouvot @ 2025-01-03 16:18 UTC (permalink / raw)
To: Kirill Reshke <[email protected]>; +Cc: Robert Haas <[email protected]>; Michael Paquier <[email protected]>; [email protected]
Hi,
On Tue, Dec 03, 2024 at 10:31:15AM +0000, Bertrand Drouvot wrote:
> Hi,
>
> On Fri, Nov 29, 2024 at 08:52:13PM +0500, Kirill Reshke wrote:
> > On Fri, 29 Nov 2024 at 20:20, Bertrand Drouvot
> > <[email protected]> wrote:
> > > On Fri, Nov 29, 2024 at 11:23:12AM +0500, Kirill Reshke wrote:
> > > > If we don’t have the relation OID when writing buffers out, can we
> > > > just store oid to buffertag mapping somewhere and use it?
> > >
> > > Do you mean add the relation OID into the BufferTag? While that could probably
> > > be done from a technical point of view (with probably non negligible amount
> > > of refactoring), I can see those cons:
> >
> > Not exactly, what i had in mind was a separate hashmap into shared
> > memory, mapping buffertag<>oid.
>
> I see.
>
> > > 2. Probably lot of refactoring
> > > 3. This new member would be there "only" for stats and reporting purpose as
> > > it is not needed at all for buffer related operations
> >
> > To this design, your points 2&3 apply.
>
> That said, it might also help for DropRelationBuffers() where we need to scan
> the entire buffer pool (there is an optimization in place though). We could
> imagine buffertag as key and the value could be the relation OID and each entry
> would have next/prev pointers linking to other BufferTags with same OID.
>
> That's probably much more refactoring (and more invasive) that the initial idea
> in this thread but could lead to multiple pros though. I'm not very familar with
> the "buffer" area of the code and would also need to study the performance impact
> to maintain this new hash map.
>
> Do you and/or others have any thoughts/ideas about it?
As mentioned by Andres in [1], relying on the relation OID would not work to
"recover" the stats because we don't have access to the relation oid during crash
recovery. So, I'm going to resume working on the "initial" idea (i.e having the
stats keyed by relfilenode).
[1]: https://www.postgresql.org/message-id/xvetwjsnkhx2gp6np225g2h64f4mfmg6oopkuaiivrpzd2futj%40pflk55su3...
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 25+ messages in thread
* Re: relfilenode statistics
2024-11-29 15:52 Re: relfilenode statistics Kirill Reshke <[email protected]>
2024-12-03 10:31 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-01-03 16:18 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
@ 2025-03-13 09:00 ` Kirill Reshke <[email protected]>
2025-09-16 06:44 ` Re: relfilenode statistics Michael Paquier <[email protected]>
0 siblings, 1 reply; 25+ messages in thread
From: Kirill Reshke @ 2025-03-13 09:00 UTC (permalink / raw)
To: Bertrand Drouvot <[email protected]>; +Cc: Robert Haas <[email protected]>; Michael Paquier <[email protected]>; [email protected]
On Fri, 3 Jan 2025 at 21:18, Bertrand Drouvot
<[email protected]> wrote:
> As mentioned by Andres in [1], relying on the relation OID would not work to
> "recover" the stats because we don't have access to the relation oid during crash
> recovery. So, I'm going to resume working on the "initial" idea (i.e having the
> stats keyed by relfilenode).
>
> [1]: https://www.postgresql.org/message-id/xvetwjsnkhx2gp6np225g2h64f4mfmg6oopkuaiivrpzd2futj%40pflk55su3...
>
Hmm. While it is true that catalog lookups cannot be performed during
crash recovery, is it really necessary to save and retrieve statistics
after a crash? Given that statistics are permitted to be outdated and
server crashes are anticipated to be infrequent, it looks loke losing
a few analysis runs due to server crashes is acceptable.
In any case, I am totally OK with the relfilenode-based method because
it is generally less restricted (to other postgresql parts e.g. wal-
replay ) and simpler.
Also, this patch needs a rebase;)
--
Best regards,
Kirill Reshke
^ permalink raw reply [nested|flat] 25+ messages in thread
* Re: relfilenode statistics
2024-11-29 15:52 Re: relfilenode statistics Kirill Reshke <[email protected]>
2024-12-03 10:31 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-01-03 16:18 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-03-13 09:00 ` Re: relfilenode statistics Kirill Reshke <[email protected]>
@ 2025-09-16 06:44 ` Michael Paquier <[email protected]>
2025-09-30 10:13 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
0 siblings, 1 reply; 25+ messages in thread
From: Michael Paquier @ 2025-09-16 06:44 UTC (permalink / raw)
To: Kirill Reshke <[email protected]>; +Cc: Bertrand Drouvot <[email protected]>; Robert Haas <[email protected]>; [email protected]
On Thu, Mar 13, 2025 at 02:00:52PM +0500, Kirill Reshke wrote:
> Hmm. While it is true that catalog lookups cannot be performed during
> crash recovery, is it really necessary to save and retrieve statistics
> after a crash?
Yes, losing stats on crash is a *very* annoying thing. Having no
stats for a relation means that autovacuum gives up entirely on
relations it has no stats of, skipping it entirely until they have
rebuilt and bloat would accumulate. Being able to recover these stats
from crash recovery is a cheap design, that would improve reliability
by a large degree.
> Given that statistics are permitted to be outdated and
> server crashes are anticipated to be infrequent, it looks loke losing
> a few analysis runs due to server crashes is acceptable.
> In any case, I am totally OK with the relfilenode-based method because
> it is generally less restricted (to other postgresql parts e.g. wal-
> replay ) and simpler.
The startup process is not connected to a database and has no access
to pg_class: the only thing we can know about are the on-disk files,
not their in-catalog OIDs. FWIW, I think that this patch would be a
huge step forward a more reliable stats system.
True that the patch needs a rebase. Bertrand has also mentioned that
some points needed more work.
--
Michael
Attachments:
[application/pgp-signature] signature.asc (833B, 2-signature.asc)
download
^ permalink raw reply [nested|flat] 25+ messages in thread
* Re: relfilenode statistics
2024-11-29 15:52 Re: relfilenode statistics Kirill Reshke <[email protected]>
2024-12-03 10:31 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-01-03 16:18 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-03-13 09:00 ` Re: relfilenode statistics Kirill Reshke <[email protected]>
2025-09-16 06:44 ` Re: relfilenode statistics Michael Paquier <[email protected]>
@ 2025-09-30 10:13 ` Bertrand Drouvot <[email protected]>
2025-09-30 23:05 ` Re: relfilenode statistics Michael Paquier <[email protected]>
0 siblings, 1 reply; 25+ messages in thread
From: Bertrand Drouvot @ 2025-09-30 10:13 UTC (permalink / raw)
To: Michael Paquier <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; [email protected]
Hi,
On Tue, Sep 16, 2025 at 03:44:25PM +0900, Michael Paquier wrote:
> On Thu, Mar 13, 2025 at 02:00:52PM +0500, Kirill Reshke wrote:
> > Hmm. While it is true that catalog lookups cannot be performed during
> > crash recovery, is it really necessary to save and retrieve statistics
> > after a crash?
>
> Yes, losing stats on crash is a *very* annoying thing. Having no
> stats for a relation means that autovacuum gives up entirely on
> relations it has no stats of, skipping it entirely until they have
> rebuilt and bloat would accumulate. Being able to recover these stats
> from crash recovery is a cheap design, that would improve reliability
> by a large degree.
+1.
> The startup process is not connected to a database and has no access
> to pg_class: the only thing we can know about are the on-disk files,
> not their in-catalog OIDs. FWIW, I think that this patch would be a
> huge step forward a more reliable stats system.
>
> True that the patch needs a rebase. Bertrand has also mentioned that
> some points needed more work.
Right. I'll come back with a rebase, and a POC proposal on some stats so that
we could agree on the design. Also, it looks like that we have a consensus on
"sometimes I don't know the relation OID so I want to use the relfilenumber instead,
without changing the user experience" (see [1)).
As far Michael's concern about adding a new field in the hash key, as 8 bytes
is allocated for the object ID, then we can go with:
dboid (linked to RelFileLocator's dbOid)
objoid (linked to RelFileLocator's spcOid and to the RelFileLocator's relNumber)
and avoid adding a new field in the key.
[1]: https://www.postgresql.org/message-id/CA%2BTgmoZ0u6ek_xxYJaGVBk0uEvH5txoYsCwbvxKWe-2xn_G_qg%40mail.g...
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 25+ messages in thread
* Re: relfilenode statistics
2024-11-29 15:52 Re: relfilenode statistics Kirill Reshke <[email protected]>
2024-12-03 10:31 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-01-03 16:18 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-03-13 09:00 ` Re: relfilenode statistics Kirill Reshke <[email protected]>
2025-09-16 06:44 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-09-30 10:13 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
@ 2025-09-30 23:05 ` Michael Paquier <[email protected]>
2025-10-01 14:33 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
0 siblings, 1 reply; 25+ messages in thread
From: Michael Paquier @ 2025-09-30 23:05 UTC (permalink / raw)
To: Bertrand Drouvot <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; [email protected]
On Tue, Sep 30, 2025 at 10:13:57AM +0000, Bertrand Drouvot wrote:
> As far Michael's concern about adding a new field in the hash key, as 8 bytes
> is allocated for the object ID, then we can go with:
>
> dboid (linked to RelFileLocator's dbOid)
> objoid (linked to RelFileLocator's spcOid and to the RelFileLocator's relNumber)
>
> and avoid adding a new field in the key.
RelFileNumber is a 4-byte Oid, so this mapping should be able to work.
Is there any reason why you would want an efficient filtering of the
contents of the shared hashtable based only on a relnumber or a
tablespace OID? Perhaps yes, like when a relfilenode is dropped into
a bin for an efficient removal from the shared hashtable so as we
don't need to do a seqscan, I just don't remember all the details of
the patch and if it could act as a bottleneck in some scenarios.
--
Michael
Attachments:
[application/pgp-signature] signature.asc (833B, 2-signature.asc)
download
^ permalink raw reply [nested|flat] 25+ messages in thread
* Re: relfilenode statistics
2024-11-29 15:52 Re: relfilenode statistics Kirill Reshke <[email protected]>
2024-12-03 10:31 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-01-03 16:18 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-03-13 09:00 ` Re: relfilenode statistics Kirill Reshke <[email protected]>
2025-09-16 06:44 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-09-30 10:13 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-09-30 23:05 ` Re: relfilenode statistics Michael Paquier <[email protected]>
@ 2025-10-01 14:33 ` Bertrand Drouvot <[email protected]>
2025-10-02 01:23 ` Re: relfilenode statistics Michael Paquier <[email protected]>
0 siblings, 1 reply; 25+ messages in thread
From: Bertrand Drouvot @ 2025-10-01 14:33 UTC (permalink / raw)
To: Michael Paquier <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; [email protected]
Hi,
On Wed, Oct 01, 2025 at 08:05:16AM +0900, Michael Paquier wrote:
> On Tue, Sep 30, 2025 at 10:13:57AM +0000, Bertrand Drouvot wrote:
> > As far Michael's concern about adding a new field in the hash key, as 8 bytes
> > is allocated for the object ID, then we can go with:
> >
> > dboid (linked to RelFileLocator's dbOid)
> > objoid (linked to RelFileLocator's spcOid and to the RelFileLocator's relNumber)
> >
> > and avoid adding a new field in the key.
>
> RelFileNumber is a 4-byte Oid, so this mapping should be able to work.
Right.
> Is there any reason why you would want an efficient filtering of the
> contents of the shared hashtable based only on a relnumber or a
> tablespace OID?
Not that I can think of currently.
> Perhaps yes, like when a relfilenode is dropped into
> a bin for an efficient removal from the shared hashtable so as we
> don't need to do a seqscan, I just don't remember all the details of
> the patch and if it could act as a bottleneck in some scenarios.
I think the first step is to replace (i.e get rid) PGSTAT_KIND_RELATION by a brand
new PGSTAT_KIND_RELFILENODE and move all the existing stats that are currently
under the PGSTAT_KIND_RELATION to this new PGSTAT_KIND_RELFILENODE.
Let's do this by keeping the pg_stat_all_tables|indexes and pg_statio_all_tables|indexes
on top of the PGSTAT_KIND_RELFILENODE and ensure that a relation rewrite keeps
those stats. Once done, we could work from there to add new stats (add writes
counters and ensure that some counters (n_dead_tup and friends) are replicated).
Does that make sense to you?
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 25+ messages in thread
* Re: relfilenode statistics
2024-11-29 15:52 Re: relfilenode statistics Kirill Reshke <[email protected]>
2024-12-03 10:31 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-01-03 16:18 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-03-13 09:00 ` Re: relfilenode statistics Kirill Reshke <[email protected]>
2025-09-16 06:44 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-09-30 10:13 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-09-30 23:05 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-10-01 14:33 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
@ 2025-10-02 01:23 ` Michael Paquier <[email protected]>
2025-11-07 11:28 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
0 siblings, 1 reply; 25+ messages in thread
From: Michael Paquier @ 2025-10-02 01:23 UTC (permalink / raw)
To: Bertrand Drouvot <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; [email protected]
On Wed, Oct 01, 2025 at 02:33:11PM +0000, Bertrand Drouvot wrote:
> I think the first step is to replace (i.e get rid) PGSTAT_KIND_RELATION by a brand
> new PGSTAT_KIND_RELFILENODE and move all the existing stats that are currently
> under the PGSTAT_KIND_RELATION to this new PGSTAT_KIND_RELFILENODE.
Likely so, yes.
> Let's do this by keeping the pg_stat_all_tables|indexes and pg_statio_all_tables|indexes
> on top of the PGSTAT_KIND_RELFILENODE and ensure that a relation rewrite keeps
> those stats. Once done, we could work from there to add new stats (add writes
> counters and ensure that some counters (n_dead_tup and friends) are replicated).
Do you think it is OK to define non-transactional pending stats as
being always a subset of the transactional stats? I don't quite see
if there would be a case to have stats that are only flushed in a
non-transactional path, while being discarded at the stats report done
at transaction commit time. This means that it may be possible to
structure things so as the pending non-transaction stats structure are
always part of the transactional bits, and that the other way around
is not possible. Perhaps that influences the design choices, at least
a bit.
--
Michael
Attachments:
[application/pgp-signature] signature.asc (833B, 2-signature.asc)
download
^ permalink raw reply [nested|flat] 25+ messages in thread
* Re: relfilenode statistics
2024-11-29 15:52 Re: relfilenode statistics Kirill Reshke <[email protected]>
2024-12-03 10:31 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-01-03 16:18 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-03-13 09:00 ` Re: relfilenode statistics Kirill Reshke <[email protected]>
2025-09-16 06:44 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-09-30 10:13 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-09-30 23:05 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-10-01 14:33 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-10-02 01:23 ` Re: relfilenode statistics Michael Paquier <[email protected]>
@ 2025-11-07 11:28 ` Bertrand Drouvot <[email protected]>
2025-11-08 23:33 ` Re: relfilenode statistics Michael Paquier <[email protected]>
0 siblings, 1 reply; 25+ messages in thread
From: Bertrand Drouvot @ 2025-11-07 11:28 UTC (permalink / raw)
To: Michael Paquier <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; [email protected]
Hi,
On Thu, Oct 02, 2025 at 10:23:11AM +0900, Michael Paquier wrote:
> On Wed, Oct 01, 2025 at 02:33:11PM +0000, Bertrand Drouvot wrote:
> > I think the first step is to replace (i.e get rid) PGSTAT_KIND_RELATION by a brand
> > new PGSTAT_KIND_RELFILENODE and move all the existing stats that are currently
> > under the PGSTAT_KIND_RELATION to this new PGSTAT_KIND_RELFILENODE.
>
> Likely so, yes.
PFA the new implementation. It does not introduce a new PGSTAT_KIND_RELFILENODE,
instead it keys the PGSTAT_KIND_RELATION by relfile locator. We may want to
rename PGSTAT_KIND_RELATION to PGSTAT_KIND_RELFILENODE as a next step.
The patch is structured that way:
==== 0001
Add stats tests related to rewrite
While there are existing rewrite tests, the stats behavior during rewrites
doesn't have a good coverage. This patch adds some tests to record some stats
after different rewrite scenarios.
That way, we'll be able to test that the stats are still the ones we
expect after rewrites. Note that it generates a new stats_1.out (which is quite
large), so we may want to move those new tests to "isolation" instead.
==== 0002
Key PGSTAT_KIND_RELATION by relfile locator
This patch changes the key used for the PGSTAT_KIND_RELATION statistic kind.
Instead of the relation oid, it now relies on:
- dboid (linked to RelFileLocator's dbOid)
- objoid which is the result of a new macro (namely RelFileLocatorToPgStatObjid())
that computes an objoid based on the RelFileLocator's spcOid and the RelFileLocator's
relNumber.
That will allow us to add new stats (add writes counters) and ensure that some
counters (n_dead_tup and friends) are replicated.
The patch introduces pgstat_reloid_to_relfilelocator() to 1) avoid calling
RelationIdGetRelation() to get the relfilelocator based on the relation oid
and 2) handle the partitioned table case.
Please note that:
- when running pg_stat_have_stats('relation',...) we now need to be connected
to the database that hosts the relation. As pg_stat_have_stats() is not
documented publicly, then the changes done in 029_stats_restart.pl look
enough.
- this patch does not handle rewrites so some tests are failing. It's only
intent is to ease the review and should not be pushed without being
merged with the following patch that handles the rewrites.
- it can be used to test that stats are incremented correctly and that we're
able to retrieve them as long as rewrites are not involved.
==== 0003
handle relation statistics correctly during rewrites
Now that PGSTAT_KIND_RELATION is keyed by refilenode, we need to handle rewrites.
To do so, this patch:
- Adds PgStat_PendingRewrite, a new struct to track rewrite operations within
a transaction, storing the old locator, new locator, and original locator (for
rewrite chains). This allows stats to be copied from the original location to
the final location at commit time.
- Adds a new function, pgstat_mark_rewrite(), called when a table rewrite begins.
It records the rewrite operation in a local list and detects rewrite chains by
checking if the old_locator matches any existing new_locator, preserving the
chain's original_locator.
- Modifies pgstat_copy_relation_stats(), to accept RelFileLocators instead of
Relations, with a new increment parameter to accumulate stats (needed for rewrite
chains with DML between rewrites).
- Ensures that AtEOXact_PgStat_Relations(), AtPrepare_PgStat_Relations(),
pgstat_twophase_postcommit()/postabort() pgstat_drop_relation() handle the
PgStat_PendingRewrite list correctly.
Note that due to the new flush call in pgstat_twophase_postcommit() we can not
call GetCurrentTransactionStopTimestamp() in pgstat_relation_flush_cb(). So,
adding a check to handle this special case and call GetCurrentTimestamp() instead.
Note that we'd call GetCurrentTimestamp() only if there is a rewrite, so that
the GetCurrentTimestamp() extra cost should be negligible. Another solution
could be to trigger the flush from FinishPreparedTransaction() but that's not
worth the extra complexity.
The new pending_rewrites list is traversed in multiple places. The overhead
should be negligible in comparison to a rewrite and the list should not contain
a lot of rewrites in practice.
Another design that I tried was to copy the stats in pgstat_mark_rewrite() but
that lead to difficulties during abort, subtransactions. It looks to me that
the list approach proposed here makes more sense.
We could also imagine adding a function similar to pg_stat_have_stats() that
would take relfile locator as arguments. That could help validate that after
a rewrite the old stats are gone.
> Do you think it is OK to define non-transactional pending stats as
> being always a subset of the transactional stats? I don't quite see
> if there would be a case to have stats that are only flushed in a
> non-transactional path, while being discarded at the stats report done
> at transaction commit time. This means that it may be possible to
> structure things so as the pending non-transaction stats structure are
> always part of the transactional bits, and that the other way around
> is not possible. Perhaps that influences the design choices, at least
> a bit.
The proposed patch does not change anything it that regard.
It keeps the relation's behavior as it is.
This patch just ensure that a relation rewrite keeps its stats.
Adding new stats (add writes counters) and ensure that some counters
(n_dead_tup and friends) are replicated will be done once this one gets in.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 25+ messages in thread
* Re: relfilenode statistics
2024-11-29 15:52 Re: relfilenode statistics Kirill Reshke <[email protected]>
2024-12-03 10:31 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-01-03 16:18 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-03-13 09:00 ` Re: relfilenode statistics Kirill Reshke <[email protected]>
2025-09-16 06:44 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-09-30 10:13 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-09-30 23:05 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-10-01 14:33 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-10-02 01:23 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-07 11:28 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
@ 2025-11-08 23:33 ` Michael Paquier <[email protected]>
2025-11-10 08:53 ` Re: relfilenode statistics Michael Paquier <[email protected]>
0 siblings, 1 reply; 25+ messages in thread
From: Michael Paquier @ 2025-11-08 23:33 UTC (permalink / raw)
To: Bertrand Drouvot <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; [email protected]
On Fri, Nov 07, 2025 at 11:28:27AM +0000, Bertrand Drouvot wrote:
> While there are existing rewrite tests, the stats behavior during rewrites
> doesn't have a good coverage. This patch adds some tests to record some stats
> after different rewrite scenarios.
>
> That way, we'll be able to test that the stats are still the ones we
> expect after rewrites. Note that it generates a new stats_1.out (which is quite
> large), so we may want to move those new tests to "isolation" instead.
Looking at this part of the patch set for now, not looked at the rest
yet. This new stats_1.out is 2k lines long, introduced for the tests
related to rewrites as an effect of 2PC. It seems to me that a split
into a new stats_rewrite would be justified for this case, to reduce
the output duplication.
--
Michael
Attachments:
[application/pgp-signature] signature.asc (833B, 2-signature.asc)
download
^ permalink raw reply [nested|flat] 25+ messages in thread
* Re: relfilenode statistics
2024-11-29 15:52 Re: relfilenode statistics Kirill Reshke <[email protected]>
2024-12-03 10:31 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-01-03 16:18 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-03-13 09:00 ` Re: relfilenode statistics Kirill Reshke <[email protected]>
2025-09-16 06:44 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-09-30 10:13 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-09-30 23:05 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-10-01 14:33 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-10-02 01:23 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-07 11:28 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-11-08 23:33 ` Re: relfilenode statistics Michael Paquier <[email protected]>
@ 2025-11-10 08:53 ` Michael Paquier <[email protected]>
2025-11-12 17:03 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
0 siblings, 1 reply; 25+ messages in thread
From: Michael Paquier @ 2025-11-10 08:53 UTC (permalink / raw)
To: Bertrand Drouvot <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; [email protected]
On Sun, Nov 09, 2025 at 08:33:54AM +0900, Michael Paquier wrote:
> Looking at this part of the patch set for now, not looked at the rest
> yet. This new stats_1.out is 2k lines long, introduced for the tests
> related to rewrites as an effect of 2PC. It seems to me that a split
> into a new stats_rewrite would be justified for this case, to reduce
> the output duplication.
The first patch had an issue with some of the tests checking for dead
tuples: if an autovacuum kicks in before querying the stats, we would
get a dead tuple number of 0. So I have expanded the tests a bit to
avoid autovacuum interactions, which should be enough to avoid noise,
did a split into a new file, which should also be fine because we
don't rely on a system-wide stats reset, then applied the result.
The patch is spending a great deal of effort on three fronts:
- making sure that the statistics are copied over after a relation
rewrite.
- making sure that we assign a "correct" object ID, assigning
the fields of RelFileLocator based on a relation ID. Mapped and
shared relations make the exercise a bit more difficult. It would be
nice to avoid this kind of duplication with other code paths that
assign a RelFileLocator.
- Partitioned tables, where we don't have a relfilenode but we need to
track statistics. The patch relies on the relation oid to assign a
key, as far as I've read.
Among the three points, the first one is the most invasive in the
patch, it seems, and do we actually want to keep the stats across
rewrites at all? The main reason of doing the relfilenode move
would be to rebuild these stats on a WAL-record basis because the
relfile locator is the only thing we know in the startup process, and
once rewritten the state of the data is different.
relation_needs_vacanalyze() then cares about three fields:
- Number of dead tuples, which would be 0 after a rewrite.
- ins_since_vacuum, which would be 0 after a rewrite.
- mod_since_analyze, for analyze, again 0.
I have not checked the recent autovacuum scheduling thread to see if
this set changes there.
Are these numbers worth the effort of copying over at the end? Was
this particular point discussed? I've seen this mentioned once here,
but I am wondering what are the arguments in favor of copying the
stats data versus not copying it across rewrites:
https://www.postgresql.org/message-id/20240607031736.7izmr2yirznvidka%40awork3.anarazel.de
--
Michael
Attachments:
[application/pgp-signature] signature.asc (833B, 2-signature.asc)
download
^ permalink raw reply [nested|flat] 25+ messages in thread
* Re: relfilenode statistics
2024-11-29 15:52 Re: relfilenode statistics Kirill Reshke <[email protected]>
2024-12-03 10:31 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-01-03 16:18 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-03-13 09:00 ` Re: relfilenode statistics Kirill Reshke <[email protected]>
2025-09-16 06:44 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-09-30 10:13 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-09-30 23:05 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-10-01 14:33 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-10-02 01:23 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-07 11:28 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-11-08 23:33 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-10 08:53 ` Re: relfilenode statistics Michael Paquier <[email protected]>
@ 2025-11-12 17:03 ` Bertrand Drouvot <[email protected]>
2025-12-15 16:29 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
0 siblings, 1 reply; 25+ messages in thread
From: Bertrand Drouvot @ 2025-11-12 17:03 UTC (permalink / raw)
To: Michael Paquier <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; [email protected]
Hi,
On Mon, Nov 10, 2025 at 05:53:45PM +0900, Michael Paquier wrote:
> On Sun, Nov 09, 2025 at 08:33:54AM +0900, Michael Paquier wrote:
> > Looking at this part of the patch set for now, not looked at the rest
> > yet. This new stats_1.out is 2k lines long, introduced for the tests
> > related to rewrites as an effect of 2PC. It seems to me that a split
> > into a new stats_rewrite would be justified for this case, to reduce
> > the output duplication.
>
> did a split into a new file, which should also be fine because we
> don't rely on a system-wide stats reset, then applied the result.
Thanks!
> The patch is spending a great deal of effort on three fronts:
> - making sure that the statistics are copied over after a relation
> rewrite.
Right, in 0003.
> - making sure that we assign a "correct" object ID, assigning
> the fields of RelFileLocator based on a relation ID. Mapped and
> shared relations make the exercise a bit more difficult. It would be
> nice to avoid this kind of duplication with other code paths that
> assign a RelFileLocator.
Are you referring to the new pgstat_reloid_to_relfilelocator() function?
If so, I'll try to avoid code duplication with other code paths as suggested.
> - Partitioned tables, where we don't have a relfilenode but we need to
> track statistics. The patch relies on the relation oid to assign a
> key, as far as I've read.
Right. It's not doing that much in this area. It's needed so that things like
"last_analyze" on a partitioned table is populated (see "Ensure only the
partitioned table is analyzed" in vacuum.sql).
> Among the three points, the first one is the most invasive in the
> patch, it seems, and do we actually want to keep the stats across
> rewrites at all?
Not doing so would mean that all stats related to a relation will be lost after
a rewrite. I think that would be a major regression as compared to the current
behavior.
> The main reason of doing the relfilenode move
> would be to rebuild these stats on a WAL-record basis because the
> relfile locator is the only thing we know in the startup process, and
> once rewritten the state of the data is different.
> relation_needs_vacanalyze() then cares about three fields:
> - Number of dead tuples, which would be 0 after a rewrite.
> - ins_since_vacuum, which would be 0 after a rewrite.
> - mod_since_analyze, for analyze, again 0.
>
> I have not checked the recent autovacuum scheduling thread to see if
> this set changes there.
>
> Are these numbers worth the effort of copying over at the end?
I think so because that would impact all the other relation's stats (not only
the ones linked to relation_needs_vacanalyze()).
> Was
> this particular point discussed? I've seen this mentioned once here,
> but I am wondering what are the arguments in favor of copying the
> stats data versus not copying it across rewrites:
> https://www.postgresql.org/message-id/20240607031736.7izmr2yirznvidka%40awork3.anarazel.de
In favor of copying, I would say:
- no regression as compared to the current behavior. That means, for example,
not breaking DBA's activities/decisions based on the pg_stat_all_tables fields
after a rewrite.
- a rewrite is not changing the number of dead tuples, ins_since_vacuum and
mod_since_analyze. So, if don't copy those, then we'd change the
relation_needs_vacanalyze() decision(s) as compared to the current one(s) for no
reasons (as a rewrite has no impact on those).
In favor of not copying, I would say make the code simpler.
I'm in favor of copying but open to different point of views.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 25+ messages in thread
* Re: relfilenode statistics
2024-11-29 15:52 Re: relfilenode statistics Kirill Reshke <[email protected]>
2024-12-03 10:31 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-01-03 16:18 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-03-13 09:00 ` Re: relfilenode statistics Kirill Reshke <[email protected]>
2025-09-16 06:44 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-09-30 10:13 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-09-30 23:05 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-10-01 14:33 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-10-02 01:23 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-07 11:28 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-11-08 23:33 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-10 08:53 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-12 17:03 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
@ 2025-12-15 16:29 ` Bertrand Drouvot <[email protected]>
2025-12-15 17:48 ` Re: relfilenode statistics Andres Freund <[email protected]>
0 siblings, 1 reply; 25+ messages in thread
From: Bertrand Drouvot @ 2025-12-15 16:29 UTC (permalink / raw)
To: Michael Paquier <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; [email protected]
Hi,
On Wed, Nov 12, 2025 at 05:03:55PM +0000, Bertrand Drouvot wrote:
> In favor of not copying, I would say make the code simpler.
>
> I'm in favor of copying but open to different point of views.
PFA a mandatory rebase.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 25+ messages in thread
* Re: relfilenode statistics
2024-11-29 15:52 Re: relfilenode statistics Kirill Reshke <[email protected]>
2024-12-03 10:31 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-01-03 16:18 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-03-13 09:00 ` Re: relfilenode statistics Kirill Reshke <[email protected]>
2025-09-16 06:44 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-09-30 10:13 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-09-30 23:05 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-10-01 14:33 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-10-02 01:23 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-07 11:28 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-11-08 23:33 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-10 08:53 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-12 17:03 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-12-15 16:29 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
@ 2025-12-15 17:48 ` Andres Freund <[email protected]>
2025-12-16 07:33 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-12-16 10:22 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
0 siblings, 2 replies; 25+ messages in thread
From: Andres Freund @ 2025-12-15 17:48 UTC (permalink / raw)
To: Bertrand Drouvot <[email protected]>; +Cc: Michael Paquier <[email protected]>; Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; [email protected]
Hi,
On 2025-12-15 16:29:18 +0000, Bertrand Drouvot wrote:
> From 7908ba56cb8b6255b869af6be13077aa0315d5f1 Mon Sep 17 00:00:00 2001
> From: Bertrand Drouvot <[email protected]>
> Date: Wed, 1 Oct 2025 09:45:26 +0000
> Subject: [PATCH v8 1/2] Key PGSTAT_KIND_RELATION by relfile locator
>
> This patch changes the key used for the PGSTAT_KIND_RELATION statistic kind.
> Instead of the relation oid, it now relies on:
>
> - dboid (linked to RelFileLocator's dbOid)
> - objoid which is the result of a new macro (namely RelFileLocatorToPgStatObjid())
> that computes an objoid based on the RelFileLocator's spcOid and the
> RelFileLocator's relNumber.
I think this needs to make more explicit that this works because the object ID
now is a uint64, and that both the inputs are 32 bits.
> That will allow us to add new stats (add writes counters) and ensure that some
> counters (n_dead_tup and friends) are replicated.
Yay.
> The patch introduces pgstat_reloid_to_relfilelocator() to 1) avoid calling
> RelationIdGetRelation() to get the relfilelocator based on the relation oid
> and 2) handle the partitioned table case.
>
> Please note that:
>
> - when running pg_stat_have_stats('relation',...) we now need to be connected
> to the database that hosts the relation. As pg_stat_have_stats() is not
> documented publicly, then the changes done in 029_stats_restart.pl look
> enough.
That seems fine.
> - this patch does not handle rewrites so some tests are failing. It's only
> intent is to ease the review and should not be pushed without being
> merged with the following patch that handles the rewrites.
Makes sense.
> diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
> index 62035b7f9c3..a9b2b4e1033 100644
> --- a/src/backend/access/heap/vacuumlazy.c
> +++ b/src/backend/access/heap/vacuumlazy.c
> @@ -961,8 +961,7 @@ heap_vacuum_rel(Relation rel, const VacuumParams params,
> * soon in cases where the failsafe prevented significant amounts of heap
> * vacuuming.
> */
> - pgstat_report_vacuum(RelationGetRelid(rel),
> - rel->rd_rel->relisshared,
> + pgstat_report_vacuum(rel->rd_locator,
> Max(vacrel->new_live_tuples, 0),
> vacrel->recently_dead_tuples +
> vacrel->missed_dead_tuples,
Why not pass in the Relation itself? Given that we do that already for
pgstat_report_analyze(), it seems like that'd be an improvement even
independent of this change?
> diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
> index 1bd3924e35e..563a3697690 100644
> --- a/src/backend/postmaster/autovacuum.c
> +++ b/src/backend/postmaster/autovacuum.c
> @@ -2048,8 +2048,7 @@ do_autovacuum(void)
>
> /* Fetch reloptions and the pgstat entry for this table */
> relopts = extract_autovac_opts(tuple, pg_class_desc);
> - tabentry = pgstat_fetch_stat_tabentry_ext(classForm->relisshared,
> - relid);
> + tabentry = pgstat_fetch_stat_tabentry_ext(relid);
>
> /* Check if it needs vacuum or analyze */
> relation_needs_vacanalyze(relid, relopts, classForm, tabentry,
I don't think this is good - now do_autovacuum() will do a separate syscache
lookup to fetch information the caller already has (due to the
pgstat_reloid_to_relfilelocator() in pgstat_fetch_stat_tabentry_ext()). That's
not too bad for things like viewing stats, but do_autovacuum() does this for
every table in a database...
> @@ -326,9 +363,26 @@ pgstat_report_analyze(Relation rel,
> ts = GetCurrentTimestamp();
> elapsedtime = TimestampDifferenceMilliseconds(starttime, ts);
>
> + if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
> + locator = rel->rd_locator;
> + else
> + {
> + /*
> + * Partitioned tables don't have storage, so construct a synthetic
> + * locator for statistics tracking. Use the relation OID as relNumber.
> + * No collision with regular relations is possible because relNumbers
> + * are also assigned from the pg_class OID space (see
> + * GetNewRelFileNumber()), making each value unique across the
> + * database regardless of spcOid.
> + */
I don't think this is true as stated. Two reasons:
1) This afaict guarantees that the relfilenode will not class with oids, but
it does *NOT* guarantee that it does not clash with other relfilenodes
2) Note that GetNewRelFileNumber() does *NOT* check for conflicts when
creating a new relfilenode for an existing relation:
* If the relfilenumber will also be used as the relation's OID, pass the
* opened pg_class catalog, and this routine will guarantee that the result
* is also an unused OID within pg_class. If the result is to be used only
* as a relfilenumber for an existing relation, pass NULL for pg_class.
Greetings,
Andres Freund
^ permalink raw reply [nested|flat] 25+ messages in thread
* Re: relfilenode statistics
2024-11-29 15:52 Re: relfilenode statistics Kirill Reshke <[email protected]>
2024-12-03 10:31 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-01-03 16:18 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-03-13 09:00 ` Re: relfilenode statistics Kirill Reshke <[email protected]>
2025-09-16 06:44 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-09-30 10:13 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-09-30 23:05 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-10-01 14:33 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-10-02 01:23 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-07 11:28 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-11-08 23:33 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-10 08:53 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-12 17:03 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-12-15 16:29 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-12-15 17:48 ` Re: relfilenode statistics Andres Freund <[email protected]>
@ 2025-12-16 07:33 ` Michael Paquier <[email protected]>
2025-12-16 10:24 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-12-16 15:39 ` Re: relfilenode statistics Andres Freund <[email protected]>
1 sibling, 2 replies; 25+ messages in thread
From: Michael Paquier @ 2025-12-16 07:33 UTC (permalink / raw)
To: Andres Freund <[email protected]>; +Cc: Bertrand Drouvot <[email protected]>; Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; [email protected]
On Mon, Dec 15, 2025 at 12:48:25PM -0500, Andres Freund wrote:
> I don't think this is true as stated. Two reasons:
>
> 1) This afaict guarantees that the relfilenode will not clash with oids, but
> it does *NOT* guarantee that it does not clash with other relfilenodes
>
> 2) Note that GetNewRelFileNumber() does *NOT* check for conflicts when
> creating a new relfilenode for an existing relation:
> * If the relfilenumber will also be used as the relation's OID, pass the
> * opened pg_class catalog, and this routine will guarantee that the result
> * is also an unused OID within pg_class. If the result is to be used only
> * as a relfilenumber for an existing relation, pass NULL for pg_class.
FWIW, I am also still troubled by the part of the proposed patch set
where we are trying to hide the idea of a partitioned table has a
relfilenode set by using its relid instead in the key for the data.
This leads to a huge amount of complexity in the patch, mainly to
store data for autovacuum that we do not need at the end:
- autovacuum discards partitioned tables in do_autovacuum(), so the
stats related to partitioned tables that we need to select the
relations does not matter.
- manual vacuums may include partitioned tables to extract its
partitions, vacuum_rel() at the end discarding them. Well, stats
don't matter anyway.
We only need to attach three fields to let autovacuum know if a
relation needs to run or not: dead_tuples, ins_since_vacuum,
mod_since_analyze. Most the fields of PgStat_StatTabEntry make sense
only for tables, few are required by indexes for pg_stat_all_indexes.
Some fields actually make sense because they refer to on-disk files,
mostly for pg_statio_all_tables (blocks_fetched, blocks_hit).
Hence, why don't we split PgStat_StatTabEntry into three things from
the start, even if it means to duplicate some of them? Say:
- Table fields: includes [auto]vacuum/analyze data, block fields,
fields of pg_stat_all_tables.
- Index fields: no need for the [auto]vacuum/analyze time and counts,
block fields, pg_stat_all_indexes fields.
- Relfilenode fields: dead_tuples, ins_since_vacuum and
mod_since_analyze. Does not apply to partitioned tables and indexes,
only applies to tables. Provides a clean split, embrace the fact that
these are the only three fields we need to worry about during
recovery.
--
Michael
Attachments:
[application/pgp-signature] signature.asc (833B, 2-signature.asc)
download
^ permalink raw reply [nested|flat] 25+ messages in thread
* Re: relfilenode statistics
2024-11-29 15:52 Re: relfilenode statistics Kirill Reshke <[email protected]>
2024-12-03 10:31 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-01-03 16:18 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-03-13 09:00 ` Re: relfilenode statistics Kirill Reshke <[email protected]>
2025-09-16 06:44 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-09-30 10:13 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-09-30 23:05 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-10-01 14:33 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-10-02 01:23 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-07 11:28 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-11-08 23:33 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-10 08:53 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-12 17:03 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-12-15 16:29 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-12-15 17:48 ` Re: relfilenode statistics Andres Freund <[email protected]>
2025-12-16 07:33 ` Re: relfilenode statistics Michael Paquier <[email protected]>
@ 2025-12-16 10:24 ` Bertrand Drouvot <[email protected]>
1 sibling, 0 replies; 25+ messages in thread
From: Bertrand Drouvot @ 2025-12-16 10:24 UTC (permalink / raw)
To: Michael Paquier <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; [email protected]
Hi,
On Tue, Dec 16, 2025 at 04:33:17PM +0900, Michael Paquier wrote:
>
> Hence, why don't we split PgStat_StatTabEntry into three things from
> the start, even if it means to duplicate some of them? Say:
> - Table fields: includes [auto]vacuum/analyze data, block fields,
> fields of pg_stat_all_tables.
> - Index fields: no need for the [auto]vacuum/analyze time and counts,
> block fields, pg_stat_all_indexes fields.
> - Relfilenode fields: dead_tuples, ins_since_vacuum and
> mod_since_analyze. Does not apply to partitioned tables and indexes,
> only applies to tables. Provides a clean split, embrace the fact that
> these are the only three fields we need to worry about during
> recovery.
I think that the PSEUDO_PARTITION_TABLE_SPCOID just proposed in [1] approach
is simple enough and solves the collision issue raised by Andres.
I think I prefer the unified structure as proposed in the patch (though we
may want to split tables and indexes later on). The reason is that it's
easier to expose publicly.
Indeed, at the very beginning of this thread, in v1, I created a new
PGSTAT_KIND_RELFILENODE and had to make it coexist with PGSTAT_KIND_RELATION and
that led to discussion on how we should expose them ([2]).
[1]: https://www.postgresql.org/message-id/aUEyzoOJtrCLAEeT%40ip-10-97-1-34.eu-west-3.compute.internal
[2]: https://www.postgresql.org/message-id/CA%2BTgmoZtwT6h%3DnyuQ1J9GNSrRyhf0fv7Ai6FzO%3DbH0C9Bf6tew%40ma...
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 25+ messages in thread
* Re: relfilenode statistics
2024-11-29 15:52 Re: relfilenode statistics Kirill Reshke <[email protected]>
2024-12-03 10:31 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-01-03 16:18 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-03-13 09:00 ` Re: relfilenode statistics Kirill Reshke <[email protected]>
2025-09-16 06:44 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-09-30 10:13 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-09-30 23:05 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-10-01 14:33 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-10-02 01:23 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-07 11:28 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-11-08 23:33 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-10 08:53 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-12 17:03 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-12-15 16:29 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-12-15 17:48 ` Re: relfilenode statistics Andres Freund <[email protected]>
2025-12-16 07:33 ` Re: relfilenode statistics Michael Paquier <[email protected]>
@ 2025-12-16 15:39 ` Andres Freund <[email protected]>
2026-03-09 07:43 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
1 sibling, 1 reply; 25+ messages in thread
From: Andres Freund @ 2025-12-16 15:39 UTC (permalink / raw)
To: Michael Paquier <[email protected]>; +Cc: Bertrand Drouvot <[email protected]>; Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; [email protected]
Hi,
On 2025-12-16 16:33:17 +0900, Michael Paquier wrote:
> On Mon, Dec 15, 2025 at 12:48:25PM -0500, Andres Freund wrote:
> > I don't think this is true as stated. Two reasons:
> >
> > 1) This afaict guarantees that the relfilenode will not clash with oids, but
> > it does *NOT* guarantee that it does not clash with other relfilenodes
> >
> > 2) Note that GetNewRelFileNumber() does *NOT* check for conflicts when
> > creating a new relfilenode for an existing relation:
> > * If the relfilenumber will also be used as the relation's OID, pass the
> > * opened pg_class catalog, and this routine will guarantee that the result
> > * is also an unused OID within pg_class. If the result is to be used only
> > * as a relfilenumber for an existing relation, pass NULL for pg_class.
>
> FWIW, I am also still troubled by the part of the proposed patch set
> where we are trying to hide the idea of a partitioned table has a
> relfilenode set by using its relid instead in the key for the data.
> This leads to a huge amount of complexity in the patch, mainly to
> store data for autovacuum that we do not need at the end:
> - autovacuum discards partitioned tables in do_autovacuum(), so the
> stats related to partitioned tables that we need to select the
> relations does not matter.
I feel like that's an implementation wart that we ought to fix. It's not
infrequently a problem that we don't automatically analyze partitioned
tables. Weren't there even a couple threads on that on the list in the last
weeks?
> - manual vacuums may include partitioned tables to extract its
> partitions, vacuum_rel() at the end discarding them. Well, stats
> don't matter anyway.
>
> We only need to attach three fields to let autovacuum know if a
> relation needs to run or not: dead_tuples, ins_since_vacuum,
> mod_since_analyze.
That may be true for autovacuum today, but I don't see any reason for
live_tuples, tuples_inserted etc to be inaccurate after a failover.
> Most the fields of PgStat_StatTabEntry make sense
> only for tables, few are required by indexes for pg_stat_all_indexes.
> Some fields actually make sense because they refer to on-disk files,
> mostly for pg_statio_all_tables (blocks_fetched, blocks_hit).
>
> Hence, why don't we split PgStat_StatTabEntry into three things from
> the start, even if it means to duplicate some of them? Say:
> - Table fields: includes [auto]vacuum/analyze data, block fields,
> fields of pg_stat_all_tables.
What do you mean with "block fields"? pg_statio_all_tables? If so, what's the
point of including them here, rather than in the relfilenode fields?
> - Index fields: no need for the [auto]vacuum/analyze time and counts,
> block fields, pg_stat_all_indexes fields.
I think we actually should populate the [auto]vac fields for indexes, right
now it's impossible to figure out from stats whether indexes are frequently
scanned as part of vacuum or not.
> - Relfilenode fields: dead_tuples, ins_since_vacuum and
> mod_since_analyze. Does not apply to partitioned tables and indexes,
> only applies to tables. Provides a clean split, embrace the fact that
> these are the only three fields we need to worry about during
> recovery.
I think we really ought to populate not just these during recovery, but also
at least n_tup_ins, n_tup_upd, n_tup_del, n_tup_hot_upd, n_live_tup.
I don't understand why we would want to only populate these three fields?
I'm not against splitting the index fields off, but it seems pretty orthogonal
to what we're discussing here. If we were to split of index stats into a
separate stat, why wouldn't we keep the statio fields in the relfilenode
stats, since they're obviously intimately tied to that?
Greetings,
Andres Freund
^ permalink raw reply [nested|flat] 25+ messages in thread
* Re: relfilenode statistics
2024-11-29 15:52 Re: relfilenode statistics Kirill Reshke <[email protected]>
2024-12-03 10:31 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-01-03 16:18 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-03-13 09:00 ` Re: relfilenode statistics Kirill Reshke <[email protected]>
2025-09-16 06:44 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-09-30 10:13 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-09-30 23:05 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-10-01 14:33 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-10-02 01:23 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-07 11:28 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-11-08 23:33 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-10 08:53 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-12 17:03 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-12-15 16:29 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-12-15 17:48 ` Re: relfilenode statistics Andres Freund <[email protected]>
2025-12-16 07:33 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-12-16 15:39 ` Re: relfilenode statistics Andres Freund <[email protected]>
@ 2026-03-09 07:43 ` Bertrand Drouvot <[email protected]>
2026-03-18 03:57 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
0 siblings, 1 reply; 25+ messages in thread
From: Bertrand Drouvot @ 2026-03-09 07:43 UTC (permalink / raw)
To: Andres Freund <[email protected]>; Michael Paquier <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; [email protected]
Hi,
On Mon, Feb 23, 2026 at 06:39:03AM +0000, Bertrand Drouvot wrote:
> Hi,
>
> On Tue, Jan 13, 2026 at 09:29:02AM +0000, Bertrand Drouvot wrote:
> > Hi,
> >
> > On Tue, Dec 16, 2025 at 10:39:15AM -0500, Andres Freund wrote:
> > > On 2025-12-16 16:33:17 +0900, Michael Paquier wrote:
> >
> > Andres, Michael, let me try to sum up my understanding of the current state
> > and see how we could now move forward.
> >
> > First of all, I understand that you both think that the patch outcome will be
> > useful to have. The current debate is about the design, the current status is:
> >
> > - Andres raised specific technical/implementation concerns and I've proposed
> > solutions in [1]. It also looks like Andres supports the overall design approach.
> > - Michael is not really ok with the current design approach.
> >
> > That means, that with the current design in place, Michael would probably not
> > commit it (even after review(s)).
> >
> > Given that I'm also in favor of the current proposed design, this raises the
> > questions:
> >
> > - Andres, would you commit such a patch (after review iteration(s) of course)?
> > - Michael, if Andres is ok with the above, would you still offer your help for the
> > review part (even if the design is not what you "prefer"/"like")?
> >
> > [1]: https://postgr.es/m/aUEyzoOJtrCLAEeT%40ip-10-97-1-34.eu-west-3.compute.internal
>
> PFA, tiny rebase due to 9842e8aca09.
PFA, a new mandatory rebase.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 25+ messages in thread
* Re: relfilenode statistics
2024-11-29 15:52 Re: relfilenode statistics Kirill Reshke <[email protected]>
2024-12-03 10:31 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-01-03 16:18 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-03-13 09:00 ` Re: relfilenode statistics Kirill Reshke <[email protected]>
2025-09-16 06:44 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-09-30 10:13 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-09-30 23:05 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-10-01 14:33 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-10-02 01:23 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-07 11:28 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-11-08 23:33 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-10 08:53 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-12 17:03 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-12-15 16:29 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-12-15 17:48 ` Re: relfilenode statistics Andres Freund <[email protected]>
2025-12-16 07:33 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-12-16 15:39 ` Re: relfilenode statistics Andres Freund <[email protected]>
2026-03-09 07:43 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
@ 2026-03-18 03:57 ` Bertrand Drouvot <[email protected]>
2026-03-25 03:25 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
0 siblings, 1 reply; 25+ messages in thread
From: Bertrand Drouvot @ 2026-03-18 03:57 UTC (permalink / raw)
To: Andres Freund <[email protected]>; Michael Paquier <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; [email protected]
Hi,
On Mon, Mar 09, 2026 at 07:43:43AM +0000, Bertrand Drouvot wrote:
> Hi,
>
> On Mon, Feb 23, 2026 at 06:39:03AM +0000, Bertrand Drouvot wrote:
> > Hi,
> >
> > On Tue, Jan 13, 2026 at 09:29:02AM +0000, Bertrand Drouvot wrote:
> > > Hi,
> > >
> > > On Tue, Dec 16, 2025 at 10:39:15AM -0500, Andres Freund wrote:
> > > > On 2025-12-16 16:33:17 +0900, Michael Paquier wrote:
> > >
> > > Andres, Michael, let me try to sum up my understanding of the current state
> > > and see how we could now move forward.
> > >
> > > First of all, I understand that you both think that the patch outcome will be
> > > useful to have. The current debate is about the design, the current status is:
> > >
> > > - Andres raised specific technical/implementation concerns and I've proposed
> > > solutions in [1]. It also looks like Andres supports the overall design approach.
> > > - Michael is not really ok with the current design approach.
> > >
> > > That means, that with the current design in place, Michael would probably not
> > > commit it (even after review(s)).
> > >
> > > Given that I'm also in favor of the current proposed design, this raises the
> > > questions:
> > >
> > > - Andres, would you commit such a patch (after review iteration(s) of course)?
> > > - Michael, if Andres is ok with the above, would you still offer your help for the
> > > review part (even if the design is not what you "prefer"/"like")?
> > >
> > > [1]: https://postgr.es/m/aUEyzoOJtrCLAEeT%40ip-10-97-1-34.eu-west-3.compute.internal
> >
> > PFA, tiny rebase due to 9842e8aca09.
>
> PFA, a new mandatory rebase.
PFA, new rebase due to fba4233c832.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 25+ messages in thread
* Re: relfilenode statistics
2024-11-29 15:52 Re: relfilenode statistics Kirill Reshke <[email protected]>
2024-12-03 10:31 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-01-03 16:18 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-03-13 09:00 ` Re: relfilenode statistics Kirill Reshke <[email protected]>
2025-09-16 06:44 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-09-30 10:13 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-09-30 23:05 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-10-01 14:33 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-10-02 01:23 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-07 11:28 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-11-08 23:33 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-10 08:53 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-12 17:03 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-12-15 16:29 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-12-15 17:48 ` Re: relfilenode statistics Andres Freund <[email protected]>
2025-12-16 07:33 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-12-16 15:39 ` Re: relfilenode statistics Andres Freund <[email protected]>
2026-03-09 07:43 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2026-03-18 03:57 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
@ 2026-03-25 03:25 ` Bertrand Drouvot <[email protected]>
2026-03-31 10:45 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
0 siblings, 1 reply; 25+ messages in thread
From: Bertrand Drouvot @ 2026-03-25 03:25 UTC (permalink / raw)
To: Andres Freund <[email protected]>; Michael Paquier <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; [email protected]
Hi,
On Wed, Mar 18, 2026 at 03:57:48AM +0000, Bertrand Drouvot wrote:
> Hi,
>
> PFA, new rebase due to fba4233c832.
Another rebase, due to 2102ebb1953 this time.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 25+ messages in thread
* Re: relfilenode statistics
2024-11-29 15:52 Re: relfilenode statistics Kirill Reshke <[email protected]>
2024-12-03 10:31 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-01-03 16:18 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-03-13 09:00 ` Re: relfilenode statistics Kirill Reshke <[email protected]>
2025-09-16 06:44 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-09-30 10:13 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-09-30 23:05 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-10-01 14:33 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-10-02 01:23 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-07 11:28 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-11-08 23:33 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-10 08:53 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-12 17:03 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-12-15 16:29 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-12-15 17:48 ` Re: relfilenode statistics Andres Freund <[email protected]>
2025-12-16 07:33 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-12-16 15:39 ` Re: relfilenode statistics Andres Freund <[email protected]>
2026-03-09 07:43 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2026-03-18 03:57 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2026-03-25 03:25 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
@ 2026-03-31 10:45 ` Bertrand Drouvot <[email protected]>
2026-05-18 16:28 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
0 siblings, 1 reply; 25+ messages in thread
From: Bertrand Drouvot @ 2026-03-31 10:45 UTC (permalink / raw)
To: Andres Freund <[email protected]>; Michael Paquier <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; [email protected]
Hi,
On Wed, Mar 25, 2026 at 03:25:07AM +0000, Bertrand Drouvot wrote:
> Hi,
>
> On Wed, Mar 18, 2026 at 03:57:48AM +0000, Bertrand Drouvot wrote:
> > Hi,
> >
> > PFA, new rebase due to fba4233c832.
>
> Another rebase, due to 2102ebb1953 this time.
It's more than probably too late for v19 but it needs another rebase due to
d7965d65fc5b this time.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 25+ messages in thread
* Re: relfilenode statistics
2024-11-29 15:52 Re: relfilenode statistics Kirill Reshke <[email protected]>
2024-12-03 10:31 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-01-03 16:18 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-03-13 09:00 ` Re: relfilenode statistics Kirill Reshke <[email protected]>
2025-09-16 06:44 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-09-30 10:13 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-09-30 23:05 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-10-01 14:33 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-10-02 01:23 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-07 11:28 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-11-08 23:33 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-10 08:53 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-12 17:03 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-12-15 16:29 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-12-15 17:48 ` Re: relfilenode statistics Andres Freund <[email protected]>
2025-12-16 07:33 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-12-16 15:39 ` Re: relfilenode statistics Andres Freund <[email protected]>
2026-03-09 07:43 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2026-03-18 03:57 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2026-03-25 03:25 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2026-03-31 10:45 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
@ 2026-05-18 16:28 ` Bertrand Drouvot <[email protected]>
0 siblings, 0 replies; 25+ messages in thread
From: Bertrand Drouvot @ 2026-05-18 16:28 UTC (permalink / raw)
To: Andres Freund <[email protected]>; Michael Paquier <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; [email protected]
Hi,
On Tue, Mar 31, 2026 at 10:45:50AM +0000, Bertrand Drouvot wrote:
> Hi,
>
> On Wed, Mar 25, 2026 at 03:25:07AM +0000, Bertrand Drouvot wrote:
> > Hi,
> >
> > On Wed, Mar 18, 2026 at 03:57:48AM +0000, Bertrand Drouvot wrote:
> > > Hi,
> > >
> > > PFA, new rebase due to fba4233c832.
> >
> > Another rebase, due to 2102ebb1953 this time.
>
> It's more than probably too late for v19 but it needs another rebase due to
> d7965d65fc5b this time.
PFA v16, a rebase due to 775fe51daae, 71ff232a5bc and c0b53ec0630.
While at it, let's sum up the current state:
Regarding Michael's question [1] about whether we should copy stats across
rewrites: I still believe we should. Not doing so would produce user-visible
regressions. The complexity is contained in patch 0002 and the approach is
tested (including 2PC, subtransaction abort, and rewrite chains).
Regarding Michael's suggestion [2] to split PgStat_StatTabEntry into three kinds
(table/index/relfilenode) from the start: I think this patch is the right
incremental step that doesn't preclude a future split. Here's my reasoning:
As Andres pointed out [3], we'd want to populate more than just
dead_tuples/ins_since_vacuum/mod_since_analyze during recovery. The right
boundary for a split isn't clear yet until we actually implement WAL-replay-based
stat population.
I think that splitting now would be a much larger change with the risk of drawing
the boundaries wrong.
The current approach (key PGSTAT_KIND_RELATION by locator, keep the unified structure)
is a contained change that unblocks future work. Once we have WAL replay populating
stats, we'll have a much better understanding of what a split should look like,
if one is still needed.
I think we should do this incremental step first, then split later if/when the
need becomes clearer.
I believe we have consensus on the core approach ("use the relfilenumber instead
of the relation OID, without changing the user experience"). The implementation
addresses all the technical concerns raised so far (no new hash key field,
PSEUDO_PARTITION_TABLE_SPCOID for partitioned tables, pgstat_fetch_stat_tabentry_by_locator()
to avoid extra syscache lookups in do_autovacuum()).
Andres, would you be willing to drive this toward commit once we've iterated
on any remaining review feedback?
Michael, I understand this isn't the design you'd prefer. Would you be open to
reviewing the implementation nonetheless, or do you have a hard objection that
would block this path?
I'm happy to address any further concerns.
[1]: https://postgr.es/m/aRGoGcOdutTHQfpn%40paquier.xyz
[2]: https://postgr.es/m/aUELPdhdcyzTM_8K%40paquier.xyz
[3]: https://postgr.es/m/zferux2jlbhqymubzhpubfrkjzhzxzguq4eprtycojtif5vbqh%402t7cu2teyqmi
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 25+ messages in thread
* Re: relfilenode statistics
2024-11-29 15:52 Re: relfilenode statistics Kirill Reshke <[email protected]>
2024-12-03 10:31 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-01-03 16:18 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-03-13 09:00 ` Re: relfilenode statistics Kirill Reshke <[email protected]>
2025-09-16 06:44 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-09-30 10:13 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-09-30 23:05 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-10-01 14:33 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-10-02 01:23 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-07 11:28 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-11-08 23:33 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-10 08:53 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-12 17:03 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-12-15 16:29 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-12-15 17:48 ` Re: relfilenode statistics Andres Freund <[email protected]>
@ 2025-12-16 10:22 ` Bertrand Drouvot <[email protected]>
2025-12-17 07:30 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
1 sibling, 1 reply; 25+ messages in thread
From: Bertrand Drouvot @ 2025-12-16 10:22 UTC (permalink / raw)
To: Andres Freund <[email protected]>; +Cc: Michael Paquier <[email protected]>; Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; [email protected]
Hi,
On Mon, Dec 15, 2025 at 12:48:25PM -0500, Andres Freund wrote:
> On 2025-12-15 16:29:18 +0000, Bertrand Drouvot wrote:
> > From 7908ba56cb8b6255b869af6be13077aa0315d5f1 Mon Sep 17 00:00:00 2001
>
> I think this needs to make more explicit that this works because the object ID
> now is a uint64, and that both the inputs are 32 bits.
Yeah, it's now added in the commit message (mentioning b14e9ce7d55c).
> > diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
> > index 62035b7f9c3..a9b2b4e1033 100644
> > --- a/src/backend/access/heap/vacuumlazy.c
> > +++ b/src/backend/access/heap/vacuumlazy.c
> > @@ -961,8 +961,7 @@ heap_vacuum_rel(Relation rel, const VacuumParams params,
> > * soon in cases where the failsafe prevented significant amounts of heap
> > * vacuuming.
> > */
> > - pgstat_report_vacuum(RelationGetRelid(rel),
> > - rel->rd_rel->relisshared,
> > + pgstat_report_vacuum(rel->rd_locator,
> > Max(vacrel->new_live_tuples, 0),
> > vacrel->recently_dead_tuples +
> > vacrel->missed_dead_tuples,
>
> Why not pass in the Relation itself? Given that we do that already for
> pgstat_report_analyze(), it seems like that'd be an improvement even
> independent of this change?
Makes sense, done in [1].
> > diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
> > index 1bd3924e35e..563a3697690 100644
> > --- a/src/backend/postmaster/autovacuum.c
> > +++ b/src/backend/postmaster/autovacuum.c
> > @@ -2048,8 +2048,7 @@ do_autovacuum(void)
> >
> > /* Fetch reloptions and the pgstat entry for this table */
> > relopts = extract_autovac_opts(tuple, pg_class_desc);
> > - tabentry = pgstat_fetch_stat_tabentry_ext(classForm->relisshared,
> > - relid);
> > + tabentry = pgstat_fetch_stat_tabentry_ext(relid);
> >
> > /* Check if it needs vacuum or analyze */
> > relation_needs_vacanalyze(relid, relopts, classForm, tabentry,
>
> I don't think this is good - now do_autovacuum() will do a separate syscache
> lookup to fetch information the caller already has (due to the
> pgstat_reloid_to_relfilelocator() in pgstat_fetch_stat_tabentry_ext()). That's
> not too bad for things like viewing stats, but do_autovacuum() does this for
> every table in a database...
Good point. In the attached I added pgstat_fetch_stat_tabentry_by_locator().
It's called directly in do_autovacuum() and also in pgstat_fetch_stat_tabentry_ext().
I did not check if there are other places where we can call pgstat_fetch_stat_tabentry_by_locator()
directly. I want first to validate this idea makes sense, does it?
> I don't think this is true as stated. Two reasons:
>
> 1) This afaict guarantees that the relfilenode will not class with oids, but
> it does *NOT* guarantee that it does not clash with other relfilenodes
> 2) Note that GetNewRelFileNumber() does *NOT* check for conflicts when
> creating a new relfilenode for an existing relation:
> * If the relfilenumber will also be used as the relation's OID, pass the
> * opened pg_class catalog, and this routine will guarantee that the result
> * is also an unused OID within pg_class. If the result is to be used only
> * as a relfilenumber for an existing relation, pass NULL for pg_class.
Oh right, in case of OID wraparound. In the attached I added a new
"
#define PSEUDO_PARTITION_TABLE_SPCOID 1665
"
to ensure uniqueness then.
[1]: https://www.postgresql.org/message-id/flat/aUEA6UZZkDCQFgSA%40ip-10-97-1-34.eu-west-3.compute.intern...
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 25+ messages in thread
* Re: relfilenode statistics
2024-11-29 15:52 Re: relfilenode statistics Kirill Reshke <[email protected]>
2024-12-03 10:31 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-01-03 16:18 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-03-13 09:00 ` Re: relfilenode statistics Kirill Reshke <[email protected]>
2025-09-16 06:44 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-09-30 10:13 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-09-30 23:05 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-10-01 14:33 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-10-02 01:23 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-07 11:28 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-11-08 23:33 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-10 08:53 ` Re: relfilenode statistics Michael Paquier <[email protected]>
2025-11-12 17:03 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-12-15 16:29 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
2025-12-15 17:48 ` Re: relfilenode statistics Andres Freund <[email protected]>
2025-12-16 10:22 ` Re: relfilenode statistics Bertrand Drouvot <[email protected]>
@ 2025-12-17 07:30 ` Bertrand Drouvot <[email protected]>
0 siblings, 0 replies; 25+ messages in thread
From: Bertrand Drouvot @ 2025-12-17 07:30 UTC (permalink / raw)
To: Andres Freund <[email protected]>; +Cc: Michael Paquier <[email protected]>; Kirill Reshke <[email protected]>; Robert Haas <[email protected]>; [email protected]
Hi,
On Tue, Dec 16, 2025 at 10:22:06AM +0000, Bertrand Drouvot wrote:
> In the attached
PFA a mandatory rebase due to f4e797171ea.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 25+ messages in thread
end of thread, other threads:[~2026-05-18 16:28 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-11-29 15:52 Re: relfilenode statistics Kirill Reshke <[email protected]>
2024-12-03 10:31 ` Bertrand Drouvot <[email protected]>
2025-01-03 16:18 ` Bertrand Drouvot <[email protected]>
2025-03-13 09:00 ` Kirill Reshke <[email protected]>
2025-09-16 06:44 ` Michael Paquier <[email protected]>
2025-09-30 10:13 ` Bertrand Drouvot <[email protected]>
2025-09-30 23:05 ` Michael Paquier <[email protected]>
2025-10-01 14:33 ` Bertrand Drouvot <[email protected]>
2025-10-02 01:23 ` Michael Paquier <[email protected]>
2025-11-07 11:28 ` Bertrand Drouvot <[email protected]>
2025-11-08 23:33 ` Michael Paquier <[email protected]>
2025-11-10 08:53 ` Michael Paquier <[email protected]>
2025-11-12 17:03 ` Bertrand Drouvot <[email protected]>
2025-12-15 16:29 ` Bertrand Drouvot <[email protected]>
2025-12-15 17:48 ` Andres Freund <[email protected]>
2025-12-16 07:33 ` Michael Paquier <[email protected]>
2025-12-16 10:24 ` Bertrand Drouvot <[email protected]>
2025-12-16 15:39 ` Andres Freund <[email protected]>
2026-03-09 07:43 ` Bertrand Drouvot <[email protected]>
2026-03-18 03:57 ` Bertrand Drouvot <[email protected]>
2026-03-25 03:25 ` Bertrand Drouvot <[email protected]>
2026-03-31 10:45 ` Bertrand Drouvot <[email protected]>
2026-05-18 16:28 ` Bertrand Drouvot <[email protected]>
2025-12-16 10:22 ` Bertrand Drouvot <[email protected]>
2025-12-17 07:30 ` Bertrand Drouvot <[email protected]>
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox