Re: relfilenode statistics - Michael Paquier

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Michael Paquier <[email protected]>
To: Andres Freund <[email protected]>
Cc: Bertrand Drouvot <[email protected]>
Cc: Kirill Reshke <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: [email protected]
Subject: Re: relfilenode statistics
Date: Tue, 16 Dec 2025 16:33:17 +0900
Message-ID: <[email protected]> (raw)
In-Reply-To: <ezbfcpjtfh3vtm667aegyrptcdsvinbuxw6y5p6j3e5fbffrl5@i7v6cjapz3yv>
References: <[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<ezbfcpjtfh3vtm667aegyrptcdsvinbuxw6y5p6j3e5fbffrl5@i7v6cjapz3yv>

On Mon, Dec 15, 2025 at 12:48:25PM -0500, Andres Freund wrote:
> I don't think this is true as stated. Two reasons:
> 
> 1) This afaict guarantees that the relfilenode will not clash with oids, but
>    it does *NOT* guarantee that it does not clash with other relfilenodes
> 
> 2) Note that GetNewRelFileNumber() does *NOT* check for conflicts when
>    creating a new relfilenode for an existing relation:
>  * If the relfilenumber will also be used as the relation's OID, pass the
>  * opened pg_class catalog, and this routine will guarantee that the result
>  * is also an unused OID within pg_class.  If the result is to be used only
>  * as a relfilenumber for an existing relation, pass NULL for pg_class.

FWIW, I am also still troubled by the part of the proposed patch set
where we are trying to hide the idea of a partitioned table has a
relfilenode set by using its relid instead in the key for the data.
This leads to a huge amount of complexity in the patch, mainly to
store data for autovacuum that we do not need at the end:
- autovacuum discards partitioned tables in do_autovacuum(), so the
stats related to partitioned tables that we need to select the
relations does not matter.
- manual vacuums may include partitioned tables to extract its
partitions, vacuum_rel() at the end discarding them.  Well, stats
don't matter anyway.

We only need to attach three fields to let autovacuum know if a
relation needs to run or not: dead_tuples, ins_since_vacuum,
mod_since_analyze.  Most the fields of PgStat_StatTabEntry make sense
only for tables, few are required by indexes for pg_stat_all_indexes.
Some fields actually make sense because they refer to on-disk files,
mostly for pg_statio_all_tables (blocks_fetched, blocks_hit).

Hence, why don't we split PgStat_StatTabEntry into three things from
the start, even if it means to duplicate some of them?  Say:
- Table fields: includes [auto]vacuum/analyze data, block fields,
fields of pg_stat_all_tables.
- Index fields: no need for the [auto]vacuum/analyze time and counts,
block fields, pg_stat_all_indexes fields.
- Relfilenode fields: dead_tuples, ins_since_vacuum and
mod_since_analyze.  Does not apply to partitioned tables and indexes,
only applies to tables.  Provides a clean split, embrace the fact that
these are the only three fields we need to worry about during
recovery.
--
Michael


Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: relfilenode statistics
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox