Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sgXRW-003ghw-Hy for pgsql-hackers@arkaria.postgresql.org; Tue, 20 Aug 2024 22:35:11 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sgXRU-004E42-Js for pgsql-hackers@arkaria.postgresql.org; Tue, 20 Aug 2024 22:35:09 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sgXRT-004E3u-TS for pgsql-hackers@lists.postgresql.org; Tue, 20 Aug 2024 22:35:08 +0000 Received: from mail.postgrespro.ru ([93.174.131.139]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sgXRP-000gJ4-Qn for pgsql-hackers@postgresql.org; Tue, 20 Aug 2024 22:35:07 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=postgrespro.ru; s=mx2023; t=1724193303; bh=62b9b+klxdBVRDFOvovQ4wTsJlOHd6mKZkLJIyP9hFc=; h=Message-ID:Date:User-Agent:Subject:To:Cc:References:From: In-Reply-To:From; b=Y4Sb9P4dkeVnkJehTfEASBPEW9FzmI95LaJKdivHwBCYb/ph4flW/RVz3wQAu8d+Y VQ6uxRZ3k+3rH6uo53kWzirv3odoei2K2Mz7S4aWT0XXoFfsZTuTuF/MVe6IbqvRrk cP10vGnZUyFyt/WPhVa+cniuMwvoLLHF/782CfuxykvkUweZpvzoUaNNfyUf35KQxj u5oERq9uVyc7b83OVsLqpZw0dwR7muQWA6J33KubjXht3k6rQKi4RQJX0JHCRAH9vR IYiWyRhUGI9kfkKCESv+ghsRhbqLWlnQqUJAK09HiSAu8++hQMtquB/ox0S+OMymdK pNuP/SIwU/5bA== Received: from [172.20.10.4] (unknown [89.113.147.14]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (Client did not present a certificate) (Authenticated sender: a.rybakina@postgrespro.ru) by mail.postgrespro.ru (Postfix/587) with ESMTPSA id 7364760144; Wed, 21 Aug 2024 01:35:02 +0300 (MSK) Content-Type: multipart/alternative; boundary="------------ShdrI09esBFmnfZW10UeCsPk" Message-ID: Date: Wed, 21 Aug 2024 01:35:00 +0300 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: Vacuum statistics To: jian he Cc: Ilia Evdokimov , Andrei Zubkov , Alena Rybakina , pgsql-hackers , a.lepikhov@postgrespro.ru References: <53c47c2d-72a5-44f2-900c-9973b2af1808@tantorlabs.com> <4a902cea-54fb-41b5-b208-b84731a5f577@postgrespro.ru> Content-Language: en-US From: Alena Rybakina In-Reply-To: X-KSMG-AntiPhishing: NotDetected, bases: 2024/08/20 21:41:00 X-KSMG-AntiSpam-Interceptor-Info: not scanned X-KSMG-AntiSpam-Status: not scanned, disabled by settings X-KSMG-AntiVirus: Kaspersky Secure Mail Gateway, version 2.1.0.7854, bases: 2024/08/20 21:31:00 #26390795 X-KSMG-AntiVirus-Status: NotDetected, skipped X-KSMG-LinksScanning: not scanned, disabled by settings X-KSMG-Message-Action: skipped X-KSMG-Rule-ID: 1 List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk This is a multi-part message in MIME format. --------------ShdrI09esBFmnfZW10UeCsPk Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Hi! Thank you very much for your review! Sorry for my late response I was overwhelmed by tasks. On 16.08.2024 14:12, jian he wrote: > On Thu, Aug 15, 2024 at 4:49 PM Alena Rybakina > wrote: >> Hi! > > I've applied all the v5 patches. > 0002 and 0003 have white space errors. > > + > + Number of times blocks of this index were already found > + in the buffer cache by vacuum operations, so that a read was > not necessary > + (this only includes hits in the > + &project; buffer cache, not the operating system's file system cache) > + > > + Number of times blocks of this table were already found > + in the buffer cache by vacuum operations, so that a read was > not necessary > + (this only includes hits in the > + &project; buffer cache, not the operating system's file system cache) > + > > "&project;" > represents a sgml file placeholder name as "project" and puts all the > content of "project.sgml" to system-views.sgml. > but you don't have "project.sgml". you may check > doc/src/sgml/filelist.sgml or doc/src/sgml/ref/allfiles.sgml > for usage of "&place_holder;". > so you can change it to "project", otherwise doc cannot build. > > > src/backend/commands/dbcommands.c > we have: > /* > * If built with appropriate switch, whine when regression-testing > * conventions for database names are violated. But don't complain during > * initdb. > */ > #ifdef ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS > if (IsUnderPostmaster && strstr(dbname, "regression") == NULL) > elog(WARNING, "databases created by regression test cases > should have names including \"regression\""); > #endif > so in src/test/regress/sql/vacuum_tables_and_db_statistics.sql you > need to change dbname: > CREATE DATABASE statistic_vacuum_database; > CREATE DATABASE statistic_vacuum_database1; > > > + > + The view pg_stat_vacuum_indexes will contain > + one row for each index in the current database (including TOAST > + table indexes), showing statistics about vacuuming that specific index. > + > TOAST should > TOAST > > > > + /* Build a tuple descriptor for our result type */ > + if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE) > + elog(ERROR, "return type must be a row type"); > maybe change to > ereport(ERROR, > (errcode(ERRCODE_DATATYPE_MISMATCH), > errmsg("return type must be a row type"))); > Later I found out "InitMaterializedSRF(fcinfo, 0);" already did all > the work. much of the code can be gotten rid of. > please check attached. I agree with your suggestions for improving the code. I will add this in the next version of the patch. > > #define EXTVACHEAPSTAT_COLUMNS 27 > #define EXTVACIDXSTAT_COLUMNS 19 > #define EXTVACDBSTAT_COLUMNS 15 > #define EXTVACSTAT_COLUMNS Max(EXTVACHEAPSTAT_COLUMNS, EXTVACIDXSTAT_COLUMNS) > > static Oid CurrentDatabaseId = InvalidOid; > we already defined MyDatabaseId in src/include/miscadmin.h, > Why do we need "static Oid CurrentDatabaseId = InvalidOid;"? > also src/backend/utils/adt/pgstatfuncs.c already included "miscadmin.h". Hmm, Tom Lane added "misc admin.h", or I didn't notice something. Could you point this out, please? We used the Current Database Id to output statistics on tables from another database, so we need to replace it with a different default value. But I want to rewrite this patch to display table statistics only for the current database, that is, this part will be removed in the future. In my opinion, it would be more correct. > the following code one function has 2 return statements? > ------------------------------------------------------------------------ > /* > * Get the vacuum statistics for the heap tables. > */ > Datum > pg_stat_vacuum_tables(PG_FUNCTION_ARGS) > { > return pg_stats_vacuum(fcinfo, PGSTAT_EXTVAC_HEAP, EXTVACHEAPSTAT_COLUMNS); > > PG_RETURN_NULL(); > } > > /* > * Get the vacuum statistics for the indexes. > */ > Datum > pg_stat_vacuum_indexes(PG_FUNCTION_ARGS) > { > return pg_stats_vacuum(fcinfo, PGSTAT_EXTVAC_INDEX, EXTVACIDXSTAT_COLUMNS); > > PG_RETURN_NULL(); > } > > /* > * Get the vacuum statistics for the database. > */ > Datum > pg_stat_vacuum_database(PG_FUNCTION_ARGS) > { > return pg_stats_vacuum(fcinfo, PGSTAT_EXTVAC_DB, EXTVACDBSTAT_COLUMNS); > > PG_RETURN_NULL(); > } You are right - the second return is superfluous. I'll fix it. > ------------------------------------------------------------------------ > in pg_stats_vacuum: > if (type == PGSTAT_EXTVAC_INDEX || type == PGSTAT_EXTVAC_HEAP) > { > Oid relid = PG_GETARG_OID(1); > > /* Load table statistics for specified database. */ > if (OidIsValid(relid)) > { > tabentry = fetch_dbstat_tabentry(dbid, relid); > if (tabentry == NULL || tabentry->vacuum_ext.type != type) > /* Table don't exists or isn't an heap relation. */ > PG_RETURN_NULL(); > > tuplestore_put_for_relation(relid, tupstore, tupdesc, > tabentry, ncolumns); > } > else > { > ... > } > } > I don't understand the ELSE branch. the IF branch means the input > dboid, heap/index oid is correct. > the ELSE branch means table reloid is invalid = 0. > I am not sure 100% what the ELSE Branch means. > Also there are no comments explaining why. > experiments seem to show that when reloid is 0, it will print out all > the vacuum statistics > for all the tables in the current database. If so, then only super > users can call pg_stats_vacuum? > but the table owner should be able to call pg_stats_vacuum for that > specific table. If any reloid has not been set by the user, we output statistics for all objects - tables or indexes.In this part of the code, we find all the suitable objects from the snapshot, if they belong to the index or table type of objects. > /* Type of ExtVacReport */ > typedef enum ExtVacReportType > { > PGSTAT_EXTVAC_INVALID = 0, > PGSTAT_EXTVAC_HEAP = 1, > PGSTAT_EXTVAC_INDEX = 2, > PGSTAT_EXTVAC_DB = 3, > } ExtVacReportType; > generally "HEAP" means table and index, maybe "PGSTAT_EXTVAC_HEAP" would be term No, Heap means something like a table in a relationship database, or its alternative name is Heap. -- Regards, Alena Rybakina Postgres Professional:http://www.postgrespro.com The Russian Postgres Company --------------ShdrI09esBFmnfZW10UeCsPk Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: 8bit

Hi! Thank you very much for your review! Sorry for my late response I was overwhelmed by tasks.

On 16.08.2024 14:12, jian he wrote:
On Thu, Aug 15, 2024 at 4:49 PM Alena Rybakina
<a.rybakina@postgrespro.ru> wrote:
Hi!

I've applied all the v5 patches.
0002 and 0003 have white space errors.

+      <para>
+        Number of times blocks of this index were already found
+        in the buffer cache by vacuum operations, so that a read was
not necessary
+        (this only includes hits in the
+        &project; buffer cache, not the operating system's file system cache)
+      </para></entry>

+        Number of times blocks of this table were already found
+        in the buffer cache by vacuum operations, so that a read was
not necessary
+        (this only includes hits in the
+        &project; buffer cache, not the operating system's file system cache)
+      </para></entry>

"&project;"
represents a sgml file placeholder name as "project" and puts all the
content of "project.sgml" to system-views.sgml.
but you don't have "project.sgml". you may check
doc/src/sgml/filelist.sgml or doc/src/sgml/ref/allfiles.sgml
for usage of "&place_holder;".
so you can change it to "project", otherwise doc cannot build.


src/backend/commands/dbcommands.c
we have:
    /*
     * If built with appropriate switch, whine when regression-testing
     * conventions for database names are violated.  But don't complain during
     * initdb.
     */
#ifdef ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS
    if (IsUnderPostmaster && strstr(dbname, "regression") == NULL)
        elog(WARNING, "databases created by regression test cases
should have names including \"regression\"");
#endif
so in src/test/regress/sql/vacuum_tables_and_db_statistics.sql you
need to change dbname:
CREATE DATABASE statistic_vacuum_database;
CREATE DATABASE statistic_vacuum_database1;


+  <para>
+   The view <structname>pg_stat_vacuum_indexes</structname> will contain
+   one row for each index in the current database (including TOAST
+   table indexes), showing statistics about vacuuming that specific index.
+  </para>
TOAST should
<acronym>TOAST</acronym>



+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ elog(ERROR, "return type must be a row type");
maybe change to
            ereport(ERROR,
                    (errcode(ERRCODE_DATATYPE_MISMATCH),
                     errmsg("return type must be a row type")));
Later I found out "InitMaterializedSRF(fcinfo, 0);" already did all
the work. much of the code can be gotten rid of.
please check attached.
I agree with your suggestions for improving the code. I will add this in the next version of the patch.


            
#define EXTVACHEAPSTAT_COLUMNS    27
#define EXTVACIDXSTAT_COLUMNS    19
#define EXTVACDBSTAT_COLUMNS    15
#define EXTVACSTAT_COLUMNS Max(EXTVACHEAPSTAT_COLUMNS, EXTVACIDXSTAT_COLUMNS)

static Oid CurrentDatabaseId = InvalidOid;

            
we already defined MyDatabaseId in src/include/miscadmin.h,
Why do we need "static Oid CurrentDatabaseId = InvalidOid;"?
also src/backend/utils/adt/pgstatfuncs.c already included "miscadmin.h".
Hmm, Tom Lane added "misc admin.h", or I didn't notice something. Could you point this out, please?

We used the Current Database Id to output statistics on tables from another database, so we need to replace it with a different default value. But I want to rewrite this patch to display table statistics only for the current database, that is, this part will be removed in the future. In my opinion, it would be more correct.
the following code one function has 2 return statements?
------------------------------------------------------------------------
/*
 * Get the vacuum statistics for the heap tables.
 */
Datum
pg_stat_vacuum_tables(PG_FUNCTION_ARGS)
{
    return pg_stats_vacuum(fcinfo, PGSTAT_EXTVAC_HEAP, EXTVACHEAPSTAT_COLUMNS);

    PG_RETURN_NULL();
}

/*
 * Get the vacuum statistics for the indexes.
 */
Datum
pg_stat_vacuum_indexes(PG_FUNCTION_ARGS)
{
    return pg_stats_vacuum(fcinfo, PGSTAT_EXTVAC_INDEX, EXTVACIDXSTAT_COLUMNS);

    PG_RETURN_NULL();
}

/*
 * Get the vacuum statistics for the database.
 */
Datum
pg_stat_vacuum_database(PG_FUNCTION_ARGS)
{
        return pg_stats_vacuum(fcinfo, PGSTAT_EXTVAC_DB, EXTVACDBSTAT_COLUMNS);

    PG_RETURN_NULL();
}
You are right - the second return is superfluous. I'll fix it.
------------------------------------------------------------------------
in pg_stats_vacuum:
    if (type == PGSTAT_EXTVAC_INDEX || type == PGSTAT_EXTVAC_HEAP)
    {
        Oid                    relid = PG_GETARG_OID(1);

        /* Load table statistics for specified database. */
        if (OidIsValid(relid))
        {
            tabentry = fetch_dbstat_tabentry(dbid, relid);
            if (tabentry == NULL || tabentry->vacuum_ext.type != type)
                /* Table don't exists or isn't an heap relation. */
                PG_RETURN_NULL();

            tuplestore_put_for_relation(relid, tupstore, tupdesc,
tabentry, ncolumns);
        }
        else
        {
         ...
        }
}
I don't understand the ELSE branch. the IF branch means the input
dboid, heap/index oid is correct.
the ELSE branch means table reloid is invalid = 0.
I am not sure 100% what the ELSE Branch means.
Also there are no comments explaining why.
experiments seem to show that  when reloid is 0, it will print out all
the vacuum statistics
for all the tables in the current database. If so, then only super
users can call pg_stats_vacuum?
but the table owner should be able to call pg_stats_vacuum for that
specific table.
If any reloid has not been set by the user, we output statistics for all objects - tables or indexes.In this part of the code, we find all the suitable objects from the snapshot, if they belong to the index or table type of objects.
/* Type of ExtVacReport */
typedef enum ExtVacReportType
{
    PGSTAT_EXTVAC_INVALID = 0,
    PGSTAT_EXTVAC_HEAP = 1,
    PGSTAT_EXTVAC_INDEX = 2,
    PGSTAT_EXTVAC_DB = 3,
} ExtVacReportType;
generally "HEAP" means table and index, maybe "PGSTAT_EXTVAC_HEAP" would be term

No, Heap means something like a table in a relationship database, or its alternative name is Heap.

-- 
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
--------------ShdrI09esBFmnfZW10UeCsPk--