Duplicate Extended Statistics

public inbox for [email protected]  
help / color / mirror / Atom feed

Duplicate Extended Statistics
4+ messages / 4 participants
[nested] [flat]

* Duplicate Extended Statistics
@ 2024-09-03 13:09  Ilia Evdokimov <[email protected]>
  0 siblings, 2 replies; 4+ messages in thread

From: Ilia Evdokimov @ 2024-09-03 13:09 UTC (permalink / raw)
  To: [email protected]

Hello everyone,

I have a question regarding extended statistics in PostgreSQL. Why is it 
possible to create duplicate extended statistics? To make it clearer, 
here’s an example:

CREATE TABLE t(a int, b int);
INSERT INTO t(a, b) VALUES (...);
CREATE STATISTICS ON a, b FROM t;
ANALYZE t;
....
CREATE STATISTICS ON a, b FROM t;
ANALYZE t;

After executing these queries, the following issues might arise:

 1. ANALYZE will take longer to run because, for example, MCV extended
    statistics would need to be gathered twice.
 2. Duplicate information will be stored.
 3. The planner might take longer to find the relevant statistics since
    it has to search through them in a loop.

Or do duplicate extended statistics practically never occur in practice?

Thanks in advance for your response.

-- 
Regards,
Ilia Evdokimov,
Tantor Labs LCC.

^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: Duplicate Extended Statistics
@ 2024-09-03 14:11  Tom Lane <[email protected]>
  parent: Ilia Evdokimov <[email protected]>
  1 sibling, 0 replies; 4+ messages in thread

From: Tom Lane @ 2024-09-03 14:11 UTC (permalink / raw)
  To: Ilia Evdokimov <[email protected]>; +Cc: [email protected]

Ilia Evdokimov <[email protected]> writes:
> I have a question regarding extended statistics in PostgreSQL. Why is it 
> possible to create duplicate extended statistics?

[ shrug... ] We don't prevent duplicate indexes, either.

			regards, tom lane





^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: Duplicate Extended Statistics
@ 2024-09-04 08:28  Asad Ali <[email protected]>
  parent: Ilia Evdokimov <[email protected]>
  1 sibling, 1 reply; 4+ messages in thread

From: Asad Ali @ 2024-09-04 08:28 UTC (permalink / raw)
  To: Ilia Evdokimov <[email protected]>; +Cc: [email protected]

Hi Ilia,

In PostgreSQL, it is possible to create duplicate extended statistics
because the system does not enforce uniqueness on statistics definitions.
However, this is generally not recommended, as it leads to longer ANALYZE
times, increased storage usage, potential planner performance impact, and
unnecessary complexity. In practice, duplicates are rare because users and
tools usually avoid redundancy, as there is no added benefit to having
multiple identical sets of statistics on the same columns.

Regards,
Asad Ali

On Tue, Sep 3, 2024 at 6:10 PM Ilia Evdokimov <[email protected]>
wrote:

> Hello everyone,
>
> I have a question regarding extended statistics in PostgreSQL. Why is it
> possible to create duplicate extended statistics? To make it clearer,
> here’s an example:
>
> CREATE TABLE t(a int, b int);
> INSERT INTO t(a, b) VALUES (...);
> CREATE STATISTICS ON a, b FROM t;
> ANALYZE t;
> ....
> CREATE STATISTICS ON a, b FROM t;
> ANALYZE t;
>
> After executing these queries, the following issues might arise:
>
>    1. ANALYZE will take longer to run because, for example, MCV extended
>    statistics would need to be gathered twice.
>    2. Duplicate information will be stored.
>    3. The planner might take longer to find the relevant statistics since
>    it has to search through them in a loop.
>
> Or do duplicate extended statistics practically never occur in practice?
>
> Thanks in advance for your response.
>
> --
> Regards,
> Ilia Evdokimov,
> Tantor Labs LCC.
>
>

^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: Duplicate Extended Statistics
@ 2024-09-04 08:43  Holger Jakobs <[email protected]>
  parent: Asad Ali <[email protected]>
  0 siblings, 0 replies; 4+ messages in thread

From: Holger Jakobs @ 2024-09-04 08:43 UTC (permalink / raw)
  To: [email protected]

Am 04.09.24 um 10:28 schrieb Asad Ali:
> Hi Ilia,
>
> In PostgreSQL, it is possible to create duplicate extended statistics 
> because the system does not enforce uniqueness on statistics 
> definitions. However, this is generally not recommended, as it leads 
> to longer ANALYZE times, increased storage usage, potential planner 
> performance impact, and unnecessary complexity. In practice, 
> duplicates are rare because users and tools usually avoid redundancy, 
> as there is no added benefit to having multiple identical sets of 
> statistics on the same columns.
>
> Regards,
> Asad Ali
>
>
To find duplicate indexes:

SELECT  indrelid::regclass::text AS "Table Name",
         string_agg(indexrelid::regclass::text ||
           CASE WHEN indisunique
           THEN ' (UNIQUE)' ELSE ''
           END, ', ') AS "Indexes"
       FROM pg_index
       GROUP BY indrelid, indkey
       HAVING COUNT(*) > 1

-- 

Holger Jakobs, Bergisch Gladbach



Attachments:

  [application/pgp-signature] OpenPGP_signature (203B, 3-OpenPGP_signature)
  download

^ permalink  raw  reply  [nested|flat] 4+ messages in thread

end of thread, other threads:[~2024-09-04 08:43 UTC | newest]

Thread overview: 4+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-09-03 13:09 Duplicate Extended Statistics Ilia Evdokimov <[email protected]>
2024-09-03 14:11 ` Tom Lane <[email protected]>
2024-09-04 08:28 ` Asad Ali <[email protected]>
2024-09-04 08:43   ` Holger Jakobs <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox