public inbox for [email protected]
help / color / mirror / Atom feedDuplicate Extended Statistics
4+ messages / 4 participants
[nested] [flat]
* Duplicate Extended Statistics
@ 2024-09-03 13:09 Ilia Evdokimov <[email protected]>
2024-09-03 14:11 ` Re: Duplicate Extended Statistics Tom Lane <[email protected]>
2024-09-04 08:28 ` Re: Duplicate Extended Statistics Asad Ali <[email protected]>
0 siblings, 2 replies; 4+ messages in thread
From: Ilia Evdokimov @ 2024-09-03 13:09 UTC (permalink / raw)
To: [email protected]
Hello everyone,
I have a question regarding extended statistics in PostgreSQL. Why is it
possible to create duplicate extended statistics? To make it clearer,
here’s an example:
CREATE TABLE t(a int, b int);
INSERT INTO t(a, b) VALUES (...);
CREATE STATISTICS ON a, b FROM t;
ANALYZE t;
....
CREATE STATISTICS ON a, b FROM t;
ANALYZE t;
After executing these queries, the following issues might arise:
1. ANALYZE will take longer to run because, for example, MCV extended
statistics would need to be gathered twice.
2. Duplicate information will be stored.
3. The planner might take longer to find the relevant statistics since
it has to search through them in a loop.
Or do duplicate extended statistics practically never occur in practice?
Thanks in advance for your response.
--
Regards,
Ilia Evdokimov,
Tantor Labs LCC.
^ permalink raw reply [nested|flat] 4+ messages in thread
* Re: Duplicate Extended Statistics
2024-09-03 13:09 Duplicate Extended Statistics Ilia Evdokimov <[email protected]>
@ 2024-09-03 14:11 ` Tom Lane <[email protected]>
1 sibling, 0 replies; 4+ messages in thread
From: Tom Lane @ 2024-09-03 14:11 UTC (permalink / raw)
To: Ilia Evdokimov <[email protected]>; +Cc: [email protected]
Ilia Evdokimov <[email protected]> writes:
> I have a question regarding extended statistics in PostgreSQL. Why is it
> possible to create duplicate extended statistics?
[ shrug... ] We don't prevent duplicate indexes, either.
regards, tom lane
^ permalink raw reply [nested|flat] 4+ messages in thread
* Re: Duplicate Extended Statistics
2024-09-03 13:09 Duplicate Extended Statistics Ilia Evdokimov <[email protected]>
@ 2024-09-04 08:28 ` Asad Ali <[email protected]>
2024-09-04 08:43 ` Re: Duplicate Extended Statistics Holger Jakobs <[email protected]>
1 sibling, 1 reply; 4+ messages in thread
From: Asad Ali @ 2024-09-04 08:28 UTC (permalink / raw)
To: Ilia Evdokimov <[email protected]>; +Cc: [email protected]
Hi Ilia,
In PostgreSQL, it is possible to create duplicate extended statistics
because the system does not enforce uniqueness on statistics definitions.
However, this is generally not recommended, as it leads to longer ANALYZE
times, increased storage usage, potential planner performance impact, and
unnecessary complexity. In practice, duplicates are rare because users and
tools usually avoid redundancy, as there is no added benefit to having
multiple identical sets of statistics on the same columns.
Regards,
Asad Ali
On Tue, Sep 3, 2024 at 6:10 PM Ilia Evdokimov <[email protected]>
wrote:
> Hello everyone,
>
> I have a question regarding extended statistics in PostgreSQL. Why is it
> possible to create duplicate extended statistics? To make it clearer,
> here’s an example:
>
> CREATE TABLE t(a int, b int);
> INSERT INTO t(a, b) VALUES (...);
> CREATE STATISTICS ON a, b FROM t;
> ANALYZE t;
> ....
> CREATE STATISTICS ON a, b FROM t;
> ANALYZE t;
>
> After executing these queries, the following issues might arise:
>
> 1. ANALYZE will take longer to run because, for example, MCV extended
> statistics would need to be gathered twice.
> 2. Duplicate information will be stored.
> 3. The planner might take longer to find the relevant statistics since
> it has to search through them in a loop.
>
> Or do duplicate extended statistics practically never occur in practice?
>
> Thanks in advance for your response.
>
> --
> Regards,
> Ilia Evdokimov,
> Tantor Labs LCC.
>
>
^ permalink raw reply [nested|flat] 4+ messages in thread
* Re: Duplicate Extended Statistics
2024-09-03 13:09 Duplicate Extended Statistics Ilia Evdokimov <[email protected]>
2024-09-04 08:28 ` Re: Duplicate Extended Statistics Asad Ali <[email protected]>
@ 2024-09-04 08:43 ` Holger Jakobs <[email protected]>
0 siblings, 0 replies; 4+ messages in thread
From: Holger Jakobs @ 2024-09-04 08:43 UTC (permalink / raw)
To: [email protected]
Am 04.09.24 um 10:28 schrieb Asad Ali:
> Hi Ilia,
>
> In PostgreSQL, it is possible to create duplicate extended statistics
> because the system does not enforce uniqueness on statistics
> definitions. However, this is generally not recommended, as it leads
> to longer ANALYZE times, increased storage usage, potential planner
> performance impact, and unnecessary complexity. In practice,
> duplicates are rare because users and tools usually avoid redundancy,
> as there is no added benefit to having multiple identical sets of
> statistics on the same columns.
>
> Regards,
> Asad Ali
>
>
To find duplicate indexes:
SELECT indrelid::regclass::text AS "Table Name",
string_agg(indexrelid::regclass::text ||
CASE WHEN indisunique
THEN ' (UNIQUE)' ELSE ''
END, ', ') AS "Indexes"
FROM pg_index
GROUP BY indrelid, indkey
HAVING COUNT(*) > 1
--
Holger Jakobs, Bergisch Gladbach
Attachments:
[application/pgp-signature] OpenPGP_signature (203B, 3-OpenPGP_signature)
download
^ permalink raw reply [nested|flat] 4+ messages in thread
end of thread, other threads:[~2024-09-04 08:43 UTC | newest]
Thread overview: 4+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-09-03 13:09 Duplicate Extended Statistics Ilia Evdokimov <[email protected]>
2024-09-03 14:11 ` Tom Lane <[email protected]>
2024-09-04 08:28 ` Asad Ali <[email protected]>
2024-09-04 08:43 ` Holger Jakobs <[email protected]>
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox