Re: index fragmentation on insert-only table with non-unique column

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Justin Pryzby <[email protected]>
To: [email protected]
Cc: Jeff Janes <[email protected]>
Cc: Tom Lane <[email protected]>
Cc: Claudio Freire <[email protected]>
Subject: Re: index fragmentation on insert-only table with non-unique column
Date: Sat, 13 Aug 2016 13:54:48 -0500
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAMkU=1xwGn+O0jhKsvrUrbW9MQp1YX0iB4Y-6h1mEz0ffBxK-Q@mail.gmail.com> <CAMkU=1wsSEgByzYQ5uazD1gSsJh9RRYME6ehgJsUp3XcMm2Djg@mail.gmail.com>
List-Unsubscribe:  <mailto:[email protected]?body=unsub%20pgsql-performance>

Regarding this earlier thread:
https://www.postgresql.org/message-id/flat/20160524173914.GA11880%40telsasoft.com#20160524173914.GA1...

On Tue, May 24, 2016 at 10:39 AM, Justin Pryzby <[email protected]> wrote:
> Summary: Non-unique btree indices are returning CTIDs for rows with same
> value of indexed column not in logical order, imposing a high performance
> penalty.

I have to point out that by "logical" I clearly meant "physical", hopefully
nobody was too misled..

On Sun, Jun 05, 2016 at 12:28:47PM -0700, Jeff Janes wrote:
> On Sun, Jun 5, 2016 at 9:03 AM, Tom Lane <[email protected]> wrote:
> > Claudio Freire <[email protected]> writes:
> >> So correlated index scans look extra favourable vs bitmap index scans
> >> because bitmap heap scans consider random page costs sans correlation
> >> effects (even though correlation applies to bitmap heap scans as
> >> well).
> >
> > Really?  How?  The index ordering has nothing to do with the order in
> > which heap tuples will be visited.
> 
> It is not the order itself, but the density.
> 
> If the index is read in a range scan (as opposed to =ANY scan), and
> the index lead column is correlated with the table ordering, then the
> parts of the table that need to be visited will be much denser than if
> there were no correlation.  But Claudio is saying that this is not
> being accounted for.

I didn't completely understand Claudio/Jeff here, and not sure if we're on the
same page.  For queries on these tables, the index scan was very slow, due to
fragmented index on non-unique column, and seq scan would have been (was)
faster (even if it means reading 70GB and filtering out 6 of 7 days' data).
That was resolved by added a nightly reindex job (..  which sometimes competes
with other maintenance and has trouble running every table every night).

But I did find that someone else had previously reported this problem (in a
strikingly similar context and message, perhaps clearer than mine):
https://www.postgresql.org/message-id/flat/520D6610.8040907%40emulex.com#[email protected]

I also found this older thread:
https://www.postgresql.org/message-id/flat/n6cmpug13b9rk1srebjvhphg0lm8dou1kn%404ax.com#n6cmpug13b9r...

There was mention of a TODO item:

    * Compute index correlation on CREATE INDEX and ANALYZE, use it for index
    * scan cost estimation

.. but perhaps I misunderstand and that's long since resolved ?

Justin


-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected]
  Subject: Re: index fragmentation on insert-only table with non-unique column
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox