public inbox for [email protected]  
help / color / mirror / Atom feed
From: Stephen Frost <[email protected]>
To: Peter Geoghegan <[email protected]>
Cc: Justin Pryzby <[email protected]>
Cc: [email protected]
Subject: Re: index fragmentation on insert-only table with non-unique column
Date: Wed, 25 May 2016 00:26:25 -0400
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAH2-WznDpnJaabA1tQht5rUZRQUp3YnQ21QbA-ePZ3xLm_X7ww@mail.gmail.com>
References: <[email protected]>
	<CAH2-WznDpnJaabA1tQht5rUZRQUp3YnQ21QbA-ePZ3xLm_X7ww@mail.gmail.com>
List-Unsubscribe:  <mailto:[email protected]?body=unsub%20pgsql-performance>

* Peter Geoghegan ([email protected]) wrote:
> On Tue, May 24, 2016 at 10:39 AM, Justin Pryzby <[email protected]> wrote:
> > I was able to see great improvement without planner parameters by REINDEX the
> > timestamp index.  My theory is that the index/planner doesn't handle well the
> > case of many tuples with same column value, and returns pages out of logical
> > order.  Reindex fixes that, rewriting the index data with pages in order
> > (confirmed with pageinspect), which causes index scans to fetch heap data more
> > or less monotonically (if not consecutively).  strace shows that consecutive
> > read()s are common (without intervening seeks).  I gather this allows the OS
> > readahead to kick in.
> 
> The basic problem is that the B-Tree code doesn't maintain this
> property. However, B-Tree index builds will create an index that
> initially has this property, because the tuplesort.c code happens to
> sort index tuples with a CTID tie-breaker.
> 
> > Postgres seems to assume that the high degree of correlation of the table
> > column seen in pg_stats is how it will get data from the index scan, which
> > assumption seems to be very poor on what turns out to be a higly fragmented
> > index.  Is there a way to help it to understand otherwise??
> 
> Your complaint is vague. Are you complaining about the planner making
> a poor choice? I don't think that's the issue here, because you never
> made any firm statement about the planner making a choice that was
> worth than an alternative that it had available.
> 
> If you're arguing for the idea that B-Trees should reliably keep
> tuples in order by a tie-break condition, that seems difficult to
> implement, and likely not worth it in practice.

The ongoing discussion around how to effectively handle duplicate values
in B-Tree seems relevant to this.  In particular, if we're able to store
duplicate values efficiently and all the locations under a single key
are easily available then we could certainly sort those prior to going
and visiting them.

That's not quite the same as keeping the tuples in order in the heap,
but would more-or-less achieve the effect desired, I believe?

Thanks!

Stephen


Attachments:

  [application/pgp-signature] signature.asc (819B, 2-signature.asc)
  download

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: index fragmentation on insert-only table with non-unique column
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox