public inbox for [email protected]
help / color / mirror / Atom feedFrom: Peter Geoghegan <[email protected]>
To: Justin Pryzby <[email protected]>
Cc: [email protected]
Subject: Re: index fragmentation on insert-only table with non-unique column
Date: Tue, 24 May 2016 21:16:20 -0700
Message-ID: <CAH2-WznDpnJaabA1tQht5rUZRQUp3YnQ21QbA-ePZ3xLm_X7ww@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
List-Unsubscribe: <mailto:[email protected]?body=unsub%20pgsql-performance>
On Tue, May 24, 2016 at 10:39 AM, Justin Pryzby <[email protected]> wrote:
> I was able to see great improvement without planner parameters by REINDEX the
> timestamp index. My theory is that the index/planner doesn't handle well the
> case of many tuples with same column value, and returns pages out of logical
> order. Reindex fixes that, rewriting the index data with pages in order
> (confirmed with pageinspect), which causes index scans to fetch heap data more
> or less monotonically (if not consecutively). strace shows that consecutive
> read()s are common (without intervening seeks). I gather this allows the OS
> readahead to kick in.
The basic problem is that the B-Tree code doesn't maintain this
property. However, B-Tree index builds will create an index that
initially has this property, because the tuplesort.c code happens to
sort index tuples with a CTID tie-breaker.
> Postgres seems to assume that the high degree of correlation of the table
> column seen in pg_stats is how it will get data from the index scan, which
> assumption seems to be very poor on what turns out to be a higly fragmented
> index. Is there a way to help it to understand otherwise??
Your complaint is vague. Are you complaining about the planner making
a poor choice? I don't think that's the issue here, because you never
made any firm statement about the planner making a choice that was
worth than an alternative that it had available.
If you're arguing for the idea that B-Trees should reliably keep
tuples in order by a tie-break condition, that seems difficult to
implement, and likely not worth it in practice.
--
Peter Geoghegan
--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected]
Subject: Re: index fragmentation on insert-only table with non-unique column
In-Reply-To: <CAH2-WznDpnJaabA1tQht5rUZRQUp3YnQ21QbA-ePZ3xLm_X7ww@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox