Date: Fri, 3 Jun 2016 18:54:06 -0500
From: Justin Pryzby <pryzby@telsasoft.com>
To: pgsql-performance@postgresql.org
Subject: Re: index fragmentation on insert-only table with
 non-unique column
Message-ID: <20160603235406.GN23616@telsasoft.com>
References: <20160524173914.GA11880@telsasoft.com>
 <CAMkU=1wsSEgByzYQ5uazD1gSsJh9RRYME6ehgJsUp3XcMm2Djg@mail.gmail.com>
 <20160525140034.GB21220@telsasoft.com>
 <CAGTBQpbqkQYRbnz-7MJEhCL9bRhaE3sX12LZh4YGh4m+xoUqYA@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: 
 <CAGTBQpbqkQYRbnz-7MJEhCL9bRhaE3sX12LZh4YGh4m+xoUqYA@mail.gmail.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Precedence: bulk
Sender: pgsql-performance-owner@postgresql.org

On Fri, Jun 03, 2016 at 06:26:33PM -0300, Claudio Freire wrote:
> On Wed, May 25, 2016 at 11:00 AM, Justin Pryzby <pryzby@telsasoft.com> wrote:
> >> > First, I found I was able to get 30-50min query results on full week's table by
> >> > prefering a seq scan to an index scan.  The row estimates seemed fine, and the
> >> > only condition is the timestamp, so the planner's use of index scan is as
> >> > expected.
> >>
> >> Can you show us the query?  I would expect a bitmap scan of the index
> >> (which would do what you want, but even more so), instead.
> > See explain, also showing additional tables/views being joined.  It's NOT doing
> > a bitmap scan though, and I'd be interested to find why; I'm sure that would've
> > improved this query enough so it never would've been an issue.
> > https://explain.depesz.com/s/s8KP
> >
> >  ->  Index Scan using cdrs_huawei_pgwrecord_2016_05_01_recordopeningtime_idx on cdrs_huawei_pgwrecord_2016_05_01  (cost=0.56..1601734.57 rows=8943848 width=349)
> >        Index Cond: ((recordopeningtime >= '2016-05-07 00:00:00'::timestamp without time zone) AND (recordopeningtime < '2016-05-08 00:00:00'::timestamp without time zone))
> 
> Please show your guc settings ( see
> https://wiki.postgresql.org/wiki/Server_Configuration )
> 
> A plan node like that, if it would result in I/O, with proper
> configuration should have selected a bitmap index/heap scan. If it
> didn't, it probably thinks it has more cache than it really does, and
> that would mean the wrong setting was set in effective_cache_size.

ts=# SELECT name, current_setting(name), SOURCE FROM pg_settings WHERE SOURCE='configuration file';
 dynamic_shared_memory_type  | posix               | configuration file
 effective_cache_size        | 64GB                | configuration file
 effective_io_concurrency    | 8                   | configuration file
 huge_pages                  | try                 | configuration file
 log_autovacuum_min_duration | 0                   | configuration file
 log_checkpoints             | on                  | configuration file
 maintenance_work_mem        | 6GB                 | configuration file
 max_connections             | 200                 | configuration file
 max_wal_size                | 4GB                 | configuration file
 min_wal_size                | 6GB                 | configuration file
 shared_buffers              | 8GB                 | configuration file
 wal_compression             | on                  | configuration file
 work_mem                    | 1GB                 | configuration file

I changed at least maintenance_work_mem since I originally wrote, to try to
avoid tempfiles during REINDEX (though I'm not sure it matters, as the
tempfiles are effective cached and may never actually be written).

It's entirely possible those settings aren't ideal.  The server has 72GB RAM.
There are usually very few (typically n<3 but at most a handful) nontrivial
queries running at once, if at all.

I wouldn't expect any data that's not recent (table data last 2 days or index
from this month) to be cached, and wouldn't expect that to be entirely cached,
either:

ts=# SELECT sum(pg_table_size(oid))/1024^3 gb FROM pg_class WHERE relname~'_2016_05_..$';
gb | 425.783050537109

ts=# SELECT sum(pg_table_size(oid))/1024^3 gb FROM pg_class WHERE relname~'_2016_05_...*idx';
gb | 60.0909423828125

ts=# SELECT sum(pg_table_size(oid))/1024^3 gb FROM pg_class WHERE relname~'_201605.*idx';
gb | 4.85528564453125

ts=# SELECT sum(pg_table_size(oid))/1024^3 gb FROM pg_class WHERE relname~'_201605$';
gb | 86.8688049316406

As a test, I did SET effective_cache_size='1MB', before running explain, and
still does:

|	 ->  Index Scan using cdrs_huawei_pgwrecord_2016_05_29_recordopeningtime_idx on cdrs_huawei_pgwrecord_2016_05_29  (cost=0.44..1526689.49 rows=8342796 width=355)
|	       Index Cond: ((recordopeningtime >= '2016-05-29 00:00:00'::timestamp without time zone) AND (recordopeningtime < '2016-05-30 00:00:00'::timestamp without time zone))

I Set enable_indexscan=0, and got:

|	 ->  Bitmap Heap Scan on cdrs_huawei_pgwrecord_2016_05_29  (cost=168006.10..4087526.04 rows=8342796 width=355)
|	       Recheck Cond: ((recordopeningtime >= '2016-05-29 00:00:00'::timestamp without time zone) AND (recordopeningtime < '2016-05-30 00:00:00'::timestamp without time zone))
|	       ->  Bitmap Index Scan on cdrs_huawei_pgwrecord_2016_05_29_recordopeningtime_idx  (cost=0.00..165920.40 rows=8342796 width=0)
|		     Index Cond: ((recordopeningtime >= '2016-05-29 00:00:00'::timestamp without time zone) AND (recordopeningtime < '2016-05-30 00:00:00'::timestamp without time zone))

Here's a minimal query which seems to isolate the symptom:

ts=# explain (analyze,buffers) SELECT sum(duration) FROM cdrs_huawei_pgwrecord_2016_05_22 WHERE recordopeningtime>='2016-05-22' AND recordopeningtime<'2016-05-23';
| Aggregate  (cost=2888731.67..2888731.68 rows=1 width=8) (actual time=388661.892..388661.892 rows=1 loops=1)
|   Buffers: shared hit=4058501 read=1295147 written=35800
|   ->  Index Scan using cdrs_huawei_pgwrecord_2016_05_22_recordopeningtime_idx on cdrs_huawei_pgwrecord_2016_05_22  (cost=0.56..2867075.33 rows=8662534 w
|idth=8) (actual time=0.036..379332.910 rows=8575673 loops=1)
|         Index Cond: ((recordopeningtime >= '2016-05-22 00:00:00'::timestamp without time zone) AND (recordopeningtime < '2016-05-23 00:00:00'::timestamp
| without time zone))
|         Buffers: shared hit=4058501 read=1295147 written=35800
| Planning time: 0.338 ms
| Execution time: 388661.947 ms

And here's an older one to avoid cache, with enable_indexscan=0 
|ts=# explain (analyze,buffers)  SELECT sum(duration) FROM cdrs_huawei_pgwrecord_2016_05_08 WHERE recordopeningtime>='2016-05-08' AND recordopeningtime<'2016-05-09';
| Aggregate  (cost=10006286.58..10006286.59 rows=1 width=8) (actual time=44219.156..44219.156 rows=1 loops=1)
|   Buffers: shared hit=118 read=1213887 written=50113
|   ->  Bitmap Heap Scan on cdrs_huawei_pgwrecord_2016_05_08  (cost=85142.24..9985848.96 rows=8175048 width=8) (actual time=708.024..40106.062 rows=8179338 loops=1)
|         Recheck Cond: ((recordopeningtime >= '2016-05-08 00:00:00'::timestamp without time zone) AND (recordopeningtime < '2016-05-09 00:00:00'::timestamp without time zone))
|         Rows Removed by Index Recheck: 74909
|         Heap Blocks: lossy=1213568
|         Buffers: shared hit=118 read=1213887 written=50113
|         ->  Bitmap Index Scan on cdrs_huawei_pgwrecord_2016_05_08_recordopeningtime_idx1  (cost=0.00..83098.48 rows=8175048 width=0) (actual time=706.557..706.557 rows=12135680 loops=1)
|               Index Cond: ((recordopeningtime >= '2016-05-08 00:00:00'::timestamp without time zone) AND (recordopeningtime < '2016-05-09 00:00:00'::timestamp without time zone))
|               Buffers: shared hit=117 read=320
| Planning time: 214.786 ms
| Execution time: 44228.874 ms
|(12 rows)

Thanks for your help.

Justin


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance