Subject: Re: less than 2 sec for response - possible?
To: pgsql-performance@postgresql.org
References: <38b8542e-e011-370d-3863-207126ac9b33@mail.ru>
 <4972.1467422585@sss.pgh.pa.us>
 <2f7771c1-40f2-8c9b-7c0a-232935ff93ea@mail.ru>
From: Pujol Mathieu <mathieu.pujol@realfusio.com>
Message-ID: <1e4aa8a7-94da-d880-837d-df40b97aaad6@realfusio.com>
Date: Tue, 5 Jul 2016 12:10:31 +0200
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101
 Thunderbird/45.1.1
MIME-Version: 1.0
In-Reply-To: <2f7771c1-40f2-8c9b-7c0a-232935ff93ea@mail.ru>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Precedence: bulk
Sender: pgsql-performance-owner@postgresql.org

Hello

Have you solved your problem ?

Could it be a conversion overhead from 'timestamp without time zone' to 
'date' ? In this case, I don't know if planer store constants as date or 
timestamp.

Mathieu Pujol

Le 02/07/2016 à 04:48, trafdev a écrit :
> Thanks Tom.
>
> I've created index on aid, date:
>
> create index aaa on stats.feed_sub(aid,date);
>
> and simplified a query (dropped gran as it's equal for all rows anyway):
>
> SELECT
>   sum(stats.feed_sub.c_filt_click_cap_ip) AS clicks_from_ip,
>   sum(stats.feed_sub.c_filt_doubleclick)  AS clicks_on_target,
>   sum(stats.feed_sub.c_filt_delay_clicks) AS ip_click_period,
>   sum(stats.feed_sub.c_filt_fast_click)   AS fast_click,
>   sum(stats.feed_sub.c_filt_ip_mismatch)  AS ip_mismatch,
>   sum(stats.feed_sub.c_filt_lng_mismatch) AS lng_mismatch,
>   sum(stats.feed_sub.c_filt_ref_mismatch) AS ref_mismatch,
>   sum(stats.feed_sub.c_filt_ua_mismatch)  AS ua_mismatch,
>   sum(stats.feed_sub.c_filt_url_expired)  AS url_expired,
>   stats.feed_sub.subid                    AS stats_feed_sub_subid,
>   stats.feed_sub.sid                      AS stats_feed_sub_sid
> FROM stats.feed_sub
> WHERE stats.feed_sub.date >= '2016-06-01' :: TIMESTAMP AND
>       stats.feed_sub.date <= '2016-06-30' :: TIMESTAMP AND
>       stats.feed_sub.aid = 3
> GROUP BY
>   stats.feed_sub.subid, stats.feed_sub.sid;
>
> All data is in the cache and it still takes almost 5 seconds to complete:
>
> QUERY PLAN
> HashAggregate  (cost=792450.42..803727.24 rows=346979 width=86) 
> (actual time=4742.145..4882.468 rows=126533 loops=1)
> "  Group Key: subid, sid"
>   Buffers: shared hit=1350371
>   ->  Index Scan using aaa on feed_sub  (cost=0.43..697031.39 
> rows=3469783 width=86) (actual time=0.026..1655.394 rows=3588376 loops=1)
>         Index Cond: ((aid = 3) AND (date >= '2016-06-01 
> 00:00:00'::timestamp without time zone) AND (date <= '2016-06-30 
> 00:00:00'::timestamp without time zone))
>         Buffers: shared hit=1350371
> Planning time: 0.159 ms
> Execution time: 4899.934 ms
>
> It's better, but still is far from "<2 secs" goal.
>
> Any thoughts?
>
>
> On 07/01/16 18:23, Tom Lane wrote:
>> trafdev <trafdev@mail.ru> writes:
>>> CREATE INDEX ix_feed_sub_date
>>>    ON stats.feed_sub
>>>    USING brin
>>>    (date);
>>
>>> CREATE UNIQUE INDEX ixu_feed_sub
>>>    ON stats.feed_sub
>>>    USING btree
>>>    (date, gran, aid, pid, sid, fid, subid COLLATE 
>>> pg_catalog."default");
>>
>>> HashAggregate (cost=901171.72..912354.97 rows=344100 width=86) (actual
>>> time=7207.825..7335.473 rows=126044 loops=1)
>>> "  Group Key: subid, sid"
>>>    Buffers: shared hit=3635804
>>>    ->  Index Scan using ixu_feed_sub on feed_sub (cost=0.56..806544.38
>>> rows=3440994 width=86) (actual time=0.020..3650.208 rows=3578344 
>>> loops=1)
>>>          Index Cond: ((date >= '2016-06-01 00:00:00'::timestamp without
>>> time zone) AND (date <= '2016-06-30 00:00:00'::timestamp without time
>>> zone) AND (gran = '1 day'::interval) AND (aid = 3))
>>>          Buffers: shared hit=3635804
>>> Planning time: 0.150 ms
>>> Execution time: 7352.009 ms
>>
>> Neither of those indexes is terribly well designed for this query.
>> A btree index on (aid, gran, date) or (gran, aid, date) would work
>> much better.  See
>>
>> https://www.postgresql.org/docs/9.5/static/indexes-multicolumn.html
>>
>> You could rearrange the column order in that giant unique index
>> and get some of the benefit.  But if you're desperate to optimize
>> this particular query, an index not bearing so many irrelevant columns
>> would probably be better for it.
>>
>> An alternative way of thinking would be to create an index with those
>> three leading columns and then all of the other columns used by this
>> query as later columns.  That would be an even larger index, but it 
>> would
>> allow an index-only scan, which might be quite a lot faster. The fact
>> that you seem to be hitting about one page for each row retrieved says
>> that the data you need is pretty badly scattered, so constructing an 
>> index
>> that concentrates everything you need into one range of the index might
>> be the ticket.
>>
>> Either of these additional-index ideas is going to penalize table
>> insertions/updates, so keep an eye on that end of the performance
>> question too.
>>
>>             regards, tom lane
>>
>
>


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance