Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bKNNa-0006Fj-DY for pgsql-performance@arkaria.postgresql.org; Tue, 05 Jul 2016 10:14:58 +0000 Received: from localhost ([127.0.0.1] helo=postgresql.org) by malur.postgresql.org with smtp (Exim 4.84_2) (envelope-from ) id 1bKNNZ-0007Y6-DP for pgsql-performance@arkaria.postgresql.org; Tue, 05 Jul 2016 10:14:57 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1bKNLt-0005hS-PR for pgsql-performance@postgresql.org; Tue, 05 Jul 2016 10:13:13 +0000 Received: from 2.mo6.mail-out.ovh.net ([46.105.76.65]) by magus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1bKNLp-0005jj-QB for pgsql-performance@postgresql.org; Tue, 05 Jul 2016 10:13:13 +0000 Received: from player738.ha.ovh.net (b7.ovh.net [213.186.33.57]) by mo6.mail-out.ovh.net (Postfix) with ESMTP id BA959FFA333 for ; Tue, 5 Jul 2016 12:13:07 +0200 (CEST) Received: from [192.9.200.158] (unknown [92.175.71.125]) (Authenticated sender: mathieu.pujol@realfusio.com) by player738.ha.ovh.net (Postfix) with ESMTPSA id E0675B30 for ; Tue, 5 Jul 2016 12:10:32 +0200 (CEST) Subject: Re: less than 2 sec for response - possible? To: pgsql-performance@postgresql.org References: <38b8542e-e011-370d-3863-207126ac9b33@mail.ru> <4972.1467422585@sss.pgh.pa.us> <2f7771c1-40f2-8c9b-7c0a-232935ff93ea@mail.ru> From: Pujol Mathieu Message-ID: <1e4aa8a7-94da-d880-837d-df40b97aaad6@realfusio.com> Date: Tue, 5 Jul 2016 12:10:31 +0200 User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 In-Reply-To: <2f7771c1-40f2-8c9b-7c0a-232935ff93ea@mail.ru> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-Ovh-Tracer-Id: 9770840867182853168 X-VR-SPAMSTATE: OK X-VR-SPAMSCORE: 0 X-VR-SPAMCAUSE: gggruggvucftvghtrhhoucdtuddrfeeltddrvdeggddvvdcutefuodetggdotefrodftvfcurfhrohhfihhlvgemucfqggfjnecuuegrihhlohhuthemuceftddtnecu X-Pg-Spam-Score: -1.9 (-) List-Archive: List-Help: List-ID: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Mailing-List: pgsql-performance Precedence: bulk Sender: pgsql-performance-owner@postgresql.org Hello Have you solved your problem ? Could it be a conversion overhead from 'timestamp without time zone' to 'date' ? In this case, I don't know if planer store constants as date or timestamp. Mathieu Pujol Le 02/07/2016 à 04:48, trafdev a écrit : > Thanks Tom. > > I've created index on aid, date: > > create index aaa on stats.feed_sub(aid,date); > > and simplified a query (dropped gran as it's equal for all rows anyway): > > SELECT > sum(stats.feed_sub.c_filt_click_cap_ip) AS clicks_from_ip, > sum(stats.feed_sub.c_filt_doubleclick) AS clicks_on_target, > sum(stats.feed_sub.c_filt_delay_clicks) AS ip_click_period, > sum(stats.feed_sub.c_filt_fast_click) AS fast_click, > sum(stats.feed_sub.c_filt_ip_mismatch) AS ip_mismatch, > sum(stats.feed_sub.c_filt_lng_mismatch) AS lng_mismatch, > sum(stats.feed_sub.c_filt_ref_mismatch) AS ref_mismatch, > sum(stats.feed_sub.c_filt_ua_mismatch) AS ua_mismatch, > sum(stats.feed_sub.c_filt_url_expired) AS url_expired, > stats.feed_sub.subid AS stats_feed_sub_subid, > stats.feed_sub.sid AS stats_feed_sub_sid > FROM stats.feed_sub > WHERE stats.feed_sub.date >= '2016-06-01' :: TIMESTAMP AND > stats.feed_sub.date <= '2016-06-30' :: TIMESTAMP AND > stats.feed_sub.aid = 3 > GROUP BY > stats.feed_sub.subid, stats.feed_sub.sid; > > All data is in the cache and it still takes almost 5 seconds to complete: > > QUERY PLAN > HashAggregate (cost=792450.42..803727.24 rows=346979 width=86) > (actual time=4742.145..4882.468 rows=126533 loops=1) > " Group Key: subid, sid" > Buffers: shared hit=1350371 > -> Index Scan using aaa on feed_sub (cost=0.43..697031.39 > rows=3469783 width=86) (actual time=0.026..1655.394 rows=3588376 loops=1) > Index Cond: ((aid = 3) AND (date >= '2016-06-01 > 00:00:00'::timestamp without time zone) AND (date <= '2016-06-30 > 00:00:00'::timestamp without time zone)) > Buffers: shared hit=1350371 > Planning time: 0.159 ms > Execution time: 4899.934 ms > > It's better, but still is far from "<2 secs" goal. > > Any thoughts? > > > On 07/01/16 18:23, Tom Lane wrote: >> trafdev writes: >>> CREATE INDEX ix_feed_sub_date >>> ON stats.feed_sub >>> USING brin >>> (date); >> >>> CREATE UNIQUE INDEX ixu_feed_sub >>> ON stats.feed_sub >>> USING btree >>> (date, gran, aid, pid, sid, fid, subid COLLATE >>> pg_catalog."default"); >> >>> HashAggregate (cost=901171.72..912354.97 rows=344100 width=86) (actual >>> time=7207.825..7335.473 rows=126044 loops=1) >>> " Group Key: subid, sid" >>> Buffers: shared hit=3635804 >>> -> Index Scan using ixu_feed_sub on feed_sub (cost=0.56..806544.38 >>> rows=3440994 width=86) (actual time=0.020..3650.208 rows=3578344 >>> loops=1) >>> Index Cond: ((date >= '2016-06-01 00:00:00'::timestamp without >>> time zone) AND (date <= '2016-06-30 00:00:00'::timestamp without time >>> zone) AND (gran = '1 day'::interval) AND (aid = 3)) >>> Buffers: shared hit=3635804 >>> Planning time: 0.150 ms >>> Execution time: 7352.009 ms >> >> Neither of those indexes is terribly well designed for this query. >> A btree index on (aid, gran, date) or (gran, aid, date) would work >> much better. See >> >> https://www.postgresql.org/docs/9.5/static/indexes-multicolumn.html >> >> You could rearrange the column order in that giant unique index >> and get some of the benefit. But if you're desperate to optimize >> this particular query, an index not bearing so many irrelevant columns >> would probably be better for it. >> >> An alternative way of thinking would be to create an index with those >> three leading columns and then all of the other columns used by this >> query as later columns. That would be an even larger index, but it >> would >> allow an index-only scan, which might be quite a lot faster. The fact >> that you seem to be hitting about one page for each row retrieved says >> that the data you need is pretty badly scattered, so constructing an >> index >> that concentrates everything you need into one range of the index might >> be the ticket. >> >> Either of these additional-index ideas is going to penalize table >> insertions/updates, so keep an eye on that end of the performance >> question too. >> >> regards, tom lane >> > > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance