public inbox for [email protected]
help / color / mirror / Atom feedFrom: Torsten Zuehlsdorff <[email protected]>
To: trafdev <[email protected]>
To: [email protected]
Subject: Re: less than 2 sec for response - possible?
Date: Tue, 5 Jul 2016 13:39:40 +0200
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
List-Unsubscribe: <mailto:[email protected]?body=unsub%20pgsql-performance>
On 02.07.2016 02:54, trafdev wrote:
> Hi.
>
> I'm trying to build an OLAP-oriented DB based on PostgresSQL.
>
> User works with a paginated report in the web-browser. Interface allows
> to fetch data for a custom date-range selection,
> display individual rows (20-50 per page) and totals (for entire
> selection, even not visible on the current page) and sorting by any column.
>
> The main goal is to deliver results of the basic SELECT queries to the
> end-user in less than 2 seconds.
>
> I was able to achieve that except for one table (the biggest one).
>
> It consist of multiple dimensions (date, gran, aid, pid, sid, fid,
> subid) and metrics (see below).
> User can filter by any dimension and sort by any metric.
>
> Here is a CREATE script for this table:
>
> CREATE TABLE stats.feed_sub
> (
> date date NOT NULL,
> gran interval NOT NULL,
> aid smallint NOT NULL,
> pid smallint NOT NULL,
> sid smallint NOT NULL,
> fid smallint NOT NULL,
> subid text NOT NULL,
> rev_est_pub real NOT NULL,
> rev_est_feed real NOT NULL,
> rev_raw real NOT NULL,
> c_total bigint NOT NULL,
> c_passed bigint NOT NULL,
> q_total bigint NOT NULL,
> q_passed bigint NOT NULL,
> q_filt_geo bigint NOT NULL,
> q_filt_browser bigint NOT NULL,
> q_filt_os bigint NOT NULL,
> q_filt_ip bigint NOT NULL,
> q_filt_subid bigint NOT NULL,
> q_filt_pause bigint NOT NULL,
> q_filt_click_cap_ip bigint NOT NULL,
> q_filt_query_cap bigint NOT NULL,
> q_filt_click_cap bigint NOT NULL,
> q_filt_rev_cap bigint NOT NULL,
> q_filt_erpm_floor bigint NOT NULL,
> c_filt_click_cap_ip bigint NOT NULL,
> c_filt_doubleclick bigint NOT NULL,
> c_filt_url_expired bigint NOT NULL,
> c_filt_fast_click bigint NOT NULL,
> c_filt_delay_clicks bigint NOT NULL,
> c_filt_ip_mismatch bigint NOT NULL,
> c_filt_ref_mismatch bigint NOT NULL,
> c_filt_lng_mismatch bigint NOT NULL,
> c_filt_ua_mismatch bigint NOT NULL,
> res_impr bigint NOT NULL,
> rev_ver_pub real,
> rev_ver_feed real,
> c_ver bigint,
> q_filt_ref bigint NOT NULL
> )
> WITH (
> OIDS=FALSE
> );
>
> CREATE INDEX ix_feed_sub_date
> ON stats.feed_sub
> USING brin
> (date);
>
> CREATE UNIQUE INDEX ixu_feed_sub
> ON stats.feed_sub
> USING btree
> (date, gran, aid, pid, sid, fid, subid COLLATE pg_catalog."default");
>
> Here is some sizing info (https://wiki.postgresql.org/wiki/Disk_Usage):
>
> relation,size
> stats.feed_sub,5644 MB
> stats.ixu_feed_sub,1594 MB
>
> row_estimate
> 15865627
>
> Here is the typical query (for totals beige):
> SELECT
> sum(stats.feed_sub.c_filt_click_cap_ip) AS clicks_from_ip,
> sum(stats.feed_sub.c_filt_doubleclick) AS clicks_on_target,
> sum(stats.feed_sub.c_filt_delay_clicks) AS ip_click_period,
> sum(stats.feed_sub.c_filt_fast_click) AS fast_click,
> sum(stats.feed_sub.c_filt_ip_mismatch) AS ip_mismatch,
> sum(stats.feed_sub.c_filt_lng_mismatch) AS lng_mismatch,
> sum(stats.feed_sub.c_filt_ref_mismatch) AS ref_mismatch,
> sum(stats.feed_sub.c_filt_ua_mismatch) AS ua_mismatch,
> sum(stats.feed_sub.c_filt_url_expired) AS url_expired,
> stats.feed_sub.subid AS stats_feed_sub_subid,
> stats.feed_sub.sid AS stats_feed_sub_sid
> FROM stats.feed_sub
> WHERE stats.feed_sub.date >= '2016-06-01' :: TIMESTAMP AND
> stats.feed_sub.date <= '2016-06-30' :: TIMESTAMP AND
> stats.feed_sub.gran = '1 day'
> AND stats.feed_sub.aid = 3
> GROUP BY
> stats.feed_sub.subid, stats.feed_sub.sid;
You cast every date to an timestamp. Why? You can adjust the index to:
CREATE UNIQUE INDEX ixu_feed_sub
ON stats.feed_sub
USING btree
(date::timestamp, gran, aid, pid, sid, fid, subid COLLATE
pg_catalog."default");
But since i see no need for the cast at all (maybe i missed it) try it
without!
Greetings,
Torsten
--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected]
Subject: Re: less than 2 sec for response - possible?
In-Reply-To: <[email protected]>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox