Store/Retrieve time series data from PostgreSQL

public inbox for [email protected]  
help / color / mirror / Atom feed

Store/Retrieve time series data from PostgreSQL
4+ messages / 2 participants
[nested] [flat]

* Store/Retrieve time series data from PostgreSQL
@ 2017-09-14 11:51 Subramaniam C <[email protected]>
  2017-09-14 12:03 ` Re: Store/Retrieve time series data from PostgreSQL vinny <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Subramaniam C @ 2017-09-14 11:51 UTC (permalink / raw)
  To: pgsql-performance

Hi

*Requirement :- *
We need to retrieve latest health of around 1.5 million objects for a given
time.

*Implementation :-*
We are storing hourly data of each object in single row. Given below is the
schema :-

*CREATE TABLE health_timeseries (*

*       mobid text NOT NULL,
       hour bigint NOT NULL,
       health real[]
     );*

mobId - Object ID

hour - Epoch hour

health - Array of health values for a given hour of that object.

Each object has 2 hours of health data (i.e. 2 rows for each object)
so total no. of rows is around 3 million.

With the above approach the query to retrieve the latest health of all
objects for a given time duration is taking around    *85 seconds*. I
have tried to increase the work_mem, effective_cache, shared_buffer to
4 GB of PostgreSQL but still there was no improvement in the query
execution time.

*Query :-*

*select distinct on (health_timeseries.mobid) mobid,
health_timeseries.health, health_timeseries.hour from
health_timeseries where hour >=(1505211054000/(3600*1000))-1 and hour
<= 1505211054000/(3600*1000) ORDER BY health_timeseries.mobid DESC,
health_timeseries.hour DESC;*

*Hardware Configuration of PostgreSQL VM :-*

1. OS - Centos.

2. Postgresql version - 9.6.2

3. RAM - 16 GB RAM

4. CPU - 8 vCPU

Please let us know the hardware configuration of PostgreSQL for such
huge dataset?

And also let us know if there is any better schema/query to retrieve this data?

Thanks and Regards

Subramaniam

^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: Store/Retrieve time series data from PostgreSQL
  2017-09-14 11:51 Store/Retrieve time series data from PostgreSQL Subramaniam C <[email protected]>
@ 2017-09-14 12:03 ` vinny <[email protected]>
  2017-09-14 12:33   ` Re: Store/Retrieve time series data from PostgreSQL Subramaniam C <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: vinny @ 2017-09-14 12:03 UTC (permalink / raw)
  To: Subramaniam C <[email protected]>; +Cc: pgsql-performance

On 2017-09-14 13:51, Subramaniam C wrote:
> Hi
> 
> QUERY :-
> 
> _select distinct on (health_timeseries.mobid) mobid,
> health_timeseries.health, health_timeseries.hour from
> health_timeseries where hour >=(1505211054000/(3600*1000))-1 and hour
> <= 1505211054000/(3600*1000) ORDER BY health_timeseries.mobid DESC,
> health_timeseries.hour DESC;_
> 

Did you run EXPLAIN on this query to see what it is actually doing?

What you are doing how is selecting all rows from the last hour,
sorting them by mobid and hour, and then DISTINCT filters out al 
duplicates.

Sorting on mobid is therefor useless, DISTINCT still has to check all 
rows.

Sorting on mobid and hour will take a long time if there is no index for 
it,
so if you don't have an index on the mobid and hour together then you 
should probably try that.

But, see what EXPLAIN tells you first.

Regards,
Vincent.

-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: Store/Retrieve time series data from PostgreSQL
  2017-09-14 11:51 Store/Retrieve time series data from PostgreSQL Subramaniam C <[email protected]>
  2017-09-14 12:03 ` Re: Store/Retrieve time series data from PostgreSQL vinny <[email protected]>
@ 2017-09-14 12:33   ` Subramaniam C <[email protected]>
  2017-09-14 12:38     ` Re: Store/Retrieve time series data from PostgreSQL Subramaniam C <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Subramaniam C @ 2017-09-14 12:33 UTC (permalink / raw)
  To: vinny <[email protected]>; +Cc: pgsql-performance

I created index on morbid and hour together. Given below is the EXPLAIN
output

------------------------------------------------------------------------------------------

 Unique  (cost=606127.16..621098.42 rows=1087028 width=200)

   ->  Sort  (cost=606127.16..613612.79 rows=2994252 width=200)

         Sort Key: mobid DESC, hour DESC

         ->  Seq Scan on health_timeseries  (cost=0.00..284039.00
rows=2994252 width=200)

               Filter: ((hour >= '418134'::bigint) AND (hour <=
'418135'::bigint))

On Thu, Sep 14, 2017 at 5:33 PM, vinny <[email protected]> wrote:

> On 2017-09-14 13:51, Subramaniam C wrote:
>
>> Hi
>>
>> QUERY :-
>>
>> _select distinct on (health_timeseries.mobid) mobid,
>> health_timeseries.health, health_timeseries.hour from
>> health_timeseries where hour >=(1505211054000/(3600*1000))-1 and hour
>> <= 1505211054000/(3600*1000) ORDER BY health_timeseries.mobid DESC,
>> health_timeseries.hour DESC;_
>>
>>
> Did you run EXPLAIN on this query to see what it is actually doing?
>
> What you are doing how is selecting all rows from the last hour,
> sorting them by mobid and hour, and then DISTINCT filters out al
> duplicates.
>
> Sorting on mobid is therefor useless, DISTINCT still has to check all rows.
>
> Sorting on mobid and hour will take a long time if there is no index for
> it,
> so if you don't have an index on the mobid and hour together then you
> should probably try that.
>
>
> But, see what EXPLAIN tells you first.
>
> Regards,
> Vincent.
>


^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: Store/Retrieve time series data from PostgreSQL
  2017-09-14 11:51 Store/Retrieve time series data from PostgreSQL Subramaniam C <[email protected]>
  2017-09-14 12:03 ` Re: Store/Retrieve time series data from PostgreSQL vinny <[email protected]>
  2017-09-14 12:33   ` Re: Store/Retrieve time series data from PostgreSQL Subramaniam C <[email protected]>
@ 2017-09-14 12:38     ` Subramaniam C <[email protected]>
  0 siblings, 0 replies; 4+ messages in thread

From: Subramaniam C @ 2017-09-14 12:38 UTC (permalink / raw)
  To: vinny <[email protected]>; +Cc: pgsql-performance

With this query I am trying to get the latest hour for a given timestamp so
that I can get whole health array of all object for a given hour. So I am
doing DISTINCT on mobid and order by hour and mobid DESC.

On Thu, Sep 14, 2017 at 6:03 PM, Subramaniam C <[email protected]>
wrote:

> I created index on morbid and hour together. Given below is the EXPLAIN
> output
>
> ------------------------------------------------------------
> ------------------------------
>
>  Unique  (cost=606127.16..621098.42 rows=1087028 width=200)
>
>    ->  Sort  (cost=606127.16..613612.79 rows=2994252 width=200)
>
>          Sort Key: mobid DESC, hour DESC
>
>          ->  Seq Scan on health_timeseries  (cost=0.00..284039.00
> rows=2994252 width=200)
>
>                Filter: ((hour >= '418134'::bigint) AND (hour <=
> '418135'::bigint))
>
> On Thu, Sep 14, 2017 at 5:33 PM, vinny <[email protected]> wrote:
>
>> On 2017-09-14 13:51, Subramaniam C wrote:
>>
>>> Hi
>>>
>>> QUERY :-
>>>
>>> _select distinct on (health_timeseries.mobid) mobid,
>>> health_timeseries.health, health_timeseries.hour from
>>> health_timeseries where hour >=(1505211054000/(3600*1000))-1 and hour
>>> <= 1505211054000/(3600*1000) ORDER BY health_timeseries.mobid DESC,
>>> health_timeseries.hour DESC;_
>>>
>>>
>> Did you run EXPLAIN on this query to see what it is actually doing?
>>
>> What you are doing how is selecting all rows from the last hour,
>> sorting them by mobid and hour, and then DISTINCT filters out al
>> duplicates.
>>
>> Sorting on mobid is therefor useless, DISTINCT still has to check all
>> rows.
>>
>> Sorting on mobid and hour will take a long time if there is no index for
>> it,
>> so if you don't have an index on the mobid and hour together then you
>> should probably try that.
>>
>>
>> But, see what EXPLAIN tells you first.
>>
>> Regards,
>> Vincent.
>>
>
>


^ permalink  raw  reply  [nested|flat] 4+ messages in thread

end of thread, other threads:[~2017-09-14 12:38 UTC | newest]

Thread overview: 4+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2017-09-14 11:51 Store/Retrieve time series data from PostgreSQL Subramaniam C <[email protected]>
2017-09-14 12:03 ` vinny <[email protected]>
2017-09-14 12:33   ` Subramaniam C <[email protected]>
2017-09-14 12:38     ` Subramaniam C <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox