public inbox for [email protected]  
help / color / mirror / Atom feed
Slow update on column that is part of exclusion constraint
5+ messages / 2 participants
[nested] [flat]

* Slow update on column that is part of exclusion constraint
@ 2016-04-13 17:03  Adam Brusselback <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: Adam Brusselback @ 2016-04-13 17:03 UTC (permalink / raw)
  To: pgsql-performance

Hey all, been running into some performance issues with one of my tables,
and it seems to be centered around index maintenance.

I have a table to store aggregated prices that are derived from sale data
over a configurable period, and a function that runs periodically that
inserts new prices if necessary, or "inactivates" the old record and
inserts new ones.  We use that price in calculations, and store the price
we used for a specific calculation for audit purposes, so the old record
cannot just be updated or deleted. We need a new record every time.

So to do that we have an "active_range" column on the table, and that is
 used to set the periods that each specific price was in use for (important
for audit).

The issue seems to be the exclusion constraint we have on the table to
ensure data consistency... It is ridiculously slow to insert / update.  To
update 200 records, it takes 45 seconds. To update 1500 rows takes about
3.5 min. To build the constraint/index fresh on the table (1,392,085 rows)
it takes about 4 min.

Looking at the schema, is there just a clearly better way to do this? I
know uuid is not supported by GiST indexes yet, so casting to text isn't
great, but necessary (as far as I know...) at the moment.

The table definition looks like so:
CREATE TABLE price_generated
(
  price_generated_id uuid NOT NULL DEFAULT gen_random_uuid(),
  product_id uuid NOT NULL,
  company_id uuid NOT NULL,
  date_range daterange NOT NULL,
  average_price numeric NOT NULL,
  average_price_delivered numeric NOT NULL,
  low_price numeric NOT NULL,
  low_price_delivered numeric NOT NULL,
  high_price numeric NOT NULL,
  high_price_delivered numeric NOT NULL,
  uom_type_id uuid NOT NULL,
  active_range tstzrange NOT NULL DEFAULT tstzrange(now(), NULL::timestamp
with time zone),
  CONSTRAINT price_generated_pkey PRIMARY KEY (price_generated_id),
  CONSTRAINT price_generated_company_id_fkey FOREIGN KEY (company_id)
      REFERENCES public.company (company_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT price_generated_product_id_fkey FOREIGN KEY (product_id)
      REFERENCES public.product (product_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT price_generated_company_product_date_active_excl EXCLUDE
  USING gist ((company_id::text) WITH =, (product_id::text) WITH =,
date_range WITH &&, active_range WITH &&)
);

CREATE INDEX idx_price_generated_prod_comp_date
  ON price_generated
  USING btree
  (product_id, company_id, date_range);


^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Slow update on column that is part of exclusion constraint
@ 2016-04-13 17:14  Adam Brusselback <[email protected]>
  parent: Adam Brusselback <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: Adam Brusselback @ 2016-04-13 17:14 UTC (permalink / raw)
  To: pgsql-performance

Sorry, brain stopped working and I forgot to include the normal info.

Postgres version: 9.5.1
Hardware: 2 core, 4gb Digital Ocean virtual server
OS: Debian

explain analyze for an example update:
'Update on price_generated  (cost=32.45..644.83 rows=1 width=157) (actual
time=29329.614..29329.614 rows=0 loops=1)'
'  ->  Nested Loop  (cost=32.45..644.83 rows=1 width=157) (actual
time=29329.608..29329.608 rows=0 loops=1)'
'        ->  HashAggregate  (cost=32.04..34.35 rows=231 width=52) (actual
time=1.137..2.090 rows=231 loops=1)'
'              Group Key: pti.product_id, pti.company_id, pti.date_range'
'              ->  Seq Scan on _prices_to_insert pti  (cost=0.00..30.31
rows=231 width=52) (actual time=0.060..0.678 rows=231 loops=1)'
'        ->  Index Scan using
price_generated_company_product_date_active_excl on price_generated
 (cost=0.41..2.63 rows=1 width=151) (actual time=126.949..126.949 rows=0
loops=231)'
'              Index Cond: (date_range = pti.date_range)'
'              Filter: ((upper(active_range) IS NULL) AND (pti.product_id =
product_id) AND (pti.company_id = company_id))'
'              Rows Removed by Filter: 29460'
'Planning time: 3.134 ms'
'Execution time: 29406.717 ms'


^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Slow update on column that is part of exclusion constraint
@ 2016-04-13 18:54  Evgeniy Shishkin <[email protected]>
  parent: Adam Brusselback <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: Evgeniy Shishkin @ 2016-04-13 18:54 UTC (permalink / raw)
  To: Adam Brusselback <[email protected]>; +Cc: pgsql-performance


> On 13 Apr 2016, at 20:14, Adam Brusselback <[email protected]> wrote:
> 
> Sorry, brain stopped working and I forgot to include the normal info.
> 
> Postgres version: 9.5.1
> Hardware: 2 core, 4gb Digital Ocean virtual server
> OS: Debian 
> 
> explain analyze for an example update:
> 'Update on price_generated  (cost=32.45..644.83 rows=1 width=157) (actual time=29329.614..29329.614 rows=0 loops=1)'
> '  ->  Nested Loop  (cost=32.45..644.83 rows=1 width=157) (actual time=29329.608..29329.608 rows=0 loops=1)'
> '        ->  HashAggregate  (cost=32.04..34.35 rows=231 width=52) (actual time=1.137..2.090 rows=231 loops=1)'
> '              Group Key: pti.product_id, pti.company_id, pti.date_range'
> '              ->  Seq Scan on _prices_to_insert pti  (cost=0.00..30.31 rows=231 width=52) (actual time=0.060..0.678 rows=231 loops=1)'
> '        ->  Index Scan using price_generated_company_product_date_active_excl on price_generated  (cost=0.41..2.63 rows=1 width=151) (actual time=126.949..126.949 rows=0 loops=231)'
> '              Index Cond: (date_range = pti.date_range)'
> '              Filter: ((upper(active_range) IS NULL) AND (pti.product_id = product_id) AND (pti.company_id = company_id))'
> '              Rows Removed by Filter: 29460'
> 'Planning time: 3.134 ms'
> 'Execution time: 29406.717 ms'

Well, you see execution time of 30 seconds because there are 231 index lookups,
each taking 126 ms.

And that lookup is slow because of
Filter: ((upper(active_range) IS NULL) AND (pti.product_id = product_id) AND (pti.company_id = company_id))'

Can you provide self-containing example of update?
I don't see there (upper(active_range) IS NULL condition is coming from.

-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Slow update on column that is part of exclusion constraint
@ 2016-04-14 04:17  Adam Brusselback <[email protected]>
  parent: Evgeniy Shishkin <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: Adam Brusselback @ 2016-04-14 04:17 UTC (permalink / raw)
  To: pgsql-performance

So fair enough, it does seem to be related to the lookup rather than
maintenance on the index. I was misguided in my initial assumption.

Spent quite a bit of time trying to come up with a self contained test, and
it seems like I can't make it choose the GiST index unless I remove the
regular btree index in my test case, though the opposite is true for my
table in production.  Not really sure what that means as far as what I need
to do though. I've tried a vacuum full, analyze, rebuild index, drop and
re-add the constraint... It still uses that GiST index for this query.

Hell, a sequential scan is a ton faster even.

On Wed, Apr 13, 2016 at 2:54 PM, Evgeniy Shishkin <[email protected]>
wrote:

>
> > On 13 Apr 2016, at 20:14, Adam Brusselback <[email protected]>
> wrote:
> >
> > Sorry, brain stopped working and I forgot to include the normal info.
> >
> > Postgres version: 9.5.1
> > Hardware: 2 core, 4gb Digital Ocean virtual server
> > OS: Debian
> >
> > explain analyze for an example update:
> > 'Update on price_generated  (cost=32.45..644.83 rows=1 width=157)
> (actual time=29329.614..29329.614 rows=0 loops=1)'
> > '  ->  Nested Loop  (cost=32.45..644.83 rows=1 width=157) (actual
> time=29329.608..29329.608 rows=0 loops=1)'
> > '        ->  HashAggregate  (cost=32.04..34.35 rows=231 width=52)
> (actual time=1.137..2.090 rows=231 loops=1)'
> > '              Group Key: pti.product_id, pti.company_id, pti.date_range'
> > '              ->  Seq Scan on _prices_to_insert pti  (cost=0.00..30.31
> rows=231 width=52) (actual time=0.060..0.678 rows=231 loops=1)'
> > '        ->  Index Scan using
> price_generated_company_product_date_active_excl on price_generated
> (cost=0.41..2.63 rows=1 width=151) (actual time=126.949..126.949 rows=0
> loops=231)'
> > '              Index Cond: (date_range = pti.date_range)'
> > '              Filter: ((upper(active_range) IS NULL) AND
> (pti.product_id = product_id) AND (pti.company_id = company_id))'
> > '              Rows Removed by Filter: 29460'
> > 'Planning time: 3.134 ms'
> > 'Execution time: 29406.717 ms'
>
> Well, you see execution time of 30 seconds because there are 231 index
> lookups,
> each taking 126 ms.
>
> And that lookup is slow because of
> Filter: ((upper(active_range) IS NULL) AND (pti.product_id = product_id)
> AND (pti.company_id = company_id))'
>
> Can you provide self-containing example of update?
> I don't see there (upper(active_range) IS NULL condition is coming from.

SET work_mem = '512 MB';

DROP TABLE IF EXISTS price_generated_test;
DROP TABLE IF EXISTS _prices_to_insert;

CREATE TEMPORARY TABLE price_generated_test
(
  price_generated_id uuid NOT NULL DEFAULT gen_random_uuid(),
  product_id uuid NOT NULL,
  company_id uuid NOT NULL,
  date_range daterange NOT NULL,
  average_price numeric NOT NULL,
  average_price_delivered numeric NOT NULL,
  low_price numeric NOT NULL,
  low_price_delivered numeric NOT NULL,
  high_price numeric NOT NULL,
  high_price_delivered numeric NOT NULL,
  uom_type_id uuid NOT NULL,
  active_range tstzrange NOT NULL DEFAULT tstzrange(now(), NULL::timestamp with time zone),
  CONSTRAINT price_generated_test_pkey PRIMARY KEY (price_generated_id),
  CONSTRAINT price_generated_test_company_product_date_active_excl EXCLUDE 
  USING gist (gosimple.hex_to_bigint(company_id::text) WITH =, gosimple.hex_to_bigint(product_id::text) WITH =, date_range WITH &&, active_range WITH &&)
);

CREATE INDEX idx_price_generated_test_prod_comp_date
  ON price_generated_test
  USING btree
  (product_id, company_id, date_range);


INSERT INTO price_generated_test (product_id, company_id, date_range, average_price, average_price_delivered, low_price, low_price_delivered, high_price, high_price_delivered, uom_type_id, active_range)
SELECT p.product_id, c.company_id, d.date_range, avg(pr.price), avg(pr.price + pr.delivery), min(pr.price), min(pr.price + pr.delivery), max(pr.price), max(pr.price + pr.delivery), u.uom_type_id, ar.active_range
FROM (
	SELECT gen_random_uuid() as product_id
	FROM generate_series(1, 200)
) p
CROSS JOIN (
	SELECT gen_random_uuid() as company_id
	FROM generate_series(1, 60)
) c
CROSS JOIN (
	SELECT daterange((now() + (generate_series || ' months')::interval)::date, (now() + (generate_series + 1 || ' months')::interval)::date) as date_range
	FROM generate_series(1, 24)
) d
CROSS JOIN (
	SELECT gen_random_uuid() as uom_type_id
	FROM generate_series(1, 1)
) u 
CROSS JOIN (
	SELECT random() * 15 as price, random() * 3 as delivery
	FROM generate_series(1, 15)
) pr
CROSS JOIN (
	SELECT tstzrange(now() - (generate_series || ' days')::interval, CASE WHEN now() - (generate_series || ' days')::interval = now() THEN null ELSE now() - (generate_series || ' days')::interval END, '[]') as active_range
	FROM generate_series(0, 10, 10)
) ar
GROUP BY p.product_id, c.company_id, d.date_range, u.uom_type_id, ar.active_range;


CREATE TEMPORARY TABLE _prices_to_insert AS
SELECT product_id, company_id, date_range, average_price * random() * 1.5 average_price, average_price_delivered * random() * 1.5 average_price_delivered, low_price, low_price_delivered, high_price, high_price_delivered, uom_type_id
FROM price_generated_test
WHERE active_range @> now()
ORDER BY random()
LIMIT 200;



BEGIN;
-- If this index exists this test case will use it for the update
DROP INDEX idx_price_generated_test_prod_comp_date;
-- If this is enabled it will choose a seq scan where my regular table still won't
SET enable_seqscan = false;


UPDATE price_generated_test
SET active_range = tstzrange(lower(active_range), now())
WHERE upper(price_generated_test.active_range) IS NULL
AND EXISTS (
	SELECT 1
	FROM _prices_to_insert pti
	WHERE price_generated_test.product_id = pti.product_id
	AND price_generated_test.company_id = pti.company_id
	AND price_generated_test.date_range = pti.date_range);

-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Attachments:

  [text/plain] excl constraint test case.sql (3.4K, 3-excl%20constraint%20test%20case.sql)
  download | inline:
SET work_mem = '512 MB';

DROP TABLE IF EXISTS price_generated_test;
DROP TABLE IF EXISTS _prices_to_insert;

CREATE TEMPORARY TABLE price_generated_test
(
  price_generated_id uuid NOT NULL DEFAULT gen_random_uuid(),
  product_id uuid NOT NULL,
  company_id uuid NOT NULL,
  date_range daterange NOT NULL,
  average_price numeric NOT NULL,
  average_price_delivered numeric NOT NULL,
  low_price numeric NOT NULL,
  low_price_delivered numeric NOT NULL,
  high_price numeric NOT NULL,
  high_price_delivered numeric NOT NULL,
  uom_type_id uuid NOT NULL,
  active_range tstzrange NOT NULL DEFAULT tstzrange(now(), NULL::timestamp with time zone),
  CONSTRAINT price_generated_test_pkey PRIMARY KEY (price_generated_id),
  CONSTRAINT price_generated_test_company_product_date_active_excl EXCLUDE 
  USING gist (gosimple.hex_to_bigint(company_id::text) WITH =, gosimple.hex_to_bigint(product_id::text) WITH =, date_range WITH &&, active_range WITH &&)
);

CREATE INDEX idx_price_generated_test_prod_comp_date
  ON price_generated_test
  USING btree
  (product_id, company_id, date_range);


INSERT INTO price_generated_test (product_id, company_id, date_range, average_price, average_price_delivered, low_price, low_price_delivered, high_price, high_price_delivered, uom_type_id, active_range)
SELECT p.product_id, c.company_id, d.date_range, avg(pr.price), avg(pr.price + pr.delivery), min(pr.price), min(pr.price + pr.delivery), max(pr.price), max(pr.price + pr.delivery), u.uom_type_id, ar.active_range
FROM (
	SELECT gen_random_uuid() as product_id
	FROM generate_series(1, 200)
) p
CROSS JOIN (
	SELECT gen_random_uuid() as company_id
	FROM generate_series(1, 60)
) c
CROSS JOIN (
	SELECT daterange((now() + (generate_series || ' months')::interval)::date, (now() + (generate_series + 1 || ' months')::interval)::date) as date_range
	FROM generate_series(1, 24)
) d
CROSS JOIN (
	SELECT gen_random_uuid() as uom_type_id
	FROM generate_series(1, 1)
) u 
CROSS JOIN (
	SELECT random() * 15 as price, random() * 3 as delivery
	FROM generate_series(1, 15)
) pr
CROSS JOIN (
	SELECT tstzrange(now() - (generate_series || ' days')::interval, CASE WHEN now() - (generate_series || ' days')::interval = now() THEN null ELSE now() - (generate_series || ' days')::interval END, '[]') as active_range
	FROM generate_series(0, 10, 10)
) ar
GROUP BY p.product_id, c.company_id, d.date_range, u.uom_type_id, ar.active_range;


CREATE TEMPORARY TABLE _prices_to_insert AS
SELECT product_id, company_id, date_range, average_price * random() * 1.5 average_price, average_price_delivered * random() * 1.5 average_price_delivered, low_price, low_price_delivered, high_price, high_price_delivered, uom_type_id
FROM price_generated_test
WHERE active_range @> now()
ORDER BY random()
LIMIT 200;



BEGIN;
-- If this index exists this test case will use it for the update
DROP INDEX idx_price_generated_test_prod_comp_date;
-- If this is enabled it will choose a seq scan where my regular table still won't
SET enable_seqscan = false;


UPDATE price_generated_test
SET active_range = tstzrange(lower(active_range), now())
WHERE upper(price_generated_test.active_range) IS NULL
AND EXISTS (
	SELECT 1
	FROM _prices_to_insert pti
	WHERE price_generated_test.product_id = pti.product_id
	AND price_generated_test.company_id = pti.company_id
	AND price_generated_test.date_range = pti.date_range);

^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Slow update on column that is part of exclusion constraint
@ 2016-04-15 09:31  Evgeniy Shishkin <[email protected]>
  parent: Adam Brusselback <[email protected]>
  0 siblings, 0 replies; 5+ messages in thread

From: Evgeniy Shishkin @ 2016-04-15 09:31 UTC (permalink / raw)
  To: Adam Brusselback <[email protected]>; +Cc: pgsql-performance


> On 14 Apr 2016, at 07:17, Adam Brusselback <[email protected]> wrote:
> 
> So fair enough, it does seem to be related to the lookup rather than maintenance on the index. I was misguided in my initial assumption.  
> 
> Spent quite a bit of time trying to come up with a self contained test, and it seems like I can't make it choose the GiST index unless I remove the regular btree index in my test case, though the opposite is true for my table in production.  Not really sure what that means as far as what I need to do though. I've tried a vacuum full, analyze, rebuild index, drop and re-add the constraint... It still uses that GiST index for this query.
> 
> Hell, a sequential scan is a ton faster even.
> 

As i understand it, postgres needs a way to find rows for update.
In explain analyze you provided, we see that it chose gist index for that.
And that is a poor chose. I think you need a proper btree index for update 
query to work properly fast. Like index on (product_id, company_id, date_range) WHERE upper(price_generated_test.active_range) IS NULL. 



> On Wed, Apr 13, 2016 at 2:54 PM, Evgeniy Shishkin <[email protected]> wrote:
> 
> > On 13 Apr 2016, at 20:14, Adam Brusselback <[email protected]> wrote:
> >
> > Sorry, brain stopped working and I forgot to include the normal info.
> >
> > Postgres version: 9.5.1
> > Hardware: 2 core, 4gb Digital Ocean virtual server
> > OS: Debian
> >
> > explain analyze for an example update:
> > 'Update on price_generated  (cost=32.45..644.83 rows=1 width=157) (actual time=29329.614..29329.614 rows=0 loops=1)'
> > '  ->  Nested Loop  (cost=32.45..644.83 rows=1 width=157) (actual time=29329.608..29329.608 rows=0 loops=1)'
> > '        ->  HashAggregate  (cost=32.04..34.35 rows=231 width=52) (actual time=1.137..2.090 rows=231 loops=1)'
> > '              Group Key: pti.product_id, pti.company_id, pti.date_range'
> > '              ->  Seq Scan on _prices_to_insert pti  (cost=0.00..30.31 rows=231 width=52) (actual time=0.060..0.678 rows=231 loops=1)'
> > '        ->  Index Scan using price_generated_company_product_date_active_excl on price_generated  (cost=0.41..2.63 rows=1 width=151) (actual time=126.949..126.949 rows=0 loops=231)'
> > '              Index Cond: (date_range = pti.date_range)'
> > '              Filter: ((upper(active_range) IS NULL) AND (pti.product_id = product_id) AND (pti.company_id = company_id))'
> > '              Rows Removed by Filter: 29460'
> > 'Planning time: 3.134 ms'
> > 'Execution time: 29406.717 ms'
> 
> Well, you see execution time of 30 seconds because there are 231 index lookups,
> each taking 126 ms.
> 
> And that lookup is slow because of
> Filter: ((upper(active_range) IS NULL) AND (pti.product_id = product_id) AND (pti.company_id = company_id))'
> 
> Can you provide self-containing example of update?
> I don't see there (upper(active_range) IS NULL condition is coming from.
> 
> <excl constraint test case.sql>
> -- 
> Sent via pgsql-performance mailing list ([email protected])
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance



-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




^ permalink  raw  reply  [nested|flat] 5+ messages in thread


end of thread, other threads:[~2016-04-15 09:31 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2016-04-13 17:03 Slow update on column that is part of exclusion constraint Adam Brusselback <[email protected]>
2016-04-13 17:14 ` Adam Brusselback <[email protected]>
2016-04-13 18:54   ` Evgeniy Shishkin <[email protected]>
2016-04-14 04:17     ` Adam Brusselback <[email protected]>
2016-04-15 09:31       ` Evgeniy Shishkin <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox