Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cPBIq-0003PE-79 for pgsql-performance@arkaria.postgresql.org; Thu, 05 Jan 2017 16:54:12 +0000 Received: from localhost ([127.0.0.1] helo=postgresql.org) by malur.postgresql.org with smtp (Exim 4.84_2) (envelope-from ) id 1cPBIp-0006xA-Lv for pgsql-performance@arkaria.postgresql.org; Thu, 05 Jan 2017 16:54:11 +0000 Received: from makus.postgresql.org ([2001:4800:1501:1::229]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1cPBH8-0004zb-HI for pgsql-performance@postgresql.org; Thu, 05 Jan 2017 16:52:26 +0000 Received: from mail-oi0-x236.google.com ([2607:f8b0:4003:c06::236]) by makus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.84_2) (envelope-from ) id 1cPBH5-0008PE-9e for pgsql-performance@postgresql.org; Thu, 05 Jan 2017 16:52:25 +0000 Received: by mail-oi0-x236.google.com with SMTP id b126so586682315oia.2 for ; Thu, 05 Jan 2017 08:52:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=4r4VLRgRbNMTI3FMJfzyhQZlo1I+K6OvVVUbv8uoyvQ=; b=NdjyJ502CrAqETlR9GkJtxAF4YXHcS5fmSxoKgNMpGM8DC0rzCS6vuhCJo03hQydMb L2/EvNRJdg342TTd14VMEvti5IsNcrN+gnskC+Y5M/9IuuEDaNA5fdjjTfKMucLoE5rO bm4TZ1qLDpdr5WsyjHEKlK4xLuZ7HER1MALQ01B0wBmhkNRgWGm1oD8PwZqAlHFj0AgM /0u9yZzkBYGjIK5Yp0jIAhiSihFZ1VyxTf2LBl4u8d3+C1ZoCLX1ggzP6A5T5kAf5B1K L8hAqD4VtyvIaMw/is/j5kPla2dgwT1yxASyG4x8fR+TWbPmXwOqe3X1oUTPugujTemb +mOw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=4r4VLRgRbNMTI3FMJfzyhQZlo1I+K6OvVVUbv8uoyvQ=; b=XpsFIJzu/wpJM2awWyCN9g3HdnYmDI0P7yg004SYUWNc55et2/mwT1PtV1wnuUKL1c A/sHqSOoF1iArHvAwz2sy2LcvTYgf1yVbqB9aSKl279ddriErFHDub05rngKLdQOgiTR A0+bfLjINE6X87xmheQMWaGEfZREpvJUTxKrmUQaztXklD980FJHD1GRiB7d0Y8SAPPJ XfcSOXrUVoB4uEQJNg6smpzylNBiYgMR9io8W9jhmVdE0a8dU0B4GMrv9764pLUMcgb+ cSl/ykJpVYrqNsdhGhY4X1g690a8GRovpmu3yTj4s2bIAnADdB3T3KeDMRQtjIMU0Diw G66w== X-Gm-Message-State: AIkVDXJ0y2j7lp/xyMIEbrBcWBGsvYEv+nUFnT9VNzyf0CnVA6KT4Orvu45zD7SqG3Uc0I7kSwzPLlL5M0ETug== X-Received: by 10.202.229.208 with SMTP id c199mr35615758oih.179.1483635141957; Thu, 05 Jan 2017 08:52:21 -0800 (PST) MIME-Version: 1.0 Received: by 10.157.48.36 with HTTP; Thu, 5 Jan 2017 08:51:41 -0800 (PST) In-Reply-To: References: From: =?UTF-8?Q?Fl=C3=A1vio_Henrique?= Date: Thu, 5 Jan 2017 14:51:41 -0200 Message-ID: Subject: Re: Slow query after 9.3 to 9.6 migration To: Merlin Moncure Cc: postgres performance list Content-Type: multipart/alternative; boundary=001a114093a69cdef205455bb98b X-Pg-Spam-Score: -2.7 (--) List-Archive: List-Help: List-ID: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Mailing-List: pgsql-performance Precedence: bulk Sender: pgsql-performance-owner@postgresql.org --001a114093a69cdef205455bb98b Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Hi all! Sorry the delay (holidays). Well, the most expensive sequencial scan was solved. I asked the db team to drop the index and recreate it and guess what: now postgresql is using it and the time dropped. (thank you, @Gerardo Herzig!) I think there's still room for improvement, but the problem is not so crucial right now. I'll try to investigate every help mentioned here. Thank you all. @Daniel Blanch I'll make some tests with a materialized view. Thank you. > On systems side: ask them if they have not changed anything in > effective_cache_size and shared_buffers parameters, I presume they haven= =E2=80=99t > change anything related to costs. Replying your comment, I think they tunned the server: effective_cache_size =3D 196GB shared_buffers =3D 24GB (this shouldn't be higher?) @Kevin Grittner sorry, but I'm not sure when the autovacuum is aggressive enough, but here my settings related: autovacuum |on autovacuum_analyze_scale_factor |0.05 autovacuum_analyze_threshold |10 autovacuum_freeze_max_age |200000000 autovacuum_max_workers |3 autovacuum_multixact_freeze_max_age |400000000 autovacuum_naptime |15s autovacuum_vacuum_cost_delay |10ms autovacuum_vacuum_cost_limit |-1 autovacuum_vacuum_scale_factor |0.1 autovacuum_vacuum_threshold |10 autovacuum_work_mem |-1 @Merlin Moncure > Big gains (if any) are likely due to indexing strategy. > I do see some suspicious casting, for example: > Join Filter: ((four_charlie.delta_tango)::integer =3D > (six_quebec.golf_bravo)::integer) > Are you casting in the query or joining through dissimilar data types? No casts in query. The joins are on same data types. Thank you all for the answers. Happy 2017! Fl=C3=A1vio Henrique -------------------------------------------------------- "There are only 10 types of people in the world: Those who understand binary, and those who don't" -------------------------------------------------------- On Thu, Jan 5, 2017 at 12:40 PM, Merlin Moncure wrote: > On Tue, Dec 27, 2016 at 5:50 PM, Fl=C3=A1vio Henrique > wrote: > > Hi there, fellow experts! > > > > I need an advice with query that became slower after 9.3 to 9.6 > migration. > > > > First of all, I'm from the dev team. > > > > Before migration, we (programmers) made some modifications on query bri= ng > > it's average time from 8s to 2-3s. > > > > As this query is the most executed on our system (it builds the user > panel > > to work), every bit that we can squeeze from it will be nice. > > > > Now, after server migration to 9.6 we're experiencing bad times with th= is > > query again. > > > > Unfortunately, I don't have the old query plain (9.3 version) to show > you, > > but in the actual version (9.6) I can see some buffers written that > tells me > > that something is wrong. > > > > Our server has 250GB of memory available, but the database team says th= at > > they can't do nothing to make this query better. I'm not sure, as some > > buffers are written on disk. > > > > Any tip/help will be much appreciated (even from the query side). > > > > Thank you! > > > > The query plan: https://explain.depesz.com/s/5KMn > > > > Note: I tried to add index on kilo_victor table already, but Postgresql > > still thinks that is better to do a seq scan. > > Hard to provide more without the query or the 'old' plan. Here are > some things you can try: > *) Set effective_io_concurrency high. You have some heap scanning > going on and this can sometimes help (but it should be marginal). > *) See if you can get any juice out of parallel query > *) try playing with enable_nestloop and enable_seqscan. these are > hail mary passes but worth a shot. > > Run the query back to back with same arguments in the same database > session. Does performance improve? > > Big gains (if any) are likely due to indexing strategy. > I do see some suspicious casting, for example: > > Join Filter: ((four_charlie.delta_tango)::integer =3D > (six_quebec.golf_bravo)::integer) > > Are you casting in the query or joining through dissimilar data types? > I suspect your database team might be incorrect. > > merlin > --001a114093a69cdef205455bb98b Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Hi all!
Sorry the delay (holidays).

Well, the most expensive sequencial scan was solved.
I ask= ed the db team to drop the index and recreate it and guess what: now postgr= esql is using it and the time dropped.
(thank you, @Gerardo Herzig!)
I think there's still room for improvement, but the p= roblem is not so crucial right now.
I'll try to investigate e= very help mentioned here. Thank you all.

@Daniel B= lanch
I'll make some tests w= ith a materialized view. Thank you.
On systems side: ask them if they have not= changed anything in effective_cache_size and shared_buffers parameters, I = presume they haven=E2=80=99t change anything related to costs.
=
Replying your comment, I think = they tunned the server:
e= ffective_cache_size =3D=C2=A0196GB<= /span>
shared_buffers =3D=C2= =A024GB (this shouldn't be higher?)

@= Kevin Grittner
sorry, but= I'm not sure when the autovacuum is aggressive enough, but here my set= tings related:
autov= acuum =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0|on =C2=A0 =C2=A0 =C2=A0 =C2=A0
autovacuum_analyze_scale_factor =C2=A0 =C2=A0 |= 0.05 =C2=A0 =C2=A0 =C2=A0
autovacuum_analyze_threshold =C2=A0 =C2=A0 =C2=A0 =C2=A0|10 =C2=A0 =C2=A0 = =C2=A0 =C2=A0
autovacuum_= freeze_max_age =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 |200000000=C2=A0
autovacuum_max_workers =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0|3 =C2=A0 =C2=A0 =C2=A0 =C2=A0=C2=A0<= /span>
autovacuum_multixact_free= ze_max_age |400000000=C2=A0
autovacuum_naptime =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0|15s =C2=A0 =C2=A0 =C2=A0=C2=A0
autovacuum_vacuum_cost_delay =C2=A0 =C2=A0 =C2=A0 =C2=A0|1= 0ms =C2=A0 =C2=A0 =C2=A0
= autovacuum_vacuum_cost_limit =C2=A0 =C2=A0 =C2=A0 =C2=A0|-1 =C2=A0 =C2=A0 = =C2=A0 =C2=A0
autovacuum_= vacuum_scale_factor =C2=A0 =C2=A0 =C2=A0|0.1 =C2=A0 =C2=A0 =C2=A0=C2=A0
autovacuum_vacuum_threshold = =C2=A0 =C2=A0 =C2=A0 =C2=A0 |10 =C2=A0 =C2=A0 =C2=A0 =C2=A0
autovacuum_work_mem =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 |-1 =C2=A0 =C2=A0 =C2=A0 =C2=A0<= /div>

@Merlin Moncure
Big gains= (if any) are likely due to indexing strategy.
I do see some suspicious casting, for example:
Join Filter: ((fou= r_charlie.delta_tango)::integer =3D
(six_quebec.golf_bravo)::integer)
Are you casting in the query or joining through= dissimilar data types?
No casts in query. The join= s are on same data types.=C2=A0

Thank you all for = the answers. Happy 2017!

Fl=C3=A1vio Henrique
------------------------------------------= --------------
"There are only 10 types of people in the world: Tho= se who understand binary, and those who don't"
----------------= ----------------------------------------

On Thu, Jan 5, 2017 at 12:40 PM, Merlin Monc= ure <mmoncure@gmail.com> wrote:
On Tue, Dec 27, 2016 at 5:50 PM, Fl=C3=A1vio Henriq= ue <yoshimit@gmail.com> wro= te:
> Hi there, fellow experts!
>
> I need an advice with query that became slower after 9.3 to 9.6 migrat= ion.
>
> First of all, I'm from the dev team.
>
> Before migration, we (programmers) made some modifications on query br= ing
> it's average time from 8s to 2-3s.
>
> As this query is the most executed on our system (it builds the user p= anel
> to work), every bit that we can squeeze from it will be nice.
>
> Now, after server migration to 9.6 we're experiencing bad times wi= th this
> query again.
>
> Unfortunately, I don't have the old query plain (9.3 version) to s= how you,
> but in the actual version (9.6) I can see some buffers written that te= lls me
> that something is wrong.
>
> Our server has 250GB of memory available, but the database team says t= hat
> they can't do nothing to make this query better. I'm not sure,= as some
> buffers are written on disk.
>
> Any tip/help will be much appreciated (even from the query side).
>
> Thank you!
>
> The query plan: https://explain.depesz.com/s/5KMn
>
> Note: I tried to add index on kilo_victor table already, but Postgresq= l
> still thinks that is better to do a seq scan.

Hard to provide more without the query or the 'old' plan.=C2= =A0 =C2=A0Here are
some things you can try:
*) Set effective_io_concurrency high.=C2=A0 =C2=A0 You have some heap scann= ing
going on and this can sometimes help (but it should be marginal).
*) See if you can get any juice out of parallel query
*) try playing with enable_nestloop and enable_seqscan.=C2=A0 =C2=A0these a= re
hail mary passes but worth a shot.

Run the query back to back with same arguments in the same database
session. Does performance improve?

Big gains (if any) are likely due to indexing strategy.
I do see some suspicious casting, for example:

Join Filter: ((four_charlie.delta_tango)::integer =3D
(six_quebec.golf_bravo)::integer)

Are you casting in the query or joining through dissimilar data types?
=C2=A0I suspect your database team might be incorrect.

merlin

--001a114093a69cdef205455bb98b--