Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cPCDo-0007CP-Nk for pgsql-performance@arkaria.postgresql.org; Thu, 05 Jan 2017 17:53:04 +0000 Received: from localhost ([127.0.0.1] helo=postgresql.org) by malur.postgresql.org with smtp (Exim 4.84_2) (envelope-from ) id 1cPCDo-0003Pq-91 for pgsql-performance@arkaria.postgresql.org; Thu, 05 Jan 2017 17:53:04 +0000 Received: from makus.postgresql.org ([2001:4800:1501:1::229]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1cPCC6-0006mM-2I for pgsql-performance@postgresql.org; Thu, 05 Jan 2017 17:51:18 +0000 Received: from mail-wm0-x22c.google.com ([2a00:1450:400c:c09::22c]) by makus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.84_2) (envelope-from ) id 1cPCC2-0001MF-9Z for pgsql-performance@postgresql.org; Thu, 05 Jan 2017 17:51:16 +0000 Received: by mail-wm0-x22c.google.com with SMTP id c85so255851508wmi.1 for ; Thu, 05 Jan 2017 09:51:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:message-id:mime-version:subject:date:in-reply-to:cc:to :references; bh=swm2g0tYB/g+L+gVCNKIo9M8xlgYa2m306O9hNFHqhU=; b=FIPtcJ1651Oj7r7jnwvPvX44tFXLRi+xjoSiBhDD5RJcB4wG7MJsTeKzY/pRhGjYi0 BCQQdGj4i4CivNHMqnb8fsxgwGytp9p3KuhPfStrQU5KjCyQXdjQ9AT9Z1j2kNoT3i9b 4Z4+0Wg2/cqN4ohjXnqT+h5C5A6qV+Z/pbEFIxhccRJks9NUzfe3S1kLUfwpvchb13kC D3OKRDY0HG6c1gr1CGZWRPX7zXsIhgGgaO7Asb1/H+jsjCzV55Yy95aFEdmzyNzchf1H 1QMppSRI+zfMg0BATrX/o/1AOHkZXXHWFIYcXNdHQoIsw8kSuFTQZ6k0RLK/gaadYHaU EjVQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:message-id:mime-version:subject:date :in-reply-to:cc:to:references; bh=swm2g0tYB/g+L+gVCNKIo9M8xlgYa2m306O9hNFHqhU=; b=etWaA21oGS5BbEpZliaSHxoGMjLGep4zZUVisX9SeiU5QMCFxz8YLmR1VU2oyyZrD8 gUWMzM4yg+07PVGXUEhIFWxY68j5Mjr5R07J36fTBnjlpMmc5zVsrOYPmwCIGv1e46lH BSSj9jeakFzrtsRNMKyGK6yuuQjH+trZr5K0QWu4Nyn0LWYQxVvQDhWYSViZYpdXuVjz au0XOwsmlfXkp7FtaJnCoEr85ZxxI6frpLQcC0LuiYwt+eyANG7415pqPe1itDfGPVdC hZpByqMfhLTKRR3TutjvlTyMWaqSvfM+1PUoVaitWZRTQRPEQ/CdDMEr55cAxiFLnWbO DDag== X-Gm-Message-State: AIkVDXLDUHnrNWOc9ilWpDKUfeIRnD7lkZG4kDUiVAR+wNsELWLF2hh8GW3q54IsY6u15w== X-Received: by 10.223.143.55 with SMTP id p52mr133311wrb.64.1483638672736; Thu, 05 Jan 2017 09:51:12 -0800 (PST) Received: from [192.168.1.117] ([90.77.173.136]) by smtp.gmail.com with ESMTPSA id v3sm105147508wjp.13.2017.01.05.09.51.11 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 05 Jan 2017 09:51:12 -0800 (PST) From: Daniel Blanch Bataller Message-Id: Content-Type: multipart/alternative; boundary="Apple-Mail=_2F23939A-23CF-4AA1-BD68-583E552BE862" Mime-Version: 1.0 (Mac OS X Mail 10.2 \(3259\)) Subject: Re: Slow query after 9.3 to 9.6 migration Date: Thu, 5 Jan 2017 18:51:10 +0100 In-Reply-To: Cc: postgres performance list To: =?utf-8?Q?Fl=C3=A1vio_Henrique?= References: X-Mailer: Apple Mail (2.3259) X-Pg-Spam-Score: -1.7 (-) List-Archive: List-Help: List-ID: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Mailing-List: pgsql-performance Precedence: bulk Sender: pgsql-performance-owner@postgresql.org --Apple-Mail=_2F23939A-23CF-4AA1-BD68-583E552BE862 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 Hi, If just recreating the index now it uses it, it might mean that the = index was bloated, that is, it grew so big that it was cheaper a seq = scan. I=E2=80=99ve seen another case recently where postgres 9.6 wasn=E2=80=99t = using the right index in a query, I was able to reproduce the issue = crafting index bigger, much bigger than it should be.=20 Can you record index size as it is now? Keep this info, and If problem = happens again check indexes size, and see if they have grow too much. i.e. SELECT relname, relpages, reltuples FROM pg_class WHERE relname =3D = =E2=80=98index_name' This might help to see if this is the problem, that indexes are growing = too much for some reason. Regards. P.S the other parameters don't seem to be the cause of the problem to = me. > El 5 ene 2017, a las 17:51, Fl=C3=A1vio Henrique = escribi=C3=B3: >=20 > Hi all! > Sorry the delay (holidays). >=20 > Well, the most expensive sequencial scan was solved. > I asked the db team to drop the index and recreate it and guess what: = now postgresql is using it and the time dropped. > (thank you, @Gerardo Herzig!) >=20 > I think there's still room for improvement, but the problem is not so = crucial right now. > I'll try to investigate every help mentioned here. Thank you all. >=20 > @Daniel Blanch > I'll make some tests with a materialized view. Thank you. > On systems side: ask them if they have not changed anything in = effective_cache_size and shared_buffers parameters, I presume they = haven=E2=80=99t change anything related to costs. > Replying your comment, I think they tunned the server: > effective_cache_size =3D 196GB > shared_buffers =3D 24GB (this shouldn't be higher?) >=20 > @Kevin Grittner > sorry, but I'm not sure when the autovacuum is aggressive enough, but = here my settings related: > autovacuum |on =20 > autovacuum_analyze_scale_factor |0.05 =20 > autovacuum_analyze_threshold |10 =20 > autovacuum_freeze_max_age |200000000=20 > autovacuum_max_workers |3 =20 > autovacuum_multixact_freeze_max_age |400000000=20 > autovacuum_naptime |15s =20 > autovacuum_vacuum_cost_delay |10ms =20 > autovacuum_vacuum_cost_limit |-1 =20 > autovacuum_vacuum_scale_factor |0.1 =20 > autovacuum_vacuum_threshold |10 =20 > autovacuum_work_mem |-1 =20 >=20 > @Merlin Moncure > Big gains (if any) are likely due to indexing strategy. > I do see some suspicious casting, for example: > Join Filter: ((four_charlie.delta_tango)::integer =3D > (six_quebec.golf_bravo)::integer) > Are you casting in the query or joining through dissimilar data types? > No casts in query. The joins are on same data types.=20 >=20 > Thank you all for the answers. Happy 2017! >=20 > Fl=C3=A1vio Henrique > -------------------------------------------------------- > "There are only 10 types of people in the world: Those who understand = binary, and those who don't" > -------------------------------------------------------- >=20 > On Thu, Jan 5, 2017 at 12:40 PM, Merlin Moncure > wrote: > On Tue, Dec 27, 2016 at 5:50 PM, Fl=C3=A1vio Henrique = > wrote: > > Hi there, fellow experts! > > > > I need an advice with query that became slower after 9.3 to 9.6 = migration. > > > > First of all, I'm from the dev team. > > > > Before migration, we (programmers) made some modifications on query = bring > > it's average time from 8s to 2-3s. > > > > As this query is the most executed on our system (it builds the user = panel > > to work), every bit that we can squeeze from it will be nice. > > > > Now, after server migration to 9.6 we're experiencing bad times with = this > > query again. > > > > Unfortunately, I don't have the old query plain (9.3 version) to = show you, > > but in the actual version (9.6) I can see some buffers written that = tells me > > that something is wrong. > > > > Our server has 250GB of memory available, but the database team says = that > > they can't do nothing to make this query better. I'm not sure, as = some > > buffers are written on disk. > > > > Any tip/help will be much appreciated (even from the query side). > > > > Thank you! > > > > The query plan: https://explain.depesz.com/s/5KMn = > > > > Note: I tried to add index on kilo_victor table already, but = Postgresql > > still thinks that is better to do a seq scan. >=20 > Hard to provide more without the query or the 'old' plan. Here are > some things you can try: > *) Set effective_io_concurrency high. You have some heap scanning > going on and this can sometimes help (but it should be marginal). > *) See if you can get any juice out of parallel query > *) try playing with enable_nestloop and enable_seqscan. these are > hail mary passes but worth a shot. >=20 > Run the query back to back with same arguments in the same database > session. Does performance improve? >=20 > Big gains (if any) are likely due to indexing strategy. > I do see some suspicious casting, for example: >=20 > Join Filter: ((four_charlie.delta_tango)::integer =3D > (six_quebec.golf_bravo)::integer) >=20 > Are you casting in the query or joining through dissimilar data types? > I suspect your database team might be incorrect. >=20 > merlin >=20 --Apple-Mail=_2F23939A-23CF-4AA1-BD68-583E552BE862 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=utf-8
Hi,

If just recreating the index now it = uses it, it might mean that the index was bloated, that is, it grew so = big that it was cheaper a seq scan.

I=E2=80=99ve seen another case recently = where postgres 9.6 wasn=E2=80=99t using the right index in a query, I = was able to reproduce the issue crafting index bigger, much bigger than = it should be. 

Can you record index size as it is now? Keep this info, and = If problem happens again check indexes size, and see if they have grow = too much.

i.e. = SELECT relname, relpages, reltuples FROM pg_class WHERE relname =3D = =E2=80=98index_name'

This might help to see if this is the problem, that indexes = are growing too much for some reason.

Regards.

P.S the other parameters don't seem to = be the cause of the problem to me.

El 5 ene 2017, a las 17:51, = Fl=C3=A1vio Henrique <yoshimit@gmail.com> escribi=C3=B3:

Hi all!
Sorry the delay (holidays).

Well, the most expensive = sequencial scan was solved.
I asked the db team to = drop the index and recreate it and guess what: now postgresql is using = it and the time dropped.
(thank you, @Gerardo Herzig!)

I think there's still = room for improvement, but the problem is not so crucial right = now.
I'll try to investigate every help mentioned = here. Thank you all.

@Daniel Blanch
I'll make some tests with a = materialized view. Thank you.
On = systems side: ask them if they have not changed anything in = effective_cache_size and shared_buffers parameters, I presume they = haven=E2=80=99t change anything related to costs.
Replying your = comment, I think they tunned the server:
effective_cache_size = =3D 196GB
shared_buffers =3D 24GB (this = shouldn't be higher?)

@Kevin = Grittner
sorry, but I'm not sure when the autovacuum is aggressive = enough, but here my settings related:
autovacuum   =                     =    |on        
autovacuum_analyze_scale_factor     |0.05   =    
autovacuum_analyze_threshold        |10 =        
autovacuum_freeze_max_age   =         |200000000 
autovacuum_max_workers           =    |3         
autovacuum_multixact_freeze_max_age = |400000000 
autovacuum_naptime     =              |15s     =   
autovacuum_vacuum_cost_delay        |10ms =      
autovacuum_vacuum_cost_limit =        |-1       =  
autovacuum_vacuum_scale_factor      |0.1 =       
autovacuum_vacuum_threshold   =       |10        
autovacuum_work_mem=                 |-1   =      

@Merlin = Moncure
Big gains (if any) are likely due to indexing strategy.
I do see = some suspicious casting, for example:
Join Filter: = ((four_charlie.delta_tango)::integer =3D
(six_quebec.golf_bravo)::integer)Are you casting in the query or joining through dissimilar = data types?
No casts in query. The = joins are on same data types. 

Thank you all for the answers. Happy = 2017!

Fl=C3=A1vio = Henrique
--------------------------------------------------------
"There are only 10 types of people in the world: Those who = understand binary, and those who don't"
--------------------------------------------------------
<= /div>
On Thu, Jan 5, 2017 at 12:40 = PM, Merlin Moncure <mmoncure@gmail.com> wrote:
On = Tue, Dec 27, 2016 at 5:50 PM, Fl=C3=A1vio Henrique <yoshimit@gmail.com> = wrote:
> Hi there, fellow experts!
>
> I need an advice with query that became slower after 9.3 to 9.6 = migration.
>
> First of all, I'm from the dev team.
>
> Before migration, we (programmers) made some modifications on query = bring
> it's average time from 8s to 2-3s.
>
> As this query is the most executed on our system (it builds the = user panel
> to work), every bit that we can squeeze from it will be nice.
>
> Now, after server migration to 9.6 we're experiencing bad times = with this
> query again.
>
> Unfortunately, I don't have the old query plain (9.3 version) to = show you,
> but in the actual version (9.6) I can see some buffers written that = tells me
> that something is wrong.
>
> Our server has 250GB of memory available, but the database team = says that
> they can't do nothing to make this query better. I'm not sure, as = some
> buffers are written on disk.
>
> Any tip/help will be much appreciated (even from the query = side).
>
> Thank you!
>
> The query plan: https://explain.depesz.com/s/5KMn
>
> Note: I tried to add index on kilo_victor table already, but = Postgresql
> still thinks that is better to do a seq scan.

Hard to provide more without the query or the 'old' plan.  =  Here are
some things you can try:
*) Set effective_io_concurrency high.    You have some heap = scanning
going on and this can sometimes help (but it should be marginal).
*) See if you can get any juice out of parallel query
*) try playing with enable_nestloop and enable_seqscan.  =  these are
hail mary passes but worth a shot.

Run the query back to back with same arguments in the same database
session. Does performance improve?

Big gains (if any) are likely due to indexing strategy.
I do see some suspicious casting, for example:

Join Filter: ((four_charlie.delta_tango)::integer =3D
(six_quebec.golf_bravo)::integer)

Are you casting in the query or joining through dissimilar data = types?
 I suspect your database team might be incorrect.

merlin


= --Apple-Mail=_2F23939A-23CF-4AA1-BD68-583E552BE862--