Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1t3LcL-00C3d5-B3 for pgsql-general@arkaria.postgresql.org; Tue, 22 Oct 2024 20:36:37 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1t3LcI-0037LE-PN for pgsql-general@arkaria.postgresql.org; Tue, 22 Oct 2024 20:36:35 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1t3LcI-0037L5-A4 for pgsql-general@lists.postgresql.org; Tue, 22 Oct 2024 20:36:34 +0000 Received: from mail-oo1-xc30.google.com ([2607:f8b0:4864:20::c30]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1t3LcF-002LDe-Qk for pgsql-general@lists.postgresql.org; Tue, 22 Oct 2024 20:36:33 +0000 Received: by mail-oo1-xc30.google.com with SMTP id 006d021491bc7-5eb5f07410aso3094620eaf.0 for ; Tue, 22 Oct 2024 13:36:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1729629391; x=1730234191; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=GEaoHsdP5Fvwve94iURjHqlSC+j9nrAJmBUjWw0AlTA=; b=DFgP804Ww6IMfmBQENVssHPDnMGRW3EzlWdbvJEnUBBwoOYy6zN+t66Int8EiX0EKH H3o/5WTWHhsaYlgpsfmJLqY6M9H4Uw0a/4Cv66K9cR91LBGhqvgyxchzWAPdMq4wcc3O Jr9tW2vhgzoFOs2+UjNWm47cIbSCTNUZMfi9/5tcpPCGX6i5ndsh/YwmlH9mLNnfY+Po zUMwJg3BeiCTvYZiH4IVpcA6F7Q+PMafMVVZyIXbuHoI/ZFvUIcI9bFXspJ+N1tTts43 Bfpn2wokOHeO9cl/ZoPLKhg1ShaGsMYeNmbU3Fv3UhPMEGpoRTcm0w8XHrMms8JMMcg8 7tMQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1729629391; x=1730234191; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=GEaoHsdP5Fvwve94iURjHqlSC+j9nrAJmBUjWw0AlTA=; b=k0NPAeGmm92a7zgslBREPpQVo82gtG4Rx+duwAl7hfvrvSvA2WkuMZuWsiDutSUaii Gy0ug8Qd4ZGjk0r/Odt3SHCVTns+9DyL5sVPEH5fdoRMuD9Kl3GlnLAWB2O0tvE+Ugc+ 51TqyS6JXBFSBYQBUsddJwNslEfABUDR5MF4yEvv2p530d+vdDiymwwR0KE/HA/89caE QsQT0qCehSD9VRXfwYjoRVipPSm4LNwk0rjNzcopGn79CtfOrr0PMijT8udn4Z7MbO9n VhmbQLL6F8KiJ49HnsIB/A2/Mwt5B610fs/YHLKoWmJHXBxIsRCNDA/0aGFE35glDyfd 2d+Q== X-Gm-Message-State: AOJu0YzZICeUu21MNs2q69hH0e6tFYCP5o0jW0qjC2Ukb/RIa2yj/JB/ quxFFdMghv//Lu3jmWrFa/YsENLHzEImdAe3iF8ufqUi1VUo5w2SjR0FWtqKxE/Q8sU6i8VaLnB 1cRb5PE+SB/Zm4cUaHY2P5mzTtsM= X-Google-Smtp-Source: AGHT+IFg50d/oj8XYd1PIhdCcy0iNfZSWyuo09LpdrQjJer6ppOsfklRSvwAwyD9xtVzJ+L58+fVZnEXMz2KFGFShXU= X-Received: by 2002:a05:6870:700b:b0:288:9adc:3670 with SMTP id 586e51a60fabf-28ccb9e56b0mr467563fac.29.1729629391009; Tue, 22 Oct 2024 13:36:31 -0700 (PDT) MIME-Version: 1.0 References: <20241016213610.wbfqj3r7gg5ba6eh@hjp.at> In-Reply-To: From: yudhi s Date: Wed, 23 Oct 2024 02:06:18 +0530 Message-ID: Subject: Re: Query performance issue To: Greg Sabino Mullane , jross@openvistas.net Cc: pgsql-general@lists.postgresql.org, hjp-pgsql@hjp.at Content-Type: multipart/alternative; boundary="00000000000071c381062516b910" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000071c381062516b910 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Wed, Oct 23, 2024 at 12:32=E2=80=AFAM Greg Sabino Mullane wrote: > To be frank, there is so much wrong with this query that it is hard to > know where to start. But a few top items: > > * Make sure all of the tables involved have been analyzed. You might want > to bump default_statistics_target up and see if that helps. > > * As mentioned already, increase work_mem, as you have things spilling to > disk (e.g. external merge Disk: 36280kB) > > * Don't use the "FROM table1, table2, table3" syntax but use "FROM table1 > JOIN table2 ON (...) JOIN table3 ON (...) > > * Try not to use subselects. Things like WHERE x IN (SELECT ...) are > expensive and hard to optimize. > > * You have useless GROUP BY clauses in there. Remove to simplify the quer= y > > * There is no LIMIT. Does the client really need all 135,214 rows? > > > I tried running the query by removing both the "group by" from the inner subqueries (I think the initial thought was that they will give distinct records to the outer query and will thus help), and added limit 500 at the last and also set the work_mem to 2GB for that session before running the query. But seeing the response increased to ~5 seconds (from ~3.1 seconds earlier). Below I have updated the execution plan for the same at the last section. https://gist.github.com/databasetech0073/746353a9e76d5e29b2fc6abdc80cdef8 Again , not able to clearly understand the third point you said below. Can you please clarify a bit more. Do you mean we should write it as exists /not exists rather IN and NOT IN and that will improve the performance? I hope the third point doesn't matter much as we have all equijoin used here. Correct me if I'm wrong. *"Try not to use subselects. Things like WHERE x IN (SELECT ...) are expensive and hard to optimize*." Additionally in the plan which mysql makes and showing the highest response time, is it suffering because of differences of the speed of the underlying IO/storage or is it just because of the optimization features which are available in postgres and not there in mysql ? Trying to understand if it can be identified from the execution plan itself. --00000000000071c381062516b910 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


=
On Wed, Oct 23, 2024 at 12:32=E2=80= =AFAM Greg Sabino Mullane <htamfid= s@gmail.com> wrote:
To be frank, there is so much wrong with this q= uery that it is hard to know where to start. But a few top items:

* = Make sure all of the tables involved have been analyzed. You might want to = bump default_statistics_target up and see if that helps.

* As mentio= ned already, increase work_mem, as you have things spilling to disk (e.g. e= xternal merge Disk: 36280kB)

* Don't use the "FROM table1, = table2, table3" syntax but use "FROM table1 JOIN table2 ON (...) = JOIN table3 ON (...)

* Try not to use subselects. Things like WHERE = x IN (SELECT ...) are expensive and hard to optimize.

* You have us= eless GROUP BY clauses in there. Remove to simplify the query

* Ther= e is no LIMIT. Does the client really need all 135,214 rows?



I tried running the query by removing both t= he "group by" from the inner subqueries (I think the initial thou= ght was that they will give distinct records to the outer query and will th= us help), and added limit 500 at the last and also set the work_mem to 2GB = for that session before running the query. But seeing the response increase= d to ~5 seconds (from ~3.1 seconds earlier). Below I have updated the execu= tion plan for the same at the last section.

https://gist.github.com/da= tabasetech0073/746353a9e76d5e29b2fc6abdc80cdef8

Again , not able= to clearly understand the third point you said below. Can you please clari= fy a bit more. Do you mean we should write it as exists /not exists rather = IN and NOT IN and that will improve the performance? I hope the third point= doesn't=C2=A0matter much as we have all equijoin used here. Correct me= if I'm wrong.

"Try not to use subselects. Things l= ike WHERE x IN (SELECT ...) are expensive and hard to optimize."= =C2=A0

Additionally in the plan which mysql makes = and showing=C2=A0the highest response time, is it suffering because of diff= erences of the speed of the underlying IO/storage or is it just because of = the optimization features which are available in postgres and not there in = mysql ? Trying to understand if it can be identified from the execution pla= n itself.
--00000000000071c381062516b910--