Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dhLQx-0004KC-DH for pgsql-performance@arkaria.postgresql.org; Mon, 14 Aug 2017 19:53:55 +0000 Received: from localhost ([127.0.0.1] helo=postgresql.org) by malur.postgresql.org with smtp (Exim 4.84_2) (envelope-from ) id 1dhLQw-0001QQ-OB for pgsql-performance@arkaria.postgresql.org; Mon, 14 Aug 2017 19:53:54 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1dhLQw-0001Q2-2c for pgsql-performance@postgresql.org; Mon, 14 Aug 2017 19:53:54 +0000 Received: from mail-wm0-x233.google.com ([2a00:1450:400c:c09::233]) by magus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.84_2) (envelope-from ) id 1dhLQs-0007Uy-GL for pgsql-performance@postgresql.org; Mon, 14 Aug 2017 19:53:53 +0000 Received: by mail-wm0-x233.google.com with SMTP id m85so1055566wma.0 for ; Mon, 14 Aug 2017 12:53:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:from:date:message-id:subject:to; bh=UfpjCjAboPhsKxTz6hzk7OOC0HRpbO783MOm5lwj4zc=; b=cw2LPOUqWR6Ty9xlw0Njs0c9QObjyBDoYArg/W4OqJplCXUXH8Ehpp2IfVJgVGEnQ+ 6+dew8pTlhpexS1Wv56vYVANVUYZF9b/Kc+QdIT/YZItQ7tgjphwlOG3bh9bms5Yx+CT XgqZH0N+DbpEWS3TFNxMCfRZgirwwoQXuYvb9q30F/vtoZc3YgaRwmDT8P87sLfltRlq 2V3x4/2odvket9NsybS9dwP4qcgKV1ZL1LEKkfobIdV1fNXHNIsyFKu+z7a1CE6dud5P 7CUPozb2hwAF6GlnP0F7X7R0iNX4UjCLHDkOGyGWqf/wvGrV07LGNZCz0+OvWUn97gzV kO0A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=UfpjCjAboPhsKxTz6hzk7OOC0HRpbO783MOm5lwj4zc=; b=Yc6h2Bto5cExwglrsy66pNawK+M5eyrZv38ufctMSpJBRPjeiUQkpExpDddWAYJKyc RR1zH10O2QQb0Lk+Q+tXBFnIDyRZv5M+T0j9fBa63f8/QJjMJ0qXelomlCiah8JAX6vj 2r2D+sqNX//m5IDgzgb0/yVh6TNEa4J5TaaYoCirc6pH58WbRRHFGNmDnhjbFj4h9z02 qBS4GxhWOBMal/Bj50JaN+mKybeEsTxnuXYOKZrCL6XXiSy/jeuTDlwrZchjXZ1IV9TQ wuUUV+EySaRRk4YSZFbOGlazZGHCbPjI+wVIILJnlozgc3vjYam/OfZueFG2tI1YYIXi TXaw== X-Gm-Message-State: AHYfb5h5Qg6qT+jZoXmtuLBu4bnljGjCiQvioKEsoKeV3LdmCu9iIi3v bOuTOdtm2PnaHOXJovcZix0hB9q99+1T X-Received: by 10.28.131.193 with SMTP id f184mr43886wmd.117.1502740429349; Mon, 14 Aug 2017 12:53:49 -0700 (PDT) MIME-Version: 1.0 Received: by 10.223.130.113 with HTTP; Mon, 14 Aug 2017 12:53:48 -0700 (PDT) From: Jeremy Finzel Date: Mon, 14 Aug 2017 14:53:48 -0500 Message-ID: Subject: Odd sudden performance degradation related to temp object churn To: postgres performance list Content-Type: multipart/alternative; boundary="001a1144226e7b2a1e0556bc05c9" List-Archive: List-Help: List-ID: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Mailing-List: pgsql-performance Precedence: bulk Sender: pgsql-performance-owner@postgresql.org --001a1144226e7b2a1e0556bc05c9 Content-Type: text/plain; charset="UTF-8" This particular db is on 9.3.15. Recently we had a serious performance degradation related to a batch job that creates 4-5 temp tables and 5 indexes. It is a really badly written job but what really confuses us is that this job has been running for years with no issue remotely approaching this one. We are also using pgpool. The job would kick off with 20-30 of similar queries running at once. The thing normally takes only 30ms or so to run - it only operates on 1 customer at a time (yes, it's horribly written). All of a sudden the cluster started thrashing and performance seriously degraded. We tried a number of things with no success: - Analyzed the whole database - Turned off full logging - Turned off synchronous commit - Vacuumed several of the catalog tables - Checked if we had an abnormal high amount of traffic this time - we didn't - No abnormal disk/network issues (we would have seen much larger issues if that had been the case) - Tried turning down the number of app nodes running What ended up completely resolving the issue was converting the query to use ctes instead of temp tables. That means we avoided the disk writing and the catalog churn, and useless indexes. However, we are baffled as to why this could make such a big difference when we had no issue like this before, and we have seen no systematic performance degradation in our system. Any insights would be greatly appreciated, as we are concerned not knowing the root cause. Thanks, Jeremy --001a1144226e7b2a1e0556bc05c9 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
This particular db is on 9.3.15.=C2=A0 Recently we had a s= erious performance degradation related to a batch job that creates 4-5 temp= tables and 5 indexes.=C2=A0 It is a really badly written job but what real= ly confuses us is that this job has been running for years with no issue re= motely approaching this one.=C2=A0 We are also using pgpool.

=
The job would kick off with 20-30 of similar queries running at once.= =C2=A0 The thing normally takes only 30ms or so to run - it only operates o= n 1 customer at a time (yes, it's horribly written).=C2=A0 All of a sud= den the cluster started thrashing and performance seriously degraded.=C2=A0= We tried a number of things with no success:
  • Analyzed th= e whole database
  • Turned off full logging
  • Turned off synchro= nous commit
  • Vacuumed several of the catalog tables
  • Checked = if we had an abnormal high amount of traffic this time - we didn't
  • =
  • No abnormal disk/network issues (we would have seen much larger issues = if that had been the case)
  • Tried turning down the number of app nod= es running
What ended up completely resolving the issue was c= onverting the query to use ctes instead of temp tables.=C2=A0 That means we= avoided the disk writing and the catalog churn, and useless indexes.=C2=A0= However, we are baffled as to why this could make such a big difference wh= en we had no issue like this before, and we have seen no systematic perform= ance degradation in our system.

Any insights= would be greatly appreciated, as we are concerned not knowing the root cau= se.

Thanks,
Jeremy
--001a1144226e7b2a1e0556bc05c9--