Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dhMRC-0008GW-9F for pgsql-performance@arkaria.postgresql.org; Mon, 14 Aug 2017 20:58:14 +0000 Received: from localhost ([127.0.0.1] helo=postgresql.org) by malur.postgresql.org with smtp (Exim 4.84_2) (envelope-from ) id 1dhMRB-0007Cw-Ng for pgsql-performance@arkaria.postgresql.org; Mon, 14 Aug 2017 20:58:13 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1dhMRA-0007Bu-M8 for pgsql-performance@postgresql.org; Mon, 14 Aug 2017 20:58:12 +0000 Received: from mail-qt0-x232.google.com ([2607:f8b0:400d:c0d::232]) by magus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.84_2) (envelope-from ) id 1dhMR6-0000XK-Ku for pgsql-performance@postgresql.org; Mon, 14 Aug 2017 20:58:11 +0000 Received: by mail-qt0-x232.google.com with SMTP id v29so58490386qtv.3 for ; Mon, 14 Aug 2017 13:58:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=CEcBc3TbplrXbF6UyUTaZksm6IajIscUO3uwHNGyKWc=; b=AwW/rhlGhazaLSGR9hsQx4M0yukQwh/F1QituBla+m4eDfJUHsB3cT3J6Fc/z7K88a moYRVlsOp5SPO828rzPU8NCQzh4kS8lX5yiOf6e0Vgq1tXzCu40SPzwibtWeenLIPcz+ i0I7URqRgeL0Po/s7+YhTEkthvuu18CQmTm1hrgz4Zw1b6RVrdGWoax+C2XVAFVCMbjh 9lIglyAQqykwOB3PcG/kNFVtdUpO6qaygXKp2srjPb0tKD+vPbG9q23OHmag0kBuKMBK Y4j+glG1ZRnV7aSW8ShTBtBPOU3Zle/1e9eeF7yxHGpJCnP0YCij1Oo/quYN4IMyH82d fJLQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=CEcBc3TbplrXbF6UyUTaZksm6IajIscUO3uwHNGyKWc=; b=tSggf5uPqAyfsR0nKp24jkH1w720vnNQ4rESh0CpXxOqpNQECn3b9l41801sD4FjSk Z0Zqt1O7W+GSRKoc1UdHqRQ+BjpUVw2nmueJtUaoI53azm2WK3BFPC72pG7BtrfRFDlM AruiD6lJSAGRoP4BkaYjPIMBanqYDck3h0ZwKXvQIPslZFBO4tfIcBQEZCUdmmtSE5uR YXBrml0WPxJVlszXE9a7bqr2TguX2mk+bdnxjvQR0zf30+7+niN9RaYc5ad4Zm+akAM+ iW/LS0x6cXExPHmJLpfn2wLcX8j17ukFQwW4vSaH25QcTdCsl5mtygoS7njm+A2jl+oS MsXA== X-Gm-Message-State: AHYfb5g28cV+mEQ0Hwq4FZmBPMBr3eXcSNzFC/xkgZQ43JcoqkeWVOZC GVnOSMRKFQOmcFFEBucHU3OMDzaCpQ== X-Received: by 10.237.46.38 with SMTP id j35mr36552558qtd.285.1502744286757; Mon, 14 Aug 2017 13:58:06 -0700 (PDT) MIME-Version: 1.0 Received: by 10.140.105.164 with HTTP; Mon, 14 Aug 2017 13:58:06 -0700 (PDT) In-Reply-To: References: From: Scott Marlowe Date: Mon, 14 Aug 2017 14:58:06 -0600 Message-ID: Subject: Re: Odd sudden performance degradation related to temp object churn To: Jeremy Finzel Cc: postgres performance list Content-Type: text/plain; charset="UTF-8" List-Archive: List-Help: List-ID: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Mailing-List: pgsql-performance Precedence: bulk Sender: pgsql-performance-owner@postgresql.org On Mon, Aug 14, 2017 at 2:46 PM, Jeremy Finzel wrote: > On Mon, Aug 14, 2017 at 3:01 PM, Scott Marlowe > wrote: >> >> On Mon, Aug 14, 2017 at 1:53 PM, Jeremy Finzel wrote: >> > >> > Any insights would be greatly appreciated, as we are concerned not >> > knowing >> > the root cause. >> >> How are your disks setup? One big drive with everything on it? >> Separate disks for pg_xlog and pg's data dir and the OS logging? IO >> contention is one of the big killers of db performance. > > > It's one san volume ssd for the data and wal files. But logging and memory > spilling and archived xlogs go to a local ssd disk. > >> >> Logging likely isn't your problem, but yeah you don't need to log >> ERRYTHANG to see the problem either. Log long running queries temp >> usage, buffer usage, query plans on slow queries, stuff like that. >> >> You've likely hit a "tipping point" in terms of data size. Either it's >> cause the query planner to make a bad decision, or you're spilling to >> disk a lot more than you used to. >> >> Be sure to log temporary stuff with log_temp_files = 0 in your >> postgresql.conf and then look for temporary file in your logs. I bet >> you've started spilling into the same place as your temp tables are >> going, and by default that's your data directory. Adding another drive >> and moving pgsql's temp table space to it might help. > > > We would not have competition between disk spilling and temp tables because > what I described above - they are going to two different places. Also, I > neglected to mention that we turned on auto-explain during this crisis, and > found the query plan was good, it was just taking forever due to thrashing > just seconds after we kicked off the batches. I did NOT turn on log_analyze > and timing but it was enough to see there was no apparent query plan > regression. Also, we had no change in the performance/plan after > re-analyzing all tables. You do know that temp tables go into the default temp table space, just like sorts, right? Have you used something like iostat to see which volume is getting all the IO? > >> >> Also increasing work_mem (but don't go crazy, it's per sort, so can >> multiply fast on a busy server) > > > We are already up at 400MB, and this query was using memory in the low KB > levels because it is very small (1 - 20 rows of data per temp table, and no > expensive selects with missing indexes or anything). Ahh so it doesn't sound like it's spilling to disk then. Do the logs say yes or no on that? Basically use unix tools to look for where you're thrashing. iotop can be handy too. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance