MIME-Version: 1.0
In-Reply-To: 
 <CAMa1XUgPEUD0-r3RRoAk9H2fLKseYCLz2Rx_y3q6sGwHe3Y+qw@mail.gmail.com>
References: 
 <CAMa1XUi6pbugdB7zUA6weoPjmUHTzNLZcSEAh06-7FZRzkuHQw@mail.gmail.com>
 <CAOR=d=0YnvtGtF-WuWku03L=3tyLRXMCrAbuZBe7Whe2x3=Z9Q@mail.gmail.com>
 <CAMa1XUgPEUD0-r3RRoAk9H2fLKseYCLz2Rx_y3q6sGwHe3Y+qw@mail.gmail.com>
From: Scott Marlowe <scott.marlowe@gmail.com>
Date: Mon, 14 Aug 2017 14:58:06 -0600
Message-ID: 
 <CAOR=d=11i08qa6WbyYFLmB0FC_r8heEcEkPiDufuE7m1bPp7Jw@mail.gmail.com>
Subject: Re: Odd sudden performance degradation related to temp
 object churn
To: Jeremy Finzel <finzelj@gmail.com>
Cc: postgres performance list <pgsql-performance@postgresql.org>
Content-Type: text/plain; charset="UTF-8"
Precedence: bulk
Sender: pgsql-performance-owner@postgresql.org

On Mon, Aug 14, 2017 at 2:46 PM, Jeremy Finzel <finzelj@gmail.com> wrote:
> On Mon, Aug 14, 2017 at 3:01 PM, Scott Marlowe <scott.marlowe@gmail.com>
> wrote:
>>
>> On Mon, Aug 14, 2017 at 1:53 PM, Jeremy Finzel <finzelj@gmail.com> wrote:
>> >
>> > Any insights would be greatly appreciated, as we are concerned not
>> > knowing
>> > the root cause.
>>
>> How are your disks setup? One big drive with everything on it?
>> Separate disks for pg_xlog and pg's data dir and the OS logging? IO
>> contention is one of the big killers of db performance.
>
>
> It's one san volume ssd for the data and wal files.  But logging and memory
> spilling and archived xlogs go to a local ssd disk.
>
>>
>> Logging likely isn't your problem, but yeah you don't need to log
>> ERRYTHANG to see the problem either. Log long running queries temp
>> usage, buffer usage, query plans on slow queries, stuff like that.
>>
>> You've likely hit a "tipping point" in terms of data size. Either it's
>> cause the query planner to make a bad decision, or you're spilling to
>> disk a lot more than you used to.
>>
>> Be sure to log temporary stuff with log_temp_files = 0 in your
>> postgresql.conf and then look for temporary file in your logs. I bet
>> you've started spilling into the same place as your temp tables are
>> going, and by default that's your data directory. Adding another drive
>> and moving pgsql's temp table space to it might help.
>
>
> We would not have competition between disk spilling and temp tables because
> what I described above - they are going to two different places.  Also, I
> neglected to mention that we turned on auto-explain during this crisis, and
> found the query plan was good, it was just taking forever due to thrashing
> just seconds after we kicked off the batches.  I did NOT turn on log_analyze
> and timing but it was enough to see there was no apparent query plan
> regression.  Also, we had no change in the performance/plan after
> re-analyzing all tables.

You do know that temp tables go into the default temp table space,
just like sorts, right?

Have you used something like iostat to see which volume is getting all the IO?

>
>>
>> Also increasing work_mem (but don't go crazy, it's per sort, so can
>> multiply fast on a busy server)
>
>
> We are already up at 400MB, and this query was using memory in the low KB
> levels because it is very small (1 - 20 rows of data per temp table, and no
> expensive selects with missing indexes or anything).

Ahh so it doesn't sound like it's spilling to disk then. Do the logs
say yes or no on that?

Basically use unix tools to look for where you're thrashing. iotop can
be handy too.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance