Re: Very poor read performance, query independent

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Igor Neyman <[email protected]>
To: Charles Nadeau <[email protected]>
Cc: Jeff Janes <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: Very poor read performance, query independent
Date: Fri, 14 Jul 2017 19:13:28 +0000
Message-ID: <CY1PR07MB2380D939241FBF466048DA7FDAAD0@CY1PR07MB2380.namprd07.prod.outlook.com> (raw)
In-Reply-To: <CADFyZw5pOiXpo_4zt2LeGneNuB91XHeOhL=9SzOQ36gWTX9rxA@mail.gmail.com>
References: <CADFyZw7aGoD0AaStxdyHByR5Qta=M5wx0v=iptKLhPUp+EOKvA@mail.gmail.com>
	<CAMkU=1z92pimpoDV1oTpd4AvSHS0CrLzQDFSxw1Z5cpmmjEOkw@mail.gmail.com>
	<CADFyZw7Am=mv-6RXOr_C2oxEhPyKjnbvFgaEPq+-aKw+j9oDrw@mail.gmail.com>
	<CAMkU=1zD=SNFF=U-HtRHcu2dAdqyorD6o_UCQoYHuoh-gcs_uA@mail.gmail.com>
	<CADFyZw4RTBtWUeo-6Q_GcsqawkF+gSe4JGEpASUE6msiu=01xA@mail.gmail.com>
	<DM5PR07MB28101CDEC281CFA491B455E5DAAF0@DM5PR07MB2810.namprd07.prod.outlook.com>
	<DM5PR07MB2810C1580B17D1BF15B15AE5DAAF0@DM5PR07MB2810.namprd07.prod.outlook.com>
	<CADFyZw5pOiXpo_4zt2LeGneNuB91XHeOhL=9SzOQ36gWTX9rxA@mail.gmail.com>
List-Unsubscribe:  <mailto:[email protected]?body=unsub%20pgsql-performance>

From: Charles Nadeau [mailto:[email protected]]
Sent: Friday, July 14, 2017 11:35 AM
To: Igor Neyman <[email protected]>
Cc: Jeff Janes <[email protected]>; [email protected]
Subject: Re: [PERFORM] Very poor read performance, query independent

Igor,

Initially temp_buffer was left to its default value (8MB). Watching the content of the directory that stores the temporary files, I found that I need at most 21GB of temporary files space. Should I set temp_buffer to 21GB?
Here is the explain you requested with work_mem set to 6GB:
flows=# set work_mem='6GB';
SET
flows=# explain (analyze, buffers) SELECT DISTINCT
   srcaddr,
   dstaddr,
   dstport,
   COUNT(*) AS conversation,
   SUM(doctets) / 1024 / 1024 AS mbytes
FROM
   flowscompact,
   mynetworks
WHERE
   mynetworks.ipaddr >>= flowscompact.srcaddr
   AND dstaddr IN
   (
      SELECT
         dstaddr
      FROM
         dstexterne
   )
GROUP BY
   srcaddr,
   dstaddr,
   dstport
ORDER BY
   mbytes DESC LIMIT 50;
                                                                                           QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=48135680.07..48135680.22 rows=50 width=52) (actual time=2227678.196..2227678.223 rows=50 loops=1)
   Buffers: shared hit=728798038 read=82974833, temp read=381154 written=381154
   ->  Unique  (cost=48135680.07..48143613.62 rows=2644514 width=52) (actual time=2227678.194..2227678.217 rows=50 loops=1)
         Buffers: shared hit=728798038 read=82974833, temp read=381154 written=381154
         ->  Sort  (cost=48135680.07..48137002.33 rows=2644514 width=52) (actual time=2227678.192..2227678.202 rows=50 loops=1)
               Sort Key: (((sum(flows.doctets) / '1024'::numeric) / '1024'::numeric)) DESC, flows.srcaddr, flows.dstaddr, flows.dstport, (count(*))
               Sort Method: quicksort  Memory: 654395kB
               Buffers: shared hit=728798038 read=82974833, temp read=381154 written=381154
               ->  GroupAggregate  (cost=48059426.65..48079260.50 rows=2644514 width=52) (actual time=2167909.030..2211446.192 rows=5859671 loops=1)
                     Group Key: flows.srcaddr, flows.dstaddr, flows.dstport
                     Buffers: shared hit=728798038 read=82974833, temp read=381154 written=381154
                     ->  Sort  (cost=48059426.65..48060748.90 rows=2644514 width=20) (actual time=2167896.815..2189107.205 rows=91745640 loops=1)
                           Sort Key: flows.srcaddr, flows.dstaddr, flows.dstport
                           Sort Method: external merge  Disk: 3049216kB
                           Buffers: shared hit=728798038 read=82974833, temp read=381154 written=381154
                           ->  Gather  (cost=30060688.07..48003007.07 rows=2644514 width=20) (actual time=1268989.000..1991357.232 rows=91745640 loops=1)
                                 Workers Planned: 12
                                 Workers Launched: 12
                                 Buffers: shared hit=728798037 read=82974833
                                 ->  Hash Semi Join  (cost=30059688.07..47951761.31 rows=220376 width=20) (actual time=1268845.181..2007864.725 rows=7057357 loops=13)
                                       Hash Cond: (flows.dstaddr = flows_1.dstaddr)
                                       Buffers: shared hit=728795193 read=82974833
                                       ->  Nested Loop  (cost=0.03..17891246.86 rows=220376 width=20) (actual time=0.207..723790.283 rows=37910370 loops=13)
                                             Buffers: shared hit=590692229 read=14991777
                                             ->  Parallel Seq Scan on flows  (cost=0.00..16018049.14 rows=55094048 width=20) (actual time=0.152..566179.117 rows=45371630 loops=13)
                                                   Buffers: shared hit=860990 read=14991777
                                             ->  Index Only Scan using mynetworks_ipaddr_idx on mynetworks  (cost=0.03..0.03 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=589831190)
                                                   Index Cond: (ipaddr >>= (flows.srcaddr)::ip4r)
                                                   Heap Fetches: 0
                                                   Buffers: shared hit=589831203
                                       ->  Hash  (cost=30059641.47..30059641.47 rows=13305 width=4) (actual time=1268811.101..1268811.101 rows=3803508 loops=13)
                                             Buckets: 4194304 (originally 16384)  Batches: 1 (originally 1)  Memory Usage: 166486kB
                                             Buffers: shared hit=138102964 read=67983056
                                             ->  HashAggregate  (cost=30059561.64..30059601.56 rows=13305 width=4) (actual time=1265248.165..1267432.083 rows=3803508 loops=13)
                                                   Group Key: flows_1.dstaddr
                                                   Buffers: shared hit=138102964 read=67983056
                                                   ->  Nested Loop Anti Join  (cost=0.00..29729327.92 rows=660467447 width=4) (actual time=0.389..1201072.707 rows=125838232 loops=13)
                                                         Join Filter: (mynetworks_1.ipaddr >> (flows_1.dstaddr)::ip4r)
                                                         Rows Removed by Join Filter: 503353617
                                                         Buffers: shared hit=138102964 read=67983056
                                                         ->  Seq Scan on flows flows_1  (cost=0.00..17836152.73 rows=661128576 width=4) (actual time=0.322..343152.274 rows=589831190 loops=13)
                                                               Buffers: shared hit=138102915 read=67983056
                                                         ->  Materialize  (cost=0.00..1.02 rows=4 width=8) (actual time=0.000..0.000 rows=2 loops=7667805470)
                                                               Buffers: shared hit=13
                                                               ->  Seq Scan on mynetworks mynetworks_1  (cost=0.00..1.01 rows=4 width=8) (actual time=0.006..0.007 rows=4 loops=13)
                                                                     Buffers: shared hit=13
 Planning time: 0.941 ms
 Execution time: 2228345.171 ms
(48 rows)

With a work_mem at 6GB, I noticed that for the first 20 minutes the query was running, the i/o wait was much lower, hovering aroun 3% then it jumped 45% until almost the end of the query.

flowscompact and dstexterne are actually views. I use views to simplify query writing and to "abstract" queries that are use often in other queries. flowscompact is a view built on table flows (having about 590 million rows), it only keeps the most often used fields.
flows=# \d+ flowscompact;
                        View "public.flowscompact"
  Column   |           Type           | Modifiers | Storage | Description
-----------+--------------------------+-----------+---------+-------------
 flow_id   | bigint                   |           | plain   |
 sysuptime | bigint                   |           | plain   |
 exaddr    | ip4                      |           | plain   |
 dpkts     | integer                  |           | plain   |
 doctets   | bigint                   |           | plain   |
 first     | bigint                   |           | plain   |
 last      | bigint                   |           | plain   |
 srcaddr   | ip4                      |           | plain   |
 dstaddr   | ip4                      |           | plain   |
 srcport   | integer                  |           | plain   |
 dstport   | integer                  |           | plain   |
 prot      | smallint                 |           | plain   |
 tos       | smallint                 |           | plain   |
 tcp_flags | smallint                 |           | plain   |
 timestamp | timestamp with time zone |           | plain   |
View definition:
 SELECT flowstimestamp.flow_id,
    flowstimestamp.sysuptime,
    flowstimestamp.exaddr,
    flowstimestamp.dpkts,
    flowstimestamp.doctets,
    flowstimestamp.first,
    flowstimestamp.last,
    flowstimestamp.srcaddr,
    flowstimestamp.dstaddr,
    flowstimestamp.srcport,
    flowstimestamp.dstport,
    flowstimestamp.prot,
    flowstimestamp.tos,
    flowstimestamp.tcp_flags,
    flowstimestamp."timestamp"
   FROM flowstimestamp;
mynetworks is a table having one column and 4 rows; it contains a list of our network networks:
flows=# select * from mynetworks;
     ipaddr
----------------
 192.168.0.0/24<http://192.168.0.0/24;
 10.112.12.0/30<http://10.112.12.0/30;
 10.112.12.4/30<http://10.112.12.4/30;
 10.112.12.8/30<http://10.112.12.8/30;
(4 row)
flows=# \d+ mynetworks;
                    Table "public.mynetworks"
 Column | Type | Modifiers | Storage | Stats target | Description
--------+------+-----------+---------+--------------+-------------
 ipaddr | ip4r |           | plain   |              |
Indexes:
    "mynetworks_ipaddr_idx" gist (ipaddr)
dstexterne is a view listing all the destination IPv4 addresses not inside our network; it has one column and 3.8 million rows.
flows=# \d+ dstexterne;
              View "public.dstexterne"
 Column  | Type | Modifiers | Storage | Description
---------+------+-----------+---------+-------------
 dstaddr | ip4  |           | plain   |
View definition:
 SELECT DISTINCT flowscompact.dstaddr
   FROM flowscompact
     LEFT JOIN mynetworks ON mynetworks.ipaddr >> flowscompact.dstaddr::ip4r
  WHERE mynetworks.ipaddr IS NULL;
Thanks!

Charles

Charles,

Don’t change temp_buffers.
Try to increase work_mem even more, say work_mem=’12GB’, because it’s still using disk for sorting (starting around 20th minute as you noticed).
See if this:
“Sort Method: external merge  Disk: 3049216kB”
goes away.
Igor

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: Very poor read performance, query independent
  In-Reply-To: <CY1PR07MB2380D939241FBF466048DA7FDAAD0@CY1PR07MB2380.namprd07.prod.outlook.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox