Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dW1gD-0000Xn-4l for pgsql-performance@arkaria.postgresql.org; Fri, 14 Jul 2017 14:34:53 +0000 Received: from localhost ([127.0.0.1] helo=postgresql.org) by malur.postgresql.org with smtp (Exim 4.84_2) (envelope-from ) id 1dW1gC-0007mH-GA for pgsql-performance@arkaria.postgresql.org; Fri, 14 Jul 2017 14:34:52 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1dW1gB-0007kv-50 for pgsql-performance@postgresql.org; Fri, 14 Jul 2017 14:34:51 +0000 Received: from mail-qk0-x22c.google.com ([2607:f8b0:400d:c09::22c]) by magus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.84_2) (envelope-from ) id 1dW1g6-0002xf-QH for pgsql-performance@postgresql.org; Fri, 14 Jul 2017 14:34:50 +0000 Received: by mail-qk0-x22c.google.com with SMTP id p73so13352813qka.2 for ; Fri, 14 Jul 2017 07:34:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:cc; bh=pzh0yU9A1SkDK16u7evEtdcLKcSiO2dBltgxsYg8k/s=; b=CivoRIf5iVjDNLLBbl6rb+WlExsY40F4miVVpI8maeioAZaI0xtv4r91MUOPJPoasg 8Fq4N/o1VRRWIFUSna60g0LKjngloz2LoZ+HvAin76LC4baYjDu7uPTPQxIcdXU09CA6 25S7WDbNjjPC+6LFuNXgtEtIkFH+XuxW3RPUxBDLtwFRGcPDGhcjd/FzNmKJF5UOi4Bn ZYMxUc7+UM8czj9DJvNJOGVaDVQrk0ZiOI6b0Ih04DwiOEXTdQjPPlVFLzoH7/PnYjKz BSmwMjxW8rW98skRcBD3MTsnt222qhzunC/eCcRhMxmzZfmoWlWSAk1WIKJhVXiZnPSG ob8Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:cc; bh=pzh0yU9A1SkDK16u7evEtdcLKcSiO2dBltgxsYg8k/s=; b=ekLJ2wPrdA0tWAvl9LO0alpXeUg2pLPpClccqMp2TMevEEquJNRLrWkC+vgu7Xau74 atH4g/dz51m4TUJooJOOYtSgX2/KYtf+UqKKMev6UF+GyT7BFgxIpqfUgtVcDovJgPlA 7RpeKCdikUcAO9gDNOW+UTNKrDUmBdmdpSBFm0NBR+P1WCbqfgaUnneBvVgit3IX50xU r87oIf+RA59JdOAlbZv5hjHFlQEtNo0sohDShkc78JL47c8+Qq/P5cBMjpcFokTd9cYy 9boe3wDFKa5MCCjGyYowzCwDzYh/1oGpUAU/TQGjFNk8/NoHDiPhWIH+yUIeAB4bdisF 6H1g== X-Gm-Message-State: AIVw110SXPbR+9RPSFklWIm2IznqWXwLWQh4ZD0jkGS+x9G1uPkh0K3u UCfUH04Vbjrr8aSzawyk9U51k6QyyzAC X-Received: by 10.55.167.211 with SMTP id q202mt13103386qke.77.1500042885062; Fri, 14 Jul 2017 07:34:45 -0700 (PDT) MIME-Version: 1.0 Received: by 10.12.157.11 with HTTP; Fri, 14 Jul 2017 07:34:24 -0700 (PDT) In-Reply-To: References: From: Charles Nadeau Date: Fri, 14 Jul 2017 16:34:24 +0200 Message-ID: Subject: Re: Very poor read performance, query independent Cc: Igor Neyman , Andreas Kretschmer , "pgsql-performance@postgresql.org" Content-Type: multipart/alternative; boundary="001a114fc7105007d7055447f36a" List-Archive: List-Help: List-ID: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Mailing-List: pgsql-performance Precedence: bulk Sender: pgsql-performance-owner@postgresql.org --001a114fc7105007d7055447f36a Content-Type: text/plain; charset="UTF-8" Mark, First I must say that I changed my disks configuration from 4 disks in RAID 10 to 5 disks in RAID 0 because I almost ran out of disk space during the last ingest of data. Here is the result test you asked. It was done with a cold cache: flows=# \timing Timing is on. flows=# explain select count(*) from flows; QUERY PLAN ------------------------------------------------------------ ----------------------------------- Finalize Aggregate (cost=17214914.09..17214914.09 rows=1 width=8) -> Gather (cost=17214914.07..17214914.09 rows=1 width=8) Workers Planned: 1 -> Partial Aggregate (cost=17213914.07..17213914.07 rows=1 width=8) -> Parallel Seq Scan on flows (cost=0.00..17019464.49 rows=388899162 width=0) (5 rows) Time: 171.835 ms flows=# select pg_relation_size('flows'); pg_relation_size ------------------ 129865867264 (1 row) Time: 57.157 ms flows=# select count(*) from flows; LOG: duration: 625546.522 ms statement: select count(*) from flows; count ----------- 589831190 (1 row) Time: 625546.662 ms The throughput reported by Postgresql is almost 198MB/s, and the throughput as mesured by dstat during the query execution was between 25 and 299MB/s. It is much better than what I had before! The i/o wait was about 12% all through the query. One thing I noticed is the discrepency between the read throughput reported by pg_activity and the one reported by dstat: pg_activity always report a value lower than dstat. Besides the change of disks configuration, here is what contributed the most to the improvment of the performance so far: Using Hugepage Increasing effective_io_concurrency to 256 Reducing random_page_cost from 22 to 4 Reducing min_parallel_relation_size to 512kB to have more workers when doing sequential parallel scan of my biggest table Thanks for recomending this test, I now know what the real throughput should be! Charles On Wed, Jul 12, 2017 at 4:11 AM, Mark Kirkwood < mark.kirkwood@catalyst.net.nz> wrote: > Hmm - how are you measuring that sequential scan speed of 4MB/s? I'd > recommend doing a very simple test e.g, here's one on my workstation - 13 > GB single table on 1 SATA drive - cold cache after reboot, sequential scan > using Postgres 9.6.2: > > bench=# EXPLAIN SELECT count(*) FROM pgbench_accounts; > QUERY PLAN > ------------------------------------------------------------ > ------------------------ > Aggregate (cost=2889345.00..2889345.01 rows=1 width=8) > -> Seq Scan on pgbench_accounts (cost=0.00..2639345.00 rows=100000000 > width=0) > (2 rows) > > > bench=# SELECT pg_relation_size('pgbench_accounts'); > pg_relation_size > ------------------ > 13429514240 > (1 row) > > bench=# SELECT count(*) FROM pgbench_accounts; > count > ----------- > 100000000 > (1 row) > > Time: 118884.277 ms > > > So doing the math seq read speed is about 110MB/s (i.e 13 GB in 120 sec). > Sure enough, while I was running the query iostat showed: > > Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz > avgqu-sz await r_await w_await svctm %util > sda 0.00 0.00 926.00 0.00 114.89 0.00 254.10 > 1.90 2.03 2.03 0.00 1.08 100.00 > > > So might be useful for us to see something like that from your system - > note you need to check you really have flushed the cache, and that no other > apps are using the db. > > regards > > Mark > > > On 12/07/17 00:46, Charles Nadeau wrote: > >> After reducing random_page_cost to 4 and testing more, I can report that >> the aggregate read throughput for parallel sequential scan is about 90MB/s. >> However the throughput for sequential scan is still around 4MB/s. >> >> > -- Charles Nadeau Ph.D. http://charlesnadeau.blogspot.com/ --001a114fc7105007d7055447f36a Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Mark,

First I must say= that I changed my disks configuration from 4 disks in RAID 10 to 5 disks i= n RAID 0 because I almost ran out of disk space during the last ingest of d= ata.
Here is the result test you asked= . It was done with a cold cache:
flows=3D# \timing
Timing is on.
flows=3D# explain select count(*) from = flows;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 QUERY PLAN =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0=C2=A0
------------------------------<= wbr>-------------------------------------------------------------= ----
=C2=A0Finalize Aggregate =C2=A0(cost=3D17214914.09..17214914= .09 rows=3D1 width=3D8)
=C2=A0 =C2=A0-> =C2=A0Gather =C2= =A0(cost=3D17214914.07..17214914.09 rows=3D1 width=3D8)
=C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Workers Planned: 1
=C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0-> =C2=A0Partial Aggregate =C2=A0(cost=3D17213914.07= ..17213914.07 rows=3D1 width=3D8)
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0-> =C2=A0Parallel Seq Scan on flows =C2= =A0(cost=3D0.00..17019464.49 rows=3D388899162 width=3D0)
(5 rows)=

Time: 171.835 ms
flows=3D# select pg_re= lation_size('flows');
=C2=A0pg_relation_size=C2=A0
<= div>------------------
=C2=A0 =C2=A0 =C2=A0129865867264
(1 row)

Time: 57.157 ms
flows=3D# selec= t count(*) from flows;
LOG: =C2=A0duration: 625546.522 ms =C2=A0s= tatement: select count(*) from flows;
=C2=A0 =C2=A0count =C2=A0= =C2=A0
-----------
=C2=A0589831190
(1 row)

Time: 625546.662 ms
The throughput reported by Postgresql is almost 198MB/s, = and the throughput as mesured by dstat during the query execution was betwe= en 25 and 299MB/s. It is much better than what I had before! The i/o wait w= as about 12% all through the query. One thing I noticed is the discrepency = between the read throughput reported by pg_activity and the one reported by= dstat: pg_activity always report a value lower than dstat.

Besides the= change of disks configuration, here is what contributed the most to the im= provment of the performance so far:
Using Hugepage
Increasing effective_io_concurrency to 256
Reducing rando= m_page_cost from 22 to 4
Reducing min_parallel_relation_size to 5= 12kB to have more workers when doing sequential parallel scan of my biggest= table

Thanks for recomending this test, I now know what t= he real throughput should be!

Charles

On Wed, Jul 12, 2017 at 4:11 AM, Mark K= irkwood <mark.kirkwood@catalyst.net.nz> wrote:
Hmm - how are you measuring that sequentia= l scan speed of 4MB/s? I'd recommend doing a very simple test e.g, here= 's one on my workstation - 13 GB single table on 1 SATA drive - cold ca= che after reboot, sequential scan using Postgres 9.6.2:

bench=3D#=C2=A0 EXPLAIN SELECT count(*) FROM pgbench_accounts;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0QUERY PLAN
-----------------------------------------------------------------= -------------------
=C2=A0Aggregate=C2=A0 (cost=3D2889345.00..2889345.01 rows=3D1 width=3D8) =C2=A0 =C2=A0->=C2=A0 Seq Scan on pgbench_accounts (cost=3D0.00..2639345= .00 rows=3D100000000 width=3D0)
(2 rows)


bench=3D#=C2=A0 SELECT pg_relation_size('pgbench_accounts'); =C2=A0pg_relation_size
------------------
=C2=A0 =C2=A0 =C2=A0 13429514240
(1 row)

bench=3D# SELECT count(*) FROM pgbench_accounts;
=C2=A0 =C2=A0count
-----------
=C2=A0100000000
(1 row)

Time: 118884.277 ms


So doing the math seq read speed is about 110MB/s (i.e 13 GB in 120 sec). S= ure enough, while I was running the query iostat showed:

Device:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0rrqm/s=C2=A0 =C2=A0wrqm/s=C2=A0 = =C2=A0 =C2=A0r/s=C2=A0 =C2=A0 =C2=A0w/s=C2=A0 =C2=A0 rMB/s wMB/s avgrq-sz a= vgqu-sz=C2=A0 =C2=A0await r_await w_await=C2=A0 svctm=C2=A0 %util
sda=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A00.00=C2=A0 =C2=A0= =C2=A00.00=C2=A0 926.00=C2=A0 =C2=A0 0.00 114.89=C2=A0 =C2=A0 =C2=A00.00= =C2=A0 =C2=A0254.10=C2=A0 =C2=A0 =C2=A01.90=C2=A0 =C2=A0 2.03=C2=A0 =C2=A0 = 2.03=C2=A0 =C2=A0 0.00=C2=A0 =C2=A01.08 100.00


So might be useful for us to see something like that from your system - not= e you need to check you really have flushed the cache, and that no other ap= ps are using the db.

regards

Mark


On 12/07/17 00:46, Charles Nadeau wrote:
After reducing random_page_cost to 4 and testing more, I can report that th= e aggregate read throughput for parallel sequential scan is about 90MB/s. H= owever the throughput for sequential scan is still around 4MB/s.





--
=
--001a114fc7105007d7055447f36a--