MIME-Version: 1.0
In-Reply-To: <a8fa0c8a-1269-338e-80a6-cb121574be7c@catalyst.net.nz>
References: 
 <CADFyZw7aGoD0AaStxdyHByR5Qta=M5wx0v=iptKLhPUp+EOKvA@mail.gmail.com>
 <dc5d2b63-0a1b-ff76-c88a-87c2c41bd5b8@a-kretschmer.de>
 <CADFyZw4UanW5TbFajWKWhN9XcW+8gtCXw+kssHo47Wpr1A=zJw@mail.gmail.com>
 <DM5PR07MB28103FD558CB6628CECE07BEDAA90@DM5PR07MB2810.namprd07.prod.outlook.com>
 <CADFyZw6JrhsLR_eYOeCjjiQMzz5bepk6AfMRBu0hnaQg+vN-=A@mail.gmail.com>
 <a8fa0c8a-1269-338e-80a6-cb121574be7c@catalyst.net.nz>
From: Charles Nadeau <charles.nadeau@gmail.com>
Date: Fri, 14 Jul 2017 16:34:24 +0200
Message-ID: 
 <CADFyZw6_gpDoRtO_zqdD7bjsBy7twHM=FV3w_ukKRgcnJ79MSg@mail.gmail.com>
Subject: Re: Very poor read performance, query independent
Cc: Igor Neyman <ineyman@perceptron.com>,
 Andreas Kretschmer <andreas@a-kretschmer.de>,
	"pgsql-performance@postgresql.org" <pgsql-performance@postgresql.org>
Content-Type: multipart/alternative; boundary="001a114fc7105007d7055447f36a"
Precedence: bulk
Sender: pgsql-performance-owner@postgresql.org

--001a114fc7105007d7055447f36a
Content-Type: text/plain; charset="UTF-8"

Mark,

First I must say that I changed my disk configuration from 4 disks in RAID
10 to 5 disks in RAID 0 because I almost ran out of disk space during the
last ingest of data.
Here is the result of the test you asked for. It was done with a cold cache:

flows=# \timing
Timing is on.
flows=# explain select count(*) from flows;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=17214914.09..17214914.09 rows=1 width=8)
   ->  Gather  (cost=17214914.07..17214914.09 rows=1 width=8)
         Workers Planned: 1
         ->  Partial Aggregate  (cost=17213914.07..17213914.07 rows=1 width=8)
               ->  Parallel Seq Scan on flows  (cost=0.00..17019464.49 rows=388899162 width=0)
(5 rows)

Time: 171.835 ms
flows=# select pg_relation_size('flows');
 pg_relation_size
------------------
     129865867264
(1 row)

Time: 57.157 ms
flows=# select count(*) from flows;
LOG:  duration: 625546.522 ms  statement: select count(*) from flows;
   count
-----------
 589831190
(1 row)

Time: 625546.662 ms

The throughput computed from the PostgreSQL numbers above (relation size
divided by query duration) is almost 198MB/s, and the throughput as
measured by dstat during the query execution was between 25 and 299MB/s.
It is much better than what I had before! The I/O wait was about 12% all
through the query. One thing I noticed is the discrepancy between the read
throughput reported by pg_activity and the one reported by dstat:
pg_activity always reports a lower value than dstat.
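
For reference, the 198MB/s figure can be reproduced from the relation size
and the elapsed time shown in the psql session. This is only a minimal
sketch in Python: the function name is made up for illustration, and it
assumes "MB" here means MiB (1024*1024 bytes), which is what makes the
numbers line up.

```python
def throughput_mib_per_s(relation_bytes: int, duration_ms: float) -> float:
    """Average read throughput in MiB/s (hypothetical helper, not a PostgreSQL API).

    relation_bytes: value returned by pg_relation_size()
    duration_ms:    elapsed time printed by psql's \\timing, in milliseconds
    """
    seconds = duration_ms / 1000.0
    return relation_bytes / seconds / (1024 * 1024)


# Numbers taken from the session above: pg_relation_size('flows') and the
# "Time:" line of select count(*) from flows.
flows_mib_s = throughput_mib_per_s(129865867264, 625546.662)
print(f"flows: {flows_mib_s:.1f} MiB/s")
```

This is an average over the whole scan, so it is expected to sit somewhere
inside the 25 to 299MB/s range that dstat showed while the query ran.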

Besides the change of disk configuration, here is what contributed the
most to the improvement in performance so far:

Using huge pages
Increasing effective_io_concurrency to 256
Reducing random_page_cost from 22 to 4
Reducing min_parallel_relation_size to 512kB to have more workers when
doing a parallel sequential scan of my biggest table


Thanks for recommending this test, I now know what the real throughput
should be!

Charles

On Wed, Jul 12, 2017 at 4:11 AM, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:

> Hmm - how are you measuring that sequential scan speed of 4MB/s? I'd
> recommend doing a very simple test e.g, here's one on my workstation - 13
> GB single table on 1 SATA drive - cold cache after reboot, sequential scan
> using Postgres 9.6.2:
>
> bench=#  EXPLAIN SELECT count(*) FROM pgbench_accounts;
>                                      QUERY PLAN
> ------------------------------------------------------------------------------------
>  Aggregate  (cost=2889345.00..2889345.01 rows=1 width=8)
>    ->  Seq Scan on pgbench_accounts (cost=0.00..2639345.00 rows=100000000 width=0)
> (2 rows)
>
>
> bench=#  SELECT pg_relation_size('pgbench_accounts');
>  pg_relation_size
> ------------------
>       13429514240
> (1 row)
>
> bench=# SELECT count(*) FROM pgbench_accounts;
>    count
> -----------
>  100000000
> (1 row)
>
> Time: 118884.277 ms
>
>
> So doing the math seq read speed is about 110MB/s (i.e 13 GB in 120 sec).
> Sure enough, while I was running the query iostat showed:
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> sda               0.00     0.00  926.00    0.00   114.89     0.00   254.10     1.90    2.03    2.03    0.00   1.08 100.00
>
>
> So might be useful for us to see something like that from your system -
> note you need to check you really have flushed the cache, and that no other
> apps are using the db.
>
> regards
>
> Mark
>
>
> On 12/07/17 00:46, Charles Nadeau wrote:
>
>> After reducing random_page_cost to 4 and testing more, I can report that
>> the aggregate read throughput for parallel sequential scan is about 90MB/s.
>> However the throughput for sequential scan is still around 4MB/s.
>>
>>
>


-- 
Charles Nadeau Ph.D.
http://charlesnadeau.blogspot.com/

--001a114fc7105007d7055447f36a--