From: "Mike Sofen"
Subject: Re: Disk Benchmarking Question
Date: Fri, 18 Mar 2016 20:22:25 -0700
Message-ID: <00c601d1818e$90b27a50$b2176ef0$@runbox.com>
X-Mailing-List: pgsql-performance

Sorry for the delay, long work day!

Ok, I THINK I understand where you’re going. Think of it this way:

4 drives in RAID 10 = 2 pairs of mirrored drives, aka still 2 active drives (2 are failover). They are sharing the 12 Gbps SAS interface, but that speed is quite irrelevant…it’s just a giant pipe for filling lots of drives.

Each of your 2 drives has a max seq read/write spec of 200 MBps (WAY max). When I say max, I mean under totally edge laboratory conditions, writing to the outer few tracks with purely sequential data (never happens in the real world). With 2 drives running perfectly in RAID 10, the theoretical max would be 400 MBps. Real world, less than half, on sequential.

But random writes are the rulers of most activity in the data world (think of writing a single row to a table – a few thousand bytes that might be plopped anywhere on the disk and then randomly retrieved). So the MBps throughput number becomes mostly meaningless (because the data chunks are small and random), and IOPS and drive seek times become king (thus my earlier comments).

So – if you’re having disk performance issues with a database, you either add more spinning disks (to increase IOPS/distribute them) or switch to SSDs and forget about almost everything…

Mike

------------------
From: Dave Stibrany [mailto:dstibrany@gmail.com]
Sent: Friday, March 18, 2016 7:48 AM

Hey Mike,

Thanks for the response. I think where I'm confused is that I thought vendor specified MBps was an estimate of sequential read/write speed. Therefore if you're in RAID10, you'd have 4x the sequential read speed and 2x the sequential write speed. Am I misunderstanding something?

Also, when you mention that MBps is the capacity of the interface, what do you mean exactly? I've been taking interface speed to be the electronic transfer speed, not the speed from the actual physical medium, and more in the 6-12 gigabit range.

Please let me know if I'm way off on any of this, I'm hoping to have my mental model updated.

Thanks!

Dave

On Thu, Mar 17, 2016 at 5:11 PM, Mike Sofen <msofen@runbox.com> wrote:

Hi Dave,

Database disk performance has to take into account IOPS, and IMO, over MBps, since it’s the ability of the disk subsystem to write lots of little bits (usually) versus writing giant globs, especially in direct attached storage (like yours, versus a SAN). Most db disk benchmarks revolve around IOPS…and this is where SSDs utterly crush spinning disks.

You can get maybe 200 IOPS out of each disk, you have 4 in RAID 10 so you get a whopping 400 IOPS. A single quality SSD (like the Samsung 850 Pro) will support a minimum of 40k IOPS on reads and 80k IOPS on writes. That’s why SSDs are eliminating spinning disks when performance is critical and budget allows.

Back to your question – the MBps figure is the capacity of the interface, so it makes sense that it’s the same for both reads and writes. The PERC RAID controller will be saving your bacon on writes, with 2 GB cache (assuming it’s caching writes), so it becomes the equivalent of an SSD up to the capacity limit of the write cache. With only 400 IOPS of write speed, with a busy server you can easily saturate the cache and then your system will drop to a crawl.

If I didn’t answer the intent of your question, feel free to clarify for me.

Mike

From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Dave Stibrany
Sent: Thursday, March 17, 2016 1:45 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Disk Benchmarking Question

I'm pretty new to benchmarking hard disks and I'm looking for some advice on interpreting the results of some basic tests.

The server is:
- Dell PowerEdge R430
- 1 x Intel Xeon E5-2620 2.4GHz
- 32 GB RAM
- 4 x 600GB 10k SAS Seagate ST600MM0088 in RAID 10
- PERC H730P Raid Controller with 2GB cache in write back mode.

The OS is Ubuntu 14.04, I'm using LVM and I have an ext4 volume for /, and an xfs volume for PGDATA.

I ran some dd and bonnie++ tests and I'm a bit confused by the numbers. I ran 'bonnie++ -n0 -f' on the root volume.

Here's a link to the bonnie test results
https://www.dropbox.com/s/pwe2g5ht9fpjl2j/bonnie.today.html?dl=0

The vendor stats say sustained throughput of 215 to 108 MBps, so I guess I'd expect around 400-800 MBps read and 200-400 MBps write. In any case, I'm pretty confused as to why the read and write sequential speeds are almost identical. Does this look wrong?

Thanks,

Dave
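The arithmetic in Mike's replies can be put side by side in a few lines of Python. This is only a rough sketch built on the round numbers quoted in the thread (about 200 MBps and about 200 IOPS per drive, with one contributing drive per mirrored pair); none of it is a measurement. The function names and the 8 kB request size are assumptions added for illustration; 8 kB was chosen because it is PostgreSQL's default page size.

```python
# Back-of-the-envelope ceilings for the 4-drive RAID 10 array in this thread.
# Per-drive figures are the rough estimates quoted in the replies, not measurements.

def raid10_estimates(drives, seq_mbps_per_drive, iops_per_drive):
    """Return rough ceilings for a RAID 10 array of `drives` disks.

    Assumption (taken from the reply in the thread): only one drive per
    mirrored pair is counted as contributing throughput and IOPS.
    """
    active = drives // 2
    return {
        "active_drives": active,
        "seq_mbps": active * seq_mbps_per_drive,
        "iops": active * iops_per_drive,
    }


def random_io_mbps(iops, io_size_kb):
    """Throughput in MB/s when every request is a small random I/O."""
    return iops * io_size_kb / 1024


est = raid10_estimates(drives=4, seq_mbps_per_drive=200, iops_per_drive=200)
print(est)

# Hypothetical 8 kB requests (PostgreSQL's default page size).
print(random_io_mbps(est["iops"], io_size_kb=8))
```

With these inputs the sequential ceiling comes out at 400 MB/s, while the same array doing nothing but 8 kB random requests moves only about 3 MB/s. That gap is why the vendor MBps figure says little about how the array will behave under a database workload.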