Disk Benchmarking Question

7+ messages / 3 participants

* Disk Benchmarking Question
@ 2016-03-17 20:45  Dave Stibrany <[email protected]>
  0 siblings, 2 replies; 7+ messages in thread

From: Dave Stibrany @ 2016-03-17 20:45 UTC
  To: pgsql-performance

I'm pretty new to benchmarking hard disks and I'm looking for some advice
on interpreting the results of some basic tests.

The server is:
- Dell PowerEdge R430
- 1 x Intel Xeon E5-2620 2.4GHz
- 32 GB RAM
- 4 x 600GB 10k SAS Seagate ST600MM0088 in RAID 10
- PERC H730P Raid Controller with 2GB cache in write back mode.

The OS is Ubuntu 14.04, I'm using LVM and I have an ext4 volume for /, and
an xfs volume for PGDATA.

I ran some dd and bonnie++ tests and I'm a bit confused by the numbers. I
ran 'bonnie++ -n0 -f' on the root volume.

Here's a link to the bonnie test results
https://www.dropbox.com/s/pwe2g5ht9fpjl2j/bonnie.today.html?dl=0

The vendor stats say sustained throughput of 215 to 108 MBps, so I guess
I'd expect around 400-800 MBps read and 200-400 MBps write. In any case,
I'm pretty confused as to why the read and write sequential speeds are
almost identical. Does this look wrong?

Thanks,

Dave


* Re: Disk Benchmarking Question
@ 2016-03-17 21:11  Mike Sofen <[email protected]>
  parent: Dave Stibrany <[email protected]>
  1 sibling, 1 reply; 7+ messages in thread

From: Mike Sofen @ 2016-03-17 21:11 UTC
  To: 'Dave Stibrany' <[email protected]>; pgsql-performance

Hi Dave,

Database disk performance has to take IOPS into account and, IMO, weigh it above MB/s, since what matters is the ability of the disk subsystem to write lots of little bits (usually) rather than giant globs, especially on direct-attached storage (like yours, versus a SAN).  Most db disk benchmarks revolve around IOPS…and this is where SSDs utterly crush spinning disks.

You can get maybe 200 IOPS out of each disk; you have 4 in RAID 10, so for writes you get a whopping 400 IOPS (every write has to land on both drives of a mirrored pair).  A single quality SSD (like the Samsung 850 Pro) will support a minimum of 40k IOPS on reads and 80k IOPS on writes.  That’s why SSDs are eliminating spinning disks when performance is critical and budget allows.

Back to your question – the MB/s figure is the capacity of the interface, so it makes sense that it’s the same for both reads and writes.  The PERC RAID controller will be saving your bacon on writes, with its 2 GB cache (assuming it’s caching writes), so it becomes the equivalent of an SSD up to the capacity limit of the write cache.  With only 400 IOPS of write speed, a busy server can easily saturate the cache, and then your system will drop to a crawl.
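
As a rough illustration of the arithmetic above, the following Python sketch estimates random IOPS for a RAID 10 array and how long a write-back cache would last once incoming writes outpace the disks. The 200 IOPS per disk and the 2 GB cache come from the message; the 8 kB block size (PostgreSQL's default page size), the 2000 writes/sec load, and the function names are assumptions picked for illustration. The model is idealized: it ignores controller overhead and write coalescing, and it assumes reads can be served from either half of a mirror, which not every controller does.

```python
# Back-of-the-envelope RAID 10 estimate. All inputs are assumptions,
# not measurements from the server discussed in this thread.
PER_DISK_IOPS = 200            # "maybe 200 IOPS" per 10k spindle
DISKS = 4
CACHE_BYTES = 2 * 1024**3      # 2 GB controller write-back cache
BLOCK_BYTES = 8 * 1024         # PostgreSQL default page size (8 kB)


def raid10_iops(per_disk_iops, disks):
    """Return (read_iops, write_iops) for an idealized RAID 10 array."""
    read_iops = per_disk_iops * disks         # any disk may serve a read
    write_iops = per_disk_iops * disks // 2   # each write hits both mirrors
    return read_iops, write_iops


def seconds_to_fill_cache(incoming_iops, drain_iops, cache_bytes, block_bytes):
    """Seconds until the cache is full, or None if the disks keep up."""
    surplus = incoming_iops - drain_iops
    if surplus <= 0:
        return None
    return cache_bytes / (surplus * block_bytes)


read_iops, write_iops = raid10_iops(PER_DISK_IOPS, DISKS)
print("read IOPS:", read_iops, "write IOPS:", write_iops)

# Hypothetical sustained load of 2000 random 8 kB writes per second.
fill = seconds_to_fill_cache(2000, write_iops, CACHE_BYTES, BLOCK_BYTES)
print("cache full after about", round(fill), "seconds")
```

Under these assumed numbers the cache absorbs the burst for a few minutes and then the array falls back to what the spindles can sustain, which is the "drop to a crawl" described above.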

If I didn’t answer the intent of your question, feel free to clarify for me.

Mike


* Re: Disk Benchmarking Question
@ 2016-03-18 14:48  Dave Stibrany <[email protected]>
  parent: Mike Sofen <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: Dave Stibrany @ 2016-03-18 14:48 UTC
  To: Mike Sofen <[email protected]>; +Cc: pgsql-performance

Hey Mike,

Thanks for the response. I think where I'm confused is that I thought
vendor specified MBps was an estimate of sequential read/write speed.
Therefore if you're in RAID10, you'd have 4x the sequential read speed and
2x the sequential write speed. Am I misunderstanding something?
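
The expectation described here can be written out as a small Python sketch. It takes the vendor's 108 to 215 MB/s sustained range quoted earlier in the thread and scales it for a 4-disk RAID 10. The function name and the `read_from_all_mirrors` switch are hypothetical, and the result is a best-case bound that assumes perfect striping with no controller or filesystem overhead.

```python
# Best-case sequential throughput for RAID 10, derived from the vendor's
# per-disk sustained range. Idealized: no controller or filesystem overhead.
VENDOR_MIN_MB_S = 108   # inner tracks
VENDOR_MAX_MB_S = 215   # outer tracks


def raid10_sequential_range(min_mb_s, max_mb_s, disks, read_from_all_mirrors):
    """Return ((read_min, read_max), (write_min, write_max)) in MB/s."""
    pairs = disks // 2
    # Writes are striped across pairs; both disks of a pair write the same data.
    write = (min_mb_s * pairs, max_mb_s * pairs)
    # Reads scale with every disk only if the controller spreads a single
    # sequential stream across both halves of each mirror.
    readers = disks if read_from_all_mirrors else pairs
    read = (min_mb_s * readers, max_mb_s * readers)
    return read, write


for interleave in (True, False):
    read, write = raid10_sequential_range(
        VENDOR_MIN_MB_S, VENDOR_MAX_MB_S, disks=4,
        read_from_all_mirrors=interleave)
    print("reads from all mirrors:", interleave, "read:", read, "write:", write)
```

If the controller reads from only one disk per pair, the read and write ranges come out identical, which would be one possible explanation for near-equal sequential numbers.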

Also, when you say that MB/s is the capacity of the interface, what
exactly do you mean? I've been taking interface speed to be the electronic
transfer speed, not the speed of the actual physical medium, and more in
the 6-12 gigabit range.

Please let me know if I'm way off on any of this, I'm hoping to have my
mental model updated.

Thanks!

Dave


* Re: Disk Benchmarking Question
@ 2016-03-19 03:22  Mike Sofen <[email protected]>
  parent: Dave Stibrany <[email protected]>
  0 siblings, 0 replies; 7+ messages in thread

From: Mike Sofen @ 2016-03-19 03:22 UTC
  To: pgsql-performance

Sorry for the delay, long work day!

Ok, I THINK I understand where you’re going.  Do it this way:

4 drives in RAID 10 = 2 pairs of mirrored drives, aka still 2 active drives (2 are failover).  They are sharing the 12 Gbps SAS interface, but that speed is quite irrelevant…it’s just a giant pipe for filling lots of drives.

Each of your 2 drives has a max sequential read/write spec of about 200 MB/s (WAY max).  When I say max, I mean under totally edge laboratory conditions, writing to the outer few tracks with purely sequential data (never happens in the real world).  With 2 drives running perfectly in RAID 10, the theoretical max would be 400 MB/s.  Real world, less than half, on sequential.

But random writes are the rulers of most activity in the data world (think of writing a single row to a table – a few thousand bytes that might be plopped anywhere on the disk and then randomly retrieved).  So the MB/s throughput number becomes mostly meaningless (because the data chunks are small and random), and IOPS and drive seek times become king (thus my earlier comments).
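
To put numbers on why MB/s stops mattering for small random I/O, here is a hedged Python sketch. It estimates per-disk IOPS from seek time plus rotational latency and then converts an IOPS figure into throughput. The half-revolution rotational latency follows from the 10k RPM spindle speed; the 4 ms average seek, the 8 kB block size, and the function names are assumptions for illustration, not specs for these drives.

```python
# Why random small-block I/O makes the MB/s spec irrelevant.
# The 4 ms average seek is an assumed figure, not a datasheet value.

def hdd_iops_estimate(rpm, avg_seek_ms):
    """Rough random IOPS for one spindle: 1 / (seek + half a revolution)."""
    rotational_latency_ms = 60_000 / rpm / 2   # half a revolution on average
    return 1000 / (avg_seek_ms + rotational_latency_ms)


def random_throughput_mb_s(iops, block_bytes):
    """Throughput in MB/s when every I/O moves one block."""
    return iops * block_bytes / 1_000_000


per_disk = hdd_iops_estimate(rpm=10_000, avg_seek_ms=4.0)
print("estimated IOPS per disk:", round(per_disk))

# 400 write IOPS (the figure used earlier in the thread) at 8 kB per write.
random_mb_s = random_throughput_mb_s(400, 8 * 1024)
print("random-write throughput:", round(random_mb_s, 2), "MB/s")
print("sequential / random ratio:", round(400 / random_mb_s))
```

With these assumptions the array moves only a few MB/s under purely random 8 kB writes, two orders of magnitude below the 400 MB/s sequential ceiling.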

So – if you’re having disk performance issues with a database, you either add more spinning disks (to increase IOPS and spread the load across them) or switch to SSDs and forget about almost everything…

Mike


* Re: Disk Benchmarking Question
@ 2016-03-19 10:29  Scott Marlowe <[email protected]>
  parent: Dave Stibrany <[email protected]>
  1 sibling, 1 reply; 7+ messages in thread

From: Scott Marlowe @ 2016-03-19 10:29 UTC
  To: Dave Stibrany <[email protected]>; +Cc: pgsql-performance

On Thu, Mar 17, 2016 at 2:45 PM, Dave Stibrany <[email protected]> wrote:
> I'm pretty new to benchmarking hard disks and I'm looking for some advice on
> interpreting the results of some basic tests.
>
> The server is:
> - Dell PowerEdge R430
> - 1 x Intel Xeon E5-2620 2.4GHz
> - 32 GB RAM
> - 4 x 600GB 10k SAS Seagate ST600MM0088 in RAID 10
> - PERC H730P Raid Controller with 2GB cache in write back mode.
>
> The OS is Ubuntu 14.04, I'm using LVM and I have an ext4 volume for /, and
> an xfs volume for PGDATA.
>
> I ran some dd and bonnie++ tests and I'm a bit confused by the numbers. I
> ran 'bonnie++ -n0 -f' on the root volume.
>
> Here's a link to the bonnie test results
> https://www.dropbox.com/s/pwe2g5ht9fpjl2j/bonnie.today.html?dl=0
>
> The vendor stats say sustained throughput of 215 to 108 MBps, so I guess I'd
> expect around 400-800 MBps read and 200-400 MBps write. In any case, I'm
> pretty confused as to why the read and write sequential speeds are almost
> identical. Does this look wrong?

For future reference, it's good to include the data you linked to in
your post, as in 2, 5 or 10 years the postgresql discussion archives
will still be here but your dropbox may or may not, and then people
won't know what numbers you are referring to.

Given the size of your bonnie test set and the fact that you're using
RAID-10, the cache should make little or no difference. The RAID
controller may or may not interleave reads between all four drives.
Some do, some don't. It looks to me like yours doesn't. I.e. when
reading it's not reading all 4 disks at once, but just 2, 1 from each
pair.

But the important question here is what kind of workload are you
looking at throwing at this server? If it's going to be a reporting
database you may get as good or better read performance from RAID-5 as
RAID-10, especially if you add more drives. If you're looking at
transactional use then as Mike suggested SSDs might be your best
choice.

We run some big transactional dbs at work that are 4 to 6 TB and for
those we use ten 800GB SSDs in RAID-5 with the RAID controller cache
turned off. We can hit ~18k tps in pgbench on ~100GB test sets. With
the cache on we drop to 3 to 5k tps. With 512MB cache we overwrite the
cache every couple of seconds and it just gets in the way.

SSDs win hands down if you need random access speed. It's like a
Stanley Steamer (spinners) versus a Bugatti Veyron (SSDs).

For sequential throughput, like a reporting server, spinners often do
alright, as long as there's only one or two processes accessing your
data at a time. As soon as you have more concurrent accesses than you
have RAID-10 pairs, your performance will drop off noticeably.

-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


* Re: Disk Benchmarking Question
@ 2016-03-19 10:32  Scott Marlowe <[email protected]>
  parent: Scott Marlowe <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: Scott Marlowe @ 2016-03-19 10:32 UTC
  To: Dave Stibrany <[email protected]>; +Cc: pgsql-performance

On Sat, Mar 19, 2016 at 4:29 AM, Scott Marlowe <[email protected]> wrote:

> Given the size of your bonnie test set and the fact that you're using
> RAID-10, the cache should make little or no difference. The RAID
> controller may or may not interleave reads between all four drives.
> Some do, some don't. It looks to me like yours doesn't. I.e. when
> reading it's not reading all 4 disks at once, but just 2, 1 from each
> pair.

Point of clarification. It may be that if two processes are reading
the data set at once you'd get a sustained individual throughput that
matches what a single read can get.



* Re: Disk Benchmarking Question
@ 2016-03-22 14:44  Dave Stibrany <[email protected]>
  parent: Scott Marlowe <[email protected]>
  0 siblings, 0 replies; 7+ messages in thread

From: Dave Stibrany @ 2016-03-22 14:44 UTC
  To: pgsql-performance

Thanks for the feedback guys. I'm looking forward to the day when we
upgrade to SSDs.

For future reference, the bonnie++ numbers I was referring to are:

Size: 63G

Sequential Output:
------------------------
396505 K/sec
% CPU 21

Sequential Input:
------------------------
401117 K/sec
% CPU 21

Random Seeks:
----------------------
650.7 /sec
% CPU 25

I think a lot of my confusion resulted from expecting sequential reads to
be 4x the speed of a single disk because the disks are in RAID10. I'm
thinking now that the 4x only applies to random reads.
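
For anyone checking the numbers above against the vendor spec, this Python sketch converts the bonnie++ figures to MB/s and splits them per mirrored pair and per disk. It assumes bonnie++'s "K" means 1024 bytes and that the 108 to 215 MB/s vendor range is in decimal megabytes; the variable names are illustrative only, and the comparison suggests but does not prove how the controller distributes reads.

```python
# Convert the bonnie++ results quoted above and compare them with the
# vendor's per-disk sustained range. Assumes "K" means 1024 bytes.
KIB = 1024
VENDOR_MIN_MB_S, VENDOR_MAX_MB_S = 108, 215


def kib_s_to_mb_s(kib_per_s):
    """Convert KiB/s to decimal MB/s."""
    return kib_per_s * KIB / 1_000_000


seq_write_mb_s = kib_s_to_mb_s(396505)   # Sequential Output
seq_read_mb_s = kib_s_to_mb_s(401117)    # Sequential Input
print("sequential write MB/s:", round(seq_write_mb_s, 1))
print("sequential read MB/s:", round(seq_read_mb_s, 1))

# If reads come from one disk per mirrored pair (2 streams)...
per_pair_read = seq_read_mb_s / 2
# ...versus being spread over all 4 disks.
per_disk_read = seq_read_mb_s / 4
print("read MB/s per pair:", round(per_pair_read, 1),
      "within vendor range:",
      VENDOR_MIN_MB_S <= per_pair_read <= VENDOR_MAX_MB_S)
print("read MB/s per disk if 4-way:", round(per_disk_read, 1))

# Random seeks divided evenly over the four spindles.
seeks_per_disk = 650.7 / 4
print("seeks/sec per disk:", round(seeks_per_disk, 1))
```

Read this way, the sequential read rate sits near the top of the vendor range for two disks, while the random seek rate works out to a per-spindle figure that is plausible only if all four disks share the load.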


Thread overview: 7+ messages
2016-03-17 20:45 Disk Benchmarking Question Dave Stibrany <[email protected]>
2016-03-17 21:11 ` Mike Sofen <[email protected]>
2016-03-18 14:48   ` Dave Stibrany <[email protected]>
2016-03-19 03:22     ` Mike Sofen <[email protected]>
2016-03-19 10:29 ` Scott Marlowe <[email protected]>
2016-03-19 10:32   ` Scott Marlowe <[email protected]>
2016-03-22 14:44     ` Dave Stibrany <[email protected]>
