public inbox for [email protected]  
10x faster sort performance on Skylake CPU vs Ivy Bridge
4+ messages / 3 participants

* 10x faster sort performance on Skylake CPU vs Ivy Bridge
@ 2017-08-25 14:12 Felix Geisendörfer <[email protected]>
  2017-08-25 15:07 ` Re: 10x faster sort performance on Skylake CPU vs Ivy Bridge Tom Lane <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Felix Geisendörfer @ 2017-08-25 14:12 UTC (permalink / raw)
  To: pgsql-performance

Hi,

I recently came across a performance difference between two machines that surprised me:

Postgres Version / OS on both machines: v9.6.3 / MacOS 10.12.5

Machine A: MacBook Pro Mid 2012, 2.7 GHz Intel Core i7 (Ivy Bridge), 8 MB L3 Cache, 16 GB 1600 MHz DDR3 [1]
Machine B: MacBook Pro Late 2016, 2.6 GHz Intel Core i7 (Skylake), 6 MB L3 Cache, 16 GB 2133 MHz LPDDR3 [2]

Query Performance on Machine A: [3]

CTE Scan on zulu  (cost=40673.620..40742.300 rows=3434 width=56) (actual time=6339.404..6339.462 rows=58 loops=1)
  CTE zulu
      ->  HashAggregate  (cost=40639.280..40673.620 rows=3434 width=31) (actual time=6339.400..6339.434 rows=58 loops=1)
              Group Key: mike.two, mike.golf
            ->  Unique  (cost=37656.690..40038.310 rows=34341 width=64) (actual time=5937.934..6143.161 rows=298104 loops=1)
                  ->  Sort  (cost=37656.690..38450.560 rows=317549 width=64) (actual time=5937.933..6031.925 rows=316982 loops=1)
                          Sort Key: mike.two, mike.lima, mike.echo DESC, mike.quebec
                          Sort Method: quicksort  Memory: 56834kB
                        ->  Seq Scan on mike  (cost=0.000..8638.080 rows=317549 width=64) (actual time=0.019..142.831 rows=316982 loops=1)
                                Filter: (golf five NOT NULL)
                                Rows Removed by Filter: 26426

Query Performance on Machine B: [4]

CTE Scan on zulu  (cost=40621.420..40690.100 rows=3434 width=56) (actual time=853.436..853.472 rows=58 loops=1)
  CTE zulu
      ->  HashAggregate  (cost=40587.080..40621.420 rows=3434 width=31) (actual time=853.433..853.448 rows=58 loops=1)
              Group Key: mike.two, mike.golf
            ->  Unique  (cost=37608.180..39986.110 rows=34341 width=64) (actual time=634.412..761.678 rows=298104 loops=1)
                  ->  Sort  (cost=37608.180..38400.830 rows=317057 width=64) (actual time=634.411..694.719 rows=316982 loops=1)
                          Sort Key: mike.two, mike.lima, mike.echo DESC, mike.quebec
                          Sort Method: quicksort  Memory: 56834kB
                        ->  Seq Scan on mike  (cost=0.000..8638.080 rows=317057 width=64) (actual time=0.047..85.534 rows=316982 loops=1)
                                Filter: (golf five NOT NULL)
                                Rows Removed by Filter: 26426

As you can see, Machine A spends 5889ms on the Sort Node vs 609ms on Machine B when looking at the "Exclusive" time with explain.depesz.com [3][4]. I.e. Machine B is ~10x faster at sorting than Machine A (for this particular query).

My question is: Why?

I understand that this is a 3rd gen CPU vs a 6th gen, and that things have gotten faster despite stagnant clock speeds, but seeing a 10x difference still caught me off guard.

Does anybody have some pointers to understand where those gains are coming from? Is it the CPU, memory, or both? And in particular, why does Sort benefit so massively from the advancement here (~10x), but Seq Scan, Unique and HashAggregate don't benefit as much (~2x)?
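
The per-node figures quoted above can be reproduced from the two plans. The following is a minimal sketch, assuming that "Exclusive" time is a node's total actual time minus the total actual time of its single child (every node here has loops=1). The names PLAN_A, PLAN_B and exclusive_times are made up for illustration; the numbers are the upper "actual time" bounds copied from the plans.

```python
# Upper "actual time" bounds (ms) copied from the two plans above,
# listed from the innermost node outward. Each node has exactly one child.
PLAN_A = [
    ("Seq Scan", 142.831),
    ("Sort", 6031.925),
    ("Unique", 6143.161),
    ("HashAggregate", 6339.434),
]
PLAN_B = [
    ("Seq Scan", 85.534),
    ("Sort", 694.719),
    ("Unique", 761.678),
    ("HashAggregate", 853.448),
]


def exclusive_times(plan):
    """Assumption: exclusive time = node total minus its child's total."""
    result = {}
    child_total = 0.0
    for name, total in plan:
        result[name] = total - child_total
        child_total = total
    return result


excl_a = exclusive_times(PLAN_A)
excl_b = exclusive_times(PLAN_B)
for name in excl_a:
    ratio = excl_a[name] / excl_b[name]
    print(f"{name:14s} A={excl_a[name]:9.3f} ms  B={excl_b[name]:8.3f} ms  A/B={ratio:5.2f}x")
```

Under that assumption the Sort node comes out at about 5889 ms vs 609 ms, while the other three nodes differ by roughly 1.7x to 2.1x, which matches the "~10x" and "~2x" figures in the question.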

As you can probably tell, my hardware knowledge is very superficial, so I apologize if this is a stupid question. But I'd genuinely like to improve my understanding and intuition about these things.

Cheers
Felix Geisendörfer

[1] http://www.everymac.com/systems/apple/macbook_pro/specs/macbook-pro-core-i7-2.7-15-mid-2012-retina-d...
[2] http://www.everymac.com/systems/apple/macbook_pro/specs/macbook-pro-core-i7-2.6-15-late-2016-retina-...
[3] https://explain.depesz.com/s/hmn
[4] https://explain.depesz.com/s/zVe

-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




* Re: 10x faster sort performance on Skylake CPU vs Ivy Bridge
  2017-08-25 14:12 10x faster sort performance on Skylake CPU vs Ivy Bridge Felix Geisendörfer <[email protected]>
@ 2017-08-25 15:07 ` Tom Lane <[email protected]>
  2017-08-25 15:43   ` Re: 10x faster sort performance on Skylake CPU vs Ivy Bridge Peter Geoghegan <[email protected]>
  2017-08-27 10:56   ` Re: 10x faster sort performance on Skylake CPU vs Ivy Bridge Felix Geisendörfer <[email protected]>
  0 siblings, 2 replies; 4+ messages in thread

From: Tom Lane @ 2017-08-25 15:07 UTC (permalink / raw)
  To: Felix Geisendörfer <[email protected]>; +Cc: pgsql-performance

Felix Geisendörfer <[email protected]> writes:
> I recently came across a performance difference between two machines that surprised me:
> ...
> As you can see, Machine A spends 5889ms on the Sort Node vs 609ms on Machine B when looking at the "Exclusive" time with explain.depesz.com [3][4]. I.e. Machine B is ~10x faster at sorting than Machine A (for this particular query).

I doubt this is a hardware issue, it's more likely that you're comparing
apples and oranges.  The first theory that springs to mind is that the
sort keys are strings and you're using C locale on the faster machine but
some non-C locale on the slower.  strcoll() is pretty darn expensive
compared to strcmp() :-(

			regards, tom lane
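
The difference between the two comparison functions can be tried outside the database. This is a rough sketch, not a measurement of PostgreSQL: it sorts made-up random ASCII keys once with a byte-wise comparison (a stand-in for strcmp()) and once with Python's locale.strcoll(), which calls the C library's strcoll(). The helper names (make_keys, timed_sort, bytewise_cmp, try_set_collation) are hypothetical, and the en_US.UTF-8 locale is assumed to be installed; if it is not, the second run is skipped.

```python
import functools
import locale
import random
import string
import time


def make_keys(n, seed=0):
    """Generate n random 12-letter ASCII strings (made-up test data)."""
    rng = random.Random(seed)
    return ["".join(rng.choices(string.ascii_letters, k=12)) for _ in range(n)]


def timed_sort(keys, cmp):
    """Sort with an explicit comparison function and return (result, seconds)."""
    start = time.perf_counter()
    ordered = sorted(keys, key=functools.cmp_to_key(cmp))
    return ordered, time.perf_counter() - start


def bytewise_cmp(a, b):
    # Stand-in for strcmp(): compare the encoded bytes, ignoring locale rules.
    ea, eb = a.encode("utf-8"), b.encode("utf-8")
    return (ea > eb) - (ea < eb)


def try_set_collation(name):
    """Return True if the collation locale could be activated."""
    try:
        locale.setlocale(locale.LC_COLLATE, name)
        return True
    except locale.Error:
        return False


keys = make_keys(20_000)
c_sorted, c_secs = timed_sort(keys, bytewise_cmp)
print(f"byte-wise compare: {c_secs:.3f}s")

if try_set_collation("en_US.UTF-8"):
    loc_sorted, loc_secs = timed_sort(keys, locale.strcoll)
    print(f"locale.strcoll:    {loc_secs:.3f}s")
else:
    print("en_US.UTF-8 is not installed here; skipping the strcoll() run")
```

Python's per-call overhead is large compared to either C function, so the ratio printed here will likely understate the gap a C-level sort would see. The point is only that the same keys can sort at different speeds, and into a different order, depending on the collation in effect.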



* Re: 10x faster sort performance on Skylake CPU vs Ivy Bridge
  2017-08-25 14:12 10x faster sort performance on Skylake CPU vs Ivy Bridge Felix Geisendörfer <[email protected]>
  2017-08-25 15:07 ` Re: 10x faster sort performance on Skylake CPU vs Ivy Bridge Tom Lane <[email protected]>
@ 2017-08-25 15:43   ` Peter Geoghegan <[email protected]>
  1 sibling, 0 replies; 4+ messages in thread

From: Peter Geoghegan @ 2017-08-25 15:43 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Felix Geisendörfer <[email protected]>; pgsql-performance

On Fri, Aug 25, 2017 at 8:07 AM, Tom Lane <[email protected]> wrote:
> I doubt this is a hardware issue, it's more likely that you're comparing
> apples and oranges.  The first theory that springs to mind is that the
> sort keys are strings and you're using C locale on the faster machine but
> some non-C locale on the slower.  strcoll() is pretty darn expensive
> compared to strcmp() :-(

strcoll() is very noticeably slower on macOS, too.

-- 
Peter Geoghegan



* Re: 10x faster sort performance on Skylake CPU vs Ivy Bridge
  2017-08-25 14:12 10x faster sort performance on Skylake CPU vs Ivy Bridge Felix Geisendörfer <[email protected]>
  2017-08-25 15:07 ` Re: 10x faster sort performance on Skylake CPU vs Ivy Bridge Tom Lane <[email protected]>
@ 2017-08-27 10:56   ` Felix Geisendörfer <[email protected]>
  1 sibling, 0 replies; 4+ messages in thread

From: Felix Geisendörfer @ 2017-08-27 10:56 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: pgsql-performance


> On Aug 25, 2017, at 17:07, Tom Lane <[email protected]> wrote:
> 
> Felix Geisendörfer <[email protected]> writes:
>> I recently came across a performance difference between two machines that surprised me:
>> ...
>> As you can see, Machine A spends 5889ms on the Sort Node vs 609ms on Machine B when looking at the "Exclusive" time with explain.depesz.com [3][4]. I.e. Machine B is ~10x faster at sorting than Machine A (for this particular query).
> 
> I doubt this is a hardware issue, it's more likely that you're comparing
> apples and oranges.  The first theory that springs to mind is that the
> sort keys are strings and you're using C locale on the faster machine but
> some non-C locale on the slower.  strcoll() is pretty darn expensive
> compared to strcmp() :-(

You're right, that seems to be it.

Machine A was using strcoll() (lc_collate=en_US.UTF-8)
Machine B was using strcmp() (lc_collate=C)

After switching Machine A to use lc_collate=C, I get:

CTE Scan on zulu  (cost=40673.620..40742.300 rows=3434 width=56) (actual time=1368.610..1368.698 rows=58 loops=1)
  CTE zulu
      ->  HashAggregate  (cost=40639.280..40673.620 rows=3434 width=56) (actual time=1368.607..1368.659 rows=58 loops=1)
              Group Key: mike.two, ((mike.golf)::text)
            ->  Unique  (cost=37656.690..40038.310 rows=34341 width=104) (actual time=958.493..1168.128 rows=298104 loops=1)
                  ->  Sort  (cost=37656.690..38450.560 rows=317549 width=104) (actual time=958.491..1055.635 rows=316982 loops=1)
                          Sort Key: mike.two, ((mike.lima)::text) COLLATE "papa", mike.echo DESC, mike.quebec
                          Sort Method: quicksort  Memory: 56834kB
                        ->  Seq Scan on mike  (cost=0.000..8638.080 rows=317549 width=104) (actual time=0.043..172.496 rows=316982 loops=1)
                                Filter: (golf five NOT NULL)
                                Rows Removed by Filter: 26426

So Machine A needs 883ms [1] for the sort vs 609ms [2] for Machine B. That makes Machine B ~1.4x faster, which seems reasonable :).
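
As a worked check of these numbers, here is a small sketch that separates the collation effect from the remaining hardware gap. It assumes the Sort node's exclusive time is its total actual time minus the Seq Scan total beneath it; the RUNS table and variable names are made up for illustration, and the values are copied from the three plans in this thread.

```python
# (Sort total, Seq Scan total) in ms, taken from the upper "actual time" bounds.
RUNS = {
    "A, en_US.UTF-8": (6031.925, 142.831),
    "A, C": (1055.635, 172.496),
    "B, C": (694.719, 85.534),
}

# Assumption: exclusive Sort time = Sort total minus its child's total.
sort_ms = {
    label: sort_total - scan_total
    for label, (sort_total, scan_total) in RUNS.items()
}

for label, ms in sort_ms.items():
    print(f"{label:16s} sort = {ms:8.3f} ms")

# How much Machine A gained from switching collation, and what gap remains.
collation_gain = sort_ms["A, en_US.UTF-8"] / sort_ms["A, C"]
hardware_gap = sort_ms["A, C"] / sort_ms["B, C"]
print(f"collation change on A: {collation_gain:.2f}x")
print(f"remaining A vs B gap:  {hardware_gap:.2f}x")
```

Read this way, most of the original ~10x difference is attributable to the collation setting, and only the remaining factor of about 1.4 to 1.5 is left to explain by hardware.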

Sorry for the delayed response, I didn't have access to machine B to confirm this right away.

> 			regards, tom lane

This is my first post to a PostgreSQL mailing list, but I've been lurking
for a while. Thank you for taking the time to reply to e-mails such
as mine, and for all the work you've put into PostgreSQL over the years.
I'm deeply grateful.

> On Aug 25, 2017, at 17:43, Peter Geoghegan <[email protected]> wrote:
> 
> On Fri, Aug 25, 2017 at 8:07 AM, Tom Lane <[email protected]> wrote:
>> I doubt this is a hardware issue, it's more likely that you're comparing
>> apples and oranges.  The first theory that springs to mind is that the
>> sort keys are strings and you're using C locale on the faster machine but
>> some non-C locale on the slower.  strcoll() is pretty darn expensive
>> compared to strcmp() :-(
> 
> strcoll() is very noticeably slower on macOS, too.
> 

Thanks. This immediately explains what I saw when testing this query on a Linux machine that was also using lc_collate=en_US.UTF-8 but was not slowed down by it as much as the macOS machine.

[1] https://explain.depesz.com/s/LOqa
[2] https://explain.depesz.com/s/zVe
