public inbox for [email protected]  
From: Bryan Green <[email protected]>
To: Ranier Vilela <[email protected]>
Cc: Pg Hackers <[email protected]>
Subject: Re: Avoid multiple calls to memcpy (src/backend/access/index/genam.c)
Date: Thu, 12 Mar 2026 12:48:38 -0500
Message-ID: <CAF+pBj-pAGnTh2un8RGcDqSYuMnwGhXv5_MteB77FNjf-Af=tg@mail.gmail.com> (raw)
In-Reply-To: <CAEudQApqk6DXWgqSBdHyH7+wSxJuk7D-DwkGODUcGkUWpYu0UA@mail.gmail.com>
References: <CAEudQApbWon+3Eb9x4WW_D-JkSt2mvfx99dXu9VZ4AeCuTh=fw@mail.gmail.com>
	<CAEudQApEfvhNT1fEPURzVcQH7G0A1ukh_ugoCGaErV6_dbndCQ@mail.gmail.com>
	<CAF+pBj_RS2KErTqQ6ORXjhVzmukG7Ve0wHU1Kq56xjJfFKwVqA@mail.gmail.com>
	<CAEudQAptRymgvmd5hQb2mk-Ft89XcSo_xvC74kv4JBA9v=D4Sg@mail.gmail.com>
	<CAF+pBj-K2bgNQRc9ih01WFmAWUaQtVbS37jLtYdYh5LOwOkF6A@mail.gmail.com>
	<CAEudQApqk6DXWgqSBdHyH7+wSxJuk7D-DwkGODUcGkUWpYu0UA@mail.gmail.com>

I don't think your version 1 memcpy is doing what you think it is doing.
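
A sketch of what I mean. This is an assumption about the cause, not a
diagnosis of memcpy1.c, which is not reproduced here. 1873 ns for 10000000
loops is about 0.0002 ns per loop, far below the cost of a single instruction,
so the likely explanation is that the compiler removed a copy whose result is
never read. The harness below keeps both variants alive. DemoKey, keep_alive,
copy_keys_bulk, copy_keys_each and bench are hypothetical names for
illustration, and the empty asm barrier is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a scan key; NOT PostgreSQL's ScanKeyData. */
typedef struct DemoKey
{
	int			flags;
	int			attno;
	uint64_t	argument;
} DemoKey;

/*
 * Compiler barrier (GCC/Clang extension).  It tells the optimizer that the
 * memory reachable from p may be read or written here, so stores made by a
 * preceding memcpy cannot be discarded as dead.
 */
static inline void
keep_alive(void *p)
{
	__asm__ volatile("" : : "r"(p) : "memory");
}

/* Variant A: a single memcpy whose size depends on nkeys at run time. */
static void
copy_keys_bulk(DemoKey *dst, const DemoKey *src, size_t nkeys)
{
	memcpy(dst, src, nkeys * sizeof(DemoKey));
}

/* Variant B: one fixed-size memcpy per key, which compilers can inline. */
static void
copy_keys_each(DemoKey *dst, const DemoKey *src, size_t nkeys)
{
	for (size_t i = 0; i < nkeys; i++)
		memcpy(&dst[i], &src[i], sizeof(DemoKey));
}

/*
 * Run `loops` copies.  The input changes on every iteration and a checksum
 * of the output is returned, so the copy cannot be hoisted or eliminated.
 */
static uint64_t
bench(void (*copy) (DemoKey *, const DemoKey *, size_t),
	  DemoKey *dst, DemoKey *src, size_t nkeys, size_t loops)
{
	uint64_t	sum = 0;

	assert(nkeys > 0);
	for (size_t l = 0; l < loops; l++)
	{
		src[0].argument = l;	/* make each iteration's input distinct */
		keep_alive(src);
		copy(dst, src, nkeys);
		keep_alive(dst);
		sum += dst[0].argument; /* consume the result */
	}
	return sum;
}
```

Wrap each bench() call in clock_gettime(CLOCK_MONOTONIC, ...) and print the
returned checksum. If the time per loop is still implausibly small, the
compiler is still removing work somewhere.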

On Thu, Mar 12, 2026 at 12:35 PM Ranier Vilela <[email protected]> wrote:

> Hi.
>
> On Mon, Mar 9, 2026 at 14:02, Bryan Green <[email protected]> wrote:
>
>> I ran a micro-benchmark on my dual EPYC (Zen 2) server, and version 1
>> wins for small values of n.
>>
>> 20 runs:
>>
>> n       version       min  median    mean     max  stddev  noise%
>> -----------------------------------------------------------------------
>> n=1     version1     2.440   2.440   2.450   2.550   0.024    4.5%
>> n=1     version2     4.260   4.280   4.277   4.290   0.007    0.7%
>>
>> n=2     version1     2.740   2.750   2.757   2.880   0.029    5.1%
>> n=2     version2     3.970   3.980   3.980   4.020   0.010    1.3%
>>
>> n=4     version1     4.580   4.595   4.649   4.910   0.094    7.2%
>> n=4     version2     5.780   5.815   5.809   5.820   0.013    0.7%
>>
>> But micro-benchmarks always make me nervous, so I looked at the actual
>> instruction cost on my platform for the version 1 and version 2 code.
>>
>> Counting CPU cycles using the AMD Zen 2 instruction latency/throughput
>> tables: the version 1 loop body has a critical path of ~5-6 cycles per
>> iteration, and the version 2 loop body has ~3-4 cycles per iteration.
>>
>> The problem for version 2 is that the call to memcpy costs ~24-30 cycles,
>> due to the stub + function call + return and branch predictor pressure on
>> the first call.  This probably results in a cost of ~2.5 ns per iteration
>> for version 2.
>>
>> So, no, I wouldn't call it an optimization.  But it will be interesting
>> to hear other opinions on this.
>>
> I ran quick and dirty tests with the two versions:
> gcc 15.2.0
> gcc -O2 memcpy1.c -o memcpy1
>
> The first test used 10000000 keys and 10000000 loops:
> version1: one memcpy call
> done in 1873 nanoseconds
>
> version2: inlined memcpy
> did not finish
>
> The second test used 4 keys and 10000000 loops:
> version1: one memcpy call
> version2: inlined memcpy call
>
> version1: done in 1519 nanoseconds
> version2: done in 104981851 nanoseconds
> (1.44692e-05 times faster)
>
> version1: done in 1979 nanoseconds
> version2: done in 110568901 nanoseconds
> (1.78983e-05 times faster)
>
> version1: done in 1814 nanoseconds
> version2: done in 108555484 nanoseconds
> (1.67103e-05 times faster)
>
> version1: done in 1631 nanoseconds
> version2: done in 109867919 nanoseconds
> (1.48451e-05 times faster)
>
> version1: done in 1269 nanoseconds
> version2: done in 111639106 nanoseconds
> (1.1367e-05 times faster)
>
> Unless I'm doing something wrong, the single memcpy call wins!
> memcpy1.c attached.
>
> best regards,
> Ranier Vilela
>

