Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w0jwQ-002ATf-2j for pgsql-hackers@arkaria.postgresql.org; Thu, 12 Mar 2026 17:35:22 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w0jwP-00GCwS-0I for pgsql-hackers@arkaria.postgresql.org; Thu, 12 Mar 2026 17:35:21 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w0jwO-00GCwK-2U for pgsql-hackers@lists.postgresql.org; Thu, 12 Mar 2026 17:35:21 +0000 Received: from mail-dy1-x132b.google.com ([2607:f8b0:4864:20::132b]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1w0jwM-00000002Kgr-2dbl for pgsql-hackers@postgresql.org; Thu, 12 Mar 2026 17:35:20 +0000 Received: by mail-dy1-x132b.google.com with SMTP id 5a478bee46e88-2be1ab1fa7dso719889eec.0 for ; Thu, 12 Mar 2026 10:35:18 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1773336917; cv=none; d=google.com; s=arc-20240605; b=LFhZ7nZ8HMXcg0oCVfkY63ymwtSicDrUc/Mg8h/dbM/SVlnI3zY9f8PRBs1UtzWTO4 ryz62UAA7UkhJalpL+bzVei816n/YTXDYr7lp7iPqzwx+FYlWL9Kcg2nq42LJ1baiYjp TMcMFSqLcaXyUWpQ6QYEoP9MrDLb1KXQmR7d5iNmq3Zua/xAxjoOI7xgQ3hdhIiKMOdu vYlIR3uzF0Sq7+2Txgy5vaL6nzcawVIIlgqPsB+HuX0169YNhVbHBPRq8ud29vJnlt8/ RQqiZxY4wTLvVvyOxceH1Hbo9DBsKJaM7F7Prvp+T8fXC+q69ORpOip15PLV3C/uweUg ONOQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:dkim-signature; bh=KCdfN3GCSgaz2ZIuiPBAy3uqIELL6vTN+5TKhC3bwVo=; fh=BawSWetgE9wsa7rjWUMyEEUS/e58KHEh4Qu390NnxeE=; b=jo0RxuOx5viZciahASR/QtYzHHv4nxcohrgLMpP71g6tc2ZpbrkP6VslCfImLs96lP 
NJzr5SU9znsDoixNv3DGeH7roUuxhb8wSAWuMBm+g+eIJ6cqtTiem6drKl8dHZWZz9x7 6nldlSI3iofyRLts2FdQ8LamP8RQuJbHUfa7HIcHBA2MyltloXUP+7K6PO03ptusgdic eHnqwz+jGJxzzvTkf1jxKvKN5CJjHfNaTSU0weHsyttazT0NNprJzi3aiUaBjTTFf4AE /bQoGr2RaUkdYbYDLbNkHaoHpLk3XhCgSe1lROfxfxyweFgASGWkog5BiQs8/ys/ScMg 0iWA==; darn=postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1773336917; x=1773941717; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=KCdfN3GCSgaz2ZIuiPBAy3uqIELL6vTN+5TKhC3bwVo=; b=aHb9GFbUyVNzG+FGHRzuzksjE0uIROqHSEyx0S/xZhCKZL7vG9UNkDP0/u7Qkexcoe lLNbnfw3KTcnQhiGZTzrbXfDbgP3WgZa/IPA0sxHJ/VVhYXNMgMJyzC0t2TZPuy/JKAy Mi3eAmmLwJ8naWCaXnWmw0JcTaN0XkN9ua6yfryMIPYKH5/KdUJ1JWZ86v5WAOt9+iSJ 9igQ7fSirkaEaTJRuRI/M/734I0M2jRKNIy/srUfvcGWdbQK4ooGkq0FpVLl8zTvpOOH 1MtJtAFHDyuDZnsXxJQzATxlXYhmw8rlSskSxct5r8lo5IzyG58RVe4yz9zVB3lQYyey gJVg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1773336917; x=1773941717; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=KCdfN3GCSgaz2ZIuiPBAy3uqIELL6vTN+5TKhC3bwVo=; b=HfBo7dvJX4ZNKYknQQ5Q+3bVyboni1q28PYdy0j33FvL83LeHqDPeOtBazta7ezncK 1rfoq9RGwk1o5rabYTkUx1rsRgACMSZJab0JO8JuEKe6v+asKI6FR5+DicuuEKKhlFEx vKwx2MgduMNcGTsR70EaMSPA/K329+YG1vvZF5oz58Td3qXzi+aQfrbDvCxe9/34NYUK HV800Oo6C11TNOw9i/zJvbrrFJUy8Lx4/6LXun0LnaQdlWldUkAdQV/bb1g2SOsZpwzf pDCF7F5CmLZ/tZqyBM/w7C14RiWInY/4Ox7FlU2n693yqOxZ3r8qfK+9uIgkaRSUoiqj 4MsQ== X-Gm-Message-State: AOJu0Yx08W3pI9bPfJHn+QbjfCRSogSZOxIhKbvge+D/Wxxr9P6S6oHy A1Mh3eNagGfqRcxMX/sLYiqrh6kfRp0aJ8UpPjnOgaIF/fdZT0jAnTCNfzFo3oHiUh7USSKFsWT qkPB2ARBwHwEm3coDYqgg+6bNHYWBPjg= X-Gm-Gg: ATEYQzxqGxODus8jipJZyjpXnBwMl2/eMjeWTrImhZD4T6yliOZOqi1w1cHcZxXC0rc 
lIR4d00gua1j/j7jMWFognOoyWDdOFvY2BCGod3WcbjDbHygDU5/aV1etrsQFpBCnu2AJh7umKs HSZaWrVSECrLmSmcxn/APGPY2gVHpI52md95U+3ygZQvISmwyDNgoG2IEcKlzgnnwVKK7BFOncE oBHizCJUJvG9i86PGZTT/OJGtv9VDNidOn/kMJ89ify5B7bBtpi4Yr0ASDQhlqDzLChZ1/Y6Aq/ ETmj8lg=
X-Received: by 2002:a05:7300:a28d:b0:2b8:6abf:5ebf with SMTP id 5a478bee46e88-2bea5456eaamr257923eec.12.1773336916536; Thu, 12 Mar 2026 10:35:16 -0700 (PDT)
MIME-Version: 1.0
References:
In-Reply-To:
From: Ranier Vilela
Date: Thu, 12 Mar 2026 11:35:04 -0300
X-Gm-Features: AaiRm513Cmlc1Cd9r1nNK6pk-Ust0ZLYpKId9Ilv_k-yUPhiNbJq_y0amMiN5Es
Message-ID:
Subject: Re: Avoid multiple calls to memcpy (src/backend/access/index/genam.c)
To: Bryan Green
Cc: Pg Hackers
Content-Type: multipart/mixed; boundary="000000000000faaf27064cd72c94"
List-Id:
List-Help:
List-Subscribe:
List-Post:
List-Owner:
List-Archive:
Archived-At:
Precedence: bulk

--000000000000faaf27064cd72c94
Content-Type: multipart/alternative; boundary="000000000000faaf25064cd72c92"

--000000000000faaf25064cd72c92
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit

Hi.

On Mon, Mar 9, 2026 at 14:02, Bryan Green wrote:

> I performed a micro-benchmark on my dual epyc (zen 2) server and version 1
> wins for small values of n.
>
> 20 runs:
>
> n     version    min     median  mean    max     stddev  noise%
> -----------------------------------------------------------------------
> n=1   version1   2.440   2.440   2.450   2.550   0.024   4.5%
> n=1   version2   4.260   4.280   4.277   4.290   0.007   0.7%
>
> n=2   version1   2.740   2.750   2.757   2.880   0.029   5.1%
> n=2   version2   3.970   3.980   3.980   4.020   0.010   1.3%
>
> n=4   version1   4.580   4.595   4.649   4.910   0.094   7.2%
> n=4   version2   5.780   5.815   5.809   5.820   0.013   0.7%
>
> But, micro-benchmarks always make me nervous, so I looked at the actual
> instruction cost for my platform given the version 1 and version 2 code.
>
> If we count cpu cycles using the AMD Zen 2 instruction latency/throughput
> tables: version 1 (loop body) has a critical path of ~5-6 cycles per
> iteration. version 2 (loop body) has ~3-4 cycles per iteration.
>
> The problem for version 2 is that the call to memcpy is ~24-30 cycles due
> to the stub + function call + return and branch predictor pressure on
> first call. This probably results in ~2.5 ns per iteration cost for
> version 2.
>
> So, no I wouldn't call it an optimization. But, it will be interesting to
> hear other opinions on this.

I ran some quick and dirty tests with the two versions:

gcc 15.2.0
gcc -O2 memcpy1.c -o memcpy1

The first test used 10000000 keys and 10000000 loops:

version1: one memcpy call
done in 1873 nanoseconds

version2: inlined memcpy
did not finish

The second test used 4 keys and 10000000 loops:

version1: one memcpy call
version2: inlined memcpy call

version1: done in 1519 nanoseconds
version2: done in 104981851 nanoseconds
(1.44692e-05 times faster)

version1: done in 1979 nanoseconds
version2: done in 110568901 nanoseconds
(1.78983e-05 times faster)

version1: done in 1814 nanoseconds
version2: done in 108555484 nanoseconds
(1.67103e-05 times faster)

version1: done in 1631 nanoseconds
version2: done in 109867919 nanoseconds
(1.48451e-05 times faster)

version1: done in 1269 nanoseconds
version2: done in 111639106 nanoseconds
(1.1367e-05 times faster)

Unless I'm doing something wrong, the single memcpy call wins!

memcpy1.c attached.

best regards,
Ranier Vilela
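
For reference, here is a minimal sketch of the two copy strategies being compared, loosely modeled on the attached memcpy1.c. It is not the attachment itself. DemoKey is a hypothetical stand-in for ScanKeyData, and the names copy_bulk, copy_each and copies_match are made up for illustration. The bulk copy here passes the buffers themselves (idxkey, key) to memcpy. The sketch only checks that both variants produce the same array; it does not measure anything.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ScanKeyData; not PostgreSQL's real definition. */
typedef struct DemoKey
{
    int       sk_flags;
    int       sk_attno;
    uintptr_t sk_argument;
} DemoKey;

/* Variant 1: one bulk memcpy, then renumber sk_attno in a second pass. */
DemoKey *
copy_bulk(int n, const DemoKey *key)
{
    DemoKey *idxkey = malloc((size_t) n * sizeof(DemoKey));

    if (idxkey == NULL)
        return NULL;
    memcpy(idxkey, key, (size_t) n * sizeof(DemoKey));
    for (int i = 0; i < n; i++)
        idxkey[i].sk_attno = i + 1;
    return idxkey;
}

/* Variant 2: one memcpy per element, renumbering inside the same loop. */
DemoKey *
copy_each(int n, const DemoKey *key)
{
    DemoKey *idxkey = malloc((size_t) n * sizeof(DemoKey));

    if (idxkey == NULL)
        return NULL;
    for (int i = 0; i < n; i++)
    {
        memcpy(&idxkey[i], &key[i], sizeof(DemoKey));
        idxkey[i].sk_attno = i + 1;
    }
    return idxkey;
}

/* Returns 1 when both variants yield byte-identical arrays for n >= 1 keys. */
int
copies_match(int n)
{
    DemoKey *src, *a, *b;
    int      same;

    assert(n > 0);
    src = calloc((size_t) n, sizeof(DemoKey));
    if (src == NULL)
        return 0;
    for (int i = 0; i < n; i++)
    {
        src[i].sk_flags = i;
        src[i].sk_argument = (uintptr_t) i * 10;
    }
    a = copy_bulk(n, src);
    b = copy_each(n, src);
    same = a != NULL && b != NULL &&
        memcmp(a, b, (size_t) n * sizeof(DemoKey)) == 0;
    free(a);
    free(b);
    free(src);
    return same;
}
```

A timing harness would call copy_bulk and copy_each in separate loops. At -O2 the compiler can probably discard a malloc/copy/free sequence whose result is never read, so the returned buffer likely needs to be consumed somehow before it is freed.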
--000000000000faaf25064cd72c92-- --000000000000faaf27064cd72c94 Content-Type: text/x-csrc; charset="US-ASCII"; name="memcpy1.c" Content-Disposition: attachment; filename="memcpy1.c" Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_mmnkjb540 I2luY2x1ZGUgPHN0ZGxpYi5oPg0KI2luY2x1ZGUgPHN0cmluZy5oPg0KI2luY2x1ZGUgPHN0ZGlu dC5oPg0KDQojaW5jbHVkZSA8c3RkYm9vbC5oPg0KI2luY2x1ZGUgPHN0ZGRlZi5oPg0KI2luY2x1 ZGUgPHN0ZGlvLmg+DQojaW5jbHVkZSA8dGltZS5oPg0KI2luY2x1ZGUgPGltbWludHJpbi5oPg0K DQoNCi8qIENsb3NlciBhcHByb3hpbWF0aW9uIG9mIFNjYW5LZXlEYXRhIC0gaGFzIGZ1bmN0aW9u IHBvaW50ZXIgYW5kIERhdHVtICovDQp0eXBlZGVmIHZvaWQgKCpSZWdQcm9jZWR1cmUpKHZvaWQp Ow0KdHlwZWRlZiB1aW50cHRyX3QgRGF0dW07DQoNCnR5cGVkZWYgc3RydWN0IFNjYW5LZXlEYXRh DQp7DQogICAgaW50ICAgICAgICAgc2tfZmxhZ3M7DQogICAgaW50ICAgICAgICAgc2tfYXR0bm87 DQogICAgUmVnUHJvY2VkdXJlIHNrX2Z1bmM7DQogICAgRGF0dW0gICAgICAgc2tfYXJndW1lbnQ7 DQp9IFNjYW5LZXlEYXRhOw0KDQovKiAgKi8NCmNvbnN0IFNjYW5LZXlEYXRhICogdmVyc2lvbjEo aW50IG4sIGNvbnN0IFNjYW5LZXlEYXRhICAqIGtleSkNCnsNCiAgICBTY2FuS2V5RGF0YSAqaWR4 a2V5ID0gKFNjYW5LZXlEYXRhICopIG1hbGxvYyhuICogc2l6ZW9mKFNjYW5LZXlEYXRhKSk7DQoN CiAgICBtZW1jcHkoJmlkeGtleSwgJmtleSwgbiAqIHNpemVvZihTY2FuS2V5RGF0YSkpOw0KICAg IGZvciAoaW50IGkgPSAwOyBpIDwgbjsgaSsrKQ0KICAgIHsNCiAgICAgICAgaWR4a2V5W2ldLnNr X2F0dG5vID0gaSArIDE7DQogICAgfQ0KDQogICAgcmV0dXJuIGlkeGtleTsNCn0NCg0KLyogKi8N CmNvbnN0IFNjYW5LZXlEYXRhICogdmVyc2lvbjIoaW50IG4sIGNvbnN0IFNjYW5LZXlEYXRhICpr ZXkpDQp7DQogICAgU2NhbktleURhdGEgKmlkeGtleSA9IChTY2FuS2V5RGF0YSAqKSBtYWxsb2Mo biAqIHNpemVvZihTY2FuS2V5RGF0YSkpOw0KDQogICAgZm9yIChpbnQgaSA9IDA7IGkgPCBuOyBp KyspDQogICAgew0KICAgICAgICBtZW1jcHkoJmlkeGtleVtpXSwgJmtleVtpXSwgc2l6ZW9mKFNj YW5LZXlEYXRhKSk7DQogICAgICAgIGlkeGtleVtpXS5za19hdHRubyA9IGkgKyAxOw0KICAgIH0N Cg0KICAgIHJldHVybiBpZHhrZXk7DQp9DQoNCg0KDQoNCiNkZWZpbmUgTkFOT1NFQ19QRVJfU0VD IDEwMDAwMDAwMDANCg0KLy8gUmV0dXJucyBkaWZmZXJlbmNlIGluIG5hbm9zZWNvbmRzDQppbnQ2 NF90DQpnZXRfY2xvY2tfZGlmZihzdHJ1Y3QgdGltZXNwZWMgKnQxLCBzdHJ1Y3QgdGltZXNwZWMg 
KnQyKQ0Kew0KCWludDY0X3QgbmFub3NlYyA9ICh0MS0+dHZfc2VjIC0gdDItPnR2X3NlYykgKiBO QU5PU0VDX1BFUl9TRUM7DQoJbmFub3NlYyArPSAodDEtPnR2X25zZWMgLSB0Mi0+dHZfbnNlYyk7 DQoNCglyZXR1cm4gbmFub3NlYzsNCn0NCg0KDQoNCg0KLy8jZGVmaW5lIE5LRVlTIDEwMDAwMDAw IHZlcnNpb24yIGRvZXMgbm90IGZpbmlzaA0KI2RlZmluZSBOS0VZUyA0DQojZGVmaW5lIExPT1BT IDEwMDAwMDAwDQoNCnZvaWQgdGVzdDEoaW50IG4pDQp7DQoJU2NhbktleURhdGEgKmtleXM7DQoJ U2NhbktleURhdGEgKmlkeDsNCg0KICAgICAgICBrZXlzID0gKFNjYW5LZXlEYXRhICopIG1hbGxv YyhOS0VZUyAqIHNpemVvZihTY2FuS2V5RGF0YSkpOw0KCW1lbXNldChrZXlzLCAwLCBOS0VZUyAq IHNpemVvZihTY2FuS2V5RGF0YSkpOw0KDQoJZm9yKGludCBpID0gMDsgaSA8IG47IGkrKykNCgl7 DQoJCWlkeCA9IHZlcnNpb24xKE5LRVlTLCBrZXlzKTsNCgkJZnJlZShpZHgpOw0KCX0NCglmcmVl KGtleXMpOw0KfQ0KDQp2b2lkIHRlc3QyKGludCBuKQ0Kew0KCVNjYW5LZXlEYXRhICprZXlzOw0K CVNjYW5LZXlEYXRhICppZHg7DQoNCiAgICAgICAga2V5cyA9IChTY2FuS2V5RGF0YSAqKSBtYWxs b2MoTktFWVMgKiBzaXplb2YoU2NhbktleURhdGEpKTsNCgltZW1zZXQoa2V5cywgMCwgTktFWVMg KiBzaXplb2YoU2NhbktleURhdGEpKTsNCg0KCWZvcihpbnQgaSA9IDA7IGkgPCBuOyBpKyspDQoJ ew0KCQlpZHggPSB2ZXJzaW9uMihOS0VZUywga2V5cyk7DQoJCWZyZWUoaWR4KTsNCgl9DQoJZnJl ZShrZXlzKTsNCn0NCg0KDQppbnQgbWFpbih2b2lkKQ0Kew0KCXN0cnVjdCB0aW1lc3BlYyBzdGFy dCxlbmQ7DQoJaW50NjRfdCB2ZXJzaW9uMV90aW1lLCB2ZXJzaW9uMl90aW1lOw0KDQoJY2xvY2tf Z2V0dGltZShDTE9DS19QUk9DRVNTX0NQVVRJTUVfSUQsICZzdGFydCk7DQoJdGVzdDEoTE9PUFMp Ow0KCWNsb2NrX2dldHRpbWUoQ0xPQ0tfUFJPQ0VTU19DUFVUSU1FX0lELCAmZW5kKTsNCgl2ZXJz aW9uMV90aW1lID0gZ2V0X2Nsb2NrX2RpZmYoJmVuZCwgJnN0YXJ0KTsNCglwcmludGYoInZlcnNp b24xOiBkb25lIGluICVsbGQgbmFub3NlY29uZHNcbiIsIHZlcnNpb24xX3RpbWUpOwkNCg0KCWNs b2NrX2dldHRpbWUoQ0xPQ0tfUFJPQ0VTU19DUFVUSU1FX0lELCAmc3RhcnQpOw0KCXRlc3QyKExP T1BTKTsNCgljbG9ja19nZXR0aW1lKENMT0NLX1BST0NFU1NfQ1BVVElNRV9JRCwgJmVuZCk7DQoJ dmVyc2lvbjJfdGltZSA9IGdldF9jbG9ja19kaWZmKCZlbmQsICZzdGFydCk7DQoJcHJpbnRmKCJ2 ZXJzaW9uMjogZG9uZSBpbiAlbGxkIG5hbm9zZWNvbmRzXG4iLCB2ZXJzaW9uMl90aW1lKTsJDQoN CglwcmludGYoIiglZyB0aW1lcyBmYXN0ZXIpXG4iLCAoZG91YmxlKSB2ZXJzaW9uMV90aW1lIC8g dmVyc2lvbjJfdGltZSk7DQoNCglyZXR1cm4gMDsNCn0NCg== 
--000000000000faaf27064cd72c94--