From: Ranier Vilela
Date: Thu, 12 Mar 2026 16:27:13 -0300
Subject: Re: Avoid multiple calls to memcpy (src/backend/access/index/genam.c)
To: Bryan Green
Cc: Pg Hackers

On Thu, 12 Mar 2026 at 16:21, Bryan Green wrote:

> I modified your memcpy1.c program so that the version functions are not
> inlined. I changed the memcpy call in version 1, added volatile to prevent
> some dead-code elimination (DCE), and added a range of N values to keep the
> compiler from specializing the code for N = 4. Before these changes, the
> compiler applied DCE and the test1 function compiled to a bare ret.
>
> The interesting issue is the use of malloc versus the stack. The use of
> malloc will probably track PG's use of palloc more closely, so in that case
> I would say this is an optimization. It might be fun to compile PG with and
> without the patch (in debug mode) and actually see what gets generated for
> this function.
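Bryan's modified memcpy1.c is not reproduced in this message, so the following is only a minimal C sketch of the methodology he describes, for anyone who wants to repeat this kind of measurement. It assumes the patch replaces one memcpy per element inside a loop with a single bulk memcpy; the `Key` struct and the functions `copy_bulk`, `copy_per_element`, `time_copy` and `run_benchmark` are hypothetical and are not the genam.c code. It relies on the GCC/Clang `noinline` attribute, a `volatile` sink against dead-code elimination, a range of n values so no single size gets specialized, and POSIX `clock_gettime()`.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() */

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define MAXN 8 /* benchmark n = 1..MAXN */

/* Hypothetical stand-in for a scan-key-like struct (NOT PostgreSQL's ScanKeyData). */
typedef struct Key
{
    int   attno;
    int   flags;
    long  arg;
    char  pad[48];
} Key;

typedef void (*copy_fn) (Key *dst, const Key *src, int n);

/* Assumed "patch" shape: a single bulk memcpy, then per-element fixups. */
__attribute__((noinline)) void
copy_bulk(Key *dst, const Key *src, int n)
{
    memcpy(dst, src, (size_t) n * sizeof(Key));
    for (int i = 0; i < n; i++)
        dst[i].attno = i + 1;
}

/* Assumed "original" shape: one memcpy per element inside the loop. */
__attribute__((noinline)) void
copy_per_element(Key *dst, const Key *src, int n)
{
    for (int i = 0; i < n; i++)
    {
        memcpy(&dst[i], &src[i], sizeof(Key));
        dst[i].attno = i + 1;
    }
}

/* Each iteration stores a copied field here; because the store is volatile
 * it cannot be removed, so the copies are not treated as dead code. */
static volatile long sink;

/* Call fn `iters` times copying n keys; return elapsed monotonic time in ns. */
long long
time_copy(copy_fn fn, Key *dst, const Key *src, int n, long iters)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long it = 0; it < iters; it++)
    {
        fn(dst, src, n);
        sink = dst[n - 1].arg;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (long long) (t1.tv_sec - t0.tv_sec) * 1000000000LL
        + (t1.tv_nsec - t0.tv_nsec);
}

/* Compare both shapes for n = 1..MAXN with a stack or a malloc'd destination. */
void
run_benchmark(int use_heap, long iters)
{
    Key  src[MAXN];
    Key  stack_dst[MAXN];
    Key *dst = use_heap ? malloc(MAXN * sizeof(Key)) : stack_dst;

    assert(dst != NULL);
    memset(src, 0, sizeof(src));
    for (int i = 0; i < MAXN; i++)
        src[i].arg = i;

    for (int n = 1; n <= MAXN; n++)
    {
        long long v1 = time_copy(copy_bulk, dst, src, n, iters);
        long long v2 = time_copy(copy_per_element, dst, src, n, iters);

        printf("%s n=%d  v1(bulk): %lld ns  v2(per-element): %lld ns  ratio: %.3f\n",
               use_heap ? "malloc" : "stack", n, v1, v2,
               v2 > 0 ? (double) v1 / (double) v2 : 0.0);
    }
    if (use_heap)
        free(dst);
}
```

From your own `main`, call `run_benchmark(0, iters)` for a stack destination and `run_benchmark(1, iters)` for a malloc'd one, building with optimization enabled (for example `-O2`). The printed ratio is v1/v2, so values below 1 favor the bulk copy; absolute numbers will vary with compiler, flags and CPU.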
>
> Here are the results I got using your modified benchmark:
> --- stack allocated ---
> stack  n=1  v1(patch): 49721599 ns  v2(original): 21477302 ns  ratio: 2.315  original wins
> stack  n=2  v1(patch): 52065462 ns  v2(original): 28765199 ns  ratio: 1.810  original wins
> stack  n=3  v1(patch): 58914958 ns  v2(original): 39726110 ns  ratio: 1.483  original wins
> stack  n=4  v1(patch): 64585275 ns  v2(original): 47046397 ns  ratio: 1.373  original wins
> stack  n=5  v1(patch): 73929844 ns  v2(original): 58588698 ns  ratio: 1.262  original wins
> stack  n=6  v1(patch): 95465376 ns  v2(original): 67807817 ns  ratio: 1.408  original wins
> stack  n=7  v1(patch): 86910226 ns  v2(original): 76999488 ns  ratio: 1.129  original wins
> stack  n=8  v1(patch): 107765417 ns  v2(original): 86046016 ns  ratio: 1.252  original wins
>
> --- malloc allocated ---
> malloc n=1  v1(patch): 133283824 ns  v2(original): 141361091 ns  ratio: 0.943  patch wins
> malloc n=2  v1(patch): 145625895 ns  v2(original): 180912711 ns  ratio: 0.805  patch wins
> malloc n=3  v1(patch): 153975594 ns  v2(original): 228459879 ns  ratio: 0.674  patch wins
> malloc n=4  v1(patch): 154483094 ns  v2(original): 248157408 ns  ratio: 0.623  patch wins
> malloc n=5  v1(patch): 157710598 ns  v2(original): 298795018 ns  ratio: 0.528  patch wins
> malloc n=6  v1(patch): 165196636 ns  v2(original): 332940132 ns  ratio: 0.496  patch wins
> malloc n=7  v1(patch): 169576370 ns  v2(original): 358438778 ns  ratio: 0.473  patch wins
> malloc n=8  v1(patch): 184463815 ns  v2(original): 403721513 ns  ratio: 0.457  patch wins

Thanks for your attention and for the tests.

I think the patch can move forward, then.

Best regards,
Ranier Vilela