Date: Mon, 2 Feb 2026 19:33:27 -0500
From: Andres Freund
To: David Rowley
Cc: Chao Li, PostgreSQL Developers
Subject: Re: More speedups for tuple deformation
Hi,

On 2026-02-01 00:27:02 +1300, David Rowley wrote:
> On Sat, 31 Jan 2026 at 06:11, Andres Freund wrote:
> > This is why I like the idea of keeping track of whether we can rely on NOT
> > NULL columns to be present (I think that means we're evaluating expressions
> > other than constraint checks for new rows). It allows the leading NOT NULL
> > fixed-width columns to be decoded without having to wait for a good chunk of
> > the computations above. That's a performance boon even if we later have
> > nullable or varlength columns.
>
> I can look into this. As we both know, we can't apply this
> optimisation in every case as there are places in the code which form
> then deform tuples before NOT NULL constraints are checked.

Right.

> Perhaps the slot can store a flag to mention if the optimisation is valid to
> apply or not. It doesn't look like the flag can be part of the TupleDesc
> since we cache those in relcache.

I wonder if we should do it the other way round: use a special flag (and
perhaps tuple descriptor) iff we are evaluating "unsanitized" tuples, i.e.
ones where the NOT NULLness might not yet be correct.

> I'm imagining that TupleDescFinalize() calculates another field which could
> be the max cached offset that's got a NOT NULL constraint and isn't
> attmissing. I think this will need another dedicated loop in
> slot_deform_heap_tuple() to loop up to that attribute before doing the
> firstNonCacheOffsetAttr loop.

I was imagining that we'd use the new value to enter the
firstNonCacheOffsetAttr loop without having to depend on
HeapTupleHeaderGetNatts() and HeapTupleHasNulls(). I.e. just use it to avoid
the dependency on having to have completed the memory fetch for the header.

> > > > Have you experimented with setting isnull[] in a dedicated loop if there are nulls
> > > > and then in this loop just checking isnull[attnum]?
> > > > Seems like that could
> > > > perhaps be combined with the work in first_null_attr() and be more efficient
> > > > than doing an att_isnull() separately for each column.
> > >
> > > Yes. I experimented with that quite a bit. I wasn't able to make it any
> > > faster than setting the isnull element in the same loop as the
> > > tts_values element. What I did try was having a dedicated tight loop
> > > like: for (int i = attnum; i < firstNullAttr; i++) isnull[i] = false;
> > > but the compiler would always try to optimise that into an inlined
> > > memset which would result in poorly performing code in cases with a
> > > small number of columns due to the size and alignment prechecks.
> >
> > Yea, that kind of transformation is pretty annoying and makes little sense
> > here :(.
> >
> > I was thinking of actually computing the value of isnull[] based on the null
> > bitmap (as you also try below).
>
> I've taken the code you posted in [1] to do this. Thanks for that. It
> works very well.

Nice!

> I made it so the tts_isnull array size is rounded up to the next multiple of
> 8.

Right, that's what I assumed we'd need.

> I've attached 3 graphs, which are now looking a bit better. The gcc
> results are not quite as good. There's still a small regression with 0
> extra column test, and overall, the results are not as impressive as
> clang's. I've not yet studied why.

I suspect it's due to gcc thinking it'd be a good idea to vectorize the
loop. I saw that happening on godbolt.

Are your results better if you use

#if defined(__clang__)
#define pg_nounroll _Pragma("clang loop unroll(disable)")
#define pg_novector _Pragma("clang loop vectorize(disable)")
#elif defined(__GNUC__)
#define pg_nounroll _Pragma("GCC unroll 0")
#define pg_novector _Pragma("GCC novector")
#else
#define pg_nounroll
#define pg_novector _Pragma("loop( no_vector )")
#endif

and put "pg_nounroll pg_novector" before the loop in populate_isnull_array()?
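For illustration, a minimal sketch of a populate_isnull_array() with the pragmas placed directly before its loop. The signature, the ROUNDUP8() helper and the bit-at-a-time loop body are assumptions made up for this example, not the code referenced as [1]. It relies on the heap null-bitmap convention that a set bit means the attribute is not null (the same test att_isnull() performs), and on isnull[] being allocated with a multiple of 8 entries, so the loop can run to the rounded-up count without a tail check; the bitmap itself is a whole number of bytes, so those extra reads stay in bounds. The macro block is a hedged variant of the one in the message: it only uses "GCC novector" on GCC 14 and later, where that pragma exists, and leaves both macros empty on other compilers rather than using the MSVC-style loop pragma.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Variant of the pg_nounroll/pg_novector macros (assumption): "GCC novector"
 * is only emitted on GCC 14+, "GCC unroll" on GCC 8+; others get no-ops.
 */
#if defined(__clang__)
#define pg_nounroll _Pragma("clang loop unroll(disable)")
#define pg_novector _Pragma("clang loop vectorize(disable)")
#elif defined(__GNUC__) && __GNUC__ >= 14
#define pg_nounroll _Pragma("GCC unroll 0")
#define pg_novector _Pragma("GCC novector")
#elif defined(__GNUC__) && __GNUC__ >= 8
#define pg_nounroll _Pragma("GCC unroll 0")
#define pg_novector
#else
#define pg_nounroll
#define pg_novector
#endif

/* Hypothetical helper: round n up to the next multiple of 8. */
#define ROUNDUP8(n) (((n) + 7) & ~7)

/*
 * Hypothetical sketch of populate_isnull_array(): expand a heap-style null
 * bitmap (bit set = attribute is NOT NULL) into isnull[].  isnull[] is
 * assumed to have room for ROUNDUP8(natts) entries, so the loop covers whole
 * bitmap bytes and needs no special case for a final partial byte.
 */
static void
populate_isnull_array(const uint8_t *bits, int natts, bool *isnull)
{
	int			n = ROUNDUP8(natts);

	/* Ask the compiler not to unroll or vectorize this short loop. */
	pg_nounroll pg_novector
	for (int i = 0; i < n; i++)
		isnull[i] = (bits[i >> 3] & (1 << (i & 7))) == 0;
}
```

Entries at positions natts and beyond are filled from the bitmap's padding bits and should simply be ignored by the caller.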
Doing so improves code generation substantially for me with both gcc and
clang, though considerably more so for gcc:

Compiler  Opt  Insns
gcc       -O2    165
clang     -O2    135
gcc       -O3    532
clang     -O3    135

Preventing vectorization & unrolling:
gcc       -O2     26
clang     -O2     25
gcc       -O3     26
clang     -O3     25

It's somewhat scary to see this much code size increase in a case where the
compiler has no information suggesting that vectorizing would be beneficial...

I'd expect it to make sense to combine the loop in first_null_attr() with the
one in populate_isnull_array(). That might also stop gcc from trying to
vectorize the loop...

Greetings,

Andres Freund
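The suggestion to fold first_null_attr()'s loop into populate_isnull_array() could look roughly like the following sketch. Everything in it is an assumption for illustration: the name populate_isnull_first_null(), its signature, and the convention of returning natts when no attribute is null are hypothetical, and the real first_null_attr() may compute its result differently. It uses the heap null-bitmap convention (set bit = not null), assumes isnull[] is sized to a multiple of 8 as discussed in the thread, and keeps the loop pragmas in front of the single combined loop. Whether combining the loops on its own stops gcc from vectorizing is left open in the message, and this sketch does not demonstrate it either way.

```c
#include <stdbool.h>
#include <stdint.h>

/* Loop pragmas, guarded so "GCC novector" is only used on GCC 14 and later. */
#if defined(__clang__)
#define pg_nounroll _Pragma("clang loop unroll(disable)")
#define pg_novector _Pragma("clang loop vectorize(disable)")
#elif defined(__GNUC__) && __GNUC__ >= 14
#define pg_nounroll _Pragma("GCC unroll 0")
#define pg_novector _Pragma("GCC novector")
#elif defined(__GNUC__) && __GNUC__ >= 8
#define pg_nounroll _Pragma("GCC unroll 0")
#define pg_novector
#else
#define pg_nounroll
#define pg_novector
#endif

/* Hypothetical helper: round n up to the next multiple of 8. */
#define ROUNDUP8(n) (((n) + 7) & ~7)

/*
 * Hypothetical combined loop: fill isnull[] from the null bitmap and return
 * the index of the first NULL attribute, or natts if there is none.
 */
static int
populate_isnull_first_null(const uint8_t *bits, int natts, bool *isnull)
{
	int			n = ROUNDUP8(natts);
	int			first_null = natts;	/* natts means "no NULL found" */

	pg_nounroll pg_novector
	for (int i = 0; i < n; i++)
	{
		bool		attisnull = (bits[i >> 3] & (1 << (i & 7))) == 0;

		isnull[i] = attisnull;
		/* Record the first NULL; padding entries (i >= natts) never qualify. */
		if (attisnull && i < first_null)
			first_null = i;
	}
	return first_null;
}
```

Starting first_null at natts is what keeps the padding entries past the last real attribute from ever being reported as the first NULL.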