Date: Tue, 10 Feb 2026 13:15:01 -0500
From: Andres Freund
To: Heikki Linnakangas
Cc: Bertrand Drouvot, "pgsql-hackers@postgresql.org"
Subject: Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals)
References: <1cb0d7e9-d6dd-4517-a7cd-0ad98e1207f3@iki.fi>
In-Reply-To: <1cb0d7e9-d6dd-4517-a7cd-0ad98e1207f3@iki.fi>

Hi,

On 2026-02-10 19:14:44 +0200, Heikki Linnakangas wrote:
> On 10/02/2026 18:41, Andres Freund wrote:
> > On 2026-02-10 17:52:16 +0200, Heikki Linnakangas wrote:
> > > If there's a performance reason to keep it aligned - and maybe there
> > > is - we should pad it explicitly.
> >
> > We should make it a power of two or such. There are some workloads where the
> > indexing from GetPGProcByNumber() shows up, because it ends up having to be
> > implemented as a 64 bit multiplication, which has a reasonably high latency
> > (3-5 cycles). Whereas a shift has a latency of 1 and typically higher
> > throughput too.
>
> Power of two means going to 1024 bytes. That's a lot of padding. Where have
> you seen that show up?

In LWLock-contention-heavy code, due to the GetPGProcByNumber() calls in
LWLockWakeup() and other similar places.

> Attached is a patch to align to cache line boundary. That's straightforward
> if that's what we want to do.

Yea, I think we should do that, even if we don't see a difference today.
Production issues around this are hell to find, because they depend so much
on which processes use which PGPROC etc., and because false sharing issues
are generally expensive to debug.

> > Re false sharing: We should really separate stuff that changes (like
> > e.g. pendingRecoveryConflicts) and never changing stuff (backendType). You
> > don't need overlapping structs to have false sharing issues if you mix
> > different access patterns inside a struct that's accessed across processes...
>
> Makes sense, although I don't want to optimize too hard for performance, at
> the expense of readability. The current order is pretty random anyway,
> though.

Yea, I don't think we need to be perfect here.
Just a bit less bad. And, as you say, the current order doesn't make a lot
of sense. Just grouping things like the following would help:

- pid, pgxactoff, backendType (i.e. barely if ever changing)
- wait_event_info, waitStart (i.e. very frequently changing, but typically
  accessed within one proc)
- sem, lwWaiting, waitLockMode (i.e. stuff that is updated frequently and
  accessed across processes)

> It'd probably be good to move the subxids cache to the end of the struct.
> That'd act as natural padding, as it's not very frequently used, especially
> the tail end of the cache.

Yea, that'd make sense.

> Or come to think of it, it might be good to move the subxids cache out of
> PGPROC altogether. It's mostly frequently accessed in GetSnapshotData(), and
> for that it'd actually be better if it was in a separate "mirrored" array,
> similar to the main xid and subxidStates. That would eliminate the
> pgprocnos[pgxactoff] lookup from GetSnapshotData() altogether.

I doubt it's worth it - that way we'd need to move a much larger array around
during [dis]connect. The subxids stuff is a lot larger than the xid,
statusFlags arrays...

> I'm a little reluctant to mess with this without a concrete benchmark
> though. Got one in mind?

I'm travelling this week, but I'll try to recreate the benchmarks I've seen
this on. Unfortunately you need a large machine to really see differences;
without that, the memory latency between cores is just too low to
realistically see issues.

Greetings,

Andres Freund
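
For reference, the grouping and indexing points above can be sketched roughly
as below. This is only an illustration, not the real PGPROC definition or the
attached patch: DemoProc, DemoProcSlot, the demo_* functions, the field types,
the 64-byte cache line and the 256-byte slot size are all assumptions made up
for the example. Each access-pattern group starts on its own cache line, and
padding the slot to a power of two lets the array offset be computed with a
shift instead of a multiplication.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_LINE_SIZE 64 /* assumption: 64-byte cache lines */
#define DEMO_SLOT_SHIFT 8       /* assumption: 1 << 8 = 256-byte slots */

/* Hypothetical stand-in for PGPROC; names echo the thread, types are guesses. */
typedef struct DemoProc
{
    /* group 1: barely if ever changing */
    alignas(DEMO_CACHE_LINE_SIZE) int pid;
    int         pgxactoff;
    int         backendType;

    /* group 2: changes very often, but typically accessed by one process */
    alignas(DEMO_CACHE_LINE_SIZE) uint32_t wait_event_info;
    int64_t     waitStart;

    /* group 3: updated frequently and accessed across processes */
    alignas(DEMO_CACHE_LINE_SIZE) int sem;
    uint8_t     lwWaiting;
    int         waitLockMode;
} DemoProc;

/* Each group begins on its own cache line. */
static_assert(offsetof(DemoProc, wait_event_info) == DEMO_CACHE_LINE_SIZE,
              "group 2 must start on the second cache line");
static_assert(offsetof(DemoProc, sem) == 2 * DEMO_CACHE_LINE_SIZE,
              "group 3 must start on the third cache line");

/* Explicit padding up to a power-of-two slot size. */
typedef union DemoProcSlot
{
    DemoProc    proc;
    char        pad[(size_t) 1 << DEMO_SLOT_SHIFT];
} DemoProcSlot;

static_assert(sizeof(DemoProcSlot) == ((size_t) 1 << DEMO_SLOT_SHIFT),
              "slot size must be exactly a power of two");

/* Plain array indexing: the compiler scales n by sizeof(DemoProcSlot). */
static inline DemoProc *
demo_proc_by_index(DemoProcSlot *slots, size_t n)
{
    return &slots[n].proc;
}

/* Same address, with the byte offset spelled out as a shift. */
static inline DemoProc *
demo_proc_by_shift(DemoProcSlot *slots, size_t n)
{
    char       *base = (char *) slots;

    return &((DemoProcSlot *) (base + (n << DEMO_SLOT_SHIFT)))->proc;
}
```

With a power-of-two slot size the two functions return the same pointer. The
cost is the padding: here 64 of 256 bytes per slot are unused, and for a
struct the size of PGPROC it would be considerably more, which is the
trade-off discussed above.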