public inbox for [email protected]help / color / mirror / Atom feed
pgsql: Separate RecoveryConflictReasons from procsignals 22+ messages / 5 participants [nested] [flat]
* pgsql: Separate RecoveryConflictReasons from procsignals @ 2026-02-10 14:32 Heikki Linnakangas <[email protected]> 0 siblings, 1 reply; 22+ messages in thread From: Heikki Linnakangas @ 2026-02-10 14:32 UTC (permalink / raw) To: [email protected] Separate RecoveryConflictReasons from procsignals Share the same PROCSIG_RECOVERY_CONFLICT flag for all recovery conflict reasons. To distinguish, have a bitmask in PGPROC to indicate the reason(s). Reviewed-by: Chao Li <[email protected]> Discussion: https://www.postgresql.org/message-id/[email protected] Branch ------ master Details ------- https://git.postgresql.org/pg/commitdiff/17f51ea818753093f929b4c235f3b89ebcc7c5fb Modified Files -------------- src/backend/commands/dbcommands.c | 1 + src/backend/commands/tablespace.c | 1 + src/backend/replication/logical/logicalctl.c | 1 + src/backend/replication/slot.c | 6 +- src/backend/storage/buffer/bufmgr.c | 5 +- src/backend/storage/ipc/procarray.c | 136 ++++++++++++++++++--------- src/backend/storage/ipc/procsignal.c | 22 +---- src/backend/storage/ipc/standby.c | 61 ++++++------ src/backend/storage/lmgr/proc.c | 5 +- src/backend/tcop/postgres.c | 117 ++++++++++++----------- src/backend/utils/activity/pgstat_database.c | 18 ++-- src/backend/utils/adt/mcxtfuncs.c | 1 + src/include/storage/proc.h | 10 ++ src/include/storage/procarray.h | 7 +- src/include/storage/procsignal.h | 16 +--- src/include/storage/standby.h | 34 ++++++- src/include/tcop/tcopprot.h | 2 +- src/tools/pgindent/typedefs.list | 1 + 18 files changed, 258 insertions(+), 186 deletions(-) ^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: pgsql: Separate RecoveryConflictReasons from procsignals @ 2026-02-10 15:19 Bertrand Drouvot <[email protected]> parent: Heikki Linnakangas <[email protected]> 0 siblings, 1 reply; 22+ messages in thread From: Bertrand Drouvot @ 2026-02-10 15:19 UTC (permalink / raw) To: Heikki Linnakangas <[email protected]>; +Cc: [email protected] Hi, On Tue, Feb 10, 2026 at 02:32:37PM +0000, Heikki Linnakangas wrote: > Separate RecoveryConflictReasons from procsignals > > Share the same PROCSIG_RECOVERY_CONFLICT flag for all recovery > conflict reasons. To distinguish, have a bitmask in PGPROC to indicate > the reason(s). I did not look at the thread, so sorry to be late, but that makes the size of PGPROC going from 832 to 840 bytes, so not a multiple of 64 anymore. Is that something to worry about? (same kind of discussion in [1]). [1]: https://postgr.es/m/tw53roer2j4quxh7vlyv62drc5fo6c6zdltvl6d2dttqa62uhi%40stwlpdwlftpj Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com ^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: pgsql: Separate RecoveryConflictReasons from procsignals @ 2026-02-10 15:52 Heikki Linnakangas <[email protected]> parent: Bertrand Drouvot <[email protected]> 0 siblings, 1 reply; 22+ messages in thread From: Heikki Linnakangas @ 2026-02-10 15:52 UTC (permalink / raw) To: Bertrand Drouvot <[email protected]>; +Cc: [email protected] On 10/02/2026 17:19, Bertrand Drouvot wrote: > Hi, > > On Tue, Feb 10, 2026 at 02:32:37PM +0000, Heikki Linnakangas wrote: >> Separate RecoveryConflictReasons from procsignals >> >> Share the same PROCSIG_RECOVERY_CONFLICT flag for all recovery >> conflict reasons. To distinguish, have a bitmask in PGPROC to indicate >> the reason(s). > > I did not look at the thread, so sorry to be late, but that makes the size of PGPROC > going from 832 to 840 bytes, so not a multiple of 64 anymore. Is that something > to worry about? (same kind of discussion in [1]). > > [1]: https://postgr.es/m/tw53roer2j4quxh7vlyv62drc5fo6c6zdltvl6d2dttqa62uhi%40stwlpdwlftpj Right, that's a fair question. I hope the cache line alignment is not critical for performance, because the alignment is completely accidental today. I checked the size on different versions: master: 840 (after this commit) v18: 832 v17: 888 v14-v16: 880 So v18 was the outlier in that it happened to be 64-byte aligned. If there's a performance reason to keep have it be aligned - and maybe there is - we should pad it explicitly. - Heikki ^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: pgsql: Separate RecoveryConflictReasons from procsignals @ 2026-02-10 16:41 Andres Freund <[email protected]> parent: Heikki Linnakangas <[email protected]> 0 siblings, 1 reply; 22+ messages in thread From: Andres Freund @ 2026-02-10 16:41 UTC (permalink / raw) To: Heikki Linnakangas <[email protected]>; +Cc: Bertrand Drouvot <[email protected]>; [email protected] Hi, On 2026-02-10 17:52:16 +0200, Heikki Linnakangas wrote: > On 10/02/2026 17:19, Bertrand Drouvot wrote: > > Hi, > > > > On Tue, Feb 10, 2026 at 02:32:37PM +0000, Heikki Linnakangas wrote: > > > Separate RecoveryConflictReasons from procsignals > > > > > > Share the same PROCSIG_RECOVERY_CONFLICT flag for all recovery > > > conflict reasons. To distinguish, have a bitmask in PGPROC to indicate > > > the reason(s). > > > > I did not look at the thread, so sorry to be late, but that makes the size of PGPROC > > going from 832 to 840 bytes, so not a multiple of 64 anymore. Is that something > > to worry about? (same kind of discussion in [1]). > > > > [1]: https://postgr.es/m/tw53roer2j4quxh7vlyv62drc5fo6c6zdltvl6d2dttqa62uhi%40stwlpdwlftpj > > Right, that's a fair question. I hope the cache line alignment is not > critical for performance, because the alignment is completely accidental > today. I checked the size on different versions: > > master: 840 (after this commit) > v18: 832 > v17: 888 > v14-v16: 880 > > So v18 was the outlier in that it happened to be 64-byte aligned. > > If there's a performance reason to keep have it be aligned - and maybe there > is - we should pad it explicitly. We should make it a power of two or such. There are some workloads where the indexing from GetPGProcByNumber() shows up, because it ends up having to be implemented as a 64 bit multiplication, which has a reasonably high latency (3-5 cycles). Whereas a shift has a latency of 1 and typically higher throughput too. Re false sharing: We should really separate stuff that changes (like e.g. pendingRecoveryConflicts) and never changing stuff (backendType). You don't need overlapping structs to have false sharing issues if you mix different access patterns inside a struct that's accessed across processes... Greetings, Andres Freund ^ permalink raw reply [nested|flat] 22+ messages in thread
* PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) @ 2026-02-10 17:14 Heikki Linnakangas <[email protected]> parent: Andres Freund <[email protected]> 0 siblings, 1 reply; 22+ messages in thread From: Heikki Linnakangas @ 2026-02-10 17:14 UTC (permalink / raw) To: Andres Freund <[email protected]>; +Cc: Bertrand Drouvot <[email protected]>; [email protected] <[email protected]> (moving to pgsql-hackers) On 10/02/2026 18:41, Andres Freund wrote: > On 2026-02-10 17:52:16 +0200, Heikki Linnakangas wrote: >> If there's a performance reason to keep have it be aligned - and maybe there >> is - we should pad it explicitly. > > We should make it a power of two or such. There are some workloads where the > indexing from GetPGProcByNumber() shows up, because it ends up having to be > implemented as a 64 bit multiplication, which has a reasonably high latency > (3-5 cycles). Whereas a shift has a latency of 1 and typically higher > throughput too. Power of two means going to 1024 bytes. That's a lot of padding. Where have you seen that show up? Attached is a patch to align to cache line boundary. That's straightforward if that's what we want to do. > Re false sharing: We should really separate stuff that changes (like > e.g. pendingRecoveryConflicts) and never changing stuff (backendType). You > don't need overlapping structs to have false sharing issues if you mix > different access patterns inside a struct that's accessed across processes... Makes sense, although I don't want to optimize too hard for performance, at the expense of readability. The current order is pretty random anyway, though. It'd probably be good to move the subxids cache to the end of the struct. That'd act as natural padding, as it's not very frequently used, especially the tail end of the cache. Or come to think of it, it might be good to move the subxids cache out of PGPROC altogether. It's mostly frequently accessed in GetSnapshotData(), and for that it'd actually be better if it was in a separate "mirrored" array, similar to the main xid and subxidStates. That would eliminate the pgprocnos[pgxactoff] lookup from GetSnapshotdata() altogether. I'm a little reluctant to mess with this without a concrete benchmark though. Got one in mind? - Heikki Attachments: [text/x-patch] 0001-Align-PGPROC-to-cache-line-boundary.patch (1.5K, 2-0001-Align-PGPROC-to-cache-line-boundary.patch) download | inline diff: From a3e2d94f6a3ab215afd3e8cee624bd9c09ec5391 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas <[email protected]> Date: Tue, 10 Feb 2026 18:53:31 +0200 Subject: [PATCH 1/1] Align PGPROC to cache line boundary --- src/include/storage/proc.h | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h index ac0df4aeaaa..53acce8a5a1 100644 --- a/src/include/storage/proc.h +++ b/src/include/storage/proc.h @@ -182,7 +182,7 @@ typedef enum * * See PROC_HDR for details. */ -struct PGPROC +typedef struct PGPROC { dlist_node links; /* list link if process is in a list */ dlist_head *procgloballist; /* procglobal list that owns this PGPROC */ @@ -337,10 +337,18 @@ struct PGPROC PGPROC *lockGroupLeader; /* lock group leader, if I'm a member */ dlist_head lockGroupMembers; /* list of members, if I'm a leader */ dlist_node lockGroupLink; /* my member link, if I'm a member */ -}; - -/* NOTE: "typedef struct PGPROC PGPROC" appears in storage/lock.h. */ +} +/* + * If compiler understands aligned pragma, use it to align the struct at cache + * line boundaries. This is just for performance, to (a) avoid false sharing + * and (b) to make the multiplication / division to convert between PGPROC * + * and ProcNumber be a little cheaper. + */ +#if defined(pg_attribute_aligned) + pg_attribute_aligned(PG_CACHE_LINE_SIZE) +#endif +PGPROC; extern PGDLLIMPORT PGPROC *MyProc; -- 2.47.3 ^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) @ 2026-02-10 18:15 Andres Freund <[email protected]> parent: Heikki Linnakangas <[email protected]> 0 siblings, 1 reply; 22+ messages in thread From: Andres Freund @ 2026-02-10 18:15 UTC (permalink / raw) To: Heikki Linnakangas <[email protected]>; +Cc: Bertrand Drouvot <[email protected]>; [email protected] <[email protected]> Hi, On 2026-02-10 19:14:44 +0200, Heikki Linnakangas wrote: > On 10/02/2026 18:41, Andres Freund wrote: > > On 2026-02-10 17:52:16 +0200, Heikki Linnakangas wrote: > > > If there's a performance reason to keep have it be aligned - and maybe there > > > is - we should pad it explicitly. > > > > We should make it a power of two or such. There are some workloads where the > > indexing from GetPGProcByNumber() shows up, because it ends up having to be > > implemented as a 64 bit multiplication, which has a reasonably high latency > > (3-5 cycles). Whereas a shift has a latency of 1 and typically higher > > throughput too. > > Power of two means going to 1024 bytes. That's a lot of padding. Where have > you seen that show up? LWLock contention heavy code, due to the GetPGProcByNumber() in LWLockWakeup() and other similar places. > Attached is a patch to align to cache line boundary. That's straightforward > if that's what we want to do. Yea, I think we should do that. Even if we don't see a difference today, just because it's a hell to find production issues around this, because it is so dependent on what processes use which PGPROC etc and because false sharing issues are generally expensive to debug. > > Re false sharing: We should really separate stuff that changes (like > > e.g. pendingRecoveryConflicts) and never changing stuff (backendType). You > > don't need overlapping structs to have false sharing issues if you mix > > different access patterns inside a struct that's accessed across processes... > > Makes sense, although I don't want to optimize too hard for performance, at > the expense of readability. The current order is pretty random anyway, > though. Yea, I don't think we need to be perfect here. Just a bit less bad. And, as you say, the current order doesn't make a lot of sense. Just grouping things like - pid, pgxactoff, backendType (i.e. barely if ever changing) - wait_event_info, waitStart (i.e. very frequently changing, but typically accessed within one proc) - sem, lwWaiting, waitLockMode (i.e. stuff that is updated frequently and accessed across processes) > It'd probably be good to move the subxids cache to the end of the struct. > That'd act as natural padding, as it's not very frequently used, especially > the tail end of the cache. Yea, that'd make sense. > Or come to think of it, it might be good to move the subxids cache out of > PGPROC altogether. It's mostly frequently accessed in GetSnapshotData(), and > for that it'd actually be better if it was in a separate "mirrored" array, > similar to the main xid and subxidStates. That would eliminate the > pgprocnos[pgxactoff] lookup from GetSnapshotdata() altogether. I doubt it's worth it - that way we'd need to move a lot larger array around during [dis]connect. The subxids stuff is a lot larger than the xid, statusFlags arrays... > I'm a little reluctant to mess with this without a concrete benchmark > though. Got one in mind? I'm travelling this week, but I'll try to recreate the benchmarks I've seen this on. Unfortunately you really need a large machine to really see differences, without that the memory latency between cores is just too low to realistically see issues. Greetings, Andres Freund ^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) @ 2026-02-10 19:15 Bertrand Drouvot <[email protected]> parent: Andres Freund <[email protected]> 0 siblings, 1 reply; 22+ messages in thread From: Bertrand Drouvot @ 2026-02-10 19:15 UTC (permalink / raw) To: Andres Freund <[email protected]>; +Cc: Heikki Linnakangas <[email protected]>; [email protected] <[email protected]> Hi, On Tue, Feb 10, 2026 at 01:15:01PM -0500, Andres Freund wrote: > Hi, > > On 2026-02-10 19:14:44 +0200, Heikki Linnakangas wrote: > Yea, I don't think we need to be perfect here. Just a bit less bad. And, as > you say, the current order doesn't make a lot of sense. > Just grouping things like > - pid, pgxactoff, backendType (i.e. barely if ever changing) > - wait_event_info, waitStart (i.e. very frequently changing, but typically > accessed within one proc) > - sem, lwWaiting, waitLockMode (i.e. stuff that is updated frequently and > accessed across processes) With an ordering like in the attached (to apply on top of Heikki's patch), we're back to 832 bytes. But, then the pg_attribute_aligned() added in Heikki's patch makes it 896 bytes... " /* 816 | 16 */ dlist_node lockGroupLink; /* XXX 64-byte padding */ /* total size (bytes): 896 */ } " What about applying this new ordering and remove the pg_attribute_aligned()? (I thought the aligned attribute would be smarter than that and not add this 64 padding bytes). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com commit ea3aded2c4dec487c8a2ff5470cd7bae83e8dd47 Author: Bertrand Drouvot <[email protected]> Date: Tue Feb 10 18:54:29 2026 +0000 reorder diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h index 53acce8a5a1..5d41301c966 100644 --- a/src/include/storage/proc.h +++ b/src/include/storage/proc.h @@ -184,29 +184,16 @@ typedef enum */ typedef struct PGPROC { - dlist_node links; /* list link if process is in a list */ - dlist_head *procgloballist; /* procglobal list that owns this PGPROC */ - - PGSemaphore sem; /* ONE semaphore to sleep on */ - ProcWaitStatus waitStatus; - - Latch procLatch; /* generic latch for process */ - - - TransactionId xid; /* id of top-level transaction currently being - * executed by this proc, if running and XID - * is assigned; else InvalidTransactionId. - * mirrored in ProcGlobal->xids[pgxactoff] */ - - TransactionId xmin; /* minimal running XID as it was when we were - * starting our xact, excluding LAZY VACUUM: - * vacuum must not remove tuples deleted by - * xid >= xmin ! */ - int pid; /* Backend's process ID; 0 if prepared xact */ - int pgxactoff; /* offset into various ProcGlobal->arrays with * data mirrored from this PGPROC */ + BackendType backendType; /* what kind of process is this? */ + + /* These fields are zero while a backend is still starting up: */ + Oid databaseId; /* OID of database this backend is using */ + Oid roleId; /* OID of role using this backend */ + Oid tempNamespaceId; /* OID of temp schema this backend is + * using */ /* * Currently running top-level transaction's virtual xid. Together these @@ -227,24 +214,44 @@ typedef struct PGPROC * InvalidLocalTransactionId */ } vxid; - /* These fields are zero while a backend is still starting up: */ - Oid databaseId; /* OID of database this backend is using */ - Oid roleId; /* OID of role using this backend */ + dlist_node links; /* list link if process is in a list */ + dlist_head *procgloballist; /* procglobal list that owns this PGPROC */ - Oid tempNamespaceId; /* OID of temp schema this backend is - * using */ + uint32 wait_event_info; /* proc's wait information */ + pg_atomic_uint64 waitStart; /* time at which wait for lock acquisition + * started */ - BackendType backendType; /* what kind of process is this? */ + TransactionId xid; /* id of top-level transaction currently being + * executed by this proc, if running and XID + * is assigned; else InvalidTransactionId. + * mirrored in ProcGlobal->xids[pgxactoff] */ - /* - * While in hot standby mode, shows that a conflict signal has been sent - * for the current transaction. Set/cleared while holding ProcArrayLock, - * though not required. Accessed without lock, if needed. - * - * This is a bitmask; each bit corresponds to a RecoveryConflictReason - * enum value. - */ - pg_atomic_uint32 pendingRecoveryConflicts; + TransactionId xmin; /* minimal running XID as it was when we were + * starting our xact, excluding LAZY VACUUM: + * vacuum must not remove tuples deleted by + * xid >= xmin ! */ + + ProcWaitStatus waitStatus; + int delayChkptFlags; /* for DELAY_CHKPT_* flags */ + + LOCKMODE waitLockMode; /* type of lock we're waiting for */ + LOCKMASK heldLocks; /* bitmask for lock types already held on this + * lock object by this backend */ + + uint8 statusFlags; /* this backend's status flags, see PROC_* + * above. mirrored in + * ProcGlobal->statusFlags[pgxactoff] */ + XidCacheStatus subxidStatus; /* mirrored with + * ProcGlobal->subxidStates[i] */ + + bool procArrayGroupMember; /* true, if member of ProcArray group + * waiting for XID clear */ + bool clogGroupMember; /* true, if member of clog group */ + bool fpVXIDLock; /* are we holding a fast-path VXID lock? */ + + LocalTransactionId fpLocalTransactionId; /* lxid for fast-path VXID + * lock */ + PGSemaphore sem; /* ONE semaphore to sleep on */ /* * Info about LWLock the process is currently waiting for, if any. @@ -255,6 +262,19 @@ typedef struct PGPROC */ uint8 lwWaiting; /* see LWLockWaitState */ uint8 lwWaitMode; /* lwlock mode being waited for */ + + Latch procLatch; /* generic latch for process */ + + /* + * While in hot standby mode, shows that a conflict signal has been sent + * for the current transaction. Set/cleared while holding ProcArrayLock, + * though not required. Accessed without lock, if needed. + * + * This is a bitmask; each bit corresponds to a RecoveryConflictReason + * enum value. + */ + pg_atomic_uint32 pendingRecoveryConflicts; + proclist_node lwWaitLink; /* position in LW lock wait list */ /* Support for condition variables. */ @@ -264,17 +284,6 @@ typedef struct PGPROC /* waitLock and waitProcLock are NULL if not currently waiting. */ LOCK *waitLock; /* Lock object we're sleeping on ... */ PROCLOCK *waitProcLock; /* Per-holder info for awaited lock */ - LOCKMODE waitLockMode; /* type of lock we're waiting for */ - LOCKMASK heldLocks; /* bitmask for lock types already held on this - * lock object by this backend */ - pg_atomic_uint64 waitStart; /* time at which wait for lock acquisition - * started */ - - int delayChkptFlags; /* for DELAY_CHKPT_* flags */ - - uint8 statusFlags; /* this backend's status flags, see PROC_* - * above. mirrored in - * ProcGlobal->statusFlags[pgxactoff] */ /* * Info to allow us to wait for synchronous replication, if needed. @@ -293,15 +302,11 @@ typedef struct PGPROC */ dlist_head myProcLocks[NUM_LOCK_PARTITIONS]; - XidCacheStatus subxidStatus; /* mirrored with - * ProcGlobal->subxidStates[i] */ struct XidCache subxids; /* cache for subtransaction XIDs */ /* Support for group XID clearing. */ - /* true, if member of ProcArray group waiting for XID clear */ - bool procArrayGroupMember; - /* next ProcArray group member waiting for XID clear */ - pg_atomic_uint32 procArrayGroupNext; + pg_atomic_uint32 procArrayGroupNext; /* next ProcArray group member + * waiting for XID clear */ /* * latest transaction id among the transaction's main XID and @@ -309,10 +314,7 @@ typedef struct PGPROC */ TransactionId procArrayGroupMemberXid; - uint32 wait_event_info; /* proc's wait information */ - /* Support for group transaction status update. */ - bool clogGroupMember; /* true, if member of clog group */ pg_atomic_uint32 clogGroupNext; /* next clog group member */ TransactionId clogGroupMemberXid; /* transaction id of clog group member */ XidStatus clogGroupMemberXidStatus; /* transaction status of clog @@ -326,9 +328,6 @@ typedef struct PGPROC LWLock fpInfoLock; /* protects per-backend fast-path state */ uint64 *fpLockBits; /* lock modes held for each fast-path slot */ Oid *fpRelId; /* slots for rel oids */ - bool fpVXIDLock; /* are we holding a fast-path VXID lock? */ - LocalTransactionId fpLocalTransactionId; /* lxid for fast-path VXID - * lock */ /* * Support for lock groups. Use LockHashPartitionLockByProc on the group Attachments: [text/plain] order.txt (7.1K, 2-order.txt) download | inline diff: commit ea3aded2c4dec487c8a2ff5470cd7bae83e8dd47 Author: Bertrand Drouvot <[email protected]> Date: Tue Feb 10 18:54:29 2026 +0000 reorder diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h index 53acce8a5a1..5d41301c966 100644 --- a/src/include/storage/proc.h +++ b/src/include/storage/proc.h @@ -184,29 +184,16 @@ typedef enum */ typedef struct PGPROC { - dlist_node links; /* list link if process is in a list */ - dlist_head *procgloballist; /* procglobal list that owns this PGPROC */ - - PGSemaphore sem; /* ONE semaphore to sleep on */ - ProcWaitStatus waitStatus; - - Latch procLatch; /* generic latch for process */ - - - TransactionId xid; /* id of top-level transaction currently being - * executed by this proc, if running and XID - * is assigned; else InvalidTransactionId. - * mirrored in ProcGlobal->xids[pgxactoff] */ - - TransactionId xmin; /* minimal running XID as it was when we were - * starting our xact, excluding LAZY VACUUM: - * vacuum must not remove tuples deleted by - * xid >= xmin ! */ - int pid; /* Backend's process ID; 0 if prepared xact */ - int pgxactoff; /* offset into various ProcGlobal->arrays with * data mirrored from this PGPROC */ + BackendType backendType; /* what kind of process is this? */ + + /* These fields are zero while a backend is still starting up: */ + Oid databaseId; /* OID of database this backend is using */ + Oid roleId; /* OID of role using this backend */ + Oid tempNamespaceId; /* OID of temp schema this backend is + * using */ /* * Currently running top-level transaction's virtual xid. Together these @@ -227,24 +214,44 @@ typedef struct PGPROC * InvalidLocalTransactionId */ } vxid; - /* These fields are zero while a backend is still starting up: */ - Oid databaseId; /* OID of database this backend is using */ - Oid roleId; /* OID of role using this backend */ + dlist_node links; /* list link if process is in a list */ + dlist_head *procgloballist; /* procglobal list that owns this PGPROC */ - Oid tempNamespaceId; /* OID of temp schema this backend is - * using */ + uint32 wait_event_info; /* proc's wait information */ + pg_atomic_uint64 waitStart; /* time at which wait for lock acquisition + * started */ - BackendType backendType; /* what kind of process is this? */ + TransactionId xid; /* id of top-level transaction currently being + * executed by this proc, if running and XID + * is assigned; else InvalidTransactionId. + * mirrored in ProcGlobal->xids[pgxactoff] */ - /* - * While in hot standby mode, shows that a conflict signal has been sent - * for the current transaction. Set/cleared while holding ProcArrayLock, - * though not required. Accessed without lock, if needed. - * - * This is a bitmask; each bit corresponds to a RecoveryConflictReason - * enum value. - */ - pg_atomic_uint32 pendingRecoveryConflicts; + TransactionId xmin; /* minimal running XID as it was when we were + * starting our xact, excluding LAZY VACUUM: + * vacuum must not remove tuples deleted by + * xid >= xmin ! */ + + ProcWaitStatus waitStatus; + int delayChkptFlags; /* for DELAY_CHKPT_* flags */ + + LOCKMODE waitLockMode; /* type of lock we're waiting for */ + LOCKMASK heldLocks; /* bitmask for lock types already held on this + * lock object by this backend */ + + uint8 statusFlags; /* this backend's status flags, see PROC_* + * above. mirrored in + * ProcGlobal->statusFlags[pgxactoff] */ + XidCacheStatus subxidStatus; /* mirrored with + * ProcGlobal->subxidStates[i] */ + + bool procArrayGroupMember; /* true, if member of ProcArray group + * waiting for XID clear */ + bool clogGroupMember; /* true, if member of clog group */ + bool fpVXIDLock; /* are we holding a fast-path VXID lock? */ + + LocalTransactionId fpLocalTransactionId; /* lxid for fast-path VXID + * lock */ + PGSemaphore sem; /* ONE semaphore to sleep on */ /* * Info about LWLock the process is currently waiting for, if any. @@ -255,6 +262,19 @@ typedef struct PGPROC */ uint8 lwWaiting; /* see LWLockWaitState */ uint8 lwWaitMode; /* lwlock mode being waited for */ + + Latch procLatch; /* generic latch for process */ + + /* + * While in hot standby mode, shows that a conflict signal has been sent + * for the current transaction. Set/cleared while holding ProcArrayLock, + * though not required. Accessed without lock, if needed. + * + * This is a bitmask; each bit corresponds to a RecoveryConflictReason + * enum value. + */ + pg_atomic_uint32 pendingRecoveryConflicts; + proclist_node lwWaitLink; /* position in LW lock wait list */ /* Support for condition variables. */ @@ -264,17 +284,6 @@ typedef struct PGPROC /* waitLock and waitProcLock are NULL if not currently waiting. */ LOCK *waitLock; /* Lock object we're sleeping on ... */ PROCLOCK *waitProcLock; /* Per-holder info for awaited lock */ - LOCKMODE waitLockMode; /* type of lock we're waiting for */ - LOCKMASK heldLocks; /* bitmask for lock types already held on this - * lock object by this backend */ - pg_atomic_uint64 waitStart; /* time at which wait for lock acquisition - * started */ - - int delayChkptFlags; /* for DELAY_CHKPT_* flags */ - - uint8 statusFlags; /* this backend's status flags, see PROC_* - * above. mirrored in - * ProcGlobal->statusFlags[pgxactoff] */ /* * Info to allow us to wait for synchronous replication, if needed. @@ -293,15 +302,11 @@ typedef struct PGPROC */ dlist_head myProcLocks[NUM_LOCK_PARTITIONS]; - XidCacheStatus subxidStatus; /* mirrored with - * ProcGlobal->subxidStates[i] */ struct XidCache subxids; /* cache for subtransaction XIDs */ /* Support for group XID clearing. */ - /* true, if member of ProcArray group waiting for XID clear */ - bool procArrayGroupMember; - /* next ProcArray group member waiting for XID clear */ - pg_atomic_uint32 procArrayGroupNext; + pg_atomic_uint32 procArrayGroupNext; /* next ProcArray group member + * waiting for XID clear */ /* * latest transaction id among the transaction's main XID and @@ -309,10 +314,7 @@ typedef struct PGPROC */ TransactionId procArrayGroupMemberXid; - uint32 wait_event_info; /* proc's wait information */ - /* Support for group transaction status update. */ - bool clogGroupMember; /* true, if member of clog group */ pg_atomic_uint32 clogGroupNext; /* next clog group member */ TransactionId clogGroupMemberXid; /* transaction id of clog group member */ XidStatus clogGroupMemberXidStatus; /* transaction status of clog @@ -326,9 +328,6 @@ typedef struct PGPROC LWLock fpInfoLock; /* protects per-backend fast-path state */ uint64 *fpLockBits; /* lock modes held for each fast-path slot */ Oid *fpRelId; /* slots for rel oids */ - bool fpVXIDLock; /* are we holding a fast-path VXID lock? */ - LocalTransactionId fpLocalTransactionId; /* lxid for fast-path VXID - * lock */ /* * Support for lock groups. Use LockHashPartitionLockByProc on the group ^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) @ 2026-02-10 19:46 Andres Freund <[email protected]> parent: Bertrand Drouvot <[email protected]> 0 siblings, 1 reply; 22+ messages in thread From: Andres Freund @ 2026-02-10 19:46 UTC (permalink / raw) To: Bertrand Drouvot <[email protected]>; +Cc: Heikki Linnakangas <[email protected]>; [email protected] <[email protected]> Hi, On 2026-02-10 19:15:27 +0000, Bertrand Drouvot wrote: > On Tue, Feb 10, 2026 at 01:15:01PM -0500, Andres Freund wrote: > > On 2026-02-10 19:14:44 +0200, Heikki Linnakangas wrote: > > Yea, I don't think we need to be perfect here. Just a bit less bad. And, as > > you say, the current order doesn't make a lot of sense. > > Just grouping things like > > - pid, pgxactoff, backendType (i.e. barely if ever changing) > > - wait_event_info, waitStart (i.e. very frequently changing, but typically > > accessed within one proc) > > - sem, lwWaiting, waitLockMode (i.e. stuff that is updated frequently and > > accessed across processes) > > With an ordering like in the attached (to apply on top of Heikki's patch), we're > back to 832 bytes. You'd really need to insert padding between the sections to make it work... > But, then the pg_attribute_aligned() added in Heikki's patch makes it 896 bytes... > > " > /* 816 | 16 */ dlist_node lockGroupLink; > /* XXX 64-byte padding */ > > /* total size (bytes): 896 */ > } > " > > What about applying this new ordering and remove the pg_attribute_aligned()? > (I thought the aligned attribute would be smarter than that and not add this > 64 padding bytes). That's just because we have /* * Assumed cache line size. This doesn't affect correctness, but can be used * for low-level optimizations. This is mostly used to pad various data * structures, to ensure that highly-contended fields are on different cache * lines. Too small a value can hurt performance due to false sharing, while * the only downside of too large a value is a few bytes of wasted memory. * The default is 128, which should be large enough for all supported * platforms. */ #define PG_CACHE_LINE_SIZE 128 I don't think we need to worry about the number of bytes here very much. This isn't much compared to all the other overheads a connection slot has (like the memory for locks). Greetings, Andres Freund ^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) @ 2026-02-10 20:53 Heikki Linnakangas <[email protected]> parent: Andres Freund <[email protected]> 0 siblings, 1 reply; 22+ messages in thread From: Heikki Linnakangas @ 2026-02-10 20:53 UTC (permalink / raw) To: Andres Freund <[email protected]>; Bertrand Drouvot <[email protected]>; +Cc: [email protected] <[email protected]> On 10/02/2026 21:46, Andres Freund wrote: > On 2026-02-10 19:15:27 +0000, Bertrand Drouvot wrote: >> On Tue, Feb 10, 2026 at 01:15:01PM -0500, Andres Freund wrote: >>> On 2026-02-10 19:14:44 +0200, Heikki Linnakangas wrote: >>> Yea, I don't think we need to be perfect here. Just a bit less bad. And, as >>> you say, the current order doesn't make a lot of sense. >>> Just grouping things like >>> - pid, pgxactoff, backendType (i.e. barely if ever changing) >>> - wait_event_info, waitStart (i.e. very frequently changing, but typically >>> accessed within one proc) >>> - sem, lwWaiting, waitLockMode (i.e. stuff that is updated frequently and >>> accessed across processes) >> >> With an ordering like in the attached (to apply on top of Heikki's patch), we're >> back to 832 bytes. > > You'd really need to insert padding between the sections to make it work... Here's my attempt at grouping things more logically. I didn't insert padding and also didn't try to avoid alignment padding. I tried to optimize for readability rather than size or performance. That said, I would assume that grouping things logically like this would also help to avoid false sharing. If not, inserting explicit padding seems like a a good fix. I also think we should split 'links' into two fields. For clarity. With this, sizeof(PGPROC) == 864 without the explicit alignment to PG_CACHE_LINE_SIZE, and 896 with it. - Heikki Attachments: [text/x-patch] v2-0001-Align-PGPROC-to-cache-line-boundary.patch (1.5K, 2-v2-0001-Align-PGPROC-to-cache-line-boundary.patch) download | inline diff: From 83481187f6e2f8c177bc85c522917d64e1b78b4b Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas <[email protected]> Date: Tue, 10 Feb 2026 18:53:31 +0200 Subject: [PATCH v2 1/4] Align PGPROC to cache line boundary --- src/include/storage/proc.h | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h index ac0df4aeaaa..53acce8a5a1 100644 --- a/src/include/storage/proc.h +++ b/src/include/storage/proc.h @@ -182,7 +182,7 @@ typedef enum * * See PROC_HDR for details. */ -struct PGPROC +typedef struct PGPROC { dlist_node links; /* list link if process is in a list */ dlist_head *procgloballist; /* procglobal list that owns this PGPROC */ @@ -337,10 +337,18 @@ struct PGPROC PGPROC *lockGroupLeader; /* lock group leader, if I'm a member */ dlist_head lockGroupMembers; /* list of members, if I'm a leader */ dlist_node lockGroupLink; /* my member link, if I'm a member */ -}; - -/* NOTE: "typedef struct PGPROC PGPROC" appears in storage/lock.h. */ +} +/* + * If compiler understands aligned pragma, use it to align the struct at cache + * line boundaries. This is just for performance, to (a) avoid false sharing + * and (b) to make the multiplication / division to convert between PGPROC * + * and ProcNumber be a little cheaper. + */ +#if defined(pg_attribute_aligned) + pg_attribute_aligned(PG_CACHE_LINE_SIZE) +#endif +PGPROC; extern PGDLLIMPORT PGPROC *MyProc; -- 2.47.3 [text/x-patch] v2-0002-Remove-useless-store.patch (900B, 3-v2-0002-Remove-useless-store.patch) download | inline diff: From f5485ae9eb4b12eb7e57b8b7e7fbdaa0df7c5575 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas <[email protected]> Date: Tue, 10 Feb 2026 21:54:56 +0200 Subject: [PATCH v2 2/4] Remove useless store It was a mishap introduced in commit 5764f611e1, which converted the loop to use dclist_foreach --- src/backend/storage/lmgr/lock.c | 1 - 1 file changed, 1 deletion(-) diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c index 7f0cd784f79..e1168ad3837 100644 --- a/src/backend/storage/lmgr/lock.c +++ b/src/backend/storage/lmgr/lock.c @@ -4148,7 +4148,6 @@ GetSingleProcBlockerStatusData(PGPROC *blocked_proc, BlockedProcsData *data) if (queued_proc == blocked_proc) break; data->waiter_pids[data->npids++] = queued_proc->pid; - queued_proc = (PGPROC *) queued_proc->links.next; } bproc->num_locks = data->nlocks - bproc->first_lock; -- 2.47.3 [text/x-patch] v2-0003-Split-PGPROC-links-field-into-two-for-clarity.patch (13.0K, 4-v2-0003-Split-PGPROC-links-field-into-two-for-clarity.patch) download | inline diff: From b4cc9e67a3fb8344a7c4934723fde3d012f2105e Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas <[email protected]> Date: Tue, 10 Feb 2026 22:45:08 +0200 Subject: [PATCH v2 3/4] Split PGPROC 'links' field into two, for clarity The same field was mainly used for the position in a LOCK's wait queue, but also as the position in a the freelist when the PGPROC entry was not in use. The reuse saves a some memory, at the expense of readability, which seems like a bad tradeoff. If we wanted to make the struct smaller there's other things we could do, but we're actually just discussing adding padding to the struct for performance reasons. --- src/backend/access/transam/twophase.c | 2 +- src/backend/storage/lmgr/deadlock.c | 12 ++++----- src/backend/storage/lmgr/lock.c | 6 ++--- src/backend/storage/lmgr/proc.c | 37 ++++++++++++++------------- src/include/storage/proc.h | 8 +++--- 5 files changed, 32 insertions(+), 33 deletions(-) diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c index eabc4d48208..e4340b59640 100644 --- a/src/backend/access/transam/twophase.c +++ b/src/backend/access/transam/twophase.c @@ -447,7 +447,6 @@ MarkAsPreparingGuts(GlobalTransaction gxact, FullTransactionId fxid, /* Initialize the PGPROC entry */ MemSet(proc, 0, sizeof(PGPROC)); - dlist_node_init(&proc->links); proc->waitStatus = PROC_WAIT_STATUS_OK; if (LocalTransactionIdIsValid(MyProc->vxid.lxid)) { @@ -474,6 +473,7 @@ MarkAsPreparingGuts(GlobalTransaction gxact, FullTransactionId fxid, proc->lwWaiting = LW_WS_NOT_WAITING; proc->lwWaitMode = 0; proc->waitLock = NULL; + dlist_node_init(&proc->waitLink); proc->waitProcLock = NULL; pg_atomic_init_u64(&proc->waitStart, 0); for (i = 0; i < NUM_LOCK_PARTITIONS; i++) diff --git a/src/backend/storage/lmgr/deadlock.c b/src/backend/storage/lmgr/deadlock.c index 0a8dd5eb7c2..d642500d4c9 100644 --- a/src/backend/storage/lmgr/deadlock.c +++ b/src/backend/storage/lmgr/deadlock.c @@ -260,7 +260,7 @@ DeadLockCheck(PGPROC *proc) /* Reset the queue and re-add procs in the desired order */ dclist_init(waitQueue); for (int j = 0; j < nProcs; j++) - dclist_push_tail(waitQueue, &procs[j]->links); + dclist_push_tail(waitQueue, &procs[j]->waitLink); #ifdef DEBUG_DEADLOCK PrintLockQueue(lock, "rearranged to:"); @@ -502,7 +502,7 @@ FindLockCycleRecurse(PGPROC *checkProc, * If the process is waiting, there is an outgoing waits-for edge to each * process that blocks it. */ - if (checkProc->links.next != NULL && checkProc->waitLock != NULL && + if (!dlist_node_is_detached(&checkProc->waitLink) && FindLockCycleRecurseMember(checkProc, checkProc, depth, softEdges, nSoftEdges)) return true; @@ -520,7 +520,7 @@ FindLockCycleRecurse(PGPROC *checkProc, memberProc = dlist_container(PGPROC, lockGroupLink, iter.cur); - if (memberProc->links.next != NULL && memberProc->waitLock != NULL && + if (!dlist_node_is_detached(&memberProc->waitLink) && memberProc->waitLock != NULL && memberProc != checkProc && FindLockCycleRecurseMember(memberProc, checkProc, depth, softEdges, nSoftEdges)) @@ -713,7 +713,7 @@ FindLockCycleRecurseMember(PGPROC *checkProc, { dclist_foreach(proc_iter, waitQueue) { - proc = dlist_container(PGPROC, links, proc_iter.cur); + proc = dlist_container(PGPROC, waitLink, proc_iter.cur); if (proc->lockGroupLeader == checkProcLeader) lastGroupMember = proc; @@ -728,7 +728,7 @@ FindLockCycleRecurseMember(PGPROC *checkProc, { PGPROC *leader; - proc = dlist_container(PGPROC, links, proc_iter.cur); + proc = dlist_container(PGPROC, waitLink, proc_iter.cur); leader = proc->lockGroupLeader == NULL ? proc : proc->lockGroupLeader; @@ -877,7 +877,7 @@ TopoSort(LOCK *lock, i = 0; dclist_foreach(proc_iter, waitQueue) { - proc = dlist_container(PGPROC, links, proc_iter.cur); + proc = dlist_container(PGPROC, waitLink, proc_iter.cur); topoProcs[i++] = proc; } Assert(i == queue_size); diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c index e1168ad3837..d930c66cdbd 100644 --- a/src/backend/storage/lmgr/lock.c +++ b/src/backend/storage/lmgr/lock.c @@ -2052,13 +2052,13 @@ RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode) /* Make sure proc is waiting */ Assert(proc->waitStatus == PROC_WAIT_STATUS_WAITING); - Assert(proc->links.next != NULL); + Assert(!dlist_node_is_detached(&proc->waitLink)); Assert(waitLock); Assert(!dclist_is_empty(&waitLock->waitProcs)); Assert(0 < lockmethodid && lockmethodid < lengthof(LockMethods)); /* Remove proc from lock's wait queue */ - dclist_delete_from_thoroughly(&waitLock->waitProcs, &proc->links); + dclist_delete_from_thoroughly(&waitLock->waitProcs, &proc->waitLink); /* Undo increments of request counts by waiting process */ Assert(waitLock->nRequested > 0); @@ -4143,7 +4143,7 @@ GetSingleProcBlockerStatusData(PGPROC *blocked_proc, BlockedProcsData *data) /* Collect PIDs from the lock's wait queue, stopping at blocked_proc */ dclist_foreach(proc_iter, waitQueue) { - PGPROC *queued_proc = dlist_container(PGPROC, links, proc_iter.cur); + PGPROC *queued_proc = dlist_container(PGPROC, waitLink, proc_iter.cur); if (queued_proc == blocked_proc) break; diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c index 31ccdb1ef89..dbc2ad931b8 100644 --- a/src/backend/storage/lmgr/proc.c +++ b/src/backend/storage/lmgr/proc.c @@ -331,25 +331,25 @@ InitProcGlobal(void) if (i < MaxConnections) { /* PGPROC for normal backend, add to freeProcs list */ - dlist_push_tail(&ProcGlobal->freeProcs, &proc->links); + dlist_push_tail(&ProcGlobal->freeProcs, &proc->freeProcsLink); proc->procgloballist = &ProcGlobal->freeProcs; } else if (i < MaxConnections + autovacuum_worker_slots + NUM_SPECIAL_WORKER_PROCS) { /* PGPROC for AV or special worker, add to autovacFreeProcs list */ - dlist_push_tail(&ProcGlobal->autovacFreeProcs, &proc->links); + dlist_push_tail(&ProcGlobal->autovacFreeProcs, &proc->freeProcsLink); proc->procgloballist = &ProcGlobal->autovacFreeProcs; } else if (i < MaxConnections + autovacuum_worker_slots + NUM_SPECIAL_WORKER_PROCS + max_worker_processes) { /* PGPROC for bgworker, add to bgworkerFreeProcs list */ - dlist_push_tail(&ProcGlobal->bgworkerFreeProcs, &proc->links); + dlist_push_tail(&ProcGlobal->bgworkerFreeProcs, &proc->freeProcsLink); proc->procgloballist = &ProcGlobal->bgworkerFreeProcs; } else if (i < MaxBackends) { /* PGPROC for walsender, add to walsenderFreeProcs list */ - dlist_push_tail(&ProcGlobal->walsenderFreeProcs, &proc->links); + dlist_push_tail(&ProcGlobal->walsenderFreeProcs, &proc->freeProcsLink); proc->procgloballist = &ProcGlobal->walsenderFreeProcs; } @@ -438,7 +438,7 @@ InitProcess(void) if (!dlist_is_empty(procgloballist)) { - MyProc = dlist_container(PGPROC, links, dlist_pop_head_node(procgloballist)); + MyProc = dlist_container(PGPROC, freeProcsLink, dlist_pop_head_node(procgloballist)); SpinLockRelease(ProcStructLock); } else @@ -471,7 +471,7 @@ InitProcess(void) * Initialize all fields of MyProc, except for those previously * initialized by InitProcGlobal. */ - dlist_node_init(&MyProc->links); + dlist_node_init(&MyProc->freeProcsLink); MyProc->waitStatus = PROC_WAIT_STATUS_OK; MyProc->fpVXIDLock = false; MyProc->fpLocalTransactionId = InvalidLocalTransactionId; @@ -493,6 +493,7 @@ InitProcess(void) MyProc->lwWaiting = LW_WS_NOT_WAITING; MyProc->lwWaitMode = 0; MyProc->waitLock = NULL; + dlist_node_init(&MyProc->waitLink); MyProc->waitProcLock = NULL; pg_atomic_write_u64(&MyProc->waitStart, 0); #ifdef USE_ASSERT_CHECKING @@ -672,7 +673,7 @@ InitAuxiliaryProcess(void) * Initialize all fields of MyProc, except for those previously * initialized by InitProcGlobal. */ - dlist_node_init(&MyProc->links); + dlist_node_init(&MyProc->freeProcsLink); MyProc->waitStatus = PROC_WAIT_STATUS_OK; MyProc->fpVXIDLock = false; MyProc->fpLocalTransactionId = InvalidLocalTransactionId; @@ -689,6 +690,7 @@ InitAuxiliaryProcess(void) MyProc->lwWaiting = LW_WS_NOT_WAITING; MyProc->lwWaitMode = 0; MyProc->waitLock = NULL; + dlist_node_init(&MyProc->waitLink); MyProc->waitProcLock = NULL; pg_atomic_write_u64(&MyProc->waitStart, 0); #ifdef USE_ASSERT_CHECKING @@ -849,7 +851,7 @@ LockErrorCleanup(void) partitionLock = LockHashPartitionLock(lockAwaited->hashcode); LWLockAcquire(partitionLock, LW_EXCLUSIVE); - if (!dlist_node_is_detached(&MyProc->links)) + if (!dlist_node_is_detached(&MyProc->waitLink)) { /* We could not have been granted the lock yet */ RemoveFromWaitQueue(MyProc, lockAwaited->hashcode); @@ -981,7 +983,7 @@ ProcKill(int code, Datum arg) /* Leader exited first; return its PGPROC. */ SpinLockAcquire(ProcStructLock); - dlist_push_head(procgloballist, &leader->links); + dlist_push_head(procgloballist, &leader->freeProcsLink); SpinLockRelease(ProcStructLock); } } @@ -1026,7 +1028,7 @@ ProcKill(int code, Datum arg) Assert(dlist_is_empty(&proc->lockGroupMembers)); /* Return PGPROC structure (and semaphore) to appropriate freelist */ - dlist_push_tail(procgloballist, &proc->links); + dlist_push_tail(procgloballist, &proc->freeProcsLink); } /* Update shared estimate of spins_per_delay */ @@ -1215,7 +1217,7 @@ JoinWaitQueue(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait) dclist_foreach(iter, waitQueue) { - PGPROC *proc = dlist_container(PGPROC, links, iter.cur); + PGPROC *proc = dlist_container(PGPROC, waitLink, iter.cur); /* * If we're part of the same locking group as this waiter, its @@ -1279,9 +1281,9 @@ JoinWaitQueue(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait) * Insert self into queue, at the position determined above. */ if (insert_before) - dclist_insert_before(waitQueue, &insert_before->links, &MyProc->links); + dclist_insert_before(waitQueue, &insert_before->waitLink, &MyProc->waitLink); else - dclist_push_tail(waitQueue, &MyProc->links); + dclist_push_tail(waitQueue, &MyProc->waitLink); lock->waitMask |= LOCKBIT_ON(lockmode); @@ -1715,13 +1717,13 @@ ProcSleep(LOCALLOCK *locallock) void ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus) { - if (dlist_node_is_detached(&proc->links)) + if (dlist_node_is_detached(&proc->waitLink)) return; Assert(proc->waitStatus == PROC_WAIT_STATUS_WAITING); /* Remove process from wait queue */ - dclist_delete_from_thoroughly(&proc->waitLock->waitProcs, &proc->links); + dclist_delete_from_thoroughly(&proc->waitLock->waitProcs, &proc->waitLink); /* Clean up process' state and pass it the ok/fail signal */ proc->waitLock = NULL; @@ -1752,7 +1754,7 @@ ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock) dclist_foreach_modify(miter, waitQueue) { - PGPROC *proc = dlist_container(PGPROC, links, miter.cur); + PGPROC *proc = dlist_container(PGPROC, waitLink, miter.cur); LOCKMODE lockmode = proc->waitLockMode; /* @@ -1816,8 +1818,7 @@ CheckDeadLock(void) * We check by looking to see if we've been unlinked from the wait queue. * This is safe because we hold the lock partition lock. */ - if (MyProc->links.prev == NULL || - MyProc->links.next == NULL) + if (dlist_node_is_detached(&MyProc->waitLink)) { result = DS_NO_DEADLOCK; goto check_done; diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h index 53acce8a5a1..f642830fcf8 100644 --- a/src/include/storage/proc.h +++ b/src/include/storage/proc.h @@ -154,10 +154,6 @@ typedef enum * Each backend has a PGPROC struct in shared memory. There is also a list of * currently-unused PGPROC structs that will be reallocated to new backends. * - * links: list link for any list the PGPROC is in. When waiting for a lock, - * the PGPROC is linked into that lock's waitProcs queue. A recycled PGPROC - * is linked into ProcGlobal's freeProcs list. - * * Note: twophase.c also sets up a dummy PGPROC struct for each currently * prepared transaction. These PGPROCs appear in the ProcArray data structure * so that the prepared transactions appear to be still running and are @@ -184,8 +180,9 @@ typedef enum */ typedef struct PGPROC { - dlist_node links; /* list link if process is in a list */ dlist_head *procgloballist; /* procglobal list that owns this PGPROC */ + dlist_node freeProcsLink; /* link in procgloballist, when in recycled + * state */ PGSemaphore sem; /* ONE semaphore to sleep on */ ProcWaitStatus waitStatus; @@ -263,6 +260,7 @@ typedef struct PGPROC /* Info about lock the process is currently waiting for, if any. */ /* waitLock and waitProcLock are NULL if not currently waiting. */ LOCK *waitLock; /* Lock object we're sleeping on ... */ + dlist_node waitLink; /* position in waitLock->waitProcs queue */ PROCLOCK *waitProcLock; /* Per-holder info for awaited lock */ LOCKMODE waitLockMode; /* type of lock we're waiting for */ LOCKMASK heldLocks; /* bitmask for lock types already held on this -- 2.47.3 [text/x-patch] v2-0004-Rearrange-fields-in-PGPROC-for-clarity.patch (8.0K, 5-v2-0004-Rearrange-fields-in-PGPROC-for-clarity.patch) download | inline diff: From 455db18cb525d151f0dee40738023efbcdecdae4 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas <[email protected]> Date: Tue, 10 Feb 2026 22:47:22 +0200 Subject: [PATCH v2 4/4] Rearrange fields in PGPROC, for clarity --- src/include/storage/proc.h | 124 +++++++++++++++++++++---------------- 1 file changed, 71 insertions(+), 53 deletions(-) diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h index f642830fcf8..b1077562cb5 100644 --- a/src/include/storage/proc.h +++ b/src/include/storage/proc.h @@ -184,27 +184,31 @@ typedef struct PGPROC dlist_node freeProcsLink; /* link in procgloballist, when in recycled * state */ - PGSemaphore sem; /* ONE semaphore to sleep on */ - ProcWaitStatus waitStatus; - - Latch procLatch; /* generic latch for process */ + /*---- Backend identity ----*/ + /* + * These fields that don't change after backend startup, or only very + * rarely + */ + int pid; /* Backend's process ID; 0 if prepared xact */ + BackendType backendType; /* what kind of process is this? */ - TransactionId xid; /* id of top-level transaction currently being - * executed by this proc, if running and XID - * is assigned; else InvalidTransactionId. - * mirrored in ProcGlobal->xids[pgxactoff] */ - - TransactionId xmin; /* minimal running XID as it was when we were - * starting our xact, excluding LAZY VACUUM: - * vacuum must not remove tuples deleted by - * xid >= xmin ! */ + /* These fields are zero while a backend is still starting up: */ + Oid databaseId; /* OID of database this backend is using */ + Oid roleId; /* OID of role using this backend */ - int pid; /* Backend's process ID; 0 if prepared xact */ + Oid tempNamespaceId; /* OID of temp schema this backend is + * using */ int pgxactoff; /* offset into various ProcGlobal->arrays with * data mirrored from this PGPROC */ + uint8 statusFlags; /* this backend's status flags, see PROC_* + * above. mirrored in + * ProcGlobal->statusFlags[pgxactoff] */ + + /*---- Transactions and snapshots ----*/ + /* * Currently running top-level transaction's virtual xid. Together these * form a VirtualTransactionId, but we don't use that struct because this @@ -224,14 +228,27 @@ typedef struct PGPROC * InvalidLocalTransactionId */ } vxid; - /* These fields are zero while a backend is still starting up: */ - Oid databaseId; /* OID of database this backend is using */ - Oid roleId; /* OID of role using this backend */ + TransactionId xid; /* id of top-level transaction currently being + * executed by this proc, if running and XID + * is assigned; else InvalidTransactionId. + * mirrored in ProcGlobal->xids[pgxactoff] */ - Oid tempNamespaceId; /* OID of temp schema this backend is - * using */ + TransactionId xmin; /* minimal running XID as it was when we were + * starting our xact, excluding LAZY VACUUM: + * vacuum must not remove tuples deleted by + * xid >= xmin ! */ - BackendType backendType; /* what kind of process is this? */ + XidCacheStatus subxidStatus; /* mirrored with + * ProcGlobal->subxidStates[i] */ + struct XidCache subxids; /* cache for subtransaction XIDs */ + + /*---- Inter-process signaling ----*/ + + Latch procLatch; /* generic latch for process */ + + PGSemaphore sem; /* ONE semaphore to sleep on */ + + int delayChkptFlags; /* for DELAY_CHKPT_* flags */ /* * While in hot standby mode, shows that a conflict signal has been sent @@ -243,6 +260,8 @@ typedef struct PGPROC */ pg_atomic_uint32 pendingRecoveryConflicts; + /*---- LWLock waiting ----*/ + /* * Info about LWLock the process is currently waiting for, if any. * @@ -257,6 +276,16 @@ typedef struct PGPROC /* Support for condition variables. */ proclist_node cvWaitLink; /* position in CV wait list */ + /*---- Lock manager data ----*/ + + /* + * Support for lock groups. Use LockHashPartitionLockByProc on the group + * leader to get the LWLock protecting these fields. + */ + PGPROC *lockGroupLeader; /* lock group leader, if I'm a member */ + dlist_head lockGroupMembers; /* list of members, if I'm a leader */ + dlist_node lockGroupLink; /* my member link, if I'm a member */ + /* Info about lock the process is currently waiting for, if any. */ /* waitLock and waitProcLock are NULL if not currently waiting. */ LOCK *waitLock; /* Lock object we're sleeping on ... */ @@ -265,14 +294,28 @@ typedef struct PGPROC LOCKMODE waitLockMode; /* type of lock we're waiting for */ LOCKMASK heldLocks; /* bitmask for lock types already held on this * lock object by this backend */ + pg_atomic_uint64 waitStart; /* time at which wait for lock acquisition * started */ - int delayChkptFlags; /* for DELAY_CHKPT_* flags */ + ProcWaitStatus waitStatus; - uint8 statusFlags; /* this backend's status flags, see PROC_* - * above. mirrored in - * ProcGlobal->statusFlags[pgxactoff] */ + /* + * All PROCLOCK objects for locks held or awaited by this backend are + * linked into one of these lists, according to the partition number of + * their lock. + */ + dlist_head myProcLocks[NUM_LOCK_PARTITIONS]; + + /*-- recording fast-path locks taken by this backend. --*/ + LWLock fpInfoLock; /* protects per-backend fast-path state */ + uint64 *fpLockBits; /* lock modes held for each fast-path slot */ + Oid *fpRelId; /* slots for rel oids */ + bool fpVXIDLock; /* are we holding a fast-path VXID lock? */ + LocalTransactionId fpLocalTransactionId; /* lxid for fast-path VXID + * lock */ + + /*---- Synchronous replication waiting ----*/ /* * Info to allow us to wait for synchronous replication, if needed. @@ -284,18 +327,8 @@ typedef struct PGPROC int syncRepState; /* wait state for sync rep */ dlist_node syncRepLinks; /* list link if process is in syncrep queue */ - /* - * All PROCLOCK objects for locks held or awaited by this backend are - * linked into one of these lists, according to the partition number of - * their lock. - */ - dlist_head myProcLocks[NUM_LOCK_PARTITIONS]; - - XidCacheStatus subxidStatus; /* mirrored with - * ProcGlobal->subxidStates[i] */ - struct XidCache subxids; /* cache for subtransaction XIDs */ + /*---- Support for group XID clearing. ----*/ - /* Support for group XID clearing. */ /* true, if member of ProcArray group waiting for XID clear */ bool procArrayGroupMember; /* next ProcArray group member waiting for XID clear */ @@ -307,9 +340,7 @@ typedef struct PGPROC */ TransactionId procArrayGroupMemberXid; - uint32 wait_event_info; /* proc's wait information */ - - /* Support for group transaction status update. */ + /*---- Support for group transaction status update. ----*/ bool clogGroupMember; /* true, if member of clog group */ pg_atomic_uint32 clogGroupNext; /* next clog group member */ TransactionId clogGroupMemberXid; /* transaction id of clog group member */ @@ -320,21 +351,8 @@ typedef struct PGPROC XLogRecPtr clogGroupMemberLsn; /* WAL location of commit record for clog * group member */ - /* Lock manager data, recording fast-path locks taken by this backend. */ - LWLock fpInfoLock; /* protects per-backend fast-path state */ - uint64 *fpLockBits; /* lock modes held for each fast-path slot */ - Oid *fpRelId; /* slots for rel oids */ - bool fpVXIDLock; /* are we holding a fast-path VXID lock? */ - LocalTransactionId fpLocalTransactionId; /* lxid for fast-path VXID - * lock */ - - /* - * Support for lock groups. Use LockHashPartitionLockByProc on the group - * leader to get the LWLock protecting these fields. - */ - PGPROC *lockGroupLeader; /* lock group leader, if I'm a member */ - dlist_head lockGroupMembers; /* list of members, if I'm a leader */ - dlist_node lockGroupLink; /* my member link, if I'm a member */ + /*---- Status reporting ----*/ + uint32 wait_event_info; /* proc's wait information */ } /* -- 2.47.3 ^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) @ 2026-02-11 04:40 Bertrand Drouvot <[email protected]> parent: Heikki Linnakangas <[email protected]> 0 siblings, 2 replies; 22+ messages in thread From: Bertrand Drouvot @ 2026-02-11 04:40 UTC (permalink / raw) To: Heikki Linnakangas <[email protected]>; +Cc: Andres Freund <[email protected]>; [email protected] <[email protected]> Hi, On Tue, Feb 10, 2026 at 10:53:58PM +0200, Heikki Linnakangas wrote: > On 10/02/2026 21:46, Andres Freund wrote: > > On 2026-02-10 19:15:27 +0000, Bertrand Drouvot wrote: > > > On Tue, Feb 10, 2026 at 01:15:01PM -0500, Andres Freund wrote: > > > > On 2026-02-10 19:14:44 +0200, Heikki Linnakangas wrote: > > > > Yea, I don't think we need to be perfect here. Just a bit less bad. And, as > > > > you say, the current order doesn't make a lot of sense. > > > > Just grouping things like > > > > - pid, pgxactoff, backendType (i.e. barely if ever changing) > > > > - wait_event_info, waitStart (i.e. very frequently changing, but typically > > > > accessed within one proc) > > > > - sem, lwWaiting, waitLockMode (i.e. stuff that is updated frequently and > > > > accessed across processes) > > > > > > With an ordering like in the attached (to apply on top of Heikki's patch), we're > > > back to 832 bytes. > > > > You'd really need to insert padding between the sections to make it work... > > Here's my attempt at grouping things more logically. Thanks! > I didn't insert padding > and also didn't try to avoid alignment padding. I tried to optimize for > readability rather than size or performance. Yeah, my attempt was to put the size back to 832 bytes but that's probably not worth it as stated by Andres up-thread. > That said, I would assume that > grouping things logically like this would also help to avoid false sharing. > If not, inserting explicit padding seems like a a good fix. > > I also think we should split 'links' into two fields. For clarity. > A few comments: 0001: + * and (b) to make the multiplication / division to convert between PGPROC * + * and ProcNumber be a little cheaper Is that correct if PGPROC size is not a power of 2? 0002: Good catch! 0003: 1/ There is one missing change in PrintLockQueue() ("links" is still used, and that should be replaced by "waitLink"). 2/ change the comment on top of ProcWakeup? " /* * ProcWakeup -- wake up a process by setting its latch. * * Also remove the process from the wait queue and set its links invalid. " s/links/waitLink/? Also, out of curiosity, with 0003 in place PGPROC size goes from 840 to 856. 0004: The grouping looks Ok to me. Just one nit for the added comments: + /*---- Backend identity ----*/ + /*---- Transactions and snapshots ----*/ + /*---- Inter-process signaling ----*/ + /*---- LWLock waiting ----*/ + /*---- Lock manager data ----*/ + /*---- Synchronous replication waiting ----*/ + /*---- Support for group XID clearing. ----*/ + /*---- Support for group transaction status update. ----*/ + /*---- Status reporting ----*/ Some have period and some don't. > With this, sizeof(PGPROC) == 864 without the explicit alignment to > PG_CACHE_LINE_SIZE, and 896 with it. I can see 876 -> 896 on my side: /* 872 | 4 */ uint32 wait_event_info; /* XXX 20-byte padding */ /* total size (bytes): 896 */ } Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com ^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) @ 2026-02-11 10:03 Heikki Linnakangas <[email protected]> parent: Bertrand Drouvot <[email protected]> 1 sibling, 2 replies; 22+ messages in thread From: Heikki Linnakangas @ 2026-02-11 10:03 UTC (permalink / raw) To: Bertrand Drouvot <[email protected]>; +Cc: Andres Freund <[email protected]>; [email protected] <[email protected]> On 11/02/2026 06:40, Bertrand Drouvot wrote: > A few comments: > > 0001: > > + * and (b) to make the multiplication / division to convert between PGPROC * > + * and ProcNumber be a little cheaper > > Is that correct if PGPROC size is not a power of 2? You're right, it's not. > 0002: Good catch! Committed that. >> With this, sizeof(PGPROC) == 864 without the explicit alignment to >> PG_CACHE_LINE_SIZE, and 896 with it. > > I can see 876 -> 896 on my side: > > /* 872 | 4 */ uint32 wait_event_info; > /* XXX 20-byte padding */ > > /* total size (bytes): 896 */ > } Interesting. I've attached 'pahole bin/postgres' output from my laptop. It's Linux on arm64. This is with my v2 patches to rearrange the fields, but with the "pg_attribute_aligned(PG_CACHE_LINE_SIZE)" removed. - Heikki struct PGPROC { dlist_head * procgloballist; /* 0 8 */ dlist_node freeProcsLink; /* 8 16 */ int pid; /* 24 4 */ BackendType backendType; /* 28 4 */ Oid databaseId; /* 32 4 */ Oid roleId; /* 36 4 */ Oid tempNamespaceId; /* 40 4 */ int pgxactoff; /* 44 4 */ uint8 statusFlags; /* 48 1 */ /* XXX 3 bytes hole, try to pack */ struct { ProcNumber procNumber; /* 52 4 */ LocalTransactionId lxid; /* 56 4 */ } vxid; /* 52 8 */ TransactionId xid; /* 60 4 */ /* --- cacheline 1 boundary (64 bytes) --- */ TransactionId xmin; /* 64 4 */ XidCacheStatus subxidStatus; /* 68 2 */ /* XXX 2 bytes hole, try to pack */ struct XidCache subxids; /* 72 256 */ /* --- cacheline 5 boundary (320 bytes) was 8 bytes ago --- */ Latch procLatch; /* 328 16 */ PGSemaphore sem; /* 344 8 */ int delayChkptFlags; /* 352 4 */ pg_atomic_uint32 pendingRecoveryConflicts; /* 356 4 */ uint8 lwWaiting; /* 360 1 */ uint8 lwWaitMode; /* 361 1 */ /* XXX 2 bytes hole, try to pack */ proclist_node lwWaitLink; /* 364 8 */ proclist_node cvWaitLink; /* 372 8 */ /* XXX 4 bytes hole, try to pack */ /* --- cacheline 6 boundary (384 bytes) --- */ PGPROC * lockGroupLeader; /* 384 8 */ dlist_head lockGroupMembers; /* 392 16 */ dlist_node lockGroupLink; /* 408 16 */ LOCK * waitLock; /* 424 8 */ dlist_node waitLink; /* 432 16 */ /* --- cacheline 7 boundary (448 bytes) --- */ PROCLOCK * waitProcLock; /* 448 8 */ LOCKMODE waitLockMode; /* 456 4 */ LOCKMASK heldLocks; /* 460 4 */ pg_atomic_uint64 waitStart __attribute__((__aligned__(8))); /* 464 8 */ ProcWaitStatus waitStatus; /* 472 4 */ /* XXX 4 bytes hole, try to pack */ dlist_head myProcLocks[16]; /* 480 256 */ /* --- cacheline 11 boundary (704 bytes) was 32 bytes ago --- */ LWLock fpInfoLock; /* 736 16 */ uint64 * fpLockBits; /* 752 8 */ Oid * fpRelId; /* 760 8 */ /* --- cacheline 12 boundary (768 bytes) --- */ _Bool fpVXIDLock; /* 768 1 */ /* XXX 3 bytes hole, try to pack */ LocalTransactionId fpLocalTransactionId; /* 772 4 */ XLogRecPtr waitLSN; /* 776 8 */ int syncRepState; /* 784 4 */ /* XXX 4 bytes hole, try to pack */ dlist_node syncRepLinks; /* 792 16 */ _Bool procArrayGroupMember; /* 808 1 */ /* XXX 3 bytes hole, try to pack */ pg_atomic_uint32 procArrayGroupNext; /* 812 4 */ TransactionId procArrayGroupMemberXid; /* 816 4 */ _Bool clogGroupMember; /* 820 1 */ /* XXX 3 bytes hole, try to pack */ pg_atomic_uint32 clogGroupNext; /* 824 4 */ TransactionId clogGroupMemberXid; /* 828 4 */ /* --- cacheline 13 boundary (832 bytes) --- */ XidStatus clogGroupMemberXidStatus; /* 832 4 */ /* XXX 4 bytes hole, try to pack */ int64 clogGroupMemberPage; /* 840 8 */ XLogRecPtr clogGroupMemberLsn; /* 848 8 */ uint32 wait_event_info; /* 856 4 */ /* size: 864, cachelines: 14, members: 51 */ /* sum members: 828, holes: 10, sum holes: 32 */ /* padding: 4 */ /* forced alignments: 1 */ /* last cacheline: 32 bytes */ } __attribute__((__aligned__(8))); Attachments: [text/plain] pahole-PGPROC.txt (4.5K, 2-pahole-PGPROC.txt) download | inline: struct PGPROC { dlist_head * procgloballist; /* 0 8 */ dlist_node freeProcsLink; /* 8 16 */ int pid; /* 24 4 */ BackendType backendType; /* 28 4 */ Oid databaseId; /* 32 4 */ Oid roleId; /* 36 4 */ Oid tempNamespaceId; /* 40 4 */ int pgxactoff; /* 44 4 */ uint8 statusFlags; /* 48 1 */ /* XXX 3 bytes hole, try to pack */ struct { ProcNumber procNumber; /* 52 4 */ LocalTransactionId lxid; /* 56 4 */ } vxid; /* 52 8 */ TransactionId xid; /* 60 4 */ /* --- cacheline 1 boundary (64 bytes) --- */ TransactionId xmin; /* 64 4 */ XidCacheStatus subxidStatus; /* 68 2 */ /* XXX 2 bytes hole, try to pack */ struct XidCache subxids; /* 72 256 */ /* --- cacheline 5 boundary (320 bytes) was 8 bytes ago --- */ Latch procLatch; /* 328 16 */ PGSemaphore sem; /* 344 8 */ int delayChkptFlags; /* 352 4 */ pg_atomic_uint32 pendingRecoveryConflicts; /* 356 4 */ uint8 lwWaiting; /* 360 1 */ uint8 lwWaitMode; /* 361 1 */ /* XXX 2 bytes hole, try to pack */ proclist_node lwWaitLink; /* 364 8 */ proclist_node cvWaitLink; /* 372 8 */ /* XXX 4 bytes hole, try to pack */ /* --- cacheline 6 boundary (384 bytes) --- */ PGPROC * lockGroupLeader; /* 384 8 */ dlist_head lockGroupMembers; /* 392 16 */ dlist_node lockGroupLink; /* 408 16 */ LOCK * waitLock; /* 424 8 */ dlist_node waitLink; /* 432 16 */ /* --- cacheline 7 boundary (448 bytes) --- */ PROCLOCK * waitProcLock; /* 448 8 */ LOCKMODE waitLockMode; /* 456 4 */ LOCKMASK heldLocks; /* 460 4 */ pg_atomic_uint64 waitStart __attribute__((__aligned__(8))); /* 464 8 */ ProcWaitStatus waitStatus; /* 472 4 */ /* XXX 4 bytes hole, try to pack */ dlist_head myProcLocks[16]; /* 480 256 */ /* --- cacheline 11 boundary (704 bytes) was 32 bytes ago --- */ LWLock fpInfoLock; /* 736 16 */ uint64 * fpLockBits; /* 752 8 */ Oid * fpRelId; /* 760 8 */ /* --- cacheline 12 boundary (768 bytes) --- */ _Bool fpVXIDLock; /* 768 1 */ /* XXX 3 bytes hole, try to pack */ LocalTransactionId fpLocalTransactionId; /* 772 4 */ XLogRecPtr waitLSN; /* 776 8 */ int syncRepState; /* 784 4 */ /* XXX 4 bytes hole, try to pack */ dlist_node syncRepLinks; /* 792 16 */ _Bool procArrayGroupMember; /* 808 1 */ /* XXX 3 bytes hole, try to pack */ pg_atomic_uint32 procArrayGroupNext; /* 812 4 */ TransactionId procArrayGroupMemberXid; /* 816 4 */ _Bool clogGroupMember; /* 820 1 */ /* XXX 3 bytes hole, try to pack */ pg_atomic_uint32 clogGroupNext; /* 824 4 */ TransactionId clogGroupMemberXid; /* 828 4 */ /* --- cacheline 13 boundary (832 bytes) --- */ XidStatus clogGroupMemberXidStatus; /* 832 4 */ /* XXX 4 bytes hole, try to pack */ int64 clogGroupMemberPage; /* 840 8 */ XLogRecPtr clogGroupMemberLsn; /* 848 8 */ uint32 wait_event_info; /* 856 4 */ /* size: 864, cachelines: 14, members: 51 */ /* sum members: 828, holes: 10, sum holes: 32 */ /* padding: 4 */ /* forced alignments: 1 */ /* last cacheline: 32 bytes */ } __attribute__((__aligned__(8))); ^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) @ 2026-02-11 10:58 Bertrand Drouvot <[email protected]> parent: Heikki Linnakangas <[email protected]> 1 sibling, 0 replies; 22+ messages in thread From: Bertrand Drouvot @ 2026-02-11 10:58 UTC (permalink / raw) To: Heikki Linnakangas <[email protected]>; +Cc: Andres Freund <[email protected]>; [email protected] <[email protected]> Hi, On Wed, Feb 11, 2026 at 12:03:51PM +0200, Heikki Linnakangas wrote: > On 11/02/2026 06:40, Bertrand Drouvot wrote: > > > With this, sizeof(PGPROC) == 864 without the explicit alignment to > > > PG_CACHE_LINE_SIZE, and 896 with it. > > > > I can see 876 -> 896 on my side: > > > > /* 872 | 4 */ uint32 wait_event_info; > > /* XXX 20-byte padding */ > > > > /* total size (bytes): 896 */ > > } > > Interesting. I've attached 'pahole bin/postgres' output from my laptop. It's > Linux on arm64. Thanks! Got it: I was using "-DCACHEDEBUG -DWAL_DEBUG -DLOCK_DEBUG -DDEBUG_DEADLOCK" so that with LOCK_DEBUG in place then LWLock size is 32 bytes (vs 16 in your case). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com ^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) @ 2026-02-13 08:03 Bertrand Drouvot <[email protected]> parent: Heikki Linnakangas <[email protected]> 1 sibling, 2 replies; 22+ messages in thread From: Bertrand Drouvot @ 2026-02-13 08:03 UTC (permalink / raw) To: Heikki Linnakangas <[email protected]>; +Cc: Andres Freund <[email protected]>; [email protected] <[email protected]> Hi, On Wed, Feb 11, 2026 at 12:03:51PM +0200, Heikki Linnakangas wrote: > On 11/02/2026 06:40, Bertrand Drouvot wrote: > > A few comments: > > > > 0001: > > > > + * and (b) to make the multiplication / division to convert between PGPROC * > > + * and ProcNumber be a little cheaper > > > > Is that correct if PGPROC size is not a power of 2? > > You're right, it's not. Looking more closely at: " /* GCC supports aligned and packed */ #if defined(__GNUC__) #define pg_attribute_aligned(a) __attribute__((aligned(a))) #define pg_attribute_packed() __attribute__((packed)) #elif defined(_MSC_VER) /* * MSVC supports aligned. * * Packing is also possible but only by wrapping the entire struct definition * which doesn't fit into our current macro declarations. */ #define pg_attribute_aligned(a) __declspec(align(a)) #else /* * NB: aligned and packed are not given default definitions because they * affect code functionality; they *must* be implemented by the compiler * if they are to be used. */ #endif " and what the patch adds: +/* + * If compiler understands aligned pragma, use it to align the struct at cache + * line boundaries. This is just for performance, to (a) avoid false sharing + * and (b) to make the multiplication / division to convert between PGPROC * + * and ProcNumber be a little cheaper. + */ +#if defined(pg_attribute_aligned) + pg_attribute_aligned(PG_CACHE_LINE_SIZE) +#endif +PGPROC; It means that PGPROC is "acceptable" without padding (on compiler that does not understand the aligned attribute). OTOH, looking at: " typedef union WALInsertLockPadded { WALInsertLock l; char pad[PG_CACHE_LINE_SIZE]; } WALInsertLockPadded; " It seems to mean that WALInsertLockPadded is unacceptable without padding (since it's not using pg_attribute_aligned()). That looks ok to see PGPROC as an "acceptable" one, if not, should we use the union trick? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com ^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) @ 2026-02-20 21:01 Heikki Linnakangas <[email protected]> parent: Bertrand Drouvot <[email protected]> 1 sibling, 0 replies; 22+ messages in thread From: Heikki Linnakangas @ 2026-02-20 21:01 UTC (permalink / raw) To: Bertrand Drouvot <[email protected]>; +Cc: Andres Freund <[email protected]>; [email protected] <[email protected]> On 11/02/2026 06:40, Bertrand Drouvot wrote: > On Tue, Feb 10, 2026 at 10:53:58PM +0200, Heikki Linnakangas wrote: > 0003: > > 1/ There is one missing change in PrintLockQueue() ("links" is still used, and > that should be replaced by "waitLink"). > > 2/ change the comment on top of ProcWakeup? > > " > /* > * ProcWakeup -- wake up a process by setting its latch. > * > * Also remove the process from the wait queue and set its links invalid. > " > > s/links/waitLink/? Fixed those and pushed this "Split PGPROC 'links' field into two, for clarity" patch. Thanks for the review! - Heikki ^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) @ 2026-02-20 21:03 Heikki Linnakangas <[email protected]> parent: Bertrand Drouvot <[email protected]> 1 sibling, 1 reply; 22+ messages in thread From: Heikki Linnakangas @ 2026-02-20 21:03 UTC (permalink / raw) To: Bertrand Drouvot <[email protected]>; +Cc: Andres Freund <[email protected]>; [email protected] <[email protected]> On 11/02/2026 06:40, Bertrand Drouvot wrote: > 0004: > > The grouping looks Ok to me. Just one nit for the added comments: > > + /*---- Backend identity ----*/ > + /*---- Transactions and snapshots ----*/ > + /*---- Inter-process signaling ----*/ > + /*---- LWLock waiting ----*/ > + /*---- Lock manager data ----*/ > + /*---- Synchronous replication waiting ----*/ > + /*---- Support for group XID clearing. ----*/ > + /*---- Support for group transaction status update. ----*/ > + /*---- Status reporting ----*/ > > Some have period and some don't. Fixed that, and changed to different style for the delimiter comments. It's a matter of taste, but I think the new style is closer to what we use elsewhere. On 13/02/2026 10:03, Bertrand Drouvot wrote: > and what the patch adds: > > +/* > + * If compiler understands aligned pragma, use it to align the struct at cache > + * line boundaries. This is just for performance, to (a) avoid false sharing > + * and (b) to make the multiplication / division to convert between PGPROC * > + * and ProcNumber be a little cheaper. > + */ > +#if defined(pg_attribute_aligned) > + pg_attribute_aligned(PG_CACHE_LINE_SIZE) > +#endif > +PGPROC; > > It means that PGPROC is "acceptable" without padding (on compiler that does not > understand the aligned attribute). > > OTOH, looking at: > > " > typedef union WALInsertLockPadded > { > WALInsertLock l; > char pad[PG_CACHE_LINE_SIZE]; > } WALInsertLockPadded; > " > > It seems to mean that WALInsertLockPadded is unacceptable without padding (since > it's not using pg_attribute_aligned()). > > That looks ok to see PGPROC as an "acceptable" one, if not, should we use the > union trick? It seems acceptable to just not align it if the compiler doesn't support it. This is just a performance optimization, after all. Attached is new versions the remaining patches. I think these are ready to be committed. I don't have the hardware and test case that would be sensitive enough for these changes in memory layout, so I haven't done any performance testing of this. I'd guess this no worse than the old layout. Grouping fields together which are used together is usually a good strategy. I don't feel obliged to do performance testing of this, given that no one did that for the old layout either, so if it happened to be really good from a performance point of view, it was purely accidental. But if anyone has the means and interest to do that, that'd be much appreciated of course. - Heikki Attachments: [text/x-patch] v2-0001-Rearrange-fields-in-PGPROC-for-clarity.patch (9.7K, 2-v2-0001-Rearrange-fields-in-PGPROC-for-clarity.patch) download | inline diff: From cc75db39184b6560ded5f97b5202c14187e9f801 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas <[email protected]> Date: Fri, 20 Feb 2026 22:30:01 +0200 Subject: [PATCH v2 1/2] Rearrange fields in PGPROC, for clarity The ordering was pretty random, making it hard to get an overview of what's in it. Group related fields together, and add comments to act as separators between the groups. Reviewed-by: Bertrand Drouvot <[email protected]> Discussion: https://www.postgresql.org/message-id/[email protected] --- src/include/storage/proc.h | 147 +++++++++++++++++++++++-------------- 1 file changed, 92 insertions(+), 55 deletions(-) diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h index e165b414241..b2fd4d02959 100644 --- a/src/include/storage/proc.h +++ b/src/include/storage/proc.h @@ -178,33 +178,41 @@ typedef enum * * See PROC_HDR for details. */ -struct PGPROC +typedef struct PGPROC { dlist_head *procgloballist; /* procglobal list that owns this PGPROC */ dlist_node freeProcsLink; /* link in procgloballist, when in recycled * state */ - PGSemaphore sem; /* ONE semaphore to sleep on */ - ProcWaitStatus waitStatus; - - Latch procLatch; /* generic latch for process */ + /************************************************************************ + * Backend identity + ************************************************************************/ + /* + * These fields that don't change after backend startup, or only very + * rarely + */ + int pid; /* Backend's process ID; 0 if prepared xact */ + BackendType backendType; /* what kind of process is this? */ - TransactionId xid; /* id of top-level transaction currently being - * executed by this proc, if running and XID - * is assigned; else InvalidTransactionId. - * mirrored in ProcGlobal->xids[pgxactoff] */ - - TransactionId xmin; /* minimal running XID as it was when we were - * starting our xact, excluding LAZY VACUUM: - * vacuum must not remove tuples deleted by - * xid >= xmin ! */ + /* These fields are zero while a backend is still starting up: */ + Oid databaseId; /* OID of database this backend is using */ + Oid roleId; /* OID of role using this backend */ - int pid; /* Backend's process ID; 0 if prepared xact */ + Oid tempNamespaceId; /* OID of temp schema this backend is + * using */ int pgxactoff; /* offset into various ProcGlobal->arrays with * data mirrored from this PGPROC */ + uint8 statusFlags; /* this backend's status flags, see PROC_* + * above. mirrored in + * ProcGlobal->statusFlags[pgxactoff] */ + + /************************************************************************ + * Transactions and snapshots + ************************************************************************/ + /* * Currently running top-level transaction's virtual xid. Together these * form a VirtualTransactionId, but we don't use that struct because this @@ -224,14 +232,30 @@ struct PGPROC * InvalidLocalTransactionId */ } vxid; - /* These fields are zero while a backend is still starting up: */ - Oid databaseId; /* OID of database this backend is using */ - Oid roleId; /* OID of role using this backend */ + TransactionId xid; /* id of top-level transaction currently being + * executed by this proc, if running and XID + * is assigned; else InvalidTransactionId. + * mirrored in ProcGlobal->xids[pgxactoff] */ - Oid tempNamespaceId; /* OID of temp schema this backend is - * using */ + TransactionId xmin; /* minimal running XID as it was when we were + * starting our xact, excluding LAZY VACUUM: + * vacuum must not remove tuples deleted by + * xid >= xmin ! */ - BackendType backendType; /* what kind of process is this? */ + XidCacheStatus subxidStatus; /* mirrored with + * ProcGlobal->subxidStates[i] */ + struct XidCache subxids; /* cache for subtransaction XIDs */ + + + /************************************************************************ + * Inter-process signaling + ************************************************************************/ + + Latch procLatch; /* generic latch for process */ + + PGSemaphore sem; /* ONE semaphore to sleep on */ + + int delayChkptFlags; /* for DELAY_CHKPT_* flags */ /* * While in hot standby mode, shows that a conflict signal has been sent @@ -243,6 +267,10 @@ struct PGPROC */ pg_atomic_uint32 pendingRecoveryConflicts; + /************************************************************************ + * LWLock waiting + ************************************************************************/ + /* * Info about LWLock the process is currently waiting for, if any. * @@ -257,6 +285,18 @@ struct PGPROC /* Support for condition variables. */ proclist_node cvWaitLink; /* position in CV wait list */ + /************************************************************************ + * Lock manager data + ************************************************************************/ + + /* + * Support for lock groups. Use LockHashPartitionLockByProc on the group + * leader to get the LWLock protecting these fields. + */ + PGPROC *lockGroupLeader; /* lock group leader, if I'm a member */ + dlist_head lockGroupMembers; /* list of members, if I'm a leader */ + dlist_node lockGroupLink; /* my member link, if I'm a member */ + /* Info about lock the process is currently waiting for, if any. */ /* waitLock and waitProcLock are NULL if not currently waiting. */ LOCK *waitLock; /* Lock object we're sleeping on ... */ @@ -265,14 +305,30 @@ struct PGPROC LOCKMODE waitLockMode; /* type of lock we're waiting for */ LOCKMASK heldLocks; /* bitmask for lock types already held on this * lock object by this backend */ + pg_atomic_uint64 waitStart; /* time at which wait for lock acquisition * started */ - int delayChkptFlags; /* for DELAY_CHKPT_* flags */ + ProcWaitStatus waitStatus; - uint8 statusFlags; /* this backend's status flags, see PROC_* - * above. mirrored in - * ProcGlobal->statusFlags[pgxactoff] */ + /* + * All PROCLOCK objects for locks held or awaited by this backend are + * linked into one of these lists, according to the partition number of + * their lock. + */ + dlist_head myProcLocks[NUM_LOCK_PARTITIONS]; + + /*-- recording fast-path locks taken by this backend. --*/ + LWLock fpInfoLock; /* protects per-backend fast-path state */ + uint64 *fpLockBits; /* lock modes held for each fast-path slot */ + Oid *fpRelId; /* slots for rel oids */ + bool fpVXIDLock; /* are we holding a fast-path VXID lock? */ + LocalTransactionId fpLocalTransactionId; /* lxid for fast-path VXID + * lock */ + + /************************************************************************ + * Synchronous replication waiting + ************************************************************************/ /* * Info to allow us to wait for synchronous replication, if needed. @@ -284,18 +340,10 @@ struct PGPROC int syncRepState; /* wait state for sync rep */ dlist_node syncRepLinks; /* list link if process is in syncrep queue */ - /* - * All PROCLOCK objects for locks held or awaited by this backend are - * linked into one of these lists, according to the partition number of - * their lock. - */ - dlist_head myProcLocks[NUM_LOCK_PARTITIONS]; - - XidCacheStatus subxidStatus; /* mirrored with - * ProcGlobal->subxidStates[i] */ - struct XidCache subxids; /* cache for subtransaction XIDs */ + /************************************************************************ + * Support for group XID clearing + ************************************************************************/ - /* Support for group XID clearing. */ /* true, if member of ProcArray group waiting for XID clear */ bool procArrayGroupMember; /* next ProcArray group member waiting for XID clear */ @@ -307,9 +355,10 @@ struct PGPROC */ TransactionId procArrayGroupMemberXid; - uint32 wait_event_info; /* proc's wait information */ + /************************************************************************ + * Support for group transaction status update + ************************************************************************/ - /* Support for group transaction status update. */ bool clogGroupMember; /* true, if member of clog group */ pg_atomic_uint32 clogGroupNext; /* next clog group member */ TransactionId clogGroupMemberXid; /* transaction id of clog group member */ @@ -320,24 +369,12 @@ struct PGPROC XLogRecPtr clogGroupMemberLsn; /* WAL location of commit record for clog * group member */ - /* Lock manager data, recording fast-path locks taken by this backend. */ - LWLock fpInfoLock; /* protects per-backend fast-path state */ - uint64 *fpLockBits; /* lock modes held for each fast-path slot */ - Oid *fpRelId; /* slots for rel oids */ - bool fpVXIDLock; /* are we holding a fast-path VXID lock? */ - LocalTransactionId fpLocalTransactionId; /* lxid for fast-path VXID - * lock */ + /************************************************************************ + * Status reporting + ************************************************************************/ - /* - * Support for lock groups. Use LockHashPartitionLockByProc on the group - * leader to get the LWLock protecting these fields. - */ - PGPROC *lockGroupLeader; /* lock group leader, if I'm a member */ - dlist_head lockGroupMembers; /* list of members, if I'm a leader */ - dlist_node lockGroupLink; /* my member link, if I'm a member */ -}; - -/* NOTE: "typedef struct PGPROC PGPROC" appears in storage/lock.h. */ + uint32 wait_event_info; /* proc's wait information */ +} PGPROC; extern PGDLLIMPORT PGPROC *MyProc; -- 2.47.3 [text/x-patch] v2-0002-Align-PGPROC-to-cache-line-boundary.patch (1.5K, 3-v2-0002-Align-PGPROC-to-cache-line-boundary.patch) download | inline diff: From ce1218929f2e14106c0c1160bfa941ffe0b6b745 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas <[email protected]> Date: Fri, 20 Feb 2026 22:58:40 +0200 Subject: [PATCH v2 2/2] Align PGPROC to cache line boundary On common architectures, the PGPROC happened to be a multiple of 64 bytes on PG 18, but it's changed on 'master' since. There was worry that changing the alignment might hurt performance, due to false cacheline sharing across PGPROC elements. However, there was no explicit alignment, so any alignment to cache lines was accidental. Add explicit alignment to remove worry about false sharing. Reviewed-by: Bertrand Drouvot <[email protected]> Discussion: https://www.postgresql.org/message-id/[email protected] --- src/include/storage/proc.h | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h index b2fd4d02959..a8d2e7db1a1 100644 --- a/src/include/storage/proc.h +++ b/src/include/storage/proc.h @@ -374,8 +374,16 @@ typedef struct PGPROC ************************************************************************/ uint32 wait_event_info; /* proc's wait information */ -} PGPROC; +} +/* + * If compiler understands aligned pragma, use it to align the struct at cache + * line boundaries. This is just for performance, to avoid false sharing. + */ +#if defined(pg_attribute_aligned) + pg_attribute_aligned(PG_CACHE_LINE_SIZE) +#endif +PGPROC; extern PGDLLIMPORT PGPROC *MyProc; -- 2.47.3 ^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) @ 2026-02-21 09:42 Bertrand Drouvot <[email protected]> parent: Heikki Linnakangas <[email protected]> 0 siblings, 1 reply; 22+ messages in thread From: Bertrand Drouvot @ 2026-02-21 09:42 UTC (permalink / raw) To: Heikki Linnakangas <[email protected]>; +Cc: Andres Freund <[email protected]>; [email protected] <[email protected]> Hi, On Fri, Feb 20, 2026 at 11:03:09PM +0200, Heikki Linnakangas wrote: > On 11/02/2026 06:40, Bertrand Drouvot wrote: > > That looks ok to see PGPROC as an "acceptable" one, if not, should we use the > > union trick? > > It seems acceptable to just not align it if the compiler doesn't support it. > This is just a performance optimization, after all. Agreed. > Attached is new versions the remaining patches. I think these are ready to > be committed. Thanks! One nit, 0001 is adding the typedef: " -struct PGPROC +typedef struct PGPROC . . . -}; - -/* NOTE: "typedef struct PGPROC PGPROC" appears in storage/lock.h. */ + uint32 wait_event_info; /* proc's wait information */ +} PGPROC; " Would that make more sense to add the typedef when we introduce the explicit alignment in 0002 (like it was done in your previous v2-0001-Align-PGPROC-to-cache-line-boundary.patch up-thread)? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com ^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) @ 2026-02-22 11:15 Heikki Linnakangas <[email protected]> parent: Bertrand Drouvot <[email protected]> 0 siblings, 0 replies; 22+ messages in thread From: Heikki Linnakangas @ 2026-02-22 11:15 UTC (permalink / raw) To: Bertrand Drouvot <[email protected]>; +Cc: Andres Freund <[email protected]>; [email protected] <[email protected]> On 21/02/2026 11:42, Bertrand Drouvot wrote: > On Fri, Feb 20, 2026 at 11:03:09PM +0200, Heikki Linnakangas wrote: >> On 11/02/2026 06:40, Bertrand Drouvot wrote: >>> That looks ok to see PGPROC as an "acceptable" one, if not, should we use the >>> union trick? >> >> It seems acceptable to just not align it if the compiler doesn't support it. >> This is just a performance optimization, after all. > > Agreed. > >> Attached is new versions the remaining patches. I think these are ready to >> be committed. > > Thanks! > > One nit, 0001 is adding the typedef: > > " > -struct PGPROC > +typedef struct PGPROC > . > . > . > -}; > - > -/* NOTE: "typedef struct PGPROC PGPROC" appears in storage/lock.h. */ > + uint32 wait_event_info; /* proc's wait information */ > +} PGPROC; > " > > Would that make more sense to add the typedef when we introduce the explicit > alignment in 0002 (like it was done in your previous > v2-0001-Align-PGPROC-to-cache-line-boundary.patch up-thread)? Yeah, I had it as part of the other commit in the previous version, but decided to make it part of the other one, so that it's more clear what the alignment. I don't think it matters much either way though. Pushed, thanks for the review! - Heikki ^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) @ 2026-02-23 16:13 Peter Eisentraut <[email protected]> parent: Bertrand Drouvot <[email protected]> 1 sibling, 1 reply; 22+ messages in thread From: Peter Eisentraut @ 2026-02-23 16:13 UTC (permalink / raw) To: Bertrand Drouvot <[email protected]>; Heikki Linnakangas <[email protected]>; +Cc: Andres Freund <[email protected]>; [email protected] <[email protected]> On 13.02.26 09:03, Bertrand Drouvot wrote: > +/* > + * If compiler understands aligned pragma, use it to align the struct at cache > + * line boundaries. This is just for performance, to (a) avoid false sharing > + * and (b) to make the multiplication / division to convert between PGPROC * > + * and ProcNumber be a little cheaper. > + */ > +#if defined(pg_attribute_aligned) > + pg_attribute_aligned(PG_CACHE_LINE_SIZE) > +#endif > +PGPROC; You can/should use C11 standard alignas(), so you don't need to worry about whether it's supported or not. ^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) @ 2026-02-24 11:28 Bertrand Drouvot <[email protected]> parent: Peter Eisentraut <[email protected]> 0 siblings, 1 reply; 22+ messages in thread From: Bertrand Drouvot @ 2026-02-24 11:28 UTC (permalink / raw) To: Peter Eisentraut <[email protected]>; +Cc: Heikki Linnakangas <[email protected]>; Andres Freund <[email protected]>; [email protected] <[email protected]> Hi, On Mon, Feb 23, 2026 at 05:13:29PM +0100, Peter Eisentraut wrote: > On 13.02.26 09:03, Bertrand Drouvot wrote: > > +/* > > + * If compiler understands aligned pragma, use it to align the struct at cache > > + * line boundaries. This is just for performance, to (a) avoid false sharing > > + * and (b) to make the multiplication / division to convert between PGPROC * > > + * and ProcNumber be a little cheaper. > > + */ > > +#if defined(pg_attribute_aligned) > > + pg_attribute_aligned(PG_CACHE_LINE_SIZE) > > +#endif > > +PGPROC; > > You can/should use C11 standard alignas(), so you don't need to worry about > whether it's supported or not. Oh right, I did not notice 300c8f53247 and following like e7075a3405c, d4c0f91f7d5 and 97e04c74bed. PFA, 0001 doing so for PGPROC and PgAioUringContext. As those are typedef, the patch puts alignas within the struct. For PGPROC at the start of the struct, I think that placing it on the first member is the right location because it ensures the whole struct is aligned to PG_CACHE_LINE_SIZE without adding padding before this member. For example if I set it on backendType, then it adds 100 bytes of padding and the struct is obviously still a multiple of PG_CACHE_LINE_SIZE but is now 1024 bytes (instead of 896). For PgAioUringContext at completion_lock (like suggested by Andres in [1]), which is also the start of the struct. I checked and the padding for those are exactly the same after the changes. 0002, is also making use of alignas in ItemPointerData, but this one is more tricky so I'm not sure that's worth it (given the fact that we still need to keep pg_attribute_aligned() as explained by Peter in [2]). [1]: https://postgr.es/m/lsgnps74ictxtm5karcobenxtt5rvoylvbvx7ja6r5rjcund7v%40owxqgv7gisns [2]: https://postgr.es/m/46f05236-d4d4-4b4e-84d4-faa500f14691%40eisentraut.org Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com ^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) @ 2026-03-16 10:42 Peter Eisentraut <[email protected]> parent: Bertrand Drouvot <[email protected]> 0 siblings, 1 reply; 22+ messages in thread From: Peter Eisentraut @ 2026-03-16 10:42 UTC (permalink / raw) To: Bertrand Drouvot <[email protected]>; +Cc: Heikki Linnakangas <[email protected]>; Andres Freund <[email protected]>; [email protected] <[email protected]> On 24.02.26 12:28, Bertrand Drouvot wrote: >> You can/should use C11 standard alignas(), so you don't need to worry about >> whether it's supported or not. > > Oh right, I did not notice 300c8f53247 and following like e7075a3405c, d4c0f91f7d5 > and 97e04c74bed. > > PFA, 0001 doing so for PGPROC and PgAioUringContext. As those are typedef, > the patch puts alignas within the struct. > > For PGPROC at the start of the struct, I think that placing it on the first member > is the right location because it ensures the whole struct is aligned to PG_CACHE_LINE_SIZE > without adding padding before this member. For example if I set it on backendType, > then it adds 100 bytes of padding and the struct is obviously still a multiple of > PG_CACHE_LINE_SIZE but is now 1024 bytes (instead of 896). > > For PgAioUringContext at completion_lock (like suggested by Andres in [1]), which > is also the start of the struct. > > I checked and the padding for those are exactly the same after the changes. I have committed the 0001 patch. > 0002, is also making use of alignas in ItemPointerData, but this one is more > tricky so I'm not sure that's worth it (given the fact that we still need to > keep pg_attribute_aligned() as explained by Peter in [2]). This doesn't seem worth it to me. Let's skip it. ^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) @ 2026-03-16 11:16 Bertrand Drouvot <[email protected]> parent: Peter Eisentraut <[email protected]> 0 siblings, 1 reply; 22+ messages in thread From: Bertrand Drouvot @ 2026-03-16 11:16 UTC (permalink / raw) To: Peter Eisentraut <[email protected]>; +Cc: Heikki Linnakangas <[email protected]>; Andres Freund <[email protected]>; [email protected] <[email protected]> Hi, On Mon, Mar 16, 2026 at 11:42:35AM +0100, Peter Eisentraut wrote: > On 24.02.26 12:28, Bertrand Drouvot wrote: > > > You can/should use C11 standard alignas(), so you don't need to worry about > > > whether it's supported or not. > > > > Oh right, I did not notice 300c8f53247 and following like e7075a3405c, d4c0f91f7d5 > > and 97e04c74bed. > > > > PFA, 0001 doing so for PGPROC and PgAioUringContext. As those are typedef, > > the patch puts alignas within the struct. > > > > For PGPROC at the start of the struct, I think that placing it on the first member > > is the right location because it ensures the whole struct is aligned to PG_CACHE_LINE_SIZE > > without adding padding before this member. For example if I set it on backendType, > > then it adds 100 bytes of padding and the struct is obviously still a multiple of > > PG_CACHE_LINE_SIZE but is now 1024 bytes (instead of 896). > > > > For PgAioUringContext at completion_lock (like suggested by Andres in [1]), which > > is also the start of the struct. > > > > I checked and the padding for those are exactly the same after the changes. > > I have committed the 0001 patch. Thanks! I don't know why but I don't see it in https://git.postgresql.org/gitweb/?p=postgresql.git;a=summary, or in the github repo though. > > 0002, is also making use of alignas in ItemPointerData, but this one is more > > tricky so I'm not sure that's worth it (given the fact that we still need to > > keep pg_attribute_aligned() as explained by Peter in [2]). > > This doesn't seem worth it to me. Let's skip it. Yeah, that makes sense. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com ^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) @ 2026-03-16 15:52 Bertrand Drouvot <[email protected]> parent: Bertrand Drouvot <[email protected]> 0 siblings, 0 replies; 22+ messages in thread From: Bertrand Drouvot @ 2026-03-16 15:52 UTC (permalink / raw) To: Peter Eisentraut <[email protected]>; +Cc: Heikki Linnakangas <[email protected]>; Andres Freund <[email protected]>; [email protected] <[email protected]> Hi, On Mon, Mar 16, 2026 at 11:16:51AM +0000, Bertrand Drouvot wrote: > Hi, > > On Mon, Mar 16, 2026 at 11:42:35AM +0100, Peter Eisentraut wrote: > > On 24.02.26 12:28, Bertrand Drouvot wrote: > > > > You can/should use C11 standard alignas(), so you don't need to worry about > > > > whether it's supported or not. > > > > > > Oh right, I did not notice 300c8f53247 and following like e7075a3405c, d4c0f91f7d5 > > > and 97e04c74bed. > > > > > > PFA, 0001 doing so for PGPROC and PgAioUringContext. As those are typedef, > > > the patch puts alignas within the struct. > > > > > > For PGPROC at the start of the struct, I think that placing it on the first member > > > is the right location because it ensures the whole struct is aligned to PG_CACHE_LINE_SIZE > > > without adding padding before this member. For example if I set it on backendType, > > > then it adds 100 bytes of padding and the struct is obviously still a multiple of > > > PG_CACHE_LINE_SIZE but is now 1024 bytes (instead of 896). > > > > > > For PgAioUringContext at completion_lock (like suggested by Andres in [1]), which > > > is also the start of the struct. > > > > > > I checked and the padding for those are exactly the same after the changes. > > > > I have committed the 0001 patch. > > Thanks! I don't know why but I don't see it in https://git.postgresql.org/gitweb/?p=postgresql.git;a=summary, > or in the github repo though. FWIW, I reached out to sysadmins@ and it has now been fixed. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com ^ permalink raw reply [nested|flat] 22+ messages in thread
end of thread, other threads:[~2026-03-16 15:52 UTC | newest] Thread overview: 22+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2026-02-10 14:32 pgsql: Separate RecoveryConflictReasons from procsignals Heikki Linnakangas <[email protected]> 2026-02-10 15:19 ` Bertrand Drouvot <[email protected]> 2026-02-10 15:52 ` Heikki Linnakangas <[email protected]> 2026-02-10 16:41 ` Andres Freund <[email protected]> 2026-02-10 17:14 ` PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) Heikki Linnakangas <[email protected]> 2026-02-10 18:15 ` Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) Andres Freund <[email protected]> 2026-02-10 19:15 ` Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) Bertrand Drouvot <[email protected]> 2026-02-10 19:46 ` Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) Andres Freund <[email protected]> 2026-02-10 20:53 ` Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) Heikki Linnakangas <[email protected]> 2026-02-11 04:40 ` Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) Bertrand Drouvot <[email protected]> 2026-02-11 10:03 ` Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) Heikki Linnakangas <[email protected]> 2026-02-11 10:58 ` Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) Bertrand Drouvot <[email protected]> 2026-02-13 08:03 ` Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) Bertrand Drouvot <[email protected]> 2026-02-20 21:03 ` Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) Heikki Linnakangas <[email protected]> 2026-02-21 09:42 ` Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) Bertrand Drouvot <[email protected]> 2026-02-22 11:15 ` Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) Heikki Linnakangas <[email protected]> 2026-02-23 16:13 ` Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) Peter Eisentraut <[email protected]> 2026-02-24 11:28 ` Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) Bertrand Drouvot <[email protected]> 2026-03-16 10:42 ` Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) Peter Eisentraut <[email protected]> 2026-03-16 11:16 ` Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) Bertrand Drouvot <[email protected]> 2026-03-16 15:52 ` Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) Bertrand Drouvot <[email protected]> 2026-02-20 21:01 ` Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) Heikki Linnakangas <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox