public inbox for [email protected]help / color / mirror / Atom feed
Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) 14+ messages / 4 participants [nested] [flat]
* Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) @ 2026-02-10 19:46 Andres Freund <[email protected]> 0 siblings, 1 reply; 14+ messages in thread From: Andres Freund @ 2026-02-10 19:46 UTC (permalink / raw) To: Bertrand Drouvot <[email protected]>; +Cc: Heikki Linnakangas <[email protected]>; pgsql-hackers Hi, On 2026-02-10 19:15:27 +0000, Bertrand Drouvot wrote: > On Tue, Feb 10, 2026 at 01:15:01PM -0500, Andres Freund wrote: > > On 2026-02-10 19:14:44 +0200, Heikki Linnakangas wrote: > > Yea, I don't think we need to be perfect here. Just a bit less bad. And, as > > you say, the current order doesn't make a lot of sense. > > Just grouping things like > > - pid, pgxactoff, backendType (i.e. barely if ever changing) > > - wait_event_info, waitStart (i.e. very frequently changing, but typically > > accessed within one proc) > > - sem, lwWaiting, waitLockMode (i.e. stuff that is updated frequently and > > accessed across processes) > > With an ordering like in the attached (to apply on top of Heikki's patch), we're > back to 832 bytes. You'd really need to insert padding between the sections to make it work... > But, then the pg_attribute_aligned() added in Heikki's patch makes it 896 bytes... > > " > /* 816 | 16 */ dlist_node lockGroupLink; > /* XXX 64-byte padding */ > > /* total size (bytes): 896 */ > } > " > > What about applying this new ordering and remove the pg_attribute_aligned()? > (I thought the aligned attribute would be smarter than that and not add this > 64 padding bytes). That's just because we have /* * Assumed cache line size. This doesn't affect correctness, but can be used * for low-level optimizations. This is mostly used to pad various data * structures, to ensure that highly-contended fields are on different cache * lines. Too small a value can hurt performance due to false sharing, while * the only downside of too large a value is a few bytes of wasted memory. * The default is 128, which should be large enough for all supported * platforms. */ #define PG_CACHE_LINE_SIZE 128 I don't think we need to worry about the number of bytes here very much. This isn't much compared to all the other overheads a connection slot has (like the memory for locks). Greetings, Andres Freund ^ permalink raw reply [nested|flat] 14+ messages in thread
* Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) @ 2026-02-10 20:53 Heikki Linnakangas <[email protected]> parent: Andres Freund <[email protected]> 0 siblings, 1 reply; 14+ messages in thread From: Heikki Linnakangas @ 2026-02-10 20:53 UTC (permalink / raw) To: Andres Freund <[email protected]>; Bertrand Drouvot <[email protected]>; +Cc: pgsql-hackers On 10/02/2026 21:46, Andres Freund wrote: > On 2026-02-10 19:15:27 +0000, Bertrand Drouvot wrote: >> On Tue, Feb 10, 2026 at 01:15:01PM -0500, Andres Freund wrote: >>> On 2026-02-10 19:14:44 +0200, Heikki Linnakangas wrote: >>> Yea, I don't think we need to be perfect here. Just a bit less bad. And, as >>> you say, the current order doesn't make a lot of sense. >>> Just grouping things like >>> - pid, pgxactoff, backendType (i.e. barely if ever changing) >>> - wait_event_info, waitStart (i.e. very frequently changing, but typically >>> accessed within one proc) >>> - sem, lwWaiting, waitLockMode (i.e. stuff that is updated frequently and >>> accessed across processes) >> >> With an ordering like in the attached (to apply on top of Heikki's patch), we're >> back to 832 bytes. > > You'd really need to insert padding between the sections to make it work... Here's my attempt at grouping things more logically. I didn't insert padding and also didn't try to avoid alignment padding. I tried to optimize for readability rather than size or performance. That said, I would assume that grouping things logically like this would also help to avoid false sharing. If not, inserting explicit padding seems like a a good fix. I also think we should split 'links' into two fields. For clarity. With this, sizeof(PGPROC) == 864 without the explicit alignment to PG_CACHE_LINE_SIZE, and 896 with it. - Heikki Attachments: [text/x-patch] v2-0001-Align-PGPROC-to-cache-line-boundary.patch (1.5K, 2-v2-0001-Align-PGPROC-to-cache-line-boundary.patch) download | inline diff: From 83481187f6e2f8c177bc85c522917d64e1b78b4b Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas <[email protected]> Date: Tue, 10 Feb 2026 18:53:31 +0200 Subject: [PATCH v2 1/4] Align PGPROC to cache line boundary --- src/include/storage/proc.h | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h index ac0df4aeaaa..53acce8a5a1 100644 --- a/src/include/storage/proc.h +++ b/src/include/storage/proc.h @@ -182,7 +182,7 @@ typedef enum * * See PROC_HDR for details. */ -struct PGPROC +typedef struct PGPROC { dlist_node links; /* list link if process is in a list */ dlist_head *procgloballist; /* procglobal list that owns this PGPROC */ @@ -337,10 +337,18 @@ struct PGPROC PGPROC *lockGroupLeader; /* lock group leader, if I'm a member */ dlist_head lockGroupMembers; /* list of members, if I'm a leader */ dlist_node lockGroupLink; /* my member link, if I'm a member */ -}; - -/* NOTE: "typedef struct PGPROC PGPROC" appears in storage/lock.h. */ +} +/* + * If compiler understands aligned pragma, use it to align the struct at cache + * line boundaries. This is just for performance, to (a) avoid false sharing + * and (b) to make the multiplication / division to convert between PGPROC * + * and ProcNumber be a little cheaper. + */ +#if defined(pg_attribute_aligned) + pg_attribute_aligned(PG_CACHE_LINE_SIZE) +#endif +PGPROC; extern PGDLLIMPORT PGPROC *MyProc; -- 2.47.3 [text/x-patch] v2-0002-Remove-useless-store.patch (900B, 3-v2-0002-Remove-useless-store.patch) download | inline diff: From f5485ae9eb4b12eb7e57b8b7e7fbdaa0df7c5575 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas <[email protected]> Date: Tue, 10 Feb 2026 21:54:56 +0200 Subject: [PATCH v2 2/4] Remove useless store It was a mishap introduced in commit 5764f611e1, which converted the loop to use dclist_foreach --- src/backend/storage/lmgr/lock.c | 1 - 1 file changed, 1 deletion(-) diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c index 7f0cd784f79..e1168ad3837 100644 --- a/src/backend/storage/lmgr/lock.c +++ b/src/backend/storage/lmgr/lock.c @@ -4148,7 +4148,6 @@ GetSingleProcBlockerStatusData(PGPROC *blocked_proc, BlockedProcsData *data) if (queued_proc == blocked_proc) break; data->waiter_pids[data->npids++] = queued_proc->pid; - queued_proc = (PGPROC *) queued_proc->links.next; } bproc->num_locks = data->nlocks - bproc->first_lock; -- 2.47.3 [text/x-patch] v2-0003-Split-PGPROC-links-field-into-two-for-clarity.patch (13.0K, 4-v2-0003-Split-PGPROC-links-field-into-two-for-clarity.patch) download | inline diff: From b4cc9e67a3fb8344a7c4934723fde3d012f2105e Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas <[email protected]> Date: Tue, 10 Feb 2026 22:45:08 +0200 Subject: [PATCH v2 3/4] Split PGPROC 'links' field into two, for clarity The same field was mainly used for the position in a LOCK's wait queue, but also as the position in a the freelist when the PGPROC entry was not in use. The reuse saves a some memory, at the expense of readability, which seems like a bad tradeoff. If we wanted to make the struct smaller there's other things we could do, but we're actually just discussing adding padding to the struct for performance reasons. --- src/backend/access/transam/twophase.c | 2 +- src/backend/storage/lmgr/deadlock.c | 12 ++++----- src/backend/storage/lmgr/lock.c | 6 ++--- src/backend/storage/lmgr/proc.c | 37 ++++++++++++++------------- src/include/storage/proc.h | 8 +++--- 5 files changed, 32 insertions(+), 33 deletions(-) diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c index eabc4d48208..e4340b59640 100644 --- a/src/backend/access/transam/twophase.c +++ b/src/backend/access/transam/twophase.c @@ -447,7 +447,6 @@ MarkAsPreparingGuts(GlobalTransaction gxact, FullTransactionId fxid, /* Initialize the PGPROC entry */ MemSet(proc, 0, sizeof(PGPROC)); - dlist_node_init(&proc->links); proc->waitStatus = PROC_WAIT_STATUS_OK; if (LocalTransactionIdIsValid(MyProc->vxid.lxid)) { @@ -474,6 +473,7 @@ MarkAsPreparingGuts(GlobalTransaction gxact, FullTransactionId fxid, proc->lwWaiting = LW_WS_NOT_WAITING; proc->lwWaitMode = 0; proc->waitLock = NULL; + dlist_node_init(&proc->waitLink); proc->waitProcLock = NULL; pg_atomic_init_u64(&proc->waitStart, 0); for (i = 0; i < NUM_LOCK_PARTITIONS; i++) diff --git a/src/backend/storage/lmgr/deadlock.c b/src/backend/storage/lmgr/deadlock.c index 0a8dd5eb7c2..d642500d4c9 100644 --- a/src/backend/storage/lmgr/deadlock.c +++ b/src/backend/storage/lmgr/deadlock.c @@ -260,7 +260,7 @@ DeadLockCheck(PGPROC *proc) /* Reset the queue and re-add procs in the desired order */ dclist_init(waitQueue); for (int j = 0; j < nProcs; j++) - dclist_push_tail(waitQueue, &procs[j]->links); + dclist_push_tail(waitQueue, &procs[j]->waitLink); #ifdef DEBUG_DEADLOCK PrintLockQueue(lock, "rearranged to:"); @@ -502,7 +502,7 @@ FindLockCycleRecurse(PGPROC *checkProc, * If the process is waiting, there is an outgoing waits-for edge to each * process that blocks it. */ - if (checkProc->links.next != NULL && checkProc->waitLock != NULL && + if (!dlist_node_is_detached(&checkProc->waitLink) && FindLockCycleRecurseMember(checkProc, checkProc, depth, softEdges, nSoftEdges)) return true; @@ -520,7 +520,7 @@ FindLockCycleRecurse(PGPROC *checkProc, memberProc = dlist_container(PGPROC, lockGroupLink, iter.cur); - if (memberProc->links.next != NULL && memberProc->waitLock != NULL && + if (!dlist_node_is_detached(&memberProc->waitLink) && memberProc->waitLock != NULL && memberProc != checkProc && FindLockCycleRecurseMember(memberProc, checkProc, depth, softEdges, nSoftEdges)) @@ -713,7 +713,7 @@ FindLockCycleRecurseMember(PGPROC *checkProc, { dclist_foreach(proc_iter, waitQueue) { - proc = dlist_container(PGPROC, links, proc_iter.cur); + proc = dlist_container(PGPROC, waitLink, proc_iter.cur); if (proc->lockGroupLeader == checkProcLeader) lastGroupMember = proc; @@ -728,7 +728,7 @@ FindLockCycleRecurseMember(PGPROC *checkProc, { PGPROC *leader; - proc = dlist_container(PGPROC, links, proc_iter.cur); + proc = dlist_container(PGPROC, waitLink, proc_iter.cur); leader = proc->lockGroupLeader == NULL ? proc : proc->lockGroupLeader; @@ -877,7 +877,7 @@ TopoSort(LOCK *lock, i = 0; dclist_foreach(proc_iter, waitQueue) { - proc = dlist_container(PGPROC, links, proc_iter.cur); + proc = dlist_container(PGPROC, waitLink, proc_iter.cur); topoProcs[i++] = proc; } Assert(i == queue_size); diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c index e1168ad3837..d930c66cdbd 100644 --- a/src/backend/storage/lmgr/lock.c +++ b/src/backend/storage/lmgr/lock.c @@ -2052,13 +2052,13 @@ RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode) /* Make sure proc is waiting */ Assert(proc->waitStatus == PROC_WAIT_STATUS_WAITING); - Assert(proc->links.next != NULL); + Assert(!dlist_node_is_detached(&proc->waitLink)); Assert(waitLock); Assert(!dclist_is_empty(&waitLock->waitProcs)); Assert(0 < lockmethodid && lockmethodid < lengthof(LockMethods)); /* Remove proc from lock's wait queue */ - dclist_delete_from_thoroughly(&waitLock->waitProcs, &proc->links); + dclist_delete_from_thoroughly(&waitLock->waitProcs, &proc->waitLink); /* Undo increments of request counts by waiting process */ Assert(waitLock->nRequested > 0); @@ -4143,7 +4143,7 @@ GetSingleProcBlockerStatusData(PGPROC *blocked_proc, BlockedProcsData *data) /* Collect PIDs from the lock's wait queue, stopping at blocked_proc */ dclist_foreach(proc_iter, waitQueue) { - PGPROC *queued_proc = dlist_container(PGPROC, links, proc_iter.cur); + PGPROC *queued_proc = dlist_container(PGPROC, waitLink, proc_iter.cur); if (queued_proc == blocked_proc) break; diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c index 31ccdb1ef89..dbc2ad931b8 100644 --- a/src/backend/storage/lmgr/proc.c +++ b/src/backend/storage/lmgr/proc.c @@ -331,25 +331,25 @@ InitProcGlobal(void) if (i < MaxConnections) { /* PGPROC for normal backend, add to freeProcs list */ - dlist_push_tail(&ProcGlobal->freeProcs, &proc->links); + dlist_push_tail(&ProcGlobal->freeProcs, &proc->freeProcsLink); proc->procgloballist = &ProcGlobal->freeProcs; } else if (i < MaxConnections + autovacuum_worker_slots + NUM_SPECIAL_WORKER_PROCS) { /* PGPROC for AV or special worker, add to autovacFreeProcs list */ - dlist_push_tail(&ProcGlobal->autovacFreeProcs, &proc->links); + dlist_push_tail(&ProcGlobal->autovacFreeProcs, &proc->freeProcsLink); proc->procgloballist = &ProcGlobal->autovacFreeProcs; } else if (i < MaxConnections + autovacuum_worker_slots + NUM_SPECIAL_WORKER_PROCS + max_worker_processes) { /* PGPROC for bgworker, add to bgworkerFreeProcs list */ - dlist_push_tail(&ProcGlobal->bgworkerFreeProcs, &proc->links); + dlist_push_tail(&ProcGlobal->bgworkerFreeProcs, &proc->freeProcsLink); proc->procgloballist = &ProcGlobal->bgworkerFreeProcs; } else if (i < MaxBackends) { /* PGPROC for walsender, add to walsenderFreeProcs list */ - dlist_push_tail(&ProcGlobal->walsenderFreeProcs, &proc->links); + dlist_push_tail(&ProcGlobal->walsenderFreeProcs, &proc->freeProcsLink); proc->procgloballist = &ProcGlobal->walsenderFreeProcs; } @@ -438,7 +438,7 @@ InitProcess(void) if (!dlist_is_empty(procgloballist)) { - MyProc = dlist_container(PGPROC, links, dlist_pop_head_node(procgloballist)); + MyProc = dlist_container(PGPROC, freeProcsLink, dlist_pop_head_node(procgloballist)); SpinLockRelease(ProcStructLock); } else @@ -471,7 +471,7 @@ InitProcess(void) * Initialize all fields of MyProc, except for those previously * initialized by InitProcGlobal. */ - dlist_node_init(&MyProc->links); + dlist_node_init(&MyProc->freeProcsLink); MyProc->waitStatus = PROC_WAIT_STATUS_OK; MyProc->fpVXIDLock = false; MyProc->fpLocalTransactionId = InvalidLocalTransactionId; @@ -493,6 +493,7 @@ InitProcess(void) MyProc->lwWaiting = LW_WS_NOT_WAITING; MyProc->lwWaitMode = 0; MyProc->waitLock = NULL; + dlist_node_init(&MyProc->waitLink); MyProc->waitProcLock = NULL; pg_atomic_write_u64(&MyProc->waitStart, 0); #ifdef USE_ASSERT_CHECKING @@ -672,7 +673,7 @@ InitAuxiliaryProcess(void) * Initialize all fields of MyProc, except for those previously * initialized by InitProcGlobal. */ - dlist_node_init(&MyProc->links); + dlist_node_init(&MyProc->freeProcsLink); MyProc->waitStatus = PROC_WAIT_STATUS_OK; MyProc->fpVXIDLock = false; MyProc->fpLocalTransactionId = InvalidLocalTransactionId; @@ -689,6 +690,7 @@ InitAuxiliaryProcess(void) MyProc->lwWaiting = LW_WS_NOT_WAITING; MyProc->lwWaitMode = 0; MyProc->waitLock = NULL; + dlist_node_init(&MyProc->waitLink); MyProc->waitProcLock = NULL; pg_atomic_write_u64(&MyProc->waitStart, 0); #ifdef USE_ASSERT_CHECKING @@ -849,7 +851,7 @@ LockErrorCleanup(void) partitionLock = LockHashPartitionLock(lockAwaited->hashcode); LWLockAcquire(partitionLock, LW_EXCLUSIVE); - if (!dlist_node_is_detached(&MyProc->links)) + if (!dlist_node_is_detached(&MyProc->waitLink)) { /* We could not have been granted the lock yet */ RemoveFromWaitQueue(MyProc, lockAwaited->hashcode); @@ -981,7 +983,7 @@ ProcKill(int code, Datum arg) /* Leader exited first; return its PGPROC. */ SpinLockAcquire(ProcStructLock); - dlist_push_head(procgloballist, &leader->links); + dlist_push_head(procgloballist, &leader->freeProcsLink); SpinLockRelease(ProcStructLock); } } @@ -1026,7 +1028,7 @@ ProcKill(int code, Datum arg) Assert(dlist_is_empty(&proc->lockGroupMembers)); /* Return PGPROC structure (and semaphore) to appropriate freelist */ - dlist_push_tail(procgloballist, &proc->links); + dlist_push_tail(procgloballist, &proc->freeProcsLink); } /* Update shared estimate of spins_per_delay */ @@ -1215,7 +1217,7 @@ JoinWaitQueue(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait) dclist_foreach(iter, waitQueue) { - PGPROC *proc = dlist_container(PGPROC, links, iter.cur); + PGPROC *proc = dlist_container(PGPROC, waitLink, iter.cur); /* * If we're part of the same locking group as this waiter, its @@ -1279,9 +1281,9 @@ JoinWaitQueue(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait) * Insert self into queue, at the position determined above. */ if (insert_before) - dclist_insert_before(waitQueue, &insert_before->links, &MyProc->links); + dclist_insert_before(waitQueue, &insert_before->waitLink, &MyProc->waitLink); else - dclist_push_tail(waitQueue, &MyProc->links); + dclist_push_tail(waitQueue, &MyProc->waitLink); lock->waitMask |= LOCKBIT_ON(lockmode); @@ -1715,13 +1717,13 @@ ProcSleep(LOCALLOCK *locallock) void ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus) { - if (dlist_node_is_detached(&proc->links)) + if (dlist_node_is_detached(&proc->waitLink)) return; Assert(proc->waitStatus == PROC_WAIT_STATUS_WAITING); /* Remove process from wait queue */ - dclist_delete_from_thoroughly(&proc->waitLock->waitProcs, &proc->links); + dclist_delete_from_thoroughly(&proc->waitLock->waitProcs, &proc->waitLink); /* Clean up process' state and pass it the ok/fail signal */ proc->waitLock = NULL; @@ -1752,7 +1754,7 @@ ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock) dclist_foreach_modify(miter, waitQueue) { - PGPROC *proc = dlist_container(PGPROC, links, miter.cur); + PGPROC *proc = dlist_container(PGPROC, waitLink, miter.cur); LOCKMODE lockmode = proc->waitLockMode; /* @@ -1816,8 +1818,7 @@ CheckDeadLock(void) * We check by looking to see if we've been unlinked from the wait queue. * This is safe because we hold the lock partition lock. */ - if (MyProc->links.prev == NULL || - MyProc->links.next == NULL) + if (dlist_node_is_detached(&MyProc->waitLink)) { result = DS_NO_DEADLOCK; goto check_done; diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h index 53acce8a5a1..f642830fcf8 100644 --- a/src/include/storage/proc.h +++ b/src/include/storage/proc.h @@ -154,10 +154,6 @@ typedef enum * Each backend has a PGPROC struct in shared memory. There is also a list of * currently-unused PGPROC structs that will be reallocated to new backends. * - * links: list link for any list the PGPROC is in. When waiting for a lock, - * the PGPROC is linked into that lock's waitProcs queue. A recycled PGPROC - * is linked into ProcGlobal's freeProcs list. - * * Note: twophase.c also sets up a dummy PGPROC struct for each currently * prepared transaction. These PGPROCs appear in the ProcArray data structure * so that the prepared transactions appear to be still running and are @@ -184,8 +180,9 @@ typedef enum */ typedef struct PGPROC { - dlist_node links; /* list link if process is in a list */ dlist_head *procgloballist; /* procglobal list that owns this PGPROC */ + dlist_node freeProcsLink; /* link in procgloballist, when in recycled + * state */ PGSemaphore sem; /* ONE semaphore to sleep on */ ProcWaitStatus waitStatus; @@ -263,6 +260,7 @@ typedef struct PGPROC /* Info about lock the process is currently waiting for, if any. */ /* waitLock and waitProcLock are NULL if not currently waiting. */ LOCK *waitLock; /* Lock object we're sleeping on ... */ + dlist_node waitLink; /* position in waitLock->waitProcs queue */ PROCLOCK *waitProcLock; /* Per-holder info for awaited lock */ LOCKMODE waitLockMode; /* type of lock we're waiting for */ LOCKMASK heldLocks; /* bitmask for lock types already held on this -- 2.47.3 [text/x-patch] v2-0004-Rearrange-fields-in-PGPROC-for-clarity.patch (8.0K, 5-v2-0004-Rearrange-fields-in-PGPROC-for-clarity.patch) download | inline diff: From 455db18cb525d151f0dee40738023efbcdecdae4 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas <[email protected]> Date: Tue, 10 Feb 2026 22:47:22 +0200 Subject: [PATCH v2 4/4] Rearrange fields in PGPROC, for clarity --- src/include/storage/proc.h | 124 +++++++++++++++++++++---------------- 1 file changed, 71 insertions(+), 53 deletions(-) diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h index f642830fcf8..b1077562cb5 100644 --- a/src/include/storage/proc.h +++ b/src/include/storage/proc.h @@ -184,27 +184,31 @@ typedef struct PGPROC dlist_node freeProcsLink; /* link in procgloballist, when in recycled * state */ - PGSemaphore sem; /* ONE semaphore to sleep on */ - ProcWaitStatus waitStatus; - - Latch procLatch; /* generic latch for process */ + /*---- Backend identity ----*/ + /* + * These fields that don't change after backend startup, or only very + * rarely + */ + int pid; /* Backend's process ID; 0 if prepared xact */ + BackendType backendType; /* what kind of process is this? */ - TransactionId xid; /* id of top-level transaction currently being - * executed by this proc, if running and XID - * is assigned; else InvalidTransactionId. - * mirrored in ProcGlobal->xids[pgxactoff] */ - - TransactionId xmin; /* minimal running XID as it was when we were - * starting our xact, excluding LAZY VACUUM: - * vacuum must not remove tuples deleted by - * xid >= xmin ! */ + /* These fields are zero while a backend is still starting up: */ + Oid databaseId; /* OID of database this backend is using */ + Oid roleId; /* OID of role using this backend */ - int pid; /* Backend's process ID; 0 if prepared xact */ + Oid tempNamespaceId; /* OID of temp schema this backend is + * using */ int pgxactoff; /* offset into various ProcGlobal->arrays with * data mirrored from this PGPROC */ + uint8 statusFlags; /* this backend's status flags, see PROC_* + * above. mirrored in + * ProcGlobal->statusFlags[pgxactoff] */ + + /*---- Transactions and snapshots ----*/ + /* * Currently running top-level transaction's virtual xid. Together these * form a VirtualTransactionId, but we don't use that struct because this @@ -224,14 +228,27 @@ typedef struct PGPROC * InvalidLocalTransactionId */ } vxid; - /* These fields are zero while a backend is still starting up: */ - Oid databaseId; /* OID of database this backend is using */ - Oid roleId; /* OID of role using this backend */ + TransactionId xid; /* id of top-level transaction currently being + * executed by this proc, if running and XID + * is assigned; else InvalidTransactionId. + * mirrored in ProcGlobal->xids[pgxactoff] */ - Oid tempNamespaceId; /* OID of temp schema this backend is - * using */ + TransactionId xmin; /* minimal running XID as it was when we were + * starting our xact, excluding LAZY VACUUM: + * vacuum must not remove tuples deleted by + * xid >= xmin ! */ - BackendType backendType; /* what kind of process is this? */ + XidCacheStatus subxidStatus; /* mirrored with + * ProcGlobal->subxidStates[i] */ + struct XidCache subxids; /* cache for subtransaction XIDs */ + + /*---- Inter-process signaling ----*/ + + Latch procLatch; /* generic latch for process */ + + PGSemaphore sem; /* ONE semaphore to sleep on */ + + int delayChkptFlags; /* for DELAY_CHKPT_* flags */ /* * While in hot standby mode, shows that a conflict signal has been sent @@ -243,6 +260,8 @@ typedef struct PGPROC */ pg_atomic_uint32 pendingRecoveryConflicts; + /*---- LWLock waiting ----*/ + /* * Info about LWLock the process is currently waiting for, if any. * @@ -257,6 +276,16 @@ typedef struct PGPROC /* Support for condition variables. */ proclist_node cvWaitLink; /* position in CV wait list */ + /*---- Lock manager data ----*/ + + /* + * Support for lock groups. Use LockHashPartitionLockByProc on the group + * leader to get the LWLock protecting these fields. + */ + PGPROC *lockGroupLeader; /* lock group leader, if I'm a member */ + dlist_head lockGroupMembers; /* list of members, if I'm a leader */ + dlist_node lockGroupLink; /* my member link, if I'm a member */ + /* Info about lock the process is currently waiting for, if any. */ /* waitLock and waitProcLock are NULL if not currently waiting. */ LOCK *waitLock; /* Lock object we're sleeping on ... */ @@ -265,14 +294,28 @@ typedef struct PGPROC LOCKMODE waitLockMode; /* type of lock we're waiting for */ LOCKMASK heldLocks; /* bitmask for lock types already held on this * lock object by this backend */ + pg_atomic_uint64 waitStart; /* time at which wait for lock acquisition * started */ - int delayChkptFlags; /* for DELAY_CHKPT_* flags */ + ProcWaitStatus waitStatus; - uint8 statusFlags; /* this backend's status flags, see PROC_* - * above. mirrored in - * ProcGlobal->statusFlags[pgxactoff] */ + /* + * All PROCLOCK objects for locks held or awaited by this backend are + * linked into one of these lists, according to the partition number of + * their lock. + */ + dlist_head myProcLocks[NUM_LOCK_PARTITIONS]; + + /*-- recording fast-path locks taken by this backend. --*/ + LWLock fpInfoLock; /* protects per-backend fast-path state */ + uint64 *fpLockBits; /* lock modes held for each fast-path slot */ + Oid *fpRelId; /* slots for rel oids */ + bool fpVXIDLock; /* are we holding a fast-path VXID lock? */ + LocalTransactionId fpLocalTransactionId; /* lxid for fast-path VXID + * lock */ + + /*---- Synchronous replication waiting ----*/ /* * Info to allow us to wait for synchronous replication, if needed. @@ -284,18 +327,8 @@ typedef struct PGPROC int syncRepState; /* wait state for sync rep */ dlist_node syncRepLinks; /* list link if process is in syncrep queue */ - /* - * All PROCLOCK objects for locks held or awaited by this backend are - * linked into one of these lists, according to the partition number of - * their lock. - */ - dlist_head myProcLocks[NUM_LOCK_PARTITIONS]; - - XidCacheStatus subxidStatus; /* mirrored with - * ProcGlobal->subxidStates[i] */ - struct XidCache subxids; /* cache for subtransaction XIDs */ + /*---- Support for group XID clearing. ----*/ - /* Support for group XID clearing. */ /* true, if member of ProcArray group waiting for XID clear */ bool procArrayGroupMember; /* next ProcArray group member waiting for XID clear */ @@ -307,9 +340,7 @@ typedef struct PGPROC */ TransactionId procArrayGroupMemberXid; - uint32 wait_event_info; /* proc's wait information */ - - /* Support for group transaction status update. */ + /*---- Support for group transaction status update. ----*/ bool clogGroupMember; /* true, if member of clog group */ pg_atomic_uint32 clogGroupNext; /* next clog group member */ TransactionId clogGroupMemberXid; /* transaction id of clog group member */ @@ -320,21 +351,8 @@ typedef struct PGPROC XLogRecPtr clogGroupMemberLsn; /* WAL location of commit record for clog * group member */ - /* Lock manager data, recording fast-path locks taken by this backend. */ - LWLock fpInfoLock; /* protects per-backend fast-path state */ - uint64 *fpLockBits; /* lock modes held for each fast-path slot */ - Oid *fpRelId; /* slots for rel oids */ - bool fpVXIDLock; /* are we holding a fast-path VXID lock? */ - LocalTransactionId fpLocalTransactionId; /* lxid for fast-path VXID - * lock */ - - /* - * Support for lock groups. Use LockHashPartitionLockByProc on the group - * leader to get the LWLock protecting these fields. - */ - PGPROC *lockGroupLeader; /* lock group leader, if I'm a member */ - dlist_head lockGroupMembers; /* list of members, if I'm a leader */ - dlist_node lockGroupLink; /* my member link, if I'm a member */ + /*---- Status reporting ----*/ + uint32 wait_event_info; /* proc's wait information */ } /* -- 2.47.3 ^ permalink raw reply [nested|flat] 14+ messages in thread
* Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) @ 2026-02-11 04:40 Bertrand Drouvot <[email protected]> parent: Heikki Linnakangas <[email protected]> 0 siblings, 2 replies; 14+ messages in thread From: Bertrand Drouvot @ 2026-02-11 04:40 UTC (permalink / raw) To: Heikki Linnakangas <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers Hi, On Tue, Feb 10, 2026 at 10:53:58PM +0200, Heikki Linnakangas wrote: > On 10/02/2026 21:46, Andres Freund wrote: > > On 2026-02-10 19:15:27 +0000, Bertrand Drouvot wrote: > > > On Tue, Feb 10, 2026 at 01:15:01PM -0500, Andres Freund wrote: > > > > On 2026-02-10 19:14:44 +0200, Heikki Linnakangas wrote: > > > > Yea, I don't think we need to be perfect here. Just a bit less bad. And, as > > > > you say, the current order doesn't make a lot of sense. > > > > Just grouping things like > > > > - pid, pgxactoff, backendType (i.e. barely if ever changing) > > > > - wait_event_info, waitStart (i.e. very frequently changing, but typically > > > > accessed within one proc) > > > > - sem, lwWaiting, waitLockMode (i.e. stuff that is updated frequently and > > > > accessed across processes) > > > > > > With an ordering like in the attached (to apply on top of Heikki's patch), we're > > > back to 832 bytes. > > > > You'd really need to insert padding between the sections to make it work... > > Here's my attempt at grouping things more logically. Thanks! > I didn't insert padding > and also didn't try to avoid alignment padding. I tried to optimize for > readability rather than size or performance. Yeah, my attempt was to put the size back to 832 bytes but that's probably not worth it as stated by Andres up-thread. > That said, I would assume that > grouping things logically like this would also help to avoid false sharing. > If not, inserting explicit padding seems like a a good fix. > > I also think we should split 'links' into two fields. For clarity. > A few comments: 0001: + * and (b) to make the multiplication / division to convert between PGPROC * + * and ProcNumber be a little cheaper Is that correct if PGPROC size is not a power of 2? 0002: Good catch! 0003: 1/ There is one missing change in PrintLockQueue() ("links" is still used, and that should be replaced by "waitLink"). 2/ change the comment on top of ProcWakeup? " /* * ProcWakeup -- wake up a process by setting its latch. * * Also remove the process from the wait queue and set its links invalid. " s/links/waitLink/? Also, out of curiosity, with 0003 in place PGPROC size goes from 840 to 856. 0004: The grouping looks Ok to me. Just one nit for the added comments: + /*---- Backend identity ----*/ + /*---- Transactions and snapshots ----*/ + /*---- Inter-process signaling ----*/ + /*---- LWLock waiting ----*/ + /*---- Lock manager data ----*/ + /*---- Synchronous replication waiting ----*/ + /*---- Support for group XID clearing. ----*/ + /*---- Support for group transaction status update. ----*/ + /*---- Status reporting ----*/ Some have period and some don't. > With this, sizeof(PGPROC) == 864 without the explicit alignment to > PG_CACHE_LINE_SIZE, and 896 with it. I can see 876 -> 896 on my side: /* 872 | 4 */ uint32 wait_event_info; /* XXX 20-byte padding */ /* total size (bytes): 896 */ } Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com ^ permalink raw reply [nested|flat] 14+ messages in thread
* Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) @ 2026-02-11 10:03 Heikki Linnakangas <[email protected]> parent: Bertrand Drouvot <[email protected]> 1 sibling, 2 replies; 14+ messages in thread From: Heikki Linnakangas @ 2026-02-11 10:03 UTC (permalink / raw) To: Bertrand Drouvot <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers On 11/02/2026 06:40, Bertrand Drouvot wrote: > A few comments: > > 0001: > > + * and (b) to make the multiplication / division to convert between PGPROC * > + * and ProcNumber be a little cheaper > > Is that correct if PGPROC size is not a power of 2? You're right, it's not. > 0002: Good catch! Committed that. >> With this, sizeof(PGPROC) == 864 without the explicit alignment to >> PG_CACHE_LINE_SIZE, and 896 with it. > > I can see 876 -> 896 on my side: > > /* 872 | 4 */ uint32 wait_event_info; > /* XXX 20-byte padding */ > > /* total size (bytes): 896 */ > } Interesting. I've attached 'pahole bin/postgres' output from my laptop. It's Linux on arm64. This is with my v2 patches to rearrange the fields, but with the "pg_attribute_aligned(PG_CACHE_LINE_SIZE)" removed. - Heikki struct PGPROC { dlist_head * procgloballist; /* 0 8 */ dlist_node freeProcsLink; /* 8 16 */ int pid; /* 24 4 */ BackendType backendType; /* 28 4 */ Oid databaseId; /* 32 4 */ Oid roleId; /* 36 4 */ Oid tempNamespaceId; /* 40 4 */ int pgxactoff; /* 44 4 */ uint8 statusFlags; /* 48 1 */ /* XXX 3 bytes hole, try to pack */ struct { ProcNumber procNumber; /* 52 4 */ LocalTransactionId lxid; /* 56 4 */ } vxid; /* 52 8 */ TransactionId xid; /* 60 4 */ /* --- cacheline 1 boundary (64 bytes) --- */ TransactionId xmin; /* 64 4 */ XidCacheStatus subxidStatus; /* 68 2 */ /* XXX 2 bytes hole, try to pack */ struct XidCache subxids; /* 72 256 */ /* --- cacheline 5 boundary (320 bytes) was 8 bytes ago --- */ Latch procLatch; /* 328 16 */ PGSemaphore sem; /* 344 8 */ int delayChkptFlags; /* 352 4 */ pg_atomic_uint32 pendingRecoveryConflicts; /* 356 4 */ uint8 lwWaiting; /* 360 1 */ uint8 lwWaitMode; /* 361 1 */ /* XXX 2 bytes hole, try to pack */ proclist_node lwWaitLink; /* 364 8 */ proclist_node cvWaitLink; /* 372 8 */ /* XXX 4 bytes hole, try to pack */ /* --- cacheline 6 boundary (384 bytes) --- */ PGPROC * lockGroupLeader; /* 384 8 */ dlist_head lockGroupMembers; /* 392 16 */ dlist_node lockGroupLink; /* 408 16 */ LOCK * waitLock; /* 424 8 */ dlist_node waitLink; /* 432 16 */ /* --- cacheline 7 boundary (448 bytes) --- */ PROCLOCK * waitProcLock; /* 448 8 */ LOCKMODE waitLockMode; /* 456 4 */ LOCKMASK heldLocks; /* 460 4 */ pg_atomic_uint64 waitStart __attribute__((__aligned__(8))); /* 464 8 */ ProcWaitStatus waitStatus; /* 472 4 */ /* XXX 4 bytes hole, try to pack */ dlist_head myProcLocks[16]; /* 480 256 */ /* --- cacheline 11 boundary (704 bytes) was 32 bytes ago --- */ LWLock fpInfoLock; /* 736 16 */ uint64 * fpLockBits; /* 752 8 */ Oid * fpRelId; /* 760 8 */ /* --- cacheline 12 boundary (768 bytes) --- */ _Bool fpVXIDLock; /* 768 1 */ /* XXX 3 bytes hole, try to pack */ LocalTransactionId fpLocalTransactionId; /* 772 4 */ XLogRecPtr waitLSN; /* 776 8 */ int syncRepState; /* 784 4 */ /* XXX 4 bytes hole, try to pack */ dlist_node syncRepLinks; /* 792 16 */ _Bool procArrayGroupMember; /* 808 1 */ /* XXX 3 bytes hole, try to pack */ pg_atomic_uint32 procArrayGroupNext; /* 812 4 */ TransactionId procArrayGroupMemberXid; /* 816 4 */ _Bool clogGroupMember; /* 820 1 */ /* XXX 3 bytes hole, try to pack */ pg_atomic_uint32 clogGroupNext; /* 824 4 */ TransactionId clogGroupMemberXid; /* 828 4 */ /* --- cacheline 13 boundary (832 bytes) --- */ XidStatus clogGroupMemberXidStatus; /* 832 4 */ /* XXX 4 bytes hole, try to pack */ int64 clogGroupMemberPage; /* 840 8 */ XLogRecPtr clogGroupMemberLsn; /* 848 8 */ uint32 wait_event_info; /* 856 4 */ /* size: 864, cachelines: 14, members: 51 */ /* sum members: 828, holes: 10, sum holes: 32 */ /* padding: 4 */ /* forced alignments: 1 */ /* last cacheline: 32 bytes */ } __attribute__((__aligned__(8))); Attachments: [text/plain] pahole-PGPROC.txt (4.5K, 2-pahole-PGPROC.txt) download | inline: struct PGPROC { dlist_head * procgloballist; /* 0 8 */ dlist_node freeProcsLink; /* 8 16 */ int pid; /* 24 4 */ BackendType backendType; /* 28 4 */ Oid databaseId; /* 32 4 */ Oid roleId; /* 36 4 */ Oid tempNamespaceId; /* 40 4 */ int pgxactoff; /* 44 4 */ uint8 statusFlags; /* 48 1 */ /* XXX 3 bytes hole, try to pack */ struct { ProcNumber procNumber; /* 52 4 */ LocalTransactionId lxid; /* 56 4 */ } vxid; /* 52 8 */ TransactionId xid; /* 60 4 */ /* --- cacheline 1 boundary (64 bytes) --- */ TransactionId xmin; /* 64 4 */ XidCacheStatus subxidStatus; /* 68 2 */ /* XXX 2 bytes hole, try to pack */ struct XidCache subxids; /* 72 256 */ /* --- cacheline 5 boundary (320 bytes) was 8 bytes ago --- */ Latch procLatch; /* 328 16 */ PGSemaphore sem; /* 344 8 */ int delayChkptFlags; /* 352 4 */ pg_atomic_uint32 pendingRecoveryConflicts; /* 356 4 */ uint8 lwWaiting; /* 360 1 */ uint8 lwWaitMode; /* 361 1 */ /* XXX 2 bytes hole, try to pack */ proclist_node lwWaitLink; /* 364 8 */ proclist_node cvWaitLink; /* 372 8 */ /* XXX 4 bytes hole, try to pack */ /* --- cacheline 6 boundary (384 bytes) --- */ PGPROC * lockGroupLeader; /* 384 8 */ dlist_head lockGroupMembers; /* 392 16 */ dlist_node lockGroupLink; /* 408 16 */ LOCK * waitLock; /* 424 8 */ dlist_node waitLink; /* 432 16 */ /* --- cacheline 7 boundary (448 bytes) --- */ PROCLOCK * waitProcLock; /* 448 8 */ LOCKMODE waitLockMode; /* 456 4 */ LOCKMASK heldLocks; /* 460 4 */ pg_atomic_uint64 waitStart __attribute__((__aligned__(8))); /* 464 8 */ ProcWaitStatus waitStatus; /* 472 4 */ /* XXX 4 bytes hole, try to pack */ dlist_head myProcLocks[16]; /* 480 256 */ /* --- cacheline 11 boundary (704 bytes) was 32 bytes ago --- */ LWLock fpInfoLock; /* 736 16 */ uint64 * fpLockBits; /* 752 8 */ Oid * fpRelId; /* 760 8 */ /* --- cacheline 12 boundary (768 bytes) --- */ _Bool fpVXIDLock; /* 768 1 */ /* XXX 3 bytes hole, try to pack */ LocalTransactionId fpLocalTransactionId; /* 772 4 */ XLogRecPtr waitLSN; /* 776 8 */ int syncRepState; /* 784 4 */ /* XXX 4 bytes hole, try to pack */ dlist_node syncRepLinks; /* 792 16 */ _Bool procArrayGroupMember; /* 808 1 */ /* XXX 3 bytes hole, try to pack */ pg_atomic_uint32 procArrayGroupNext; /* 812 4 */ TransactionId procArrayGroupMemberXid; /* 816 4 */ _Bool clogGroupMember; /* 820 1 */ /* XXX 3 bytes hole, try to pack */ pg_atomic_uint32 clogGroupNext; /* 824 4 */ TransactionId clogGroupMemberXid; /* 828 4 */ /* --- cacheline 13 boundary (832 bytes) --- */ XidStatus clogGroupMemberXidStatus; /* 832 4 */ /* XXX 4 bytes hole, try to pack */ int64 clogGroupMemberPage; /* 840 8 */ XLogRecPtr clogGroupMemberLsn; /* 848 8 */ uint32 wait_event_info; /* 856 4 */ /* size: 864, cachelines: 14, members: 51 */ /* sum members: 828, holes: 10, sum holes: 32 */ /* padding: 4 */ /* forced alignments: 1 */ /* last cacheline: 32 bytes */ } __attribute__((__aligned__(8))); ^ permalink raw reply [nested|flat] 14+ messages in thread
* Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) @ 2026-02-11 10:58 Bertrand Drouvot <[email protected]> parent: Heikki Linnakangas <[email protected]> 1 sibling, 0 replies; 14+ messages in thread From: Bertrand Drouvot @ 2026-02-11 10:58 UTC (permalink / raw) To: Heikki Linnakangas <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers Hi, On Wed, Feb 11, 2026 at 12:03:51PM +0200, Heikki Linnakangas wrote: > On 11/02/2026 06:40, Bertrand Drouvot wrote: > > > With this, sizeof(PGPROC) == 864 without the explicit alignment to > > > PG_CACHE_LINE_SIZE, and 896 with it. > > > > I can see 876 -> 896 on my side: > > > > /* 872 | 4 */ uint32 wait_event_info; > > /* XXX 20-byte padding */ > > > > /* total size (bytes): 896 */ > > } > > Interesting. I've attached 'pahole bin/postgres' output from my laptop. It's > Linux on arm64. Thanks! Got it: I was using "-DCACHEDEBUG -DWAL_DEBUG -DLOCK_DEBUG -DDEBUG_DEADLOCK" so that with LOCK_DEBUG in place then LWLock size is 32 bytes (vs 16 in your case). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com ^ permalink raw reply [nested|flat] 14+ messages in thread
* Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) @ 2026-02-13 08:03 Bertrand Drouvot <[email protected]> parent: Heikki Linnakangas <[email protected]> 1 sibling, 2 replies; 14+ messages in thread From: Bertrand Drouvot @ 2026-02-13 08:03 UTC (permalink / raw) To: Heikki Linnakangas <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers Hi, On Wed, Feb 11, 2026 at 12:03:51PM +0200, Heikki Linnakangas wrote: > On 11/02/2026 06:40, Bertrand Drouvot wrote: > > A few comments: > > > > 0001: > > > > + * and (b) to make the multiplication / division to convert between PGPROC * > > + * and ProcNumber be a little cheaper > > > > Is that correct if PGPROC size is not a power of 2? > > You're right, it's not. Looking more closely at: " /* GCC supports aligned and packed */ #if defined(__GNUC__) #define pg_attribute_aligned(a) __attribute__((aligned(a))) #define pg_attribute_packed() __attribute__((packed)) #elif defined(_MSC_VER) /* * MSVC supports aligned. * * Packing is also possible but only by wrapping the entire struct definition * which doesn't fit into our current macro declarations. */ #define pg_attribute_aligned(a) __declspec(align(a)) #else /* * NB: aligned and packed are not given default definitions because they * affect code functionality; they *must* be implemented by the compiler * if they are to be used. */ #endif " and what the patch adds: +/* + * If compiler understands aligned pragma, use it to align the struct at cache + * line boundaries. This is just for performance, to (a) avoid false sharing + * and (b) to make the multiplication / division to convert between PGPROC * + * and ProcNumber be a little cheaper. + */ +#if defined(pg_attribute_aligned) + pg_attribute_aligned(PG_CACHE_LINE_SIZE) +#endif +PGPROC; It means that PGPROC is "acceptable" without padding (on compiler that does not understand the aligned attribute). OTOH, looking at: " typedef union WALInsertLockPadded { WALInsertLock l; char pad[PG_CACHE_LINE_SIZE]; } WALInsertLockPadded; " It seems to mean that WALInsertLockPadded is unacceptable without padding (since it's not using pg_attribute_aligned()). That looks ok to see PGPROC as an "acceptable" one, if not, should we use the union trick? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com ^ permalink raw reply [nested|flat] 14+ messages in thread
* Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) @ 2026-02-20 21:01 Heikki Linnakangas <[email protected]> parent: Bertrand Drouvot <[email protected]> 1 sibling, 0 replies; 14+ messages in thread From: Heikki Linnakangas @ 2026-02-20 21:01 UTC (permalink / raw) To: Bertrand Drouvot <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers On 11/02/2026 06:40, Bertrand Drouvot wrote: > On Tue, Feb 10, 2026 at 10:53:58PM +0200, Heikki Linnakangas wrote: > 0003: > > 1/ There is one missing change in PrintLockQueue() ("links" is still used, and > that should be replaced by "waitLink"). > > 2/ change the comment on top of ProcWakeup? > > " > /* > * ProcWakeup -- wake up a process by setting its latch. > * > * Also remove the process from the wait queue and set its links invalid. > " > > s/links/waitLink/? Fixed those and pushed this "Split PGPROC 'links' field into two, for clarity" patch. Thanks for the review! - Heikki ^ permalink raw reply [nested|flat] 14+ messages in thread
* Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) @ 2026-02-20 21:03 Heikki Linnakangas <[email protected]> parent: Bertrand Drouvot <[email protected]> 1 sibling, 1 reply; 14+ messages in thread From: Heikki Linnakangas @ 2026-02-20 21:03 UTC (permalink / raw) To: Bertrand Drouvot <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers On 11/02/2026 06:40, Bertrand Drouvot wrote: > 0004: > > The grouping looks Ok to me. Just one nit for the added comments: > > + /*---- Backend identity ----*/ > + /*---- Transactions and snapshots ----*/ > + /*---- Inter-process signaling ----*/ > + /*---- LWLock waiting ----*/ > + /*---- Lock manager data ----*/ > + /*---- Synchronous replication waiting ----*/ > + /*---- Support for group XID clearing. ----*/ > + /*---- Support for group transaction status update. ----*/ > + /*---- Status reporting ----*/ > > Some have period and some don't. Fixed that, and changed to different style for the delimiter comments. It's a matter of taste, but I think the new style is closer to what we use elsewhere. On 13/02/2026 10:03, Bertrand Drouvot wrote: > and what the patch adds: > > +/* > + * If compiler understands aligned pragma, use it to align the struct at cache > + * line boundaries. This is just for performance, to (a) avoid false sharing > + * and (b) to make the multiplication / division to convert between PGPROC * > + * and ProcNumber be a little cheaper. > + */ > +#if defined(pg_attribute_aligned) > + pg_attribute_aligned(PG_CACHE_LINE_SIZE) > +#endif > +PGPROC; > > It means that PGPROC is "acceptable" without padding (on compiler that does not > understand the aligned attribute). > > OTOH, looking at: > > " > typedef union WALInsertLockPadded > { > WALInsertLock l; > char pad[PG_CACHE_LINE_SIZE]; > } WALInsertLockPadded; > " > > It seems to mean that WALInsertLockPadded is unacceptable without padding (since > it's not using pg_attribute_aligned()). > > That looks ok to see PGPROC as an "acceptable" one, if not, should we use the > union trick? It seems acceptable to just not align it if the compiler doesn't support it. This is just a performance optimization, after all. Attached is new versions the remaining patches. I think these are ready to be committed. I don't have the hardware and test case that would be sensitive enough for these changes in memory layout, so I haven't done any performance testing of this. I'd guess this no worse than the old layout. Grouping fields together which are used together is usually a good strategy. I don't feel obliged to do performance testing of this, given that no one did that for the old layout either, so if it happened to be really good from a performance point of view, it was purely accidental. But if anyone has the means and interest to do that, that'd be much appreciated of course. - Heikki Attachments: [text/x-patch] v2-0001-Rearrange-fields-in-PGPROC-for-clarity.patch (9.7K, 2-v2-0001-Rearrange-fields-in-PGPROC-for-clarity.patch) download | inline diff: From cc75db39184b6560ded5f97b5202c14187e9f801 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas <[email protected]> Date: Fri, 20 Feb 2026 22:30:01 +0200 Subject: [PATCH v2 1/2] Rearrange fields in PGPROC, for clarity The ordering was pretty random, making it hard to get an overview of what's in it. Group related fields together, and add comments to act as separators between the groups. Reviewed-by: Bertrand Drouvot <[email protected]> Discussion: https://www.postgresql.org/message-id/[email protected] --- src/include/storage/proc.h | 147 +++++++++++++++++++++++-------------- 1 file changed, 92 insertions(+), 55 deletions(-) diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h index e165b414241..b2fd4d02959 100644 --- a/src/include/storage/proc.h +++ b/src/include/storage/proc.h @@ -178,33 +178,41 @@ typedef enum * * See PROC_HDR for details. */ -struct PGPROC +typedef struct PGPROC { dlist_head *procgloballist; /* procglobal list that owns this PGPROC */ dlist_node freeProcsLink; /* link in procgloballist, when in recycled * state */ - PGSemaphore sem; /* ONE semaphore to sleep on */ - ProcWaitStatus waitStatus; - - Latch procLatch; /* generic latch for process */ + /************************************************************************ + * Backend identity + ************************************************************************/ + /* + * These fields that don't change after backend startup, or only very + * rarely + */ + int pid; /* Backend's process ID; 0 if prepared xact */ + BackendType backendType; /* what kind of process is this? */ - TransactionId xid; /* id of top-level transaction currently being - * executed by this proc, if running and XID - * is assigned; else InvalidTransactionId. - * mirrored in ProcGlobal->xids[pgxactoff] */ - - TransactionId xmin; /* minimal running XID as it was when we were - * starting our xact, excluding LAZY VACUUM: - * vacuum must not remove tuples deleted by - * xid >= xmin ! */ + /* These fields are zero while a backend is still starting up: */ + Oid databaseId; /* OID of database this backend is using */ + Oid roleId; /* OID of role using this backend */ - int pid; /* Backend's process ID; 0 if prepared xact */ + Oid tempNamespaceId; /* OID of temp schema this backend is + * using */ int pgxactoff; /* offset into various ProcGlobal->arrays with * data mirrored from this PGPROC */ + uint8 statusFlags; /* this backend's status flags, see PROC_* + * above. mirrored in + * ProcGlobal->statusFlags[pgxactoff] */ + + /************************************************************************ + * Transactions and snapshots + ************************************************************************/ + /* * Currently running top-level transaction's virtual xid. Together these * form a VirtualTransactionId, but we don't use that struct because this @@ -224,14 +232,30 @@ struct PGPROC * InvalidLocalTransactionId */ } vxid; - /* These fields are zero while a backend is still starting up: */ - Oid databaseId; /* OID of database this backend is using */ - Oid roleId; /* OID of role using this backend */ + TransactionId xid; /* id of top-level transaction currently being + * executed by this proc, if running and XID + * is assigned; else InvalidTransactionId. + * mirrored in ProcGlobal->xids[pgxactoff] */ - Oid tempNamespaceId; /* OID of temp schema this backend is - * using */ + TransactionId xmin; /* minimal running XID as it was when we were + * starting our xact, excluding LAZY VACUUM: + * vacuum must not remove tuples deleted by + * xid >= xmin ! */ - BackendType backendType; /* what kind of process is this? */ + XidCacheStatus subxidStatus; /* mirrored with + * ProcGlobal->subxidStates[i] */ + struct XidCache subxids; /* cache for subtransaction XIDs */ + + + /************************************************************************ + * Inter-process signaling + ************************************************************************/ + + Latch procLatch; /* generic latch for process */ + + PGSemaphore sem; /* ONE semaphore to sleep on */ + + int delayChkptFlags; /* for DELAY_CHKPT_* flags */ /* * While in hot standby mode, shows that a conflict signal has been sent @@ -243,6 +267,10 @@ struct PGPROC */ pg_atomic_uint32 pendingRecoveryConflicts; + /************************************************************************ + * LWLock waiting + ************************************************************************/ + /* * Info about LWLock the process is currently waiting for, if any. * @@ -257,6 +285,18 @@ struct PGPROC /* Support for condition variables. */ proclist_node cvWaitLink; /* position in CV wait list */ + /************************************************************************ + * Lock manager data + ************************************************************************/ + + /* + * Support for lock groups. Use LockHashPartitionLockByProc on the group + * leader to get the LWLock protecting these fields. + */ + PGPROC *lockGroupLeader; /* lock group leader, if I'm a member */ + dlist_head lockGroupMembers; /* list of members, if I'm a leader */ + dlist_node lockGroupLink; /* my member link, if I'm a member */ + /* Info about lock the process is currently waiting for, if any. */ /* waitLock and waitProcLock are NULL if not currently waiting. */ LOCK *waitLock; /* Lock object we're sleeping on ... */ @@ -265,14 +305,30 @@ struct PGPROC LOCKMODE waitLockMode; /* type of lock we're waiting for */ LOCKMASK heldLocks; /* bitmask for lock types already held on this * lock object by this backend */ + pg_atomic_uint64 waitStart; /* time at which wait for lock acquisition * started */ - int delayChkptFlags; /* for DELAY_CHKPT_* flags */ + ProcWaitStatus waitStatus; - uint8 statusFlags; /* this backend's status flags, see PROC_* - * above. mirrored in - * ProcGlobal->statusFlags[pgxactoff] */ + /* + * All PROCLOCK objects for locks held or awaited by this backend are + * linked into one of these lists, according to the partition number of + * their lock. + */ + dlist_head myProcLocks[NUM_LOCK_PARTITIONS]; + + /*-- recording fast-path locks taken by this backend. --*/ + LWLock fpInfoLock; /* protects per-backend fast-path state */ + uint64 *fpLockBits; /* lock modes held for each fast-path slot */ + Oid *fpRelId; /* slots for rel oids */ + bool fpVXIDLock; /* are we holding a fast-path VXID lock? */ + LocalTransactionId fpLocalTransactionId; /* lxid for fast-path VXID + * lock */ + + /************************************************************************ + * Synchronous replication waiting + ************************************************************************/ /* * Info to allow us to wait for synchronous replication, if needed. @@ -284,18 +340,10 @@ struct PGPROC int syncRepState; /* wait state for sync rep */ dlist_node syncRepLinks; /* list link if process is in syncrep queue */ - /* - * All PROCLOCK objects for locks held or awaited by this backend are - * linked into one of these lists, according to the partition number of - * their lock. - */ - dlist_head myProcLocks[NUM_LOCK_PARTITIONS]; - - XidCacheStatus subxidStatus; /* mirrored with - * ProcGlobal->subxidStates[i] */ - struct XidCache subxids; /* cache for subtransaction XIDs */ + /************************************************************************ + * Support for group XID clearing + ************************************************************************/ - /* Support for group XID clearing. */ /* true, if member of ProcArray group waiting for XID clear */ bool procArrayGroupMember; /* next ProcArray group member waiting for XID clear */ @@ -307,9 +355,10 @@ struct PGPROC */ TransactionId procArrayGroupMemberXid; - uint32 wait_event_info; /* proc's wait information */ + /************************************************************************ + * Support for group transaction status update + ************************************************************************/ - /* Support for group transaction status update. */ bool clogGroupMember; /* true, if member of clog group */ pg_atomic_uint32 clogGroupNext; /* next clog group member */ TransactionId clogGroupMemberXid; /* transaction id of clog group member */ @@ -320,24 +369,12 @@ struct PGPROC XLogRecPtr clogGroupMemberLsn; /* WAL location of commit record for clog * group member */ - /* Lock manager data, recording fast-path locks taken by this backend. */ - LWLock fpInfoLock; /* protects per-backend fast-path state */ - uint64 *fpLockBits; /* lock modes held for each fast-path slot */ - Oid *fpRelId; /* slots for rel oids */ - bool fpVXIDLock; /* are we holding a fast-path VXID lock? */ - LocalTransactionId fpLocalTransactionId; /* lxid for fast-path VXID - * lock */ + /************************************************************************ + * Status reporting + ************************************************************************/ - /* - * Support for lock groups. Use LockHashPartitionLockByProc on the group - * leader to get the LWLock protecting these fields. - */ - PGPROC *lockGroupLeader; /* lock group leader, if I'm a member */ - dlist_head lockGroupMembers; /* list of members, if I'm a leader */ - dlist_node lockGroupLink; /* my member link, if I'm a member */ -}; - -/* NOTE: "typedef struct PGPROC PGPROC" appears in storage/lock.h. */ + uint32 wait_event_info; /* proc's wait information */ +} PGPROC; extern PGDLLIMPORT PGPROC *MyProc; -- 2.47.3 [text/x-patch] v2-0002-Align-PGPROC-to-cache-line-boundary.patch (1.5K, 3-v2-0002-Align-PGPROC-to-cache-line-boundary.patch) download | inline diff: From ce1218929f2e14106c0c1160bfa941ffe0b6b745 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas <[email protected]> Date: Fri, 20 Feb 2026 22:58:40 +0200 Subject: [PATCH v2 2/2] Align PGPROC to cache line boundary On common architectures, the PGPROC happened to be a multiple of 64 bytes on PG 18, but it's changed on 'master' since. There was worry that changing the alignment might hurt performance, due to false cacheline sharing across PGPROC elements. However, there was no explicit alignment, so any alignment to cache lines was accidental. Add explicit alignment to remove worry about false sharing. Reviewed-by: Bertrand Drouvot <[email protected]> Discussion: https://www.postgresql.org/message-id/[email protected] --- src/include/storage/proc.h | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h index b2fd4d02959..a8d2e7db1a1 100644 --- a/src/include/storage/proc.h +++ b/src/include/storage/proc.h @@ -374,8 +374,16 @@ typedef struct PGPROC ************************************************************************/ uint32 wait_event_info; /* proc's wait information */ -} PGPROC; +} +/* + * If compiler understands aligned pragma, use it to align the struct at cache + * line boundaries. This is just for performance, to avoid false sharing. + */ +#if defined(pg_attribute_aligned) + pg_attribute_aligned(PG_CACHE_LINE_SIZE) +#endif +PGPROC; extern PGDLLIMPORT PGPROC *MyProc; -- 2.47.3 ^ permalink raw reply [nested|flat] 14+ messages in thread
* Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) @ 2026-02-21 09:42 Bertrand Drouvot <[email protected]> parent: Heikki Linnakangas <[email protected]> 0 siblings, 1 reply; 14+ messages in thread From: Bertrand Drouvot @ 2026-02-21 09:42 UTC (permalink / raw) To: Heikki Linnakangas <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers Hi, On Fri, Feb 20, 2026 at 11:03:09PM +0200, Heikki Linnakangas wrote: > On 11/02/2026 06:40, Bertrand Drouvot wrote: > > That looks ok to see PGPROC as an "acceptable" one, if not, should we use the > > union trick? > > It seems acceptable to just not align it if the compiler doesn't support it. > This is just a performance optimization, after all. Agreed. > Attached is new versions the remaining patches. I think these are ready to > be committed. Thanks! One nit, 0001 is adding the typedef: " -struct PGPROC +typedef struct PGPROC . . . -}; - -/* NOTE: "typedef struct PGPROC PGPROC" appears in storage/lock.h. */ + uint32 wait_event_info; /* proc's wait information */ +} PGPROC; " Would that make more sense to add the typedef when we introduce the explicit alignment in 0002 (like it was done in your previous v2-0001-Align-PGPROC-to-cache-line-boundary.patch up-thread)? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com ^ permalink raw reply [nested|flat] 14+ messages in thread
* Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) @ 2026-02-22 11:15 Heikki Linnakangas <[email protected]> parent: Bertrand Drouvot <[email protected]> 0 siblings, 0 replies; 14+ messages in thread From: Heikki Linnakangas @ 2026-02-22 11:15 UTC (permalink / raw) To: Bertrand Drouvot <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers On 21/02/2026 11:42, Bertrand Drouvot wrote: > On Fri, Feb 20, 2026 at 11:03:09PM +0200, Heikki Linnakangas wrote: >> On 11/02/2026 06:40, Bertrand Drouvot wrote: >>> That looks ok to see PGPROC as an "acceptable" one, if not, should we use the >>> union trick? >> >> It seems acceptable to just not align it if the compiler doesn't support it. >> This is just a performance optimization, after all. > > Agreed. > >> Attached is new versions the remaining patches. I think these are ready to >> be committed. > > Thanks! > > One nit, 0001 is adding the typedef: > > " > -struct PGPROC > +typedef struct PGPROC > . > . > . > -}; > - > -/* NOTE: "typedef struct PGPROC PGPROC" appears in storage/lock.h. */ > + uint32 wait_event_info; /* proc's wait information */ > +} PGPROC; > " > > Would that make more sense to add the typedef when we introduce the explicit > alignment in 0002 (like it was done in your previous > v2-0001-Align-PGPROC-to-cache-line-boundary.patch up-thread)? Yeah, I had it as part of the other commit in the previous version, but decided to make it part of the other one, so that it's more clear what the alignment. I don't think it matters much either way though. Pushed, thanks for the review! - Heikki ^ permalink raw reply [nested|flat] 14+ messages in thread
* Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) @ 2026-02-23 16:13 Peter Eisentraut <[email protected]> parent: Bertrand Drouvot <[email protected]> 1 sibling, 1 reply; 14+ messages in thread From: Peter Eisentraut @ 2026-02-23 16:13 UTC (permalink / raw) To: Bertrand Drouvot <[email protected]>; Heikki Linnakangas <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers On 13.02.26 09:03, Bertrand Drouvot wrote: > +/* > + * If compiler understands aligned pragma, use it to align the struct at cache > + * line boundaries. This is just for performance, to (a) avoid false sharing > + * and (b) to make the multiplication / division to convert between PGPROC * > + * and ProcNumber be a little cheaper. > + */ > +#if defined(pg_attribute_aligned) > + pg_attribute_aligned(PG_CACHE_LINE_SIZE) > +#endif > +PGPROC; You can/should use C11 standard alignas(), so you don't need to worry about whether it's supported or not. ^ permalink raw reply [nested|flat] 14+ messages in thread
* Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) @ 2026-02-24 11:28 Bertrand Drouvot <[email protected]> parent: Peter Eisentraut <[email protected]> 0 siblings, 1 reply; 14+ messages in thread From: Bertrand Drouvot @ 2026-02-24 11:28 UTC (permalink / raw) To: Peter Eisentraut <[email protected]>; +Cc: Heikki Linnakangas <[email protected]>; Andres Freund <[email protected]>; pgsql-hackers Hi, On Mon, Feb 23, 2026 at 05:13:29PM +0100, Peter Eisentraut wrote: > On 13.02.26 09:03, Bertrand Drouvot wrote: > > +/* > > + * If compiler understands aligned pragma, use it to align the struct at cache > > + * line boundaries. This is just for performance, to (a) avoid false sharing > > + * and (b) to make the multiplication / division to convert between PGPROC * > > + * and ProcNumber be a little cheaper. > > + */ > > +#if defined(pg_attribute_aligned) > > + pg_attribute_aligned(PG_CACHE_LINE_SIZE) > > +#endif > > +PGPROC; > > You can/should use C11 standard alignas(), so you don't need to worry about > whether it's supported or not. Oh right, I did not notice 300c8f53247 and following like e7075a3405c, d4c0f91f7d5 and 97e04c74bed. PFA, 0001 doing so for PGPROC and PgAioUringContext. As those are typedef, the patch puts alignas within the struct. For PGPROC at the start of the struct, I think that placing it on the first member is the right location because it ensures the whole struct is aligned to PG_CACHE_LINE_SIZE without adding padding before this member. For example if I set it on backendType, then it adds 100 bytes of padding and the struct is obviously still a multiple of PG_CACHE_LINE_SIZE but is now 1024 bytes (instead of 896). For PgAioUringContext at completion_lock (like suggested by Andres in [1]), which is also the start of the struct. I checked and the padding for those are exactly the same after the changes. 0002, is also making use of alignas in ItemPointerData, but this one is more tricky so I'm not sure that's worth it (given the fact that we still need to keep pg_attribute_aligned() as explained by Peter in [2]). [1]: https://postgr.es/m/lsgnps74ictxtm5karcobenxtt5rvoylvbvx7ja6r5rjcund7v%40owxqgv7gisns [2]: https://postgr.es/m/46f05236-d4d4-4b4e-84d4-faa500f14691%40eisentraut.org Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com Attachments: [text/x-diff] v1-0001-Use-C11-alignas-in-typedef-definitions.patch (2.8K, 2-v1-0001-Use-C11-alignas-in-typedef-definitions.patch) download | inline diff: From 3152fd1e4934378823d5e535f876cf1333cb086e Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot <[email protected]> Date: Tue, 24 Feb 2026 09:21:19 +0000 Subject: [PATCH v1 1/2] Use C11 alignas in typedef definitions They were already using pg_attribute_aligned. This replaces that with alignas and moves that into the required syntactic position. Suggested-by: Peter Eisentraut <[email protected]> Author: Bertrand Drouvot <[email protected]> Discussion: https://postgr.es/m/d7a788fa-e609-4894-a8be-2f70e135424f%40eisentraut.org --- src/backend/storage/aio/method_io_uring.c | 16 ++++++++-------- src/include/storage/proc.h | 13 +++++-------- 2 files changed, 13 insertions(+), 16 deletions(-) 58.1% src/backend/storage/aio/ 41.8% src/include/storage/ diff --git a/src/backend/storage/aio/method_io_uring.c b/src/backend/storage/aio/method_io_uring.c index ed6e71bcd46..50ea9f8e51c 100644 --- a/src/backend/storage/aio/method_io_uring.c +++ b/src/backend/storage/aio/method_io_uring.c @@ -77,15 +77,15 @@ const IoMethodOps pgaio_uring_ops = { .wait_one = pgaio_uring_wait_one, }; -/* - * Per-backend state when using io_method=io_uring - * - * Align the whole struct to a cacheline boundary, to prevent false sharing - * between completion_lock and prior backend's io_uring_ring. - */ -typedef struct pg_attribute_aligned (PG_CACHE_LINE_SIZE) -PgAioUringContext +/* Per-backend state when using io_method=io_uring */ +typedef struct PgAioUringContext { + /* + * Align the whole struct to a cacheline boundary, to prevent false + * sharing between completion_lock and prior backend's io_uring_ring. + */ + alignas(PG_CACHE_LINE_SIZE) + /* * Multiple backends can process completions for this backend's io_uring * instance (e.g. when the backend issuing IO is busy doing something diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h index a8d2e7db1a1..fe798d350f1 100644 --- a/src/include/storage/proc.h +++ b/src/include/storage/proc.h @@ -180,6 +180,11 @@ typedef enum */ typedef struct PGPROC { + /* + * Align the struct at cache line boundaries. This is just for + * performance, to avoid false sharing. + */ + alignas(PG_CACHE_LINE_SIZE) dlist_head *procgloballist; /* procglobal list that owns this PGPROC */ dlist_node freeProcsLink; /* link in procgloballist, when in recycled * state */ @@ -375,14 +380,6 @@ typedef struct PGPROC uint32 wait_event_info; /* proc's wait information */ } - -/* - * If compiler understands aligned pragma, use it to align the struct at cache - * line boundaries. This is just for performance, to avoid false sharing. - */ -#if defined(pg_attribute_aligned) - pg_attribute_aligned(PG_CACHE_LINE_SIZE) -#endif PGPROC; extern PGDLLIMPORT PGPROC *MyProc; -- 2.34.1 [text/x-diff] v1-0002-Use-C11-alignas-in-more-tricky-typedef-definition.patch (1.4K, 3-v1-0002-Use-C11-alignas-in-more-tricky-typedef-definition.patch) download | inline diff: From e908a2c0e2b0a9d8a6915f60439f75adfe885b66 Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot <[email protected]> Date: Tue, 24 Feb 2026 09:55:36 +0000 Subject: [PATCH v1 2/2] Use C11 alignas in more tricky typedef definition Same as commit "xxx" but the alignas has to be guarded because it makes sense if used together with pg_attribute_packed(). Suggested-by: Peter Eisentraut <[email protected]> Author: Bertrand Drouvot <[email protected]> Discussion: https://postgr.es/m/d7a788fa-e609-4894-a8be-2f70e135424f%40eisentraut.org --- src/include/storage/itemptr.h | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) 100.0% src/include/storage/ diff --git a/src/include/storage/itemptr.h b/src/include/storage/itemptr.h index 8b3392dbefe..a70c543dac3 100644 --- a/src/include/storage/itemptr.h +++ b/src/include/storage/itemptr.h @@ -35,14 +35,17 @@ */ typedef struct ItemPointerData { +/* If compiler understands packed pragma, use alignas with it */ +#if defined(pg_attribute_packed) + alignas(2) +#endif BlockIdData ip_blkid; OffsetNumber ip_posid; } -/* If compiler understands packed and aligned pragmas, use those */ -#if defined(pg_attribute_packed) && defined(pg_attribute_aligned) +/* If compiler understands packed pragma, use it with alignas */ +#if defined(pg_attribute_packed) pg_attribute_packed() - pg_attribute_aligned(2) #endif ItemPointerData; -- 2.34.1 ^ permalink raw reply [nested|flat] 14+ messages in thread
* Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) @ 2026-03-16 11:16 Bertrand Drouvot <[email protected]> parent: Bertrand Drouvot <[email protected]> 0 siblings, 1 reply; 14+ messages in thread From: Bertrand Drouvot @ 2026-03-16 11:16 UTC (permalink / raw) To: Peter Eisentraut <[email protected]>; +Cc: Heikki Linnakangas <[email protected]>; Andres Freund <[email protected]>; pgsql-hackers Hi, On Mon, Mar 16, 2026 at 11:42:35AM +0100, Peter Eisentraut wrote: > On 24.02.26 12:28, Bertrand Drouvot wrote: > > > You can/should use C11 standard alignas(), so you don't need to worry about > > > whether it's supported or not. > > > > Oh right, I did not notice 300c8f53247 and following like e7075a3405c, d4c0f91f7d5 > > and 97e04c74bed. > > > > PFA, 0001 doing so for PGPROC and PgAioUringContext. As those are typedef, > > the patch puts alignas within the struct. > > > > For PGPROC at the start of the struct, I think that placing it on the first member > > is the right location because it ensures the whole struct is aligned to PG_CACHE_LINE_SIZE > > without adding padding before this member. For example if I set it on backendType, > > then it adds 100 bytes of padding and the struct is obviously still a multiple of > > PG_CACHE_LINE_SIZE but is now 1024 bytes (instead of 896). > > > > For PgAioUringContext at completion_lock (like suggested by Andres in [1]), which > > is also the start of the struct. > > > > I checked and the padding for those are exactly the same after the changes. > > I have committed the 0001 patch. Thanks! I don't know why but I don't see it in https://git.postgresql.org/gitweb/?p=postgresql.git;a=summary, or in the github repo though. > > 0002, is also making use of alignas in ItemPointerData, but this one is more > > tricky so I'm not sure that's worth it (given the fact that we still need to > > keep pg_attribute_aligned() as explained by Peter in [2]). > > This doesn't seem worth it to me. Let's skip it. Yeah, that makes sense. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com ^ permalink raw reply [nested|flat] 14+ messages in thread
* Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) @ 2026-03-16 15:52 Bertrand Drouvot <[email protected]> parent: Bertrand Drouvot <[email protected]> 0 siblings, 0 replies; 14+ messages in thread From: Bertrand Drouvot @ 2026-03-16 15:52 UTC (permalink / raw) To: Peter Eisentraut <[email protected]>; +Cc: Heikki Linnakangas <[email protected]>; Andres Freund <[email protected]>; pgsql-hackers Hi, On Mon, Mar 16, 2026 at 11:16:51AM +0000, Bertrand Drouvot wrote: > Hi, > > On Mon, Mar 16, 2026 at 11:42:35AM +0100, Peter Eisentraut wrote: > > On 24.02.26 12:28, Bertrand Drouvot wrote: > > > > You can/should use C11 standard alignas(), so you don't need to worry about > > > > whether it's supported or not. > > > > > > Oh right, I did not notice 300c8f53247 and following like e7075a3405c, d4c0f91f7d5 > > > and 97e04c74bed. > > > > > > PFA, 0001 doing so for PGPROC and PgAioUringContext. As those are typedef, > > > the patch puts alignas within the struct. > > > > > > For PGPROC at the start of the struct, I think that placing it on the first member > > > is the right location because it ensures the whole struct is aligned to PG_CACHE_LINE_SIZE > > > without adding padding before this member. For example if I set it on backendType, > > > then it adds 100 bytes of padding and the struct is obviously still a multiple of > > > PG_CACHE_LINE_SIZE but is now 1024 bytes (instead of 896). > > > > > > For PgAioUringContext at completion_lock (like suggested by Andres in [1]), which > > > is also the start of the struct. > > > > > > I checked and the padding for those are exactly the same after the changes. > > > > I have committed the 0001 patch. > > Thanks! I don't know why but I don't see it in https://git.postgresql.org/gitweb/?p=postgresql.git;a=summary, > or in the github repo though. FWIW, I reached out to sysadmins@ and it has now been fixed. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com ^ permalink raw reply [nested|flat] 14+ messages in thread
end of thread, other threads:[~2026-03-16 15:52 UTC | newest] Thread overview: 14+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2026-02-10 19:46 Re: PGPROC alignment (was Re: pgsql: Separate RecoveryConflictReasons from procsignals) Andres Freund <[email protected]> 2026-02-10 20:53 ` Heikki Linnakangas <[email protected]> 2026-02-11 04:40 ` Bertrand Drouvot <[email protected]> 2026-02-11 10:03 ` Heikki Linnakangas <[email protected]> 2026-02-11 10:58 ` Bertrand Drouvot <[email protected]> 2026-02-13 08:03 ` Bertrand Drouvot <[email protected]> 2026-02-20 21:03 ` Heikki Linnakangas <[email protected]> 2026-02-21 09:42 ` Bertrand Drouvot <[email protected]> 2026-02-22 11:15 ` Heikki Linnakangas <[email protected]> 2026-02-23 16:13 ` Peter Eisentraut <[email protected]> 2026-02-24 11:28 ` Bertrand Drouvot <[email protected]> 2026-03-16 11:16 ` Bertrand Drouvot <[email protected]> 2026-03-16 15:52 ` Bertrand Drouvot <[email protected]> 2026-02-20 21:01 ` Heikki Linnakangas <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox