public inbox for [email protected]help / color / mirror / Atom feed
pgsql: Improve performance of subsystems on top of SLRU 6+ messages / 3 participants [nested] [flat]
* pgsql: Improve performance of subsystems on top of SLRU @ 2024-02-28 16:07 Alvaro Herrera <[email protected]> 2024-03-03 14:29 ` Re: pgsql: Improve performance of subsystems on top of SLRU Alvaro Herrera <[email protected]> 0 siblings, 1 reply; 6+ messages in thread From: Alvaro Herrera @ 2024-02-28 16:07 UTC (permalink / raw) To: [email protected] Improve performance of subsystems on top of SLRU More precisely, what we do here is make the SLRU cache sizes configurable with new GUCs, so that sites with high concurrency and big ranges of transactions in flight (resp. multixacts/subtransactions) can benefit from bigger caches. In order for this to work with good performance, two additional changes are made: 1. the cache is divided in "banks" (to borrow terminology from CPU caches), and algorithms such as eviction buffer search only affect one specific bank. This forestalls the problem that linear searching for a specific buffer across the whole cache takes too long: we only have to search the specific bank, whose size is small. This work is authored by Andrey Borodin. 2. Change the locking regime for the SLRU banks, so that each bank uses a separate LWLock. This allows for increased scalability. This work is authored by Dilip Kumar. (A part of this was previously committed as d172b717c6f4.) Special care is taken so that the algorithms that can potentially traverse more than one bank release one bank's lock before acquiring the next. This should happen rarely, but particularly clog.c's group commit feature needed code adjustment to cope with this. I (Álvaro) also added lots of comments to make sure the design is sound. The new GUCs match the names introduced by bcdfa5f2e2f2 in the pg_stat_slru view. The default values for these parameters are similar to the previous sizes of each SLRU. commit_ts, clog and subtrans accept value 0, which means to adjust by dividing shared_buffers by 512 (so 2MB for every 1GB of shared_buffers), with a cap of 8MB. (A new slru.c function SimpleLruAutotuneBuffers() was added to support this.) The cap was previously 1MB for clog, so for sites with more than 512MB of shared memory the total memory used increases, which is likely a good tradeoff. However, other SLRUs (notably multixact ones) retain smaller sizes and don't support a configured value of 0. These values based on shared_buffers may need to be revisited, but that's an easy change. There was some resistance to adding these new GUCs: it would be better to adjust to memory pressure automatically somehow, for example by stealing memory from shared_buffers (where the caches can grow and shrink naturally). However, doing that seems to be a much larger project and one which has made virtually no progress in several years, and because this is such a pain point for so many users, here we take the pragmatic approach. Author: Andrey Borodin <[email protected]> Author: Dilip Kumar <[email protected]> Reviewed-by: Amul Sul, Gilles Darold, Anastasia Lubennikova, Ivan Lazarev, Robert Haas, Thomas Munro, Tomas Vondra, Yura Sokolov, Васильев Дмитрий (Dmitry Vasiliev). Discussion: https://postgr.es/m/[email protected] Discussion: https://postgr.es/m/CAFiTN-vzDvNz=ExGXz6gdyjtzGixKSqs0mKHMmaQ8sOSEFZ33A@mail.gmail.com Branch ------ master Details ------- https://git.postgresql.org/pg/commitdiff/53c2a97a92665be6bd7d70bd62ae6158fe4db96e Modified Files -------------- doc/src/sgml/config.sgml | 139 +++++++++ doc/src/sgml/monitoring.sgml | 9 +- src/backend/access/transam/clog.c | 243 +++++++++++----- src/backend/access/transam/commit_ts.c | 88 ++++-- src/backend/access/transam/multixact.c | 190 +++++++++---- src/backend/access/transam/slru.c | 357 +++++++++++++++++------- src/backend/access/transam/subtrans.c | 110 ++++++-- src/backend/commands/async.c | 61 ++-- src/backend/storage/lmgr/lwlock.c | 9 +- src/backend/storage/lmgr/lwlocknames.txt | 14 +- src/backend/storage/lmgr/predicate.c | 34 ++- src/backend/utils/activity/wait_event_names.txt | 15 +- src/backend/utils/init/globals.c | 9 + src/backend/utils/misc/guc_tables.c | 78 ++++++ src/backend/utils/misc/postgresql.conf.sample | 9 + src/include/access/clog.h | 1 - src/include/access/commit_ts.h | 1 - src/include/access/multixact.h | 4 - src/include/access/slru.h | 86 ++++-- src/include/access/subtrans.h | 3 - src/include/commands/async.h | 5 - src/include/miscadmin.h | 8 + src/include/storage/lwlock.h | 7 + src/include/storage/predicate.h | 4 - src/include/utils/guc_hooks.h | 11 + src/test/modules/test_slru/test_slru.c | 35 +-- 26 files changed, 1177 insertions(+), 353 deletions(-) ^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: pgsql: Improve performance of subsystems on top of SLRU 2024-02-28 16:07 pgsql: Improve performance of subsystems on top of SLRU Alvaro Herrera <[email protected]> @ 2024-03-03 14:29 ` Alvaro Herrera <[email protected]> 2024-03-03 21:14 ` Re: pgsql: Improve performance of subsystems on top of SLRU Tom Lane <[email protected]> 2024-03-04 06:14 ` Re: pgsql: Improve performance of subsystems on top of SLRU Dilip Kumar <[email protected]> 0 siblings, 2 replies; 6+ messages in thread From: Alvaro Herrera @ 2024-03-03 14:29 UTC (permalink / raw) To: [email protected]; +Cc: Dilip Kumar <[email protected]> On 2024-Feb-28, Alvaro Herrera wrote: > Improve performance of subsystems on top of SLRU Coverity had the following complaint about this commit: ________________________________________________________________________________________________________ *** CID NNNNNNN: Control flow issues (DEADCODE) /srv/coverity/git/pgsql-git/postgresql/src/backend/access/transam/multixact.c: 1375 in GetMultiXactIdMembers() 1369 * and acquire the lock of the new bank. 1370 */ 1371 lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno); 1372 if (lock != prevlock) 1373 { 1374 if (prevlock != NULL) >>> CID 1592913: Control flow issues (DEADCODE) >>> Execution cannot reach this statement: "LWLockRelease(prevlock);". 1375 LWLockRelease(prevlock); 1376 LWLockAcquire(lock, LW_EXCLUSIVE); 1377 prevlock = lock; 1378 } 1379 1380 slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi); And I think it's correct that this is somewhat bogus, or at least confusing: the only way to have control back here on line 1371 after having executed once is via the "goto retry" line below; and there we release "prevlock" and set it to NULL beforehand, so it's impossible for prevlock to be NULL. Looking closer I think this code is all confused, so I suggest to rework it as shown in the attached patch. I'll have a look at the other places where we use this "prevlock" coding pattern tomorrow. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ ^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: pgsql: Improve performance of subsystems on top of SLRU 2024-02-28 16:07 pgsql: Improve performance of subsystems on top of SLRU Alvaro Herrera <[email protected]> 2024-03-03 14:29 ` Re: pgsql: Improve performance of subsystems on top of SLRU Alvaro Herrera <[email protected]> @ 2024-03-03 21:14 ` Tom Lane <[email protected]> 2024-03-04 15:17 ` Re: pgsql: Improve performance of subsystems on top of SLRU Alvaro Herrera <[email protected]> 1 sibling, 1 reply; 6+ messages in thread From: Tom Lane @ 2024-03-03 21:14 UTC (permalink / raw) To: Alvaro Herrera <[email protected]>; +Cc: [email protected]; Dilip Kumar <[email protected]> Alvaro Herrera <[email protected]> writes: > And I think it's correct that this is somewhat bogus, or at least > confusing: the only way to have control back here on line 1371 after > having executed once is via the "goto retry" line below; and there we > release "prevlock" and set it to NULL beforehand, so it's impossible for > prevlock to be NULL. Looking closer I think this code is all confused, > so I suggest to rework it as shown in the attached patch. This is certainly simpler, but I notice that it holds the current LWLock across the line ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember)); where the old code did not. Could the palloc take long enough that holding the lock is bad? Also, with this coding the "lock = NULL;" assignment just before "goto retry" is a dead store. Not sure if Coverity or other static analyzers would whine about that. regards, tom lane ^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: pgsql: Improve performance of subsystems on top of SLRU 2024-02-28 16:07 pgsql: Improve performance of subsystems on top of SLRU Alvaro Herrera <[email protected]> 2024-03-03 14:29 ` Re: pgsql: Improve performance of subsystems on top of SLRU Alvaro Herrera <[email protected]> 2024-03-03 21:14 ` Re: pgsql: Improve performance of subsystems on top of SLRU Tom Lane <[email protected]> @ 2024-03-04 15:17 ` Alvaro Herrera <[email protected]> 2024-03-04 17:36 ` Re: pgsql: Improve performance of subsystems on top of SLRU Alvaro Herrera <[email protected]> 0 siblings, 1 reply; 6+ messages in thread From: Alvaro Herrera @ 2024-03-04 15:17 UTC (permalink / raw) To: Tom Lane <[email protected]>; Dilip Kumar <[email protected]>; +Cc: [email protected] On 2024-Mar-03, Tom Lane wrote: > This is certainly simpler, but I notice that it holds the current > LWLock across the line > > ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember)); > > where the old code did not. Could the palloc take long enough that > holding the lock is bad? Hmm, I guess most of the time it shouldn't be much of a problem (if the length is small so we can palloc without malloc'ing); but it could be in the worst case. But the fix is simple: just release the lock before. There's no correctness argument for holding it all the way down. I was just confused about how the original code worked. > Also, with this coding the "lock = NULL;" assignment just before > "goto retry" is a dead store. Not sure if Coverity or other static > analyzers would whine about that. Oh, right. I removed that. On 2024-Mar-04, Dilip Kumar wrote: > I am not sure about the other changes, I mean that makes the code much > simpler but now we are not releasing the 'MultiXactOffsetCtl' related > bank lock, and later in the following loop, we are comparing that lock > against 'MultiXactMemberCtl' related bank lock. This makes code > simpler because now in the loop we are sure that we are always holding > the lock but I do not like comparing the bank locks for 2 different > SLRUs, although there is no problem as there would not be a common > lock address, True. This can be addressed in the same way Tom's first comment is: just release the lock before entering the second loop, and setting lock to NULL. This brings the code to a similar state than before, except that the additional LWLock * variables are in a tighter scope. That's in 0001. Now, I had a look at the other users of slru.c and noticed in subtrans.c that StartupSUBTRANS we have some duplicate code that I think could be rewritten by making the "while" block test the condition at the end instead of at the start; changed that in 0002. I'll leave this one for later, because I want to add some test code for it -- right now it's pretty much test-uncovered code. I also looked at slru.c for uses of shared->bank_locks and noticed a couple that could be made simpler. That's 0003 here. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "If it is not right, do not do it. If it is not true, do not say it." (Marcus Aurelius, Meditations) ^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: pgsql: Improve performance of subsystems on top of SLRU 2024-02-28 16:07 pgsql: Improve performance of subsystems on top of SLRU Alvaro Herrera <[email protected]> 2024-03-03 14:29 ` Re: pgsql: Improve performance of subsystems on top of SLRU Alvaro Herrera <[email protected]> 2024-03-03 21:14 ` Re: pgsql: Improve performance of subsystems on top of SLRU Tom Lane <[email protected]> 2024-03-04 15:17 ` Re: pgsql: Improve performance of subsystems on top of SLRU Alvaro Herrera <[email protected]> @ 2024-03-04 17:36 ` Alvaro Herrera <[email protected]> 0 siblings, 0 replies; 6+ messages in thread From: Alvaro Herrera @ 2024-03-04 17:36 UTC (permalink / raw) To: Tom Lane <[email protected]>; Dilip Kumar <[email protected]>; +Cc: [email protected] FWIW there's a stupid bug in 0002, which is fixed here. I'm writing a simple test for it. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Now I have my system running, not a byte was off the shelf; It rarely breaks and when it does I fix the code myself. It's stable, clean and elegant, and lightning fast as well, And it doesn't cost a nickel, so Bill Gates can go to hell." ^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: pgsql: Improve performance of subsystems on top of SLRU 2024-02-28 16:07 pgsql: Improve performance of subsystems on top of SLRU Alvaro Herrera <[email protected]> 2024-03-03 14:29 ` Re: pgsql: Improve performance of subsystems on top of SLRU Alvaro Herrera <[email protected]> @ 2024-03-04 06:14 ` Dilip Kumar <[email protected]> 1 sibling, 0 replies; 6+ messages in thread From: Dilip Kumar @ 2024-03-04 06:14 UTC (permalink / raw) To: Alvaro Herrera <[email protected]>; +Cc: [email protected] On Mon, Mar 4, 2024 at 1:56 AM Alvaro Herrera <[email protected]> wrote: > > On 2024-Feb-28, Alvaro Herrera wrote: > > > Improve performance of subsystems on top of SLRU > > Coverity had the following complaint about this commit: > > ________________________________________________________________________________________________________ > *** CID NNNNNNN: Control flow issues (DEADCODE) > /srv/coverity/git/pgsql-git/postgresql/src/backend/access/transam/multixact.c: 1375 in GetMultiXactIdMembers() > 1369 * and acquire the lock of the new bank. > 1370 */ > 1371 lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno); > 1372 if (lock != prevlock) > 1373 { > 1374 if (prevlock != NULL) > >>> CID 1592913: Control flow issues (DEADCODE) > >>> Execution cannot reach this statement: "LWLockRelease(prevlock);". > 1375 LWLockRelease(prevlock); > 1376 LWLockAcquire(lock, LW_EXCLUSIVE); > 1377 prevlock = lock; > 1378 } > 1379 > 1380 slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi); > > And I think it's correct that this is somewhat bogus, or at least > confusing: the only way to have control back here on line 1371 after > having executed once is via the "goto retry" line below; and there we > release "prevlock" and set it to NULL beforehand, so it's impossible for > prevlock to be NULL. Looking closer I think this code is all confused, > so I suggest to rework it as shown in the attached patch. > > I'll have a look at the other places where we use this "prevlock" coding > pattern tomorrow. + /* Acquire the bank lock for the page we need. */ lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno); - if (lock != prevlock) - { - if (prevlock != NULL) - LWLockRelease(prevlock); - LWLockAcquire(lock, LW_EXCLUSIVE); - prevlock = lock; - } + LWLockAcquire(lock, LW_EXCLUSIVE); This part is definitely an improvement. I am not sure about the other changes, I mean that makes the code much simpler but now we are not releasing the 'MultiXactOffsetCtl' related bank lock, and later in the following loop, we are comparing that lock against 'MultiXactMemberCtl' related bank lock. This makes code simpler because now in the loop we are sure that we are always holding the lock but I do not like comparing the bank locks for 2 different SLRUs, although there is no problem as there would not be a common lock address, anyway, I do not have any strong objection to what you have done here. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com ^ permalink raw reply [nested|flat] 6+ messages in thread
end of thread, other threads:[~2024-03-04 17:36 UTC | newest] Thread overview: 6+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2024-02-28 16:07 pgsql: Improve performance of subsystems on top of SLRU Alvaro Herrera <[email protected]> 2024-03-03 14:29 ` Alvaro Herrera <[email protected]> 2024-03-03 21:14 ` Tom Lane <[email protected]> 2024-03-04 15:17 ` Alvaro Herrera <[email protected]> 2024-03-04 17:36 ` Alvaro Herrera <[email protected]> 2024-03-04 06:14 ` Dilip Kumar <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox