MIME-Version: 1.0
Date: Mon, 27 Apr 2026 09:26:12 -0400
From: "Greg Burd"
To: "PostgreSQL Hackers"
Cc:
"Andres Freund" , "Tomas Vondra" , "Nathan Bossart" Message-Id: <3d85395b-698e-4f0e-873e-17b3f277f7c7@app.fastmail.com> In-Reply-To: <79629577-3ad8-4b1c-a469-ebc2cb4c5104@app.fastmail.com> References: <79629577-3ad8-4b1c-a469-ebc2cb4c5104@app.fastmail.com> Subject: Re: [PATCH] Batched clock sweep to reduce cross-socket atomic contention Content-Type: multipart/mixed; boundary=e46b7057cc77e38ee9fd32d31c281a797874ffcd List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --e46b7057cc77e38ee9fd32d31c281a797874ffcd Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On Sat, Apr 25, 2026, at 4:08 PM, Greg Burd wrote: > Hello hackers, Hi again, attached is v2: 0001 - unchanged, batches clock-sweep to reduce contention 0002 - changed ComputeClockBatchSize() such that non-NUMA multi-core sys= tems use batches as well and no longer default to batch size 1 Details below... > A colleague of mine, Jim Mlodgenski, has been poking at NUMA behavior=20 > on some of the newer AWS bare-metal instance types (r8i in particular,=20 > which exposes 6 NUMA nodes via SNC3 on a 2-socket box), and in the=20 > process landed on a very small change to freelist.c that I think is=20 > worth showing around. His patch is attached with some tweaks of my ow= n. > > Full disclosure: the exploration that led Jim to this patch idea was=20 > done with help from an AI assistant (Kiro); the idea, the benchmarking= ,=20 > and the final shape of the patch are human-driven, but I wanted to be=20 > up front about how his investigation started. Happy to discuss that=20 > separately if people want to. > > The one-line summary: instead of advancing nextVictimBuffer one buffer=20 > at a time via pg_atomic_fetch_add_u32, each backend claims a batch of=20 > 64 consecutive buffer IDs from the shared hand and then iterates them=20 > privately. 
> Global sweep order is preserved -- every buffer is still
> visited exactly once per complete pass -- but the atomic contention on
> that one cache line drops by roughly the batch size.
>
>
> Why this matters
> ----------------
>
> On multi-socket boxes under eviction pressure, every backend that needs
> a victim buffer ends up CAS'ing the same cache line. On a single
> socket, a locked RMW on that cache line stays warm in L1/L2 and
> completes in ~20ns. On 2+ sockets, the line bounces over QPI/UPI at
> ~100-200ns per op, and with hundreds of backends running
> StrategyGetBuffer() concurrently, the line ping-pongs constantly. It's
> a textbook NUMA scalability bottleneck, and once shared_buffers is
> smaller than the working set and the sweep is running continuously,
> that single atomic is what you hit in a perf profile (elevated
> bus-cycles, cache-misses on the cache line holding nextVictimBuffer).
>
> Andres pointed at the same spot in his pgconf.eu 2024 talk, and Tomas
> called it out in the "Adding basic NUMA awareness" thread [1] -- so
> this isn't news to anyone who's been looking at this area. What I
> think is new is a fix that's just this, without any of the surrounding
> architectural change.
>
> The framing (credit to Jim): the clock hand is doing two jobs. It
> *coordinates* backends so they don't redundantly decrement usage_count
> on the same buffers and so they eventually visit every buffer in the
> pool exactly once per pass. It also *serializes* access to the
> counter. Coordination is the part we want. Serialization is the part
> that's killing us on bigger NUMA boxes. Batching keeps the
> coordination and thins out the serialization.
>
>
> How it works
> ------------
>
> Two per-backend statics, MyBatchPos and MyBatchEnd.
> When a backend
> calls ClockSweepTick() and its local batch is exhausted, it does a
> single fetch-add of CLOCK_SWEEP_BATCH_SIZE (64) against
> nextVictimBuffer and now owns that range. Subsequent ticks just bump
> the local counter.
>
> Wraparound got a small rewrite. The original code had the backend that
> crossed NBuffers drive completePasses++ under the spinlock via a CAS
> loop. With batching, multiple backends can each land a fetch-add that
> returns a value >= NBuffers in the same pass, so the logic now is:
> whoever sees a start >= NBuffers takes the spinlock, re-reads the
> counter, and if it's still out of range does a single CAS to wrap it
> and bumps completePasses. If somebody else already wrapped, we just
> release and move on. StrategySyncStart() still sees a consistent
> (nextVictimBuffer, completePasses) pair.
>
> The batch size is gated on whether we actually have multiple NUMA
> nodes. On a single-socket box the atomic is already socket-local,
> batching just makes backends skip further ahead than they need to, so
> we fall back to batch size 1 -- which is bit-for-bit the original
> behavior. The guard:
>
>     if (pg_numa_init() != -1 && pg_numa_get_max_node() >= 1)
>         ClockSweepBatchSize = Min(CLOCK_SWEEP_BATCH_SIZE, (uint32) NBuffers);
>     else
>         ClockSweepBatchSize = 1;
>
> Min() against NBuffers covers the small-shared_buffers corner so a
> batch never wraps the pool multiple times in one claim.

Thinking more about this approach led me to believe that this non-NUMA
default is wrong and induces overhead for a very common case.

> Does batching mess up the meaning of usage_count?
> --------------------------------------------------
>
> Short answer: no. I want to walk through this because it was my first
> concern too, and I think it's the question that will come up most on
> review.
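
For reference while reading the usage_count argument, here is a minimal
standalone sketch of the claim-and-wrap step described under "How it
works". It is not the patch: it assumes C11 <stdatomic.h> in place of
pg_atomic_*, uses an atomic_flag as a stand-in for
buffer_strategy_lock, and the names SweepControl, BatchState,
sweep_init and sweep_tick are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared strategy control block. */
typedef struct
{
    _Atomic uint32_t next_victim;     /* models nextVictimBuffer */
    atomic_flag      lock;            /* stand-in for buffer_strategy_lock */
    uint32_t         complete_passes; /* models completePasses */
    uint32_t         nbuffers;
    uint32_t         batch_size;
} SweepControl;

/* Hypothetical stand-in for the per-backend MyBatchPos / MyBatchEnd. */
typedef struct
{
    uint32_t pos;
    uint32_t end;
} BatchState;

static void
sweep_init(SweepControl *ctl, uint32_t nbuffers, uint32_t batch_size)
{
    assert(nbuffers > 0 && batch_size > 0 && batch_size <= nbuffers);
    atomic_init(&ctl->next_victim, 0);
    atomic_flag_clear(&ctl->lock);
    ctl->complete_passes = 0;
    ctl->nbuffers = nbuffers;
    ctl->batch_size = batch_size;
}

static uint32_t
sweep_tick(SweepControl *ctl, BatchState *my)
{
    if (my->pos >= my->end)
    {
        /* Local batch exhausted: one shared fetch-add claims a new range. */
        uint32_t start = atomic_fetch_add(&ctl->next_victim, ctl->batch_size);

        if (start >= ctl->nbuffers)
        {
            uint32_t current;

            start %= ctl->nbuffers;

            while (atomic_flag_test_and_set(&ctl->lock))
                ;               /* spin until the lock is free */

            /* Re-read under the lock; someone else may have wrapped already. */
            current = atomic_load(&ctl->next_victim);
            if (current >= ctl->nbuffers)
            {
                uint32_t wrapped = current % ctl->nbuffers;

                if (atomic_compare_exchange_strong(&ctl->next_victim,
                                                   &current, wrapped))
                    ctl->complete_passes++;
            }
            atomic_flag_clear(&ctl->lock);
        }

        my->pos = start;
        my->end = start + ctl->batch_size;
    }

    /* All other ticks only touch backend-local state. */
    return my->pos++ % ctl->nbuffers;
}
```

With a 256-buffer pool and a batch of 64, a single caller touches the
shared counter four times in the first pass, and the fifth claim is the
one that wraps the counter and bumps complete_passes.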
>
> The clock sweep's usage_count is an access-frequency approximation
> measured in units of *complete passes*. A buffer with usage_count = N
> survives N passes without a re-pin. The semantic meaning lives at pass
> granularity, not at individual-buffer granularity.
>
> What batching changes: intra-pass temporal ordering. Without batching,
> with N backends sweeping, decrements are interleaved -- backend A hits
> B[0], backend B hits B[1], backend C hits B[2]. With batching, backend
> A hits B[0..63] in a tight local burst, then backend B hits B[64..127],
> etc. The 64-buffer chunks are decremented in bursts rather than
> individually.
>
> Why it doesn't matter:
>
>   1. Every buffer still gets decremented exactly once per complete
>      pass. The invariant the algorithm actually depends on is
>      untouched.
>
>   2. A buffer's survival window is the time between consecutive
>      passes. That's milliseconds to seconds under load. Whether
>      B[0] gets decremented 50us before or 50us after B[63] within
>      the same pass is below the resolution of anything usage_count
>      is trying to measure.
>
>   3. The bgwriter's feedback loop reads (nextVictimBuffer,
>      completePasses, numBufferAllocs) via StrategySyncStart() every
>      ~200ms. nextVictimBuffer still advances at the same *total*
>      rate (64 per atomic op, but atomic ops happen 1/64 as often).
>      The position it reports can jitter by up to 64 buffers relative
>      to the one-at-a-time case, but BgBufferSync()'s smoothed
>      estimates operate over thousands of buffers per cycle, so the
>      jitter disappears into the averaging. numBufferAllocs still
>      increments once per allocation. strategy_delta,
>      smoothed_alloc, smoothed_density, reusable_buffers_est -- all
>      unaffected in any way I can see.
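
Points 1 and 3 above can be checked with a small single-threaded model.
This is only a sketch under stated assumptions: the shared hand is a
plain integer rather than an atomic, backends tick in strict
round-robin, and the pool size, backend count and all names
(SweepModel, BackendModel, model_tick, run_model) are made up for
illustration. The pass count is read the way StrategySyncStart() is
described above, as completePasses plus any not-yet-wrapped excess in
the counter.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NBUFFERS  1024u   /* hypothetical pool size, a multiple of 64 */
#define NBACKENDS 7u      /* hypothetical number of sweeping backends */
#define ROUNDS    1024u   /* ticks issued by each backend */

typedef struct
{
    uint32_t next_victim;       /* models nextVictimBuffer */
    uint32_t complete_passes;   /* models completePasses */
    uint32_t shared_ops;        /* times the shared hand was touched */
    uint32_t visits[NBUFFERS];  /* how often each buffer was handed out */
} SweepModel;

typedef struct
{
    uint32_t pos;               /* like MyBatchPos */
    uint32_t end;               /* like MyBatchEnd */
} BackendModel;

static uint32_t
model_tick(SweepModel *m, BackendModel *b, uint32_t batch)
{
    if (b->pos >= b->end)
    {
        uint32_t start = m->next_victim;    /* stands in for the fetch-add */

        m->next_victim += batch;
        m->shared_ops++;

        if (start >= NBUFFERS)
        {
            start %= NBUFFERS;
            if (m->next_victim >= NBUFFERS)
            {
                m->next_victim %= NBUFFERS;
                m->complete_passes++;
            }
        }
        b->pos = start;
        b->end = start + batch;
    }
    return b->pos++ % NBUFFERS;
}

/* Every backend issues ROUNDS ticks, interleaved one tick at a time. */
static void
run_model(SweepModel *m, uint32_t batch)
{
    BackendModel backends[NBACKENDS];

    memset(m, 0, sizeof(*m));
    memset(backends, 0, sizeof(backends));

    for (uint32_t round = 0; round < ROUNDS; round++)
        for (uint32_t i = 0; i < NBACKENDS; i++)
            m->visits[model_tick(m, &backends[i], batch)]++;
}

/* Returns 1 if every buffer was handed out exactly `expected` times. */
static int
every_buffer_visited(const SweepModel *m, uint32_t expected)
{
    for (uint32_t i = 0; i < NBUFFERS; i++)
        if (m->visits[i] != expected)
            return 0;
    return 1;
}
```

Calling run_model() with batch 1 and batch 64 and comparing the two
SweepModel results shows the same per-buffer visit counts with 1/64 as
many touches of the shared hand. The sizes are picked so that every
claimed batch is fully drained when the run ends; with partially
drained batches the visit counts would differ by one for the buffers
still held locally.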
>
> Table form, because it's easier to argue with:
>
>   Property                          | Unpatched      | Batched
>   ----------------------------------+----------------+----------------
>   Buffers visited per pass          | NBuffers       | NBuffers
>   Decrements per buffer per pass    | 1              | 1
>   Eviction threshold                | usage_count==0 | usage_count==0
>   Max survival (passes)             | 6              | 6
>   Decrement ordering within a pass  | interleaved    | chunked
>   bgwriter allocation rate signal   | accurate       | accurate
>   Cross-socket atomic traffic       | 1 per buffer   | 1 per 64
>
> There is one subtle difference worth naming. When a backend finds a
> victim at B[5] of its batch, it returns with MyBatchEnd still sitting
> at B[63]. The next time that backend needs a victim it resumes at
> B[6], not at wherever the global hand now points. So the backend
> drains its batch over multiple StrategyGetBuffer() calls rather than
> all at once. Under heavy load, where batches are consumed in
> microseconds, this is invisible. Under light load, the implication is
> that some buffers can sit with slightly stale usage_count for longer
> than they would have before. But "light load" means "the sweep is
> barely moving and nothing wants to evict anyway" -- so the effect
> doesn't show up where it would hurt.
>
> There's also a small positive side-effect: cache locality. The backend
> that just touched BufferDescriptor[B[0]] has the adjacent descriptors
> warm in L1/L2. Walking B[0..63] locally is cheaper than walking a
> striped interleaving where each descriptor was last touched by a
> different core. I haven't tried to isolate this in perf, but it falls
> out naturally.
>
>
> Benchmarks
> ----------
>
> Jim ran these; I'm still working on reproducing them locally and will
> post independent numbers in a follow-up.
> All bare metal, Linux, huge
> pages enabled throughout (more on that below), postmaster pinned to
> node 0 with `numactl --cpunodebind=0` because otherwise stock TPS
> varied from 31K to 40K depending on which node the postmaster happened
> to land on at launch -- worth flagging for anyone trying to reproduce.
>
> Workload is pgbench scale 3000 (~45GB) with shared_buffers=32GB, so the
> working set always spills and the sweep is hot.
>
> r8i.metal-96xl (384 vCPUs, 2 sockets, 6 NUMA nodes via SNC3):
>
>   pgbench RO:
>     Clients    Stock      Patched    Delta
>     64         31,457     36,353     +16%
>     128        31,678     37,864     +20%
>     256        31,510     37,558     +19%
>     384        31,431     37,464     +19%
>     512        31,329     37,040     +18%
>
>   pgbench RW:
>     Clients    Stock      Patched    Delta
>     64         7,685      7,713      0%
>     128        10,420     10,541     +1%
>     256        12,393     12,463     +1%
>     384        15,317     15,197     -1%
>     512        17,930     17,978     0%
>
> m6i.metal (128 vCPUs, 2 sockets, Ice Lake):
>   RO +19-20%, RW within noise.
>
> c8i.metal-48xl (192 vCPUs, 1 socket):
>   Single-socket -> batch_size=1 -> original code path. No
>   behavioral change. (I double-checked this one specifically
>   because it's the sanity test for the gate.)
>
> HammerDB TPC-C on m6i.metal (1000 warehouses):
>     VUs        Stock      Patched    Delta
>     128        358,518    349,787    -2%
>     256        332,098    330,272    -1%
>     384        365,782    377,519    +3%
>     512        370,663    386,526    +4%
>
> No TPC-C regression, which was the thing we were most worried about. An
> earlier attempt (per-socket partitioned sweep, see below) was -13% on
> this same workload.
>
> The general shape is: the scaling curve flattens later. Unpatched, TPS
> tops out around 128 clients and stays flat up to 512 because backends
> are spending cycles waiting on the cache line rather than
> doing work. Patched, the curve keeps rising past the point where
> unpatched plateaus.
>
> Huge pages caveat: all of the above was run with huge pages on, on
> large-memory instances (the r8i.96xl has 3TB, so Jim never considered
> running without them).
> We have not characterized the non-huge-pages
> case. That's on my list; I don't expect it to change the conclusion,
> but I shouldn't speak for data I haven't collected.
>
>
> Relationship to Tomas's NUMA series
> -----------------------------------
>
> Tomas posted a multi-patch NUMA-awareness series in [1] covering buffer
> interleaving across nodes, partitioned freelists, partitioned clock
> sweep, PGPROC interleaving, and related pieces. I want to be careful
> here because I don't think we should frame this patch as competing with
> that work.
>
> One thing I found striking as I re-read the thread: in the benchmarks
> Tomas posted later in the series, *most of the benefit comes from
> partitioning the clock sweep*, and the NUMA memory-placement layer on
> top sometimes runs slower than partitioning alone. His own conclusion,
> quoted roughly: the benefit mostly comes from just partitioning the
> clock sweep, and it's largely independent of the NUMA stuff; the NUMA
> partitioning is often slower.
>
> That observation is the thing that makes me think batching is worth
> considering on its own. It's going after the same bottleneck Tomas's
> partitioning addresses, but:
>
>   - without splitting global eviction visibility (which is where
>     cross-partition stealing gets complicated),
>   - without requiring NUMA-aware buffer placement (which has huge
>     page alignment, descriptor-partition-mid-page, and resize
>     complications that are still being worked out in that thread),
>   - without touching PGPROC or bgwriter.
>
> What this patch does *not* do:
>   - place buffers on specific NUMA nodes
>   - partition the freelist
>   - touch PGPROC
>   - add new GUCs
>   - change bgwriter
>
> What this patch *does* do:
>   - target exactly the clock-sweep contention that Tomas's
>     partitioning targets, and reduce it by ~64x, in ~30 lines.
>
> If Tomas's series lands in full, this patch becomes redundant for its
> primary use case (though even within a partitioned sweep, the
> per-partition atomic still benefits from batching, so it's arguably a
> useful primitive either way). If Tomas's series lands incrementally
> over several cycles -- which the open items in that thread suggest is
> the realistic path -- this gets us a real chunk of the multi-socket win
> now.
>
> This patch is also orthogonal to my earlier thread about removing the
> freelist entirely [2], but given the proximity to that code Jim agreed
> that I could propose/steward it here on the list for consideration.
>
>
> Open questions / things I'd like feedback on
> --------------------------------------------
>
> - Batch size. 64 is a round number that worked well in testing, but
>   Nathan raised the reasonable point that on small shared_buffers
>   with high concurrency, a fixed 64 could be unfortunate. Options:
>   scale with shared_buffers (Min(64, NBuffers / N) for some N), scale
>   with max_connections, keep it fixed but let operators tune it, or
>   make it a function of NUMA node count. I don't have a strong
>   opinion yet; the Min(batch, NBuffers) cap covers the "obviously
>   wrong" corner but doesn't speak to the "several hundred backends
>   on a few-MB shared_buffers" shape. Numbers/ideas/proposals welcome.
>
> - NUMA detection. The gate uses pg_numa_init() /
>   pg_numa_get_max_node(). On systems where libnuma isn't available,
>   or where get_mempolicy is blocked (some container configurations),
>   we fall back to batch size 1. That's safe but it misses the
>   "single socket, many cores, still benefits from fewer atomics"
>   case. Might be worth a way to force-enable, or batching on all
>   systems with a smaller batch size when single-socket. I'd like to
>   measure before deciding.
>
> - Eviction pattern on reads.
>   Nathan also flagged that with batching,
>   the buffers a backend ends up pinning in one StrategyGetBuffer()
>   call will tend to be contiguous in buffer-id space rather than
>   scattered, which is a different allocation pattern than today.
>   The usage_count analysis above says this is benign, but if anyone
>   has an intuition for a workload where this would be observable
>   (e.g., something that cares about the mapping between buffer-id
>   and relation locality), I'd like to hear it.
>
> - nextVictimBuffer wraparound. The current code has a mild overflow
>   concern papered over with "highly unlikely and wouldn't be
>   particularly harmful". With batching this is no worse than before,
>   but if we're already touching this function, it might be worth
>   thinking about whether to tighten it up in the same patch or a
>   follow-up.
>
> - Should the non-NUMA value for this be derived from core counts that
>   imply L1/L2 cache layouts or simply default to 8 rather than 1 to
>   realize some benefit?

So, I'm answering my own question here. Yes, it should. Ideas below.

> - Should there be a postgresql.conf setting for this that takes
>   precedence?
>
>
> I'll run the non-huge-pages variant, reproduce the r8i numbers, poke at
> the small-shared_buffers corner, and post perf stat output showing the
> atomic/cache-miss deltas over the next few days. In the meantime,
> eyeballs and skepticism welcome -- I would especially welcome comments
> from Andres, who's been in this code recently, and from Tomas, whose
> series has the most overlap.
>
> I realize that we're past feature freeze and working on release notes
> for v19, so the chances of merging this are slim to none. I think this
> could be considered a "performance bug fix for NUMA systems" in this
> release, but that is stretching it a bit. It is a big ask at this
> stage to land a change like this.
>
> best.
>
> -greg
>
> [1]
> https://www.postgresql.org/message-id/099b9433-2855-4f1b-b421-d078a5d82017@vondra.me
> [2]
> https://www.postgresql.org/message-id/f0e3c02e-e217-4f04-8dab-1e7e80a228c0@burd.me
> Attachments:
> * v1-0001-Reduce-clock-sweep-atomic-contention-by-claiming-.patch

ComputeClockBatchSize() has two phases: select a base batch from hardware
topology, then cap it to prevent over-claiming.

Phase 1: Base batch from topology

    int ncpus = pg_get_online_cpus();
    int numa_nodes = (pg_numa_init() != -1) ? pg_numa_get_max_node() + 1 : 1;

    if (numa_nodes > 1)
        base_batch = 64;
    else if (ncpus > 16)
        base_batch = 32;
    else if (ncpus > 8)
        base_batch = 16;
    else if (ncpus > 4)
        base_batch = 8;
    else
        base_batch = 1;

The reasoning at each tier:

  - NUMA (multi-socket): Atomic ops cross the interconnect (QPI/UPI/Infinity
    Fabric). Round-trip latency is ~100-300ns vs ~10-40ns intra-socket.
    Batch=64 amortizes that heavily.
  - >16 cores, single socket: Still significant L3 contention, many cores
    competing for the same cache line. Batch=32 cuts atomic ops by 32x.
  - 9-16 cores: Moderate contention. Batch=16.
  - 5-8 cores: Light contention. Batch=8.
  - <=4 cores: Almost no contention. Batch=1 (no batching). The overhead of
    batching logic isn't worth it, and there's a fairness tradeoff - batching
    means one backend "owns" a range of buffers temporarily, which matters
    more when there are few buffers per backend.

Phase 2: Cap to prevent over-claiming

    max_batch = (MaxBackends > 0)
        ? pool_nbuffers / (2 * MaxBackends)
        : pool_nbuffers / 200;
    if (max_batch < 1)
        max_batch = 1;

    return Min(base_batch, Min(max_batch, pool_nbuffers));

The cap ensures that if every backend simultaneously claims a batch, the
total claimed doesn't exceed half the pool:

    batch_size * MaxBackends <= pool_nbuffers / 2

Why half?
If all backends claimed the entire pool simultaneously, they'd each be
sweeping overlapping ranges, thus wasting work and defeating the purpose.
Keeping total claims under 50% of the pool means that at any instant at
most half the buffers are "in flight" being evaluated by backends, and the
other half are available for normal operation.

For a small dynamic pool (say 4096 buffers with MaxBackends=200), the cap
computes to 4096 / 400 = 10, which overrides any larger base_batch. For
the default pool with shared_buffers = 8GB (roughly 1M buffers) and
MaxBackends=200, the cap is about 1000000 / 400 = 2500, which is well above
the max base_batch of 64, so the base_batch wins.

The final Min() against pool_nbuffers handles the degenerate case of a pool
smaller than the batch size.

The Tradeoff

Larger batches reduce atomic contention but increase sweep unevenness: one
backend might sweep through "cold" buffers while another's batch happens to
land on "hot" ones. The tiered approach balances this: batch aggressively
only when the hardware topology makes contention the dominant cost (NUMA,
many-core), and stay conservative on small systems where fairness matters
more.

I think this is better because:

1. The original patch only batched on multi-socket NUMA systems. The new
   algorithm also provides atomic contention benefits on large single-socket
   systems (>16 cores) where L3 cache contention matters.

2. Conservative on small systems: Systems with ≤4 cores get batch_size=1
   (original behavior) since batching overhead outweighs contention benefits
   and fairness matters more.

3. Prevents pathological over-claiming: The cap mechanism prevents scenarios
   where many backends claim huge batches relative to a small buffer pool.
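
To make the two phases easy to poke at outside the server, here is a
standalone sketch of the same arithmetic. It is not the function in
0002: the CPU count, NUMA node count, MaxBackends and pool size are
passed in as plain parameters instead of coming from
pg_get_online_cpus(), pg_numa_*() and the server globals, and the name
sketch_batch_size is made up.

```c
#include <assert.h>
#include <stdint.h>

#define Min(a, b) ((a) < (b) ? (a) : (b))

/*
 * Hypothetical standalone version of the two-phase computation.
 * All inputs are explicit so the tiers and the cap can be tested
 * without a running server.
 */
static uint32_t
sketch_batch_size(int ncpus, int numa_nodes, int max_backends,
                  uint32_t pool_nbuffers)
{
    uint32_t base_batch;
    uint32_t max_batch;

    assert(pool_nbuffers > 0);

    /* Phase 1: base batch from topology. */
    if (numa_nodes > 1)
        base_batch = 64;
    else if (ncpus > 16)
        base_batch = 32;
    else if (ncpus > 8)
        base_batch = 16;
    else if (ncpus > 4)
        base_batch = 8;
    else
        base_batch = 1;

    /* Phase 2: all backends together may claim at most half the pool. */
    if (max_backends > 0)
        max_batch = pool_nbuffers / (2 * (uint32_t) max_backends);
    else
        max_batch = pool_nbuffers / 200;
    if (max_batch < 1)
        max_batch = 1;

    return Min(base_batch, Min(max_batch, pool_nbuffers));
}
```

For example, sketch_batch_size(32, 1, 200, 4096) hits the cap from the
small-pool case above (4096 / 400 = 10), while the same machine with
1048576 buffers keeps its base batch of 32.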
Based on the algorithm, here's what different systems would get:

System              CPUs  NUMA      Total RAM    Shr Buf   Batch Size  Atomic Reduction
==================  ====  ========  ===========  ========  ==========  ================
r8i.metal-96xl      384   multi     3072GB       2457.6GB  64          64x
m6i.metal           128   multi     512GB        409.6GB   64          64x
c8i.metal-48xl      192   1 socket  192GB        153.6GB   32          32x
Large server        64    multi     256GB        204.8GB   64          64x
Medium server       32    1 socket  64GB         51.2GB    32          32x
Small server        16    1 socket  32GB         25.6GB    16          16x
Developer machine   8     1 socket  16GB         12.8GB    8           8x
Small VM            4     1 socket  4GB          3.2GB     1           no change
Overloaded VM       8     1 socket  4GB          3.2GB     8           8x

best.

-greg

--e46b7057cc77e38ee9fd32d31c281a797874ffcd
Content-Disposition: attachment;
 filename*0="v2-0001-Reduce-clock-sweep-atomic-contention-by-claiming-.pat";
 filename*1="ch"
Content-Type: application/octet-stream;
 name="=?UTF-8?Q?v2-0001-Reduce-clock-sweep-atomic-contention-by-claiming-.patc?= =?UTF-8?Q?h?="
Content-Transfer-Encoding: base64

RnJvbSBiZGNmOTBmYmQ4OWEwYWVjMzk3YTNkNTcyMjRhZTczMjk1OTczM2Y5IE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBHcmVnIEJ1cmQgPGdyZWdAYnVyZC5tZT4KRGF0ZTog U2F0LCAyNSBBcHIgMjAyNiAxNTo1MjozNiAtMDQwMApTdWJqZWN0OiBbUEFUQ0ggdjIgMS8y XSBSZWR1Y2UgY2xvY2stc3dlZXAgYXRvbWljIGNvbnRlbnRpb24gYnkgY2xhaW1pbmcKIGJ1 ZmZlcnMgaW4gYmF0Y2hlcwoKU3RyYXRlZ3lHZXRCdWZmZXIoKSBhZHZhbmNlcyBuZXh0Vmlj dGltQnVmZmVyIHZpYQpwZ19hdG9taWNfZmV0Y2hfYWRkX3UzMiguLi4sIDEpIG9uIGV2ZXJ5 IHRpY2suICBPbiBtdWx0aS1zb2NrZXQKc3lzdGVtcyB0aGUgY2FjaGUgbGluZSBob2xkaW5n IHRoZSBjb3VudGVyIGhhcyB0byB0cmF2ZWwgb3ZlciB0aGUKaW50ZXJjb25uZWN0IG9uIGVh Y2ggb3BlcmF0aW9uLCBwdXNoaW5nIGEgc3dlZXAgdGljayBmcm9tIH4yMG5zICh0aGUKc2Ft ZS1zb2NrZXQgY2FzZSkgaW50byB0aGUgfjEwMC0yMDBucyByYW5nZS4gIFdpdGggaHVuZHJl ZHMgb2YKY29uY3VycmVudCBiYWNrZW5kcyB1bmRlciBldmljdGlvbiBwcmVzc3VyZSwgdGhh
dCBvbmUgY2FjaGUgbGluZQpiZWNvbWVzIHRoZSBkb21pbmFudCBjb3N0IGluIHRoZSBzd2Vl cCwgdmlzaWJsZSBhcyBlbGV2YXRlZApidXMtY3ljbGVzIGFuZCBjYWNoZS1taXNzZXMgaW4g cGVyZiBwcm9maWxlcy4KCkVhY2ggYmFja2VuZCBub3cgY2xhaW1zIGEgcmFuZ2Ugb2YgQ0xP Q0tfU1dFRVBfQkFUQ0hfU0laRSAoNjQpCmNvbnNlY3V0aXZlIGJ1ZmZlciBJRHMgd2l0aCBh IHNpbmdsZSBmZXRjaC1hZGQgYW5kIGl0ZXJhdGVzIHRocm91Z2gKdGhlbSBwcml2YXRlbHku ICBUaGUgc3dlZXAgc3RpbGwgYWR2YW5jZXMgdGhyb3VnaCB0aGUgcG9vbCBpbiBvcmRlciwK ZWFjaCBidWZmZXIgaXMgc3RpbGwgdmlzaXRlZCBleGFjdGx5IG9uY2UgcGVyIGNvbXBsZXRl IHBhc3MsIGFuZAp1c2FnZV9jb3VudCBpcyBzdGlsbCBkZWNyZW1lbnRlZCBleGFjdGx5IG9u Y2UgcGVyIGJ1ZmZlciBwZXIgcGFzczsKdGhlIG1lYW5pbmcgb2YgdXNhZ2VfY291bnQgYXMg ImhvdyBtYW55IGNvbXBsZXRlIHBhc3NlcyBhIGJ1ZmZlcgpzdXJ2aXZlcyB3aXRob3V0IGEg cmUtcGluIiBpcyBwcmVzZXJ2ZWQuICBXaGF0IGNoYW5nZXMgaXMgdGhlCnRlbXBvcmFsIG9y ZGVyaW5nIG9mIGRlY3JlbWVudHMgd2l0aGluIGEgc2luZ2xlIHBhc3MsIHdoaWNoIHRoZQph bGdvcml0aG0gZG9lcyBub3QgZGVwZW5kIG9uLgoKV3JhcGFyb3VuZCBoYW5kbGluZyBpcyBh ZGp1c3RlZDogd2l0aCBiYXRjaGluZywgbXVsdGlwbGUgYmFja2VuZHMKY2FuIGVhY2ggc2Vl IHRoZWlyIGZldGNoLWFkZCByZXR1cm4gYSB2YWx1ZSBwYXN0IE5CdWZmZXJzIHdpdGhpbgp0 aGUgc2FtZSBwYXNzLiAgQW55IHN1Y2ggYmFja2VuZCB0YWtlcyBidWZmZXJfc3RyYXRlZ3lf bG9jaywKcmUtcmVhZHMgdGhlIGNvdW50ZXIsIGFuZCBpZiBpdCBpcyBzdGlsbCBvdXQgb2Yg cmFuZ2Ugd3JhcHMgaXQgd2l0aAphIHNpbmdsZSBDQVMgYW5kIGluY3JlbWVudHMgY29tcGxl dGVQYXNzZXMuICBTdHJhdGVneVN5bmNTdGFydCgpCmNvbnRpbnVlcyB0byBzZWUgYSBjb25z aXN0ZW50IChuZXh0VmljdGltQnVmZmVyLCBjb21wbGV0ZVBhc3NlcykKcGFpci4KCkJhdGNo aW5nIGlzIG9ubHkgdXNlZnVsIHdoZW4gdGhlIGF0b21pYyBpcyBhY3R1YWxseSBjb250ZW5k ZWQKYWNyb3NzIG5vZGVzLCBzbyBpdCBpcyBhcHBsaWVkIG9ubHkgd2hlbiBsaWJudW1hIHJl cG9ydHMgbW9yZSB0aGFuCm9uZSBub2RlIChwZ19udW1hX2dldF9tYXhfbm9kZSgpID49IDEp OyBvdGhlcndpc2UgdGhlIGJhdGNoIHNpemUKc3RheXMgYXQgMSBhbmQgdGhlIGNvZGUgcGF0 aCBtYXRjaGVzIG1hc3RlciBiaXQtZm9yLWJpdC4gIFRoZSBiYXRjaAppcyBhbHNvIGNhcHBl ZCBhdCBOQnVmZmVycyBzbyBhIGNsYWltIGNhbm5vdCB3cmFwIHRoZSBwb29sIG1vcmUKdGhh biBvbmNlLgoKQ28tQXV0aG9yZWQtYnk6IEppbSBNbG9kZ2Vuc2tpIDxtbG9kakBhbWF6b24u 
Y29tPgpDby1BdXRob3JlZC1ieTogR3JlZyBCdXJkIDxncmVnQGJ1cmQubWU+Ci0tLQogc3Jj L2JhY2tlbmQvc3RvcmFnZS9idWZmZXIvZnJlZWxpc3QuYyB8IDEzNiArKysrKysrKysrKysr KysrKystLS0tLS0tLQogMSBmaWxlIGNoYW5nZWQsIDk0IGluc2VydGlvbnMoKyksIDQyIGRl bGV0aW9ucygtKQoKZGlmZiAtLWdpdCBhL3NyYy9iYWNrZW5kL3N0b3JhZ2UvYnVmZmVyL2Zy ZWVsaXN0LmMgYi9zcmMvYmFja2VuZC9zdG9yYWdlL2J1ZmZlci9mcmVlbGlzdC5jCmluZGV4 IGZkYjViYWQ3OTEwLi5lODZlZDFmN2RhMCAxMDA2NDQKLS0tIGEvc3JjL2JhY2tlbmQvc3Rv cmFnZS9idWZmZXIvZnJlZWxpc3QuYworKysgYi9zcmMvYmFja2VuZC9zdG9yYWdlL2J1ZmZl ci9mcmVlbGlzdC5jCkBAIC0yMiw2ICsyMiw3IEBACiAjaW5jbHVkZSAic3RvcmFnZS9wcm9j LmgiCiAjaW5jbHVkZSAic3RvcmFnZS9zaG1lbS5oIgogI2luY2x1ZGUgInN0b3JhZ2Uvc3Vi c3lzdGVtcy5oIgorI2luY2x1ZGUgInBvcnQvcGdfbnVtYS5oIgogCiAjZGVmaW5lIElOVF9B Q0NFU1NfT05DRSh2YXIpCSgoaW50KSgqKCh2b2xhdGlsZSBpbnQgKikmKHZhcikpKSkKIApA QCAtMTAwLDY4ICsxMDEsMTAxIEBAIHN0YXRpYyBCdWZmZXJEZXNjICpHZXRCdWZmZXJGcm9t UmluZyhCdWZmZXJBY2Nlc3NTdHJhdGVneSBzdHJhdGVneSwKIHN0YXRpYyB2b2lkIEFkZEJ1 ZmZlclRvUmluZyhCdWZmZXJBY2Nlc3NTdHJhdGVneSBzdHJhdGVneSwKIAkJCQkJCQlCdWZm ZXJEZXNjICpidWYpOwogCisvKgorICogTnVtYmVyIG9mIGJ1ZmZlciBJRHMgdG8gY2xhaW0g ZnJvbSB0aGUgc2hhcmVkIGNsb2NrIGhhbmQgYXQgb25jZS4KKyAqIExhcmdlciB2YWx1ZXMg cmVkdWNlIGNvbnRlbnRpb24gb24gdGhlIHNoYXJlZCBhdG9taWMuICBXaXRoIGEgYmF0Y2gK KyAqIHNpemUgb2YgNjQsIGNvbmN1cnJlbnQgYmFja2VuZHMgc3dlZXAgbm9uLW92ZXJsYXBw aW5nIGNodW5rcyBvZiA2NAorICogYnVmZmVycyByYXRoZXIgdGhhbiBpbnRlcmxlYXZpbmcg b25lIGJ1ZmZlciBhdCBhIHRpbWUuICBUaGUgZ2xvYmFsCisgKiBzd2VlcCBvcmRlciBpcyBw cmVzZXJ2ZWQg4oCUIGVhY2ggYnVmZmVyIGlzIHN0aWxsIHZpc2l0ZWQgZXhhY3RseSBvbmNl CisgKiBwZXIgY29tcGxldGUgcGFzcy4KKyAqLworI2RlZmluZSBDTE9DS19TV0VFUF9CQVRD SF9TSVpFIDY0CisKKy8qCisgKiBQZXItYmFja2VuZCBzdGF0ZSBmb3IgYmF0Y2hlZCBjbG9j ayBzd2VlcC4KKyAqLworc3RhdGljIHVpbnQzMiBNeUJhdGNoUG9zID0gMDsJLyogbmV4dCBi dWZmZXIgd2l0aGluIGJhdGNoICovCitzdGF0aWMgdWludDMyIE15QmF0Y2hFbmQgPSAwOwkv KiBvbmUgcGFzdCBsYXN0IGJ1ZmZlciBpbiBiYXRjaCAqLworCisvKgorICogRWZmZWN0aXZl IGJhdGNoIHNpemUgZm9yIHRoZSBjbG9jayBzd2VlcCwgY29tcHV0ZWQgb25jZSBhdCBzdGFy 
dHVwLgorICogT24gbm9uLU5VTUEgc3lzdGVtcyAoc2luZ2xlIHNvY2tldCwgbm8gbGlibnVt YSwgb3IgY29udGFpbmVycyBibG9ja2luZworICogZ2V0X21lbXBvbGljeSksIHRoaXMgaXMg MSAtLSB0aGUgb3JpZ2luYWwgb25lLWF0LWEtdGltZSBiZWhhdmlvci4KKyAqIE9uIG11bHRp LW5vZGUgTlVNQSBzeXN0ZW1zLCB0aGlzIGlzIE1pbihDTE9DS19TV0VFUF9CQVRDSF9TSVpF LCBOQnVmZmVycykKKyAqIHRvIHJlZHVjZSBjcm9zcy1zb2NrZXQgYXRvbWljIGNvbnRlbnRp b24gb24gbmV4dFZpY3RpbUJ1ZmZlci4KKyAqLworc3RhdGljIHVpbnQzMiBDbG9ja1N3ZWVw QmF0Y2hTaXplID0gMTsKKworc3RhdGljIGlubGluZSB1aW50MzIKK0VmZmVjdGl2ZUJhdGNo U2l6ZSh2b2lkKQoreworCXJldHVybiBDbG9ja1N3ZWVwQmF0Y2hTaXplOworfQorCiAvKgog ICogQ2xvY2tTd2VlcFRpY2sgLSBIZWxwZXIgcm91dGluZSBmb3IgU3RyYXRlZ3lHZXRCdWZm ZXIoKQogICoKLSAqIE1vdmUgdGhlIGNsb2NrIGhhbmQgb25lIGJ1ZmZlciBhaGVhZCBvZiBp dHMgY3VycmVudCBwb3NpdGlvbiBhbmQgcmV0dXJuIHRoZQotICogaWQgb2YgdGhlIGJ1ZmZl ciBub3cgdW5kZXIgdGhlIGhhbmQuCisgKiBSZXR1cm4gdGhlIG5leHQgYnVmZmVyIHRvIGNv bnNpZGVyIGZvciBldmljdGlvbi4gIEJhY2tlbmRzIGNsYWltIGJhdGNoZXMKKyAqIG9mIGNv bnNlY3V0aXZlIGJ1ZmZlciBJRHMgZnJvbSB0aGUgc2hhcmVkIGNsb2NrIGhhbmQsIHRoZW4g aXRlcmF0ZSB0aHJvdWdoCisgKiB0aGVtIGxvY2FsbHkgd2l0aG91dCBmdXJ0aGVyIGF0b21p YyBvcGVyYXRpb25zLiAgVGhpcyBwcmVzZXJ2ZXMgdGhlIGdsb2JhbAorICogc3dlZXAgb3Jk ZXIgd2hpbGUgcmVkdWNpbmcgY3Jvc3Mtc29ja2V0IGNvbnRlbnRpb24gb24gdGhlIHNoYXJl ZCBjb3VudGVyLgogICovCiBzdGF0aWMgaW5saW5lIHVpbnQzMgogQ2xvY2tTd2VlcFRpY2so dm9pZCkKIHsKIAl1aW50MzIJCXZpY3RpbTsKIAotCS8qCi0JICogQXRvbWljYWxseSBtb3Zl IGhhbmQgYWhlYWQgb25lIGJ1ZmZlciAtIGlmIHRoZXJlJ3Mgc2V2ZXJhbCBwcm9jZXNzZXMK LQkgKiBkb2luZyB0aGlzLCB0aGlzIGNhbiBsZWFkIHRvIGJ1ZmZlcnMgYmVpbmcgcmV0dXJu ZWQgc2xpZ2h0bHkgb3V0IG9mCi0JICogYXBwYXJlbnQgb3JkZXIuCi0JICovCi0JdmljdGlt ID0KLQkJcGdfYXRvbWljX2ZldGNoX2FkZF91MzIoJlN0cmF0ZWd5Q29udHJvbC0+bmV4dFZp Y3RpbUJ1ZmZlciwgMSk7Ci0KLQlpZiAodmljdGltID49IE5CdWZmZXJzKQorCWlmIChNeUJh dGNoUG9zID49IE15QmF0Y2hFbmQpCiAJewotCQl1aW50MzIJCW9yaWdpbmFsVmljdGltID0g dmljdGltOwotCi0JCS8qIGFsd2F5cyB3cmFwIHdoYXQgd2UgbG9vayB1cCBpbiBCdWZmZXJE ZXNjcmlwdG9ycyAqLwotCQl2aWN0aW0gPSB2aWN0aW0gJSBOQnVmZmVyczsKLQogCQkvKgot 
CQkgKiBJZiB3ZSdyZSB0aGUgb25lIHRoYXQganVzdCBjYXVzZWQgYSB3cmFwYXJvdW5kLCBm b3JjZQotCQkgKiBjb21wbGV0ZVBhc3NlcyB0byBiZSBpbmNyZW1lbnRlZCB3aGlsZSBob2xk aW5nIHRoZSBzcGlubG9jay4gV2UKLQkJICogbmVlZCB0aGUgc3BpbmxvY2sgc28gU3RyYXRl Z3lTeW5jU3RhcnQoKSBjYW4gcmV0dXJuIGEgY29uc2lzdGVudAotCQkgKiB2YWx1ZSBjb25z aXN0aW5nIG9mIG5leHRWaWN0aW1CdWZmZXIgYW5kIGNvbXBsZXRlUGFzc2VzLgorCQkgKiBD bGFpbSBhIG5ldyBiYXRjaCBmcm9tIHRoZSBzaGFyZWQgY2xvY2sgaGFuZC4gIFRoaXMgaXMg dGhlIG9ubHkKKwkJICogYXRvbWljIG9wZXJhdGlvbiBwZXIgYmF0Y2gsIHJlZHVjaW5nIGNv bnRlbnRpb24gYnkgdGhlIGJhdGNoIHNpemUuCiAJCSAqLwotCQlpZiAodmljdGltID09IDAp CisJCXVpbnQzMgkJc3RhcnQ7CisJCXVpbnQzMgkJYmF0Y2hfc2l6ZSA9IEVmZmVjdGl2ZUJh dGNoU2l6ZSgpOworCisJCXN0YXJ0ID0gcGdfYXRvbWljX2ZldGNoX2FkZF91MzIoJlN0cmF0 ZWd5Q29udHJvbC0+bmV4dFZpY3RpbUJ1ZmZlciwKKwkJCQkJCQkJCQliYXRjaF9zaXplKTsK KworCQlpZiAoc3RhcnQgPj0gKHVpbnQzMikgTkJ1ZmZlcnMpCiAJCXsKLQkJCXVpbnQzMgkJ ZXhwZWN0ZWQ7Ci0JCQl1aW50MzIJCXdyYXBwZWQ7Ci0JCQlib29sCQlzdWNjZXNzID0gZmFs c2U7CisJCQlzdGFydCA9IHN0YXJ0ICUgTkJ1ZmZlcnM7CiAKLQkJCWV4cGVjdGVkID0gb3Jp Z2luYWxWaWN0aW0gKyAxOworCQkJLyoKKwkJCSAqIElmIHRoZSBjb3VudGVyIGhhcyBncm93 biBiZXlvbmQgTkJ1ZmZlcnMsIHRyeSB0byB3cmFwIGl0IGJhY2suCisJCQkgKiBXZSBtdXN0 IGhvbGQgdGhlIHNwaW5sb2NrIHNvIFN0cmF0ZWd5U3luY1N0YXJ0KCkgY2FuIHJlYWQKKwkJ CSAqIG5leHRWaWN0aW1CdWZmZXIgYW5kIGNvbXBsZXRlUGFzc2VzIGNvbnNpc3RlbnRseS4K KwkJCSAqCisJCQkgKiBNdWx0aXBsZSBiYWNrZW5kcyBtYXkgZW50ZXIgdGhpcyBzZWN0aW9u IGNvbmN1cnJlbnRseS4gQWZ0ZXIKKwkJCSAqIGFjcXVpcmluZyB0aGUgc3BpbmxvY2ssIHJl LXJlYWQgdGhlIGNvdW50ZXI6IGlmIGFub3RoZXIgYmFja2VuZAorCQkJICogYWxyZWFkeSB3 cmFwcGVkIGl0IGJlbG93IE5CdWZmZXJzLCB3ZSdyZSBkb25lLgorCQkJICovCisJCQlTcGlu TG9ja0FjcXVpcmUoJlN0cmF0ZWd5Q29udHJvbC0+YnVmZmVyX3N0cmF0ZWd5X2xvY2spOwog Ci0JCQl3aGlsZSAoIXN1Y2Nlc3MpCiAJCQl7Ci0JCQkJLyoKLQkJCQkgKiBBY3F1aXJlIHRo ZSBzcGlubG9jayB3aGlsZSBpbmNyZWFzaW5nIGNvbXBsZXRlUGFzc2VzLiBUaGF0Ci0JCQkJ ICogYWxsb3dzIG90aGVyIHJlYWRlcnMgdG8gcmVhZCBuZXh0VmljdGltQnVmZmVyIGFuZAot CQkJCSAqIGNvbXBsZXRlUGFzc2VzIGluIGEgY29uc2lzdGVudCBtYW5uZXIgd2hpY2ggaXMg 
cmVxdWlyZWQgZm9yCi0JCQkJICogU3RyYXRlZ3lTeW5jU3RhcnQoKS4gIEluIHRoZW9yeSBk ZWxheWluZyB0aGUgaW5jcmVtZW50Ci0JCQkJICogY291bGQgbGVhZCB0byBhbiBvdmVyZmxv dyBvZiBuZXh0VmljdGltQnVmZmVycywgYnV0IHRoYXQncwotCQkJCSAqIGhpZ2hseSB1bmxp a2VseSBhbmQgd291bGRuJ3QgYmUgcGFydGljdWxhcmx5IGhhcm1mdWwuCi0JCQkJICovCi0J CQkJU3BpbkxvY2tBY3F1aXJlKCZTdHJhdGVneUNvbnRyb2wtPmJ1ZmZlcl9zdHJhdGVneV9s b2NrKTsKLQotCQkJCXdyYXBwZWQgPSBleHBlY3RlZCAlIE5CdWZmZXJzOworCQkJCXVpbnQz MgkJY3VycmVudDsKKwkJCQl1aW50MzIJCXdyYXBwZWQ7CiAKLQkJCQlzdWNjZXNzID0gcGdf YXRvbWljX2NvbXBhcmVfZXhjaGFuZ2VfdTMyKCZTdHJhdGVneUNvbnRyb2wtPm5leHRWaWN0 aW1CdWZmZXIsCi0JCQkJCQkJCQkJCQkJCSAmZXhwZWN0ZWQsIHdyYXBwZWQpOwotCQkJCWlm IChzdWNjZXNzKQotCQkJCQlTdHJhdGVneUNvbnRyb2wtPmNvbXBsZXRlUGFzc2VzKys7Ci0J CQkJU3BpbkxvY2tSZWxlYXNlKCZTdHJhdGVneUNvbnRyb2wtPmJ1ZmZlcl9zdHJhdGVneV9s b2NrKTsKKwkJCQljdXJyZW50ID0gcGdfYXRvbWljX3JlYWRfdTMyKCZTdHJhdGVneUNvbnRy b2wtPm5leHRWaWN0aW1CdWZmZXIpOworCQkJCWlmIChjdXJyZW50ID49ICh1aW50MzIpIE5C dWZmZXJzKQorCQkJCXsKKwkJCQkJd3JhcHBlZCA9IGN1cnJlbnQgJSBOQnVmZmVyczsKKwkJ CQkJaWYgKHBnX2F0b21pY19jb21wYXJlX2V4Y2hhbmdlX3UzMigmU3RyYXRlZ3lDb250cm9s LT5uZXh0VmljdGltQnVmZmVyLAorCQkJCQkJCQkJCQkJCSAgICZjdXJyZW50LCB3cmFwcGVk KSkKKwkJCQkJCVN0cmF0ZWd5Q29udHJvbC0+Y29tcGxldGVQYXNzZXMrKzsKKwkJCQl9CiAJ CQl9CisKKwkJCVNwaW5Mb2NrUmVsZWFzZSgmU3RyYXRlZ3lDb250cm9sLT5idWZmZXJfc3Ry YXRlZ3lfbG9jayk7CiAJCX0KKworCQlNeUJhdGNoUG9zID0gc3RhcnQ7CisJCU15QmF0Y2hF bmQgPSBzdGFydCArIGJhdGNoX3NpemU7CiAJfQorCisJdmljdGltID0gTXlCYXRjaFBvcyAl IE5CdWZmZXJzOworCU15QmF0Y2hQb3MrKzsKKwogCXJldHVybiB2aWN0aW07CiB9CiAKQEAg LTQwOCw2ICs0NDIsMjQgQEAgU3RyYXRlZ3lDdGxTaG1lbUluaXQodm9pZCAqYXJnKQogCiAJ LyogTm8gcGVuZGluZyBub3RpZmljYXRpb24gKi8KIAlTdHJhdGVneUNvbnRyb2wtPmJnd3By b2NubyA9IC0xOworCisJLyoKKwkgKiBEZXRlcm1pbmUgdGhlIGVmZmVjdGl2ZSBjbG9jay1z d2VlcCBiYXRjaCBzaXplLgorCSAqCisJICogT24gbXVsdGktbm9kZSBOVU1BIHN5c3RlbXMs IGNsYWltaW5nIGJhdGNoZXMgb2YgYnVmZmVycyBmcm9tIHRoZSBzaGFyZWQKKwkgKiBjbG9j ayBoYW5kIHJlZHVjZXMgY3Jvc3Mtc29ja2V0IGNvbnRlbnRpb24gb24gdGhlIGF0b21pYyBj 
b3VudGVyLiAgT24KKwkgKiBzaW5nbGUtc29ja2V0IHN5c3RlbXMsIGJhdGNoaW5nIHByb3Zp ZGVzIG5vIGJlbmVmaXQgKHRoZSBhdG9taWMgaXMKKwkgKiBhbHJlYWR5IHNvY2tldC1sb2Nh bCkgYW5kIGp1c3QgY2F1c2VzIGJhY2tlbmRzIHRvIHNraXAgYnVmZmVycywgc28gd2UKKwkg KiB1c2UgYmF0Y2ggc2l6ZSAxIGZvciB0aGUgb3JpZ2luYWwgYmVoYXZpb3IuCisJICoKKwkg KiBwZ19udW1hX2luaXQoKSByZXR1cm5zIC0xIHdoZW4gTlVNQSBpcyB1bmF2YWlsYWJsZS4K KwkgKiBwZ19udW1hX2dldF9tYXhfbm9kZSgpIHJldHVybnMgMCBmb3IgYSBzaW5nbGUgTlVN QSBub2RlLgorCSAqLworCWlmIChwZ19udW1hX2luaXQoKSAhPSAtMSAmJiBwZ19udW1hX2dl dF9tYXhfbm9kZSgpID49IDEpCisJCUNsb2NrU3dlZXBCYXRjaFNpemUgPSBNaW4oQ0xPQ0tf U1dFRVBfQkFUQ0hfU0laRSwKKwkJCQkJCQkJICAodWludDMyKSBOQnVmZmVycyk7CisJZWxz ZQorCQlDbG9ja1N3ZWVwQmF0Y2hTaXplID0gMTsKIH0KIAogCi0tIAoyLjUwLjEgKEFwcGxl IEdpdC0xNTUpCgo= --e46b7057cc77e38ee9fd32d31c281a797874ffcd Content-Disposition: attachment; filename*0="v2-0002-Improve-clock-sweep-batch-sizing-with-CPU-aware-a.pat"; filename*1="ch" Content-Type: application/octet-stream; name="=?UTF-8?Q?v2-0002-Improve-clock-sweep-batch-sizing-with-CPU-aware-a.patc?= =?UTF-8?Q?h?=" Content-Transfer-Encoding: base64 RnJvbSBkY2U0NzNlMWEyNDZkODc3ZjBmMjMxM2JhN2EyOWE0ZDBlMzVkYzI3IE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBHcmVnIEJ1cmQgPGdyZWdidXJkQGFtYXpvbi5jb20+ CkRhdGU6IE1vbiwgMjcgQXByIDIwMjYgMDg6MjU6NDAgLTA0MDAKU3ViamVjdDogW1BBVENI IHYyIDIvMl0gSW1wcm92ZSBjbG9jayBzd2VlcCBiYXRjaCBzaXppbmcgd2l0aCBDUFUtYXdh cmUKIGFsZ29yaXRobQpNSU1FLVZlcnNpb246IDEuMApDb250ZW50LVR5cGU6IHRleHQvcGxh aW47IGNoYXJzZXQ9VVRGLTgKQ29udGVudC1UcmFuc2Zlci1FbmNvZGluZzogOGJpdAoKUmVw bGFjZSBzaW1wbGUgTlVNQS1vbmx5IGJhdGNoIHNpemluZyB3aXRoIGEgdGllcmVkIGFwcHJv YWNoOgoKLSBOVU1BIHN5c3RlbXMgKG11bHRpLXNvY2tldCk6IGJhdGNoPTY0IChoaWdoIGlu dGVyY29ubmVjdCBsYXRlbmN5KQotIFNpbmdsZSBzb2NrZXQgPjE2IGNvcmVzOiBiYXRjaD0z MiAoTDMgY2FjaGUgY29udGVudGlvbikKLSBTaW5nbGUgc29ja2V0IDktMTYgY29yZXM6IGJh dGNoPTE2IChtb2RlcmF0ZSBjb250ZW50aW9uKQotIFNpbmdsZSBzb2NrZXQgNS04IGNvcmVz OiBiYXRjaD04IChsaWdodCBjb250ZW50aW9uKQotIFNpbmdsZSBzb2NrZXQg4omkNCBjb3Jl 
czogYmF0Y2g9MSAobm8gYmF0Y2hpbmcgb3ZlcmhlYWQpCgpBbHNvIGFkZHMgb3Zlci1jbGFp bWluZyBwcm90ZWN0aW9uOiBiYXRjaF9zaXplIMOXIE1heEJhY2tlbmRzIOKJpCBwb29sX3Np emUvMgp0byBlbnN1cmUgdG90YWwgY2xhaW1lZCBidWZmZXJzIHN0YXkgdW5kZXIgNTAlIG9m IHRoZSBwb29sLgoKVGhpcyBwcm92aWRlcyBhdG9taWMgY29udGVudGlvbiBiZW5lZml0cyBv biBsYXJnZSBzaW5nbGUtc29ja2V0IHN5c3RlbXMKd2hpbGUgbWFpbnRhaW5pbmcgdGhlIG9y aWdpbmFsIGJlaGF2aW9yIG9uIHNtYWxsIHN5c3RlbXMgd2hlcmUgZmFpcm5lc3MKbWF0dGVy cyBtb3JlIHRoYW4gdGhyb3VnaHB1dC4KCkF1dGhvcmVkLWJ5OiBHcmVnIEJ1cmQgPGdyZWdA YnVyZC5tZT4KLS0tCiBzcmMvYmFja2VuZC9zdG9yYWdlL2J1ZmZlci9mcmVlbGlzdC5jIHwg OTEgKysrKysrKysrKysrKysrKysrKysrKy0tLS0tCiAxIGZpbGUgY2hhbmdlZCwgNzcgaW5z ZXJ0aW9ucygrKSwgMTQgZGVsZXRpb25zKC0pCgpkaWZmIC0tZ2l0IGEvc3JjL2JhY2tlbmQv c3RvcmFnZS9idWZmZXIvZnJlZWxpc3QuYyBiL3NyYy9iYWNrZW5kL3N0b3JhZ2UvYnVmZmVy L2ZyZWVsaXN0LmMKaW5kZXggZTg2ZWQxZjdkYTAuLjQ3NmQ3ZjQyMGE1IDEwMDY0NAotLS0g YS9zcmMvYmFja2VuZC9zdG9yYWdlL2J1ZmZlci9mcmVlbGlzdC5jCisrKyBiL3NyYy9iYWNr ZW5kL3N0b3JhZ2UvYnVmZmVyL2ZyZWVsaXN0LmMKQEAgLTI0LDYgKzI0LDEwIEBACiAjaW5j bHVkZSAic3RvcmFnZS9zdWJzeXN0ZW1zLmgiCiAjaW5jbHVkZSAicG9ydC9wZ19udW1hLmgi CiAKKyNpZmRlZiBIQVZFX1VOSVNURF9ICisjaW5jbHVkZSA8dW5pc3RkLmg+CisjZW5kaWYK KwogI2RlZmluZSBJTlRfQUNDRVNTX09OQ0UodmFyKQkoKGludCkoKigodm9sYXRpbGUgaW50 ICopJih2YXIpKSkpCiAKIApAQCAtNjEsNiArNjUsOCBAQCBzdGF0aWMgQnVmZmVyU3RyYXRl Z3lDb250cm9sICpTdHJhdGVneUNvbnRyb2wgPSBOVUxMOwogCiBzdGF0aWMgdm9pZCBTdHJh dGVneUN0bFNobWVtUmVxdWVzdCh2b2lkICphcmcpOwogc3RhdGljIHZvaWQgU3RyYXRlZ3lD dGxTaG1lbUluaXQodm9pZCAqYXJnKTsKK3N0YXRpYyBpbnQJcGdfZ2V0X29ubGluZV9jcHVz KHZvaWQpOworc3RhdGljIHVpbnQzMiBDb21wdXRlQ2xvY2tCYXRjaFNpemUoaW50IHBvb2xf bmJ1ZmZlcnMpOwogCiBjb25zdCBTaG1lbUNhbGxiYWNrcyBTdHJhdGVneUN0bFNobWVtQ2Fs bGJhY2tzID0gewogCS5yZXF1ZXN0X2ZuID0gU3RyYXRlZ3lDdGxTaG1lbVJlcXVlc3QsCkBA IC00MTEsNiArNDE3LDY5IEBAIFN0cmF0ZWd5Tm90aWZ5QmdXcml0ZXIoaW50IGJnd3Byb2Nu bykKIAlTcGluTG9ja1JlbGVhc2UoJlN0cmF0ZWd5Q29udHJvbC0+YnVmZmVyX3N0cmF0ZWd5 X2xvY2spOwogfQogCisvKgorICogcGdfZ2V0X29ubGluZV9jcHVzIC0tIGdldCB0aGUgbnVt 
YmVyIG9mIG9ubGluZSBDUFUgY29yZXMKKyAqLworc3RhdGljIGludAorcGdfZ2V0X29ubGlu ZV9jcHVzKHZvaWQpCit7CisjaWZkZWYgX1NDX05QUk9DRVNTT1JTX09OTE4KKwlsb25nCQlu Y3B1cyA9IHN5c2NvbmYoX1NDX05QUk9DRVNTT1JTX09OTE4pOworCisJaWYgKG5jcHVzID4g MCkKKwkJcmV0dXJuIChpbnQpIG5jcHVzOworI2VuZGlmCisJLyogRmFsbGJhY2sgaWYgc3lz Y29uZiBpcyB1bmF2YWlsYWJsZSBvciBmYWlscyAqLworCXJldHVybiAxOworfQorCisvKgor ICogQ29tcHV0ZUNsb2NrQmF0Y2hTaXplIC0tIGNvbXB1dGUgdGhlIGVmZmVjdGl2ZSBjbG9j ay1zd2VlcCBiYXRjaCBzaXplCisgKgorICogVGhlIGZ1bmN0aW9uIGhhcyB0d28gcGhhc2Vz OiBzZWxlY3QgYSBiYXNlIGJhdGNoIGZyb20gaGFyZHdhcmUgdG9wb2xvZ3ksCisgKiB0aGVu IGNhcCBpdCB0byBwcmV2ZW50IG92ZXItY2xhaW1pbmcuCisgKgorICogUGhhc2UgMTogQmFz ZSBiYXRjaCBmcm9tIHRvcG9sb2d5CisgKiAtIE5VTUEgKG11bHRpLXNvY2tldCk6IGJhdGNo PTY0IChoaWdoIGNyb3NzLXNvY2tldCBsYXRlbmN5KQorICogLSA+MTYgY29yZXMsIHNpbmds ZSBzb2NrZXQ6IGJhdGNoPTMyIChMMyBjb250ZW50aW9uKQorICogLSA5LTE2IGNvcmVzOiBi YXRjaD0xNiAobW9kZXJhdGUgY29udGVudGlvbikKKyAqIC0gNS04IGNvcmVzOiBiYXRjaD04 IChsaWdodCBjb250ZW50aW9uKQorICogLSA8PTQgY29yZXM6IGJhdGNoPTEgKG5vIGJhdGNo aW5nIG92ZXJoZWFkKQorICoKKyAqIFBoYXNlIDI6IENhcCB0byBwcmV2ZW50IG92ZXItY2xh aW1pbmcKKyAqIC0gRW5zdXJlIGJhdGNoX3NpemUgKiBNYXhCYWNrZW5kcyA8PSBwb29sX25i dWZmZXJzIC8gMgorICogLSBLZWVwcyB0b3RhbCBjbGFpbXMgdW5kZXIgNTAlIG9mIHRoZSBw b29sCisgKi8KK3N0YXRpYyB1aW50MzIKK0NvbXB1dGVDbG9ja0JhdGNoU2l6ZShpbnQgcG9v bF9uYnVmZmVycykKK3sKKwlpbnQJCQluY3B1cyA9IHBnX2dldF9vbmxpbmVfY3B1cygpOwor CWludAkJCW51bWFfbm9kZXMgPSAocGdfbnVtYV9pbml0KCkgIT0gLTEpID8gcGdfbnVtYV9n ZXRfbWF4X25vZGUoKSArIDEgOiAxOworCXVpbnQzMgkJYmFzZV9iYXRjaDsKKwl1aW50MzIJ CW1heF9iYXRjaDsKKworCS8qIFBoYXNlIDE6IEJhc2UgYmF0Y2ggZnJvbSB0b3BvbG9neSAq LworCWlmIChudW1hX25vZGVzID4gMSkKKwkJYmFzZV9iYXRjaCA9IDY0OworCWVsc2UgaWYg KG5jcHVzID4gMTYpCisJCWJhc2VfYmF0Y2ggPSAzMjsKKwllbHNlIGlmIChuY3B1cyA+IDgp CisJCWJhc2VfYmF0Y2ggPSAxNjsKKwllbHNlIGlmIChuY3B1cyA+IDQpCisJCWJhc2VfYmF0 Y2ggPSA4OworCWVsc2UKKwkJYmFzZV9iYXRjaCA9IDE7CisKKwkvKiBQaGFzZSAyOiBDYXAg dG8gcHJldmVudCBvdmVyLWNsYWltaW5nICovCisJbWF4X2JhdGNoID0gKE1heEJhY2tlbmRz 
ID4gMCkKKwkJPyBwb29sX25idWZmZXJzIC8gKDIgKiBNYXhCYWNrZW5kcykKKwkJOiBwb29s X25idWZmZXJzIC8gMjAwOworCWlmIChtYXhfYmF0Y2ggPCAxKQorCQltYXhfYmF0Y2ggPSAx OworCisJcmV0dXJuIE1pbihiYXNlX2JhdGNoLCBNaW4obWF4X2JhdGNoLCAodWludDMyKSBw b29sX25idWZmZXJzKSk7Cit9CisKIAogLyoKICAqIFN0cmF0ZWd5Q3RsU2htZW1SZXF1ZXN0 IC0tIHJlcXVlc3Qgc2hhcmVkIG1lbW9yeSBmb3IgdGhlIGJ1ZmZlcgpAQCAtNDQ0LDIyICs1 MTMsMTYgQEAgU3RyYXRlZ3lDdGxTaG1lbUluaXQodm9pZCAqYXJnKQogCVN0cmF0ZWd5Q29u dHJvbC0+Ymd3cHJvY25vID0gLTE7CiAKIAkvKgotCSAqIERldGVybWluZSB0aGUgZWZmZWN0 aXZlIGNsb2NrLXN3ZWVwIGJhdGNoIHNpemUuCisJICogQ29tcHV0ZSB0aGUgZWZmZWN0aXZl IGNsb2NrLXN3ZWVwIGJhdGNoIHNpemUgYmFzZWQgb24gaGFyZHdhcmUKKwkgKiB0b3BvbG9n eS4KIAkgKgotCSAqIE9uIG11bHRpLW5vZGUgTlVNQSBzeXN0ZW1zLCBjbGFpbWluZyBiYXRj aGVzIG9mIGJ1ZmZlcnMgZnJvbSB0aGUgc2hhcmVkCi0JICogY2xvY2sgaGFuZCByZWR1Y2Vz IGNyb3NzLXNvY2tldCBjb250ZW50aW9uIG9uIHRoZSBhdG9taWMgY291bnRlci4gIE9uCi0J ICogc2luZ2xlLXNvY2tldCBzeXN0ZW1zLCBiYXRjaGluZyBwcm92aWRlcyBubyBiZW5lZml0 ICh0aGUgYXRvbWljIGlzCi0JICogYWxyZWFkeSBzb2NrZXQtbG9jYWwpIGFuZCBqdXN0IGNh dXNlcyBiYWNrZW5kcyB0byBza2lwIGJ1ZmZlcnMsIHNvIHdlCi0JICogdXNlIGJhdGNoIHNp emUgMSBmb3IgdGhlIG9yaWdpbmFsIGJlaGF2aW9yLgotCSAqCi0JICogcGdfbnVtYV9pbml0 KCkgcmV0dXJucyAtMSB3aGVuIE5VTUEgaXMgdW5hdmFpbGFibGUuCi0JICogcGdfbnVtYV9n ZXRfbWF4X25vZGUoKSByZXR1cm5zIDAgZm9yIGEgc2luZ2xlIE5VTUEgbm9kZS4KKwkgKiBU aGlzIHVzZXMgYSB0aWVyZWQgYXBwcm9hY2g6IGxhcmdlciBiYXRjaGVzIG9uIE5VTUEgc3lz dGVtcyBhbmQKKwkgKiBtYW55LWNvcmUgc2luZ2xlLXNvY2tldCBzeXN0ZW1zIHdoZXJlIGF0 b21pYyBjb250ZW50aW9uIGlzIGhpZ2gsCisJICogc21hbGxlciBiYXRjaGVzIG9yIG5vIGJh dGNoaW5nIG9uIGZldy1jb3JlIHN5c3RlbXMgd2hlcmUgZmFpcm5lc3MKKwkgKiBtYXR0ZXJz IG1vcmUuIFRoZSBiYXRjaCBzaXplIGlzIGFsc28gY2FwcGVkIHRvIHByZXZlbnQgb3Zlci1j bGFpbWluZworCSAqIHdoZW4gdGhlcmUgYXJlIG1hbnkgYmFja2VuZHMgcmVsYXRpdmUgdG8g dGhlIGJ1ZmZlciBwb29sIHNpemUuCiAJICovCi0JaWYgKHBnX251bWFfaW5pdCgpICE9IC0x ICYmIHBnX251bWFfZ2V0X21heF9ub2RlKCkgPj0gMSkKLQkJQ2xvY2tTd2VlcEJhdGNoU2l6 ZSA9IE1pbihDTE9DS19TV0VFUF9CQVRDSF9TSVpFLAotCQkJCQkJCQkgICh1aW50MzIpIE5C 
dWZmZXJzKTsKLQllbHNlCi0JCUNsb2NrU3dlZXBCYXRjaFNpemUgPSAxOworCUNsb2NrU3dl ZXBCYXRjaFNpemUgPSBDb21wdXRlQ2xvY2tCYXRjaFNpemUoTkJ1ZmZlcnMpOwogfQogCiAK LS0gCjIuNTAuMSAoQXBwbGUgR2l0LTE1NSkKCg== --e46b7057cc77e38ee9fd32d31c281a797874ffcd--