Feedback-ID: i675e48f3:Fastmail
MIME-Version: 1.0
Date: Sat, 25 Apr 2026 16:08:02 -0400
From: "Greg Burd" <greg@burd.me>
To: "PostgreSQL Hackers" <pgsql-hackers@lists.postgresql.org>
Cc: "Andres Freund" <andres@anarazel.de>, "Tomas Vondra" <tomas@vondra.me>,
 "Nathan Bossart" <nathandbossart@gmail.com>
Message-Id: <79629577-3ad8-4b1c-a469-ebc2cb4c5104@app.fastmail.com>
Subject: [PATCH] Batched clock sweep to reduce cross-socket atomic contention
Content-Type: multipart/mixed;
 boundary=17433b00b01b98fb89b021ca533c5f80a3c06626
Archived-At: 
 <https://www.postgresql.org/message-id/79629577-3ad8-4b1c-a469-ebc2cb4c5104%40app.fastmail.com>
Precedence: bulk

--17433b00b01b98fb89b021ca533c5f80a3c06626
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

Hello hackers,

A colleague of mine, Jim Mlodgenski, has been poking at NUMA behavior on some of the newer AWS bare-metal instance types (r8i in particular, which exposes 6 NUMA nodes via SNC3 on a 2-socket box), and in the process landed on a very small change to freelist.c that I think is worth showing around.  His patch is attached with some tweaks of my own.

Full disclosure: the exploration that led Jim to this patch idea was done with help from an AI assistant (Kiro); the idea, the benchmarking, and the final shape of the patch are human-driven, but I wanted to be up front about how his investigation started.  Happy to discuss that separately if people want to.

The one-line summary: instead of advancing nextVictimBuffer one buffer at a time via pg_atomic_fetch_add_u32, each backend claims a batch of 64 consecutive buffer IDs from the shared hand and then iterates over them privately.  Global sweep order is preserved -- every buffer is still visited exactly once per complete pass -- but the atomic contention on that one cache line drops by roughly a factor of the batch size.


Why this matters
----------------

On multi-socket boxes under eviction pressure, every backend that needs a victim buffer ends up doing an atomic fetch-add on the same cache line.  On a single socket, that cache line stays warm in L1/L2 and a locked RMW on it completes in ~20ns.  On 2+ sockets, the line bounces over QPI/UPI at ~100-200ns per op, and with hundreds of backends running StrategyGetBuffer() concurrently, the line ping-pongs constantly.  It's a textbook NUMA scalability bottleneck, and once shared_buffers is smaller than the working set and the sweep is running continuously, that single atomic is what you hit in a perf profile (elevated bus-cycles, cache-misses on the cache line holding nextVictimBuffer).

Andres pointed at the same spot in his pgconf.eu 2024 talk, and Tomas called it out in the "Adding basic NUMA awareness" thread [1] -- so this isn't news to anyone who's been looking at this area.  What I think is new is a fix that's just this, without any of the surrounding architectural change.

The framing (credit to Jim): the clock hand is doing two jobs.  It *coordinates* backends so they don't redundantly decrement usage_count on the same buffers and so they eventually visit every buffer in the pool exactly once per pass.  It also *serializes* access to the counter.  Coordination is the part we want.  Serialization is the part that's killing us on bigger NUMA boxes.  Batching keeps the coordination and thins out the serialization.


How it works
------------

Two per-backend statics, MyBatchPos and MyBatchEnd.  When a backend calls ClockSweepTick() and its local batch is exhausted, it does a single fetch-add of CLOCK_SWEEP_BATCH_SIZE (64) against nextVictimBuffer and now owns that range.  Subsequent ticks just bump the local counter.
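
A minimal sketch of that claim path, for illustration only.  It uses C11 <stdatomic.h> rather than the pg_atomic_* API, passes the per-backend state in a struct so that two "backends" can be shown in one process, and leaves out wraparound of the shared counter.  All sketch_* names are placeholders; the attached patch is the authoritative version.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_BATCH_SIZE 64u   /* stands in for CLOCK_SWEEP_BATCH_SIZE */

/* Shared clock hand; stands in for StrategyControl->nextVictimBuffer. */
static _Atomic uint32_t sketch_next_victim = 0;

/* Per-backend batch state (MyBatchPos / MyBatchEnd in the description). */
typedef struct SketchBackend
{
    uint32_t pos;   /* next buffer ID to hand out */
    uint32_t end;   /* one past the last buffer ID in the batch */
} SketchBackend;

/*
 * Return the next buffer ID to inspect.  The shared counter is touched
 * only when the local batch is empty.  The modulo keeps the returned ID
 * in range; resetting the shared counter is not modelled here.
 */
static uint32_t
sketch_clock_sweep_tick(SketchBackend *be, uint32_t nbuffers)
{
    assert(nbuffers > 0);

    if (be->pos >= be->end)
    {
        /* One atomic operation buys SKETCH_BATCH_SIZE local ticks. */
        uint32_t start = atomic_fetch_add(&sketch_next_victim,
                                          SKETCH_BATCH_SIZE);

        be->pos = start;
        be->end = start + SKETCH_BATCH_SIZE;
    }

    return be->pos++ % nbuffers;
}
```

With two zero-initialized SketchBackend values and a pool of 1024 buffers, the first backend's first tick returns 0, the second backend's first tick returns 64, and the first backend's second tick returns 1 without touching the shared counter again.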

Wraparound got a small rewrite.  The original code had the backend that crossed NBuffers drive completePasses++ under the spinlock via a CAS loop.  With batching, multiple backends can each land a fetch-add that returns a value >= NBuffers in the same pass, so the logic now is: whoever sees a start >= NBuffers takes the spinlock, re-reads the counter, and if it's still out of range does a single CAS to wrap it and bumps completePasses.  If somebody else already wrapped, we just release and move on.  StrategySyncStart() still sees a consistent (nextVictimBuffer, completePasses) pair.
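
The wrap step described above can be sketched roughly as follows.  This is an illustration under assumptions, not the patch itself: it uses C11 atomics, an atomic_flag as a stand-in for the spinlock, and a hypothetical SketchControl struct in place of StrategyControl.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the StrategyControl fields involved. */
typedef struct SketchControl
{
    atomic_flag      lock;             /* stands in for the spinlock */
    _Atomic uint32_t next_victim;      /* stands in for nextVictimBuffer */
    uint32_t         complete_passes;  /* protected by lock */
} SketchControl;

static void
sketch_control_init(SketchControl *ctl)
{
    atomic_flag_clear(&ctl->lock);
    atomic_init(&ctl->next_victim, 0);
    ctl->complete_passes = 0;
}

/*
 * Claim a batch and return its wrapped start.  If the claim landed at or
 * past nbuffers, try once, under the lock, to fold the shared counter
 * back into range and count the completed pass.
 */
static uint32_t
sketch_claim_batch(SketchControl *ctl, uint32_t nbuffers, uint32_t batch)
{
    uint32_t start;

    assert(nbuffers > 0 && batch > 0 && batch <= nbuffers);

    start = atomic_fetch_add(&ctl->next_victim, batch);
    if (start >= nbuffers)
    {
        uint32_t current;

        start %= nbuffers;

        while (atomic_flag_test_and_set(&ctl->lock))
            ;                   /* spin until the lock is free */

        current = atomic_load(&ctl->next_victim);
        if (current >= nbuffers)
        {
            /* Single attempt; a concurrent fetch-add can make it fail. */
            if (atomic_compare_exchange_strong(&ctl->next_victim, &current,
                                               current % nbuffers))
                ctl->complete_passes++;
        }
        /* else: somebody else already wrapped the counter */

        atomic_flag_clear(&ctl->lock);
    }

    return start;
}
```

With nbuffers = 100 and batch = 64, three successive claims start at 0, 64 and 28: the third fetch-add returns 128, which is past the pool, so that caller wraps the counter from 192 to 92 and bumps the pass count to 1.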

The batch size is gated on whether we actually have multiple NUMA nodes.  On a single-socket box the atomic is already socket-local, and batching just makes backends skip further ahead than they need to, so we fall back to batch size 1 -- which is bit-for-bit the original behavior.  The guard:

    if (pg_numa_init() != -1 && pg_numa_get_max_node() >= 1)
        ClockSweepBatchSize = Min(CLOCK_SWEEP_BATCH_SIZE, (uint32) NBuffers);
    else
        ClockSweepBatchSize = 1;

Min() against NBuffers covers the small-shared_buffers corner so a batch never wraps the pool multiple times in one claim.
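
To make the gate's outcomes concrete, here is the same decision as a pure function.  This is a sketch: the numa_ok and max_node parameters are stand-ins for the results of pg_numa_init() and pg_numa_get_max_node(), and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BATCH_SIZE 64u   /* stands in for CLOCK_SWEEP_BATCH_SIZE */

/*
 * numa_ok:  stands in for pg_numa_init() != -1
 * max_node: stands in for pg_numa_get_max_node(); 0 means a single node
 */
static uint32_t
sketch_effective_batch_size(bool numa_ok, int max_node, uint32_t nbuffers)
{
    assert(nbuffers > 0);

    /* Batch only when more than one NUMA node is reported. */
    if (numa_ok && max_node >= 1)
        return nbuffers < SKETCH_BATCH_SIZE ? nbuffers : SKETCH_BATCH_SIZE;

    return 1;   /* one-at-a-time behavior */
}
```

For example, a 6-node box (max_node = 5) with 4194304 buffers (32GB at 8kB pages) gets a batch of 64, the same box with a pool of only 16 buffers gets 16, and a single-node or NUMA-unavailable system gets 1.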


Does batching mess up the meaning of usage_count?
--------------------------------------------------

Short answer: no.  I want to walk through this because it was my first concern too, and I think it's the question that will come up most on review.

The clock sweep's usage_count is an access-frequency approximation measured in units of *complete passes*.  A buffer with usage_count = N survives N passes without a re-pin.  The semantic meaning lives at pass granularity, not at individual-buffer granularity.

What batching changes: intra-pass temporal ordering.  Without batching, with N backends sweeping, decrements are interleaved -- backend A hits B[0], backend B hits B[1], backend C hits B[2].  With batching, backend A hits B[0..63] in a tight local burst, then backend B hits B[64..127], etc.  The 64-buffer chunks are decremented in bursts rather than individually.

Why it doesn't matter:

  1. Every buffer still gets decremented exactly once per complete
     pass.  The invariant the algorithm actually depends on is
     untouched.

  2. A buffer's survival window is the time between consecutive
     passes.  That's milliseconds to seconds under load.  Whether
     B[0] gets decremented 50us before or 50us after B[63] within
     the same pass is below the resolution of anything usage_count
     is trying to measure.

  3. The bgwriter's feedback loop reads (nextVictimBuffer,
     completePasses, numBufferAllocs) via StrategySyncStart() every
     ~200ms.  nextVictimBuffer still advances at the same *total*
     rate (64 per atomic op, but atomic ops happen 1/64 as often).
     The position it reports can jitter by up to 64 buffers relative
     to the one-at-a-time case, but BgBufferSync()'s smoothed
     estimates operate over thousands of buffers per cycle, so the
     jitter disappears into the averaging.  numBufferAllocs still
     increments once per allocation.  strategy_delta,
     smoothed_alloc, smoothed_density, reusable_buffers_est -- all
     unaffected in any way I can see.
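
Point 1 is easy to check with a toy model.  The sketch below is a single-threaded simulation, not PostgreSQL code: four simulated backends tick in round-robin order over a 256-buffer pool, and the function reports how often each buffer was visited in one pass and how many times the shared hand was advanced.  The pool size, backend count and all sim_* names are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SIM_NBUFFERS 256u
#define SIM_BACKENDS 4u

typedef struct SimResult
{
    uint32_t min_visits;    /* fewest visits to any buffer */
    uint32_t max_visits;    /* most visits to any buffer */
    uint32_t shared_claims; /* advances of the shared hand */
} SimResult;

static SimResult
sim_one_pass(uint32_t batch)
{
    uint32_t  visits[SIM_NBUFFERS];
    uint32_t  pos[SIM_BACKENDS] = {0};
    uint32_t  end[SIM_BACKENDS] = {0};
    uint32_t  hand = 0;
    SimResult r = {UINT32_MAX, 0, 0};

    /* Ensures every claimed batch is fully drained within the pass. */
    assert(batch > 0 && SIM_NBUFFERS % (batch * SIM_BACKENDS) == 0);
    memset(visits, 0, sizeof(visits));

    /* Round-robin: backend 0 ticks, then 1, 2, 3, then 0 again, ... */
    for (uint32_t tick = 0; tick < SIM_NBUFFERS; tick++)
    {
        uint32_t b = tick % SIM_BACKENDS;

        if (pos[b] >= end[b])
        {
            /* The one shared operation per batch. */
            pos[b] = hand;
            end[b] = hand + batch;
            hand += batch;
            r.shared_claims++;
        }
        visits[pos[b]++ % SIM_NBUFFERS]++;
    }

    for (uint32_t i = 0; i < SIM_NBUFFERS; i++)
    {
        if (visits[i] < r.min_visits)
            r.min_visits = visits[i];
        if (visits[i] > r.max_visits)
            r.max_visits = visits[i];
    }
    return r;
}
```

Both sim_one_pass(1) and sim_one_pass(64) report every buffer visited exactly once; the difference is that the shared hand is advanced 256 times in the first case and 4 times in the second.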

Table form, because it's easier to argue with:

  Property                          | Unpatched      | Batched
  ----------------------------------+----------------+----------------
  Buffers visited per pass          | NBuffers       | NBuffers
  Decrements per buffer per pass    | 1              | 1
  Eviction threshold                | usage_count==0 | usage_count==0
  Max survival (passes)             | 5              | 5
  Decrement ordering within a pass  | interleaved    | chunked
  bgwriter allocation rate signal   | accurate       | accurate
  Cross-socket atomic traffic       | 1 per buffer   | 1 per 64

There is one subtle difference worth naming.  When a backend finds a victim at B[5] of its batch, it returns with the rest of that batch, B[6..63], still reserved to it (MyBatchPos at B[6], MyBatchEnd one past B[63]).  The next time that backend needs a victim it resumes at B[6], not at wherever the global hand now points.  So the backend drains its batch over multiple StrategyGetBuffer() calls rather than all at once.  Under heavy load, where batches are consumed in microseconds, this is invisible.  Under light load, the implication is that some buffers can sit with slightly stale usage_count for longer than they would have before.  But "light load" means "the sweep is barely moving and nothing wants to evict anyway" -- so the effect doesn't show up where it would hurt.

There's also a small positive side-effect: cache locality.  The backend that just touched BufferDescriptor[B[0]] has the adjacent descriptors warm in L1/L2.  Walking B[0..63] locally is cheaper than walking a striped interleaving where each descriptor was last touched by a different core.  I haven't tried to isolate this in perf, but it falls out naturally.


Benchmarks
----------

Jim ran these; I'm still working on reproducing them locally and will post independent numbers in a follow-up.  All bare metal, Linux, huge pages enabled throughout (more on that below), postmaster pinned to node 0 with `numactl --cpunodebind=0` because otherwise stock TPS varied from 31K to 40K depending on which node the postmaster happened to land on at launch -- worth flagging for anyone trying to reproduce.

Workload is pgbench scale 3000 (~45GB) with shared_buffers=32GB, so the working set always spills and the sweep is hot.

  r8i.metal-96xl (384 vCPUs, 2 sockets, 6 NUMA nodes via SNC3):

    pgbench RO:
      Clients   Stock    Patched   Delta
      64        31,457   36,353    +16%
      128       31,678   37,864    +20%
      256       31,510   37,558    +19%
      384       31,431   37,464    +19%
      512       31,329   37,040    +18%

    pgbench RW:
      Clients   Stock    Patched   Delta
      64         7,685    7,713     0%
      128       10,420   10,541    +1%
      256       12,393   12,463    +1%
      384       15,317   15,197    -1%
      512       17,930   17,978     0%

  m6i.metal (128 vCPUs, 2 sockets, Ice Lake):
    RO +19-20%, RW within noise.

  c8i.metal-48xl (192 vCPUs, 1 socket):
    Single-socket -> batch_size=1 -> original code path.  No
    behavioral change.  (I double-checked this one specifically
    because it's the sanity test for the gate.)

  HammerDB TPC-C on m6i.metal (1000 warehouses):
    VUs   Stock     Patched   Delta
    128   358,518   349,787    -2%
    256   332,098   330,272    -1%
    384   365,782   377,519    +3%
    512   370,663   386,526    +4%

No TPC-C regression, which was the thing we were most worried about. An earlier attempt (per-socket partitioned sweep, see below) was -13% on this same workload.
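
For anyone rechecking the Delta columns, this is the arithmetic I would assume: percent change relative to stock, rounded to the nearest whole percent.  The helper name and the rounding rule are assumptions made for illustration; they are not taken from Jim's scripts.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Percent change of patched relative to stock, rounded to the nearest
 * integer (halves round away from zero).  Integer-only on purpose.
 */
static int
sketch_delta_percent(int64_t stock, int64_t patched)
{
    int64_t scaled;
    int64_t half;

    assert(stock > 0);

    scaled = 100 * (patched - stock);
    half = stock / 2;

    /* C integer division truncates toward zero, so bias by half first. */
    return (int) ((scaled >= 0 ? scaled + half : scaled - half) / stock);
}
```

Feeding it rows from the tables above reproduces the reported figures, e.g. (31457, 36353) gives 16 and (358518, 349787) gives -2.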

The general shape in the r8i RO numbers is a higher plateau.  Unpatched, TPS is already flat at ~31.5K from 64 clients through 512 because backends are spending cycles waiting on the cache line rather than doing work.  Patched, throughput still climbs from 64 to 128 clients and then holds at ~37-38K.

Huge pages caveat: all of the above was run with huge pages on, on large-memory instances (the r8i.metal-96xl has 3TB, so Jim never considered running without them).  We have not characterized the non-huge-pages case.  That's on my list; I don't expect it to change the conclusion, but I shouldn't speak for data I haven't collected.


Relationship to Tomas's NUMA series
-----------------------------------

Tomas posted a multi-patch NUMA-awareness series in [1] covering buffer interleaving across nodes, partitioned freelists, partitioned clock sweep, PGPROC interleaving, and related pieces.  I want to be careful here because I don't think we should frame this patch as competing with that work.

One thing I found striking as I re-read the thread: in the benchmarks Tomas posted later in the series, *most of the benefit comes from partitioning the clock sweep*, and the NUMA memory-placement layer on top sometimes runs slower than partitioning alone.  His own conclusion, quoted roughly: the benefit mostly comes from just partitioning the clock sweep, and it's largely independent of the NUMA stuff; the NUMA partitioning is often slower.

That observation is the thing that makes me think batching is worth considering on its own.  It's going after the same bottleneck Tomas's partitioning addresses, but:

  - without splitting global eviction visibility (which is where
    cross-partition stealing gets complicated),
  - without requiring NUMA-aware buffer placement (which has huge
    page alignment, descriptor-partition-mid-page, and resize
    complications that are still being worked out in that thread),
  - without touching PGPROC or bgwriter.

What this patch does *not* do:
  - place buffers on specific NUMA nodes
  - partition the freelist
  - touch PGPROC
  - add new GUCs
  - change bgwriter

What this patch *does* do:
  - target exactly the clock-sweep contention that Tomas's
    partitioning targets, and reduce it by ~64x, in ~30 lines.

If Tomas's series lands in full, this patch becomes redundant for its primary use case (though even within a partitioned sweep, the per-partition atomic still benefits from batching, so it's arguably a useful primitive either way).  If Tomas's series lands incrementally over several cycles -- which the open items in that thread suggest is the realistic path -- this gets us a real chunk of the multi-socket win now.

This patch is also orthogonal to my earlier thread about removing the freelist entirely [2], but given the proximity to that code Jim agreed that I could propose/steward it here on the list for consideration.


Open questions / things I'd like feedback on
--------------------------------------------

- Batch size.  64 is a round number that worked well in testing, but
  Nathan raised the reasonable point that on small shared_buffers
  with high concurrency, a fixed 64 could be unfortunate.  Options:
  scale with shared_buffers (Min(64, NBuffers / N) for some N), scale
  with max_connections, keep it fixed but let operators tune it, or
  make it a function of NUMA node count.  I don't have a strong
  opinion yet; the Min(batch, NBuffers) cap covers the "obviously
  wrong" corner but doesn't speak to the "several hundred backends
  on a few-MB shared_buffers" shape.  Numbers/ideas/proposals welcome.

- NUMA detection.  The gate uses pg_numa_init() /
  pg_numa_get_max_node().  On systems where libnuma isn't available,
  or where get_mempolicy is blocked (some container configurations),
  we fall back to batch size 1.  That's safe but it misses the
  "single socket, many cores, still benefits from fewer atomics"
  case.  Might be worth a way to force-enable, or batching on all
  systems with a smaller batch size when single-socket.  I'd like to
  measure before deciding.

- Eviction pattern on reads.  Nathan also flagged that with batching,
  the buffers a backend ends up pinning in one StrategyGetBuffer()
  call will tend to be contiguous in buffer-id space rather than
  scattered, which is a different allocation pattern than today.
  The usage_count analysis above says this is benign, but if anyone
  has an intuition for a workload where this would be observable
  (e.g., something that cares about the mapping between buffer-id
  and relation locality), I'd like to hear it.

- nextVictimBuffer wraparound.  The current code has a mild overflow
  concern papered over with "highly unlikely and wouldn't be
  particularly harmful".  With batching this is no worse than before,
  but if we're already touching this function, it might be worth
  thinking about whether to tighten it up in the same patch or a
  follow-up.

- Non-NUMA default.  Should the batch size on non-NUMA systems be
  derived from core counts (as a proxy for L1/L2 cache layout), or
  simply default to 8 rather than 1 to realize some benefit?

- Should there be a postgresql.conf setting for this that takes
  precedence?
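
To give the batch-size question something concrete to shoot at, here is one possible shape of the "scale with shared_buffers" option, Min(64, NBuffers / N).  This is purely a hypothetical sketch: the divisor is a placeholder with no measurements behind it, the floor of 1 is my addition so the result is always usable, and none of this is in the attached patch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_BATCH 64u

/*
 * Hypothetical sizing rule: Min(64, nbuffers / divisor), floored at 1.
 * "divisor" is the N from the open question; its value is undecided.
 */
static uint32_t
sketch_scaled_batch_size(uint32_t nbuffers, uint32_t divisor)
{
    uint32_t scaled;

    assert(divisor > 0);

    scaled = nbuffers / divisor;
    if (scaled < 1)
        scaled = 1;     /* never claim an empty batch */

    return scaled < SKETCH_MAX_BATCH ? scaled : SKETCH_MAX_BATCH;
}
```

With a placeholder divisor of 256, a 4194304-buffer pool still gets the full 64, a 1024-buffer pool gets 4, and a 16-buffer pool degrades to the one-at-a-time behavior.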


I'll run the non-huge-pages variant, reproduce the r8i numbers, poke at the small-shared_buffers corner, and post perf stat output showing the atomic/cache-miss deltas over the next few days.  In the meantime, eyeballs and skepticism welcome -- I would especially welcome comments from Andres, who's been in this code recently, and from Tomas, whose series has the most overlap.

I realize that we're past feature freeze and working on release notes for v19, so the chances of merging this are slim to none.  I think this could be considered a "performance bug fix for NUMA systems" in this release, but that is stretching it a bit.  It is a big ask at this stage to land a change like this.

best.

-greg

[1] https://www.postgresql.org/message-id/099b9433-2855-4f1b-b421-d078a5d82017@vondra.me
[2] https://www.postgresql.org/message-id/f0e3c02e-e217-4f04-8dab-1e7e80a228c0@burd.me
--17433b00b01b98fb89b021ca533c5f80a3c06626
Content-Disposition: attachment;
	filename*0="v1-0001-Reduce-clock-sweep-atomic-contention-by-claiming-.pat";
	filename*1="ch"
Content-Type: application/octet-stream;
	name="=?UTF-8?Q?v1-0001-Reduce-clock-sweep-atomic-contention-by-claiming-.patc?=
 =?UTF-8?Q?h?="
Content-Transfer-Encoding: base64

RnJvbSBiZGNmOTBmYmQ4OWEwYWVjMzk3YTNkNTcyMjRhZTczMjk1OTczM2Y5IE1vbiBTZXAg
MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBHcmVnIEJ1cmQgPGdyZWdAYnVyZC5tZT4KRGF0ZTog
U2F0LCAyNSBBcHIgMjAyNiAxNTo1MjozNiAtMDQwMApTdWJqZWN0OiBbUEFUQ0ggdjFdIFJl
ZHVjZSBjbG9jay1zd2VlcCBhdG9taWMgY29udGVudGlvbiBieSBjbGFpbWluZyBidWZmZXJz
CiBpbiBiYXRjaGVzCgpTdHJhdGVneUdldEJ1ZmZlcigpIGFkdmFuY2VzIG5leHRWaWN0aW1C
dWZmZXIgdmlhCnBnX2F0b21pY19mZXRjaF9hZGRfdTMyKC4uLiwgMSkgb24gZXZlcnkgdGlj
ay4gIE9uIG11bHRpLXNvY2tldApzeXN0ZW1zIHRoZSBjYWNoZSBsaW5lIGhvbGRpbmcgdGhl
IGNvdW50ZXIgaGFzIHRvIHRyYXZlbCBvdmVyIHRoZQppbnRlcmNvbm5lY3Qgb24gZWFjaCBv
cGVyYXRpb24sIHB1c2hpbmcgYSBzd2VlcCB0aWNrIGZyb20gfjIwbnMgKHRoZQpzYW1lLXNv
Y2tldCBjYXNlKSBpbnRvIHRoZSB+MTAwLTIwMG5zIHJhbmdlLiAgV2l0aCBodW5kcmVkcyBv
Zgpjb25jdXJyZW50IGJhY2tlbmRzIHVuZGVyIGV2aWN0aW9uIHByZXNzdXJlLCB0aGF0IG9u
ZSBjYWNoZSBsaW5lCmJlY29tZXMgdGhlIGRvbWluYW50IGNvc3QgaW4gdGhlIHN3ZWVwLCB2
aXNpYmxlIGFzIGVsZXZhdGVkCmJ1cy1jeWNsZXMgYW5kIGNhY2hlLW1pc3NlcyBpbiBwZXJm
IHByb2ZpbGVzLgoKRWFjaCBiYWNrZW5kIG5vdyBjbGFpbXMgYSByYW5nZSBvZiBDTE9DS19T
V0VFUF9CQVRDSF9TSVpFICg2NCkKY29uc2VjdXRpdmUgYnVmZmVyIElEcyB3aXRoIGEgc2lu
Z2xlIGZldGNoLWFkZCBhbmQgaXRlcmF0ZXMgdGhyb3VnaAp0aGVtIHByaXZhdGVseS4gIFRo
ZSBzd2VlcCBzdGlsbCBhZHZhbmNlcyB0aHJvdWdoIHRoZSBwb29sIGluIG9yZGVyLAplYWNo
IGJ1ZmZlciBpcyBzdGlsbCB2aXNpdGVkIGV4YWN0bHkgb25jZSBwZXIgY29tcGxldGUgcGFz
cywgYW5kCnVzYWdlX2NvdW50IGlzIHN0aWxsIGRlY3JlbWVudGVkIGV4YWN0bHkgb25jZSBw
ZXIgYnVmZmVyIHBlciBwYXNzOwp0aGUgbWVhbmluZyBvZiB1c2FnZV9jb3VudCBhcyAiaG93
IG1hbnkgY29tcGxldGUgcGFzc2VzIGEgYnVmZmVyCnN1cnZpdmVzIHdpdGhvdXQgYSByZS1w
aW4iIGlzIHByZXNlcnZlZC4gIFdoYXQgY2hhbmdlcyBpcyB0aGUKdGVtcG9yYWwgb3JkZXJp
bmcgb2YgZGVjcmVtZW50cyB3aXRoaW4gYSBzaW5nbGUgcGFzcywgd2hpY2ggdGhlCmFsZ29y
aXRobSBkb2VzIG5vdCBkZXBlbmQgb24uCgpXcmFwYXJvdW5kIGhhbmRsaW5nIGlzIGFkanVz
dGVkOiB3aXRoIGJhdGNoaW5nLCBtdWx0aXBsZSBiYWNrZW5kcwpjYW4gZWFjaCBzZWUgdGhl
aXIgZmV0Y2gtYWRkIHJldHVybiBhIHZhbHVlIHBhc3QgTkJ1ZmZlcnMgd2l0aGluCnRoZSBz
YW1lIHBhc3MuICBBbnkgc3VjaCBiYWNrZW5kIHRha2VzIGJ1ZmZlcl9zdHJhdGVneV9sb2Nr
LApyZS1yZWFkcyB0aGUgY291bnRlciwgYW5kIGlmIGl0IGlzIHN0aWxsIG91dCBvZiByYW5n
ZSB3cmFwcyBpdCB3aXRoCmEgc2luZ2xlIENBUyBhbmQgaW5jcmVtZW50cyBjb21wbGV0ZVBh
c3Nlcy4gIFN0cmF0ZWd5U3luY1N0YXJ0KCkKY29udGludWVzIHRvIHNlZSBhIGNvbnNpc3Rl
bnQgKG5leHRWaWN0aW1CdWZmZXIsIGNvbXBsZXRlUGFzc2VzKQpwYWlyLgoKQmF0Y2hpbmcg
aXMgb25seSB1c2VmdWwgd2hlbiB0aGUgYXRvbWljIGlzIGFjdHVhbGx5IGNvbnRlbmRlZAph
Y3Jvc3Mgbm9kZXMsIHNvIGl0IGlzIGFwcGxpZWQgb25seSB3aGVuIGxpYm51bWEgcmVwb3J0
cyBtb3JlIHRoYW4Kb25lIG5vZGUgKHBnX251bWFfZ2V0X21heF9ub2RlKCkgPj0gMSk7IG90
aGVyd2lzZSB0aGUgYmF0Y2ggc2l6ZQpzdGF5cyBhdCAxIGFuZCB0aGUgY29kZSBwYXRoIG1h
dGNoZXMgbWFzdGVyIGJpdC1mb3ItYml0LiAgVGhlIGJhdGNoCmlzIGFsc28gY2FwcGVkIGF0
IE5CdWZmZXJzIHNvIGEgY2xhaW0gY2Fubm90IHdyYXAgdGhlIHBvb2wgbW9yZQp0aGFuIG9u
Y2UuCgpDby1BdXRob3JlZC1ieTogSmltIE1sb2RnZW5za2kgPG1sb2RqQGFtYXpvbi5jb20+
CkNvLUF1dGhvcmVkLWJ5OiBHcmVnIEJ1cmQgPGdyZWdAYnVyZC5tZT4KLS0tCiBzcmMvYmFj
a2VuZC9zdG9yYWdlL2J1ZmZlci9mcmVlbGlzdC5jIHwgMTM2ICsrKysrKysrKysrKysrKysr
Ky0tLS0tLS0tCiAxIGZpbGUgY2hhbmdlZCwgOTQgaW5zZXJ0aW9ucygrKSwgNDIgZGVsZXRp
b25zKC0pCgpkaWZmIC0tZ2l0IGEvc3JjL2JhY2tlbmQvc3RvcmFnZS9idWZmZXIvZnJlZWxp
c3QuYyBiL3NyYy9iYWNrZW5kL3N0b3JhZ2UvYnVmZmVyL2ZyZWVsaXN0LmMKaW5kZXggZmRi
NWJhZDc5MTAuLmU4NmVkMWY3ZGEwIDEwMDY0NAotLS0gYS9zcmMvYmFja2VuZC9zdG9yYWdl
L2J1ZmZlci9mcmVlbGlzdC5jCisrKyBiL3NyYy9iYWNrZW5kL3N0b3JhZ2UvYnVmZmVyL2Zy
ZWVsaXN0LmMKQEAgLTIyLDYgKzIyLDcgQEAKICNpbmNsdWRlICJzdG9yYWdlL3Byb2MuaCIK
ICNpbmNsdWRlICJzdG9yYWdlL3NobWVtLmgiCiAjaW5jbHVkZSAic3RvcmFnZS9zdWJzeXN0
ZW1zLmgiCisjaW5jbHVkZSAicG9ydC9wZ19udW1hLmgiCiAKICNkZWZpbmUgSU5UX0FDQ0VT
U19PTkNFKHZhcikJKChpbnQpKCooKHZvbGF0aWxlIGludCAqKSYodmFyKSkpKQogCkBAIC0x
MDAsNjggKzEwMSwxMDEgQEAgc3RhdGljIEJ1ZmZlckRlc2MgKkdldEJ1ZmZlckZyb21SaW5n
KEJ1ZmZlckFjY2Vzc1N0cmF0ZWd5IHN0cmF0ZWd5LAogc3RhdGljIHZvaWQgQWRkQnVmZmVy
VG9SaW5nKEJ1ZmZlckFjY2Vzc1N0cmF0ZWd5IHN0cmF0ZWd5LAogCQkJCQkJCUJ1ZmZlckRl
c2MgKmJ1Zik7CiAKKy8qCisgKiBOdW1iZXIgb2YgYnVmZmVyIElEcyB0byBjbGFpbSBmcm9t
IHRoZSBzaGFyZWQgY2xvY2sgaGFuZCBhdCBvbmNlLgorICogTGFyZ2VyIHZhbHVlcyByZWR1
Y2UgY29udGVudGlvbiBvbiB0aGUgc2hhcmVkIGF0b21pYy4gIFdpdGggYSBiYXRjaAorICog
c2l6ZSBvZiA2NCwgY29uY3VycmVudCBiYWNrZW5kcyBzd2VlcCBub24tb3ZlcmxhcHBpbmcg
Y2h1bmtzIG9mIDY0CisgKiBidWZmZXJzIHJhdGhlciB0aGFuIGludGVybGVhdmluZyBvbmUg
YnVmZmVyIGF0IGEgdGltZS4gIFRoZSBnbG9iYWwKKyAqIHN3ZWVwIG9yZGVyIGlzIHByZXNl
cnZlZCDigJQgZWFjaCBidWZmZXIgaXMgc3RpbGwgdmlzaXRlZCBleGFjdGx5IG9uY2UKKyAq
IHBlciBjb21wbGV0ZSBwYXNzLgorICovCisjZGVmaW5lIENMT0NLX1NXRUVQX0JBVENIX1NJ
WkUgNjQKKworLyoKKyAqIFBlci1iYWNrZW5kIHN0YXRlIGZvciBiYXRjaGVkIGNsb2NrIHN3
ZWVwLgorICovCitzdGF0aWMgdWludDMyIE15QmF0Y2hQb3MgPSAwOwkvKiBuZXh0IGJ1ZmZl
ciB3aXRoaW4gYmF0Y2ggKi8KK3N0YXRpYyB1aW50MzIgTXlCYXRjaEVuZCA9IDA7CS8qIG9u
ZSBwYXN0IGxhc3QgYnVmZmVyIGluIGJhdGNoICovCisKKy8qCisgKiBFZmZlY3RpdmUgYmF0
Y2ggc2l6ZSBmb3IgdGhlIGNsb2NrIHN3ZWVwLCBjb21wdXRlZCBvbmNlIGF0IHN0YXJ0dXAu
CisgKiBPbiBub24tTlVNQSBzeXN0ZW1zIChzaW5nbGUgc29ja2V0LCBubyBsaWJudW1hLCBv
ciBjb250YWluZXJzIGJsb2NraW5nCisgKiBnZXRfbWVtcG9saWN5KSwgdGhpcyBpcyAxIC0t
IHRoZSBvcmlnaW5hbCBvbmUtYXQtYS10aW1lIGJlaGF2aW9yLgorICogT24gbXVsdGktbm9k
ZSBOVU1BIHN5c3RlbXMsIHRoaXMgaXMgTWluKENMT0NLX1NXRUVQX0JBVENIX1NJWkUsIE5C
dWZmZXJzKQorICogdG8gcmVkdWNlIGNyb3NzLXNvY2tldCBhdG9taWMgY29udGVudGlvbiBv
biBuZXh0VmljdGltQnVmZmVyLgorICovCitzdGF0aWMgdWludDMyIENsb2NrU3dlZXBCYXRj
aFNpemUgPSAxOworCitzdGF0aWMgaW5saW5lIHVpbnQzMgorRWZmZWN0aXZlQmF0Y2hTaXpl
KHZvaWQpCit7CisJcmV0dXJuIENsb2NrU3dlZXBCYXRjaFNpemU7Cit9CisKIC8qCiAgKiBD
bG9ja1N3ZWVwVGljayAtIEhlbHBlciByb3V0aW5lIGZvciBTdHJhdGVneUdldEJ1ZmZlcigp
CiAgKgotICogTW92ZSB0aGUgY2xvY2sgaGFuZCBvbmUgYnVmZmVyIGFoZWFkIG9mIGl0cyBj
dXJyZW50IHBvc2l0aW9uIGFuZCByZXR1cm4gdGhlCi0gKiBpZCBvZiB0aGUgYnVmZmVyIG5v
dyB1bmRlciB0aGUgaGFuZC4KKyAqIFJldHVybiB0aGUgbmV4dCBidWZmZXIgdG8gY29uc2lk
ZXIgZm9yIGV2aWN0aW9uLiAgQmFja2VuZHMgY2xhaW0gYmF0Y2hlcworICogb2YgY29uc2Vj
dXRpdmUgYnVmZmVyIElEcyBmcm9tIHRoZSBzaGFyZWQgY2xvY2sgaGFuZCwgdGhlbiBpdGVy
YXRlIHRocm91Z2gKKyAqIHRoZW0gbG9jYWxseSB3aXRob3V0IGZ1cnRoZXIgYXRvbWljIG9w
ZXJhdGlvbnMuICBUaGlzIHByZXNlcnZlcyB0aGUgZ2xvYmFsCisgKiBzd2VlcCBvcmRlciB3
aGlsZSByZWR1Y2luZyBjcm9zcy1zb2NrZXQgY29udGVudGlvbiBvbiB0aGUgc2hhcmVkIGNv
dW50ZXIuCiAgKi8KIHN0YXRpYyBpbmxpbmUgdWludDMyCiBDbG9ja1N3ZWVwVGljayh2b2lk
KQogewogCXVpbnQzMgkJdmljdGltOwogCi0JLyoKLQkgKiBBdG9taWNhbGx5IG1vdmUgaGFu
ZCBhaGVhZCBvbmUgYnVmZmVyIC0gaWYgdGhlcmUncyBzZXZlcmFsIHByb2Nlc3NlcwotCSAq
IGRvaW5nIHRoaXMsIHRoaXMgY2FuIGxlYWQgdG8gYnVmZmVycyBiZWluZyByZXR1cm5lZCBz
bGlnaHRseSBvdXQgb2YKLQkgKiBhcHBhcmVudCBvcmRlci4KLQkgKi8KLQl2aWN0aW0gPQot
CQlwZ19hdG9taWNfZmV0Y2hfYWRkX3UzMigmU3RyYXRlZ3lDb250cm9sLT5uZXh0VmljdGlt
QnVmZmVyLCAxKTsKLQotCWlmICh2aWN0aW0gPj0gTkJ1ZmZlcnMpCisJaWYgKE15QmF0Y2hQ
b3MgPj0gTXlCYXRjaEVuZCkKIAl7Ci0JCXVpbnQzMgkJb3JpZ2luYWxWaWN0aW0gPSB2aWN0
aW07Ci0KLQkJLyogYWx3YXlzIHdyYXAgd2hhdCB3ZSBsb29rIHVwIGluIEJ1ZmZlckRlc2Ny
aXB0b3JzICovCi0JCXZpY3RpbSA9IHZpY3RpbSAlIE5CdWZmZXJzOwotCiAJCS8qCi0JCSAq
IElmIHdlJ3JlIHRoZSBvbmUgdGhhdCBqdXN0IGNhdXNlZCBhIHdyYXBhcm91bmQsIGZvcmNl
Ci0JCSAqIGNvbXBsZXRlUGFzc2VzIHRvIGJlIGluY3JlbWVudGVkIHdoaWxlIGhvbGRpbmcg
dGhlIHNwaW5sb2NrLiBXZQotCQkgKiBuZWVkIHRoZSBzcGlubG9jayBzbyBTdHJhdGVneVN5
bmNTdGFydCgpIGNhbiByZXR1cm4gYSBjb25zaXN0ZW50Ci0JCSAqIHZhbHVlIGNvbnNpc3Rp
bmcgb2YgbmV4dFZpY3RpbUJ1ZmZlciBhbmQgY29tcGxldGVQYXNzZXMuCisJCSAqIENsYWlt
IGEgbmV3IGJhdGNoIGZyb20gdGhlIHNoYXJlZCBjbG9jayBoYW5kLiAgVGhpcyBpcyB0aGUg
b25seQorCQkgKiBhdG9taWMgb3BlcmF0aW9uIHBlciBiYXRjaCwgcmVkdWNpbmcgY29udGVu
dGlvbiBieSB0aGUgYmF0Y2ggc2l6ZS4KIAkJICovCi0JCWlmICh2aWN0aW0gPT0gMCkKKwkJ
dWludDMyCQlzdGFydDsKKwkJdWludDMyCQliYXRjaF9zaXplID0gRWZmZWN0aXZlQmF0Y2hT
aXplKCk7CisKKwkJc3RhcnQgPSBwZ19hdG9taWNfZmV0Y2hfYWRkX3UzMigmU3RyYXRlZ3lD
b250cm9sLT5uZXh0VmljdGltQnVmZmVyLAorCQkJCQkJCQkJCWJhdGNoX3NpemUpOworCisJ
CWlmIChzdGFydCA+PSAodWludDMyKSBOQnVmZmVycykKIAkJewotCQkJdWludDMyCQlleHBl
Y3RlZDsKLQkJCXVpbnQzMgkJd3JhcHBlZDsKLQkJCWJvb2wJCXN1Y2Nlc3MgPSBmYWxzZTsK
KwkJCXN0YXJ0ID0gc3RhcnQgJSBOQnVmZmVyczsKIAotCQkJZXhwZWN0ZWQgPSBvcmlnaW5h
bFZpY3RpbSArIDE7CisJCQkvKgorCQkJICogSWYgdGhlIGNvdW50ZXIgaGFzIGdyb3duIGJl
eW9uZCBOQnVmZmVycywgdHJ5IHRvIHdyYXAgaXQgYmFjay4KKwkJCSAqIFdlIG11c3QgaG9s
ZCB0aGUgc3BpbmxvY2sgc28gU3RyYXRlZ3lTeW5jU3RhcnQoKSBjYW4gcmVhZAorCQkJICog
bmV4dFZpY3RpbUJ1ZmZlciBhbmQgY29tcGxldGVQYXNzZXMgY29uc2lzdGVudGx5LgorCQkJ
ICoKKwkJCSAqIE11bHRpcGxlIGJhY2tlbmRzIG1heSBlbnRlciB0aGlzIHNlY3Rpb24gY29u
Y3VycmVudGx5LiBBZnRlcgorCQkJICogYWNxdWlyaW5nIHRoZSBzcGlubG9jaywgcmUtcmVh
ZCB0aGUgY291bnRlcjogaWYgYW5vdGhlciBiYWNrZW5kCisJCQkgKiBhbHJlYWR5IHdyYXBw
ZWQgaXQgYmVsb3cgTkJ1ZmZlcnMsIHdlJ3JlIGRvbmUuCisJCQkgKi8KKwkJCVNwaW5Mb2Nr
QWNxdWlyZSgmU3RyYXRlZ3lDb250cm9sLT5idWZmZXJfc3RyYXRlZ3lfbG9jayk7CiAKLQkJ
CXdoaWxlICghc3VjY2VzcykKIAkJCXsKLQkJCQkvKgotCQkJCSAqIEFjcXVpcmUgdGhlIHNw
aW5sb2NrIHdoaWxlIGluY3JlYXNpbmcgY29tcGxldGVQYXNzZXMuIFRoYXQKLQkJCQkgKiBh
bGxvd3Mgb3RoZXIgcmVhZGVycyB0byByZWFkIG5leHRWaWN0aW1CdWZmZXIgYW5kCi0JCQkJ
ICogY29tcGxldGVQYXNzZXMgaW4gYSBjb25zaXN0ZW50IG1hbm5lciB3aGljaCBpcyByZXF1
aXJlZCBmb3IKLQkJCQkgKiBTdHJhdGVneVN5bmNTdGFydCgpLiAgSW4gdGhlb3J5IGRlbGF5
aW5nIHRoZSBpbmNyZW1lbnQKLQkJCQkgKiBjb3VsZCBsZWFkIHRvIGFuIG92ZXJmbG93IG9m
IG5leHRWaWN0aW1CdWZmZXJzLCBidXQgdGhhdCdzCi0JCQkJICogaGlnaGx5IHVubGlrZWx5
IGFuZCB3b3VsZG4ndCBiZSBwYXJ0aWN1bGFybHkgaGFybWZ1bC4KLQkJCQkgKi8KLQkJCQlT
cGluTG9ja0FjcXVpcmUoJlN0cmF0ZWd5Q29udHJvbC0+YnVmZmVyX3N0cmF0ZWd5X2xvY2sp
OwotCi0JCQkJd3JhcHBlZCA9IGV4cGVjdGVkICUgTkJ1ZmZlcnM7CisJCQkJdWludDMyCQlj
dXJyZW50OworCQkJCXVpbnQzMgkJd3JhcHBlZDsKIAotCQkJCXN1Y2Nlc3MgPSBwZ19hdG9t
aWNfY29tcGFyZV9leGNoYW5nZV91MzIoJlN0cmF0ZWd5Q29udHJvbC0+bmV4dFZpY3RpbUJ1
ZmZlciwKLQkJCQkJCQkJCQkJCQkJICZleHBlY3RlZCwgd3JhcHBlZCk7Ci0JCQkJaWYgKHN1
Y2Nlc3MpCi0JCQkJCVN0cmF0ZWd5Q29udHJvbC0+Y29tcGxldGVQYXNzZXMrKzsKLQkJCQlT
cGluTG9ja1JlbGVhc2UoJlN0cmF0ZWd5Q29udHJvbC0+YnVmZmVyX3N0cmF0ZWd5X2xvY2sp
OworCQkJCWN1cnJlbnQgPSBwZ19hdG9taWNfcmVhZF91MzIoJlN0cmF0ZWd5Q29udHJvbC0+
bmV4dFZpY3RpbUJ1ZmZlcik7CisJCQkJaWYgKGN1cnJlbnQgPj0gKHVpbnQzMikgTkJ1ZmZl
cnMpCisJCQkJeworCQkJCQl3cmFwcGVkID0gY3VycmVudCAlIE5CdWZmZXJzOworCQkJCQlp
ZiAocGdfYXRvbWljX2NvbXBhcmVfZXhjaGFuZ2VfdTMyKCZTdHJhdGVneUNvbnRyb2wtPm5l
eHRWaWN0aW1CdWZmZXIsCisJCQkJCQkJCQkJCQkJICAgJmN1cnJlbnQsIHdyYXBwZWQpKQor
CQkJCQkJU3RyYXRlZ3lDb250cm9sLT5jb21wbGV0ZVBhc3NlcysrOworCQkJCX0KIAkJCX0K
KworCQkJU3BpbkxvY2tSZWxlYXNlKCZTdHJhdGVneUNvbnRyb2wtPmJ1ZmZlcl9zdHJhdGVn
eV9sb2NrKTsKIAkJfQorCisJCU15QmF0Y2hQb3MgPSBzdGFydDsKKwkJTXlCYXRjaEVuZCA9
IHN0YXJ0ICsgYmF0Y2hfc2l6ZTsKIAl9CisKKwl2aWN0aW0gPSBNeUJhdGNoUG9zICUgTkJ1
ZmZlcnM7CisJTXlCYXRjaFBvcysrOworCiAJcmV0dXJuIHZpY3RpbTsKIH0KIApAQCAtNDA4
LDYgKzQ0MiwyNCBAQCBTdHJhdGVneUN0bFNobWVtSW5pdCh2b2lkICphcmcpCiAKIAkvKiBO
byBwZW5kaW5nIG5vdGlmaWNhdGlvbiAqLwogCVN0cmF0ZWd5Q29udHJvbC0+Ymd3cHJvY25v
ID0gLTE7CisKKwkvKgorCSAqIERldGVybWluZSB0aGUgZWZmZWN0aXZlIGNsb2NrLXN3ZWVw
IGJhdGNoIHNpemUuCisJICoKKwkgKiBPbiBtdWx0aS1ub2RlIE5VTUEgc3lzdGVtcywgY2xh
aW1pbmcgYmF0Y2hlcyBvZiBidWZmZXJzIGZyb20gdGhlIHNoYXJlZAorCSAqIGNsb2NrIGhh
bmQgcmVkdWNlcyBjcm9zcy1zb2NrZXQgY29udGVudGlvbiBvbiB0aGUgYXRvbWljIGNvdW50
ZXIuICBPbgorCSAqIHNpbmdsZS1zb2NrZXQgc3lzdGVtcywgYmF0Y2hpbmcgcHJvdmlkZXMg
bm8gYmVuZWZpdCAodGhlIGF0b21pYyBpcworCSAqIGFscmVhZHkgc29ja2V0LWxvY2FsKSBh
bmQganVzdCBjYXVzZXMgYmFja2VuZHMgdG8gc2tpcCBidWZmZXJzLCBzbyB3ZQorCSAqIHVz
ZSBiYXRjaCBzaXplIDEgZm9yIHRoZSBvcmlnaW5hbCBiZWhhdmlvci4KKwkgKgorCSAqIHBn
X251bWFfaW5pdCgpIHJldHVybnMgLTEgd2hlbiBOVU1BIGlzIHVuYXZhaWxhYmxlLgorCSAq
IHBnX251bWFfZ2V0X21heF9ub2RlKCkgcmV0dXJucyAwIGZvciBhIHNpbmdsZSBOVU1BIG5v
ZGUuCisJICovCisJaWYgKHBnX251bWFfaW5pdCgpICE9IC0xICYmIHBnX251bWFfZ2V0X21h
eF9ub2RlKCkgPj0gMSkKKwkJQ2xvY2tTd2VlcEJhdGNoU2l6ZSA9IE1pbihDTE9DS19TV0VF
UF9CQVRDSF9TSVpFLAorCQkJCQkJCQkgICh1aW50MzIpIE5CdWZmZXJzKTsKKwllbHNlCisJ
CUNsb2NrU3dlZXBCYXRjaFNpemUgPSAxOwogfQogCiAKLS0gCjIuNTAuMSAoQXBwbGUgR2l0
LTE1NSkKCg==

--17433b00b01b98fb89b021ca533c5f80a3c06626--