RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v2]

Xiaolong Peng xpeng at openjdk.org
Thu Sep 25 19:07:22 UTC 2025


> Shenandoah always allocates memory under the heap lock. We have observed heavy heap lock contention on the memory allocation path in performance analysis of a service in which we tried to adopt Shenandoah. This change proposes an optimization of the mutator memory allocation code path to reduce heap lock contention. At a very high level, here is how it works:
> * ShenandoahFreeSet holds N (default 13) ShenandoahHeapRegion* pointers that mutator threads use for regular object allocations. These are called shared regions or directly allocatable regions, and they are stored in a PaddedEnd data structure (padded array).
> * Each mutator thread is assigned one of the directly allocatable regions. The thread tries to allocate in that region with an atomic CAS operation; if that fails, it tries the next 2 consecutive directly allocatable regions in the array.
> * If a mutator thread fails after trying 3 directly allocatable regions, it will:
>    * Take the heap lock.
>    * Try to retire the directly allocatable regions that are ready to retire.
>    * Iterate over the mutator partition, allocate new directly allocatable regions, and store them in the padded array if any need to be retired.
>    * Satisfy the mutator allocation request if possible.
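The steps above can be sketched in standalone C++ as follows. This is only an illustration of the technique, not the code in the pull request: `Region`, `Slot`, `SharedRegionAllocator`, `kSlots`, `kProbes`, and the 64-byte padding are hypothetical names and values, `std::atomic` and `std::mutex` stand in for HotSpot's own primitives, the probe sequence is assumed to wrap around the array, and the slow path is simplified to replacing only the calling thread's home slot.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

constexpr size_t kSlots  = 13;  // number of shared regions (default from the description)
constexpr size_t kProbes = 3;   // home slot plus 2 consecutive slots
constexpr size_t kFailed = static_cast<size_t>(-1);

// Hypothetical stand-in for a heap region: a bump-pointer area of `end` bytes.
struct Region {
    explicit Region(size_t capacity) : end(capacity) {}
    std::atomic<size_t> top{0};  // next free offset
    const size_t end;            // capacity in bytes

    // Lock-free bump allocation: returns the offset, or kFailed if it does not fit.
    size_t try_allocate(size_t size) {
        size_t old_top = top.load(std::memory_order_relaxed);
        while (size <= end - old_top) {
            // On failure, compare_exchange_weak reloads old_top and we re-check the fit.
            if (top.compare_exchange_weak(old_top, old_top + size,
                                          std::memory_order_relaxed)) {
                return old_top;
            }
        }
        return kFailed;
    }
};

// One cache-line-aligned slot per shared region, to avoid false sharing (assumed 64 bytes).
struct alignas(64) Slot {
    std::atomic<Region*> region{nullptr};
};

struct Allocation {
    Region* region;
    size_t  offset;
};

class SharedRegionAllocator {
public:
    explicit SharedRegionAllocator(size_t region_size) : region_size_(region_size) {
        for (Slot& s : slots_) s.region.store(new_region(), std::memory_order_release);
    }

    Allocation allocate(size_t size, size_t thread_index) {
        assert(size > 0);
        // Fast path: CAS allocation in up to kProbes consecutive shared regions.
        for (size_t i = 0; i < kProbes; i++) {
            Region* r = slots_[(thread_index + i) % kSlots].region.load(std::memory_order_acquire);
            size_t off = r->try_allocate(size);
            if (off != kFailed) return {r, off};
        }
        // Slow path: take the lock, retire the full home region, install a fresh one.
        std::lock_guard<std::mutex> guard(heap_lock_);
        slow_paths_.fetch_add(1, std::memory_order_relaxed);
        Slot& home = slots_[thread_index % kSlots];
        Region* r = home.region.load(std::memory_order_acquire);
        size_t off = r->try_allocate(size);  // another thread may have refreshed the slot
        if (off != kFailed) return {r, off};
        if (size > region_size_) return {nullptr, kFailed};  // can never fit in a shared region
        Region* fresh = new_region();
        off = fresh->try_allocate(size);     // not yet published, so this cannot fail
        home.region.store(fresh, std::memory_order_release);
        return {fresh, off};
    }

    size_t slow_path_count() const { return slow_paths_.load(std::memory_order_relaxed); }

private:
    // Called only from the constructor or with heap_lock_ held.
    // Retired regions stay owned by regions_, so racing readers never see freed memory.
    Region* new_region() {
        regions_.push_back(std::make_unique<Region>(region_size_));
        return regions_.back().get();
    }

    const size_t region_size_;
    std::array<Slot, kSlots> slots_;
    std::vector<std::unique_ptr<Region>> regions_;
    std::mutex heap_lock_;
    std::atomic<size_t> slow_paths_{0};
};
```

In this sketch a thread reaches the lock only after three CAS attempts have failed, so the lock is taken roughly once per filled region rather than once per allocation, which is the contention reduction the description is aiming for.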
> 
> 
> I'm not expecting a significant performance impact in most cases, since contention on the heap lock is usually not high enough to cause performance issues. I have done many tests; here are some of them:
> 
> 1. DaCapo lusearch test on an EC2 host with 96 CPU cores:
> OpenJDK tip:
> 
> [ec2-user at ip-172-31-42-91 jdk]$ ./master-jdk/bin/java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms4G -Xmx4G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions  -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational  -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar  -n 10 lusearch  | grep "metered full smoothing"
> ===== DaCapo tail latency, metered full smoothing: 50% 131684 usec, 90% 200192 usec, 99% 211369 usec, 99.9% 212517 usec, 99.99% 213043 usec, max 235289 usec, measured over 524288 events =====
> ===== DaCapo tail latency, metered full smoothing: 50% 1568 usec, 90% 36101 usec, 99% 42172 usec, 99.9% 42928 usec, 99.99% 43100 usec, max 43305 usec, measured over 524288 events =====
> ===== DaCapo tail latency, metered full smoothing: 50% 52644 usec, 90% 124393 usec, 99% 137711 usec, 99.9% 139355 usec, 99.99% 139749 usec, max 146722 usec, measured over 524288 events ====...

Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 131 commits:

 - Merge branch 'openjdk:master' into cas-alloc-1
 - Merge branch 'master' into cas-alloc-1
 - Move ShenandoahHeapRegionIterationClosure to shenandoahFreeSet.hpp
 - Merge branch 'openjdk:master' into cas-alloc-1
 - Fix errors caused by renaming of Atomic to AtomicAccess
 - Merge branch 'openjdk:master' into cas-alloc-1
 - Remove unused flag
 - Merge branch 'openjdk:master' into cas-alloc-1
 - Merge branch 'cas-alloc-1' into cas-alloc
 - Merge branch 'cas-alloc-1' of https://github.com/pengxiaolong/jdk into cas-alloc-1
 - ... and 121 more: https://git.openjdk.org/jdk/compare/3c9fd768...666f2ef1

-------------

Changes: https://git.openjdk.org/jdk/pull/26171/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=01
  Stats: 735 lines in 16 files changed: 674 ins; 7 del; 54 mod
  Patch: https://git.openjdk.org/jdk/pull/26171.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171

PR: https://git.openjdk.org/jdk/pull/26171

