RFR: 8374045: Add support to run benchmarking with fragmented CodeCache

Chad Rakoczy duke at openjdk.org
Thu Jan 29 20:36:48 UTC 2026


On Fri, 19 Dec 2025 20:08:20 GMT, Chad Rakoczy <duke at openjdk.org> wrote:

> [JDK-8374045](https://bugs.openjdk.org/browse/JDK-8374045)
> 
> This PR adds a new utility tool, CodeCacheFragmenter, to help with testing HotSpot code cache fragmentation scenarios. The tool is a Java agent that uses the WhiteBox API to create and randomly free dummy code blobs in the NonProfiled code heap to achieve a specified fill percentage. It includes configurable parameters for blob sizes, target fill percentage (0-100%), and random seeding to enable reproducible fragmentation patterns. The utility is built via a standard Makefile and produces `codecachefragmenter.jar`, which can be used as a Java agent with the `-javaagent` flag. This tool is intended for performance testing and experimentation with code cache behavior under various fragmentation conditions.
> 
> This is useful for demonstrating the performance benefits of [JDK-8326205](https://bugs.openjdk.org/browse/JDK-8326205).
> 
> With the same amount of free code cache memory, adding fragmentation increases execution time of Renaissance Dotty by ~5x. See https://github.com/openjdk/jdk/pull/28934#issuecomment-3820151693 for more details.
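To make the idea of "fill to a target percentage, then free at random" concrete, here is a toy model in Python. It is only a sketch and not the CodeCacheFragmenter implementation: the `fragment` and `largest_free_run` helpers, the single fixed blob size of 4096 bytes, and the "fill everything, then free random blobs" order are all assumptions made for illustration.

```python
import random


def fragment(heap_size, blob_size, fill_percentage, seed):
    """Toy model: fill a heap with equal-size blobs, then free random ones."""
    rng = random.Random(seed)  # a fixed seed gives a reproducible pattern
    slots = heap_size // blob_size
    occupied = [True] * slots  # start with a dummy blob in every slot
    target = int(slots * fill_percentage / 100.0)
    # Free randomly chosen blobs until only the target number remains.
    for index in rng.sample(range(slots), slots - target):
        occupied[index] = False
    return occupied


def largest_free_run(occupied):
    """Length, in slots, of the longest contiguous free region."""
    best = current = 0
    for used in occupied:
        current = 0 if used else current + 1
        best = max(best, current)
    return best


# Hypothetical parameters: a 128m heap, 4096-byte blobs, 50% fill, seed 42.
layout = fragment(heap_size=128 * 1024 * 1024, blob_size=4096,
                  fill_percentage=50.0, seed=42)
free_slots = layout.count(False)
print(f"free slots: {free_slots} of {len(layout)}")
print(f"largest contiguous free run: {largest_free_run(layout)} slots")
```

In this model half of the heap is free, yet the longest contiguous free region is a tiny fraction of that free space, so newly compiled code ends up scattered between the remaining dummy blobs.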

Performance results for three runs follow. Each run has 64m of usable code cache for C2 code:

1. NonProfiledCodeHeapSize=64m with no fragmentation.
2. NonProfiledCodeHeapSize=128m with half (64m) filled with dummy blobs.
3. NonProfiledCodeHeapSize=112m with half (56m) filled with dummy blobs, plus 8m of HotCodeHeap using #27858.
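As a quick sanity check of the three configurations, the sketch below recomputes the usable code cache per run and checks that the heap sizes add up to the `ReservedCodeCacheSize` passed on each command line. The dictionary layout and helper names are illustrative, and it assumes the dummy blobs occupy only the NonProfiled heap while the hot code heap stays fully available.

```python
# Code heap settings in MB, taken from the run descriptions and command lines.
# fill_fraction is the share of the NonProfiled heap held by dummy blobs.
runs = {
    "Run 1": {"non_nmethod": 8, "non_profiled": 64, "hot": 0,
              "fill_fraction": 0.0, "reserved": 72},
    "Run 2": {"non_nmethod": 8, "non_profiled": 128, "hot": 0,
              "fill_fraction": 0.5, "reserved": 136},
    "Run 3": {"non_nmethod": 8, "non_profiled": 112, "hot": 8,
              "fill_fraction": 0.5, "reserved": 128},
}


def usable_mb(run):
    # Assumption: only the NonProfiled heap is fragmented; the hot heap is not.
    return run["non_profiled"] * (1.0 - run["fill_fraction"]) + run["hot"]


def heap_total_mb(run):
    # ProfiledCodeHeapSize is 0 in every run, so it is omitted here.
    return run["non_nmethod"] + run["non_profiled"] + run["hot"]


for name, run in runs.items():
    print(f"{name}: usable={usable_mb(run):.0f}m, "
          f"heaps={heap_total_mb(run)}m, reserved={run['reserved']}m")
```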

The results show that code cache fragmentation can significantly degrade performance. Introducing fragmentation (Run 2 vs. Run 1) increases steady-state iteration time by ~5x, from ~750 ms to ~4030 ms, and is accompanied by a ~7x increase in branch mispredictions (2.8e10 to 2.0e11 branch-misses).

Run 3 utilizes [JDK-8326205](https://bugs.openjdk.org/browse/JDK-8326205), which shows that grouping hot code can significantly improve performance, reducing the degradation relative to Run 1 from ~5x to ~1.5x (~1135 ms per iteration).
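The ratios quoted here can be reproduced from the raw numbers in the perf output for each run. The sketch below uses the last reported iteration time (iteration 1189) and the `branch-misses:u` counter of each run; the variable and function names are illustrative.

```python
# Last reported iteration time per run, in milliseconds.
iteration_ms = {"Run 1": 748.863, "Run 2": 4038.655, "Run 3": 1134.037}
# branch-misses:u counter per run.
branch_misses = {"Run 1": 28253496347, "Run 2": 200741000177,
                 "Run 3": 74832021136}


def ratio_to_baseline(values, run, baseline="Run 1"):
    """How many times larger a run's value is than the baseline's."""
    return values[run] / values[baseline]


for run in ("Run 2", "Run 3"):
    slowdown = ratio_to_baseline(iteration_ms, run)
    misses = ratio_to_baseline(branch_misses, run)
    print(f"{run}: {slowdown:.2f}x iteration time, {misses:.2f}x branch misses")
```

Note that these ratios compare the final iteration times rather than the total elapsed time, which also includes warm-up iterations.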


# Run 1 (No frag)
perf stat ./build/linux-aarch64-server-release/jdk/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+AlwaysPreTouch -XX:+PrintCodeCache -XX:-TieredCompilation -XX:ReservedCodeCacheSize=72m -XX:InitialCodeCacheSize=72m -XX:+SegmentedCodeCache -XX:+WhiteBoxAPI -XX:NonNMethodCodeHeapSize=8m -XX:ProfiledCodeHeapSize=0 -XX:NonProfiledCodeHeapSize=64m -jar renaissance-mit-0.16.0.jar -r 1190 dotty

...
====== dotty (scala) [default], iteration 1188 completed (746.699 ms) ======
====== dotty (scala) [default], iteration 1189 completed (748.863 ms) ======

 Performance counter stats:

        3031251.35 msec task-clock:u                     #    2.164 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec       
         199250260      page-faults:u                    #   65.732 K/sec  
     3224539067410      cycles:u                         #    1.064 GHz
     6062234799740      instructions:u                   #    1.88  insn per cycle
   <not supported>      branches:u                                         
       28253496347      branch-misses:u                    

    1401.045295428 seconds time elapsed

    1262.547605000 seconds user
    1769.354116000 seconds sys



# Run 2 (128m 0.5frag 0mHCH)
perf stat ./build/linux-aarch64-server-release/jdk/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+AlwaysPreTouch -XX:+PrintCodeCache -XX:-TieredCompilation -XX:ReservedCodeCacheSize=136m -XX:InitialCodeCacheSize=136m -XX:+SegmentedCodeCache -XX:+WhiteBoxAPI -XX:NonNMethodCodeHeapSize=8m -XX:ProfiledCodeHeapSize=0 -XX:NonProfiledCodeHeapSize=128m -XX:+WhiteBoxAPI -javaagent:codecachefragmenter.jar=FillPercentage=50.0,RandomSeed=42 -Xbootclasspath/a:codecachefragmenter.jar -jar renaissance-mit-0.16.0.jar -r 1190 dotty

====== dotty (scala) [default], iteration 1188 completed (4022.919 ms) ======
====== dotty (scala) [default], iteration 1189 completed (4038.655 ms) ======

 Performance counter stats:

        6321396.03 msec task-clock:u                     #    1.135 CPUs utilized
                 0      context-switches:u               #    0.000 /sec        
                 0      cpu-migrations:u                 #    0.000 /sec   
          70083590      page-faults:u                    #   11.087 K/sec
    14233471439404      cycles:u                         #    2.252 GHz        
    15400934891881      instructions:u                   #    1.08  insn per cycle
   <not supported>      branches:u                         
      200741000177      branch-misses:u                                         

    5568.255657415 seconds time elapsed

    5498.865476000 seconds user 
     824.319759000 seconds sys



# Run 3 (112m 0.5frag 8mHCH)
perf stat ./build/linux-aarch64-server-release/jdk/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+AlwaysPreTouch -XX:+PrintCodeCache -XX:-TieredCompilation -XX:ReservedCodeCacheSize=128m -XX:InitialCodeCacheSize=128m -XX:+SegmentedCodeCache -XX:+WhiteBoxAPI -XX:NonNMethodCodeHeapSize=8m -XX:ProfiledCodeHeapSize=0 -XX:NonProfiledCodeHeapSize=112m -XX:+HotCodeHeap -XX:HotCodeHeapSize=8m -XX:+WhiteBoxAPI -javaagent:codecachefragmenter.jar=FillPercentage=50.0,RandomSeed=42 -Xbootclasspath/a:codecachefragmenter.jar -jar renaissance-mit-0.16.0.jar -r 1190 dotty

====== dotty (scala) [default], iteration 1188 completed (1137.186 ms) ======
====== dotty (scala) [default], iteration 1189 completed (1134.037 ms) ======

 Performance counter stats:

        3936522.85 msec task-clock:u                     #    1.466 CPUs utilized
                 0      context-switches:u               #    0.000 /sec   
                 0      cpu-migrations:u                 #    0.000 /sec
         119342149      page-faults:u                    #   30.317 K/sec       
     7050404447349      cycles:u                         #    1.791 GHz    
     9984653084133      instructions:u                   #    1.42  insn per cycle
   <not supported>      branches:u                                             
       74832021136      branch-misses:u                                    

    2684.809939553 seconds time elapsed

    2735.116925000 seconds user 
    1203.549873000 seconds sys
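Because the three runs retire different numbers of instructions, it can help to normalize the counters. The sketch below recomputes instructions per cycle (matching the values perf prints) and derives branch misses per 1000 instructions from the counters above. The per-1000-instructions metric is not part of the perf output; it is included only as an illustrative derived figure.

```python
# Raw counters copied from the three perf stat outputs above.
counters = {
    "Run 1": {"cycles": 3224539067410, "instructions": 6062234799740,
              "branch_misses": 28253496347},
    "Run 2": {"cycles": 14233471439404, "instructions": 15400934891881,
              "branch_misses": 200741000177},
    "Run 3": {"cycles": 7050404447349, "instructions": 9984653084133,
              "branch_misses": 74832021136},
}


def ipc(c):
    """Instructions retired per cycle."""
    return c["instructions"] / c["cycles"]


def misses_per_kilo_instruction(c):
    """Branch misses per 1000 retired instructions."""
    return 1000.0 * c["branch_misses"] / c["instructions"]


for run, c in counters.items():
    print(f"{run}: IPC={ipc(c):.2f}, "
          f"branch misses per 1000 instructions="
          f"{misses_per_kilo_instruction(c):.2f}")
```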

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28934#issuecomment-3820151693


More information about the compiler-dev mailing list